But their comment is about two years out of date, and AI image generation has gotten dramatically better at rendering text since the models and LoRAs they mentioned were SOTA.
I agree we probably won't magically scale current techniques to AGI, but I also think the local maximum for creative output is going to be high enough that it changes how we approach it, the way computers changed how we approach knowledge work.
I've been working on post-training models for tasks that require EQ, so it's validating to see OpenAI working towards that too.
That being said, this is very expensive.
- Input: $75.00 / 1M tokens
- Cached input: $37.50 / 1M tokens
- Output: $150.00 / 1M tokens
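To make the expense concrete, here's a back-of-envelope cost calculation at the listed rates. The token counts in the example call are hypothetical assumptions, not measurements from any real app:

```python
# Per-token rates derived from the listed prices ($ per 1M tokens).
INPUT_RATE = 75.00 / 1_000_000    # fresh input tokens
CACHED_RATE = 37.50 / 1_000_000   # cached input tokens
OUTPUT_RATE = 150.00 / 1_000_000  # output tokens

def page_cost(fresh_in: int, cached_in: int, out: int) -> float:
    """Dollar cost of a single generation call."""
    return fresh_in * INPUT_RATE + cached_in * CACHED_RATE + out * OUTPUT_RATE

# Hypothetical page: 2k fresh prompt tokens, 6k cached context, 800 output tokens.
print(f"${page_cost(2_000, 6_000, 800):.3f} per page")
```

At those assumed token counts the call lands around half a dollar, which is why per-user personalized content is hard to make pencil out at this price point.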
One of the most interesting applications of models with higher EQ is personalized content generation, but the size and cost here are at odds with that.
(You can click on the dice at the bottom to turn on D&D mode)
I've taken the approach of starting with the #1 problem with Gen AI for this application: that by default it writes bland prose with not much going on.
From there you can layer on systems that address things like object permanence, but even with a basic engine capable of generating fun-to-read pages of text, I think you already get a pretty fun experience.
It's a bit more open-ended, though: are the constraints on actions intentional (i.e. they're predetermined), or is the model just adamant about picking from the options provided?
For this demo, the app architecture really depends on users sticking (more or less) to the scripted options – if they want to progress with the story. I’ve included something in the prompt to encourage that.
There are also some ‘hidden’ choices, though. For example, you can attack the merchant and the blacksmith. Those options aren’t enumerated by the GPT when it describes the scene, but they’re equally valid paths in the backend. (That gives me an opportunity to script some of the more popular transgressions.)
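A minimal sketch of how that routing could look, assuming a backend that matches player input against both enumerated and hidden options (all scene and action names here are hypothetical, not from the actual app):

```python
# The scene enumerates a few options for the GPT to present, but the
# backend also accepts "hidden" actions that are never shown to the player.
SCENE = {
    "visible": {"talk to the merchant", "enter the blacksmith"},
    "hidden": {"attack the merchant", "attack the blacksmith"},
}

def resolve_action(player_input: str) -> str:
    """Route a player's free-text action to a scripted path or freeform play."""
    action = player_input.strip().lower()
    if action in SCENE["visible"] or action in SCENE["hidden"]:
        # Hidden actions are equally valid scripted paths; this is where
        # popular transgressions can get bespoke content.
        return f"scripted:{action}"
    return "freeform"  # fall back to letting the model improvise

resolve_action("Attack the merchant")  # hits a scripted path despite never being enumerated
```

The design choice is that scripted and hidden options share one code path, so adding content for a popular transgression is just a matter of moving it into the scene table.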
How did you set up Spellbound? Do you have one longer prompt, or did you split it up?
It writes content that's worth reading, but it's extremely expensive to run. It requires chain of thought, a RAG pipeline, self-revision and more.
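The pipeline shape implied here could be sketched roughly as follows — the three stages are stubbed out as callables, since the real ones would each be LLM API calls:

```python
# Hypothetical sketch of the described pipeline: retrieve context (RAG),
# draft with chain of thought, then self-revise. Stage implementations
# are placeholders, not the actual system.
def generate_page(command: str, retrieve, draft, revise) -> str:
    context = retrieve(command)           # RAG: pull relevant lore/state
    first_pass = draft(command, context)  # chain-of-thought draft
    return revise(first_pass, context)    # self-revision pass
```

Each stage multiplies token usage, which is where the expense comes from: a single page can cost several model calls over the same (large) context.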
I spent most of yesterday testing it and pushed it to beta, but the writing feels stilted and clearly LLM-generated. The inflection point for content people actually want to read will come, but GPT-4o mini isn't the model that gets us there.
The point isn't to generate good content "that's worth reading". The point is to generate an endless stream of slop which looks plausible enough to get you ad impressions.
That's picking up pennies in front of a steamroller: Google is incentivized to punish you when the content is garbage, and people are disincentivized to share what you generate.
It's an entirely different game once you can generate useful content worth reading with AI. People will even pay you good money for it.
I don't know what makes you think that Google is incentivised to punish garbage. They certainly don't seem to mind serving up an endless stream of slop for certain kinds of queries. I don't understand why they'd be more incentivised to serve SEO'd human-generated slop than SEO'd machine-generated slop.
If you actually look into the "SEO slop", the people producing it are constantly fighting a battle with Google to keep their place.
It's all garbage, so no one notices when some of it suddenly disappears off the face of the earth and gets replaced with other garbage — but for the ones making it, their revenue essentially goes to zero overnight.
I had a lot of fun with NovelAI. I believe at the time it was using GPT-2, and I loaded in fine-tuned models for the canon of choice I wanted to experience (trained on fanfic and things of that sort).
Spellbound is to NovelAI what an instruct model is to a completion model: you enter commands which in turn dictate what happens to your character, then the AI models how others react to you.
I'm working on a website [1] that's essentially "Choose your own adventure with AI NPCs" and I've found two things:
a) LLMs are excellent at keeping a "linear enough" storyline without being linear. They'll let you do outlandish things, but given the assignment of "tell a cohesive story" they manage to corral the story back to something sensible unless the player intentionally keeps pushing at the boundary (in which case they probably do want things to go off the rails)
b) LLMs can do delightfully colorful dialogue, they just need to be grounded in a character. Everyone thinks of factual grounding, but given enough description of speech patterns, character motivations, etc. they're capable of dialogue that's lively and completely rid of "GPT-isms", which are what tend to break immersion
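The character grounding in (b) can be sketched as a prompt assembled from speech patterns and motivations rather than retrieved facts. Everything here — the character fields and prompt wording — is an illustrative assumption, not the actual prompt:

```python
# Hypothetical "character card": grounding dialogue in voice and motive,
# as opposed to factual grounding via RAG.
character = {
    "name": "Maeve",
    "speech": "clipped sentences, dry humor, never apologizes",
    "motivation": "protect the smuggling route at any cost",
}

def dialogue_prompt(char: dict, situation: str) -> str:
    """Build a system prompt that pins the model to a character's voice."""
    return (
        f"You are {char['name']}. Speak in {char['speech']}. "
        f"Your goal: {char['motivation']}.\n"
        f"Situation: {situation}\n"
        "Reply in character, one short line of dialogue."
    )

prompt = dialogue_prompt(character, "A stranger asks about the docks.")
```

The point is that the specificity does the work: the more concrete the speech patterns and motivations, the less room the model has to fall back on its default GPT-isms.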
I actually trained an open model [2] on the task of grounding LLMs in characters and actions as opposed to factual things like RAG, and eventually I want to build a game demo out of it
I've experimented with 30 or so models so far. My general finding is that closed-source models like Claude have GPT-isms, while open-source models have a bit less of a default tone, but their ability to understand existing worlds is directly tied to how many tokens they were trained on.
Since existing worlds are (currently) where most of the stories are set, it's worth it to use a closed-source model and wrangle its issues with dialogue.
To its credit, though, Llama 3 is the first OSS model trained on enough tokens to not feel lost in most worlds, so I've started routing some traffic to it for free users.
The output format the site uses is also really, really hard for most models to follow without fine-tuning, but fine-tuning then causes them to pick up the vocabulary of whichever model they were fine-tuned on, which is a bit unfortunate.
Really cool project. When I got to the sign-in page, the email address I would have given my (edit: Google account) info to seemed fishy, like it was a random string of letters. Any way to make it seem more…inviting?
Unfortunately Supabase charges extra for the luxury of setting that URL, and the site is wildly unprofitable right now so I'm sticking to their free offering for the time being
I recently built this at a hackathon: like website-to-chatbot products, it crawls a webpage to understand a business.
But instead of generating a chatbot, it generates a set of guardrails for a chatbot based on your webpage.
-
For example, if your website has information about a hotel, an LLM using RAG would attempt to answer most questions about hotels.
But by default there's no real-time information on things like weather or traffic conditions.
Rather than risk the chatbot hallucinating an answer, the guardrail model would detect a query likely to result in a hallucination and preemptively block it from reaching the underlying model.
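A minimal sketch of the guardrail idea, using a simple topic check in place of a learned model — the topic list and hotel example are assumptions for illustration, not how the actual hackathon project works:

```python
# Queries about these topics need real-time data the site doesn't have,
# so they get blocked before reaching the RAG chatbot. A real system
# would likely use a classifier generated from the crawled page rather
# than a hand-written list.
REALTIME_TOPICS = ("weather", "traffic", "flight status", "wait time")

def guardrail(query: str) -> str:
    """Return 'pass' or a blocked verdict for hallucination-prone queries."""
    q = query.lower()
    if any(topic in q for topic in REALTIME_TOPICS):
        return "blocked: likely to require real-time data we don't have"
    return "pass"

guardrail("What's the weather like near the hotel?")  # blocked
guardrail("Do rooms have air conditioning?")          # passes through to the chatbot
```

The key design point is that the guardrail runs before the chatbot, so a blocked query never costs a model call or produces a confident-sounding wrong answer.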
Yep, plenty of these have come up over time. I can't think of their names at this point, but I'm sure I recall at least five incidents of exactly this caused by incorrect caching.
I don’t think this form of generative AI needs to become a source of spam; carefully designed platforms can let people enjoy their niche content without making them feel isolated.
Not really useful to give up the fight in the infancy of something with as much surface area as generative AI.
"Is being used to create spam" is not the same as "needs to be spam", and we mostly just need platforms that leverage generative AI natively to bridge the gap.
My users don't find what these tools generate to be spam. They're enjoying a classic format with a novel level of flexibility and (understandably) find that very fun.
That high local maximum for creative output is why I focus on it, at least.