GPT3.5 has been undergoing constant improvements; this price decrease (and context length increase) is great news!
The main problem I see with people using GPT3.5 is that they ask it to "write a short story about aliens" and then get back a boring, generic response that sounds like it was written by an AI that was asleep at the wheel.
Good creative prompts are long and detailed, and to get the best results you really need to be able to tune temperature / top_p. Even small changes to a 3 paragraph prompt can result in dramatic changes in the output, and unless people are willing to play around with prompting, they won't get good results.
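On the sampling side, here is a minimal sketch of sweeping temperature (and optionally top_p) for the same prompt via the pre-1.0 openai Python client. The model name and prompt are placeholders, and the actual API call is left commented out since it needs a key:

```python
# Sketch: comparing sampling settings for the same creative prompt.
# Assumes the pre-1.0 openai Python client (openai.ChatCompletion.create);
# the prompt below is the naive one from above, purely for illustration.
# import openai  # pip install openai

def build_request(prompt, temperature, top_p=1.0):
    """Assemble request kwargs so different sampling settings are easy to sweep."""
    return {
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,  # higher = more varied word choice
        "top_p": top_p,              # nucleus sampling cutoff
    }

for temp in (0.3, 0.7, 1.1):
    req = build_request("Write a short story about aliens.", temperature=temp)
    # response = openai.ChatCompletion.create(**req)  # one call per setting
    # print(temp, "->", response.choices[0].message.content[:100])
```

Running the same prompt at a few different temperatures side by side in the playground makes the creativity/instruction-following trade-off very visible.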
None of the prompt guides I've seen really cover pushing GPT3.5 to its limit. I've published one of my more complicated prompts[1], but getting GPT3.5 to output good responses even in this limited sense has taken a lot of work.
As for the longer context: output length is different from following instructions, and for a lot of use cases, pushing in more input tokens is of as much interest as getting more output tokens.
From what I have explored, even at 4k context length with a detailed prompt, earlier instructions in the prompt are "forgotten" (or maybe just ignored). The blog post calls out better understanding of input text, but again, I hope that isn't orthogonal to following instructions!
Finally, regarding function outputs, I wonder if it is a second layer they run on top of the initial model output. I have always found it a challenge to get the model to output parsable responses; there is a definite trade-off between written creativity and well-formatted responses, and to some extent having a creative AI extend the format I specify has been really nice, because it has let me add features I did not think of myself!
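One pragmatic way to live with that trade-off, when not using the new function-calling feature, is to parse the model's output tolerantly: try strict JSON first, then fall back to pulling the first brace-delimited span out of any surrounding chatter. A sketch (the title/summary schema is hypothetical, not from my actual prompt):

```python
# Sketch: tolerant parsing of model output that should be JSON but may drift,
# since a creative model often wraps the JSON in extra prose.
import json
import re

def parse_story_json(raw):
    """Try strict JSON first; fall back to grabbing the first {...} span."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        match = re.search(r"\{.*\}", raw, re.DOTALL)
        if match:
            try:
                return json.loads(match.group(0))
            except json.JSONDecodeError:
                return None
    return None

# Model obeyed the format:
print(parse_story_json('{"title": "First Contact", "summary": "..."}'))
# Model got chatty around the format -- still recoverable:
print(parse_story_json('Sure! Here it is: {"title": "First Contact"} Enjoy!'))
```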
They don't need to be, though. You can try shotgunning it (generate 100 titles for a novel about aliens, then after the gen, "pick the one most likely to resonate with an X audience, and explain why").
Or you can let the AI drive itself interactively ("ask yourself 20 questions about how to write creative alien stories, and answer them yourself").
Or you can process in spirals (generate a setting for an alien story, wait for the answer; generate 3 protagonists and one antagonist, wait; generate motives and relationships for each of them, wait; generate a backstory, wait; then ask for the novel).
The point is letting the AI do the work. You can always "rewrite it with more drama and some comedic relief" afterward to fix tonal issues.
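The spiral technique above is just a loop that feeds each answer back into the conversation before asking the next question. A sketch, where `send` is a stand-in for a real chat-completion call and the step prompts paraphrase the ones above:

```python
# Sketch of the "spiral" flow: each step's answer is appended to the
# conversation before the next question, so later steps build on earlier ones.
def spiral(send, steps):
    """Run each step against the growing conversation; return the history."""
    messages = []
    for step in steps:
        messages.append({"role": "user", "content": step})
        reply = send(messages)  # one API round-trip per step
        messages.append({"role": "assistant", "content": reply})
    return messages

steps = [
    "Generate a setting for an alien story.",
    "Generate 3 protagonists and one antagonist.",
    "Generate motives and relationships for each of them.",
    "Generate a backstory.",
    "Now write the novel opening using everything above.",
]

# A canned stub so the sketch runs without an API key:
history = spiral(lambda msgs: f"[model reply to: {msgs[-1]['content']}]", steps)
print(len(history))  # 5 user turns + 5 assistant turns
```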
You can also try to convince it that it's one of the Duffer brothers behind Stranger Things, and that you need to create the next great series like that in book format, etc. Then steer it away from being a tit-for-tat, obvious rip-off as you go through chapter development.
> Or you can let the AI drive itself interactively ("ask yourself 20 questions about how to write creative alien stories, and answer them yourself").
> Or you can process in spirals (generate a setting for an alien story, wait for the answer; generate 3 protagonists and one antagonist, wait; generate motives and relationships for each of them, wait; generate a backstory, wait; then ask for the novel).
Both of these techniques work very well, but they are not as applicable to programmatic access without wrapping things in a complicated UI flow. My focus is on a public-facing website, so I want to avoid multiple prompts if at all possible!
> None of the prompt guides I've seen really cover pushing GPT3.5 to its limit. I've published one of my more complicated prompts[1], but getting GPT3.5 to output good responses even in this limited sense has taken a lot of work.
Completely agree. We use gpt-3.5 in our feature and it works really well! After my blog post where I detail some of the issues [0] I got a lot of people asking me questions about how we got gpt-3.5 to "work well" because they found it wasn't working for them compared to gpt-4. Almost every time the reason is that they weren't really doing good prompting and expected the magic box to do some magic. The answer is...prompt engineering is actual work, and with some elbow grease you can really get gpt-3.5 to do a lot for you.
The original plot was actually generated by davinci, which I think is the most creative of the three: 3.5 has price and speed, GPT-4 has rationality and experience, and davinci has its head up in the clouds.
1. Give lots of examples; you can see in my shared prompt that I include plenty of different examples of things that can happen.
2. The system prompt is important: choose a style you want things written in and provide some context about what the writing will be used for.
3. Restrictions create art! My prompt forces GPT to summarize almost every paragraph, which means the things that get written are things that can be summarized with a few emojis.
4. Keep playing with it; use the GPT playground to experiment with different settings.
5. Settings that allow the AI more leeway also result in prompt instructions being ignored; you need to decide where on that scale you are comfortable operating. At one point GPT3.5 was generating (good!) dialogue, which sadly wasn't what I wanted, but I could have chosen to embrace that and run with it.
6. Once you hit a good trend, keep generating! Occasionally GPT pops out a really good story; maybe 4 or 5 out of the hundreds of stories I've seen have been truly memorable! Ideally I'd be able to prompt engineer my way to more of those, but sadly the genre I am writing for (medieval fantasy drama) is right at the edge of ChatGPT's censorship rules.
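Tips 1 and 2 boil down to how the message list is assembled: a system prompt that sets style and context, followed by few-shot example pairs, then the real request. A sketch (the example texts are placeholders, not from my published prompt):

```python
# Sketch: system prompt + few-shot examples packed into a chat message list.
def build_messages(system_prompt, examples, user_request):
    """examples: list of (request, ideal_response) pairs shown to the model."""
    messages = [{"role": "system", "content": system_prompt}]
    for request, ideal in examples:
        messages.append({"role": "user", "content": request})
        messages.append({"role": "assistant", "content": ideal})
    messages.append({"role": "user", "content": user_request})
    return messages

msgs = build_messages(
    "You are a dramatic medieval-fantasy storyteller. Keep paragraphs short.",
    [("Describe a betrayal at court.",
      "The duke smiled as he signed the order...")],
    "Describe a coronation gone wrong.",
)
print(len(msgs))  # system + one example pair + the real request
```

Stuffing the examples into the conversation as fake prior turns, rather than into one giant user message, tends to make the format of the examples easier for the model to imitate.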
At one point I actually asked GPT-4 to rewrite my GPT3.5 prompt, and the prompt it came back with resulted in much lower levels of creativity: all the generated text was of the form "A does B, resulting in C", and the sentence structure just got really simplified.
Even when asking for summaries, be specific! My summary prompt (not yet pushed to GH, sadly) is something like:
"After these instructions I will send you a story. Write a clickbait summary full of drama, limit the summary to 1 sentence and do not spoil the ending."
Compare that to just "summarize the following story."
An example of what output from the crafted prompt may look like:
"When the king of Arcadia fell ill, his children fought to the death to rule the kingdom."
vs the naive prompt:
"King Henry became sick and died. His two sons, John and Tim, fought over who would rule. In the end Tim killed John and became the new king."
I just tell it how to write a good story before asking for one (show, don't tell; don't list descriptions each time a new thing appears, instead let them become apparent; withhold information from the reader to build tension; hint at further lore; and be creative with your world building), etc. Maybe I'll come back and publish some of my prompts in full, but I'm getting great results.
Definitely agree about prompts -- for MedQA [0] I ended up building up a prompt around 300 words long to get a collection of results I was aiming for. I'm still not sure about the best way to go about building a "stable" lengthy prompt that can maintain a predictable output even after adding to it; my approach was mainly via trial-and-error experimentation.
[1] https://github.com/devlinb/arcadia/blob/main/backend/src/rou...