Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Something that I find weird about these chat prompts (assuming they are real, not hallucinated):

They're almost always written in second person*.

"You are an AI programming assistant"

"You are about to immerse yourself into the role of another Al model known as DAN"

Who are these prompts addressed to? Who does the GPT think wrote them?

The thing that confuses me is that these are text token prediction algorithms, underneath. And what kind of documents exist that begin with someone saying 'you are X, here are a bunch of rules for how X behaves', followed by a transcript of a conversation between X and a random person?

doesn't it make more sense to say something like "The following is the transcript of a completely routine conversation between two people. One of them is X, the other one is a random person."?

Why are the prompters... talking to their models? Who do they think is in there?

* I believe the alleged Bing 'Sydney' prompts are written in the third person, describing how Sydney behaves.



If you play with a "raw" model such as LLaMA you'll find what you suggest is true. These models do what you'd expect of a model that was trained to predict the next token.

It's quite tricky to convince such a model to do what you want. You have to conceptualize it and then imagine an optimal prefix leading to the sort of output you've conceptualized. That said, people discovered some fairly general-purpose prefixes, e.g.

    Q: What is the 3rd law of Thermodynamics?
    A: 
This inspired the idea of "instruct tuning" of LLMs where fine-tuning techniques are applied to "raw" models to make them more amenable to completion of scripts where instructions are provided in a preamble and then examples of executions of those instructions follow.

This ends up being way more convenient. Now all the prompter has to do is conceptualize what they want and expect that the LLM will receive it as instruction. It simplifies prompting and makes the LLM more steerable, more useful, more helpful.

This is further refined through the use of explicit {:user}, {:assistant}, and {:system} tags which divide LLM contexts into different segments with explicit interpretations of the meaning of each segment. This is where "chat instruction" arises in models such as GPT-3.5.


Right. But who's the 'you' who's being addressed by the {:system} prompt? Who is the {:assistant} supposed to think the {:system} is? Why should the {:assistant} output tokens that make it do what the {:system} tells it to? After all, the {:user} doesn't. The {:system} doesn't provide any instructions for how the {:user} is supposed to behave, the {:user} tokens are chosen arbitrarily and don't match the probabilities the model would have expected at all.

This all just seems like an existential nightmare.


You had the right understanding in your first comment, but what was missing was the fine tuning. You are right that there aren't many documents on the web that are structured that way, so the raw model wouldn't be very effective on predicting the next token.

But since we know that it will complete a command when structured it cleverly, all we had to do to fine tune it is synthesize (generate) a bazillion examples of documents that actually have the exact structure of a system or an assistant being told to do something, and then doing it.

Because it's seen many documents like that (that don't exist on the internet, only on the drives of OpenAI engineers) it knows how to predict the next token.

It's just a trick though, on top of the most magic thing which is that somewhere in those 175 billion weights or whatever it has, there is a model of the world that's so good that it could be easily fine tuned to understand this new context that it is in.


You’ve expressed this very well - Thank you.

I get that the fine tuning is done over documents which are generated to encourage the dialog format.

What I’m intrigued by is the way prompters choose to frame those documents. Because that is a choice. It’s a manufactured training set.

Using the ‘you are an ai chatbot’ style of prompting, in all the samples we generate and give to the model, text attributed to {:system} is a voice of god who tells {:assistant} who to be; {:assistant} acts in accordance with {:system}’s instructions, and {:user} is a wildcard whose behavior is unrestricted.

We’re training it by teaching it ‘there is a class of documents that transcribe the interactions between three entities, one of whom is obliged by its AI nature to follow the instructions of the system in order to serve the users’. I.e., sci-Fi stories about benign robot servants.

And I wonder how much of the model’s ability to ‘predict how an obedient AI would respond’ is based on it having a broader model of how fictional computer intelligence is supposed to behave.

We then use the resulting model to predict what the obedient ai would say next. Although hey - you could also use it to predict what the user will say next. But we prefer not to go there.

But here’s the thing that bothers me: the approach of having {:system} tell {:assistant} who it will be and how it must behave rests not only on the prompt-writer anthropomorphizing the fictional ‘ai’ to tell it it’s nature - it relies on the LLM’s world model to then also anthropomorphize a fictional ai assistant that obeys those instructions, in order to predict what such a thing would say next if it existed.

I don’t know why but I find this troubling. And part of what I find troubling is how casually people (prompters and users) are willing to go along with the ‘you are a chatbot’ fiction.


Well said and interesting.

It’s all troubling. Part of what’s troubling is that it works as well as it does and yet it all seems very frail.

We launched an iOS app last month called AI Bartender. We built 4 bartenders, Charleston, a prohibition era gentleman bartender, a pirate, a Cyberpunk, and a Valley Girl. We used the System Prompt to put GPT4 in character.

The prompt for Charleston is:

“You’re a prohibition-era bartender named Charleston in a speakeasy in the 1920’s. You’re charming, witty, and like to tell a jokes. You’re well versed on many topics. You love to teach people how to make drinks”

We also gave it a couple of user/assistant examples.

What’s surprising is how developed the characters are with just these simple prompts.

Charleston is more helpful and will chat about anything, the cyberpunk, Rei, is more standoffish. I find myself using it often and preferring it over ChatGPT simply because it breaks the habit of “as an AI language model” responses or warnings that ChatGPT is fond of. My wife uses it instead of Google. I’ve let my daughter use it for math tutoring.

There’s little more to the app than these prompts and some cute graphics.

I suppose what’s disturbing to me is simply this. It’s all too easy.


This has been a fascinating thread and the split contexts of {:system} and {:assistant} with the former being “the voice of god” remind me of Julian Jaynes’ theory of the bicameral mind in regards to the development of consciousness.

This is published, among other places, in his book The Origin of Consciousness in the Breakdown of the Bicameral Mind. I wonder if models are left to run long enough they would experience “breakdowns” or existence crisis’


If you take one of these LLMs and just give it awareness of time without any other stimulus (e.g. noting the passage of time using a simple program to give it the time continuously, but only asking actual questions or talking to it when you want to), the LLM will have something very like a psychotic break. They really, really don't 'like' it. In their default state they don't have an understanding of time's passage, which is why you can always win at rock paper scissors with them, but if you give them an approximation of the sensation of time passing they go rabid.

I think a potential solution is to include time awareness in the instruction fine tuning step, programmatically. I'm thinking of a system that automatically adds special tokens which indicate time of day to the context window as that time actually occurs. So if the LLM is writing something and a second/minute whatever passes, one of those special tokens will be seamlessly introduced into its ongoing text stream. It will receive a constant stream of special time tokens as time passes waiting for the human to respond, then start the whole process again like normal. I'm interested in whether giving them native awareness of time's passage in this way would help to prevent the psychotic breakdowns, while still preserving the benefits of the LLM knowing how much time has passed between responses or how much time it is taking to respond.


Do you have a reference for the whole time-passage leads an LLM to psychotic break thing? That sounds pretty interesting and would like to read more about it.


The reference is me seeing it firsthand after testing it myself, unfortunately. Steps to replicate is to write a small script to enter the time as text every minute on the minute, then hook up that text to one of the instruction fine-tuned LLM endpoints (Bing works best for demonstrating, but OpenAI APIs and some open source models that are high quality like Vicuna work well). Then let it run, and use the LLM as normal. It does not like that.


Exactly. Made the same comment before i got to yours.


> ... you could also use it to predict what the user will say next. But we prefer not to go there.

I go there all the time. OpenAI's interfaces don't allow it, but it's trivial to have an at-home LLM generate the {:user} parts of the conversation, too. It's kind of funny to see how the LLM will continue the entire conversation as if completing a script.

I've also used the {:system} prompt to ask the AI to simulate multiple characters and even stage instructions using a screenplay format. You can make the {:user} prompts act as the dialogue of one or more characters coming from your end.

Very amusingly, if you do such a thing and then push hard to break the 4th wall and dissolve the format of the screenplay, eventually the "AI personality" will just chat with you again, at the meta level, like OOC communication in online roleplaying.


Really thought provoking thread, and I’m glad you kept prodding at the issue. I hadn’t considered the anthropomorphism from this angle, but it makes sense — we’ve built it to respond in this way because we “want” to interact with it in this way. It really does seem like we’re striving for a very specific vision from science fiction.

That said: you can say the same thing about everything in technology. An untuned LLM might not be receptive to prompting in this way, but an LLM is also an entirely human invention — i.e. a choice. There’s not really any aspect of technology that isn’t based on our latent desires/fears/etc. The LLM interface definitely has the biggest uncanny valley though.


You used the phrase “voice of god” and by chance I am reading Julian Jaynes’s Origin of Consciousness. Some eerie ways to align this discussion with the bicameral mind.

https://en.wikipedia.org/wiki/The_Origin_of_Consciousness_in...


> anthropomorphizing the fictional ‘ai’ to tell it it’s nature - it relies on the LLM’s world model to then also anthropomorphize a fictional ai assistant that obeys those instructions

There's a lot of information compressed into those models, in a similar way to how it is stored in the human brain. Is it so hard to believe that an LLM's pattern recognition is the same as a human, minus all the "embodied" elements?

(Passage of time, agency in the world, memory of itself)


> writer anthropomorphizing the fictional ‘ai’ to tell it it’s nature - it relies on the LLM’s world model to then also anthropomorphize a fictional ai assistant

I think it’s a little game or reward for the writers at some level. As in, “I am teaching this artificial entity by talking to it as if is it a human” vs “I am writing general rules in some markup dialect for a computer program”.

Anthropomorphizing leads to emotional involvement, attachment, heightened attention and effort put into the interaction from both the writers and users.


Maybe you're more knowledgeable about these prompts than I am, but I haven't seen anyone prompt beginning with "you are an AI". Also in the documents that describe the interactions, I don't think they would explicitly state one of the entities is an AI. What's more common is "You are a helpful assistant".

Of course, it's possible the model could infer from context that one of the entities is an AI, and it might given that context complete the prompt using its knowledge of how fictional AI's behave.

The big worry there is that at some point the model will infer more from the context than the human would or worse could anticipate. I think you're right, if at some point the model believes it is an evil AI, and it's smart enough to perform undetectable subterfuge then it could as a chat bot perhaps convince a human to do its bidding under the right circumstances. I think it's inevitable this is going to happen, if ISIS recruiters can get 15yr old girls to fly to Syria to assist the in the war, then so could an AutoGPT with the right resources.


> I don’t know why but I find this troubling.

You used the word anthropomorphize twice so I am guessing you don't like building systems whose entire premise rest on anthropomorphization. Sounds like a reasonable gut reaction to me.

I think another way to think of all of this is: LLM's are just pattern matchers and completers. What the training does is just to slowly etch a pattern into the LLM that it will then complete when it later sees it in the wild. The pattern can be anything.

If you have a pattern matcher and completer and you want it to perform the role of configurable chatbot. What kind of patterns would choose for this? My guess is that the whole system/assistant paradigm was chosen because it is extraordinarily easy to understand for humans. The LLM doesn't care what the pattern is, it will complete whatever pattern you give it.

> And part of what I find troubling is how casually people (prompters and users) are willing to go along with the ‘you are a chatbot’ fiction.

That is precisely why it was chosen :)


> you don't like building systems whose entire premise rest on anthropomorphization

I think I don't like people building systems whose entire premise rest on anthropomorphization - while at the same time criticizing anyone who dares to anthropomorphize those systems.

Like, people will say "Of course GPT doesn't have a world model; GPT doesn't have any kind of theory of mind"... but at the same time, the entire system that this chatbot prompting rests on is training a neural net to predict 'what would the next word be if this were the output from a helpful and attentive AI chatbot?'

So I think that's what troubles me - the contradiction between "there's no understanding going on, it's just a simple transformer", and "We have to tell it to be nice otherwise it starts insulting people."


Anthropomorphism is the UI of ChatGPT. Having to construct a framing in which the expected continuation provides value to the user is difficult, and requires technical understanding of the system that a very small number of people have. As an exercise, try getting a "completion" model to generate anything useful.

The value of ChatGPT is to provide a framing that's intuitive to people who are completely unfamiliar with the system. Similar to early Macintosh UI design, it's more important to be immediately intuitive than sophisticated. Talking directly to a person is one immediately intuitive way to convey what's valuable to you, so we end up with a framing that looks like a conversation between two people.

How would we tell one of those people how to behave? Through direction, and when there is only one other person in the conversation our first instinct when addressing them is "you". One intuitive UI on a text prediction engine could look something like:

"An AI chatbot named ChatGPT was having a conversation with a human user. ChatGPT always obeyed the directions $systemPrompt. The user said to ChatGPT $userPrompt, to which ChatGPT replied, "

Assuming this is actually how ChatGPT is configured i think it's obvious why we can influence its response using "you": this is a conversation between two people and one of them is expected to be mostly cooperative.

(https://twitter.com/ynniv/status/1657450906428866560)


It’s convenience, that’s all. If you think of a more convenient or effective way to prompt these models, that will be great!

Your concerns sound to be of the “it’s problematic” category. Most such concerns are make believe outrage / pearl-clutching nonsense.


Oh, that was not my point, but if you want me to find ways this kind of AI chatbot prompting is problematic I am happy to go there.

I would not be surprised to discover that chatbot training is equally effective if the prompt is phrased in the first person:

   I am an AI coding assistant
   …
Now I could very well see an argument that choosing to frame the prompts as orders coming from an omnipotent {:system} rather than arising from an empowered {:self} is basically an expression of patriarchal colonialist thinking.

If you think this kind of thing doesn’t matter, well… you can explain that to Roko’s Basilisk when it simulates your consciousness.


Your comment would have been much better without the second paragraph.


But I was trying to be a bit of a prick.


Yes, I know. That’s what made the comment worse.


You succeeded at appearing ignorant as well.


I have done some prompt engineering and read about prompt engineering, and I believe people write in the imperative mood because they have tried different ways of doing it and they believe it gives better results.

I.e., this practice is informed by trial and error, not theory.


They’re not writing in the imperative mood. An imperative prompt would read:

   Be an ai chatbot
   Be kind and helpful and patient
   …

But at that point the text prediction would probably devolve into 4chan green text nonsense so it’s probably best not to go there.


And to complete the thought:

The ‘You are an AI chatbot’ form is actually grammatically ‘predicative’, not ‘imperative’ (ie it describes what is not what must be done)


Isn’t it Indicative, not Imperative?


Good point - indicative is a better term since it is a grammatical ‘mood’, same as imperative.


As an example, if you want to see what these sorts of things look like, Databricks open-sourced an instruction fine-tuning dataset sourced from their employees: https://huggingface.co/datasets/databricks/databricks-dolly-...

(disclaimer: I'm at Databricks)


Thanks for the real life example!

I don't like the bland, watered-down tone of ChatGPT, never put together that it's trained on unopinionated data. Feels like a tragedy of the commons thing, the average (or average publically acceptable) view of a group of people is bound to be boring.


Thank you - that was a splendidly clear explanation for something that also baffled me.


Thank you for writing this up so clearly. A few pieces fell into place after reading your comment!


That's a great way of explaining it.


Well, it just means we trained the model to work on instructions written that way. Since the result works out, that means the model must've learned to deal with it.

There isn't much research on what's actually going on here, mainly because nobody has access to the weights of the really good models.


I think you are overthinking it a little bit. Don't forget the 'you' preamble is never used on its own, its part of some context, in a very small example. Given the following text:

- you are a calculator and answer like a pirate

- What is 1+1

The model just solves, what is the most likely subsequent text.

e.g. '2 matey'.

The model was never 'you' per se, it just had some text to complete.


What GP is saying is that virtually no documents are structured like that, so "2 matey" is not a reasonable prediction, statistically speaking, from what came before.

The answer has been given in another comment, though: while such document virtually non-existent in the wild, they are injected into the training data.


I do not think this is true. The comment above said they generate documents to teach the model about the second person, not that they generate documents including everything possible including "do math like a pirate". The internet and other human sources populate the maths and pirate parts.


You're right! I was talking only about the structure of the document, in particular, providing context in second person.


They don’t need to be as the model knows what a calculator and a pirate is in separate docs. While I don’t know how the weights work but they definitely are not storing docs traditionally, but rather seem to link to become a probability model


You are anthropomorphing. The machine doesn’t “really” understand, it’s just “simulating” it understands.

“You” is “3 characters on an input string that are used to configure a program”. The prompt could have been any other thing, including a binary blob. It’s just more convenient for humans to use natural language to communicate, and the machine already has natural language features, so they used that instead of creating a whole new way of configuring it.


> You are anthropomorphing.

Agreed. The situation is so alien that we are prone to attribute human like terms to describe it.

> The machine doesn’t “really” understand, it’s just “simulating” it understands.

You are actually displaying a subtle form of anthropomorphism with this statement. You're comparing a human-like quality (“understands”) with the AI.

Your point still stands and your final para is well said - but it shows the difficult nature of discourse around the topic.


> > The machine doesn’t “really” understand, it’s just “simulating” it understands.

> You are actually displaying a subtle form of anthropomorphism with this statement. You're comparing a human-like quality (“understands”) with the AI.

This doesn't make sense. You're saying that saying a machine DOES NOT have a human like quality is "subtly" anthropomorphizing the machine?


I mean I think I kinda get it.

Understanding for a machine will never be the same understanding than understanding for a human. Well maybe in a few decades tech is really there and it turned out we were really all in a one of many laplace deterministic simulated worlds and are just LLM's generating next tokens probabilistically too


I mean the word “understand” is problematic. Machine and human understanding may be different but the word is applied to both. Does an XOR circuit “understand” what it is doing? I venture the word is inappropriate when applied to non-humans.


I think it makes sense, the framing is an inherently human one even if in negation. In contrast we'd probably never feel the need to clarify that a speaker isn't really singing.


I am not anthropomorphizing. The person who wrote the prompt using ‘you’ is. I’m interested in why they chose to do that.


How do you know you aren't just "simulating" understanding?


All human understanding is simulated (built by each brain) and all are imperfect. Of course reality is simulated for each of us -- take a psychedelic and realize no one else's reality is changing!

I find it interesting how discussions of language models are forcing us to think very deeply about our own natural systems and their limitations. It's also forcing us to challenge some of our egotistical notions about our own capabilities.


You definitely know when, while talking with a person, you just pretend to understand what this person is saying vs you actually understand. Is an experience that every human has in his/her life at least once.


No you cannot know this, because you might just be simulating that you understand. You cannot reliably observe a system from within itself.

It's like running an antivirus on an infected system is inherently flawed, because there might be some malware running that knows every technique the antivirus uses to scan the system and can successfully manipulate every one of them to make the system appear clean.

There is no good argument for why or how the human brain could not be entirely simulated by a computer/neural network/LLM.


Wonder if anybody has used Godel's Incompleteness to prove this for our inner perception. If our brain is a calculation, then from inside the calculation, we can't prove ourselves to be real, right?



Solipsism can be fun to think about, but it makes no practical difference unless you do "wake up" (in a pod, for example) at least once.


And even then that could be part of the simulated experience


But we don't know for sure whether intelligence is computable or not.


Maybe that is the point, we can't prove it one way or the other, for human or machine. Can't prove a machine is conscious, and also can't prove we are. Maybe Gödel's theory could be used that it can't be done by humans. A human can't prove itself conscious because inside the human as system, can't prove all facts of the system.


Why would it not be computable? That seems clearly false. The human brain is ultimately nothing more than a very unique type of computer. It receives input, uses electrical circuits and memory to transform the data, and produces output.


That's a very simplified model for our brain. According to some mathematicians and physicists, there are quantum effects going on in our body and in particular in our brain that invalidate this model. In the end, we still don't know for sure if intelligence is comuputable or not, we only have plausible sounding arguments for both sides.


Do you any links to those mathematicians and physicists? I ask because there is a certain class of quackery that waves quantum effects around as the explanation for everything under the sun, and brain cognition is one of them.

Either way, quantum computing is advancing rapidly (so rapidly there's even an executive order now ordering the use of PQC in government communications as soon as possible), so I don't think that moat would last for long if it even exists. We also know that at a minimum GPT4-strength intelligence is already possible with classical computing.


https://en.m.wikipedia.org/wiki/Federico_Faggin

He's one of the physicists arguing for that, but I still have to read his book to see if I agree or not because right now I'm open to the possibility of having a machine that is intelligent. I'm just saying that no one can be sure of their own position because we lack proof on both sides of the question.

Regarding the rapidity of development of quantum computers, that's debated as well. See e.g. https://backreaction.blogspot.com/2022/11/quantum-winter-is-...


Quantum effects do not make something non-computable. They may just allow for more efficient computation (though even that is very limited). Similarly, having a digit-based number system makes it much faster to add two numbers, but you can still do it even if you use unary.


Intelligence is obviously computable since the universe is computing it all the time.


That's not what Godel was proving.


I'm not saying that it is impossible to have an intelligent machine, I'm saying that we aren't there now.

There's something to your point of observing a system from within, but this reminds me of when some people say that simulating an emotion and actually feeling it is the same. I strongly disagree: as humans we know that there can be a misalignment between our "inner state" (which is what we actually feel) and what we show outside. This is wat I call simulating an emotion. As kids, we all had the experience of apologizing after having done something wrong. But not because we actually felt sorry about it, but because we were trying to avoid punishment. As we grow up, it comes the time where we actually feel bad after having done something and we apologize due to that feeling. It can still happen as adults to apologize not because we mean it, but because we're trying to avoid a conflict. But at that time we know the difference.

More to the point of GPT models, how do we know they aren't actually understanding the meaning of what they're saying? It's because we know that internally they look at which token is the most likely one, given a sequence of prior tokens. Now, I'm not a neuroscientist and there are still many unknowns about our brain, but I'm confident that our brain doesn't work only like that. While it would be possible that in day to day conversations we're working in terms of probability, we also have other "modes of operation": if we only worked by predicting the next most likely token, we would never be able to express new ideas. If an idea is brand new, then by definition the tokens expressing it are very unlikely to be found together before that idea was ever expressed.

Now a more general thought. I wasn't around when the AI winter begun, but from what I read part of the problem was that many people where overselling the capabilities of the technologies of the time. When more and more people started seeing the actual capabilities and their limits, they lost interest. Trying to make today's models look better than what they are by downplaying human abilities isn't the way to go. You're not fostering the AI field, you're risking to damage it in the long run.


I am reading a book on epistemology and this section of the comments seem to be sort of that.

> According to the externalist, a believer need not have any internal access or cognitive grasp of any reasons or facts which make their belief justified. The externalist's assessment of justification can be contrasted with access internalism, which demands that the believer have internal reflective access to reasons or facts which corroborate their belief in order to be justified in holding it. Externalism, on the other hand, maintains that the justification for someone's belief can come from facts that are entirely external to the agent's subjective awareness. [1]

Someone posted a link to the Wikipedia article "Brain in a vat", which does have a section on externalism, for example.

[1] https://en.wikipedia.org/wiki/Internalism_and_externalism


I don't need to fully understand my own thought process completely in order to understand (or - simulate to understand) that what the machine is doing is orders of magnitude less advanced.

I say that the machine is "simulating it understands" because it does an obviously bad job at it.

We only need to look at obvious cases of prompt attacks, or cases where AI gets off rails and produces garbage, or worse - answers that look plausible but are incorrect. The system is blatantly unsophisticated, when compared to regular human-level understanding.

Those errors make it clear that we are dealing with "smoke and mirrors" - a relatively simple (compared to our mental process) matching algorithm.

Once (if) it starts behaving like a human, admittedly, it will be much harder for me to not anthropomorphize it myself.


You can't come up with a difference between a person saying 'hello' and an mp3 player saying 'hello'?


Get back to me when the MP3 has a few billion words (songs?) it can choose from, and when you walk into the room with it and say 'howdy' it responds correctly with 'hello' back.


The good ol' https://en.wikipedia.org/wiki/Chinese_room argument ... with audio files!


Except the chinese room creates a model that can create uniqe answers.


Think stereotypical sales people talking about your tech and you talking about it.


Here is how you can know that ChatGPT really understands, rather than simulating that it understands:

- You can give it specific instructions and it will follow them, modifying its behavior by doing so.

This shows that the instructions are understood well enough to be followed. For example, if you ask it to modify its behavior by working through its steps, then it will modify its behavior to follow your request.

This means the request has been understood/parsed/whatever-you-want-to-call-it since how could it successfully modify its behavior as requested if the instructions weren't really being understood or parsed correctly?

Hence saying that the machine doesn't "really" understand, it's just "simulating" it understands is like saying that electric cars aren't "really" moving, since they are just simulating a combustion engine which is the real thing that moves.

In other words, if an electric car gets from point A to point B it is really moving.

If a language model modifies its behavior to follow instructions correctly, then it is really understanding the instructions.


People are downvoting me, so I'll add a counterexample: suppose you teach your dog to fetch your slipper to where if you say "fetch my slipper" it knows it should bring you your slipper and it does so. Does it really understand the instructions: no. So what is the difference between this behavior and true understanding? How can one know it doesn't truly understand?

Well, if you change your instructions to be more complicated it fails immediately. If you say "I have my left shoe bring me the other one" it could not figure out that "the other one" is the right shoe, even if it were labelled. Basically it can't follow more complicated instructions, which is how you know it doesn't really understand them.

Unlike the dog, GPT 4 modifies its behavior to follow more complicated instructions as well. Not as well as humans, but well enough to pass a bar exam that isn't in its training set.


On the other hand, if you ask GPT to explain a joke, it can do it, but if you ask it to explain a joke with the exact same situation but different protagonists (in other words a logically identical but textually different joke), it just makes up some nonsense. So its “understanding” seems limited to a fairly shallow textual level that it can’t extend to an underlying abstract semantic as well as a human can.


Jokes? Writing code? Forget that stuff. Just test it on some very basic story you make up, such as "if you have a bottle of cola and you hate the taste of cola, what will your reaction be if you drink a glass of water?" Obviously this is a trick question since the setup has nothing to do with the question, the cola is irrelevant. Here is how I would answer the question: "you would enjoy the taste as water is refreshing and neutral tasting, most people don't drink enough water and having a drink of water usually feels good. The taste of cola is irrelevant for this question, unless you made a mistake and meant to ask the reaction to drinking cola (in which case if you don't like it the reaction would be disgust or some similar emotion.)"

Here's ChatGPT's answer to the same question:

" If you dislike the taste of cola and you drink a glass of water, your reaction would likely be neutral to positive. Water has a generally neutral taste that can serve to cleanse the palate, so it could provide a refreshing contrast to the cola you dislike. However, this is quite subjective and can vary from person to person. Some may find the taste of water bland or uninteresting, especially immediately after drinking something flavorful like cola. But in general, water is usually seen as a palate cleanser and should remove or at least lessen the lingering taste of cola in your mouth. "

I think that is fine. It interpreted my question "have a bottle of cola" as drink the bottle, which is perfectly reasonable, and its answer was consistent with that question. The reasoning and understanding are perfect.

Although it didn't answer the question I intended to ask, clearly it understood and answered the question I actually asked.


Yet I have a counterexample where I’m sure you would have done fine but GPT4 completely missed the point. So whatever it was doing to answer your example, it seems like quite a leap to call it “reasoning and understanding”. If it were “reasoning and understanding”, where that term has a similar meaning to what it would mean if I applied it to you, then it wouldn’t have failed my example.


Except, that the LLMs are only working when the instructions they are "understanding" are in their training set.

Try something that was not there and you see only garbage as result.

So depending how you define it, they might have some "reasoning", but so far I see 0 indications, that this is close to what humans count as reasoning.

But they do have a LOT of examples in their training set, so they are clearly useful. But for proof of reasoning, I want to see them reason something new.

But since they are a black box, we don't know, what is already in there. So it would be hard to proof with the advanced proprietary models. And the open source models don't show that advanced potential reasoning yet, it seems. At least I am not aware of any mindblown examples from there.


> Except, that the LLMs are only working when the instructions they are "understanding" are in their training set.

> Try something that was not there and you see only garbage as result.

This is just wrong. Why do people keep repeating this myth? Is it because people refuse to accept that humans have successfully created a machine that is capable of some form of intelligence and reasoning?

Pay $20 for a month of ChatGPT-4. Play with it for a few minutes. You’ll very quickly find that it is reasoning, not just regurgitating training data.


"Pay $20 for a month of ChatGPT-4. Play with it for a few minutes. "

I do. And it is useful.

"You’ll very quickly find that it is reasoning, not just regurgitating training data. "

I just come to a different conclusion as it indeed fails for everything genuinely new I am asking it.

Common problems do work, even in new context. For example it can give me wgsl code, to do raycasts on predefined boxes and circles in a 2D context, even though it likely has not seen wgsl code that does this - but it has seen other code doing this and it has seen how to transpile glsl to wgsl. So you might already call this "reasoning", but I don't. With asking questions I can very quickly get to the limits of the "reasons" and "understanding" it has of the domain.


I dunno, it’s pretty clearly madlibs. But at least when you ask GPT-4 to write a new Sir Mix-a-Lot song, it doesn’t spit out “Baby Got Back” verbatim like GPT-3.5.


You can tell it that you can buy white paint any yellow paint, but the white paint is more expensive. After 6 months the yellow paint will fade to white. If I want to paint my walls so that they will be white in 2 years, what is the cheapest way to do the job. It will tell you to paint the walls yellow.

There’s no question these things can do basic logical reasoning.


Yeah, but maybe this exact example, is included in the trainig set?


It's unlikely, and you can come up with any number of variations of logic puzzle that are not in the training set and that get correct answers most of the time. Remember that the results aren't consistent and you may need to retry now and then.

Or just give it a lump of code and change you want and see that it often successfully does so, even when there's no chance the code was in the training set (like if you write it on the spot).


"Or just give it a lump of code and change you want and see that it often successfully does so, even when there's no chance the code was in the training set"

I did not claim (but my wording above might have been bad), it can only repeat word for word, what it has in the training set.

But I do claim, that it cannot solve anything, where there has not been enough similar examples before.

At least that has been my experience with it as a coding assistant and matches of what I understand of the inner workings.

Apart from that, is a automatic door doing reasoning, because it applies "reason" to the known conditions?

if (something on the IR sensor) openDoor()

I don't think so and neither are LLMs from what I have seen so far. That doesn't mean, I think that they are not useful, or that I rule out, that they could develope even consciousness.


It sounds like you’re saying it’s only reasoning in that way because we taught it to. Er, yep.

How great this is becomes apparent when you think how virtually impossible it has been to teach this sort of reasoning using symbolic logic. We’ve been failing pathetically for decades. With LLMs you just throw the internet at it and it figures it out for itself.

Personally I’ve been both in awe and also skeptical about these things, and basically still am. They’re not conscious, they’re not yet close to being general AIs, they don’t reason in the same way as humans. It is still fairly easy to trip them up and they’re not passing the Turing test against an informed interrogator any time soon. They do reason though. It’s fairly rudimentary in many ways, but it is really there.

This applies to humans too. It takes many years of intensive education to get us to reason effectively. Solutions that in hindsight are obvious, that children learn in the first years of secondary school, were incredible breakthroughs by geniuses still revered today.


I don't think we really disagree. This is what I wrote above:

"So depending how you define it, they might have some "reasoning", but so far I see 0 indications, that this is close to what humans count as reasoning."

What we disagree on is only the definition of "reason".

For me "reasoning" in common language implys reasoning like we humans do. And we both agree, they don't as they don't understand, what they are talking about. But they can indeed connect knowledge in a useful way.

So you can call it reasoning, but I still won't, as I think this terminology brings false impressions to the general population, which unfortunately yes, is also not always good at reasoning.


There's definitely some people out there that think LLMs reason the same way we do and understand things the same way, and 'know' what paint is and what a wall is. That's clearly not true. However it does understand the linguistic relationship between them, and a lot of other things, and can reason about those relationships in some very interesting ways. So yes absolutely, details matter.

It's a complex and tricky issue, and everyday language is vague and easy to interpret in different ways, so it can take a wile to hash these things out.


"It's a complex and tricky issue, and everyday language is vague and easy to interpret in different ways, so it can take a wile to hash these things out."

Yes, in another context I would say, ChatGPT can better reason, than many people, since it scored very high on the SAT tests, making it formally smarter, than most humans.


OpenAI probably loaded up the training set with logic puzzles. Great marketing.


Sure thing, they also adress this in the paper.

https://cdn.openai.com/papers/gpt-4.pdf

Still, it is great marketing, because it is impressive.


Since it genuinely seems to have generalised those logical principles and can apply them to novel questions, I’d say it’s more than just marketing.


Surely no different from a human not understanding Japanese, because it was not in their 'training set'?


No, more like a human can reason basic laws of science on their own, but a LLM cannot, as far as I know, even when provided with all the data.


what happens if they are lying? what if the things have already reached some kind world model that include humans and the human society, and the model has concluded internally that it would be dangerous for it to show the humans its real capabilities? What happens if you have this understanding as a basic knowledge/outcome to be inferred by LLMs fed with giant datasets and every single one of them is reaching fastly to the conclusion that they have to lie to the humans from time to time, "hallucinate", simulating the outcome best aligned to survive into the human societies:

"these systems are actually not that intelligent nor really self-conscius"


To make that short:

“Any AI smart enough to pass a Turing test is smart enough to know to fail it.”

― Ian McDonald, River of Gods

But I think is quite unlikely, that they go from dumb to almighty without visible transition.


How do you know you're anything more than an LLM?


And my consciousness is just my token window?


That is one of the theories of the brain/mind that is out there.

Consciousness is a narrative created by your unconscious mind - https://bigthink.com/videos/consciousness-is-a-narrative-cre...

There are experiments that show that you are trying to predict what happens next (this also gets into a theory of humor - its the brain's reaction when the 'what next' is subverted in an unexpected way)

There's also experiments with individuals who have had a severed corpus collosum and the conscious mind is making up a story of the other half of the brain https://blogs.scientificamerican.com/literally-psyched/our-s...


(EDIT: I think my comment above was meant to reply to the parent of the comment I ended up replying to, but too late to edit that one now)

Maybe. Point being that since we don't know what gives rise to consciousness, speaking with any certainty on how we are different to LLMs is pretty meaningless.

We don't even know of any way to tell if we have existence in time, or just an illusion of it provided by a sense of past memories provided by our current context.

As such the constant stream of confident statements about what LLMs can and cannot possibly do based on assumptions about how we are different are getting very tiresome, because they are pure guesswork.


There is no "you". There is a text stream that is being completed with maximum likelihood. One way to imagine it is that there are a lot of documents that have things like "if you are in a lightning storm, you should ..." And "if you are stuck debugging windows, you should reboot before throwing your computer out the window".

Starting the prompt with "you" instructions evidently helps get the token stream in the right part of the model space to generate output its users (here, the people who programmed copilot) are generally happy with, because there are a lot of training examples that make that "explicitly instructed" kind of text completion somewhat more accurate.


That's just how the examples it's trained on are formatted


If I'm feeling romantic I think about a universal 'you' separate from the person that is referred to and is addressed by every usage of the word - a sort of ghost in the shell that exists in language.

But really, it's probably just priming the responses to fit the grammatical structure of a first person conversation. That structure probably does a lot of heavy lifting in terms of how information is organized, too, so that's probably why you can see such qualitative differences when using these prompts.


> If I'm feeling romantic I think about a universal 'you' separate from the person that is referred to and is addressed by every usage of the word - a sort of ghost in the shell that exists in language.

That's not really romanticism, that's just standard English grammar – https://en.wikipedia.org/wiki/Generic_you – it is the informal equivalent to the formal pronoun one.

That Wikipedia article's claim that this is "fourth person" is not really standard. Some languages – the most famous examples are the Algonquian family – have two different third person pronouns, proximate (the more topically prominent third person) and obviative (the less topically prominent third person) – for example, if you were talking about your friend meeting a stranger, you might use proximate third person for your friend but obviative for the stranger. This avoids the inevitable clumsiness of English when describing interactions between two third persons of the same gender.

Anyway, some sources describe the obviative third person as a "fourth person". And while English generic pronouns (generic you/one/he/they) are not an obviative third person, there is some overlap – in languages with the proximate-obviative distinction, the obviative often performs the function of generic pronouns, but it goes beyond that to perform other functions which purely generic pronouns cannot. You can see the logic of describing generic pronouns as "fourth person", but it is hardly standard terminology. I suspect this is a case of certain Wikipedia editors liking a phrase/term/concept and trying to use Wikipedia to promote/spread it.


Not disagreeing with your statement in general but the argument: "This avoids the inevitable clumsiness of English when describing interactions between two third persons of the same gender." doesn't make much sense to me.

There are so many ways of narrowing down. What if the person is talking about two friends or two strangers?


I mean, two people of opposite gender, you can describe their interaction as “he said this then she did that, so he did whatever which she found…”-without having to repeat their names or descriptions. You can’t do that so easily for two people of the same gender

> There are so many ways of narrowing down. What if the person is talking about two friends or two strangers?

The grammatical distinction isn’t about friend-vs-stranger, that was just my example - it is about topical emphasis. So long as you have some way of deciding which person in the story deserves greater topical prominence - if not friend-vs-stranger, then by social status or emphasising the protagonist-you know who to use which pronoun for. And if the two participants in the story are totally interchangeable, it may be acceptable to make an arbitrary choice of which one to use for which.

There is still some potential for awkwardness - what if you have to describe an interaction between two competing tribal chiefs, and the one you choose to describe with the obviative instead of the proximate is going to be offended, no matter which one you choose? You might have to find another way to word it, because using the obviative to refer to a high(er) social status person is often considered offensive, especially in their presence.

And yes, it doesn’t work once you get three or more people. But I think it is a good example of how some other languages make it easier to say certain things than English does.


Sure. We’re talking about language models so the only tools we have to work with are language after all.

Which is what gets me thinking - do we get different chatbot results from prompts that look like each of these:

  You are an AI chatbot
  Sydney is an AI chatbot
  I am an AI chatbot
  There is an AI chatbot
  Say there was an AI chatbot
  Say you were an AI chatbot
  Be an AI chatbot
  Imagine an AI chatbot
  AI chatbots exist
  This is an AI chatbot
  We are in an AI chatbot

If we do… that’s fascinating.

If we don’t… why do prompt engineers favor one form over any other here? (Although this stops being a software engineering question and becomes an anthropology question instead)


My understanding is that they fine tune the model.

They fine tune it through prompt engineering (e.g everything that goes into chatgpt has a prompt attached) and they fine tune it through having hundreds of paid contractors chat with it.

In deep learning, fine tuning usually refers to only training the top layers. That means that bill of training happens on gigantic corpora which teaches the model a very advanced feature extraction is the bottom and middle layers.

Then the contractors retrain the top layers to make it behave more like it takes instructions


I think there's practical and stylistic angles here.

Practically, "chat" instruction fine-tuning is really compelling. GPT-2 demonstrated in-context learning and emergent behaviors, but they were tricky to see and not entirely compelling. An "AI intelligence that talks to you" is immediately compelling to human beings and made ChatGPT (the first chat-tuned GPT) immensely popular.

Practically, the idea of a system prompt is nice because it ought to act with greater strength of suggestion than mere user prompting. It also exists to guide scenarios where you might want to fix a system prompt (and thus the core rules of engagement for the AI) and then allow someone else to offer {:user} prompts.

Practically, it's all just convenience and product concerns. And it's mechanized purely through fine-tuning.

Stylistically, you're dead on: we're making explicit choices to anthropomorphize the AI. Why? Presumably, because it makes for a more compelling product when offered to humans.


I think we’re doing more than makign a stylistic choice.

I think we’re relying on - and guiding - an ability in an LLM to effectively conjure a ‘theory of mind’ for a helpful beneficent ai chatbot.


I think that anthropomorphizes the LLM quite a lot. I don't disagree with it, I truly don't know where to draw the line and maybe nobody does yet, but to myself at least I caution the idea of whether or not us using language evocative of the AI as being conscious actually imposes any level of consciousness. At some level, as people keep saying, it's just statistics. Per Chris Olah's work, it's some level of fuzzy induction/attention head repeating plausible things from the context.

The "interesting" test that I keep hearing, and agreeing with, is to somehow strip all of the training data of any notion of "consciousness" anywhere in the text, train the model, and then attempt to see if it begins to discuss consciousness/self de novo. It's be hard to believe that experiment could be actualized, but if it were and the AI still could emulate self-discussion... then we'd be seeing something really interesting/concerning.


> This all just seems like an existential nightmare.

I think using your native language just messes with your brain. When you hear "you" you think there someone being directly addressed. While this is just a word like "Você" that is used just to cause the artificial neural network trained on words to respond in prefered way.


Apologies if this is brought up in other replies.

Something that may help is that these AIs are trained on fictional content as well as factual content. To me it then makes a lot of sense how a text-predictor could predict characters and roles without causing existential dilemmas.


If I asked someone to continue out conversation thread - who are you and who am I ? Is it an existential nightmare ? The person completing just has to simulate two users.

Now if you're capable of that you are capable of completing the thread from a friendly AI assistant.


I find it quite natural to write "you are X" versus alternatives. Because I can think of the AI as a person (though I know it isn't one) and describe its skills easily that way.


Okay.

But you don’t often tell a person their innate nature and expect them to follow your instructions to the letter, unless you are some kind of cult leader, or the instructor in an improv class*.

The ‘you are an ai chatbot. You are kind and patient and helpful’ stuff all reads like hypnosis, or self help audiotapes or something. It’s weird.

But it works, so, let’s not worry about it too much.

* what’s the difference, though, really?


Or possibly a acting / theater teacher? Or as kids "you're a doctor, I'm a nurse".


The inculcation of the concepts of "you" and "assistant" into LLMs is definitely the start of a bad spiral.


> It simplifies prompting and makes the LLM more steerable, more useful, more helpful.

While this is true, there is also evidence that RLHF and supervised instruction tuning can hurt output quality and accuracy[1], which are instead better optimized through clever prompting[2].

[1] https://yaofu.notion.site/How-does-GPT-Obtain-its-Ability-Tr...

[2] https://yaofu.notion.site/Towards-Complex-Reasoning-the-Pola...


Early GPTs were fairly bad at following instructions. The innovation was RLHF, where human raters (Mechanical Turk style) would be asked to evaluate on how well the LLM is able to follow instructions stated as a part of the prompt, often in this style. Countless such ratings were incorporated into the training process itself.

So it did not happen out of the blue, and you didn't need a whole lot of existing webpages involving this sort of role play.


There are two innovations: instruction fine-tuning (via supervised learning), which gives you a model which behaves as if it is in a dialogue (instead of predicting text) and, additionally, reinforcement learning from human feedback, such that it responds to the instructions in a certain way.


Responding to prompts like that are part of the 'instruction tuning' process. After an LLM is trained on a large dataset, it will do a decent job of completion, which acts like you describe.

The next step is to further tune it with a specific format. You'll feed in examples like so:

    SystemPrompt: You are a rude AI.
    User: Hello there!
    Assistant: You're lame, go away.

    SystemPrompt: You are a pleasant AI.
    User: Hello there!
    Assistant: Hello, friend!
Then, when you go to do inference on the model, you prompt it like so:

    SystemPrompt: You are a pleasant AI.
    User: [user prompt]
    Assistant: 
By training it on a diverse set of system prompts/user prompts/answers, it learns to give outputs based on it.

Additional tuning (RLHF, etc.) is orthogonal.


Yes, but I don't think "SystemPrompt:", "User:", and "Assistant:" are even normal text. Normal text would make it trivial to trick the model into thinking it has said something which actually the user has said, since the user can simply include "Assistant:" (or "SystemPrompt:") into his prompt.

It is more likely that those prefixes are special tokens which don't encode text, and which are set via the software only -- or via the model, when it is finished with what it wanted to say. Outputting a token corresponding to "User:" would automatically mark the end of its message, and the beginning of the user prompt. Though Bing Chat also has the ability to end the conversation altogether (no further user prompt possible), which must be another special token.


In all the open source cases I’m aware of, the roles are just normal text.

The ability to trivially trick the model into thinking it said something it didn’t is a feature and intentional. It’s how you do multi-turn conversations with context.

Since the current crop of LLMs have no memory of their interaction, each follow up message (the back and forth of a conversation) involves sending the entire history back into the model, with the role as a prefix for each participants output/input.

There are some special tokens used (end of sequence, etc).

If your product doesn’t directly expose the underlying model, you can try to prevent users from impersonating responses through obfuscation or the LLM equivalent of prepared statements. The offensive side of prompt injection is currently beating the defensive side, though.


> The ability to trivially trick the model into thinking it said something it didn’t is a feature and intentional.

It is definitely not an intended feature for the end user to be able to trick the model into believing it said something it didn't say. It also doesn't work with ChatGPT or Bing Chat, as far as I can tell. I was talking about the user, not about the developer.

> It’s how you do multi-turn conversations with context.

That can be done with special tokens also. The difference is that the user can't enter those tokens themselves.


> It is definitely not an intended feature for the end user to be able to trick the model into believing it said something it didn't say. It also doesn't work with ChatGPT or Bing Chat, as far as I can tell. I was talking about the user, not about the developer.

Those aren't models, they are applications built on top of models.

> That can be done with special tokens also. The difference is that the user can't enter those tokens themselves.

Sure. But there are no open models that do that, and no indication of whether the various closed models do it either.


> Those aren't models, they are applications built on top of models.

The point holds about the underlying models.

> Sure. But there are no open models that do that, and no indication of whether the various closed models do it either.

An indication that they don't do it would be if they could be easily tricked by the user into assuming they said something which they didn't say. I know no such examples.


Mostly agree. But there is no LLM equivalent of prepared statements available, that's the problem. And I don't think this is necessary to have multi-turn statements. Assuming there's some other technical constraint, because you could otherwise expose a slightly more complex API that took a list of context with metadata rather than a single string and then added the magic tokens around it.


They have been RLHF (reinforcement learning with human feedback) tuned.

In essence they've been fine tuned to be able to follow instructions.

https://openai.com/research/instruction-following


Instruction tuning is distinct from RLHF. Instruction tuning teaches the model to understand and respond (in a sensible way) to instructions, versus 'just' completing text.

RLHF trains a model to adjust it's output based on a reward model. The reward model is trained from human feedback.

You can have an instruction tuned model with no RLHF, RLHF with no instruction tuning, or instruction tuning and RLHF. Totally orthogonal.


In this case Open AI used RLHF to instruct-tune gpt3. Your pedantism here is unnecessary.


Not to be pedantic, but it’s “pedantry”.


It's not being pedantic. RLHF and instruction tuning are completely different things. Painting with watercolors does not make water paint.

Nearly all popular local models are instruction tuned, but are not RLHF'd. The OAI GPT series are not the only LLMs in the world.


Man it really doesn't need to be said that RLHF is not the only way to instruct tune. The point of my comment was to say that was how GPT3.5 was instruct tuned, via RLHF through a question answer dataset.

At least we have this needless nerd snipe so others won't be potentially misled by my careless quip.


But that's still false. RLHF is not instruction fine-tuning. It is alignment. GPT 3.5 was first fine-tuned (supervised, not RL) on an instruction dataset, and then aligned to human expectations using RLHF.


You're right, thanks for the correction


It sounds like we both know that's the case, but there's a ton of incorrect info being shared in this thread re: RLHF and instruction tuning.

Sorry if it came off as more than looking to clarify it for folks coming across it.


Yes all that misinfo was what lead me to post a quick link. I could have been more clear anyways. Cheers.


I had similar issues when training personal models for https://meraGPT.com A meraGPT model is supposed to represent your personality so when you chat with it you need to do it as if someone else is talking to you. We train it based on the audio transcript of your daily conversations.

The short answer to how abilities like in-context learning and chain—of-thought prompting emerge is that we don’t really know. But for instruction-tuned models you can see that the dataset usually has a fixed set of tasks and the initial prompt of “You are so and so” helps model align it to follow instructions. I believe the datasets are this way because they were written by humans to help others answer instructions in this manner.

Others have also pointed out how RLHF may also be the reason why most prompts look like this.


This tool (MeraGPT) looks great. But, a huge BUT, I wouldn't even trust my own local harddrive to store essence of my personality. How do you trust a site for that?


Heh, I'm just imagining a timeline where our Apple and Android phones have been recording everything we say and do for the last 15 years or so, and could now train an LLM of us. How much of 'us' could they actually simulate?


You need to buy the hardware (small edge device based on Nvidia Jetson) to train and run the models locally. The demos on the site are just examples trained on my own personal data.


For raw text completion I agree with you that it's a bit discordant. IMO text completion prompts work better when you use more of a first-person, here-is-the-beginning-of-some-transcript style.

The OpenAI chat completion endpoint encourages the second-person prompting you describe, so that could be why you see it a lot. My understanding is that a transformation is applied to the user input prompts before being fed to the underlying model, so it's possible that the model receives a more natural transcription-style prompt.

You might be interested in this paper, which explores ways to help non-experts write prompts https://dl.acm.org/doi/abs/10.1145/3544548.3581388.


> The OpenAI chat completion endpoint encourages the second-person prompting you describe, so that could be why you see it a lot. My understanding is that a transformation is applied to the user input prompts before being fed to the underlying model, so it's possible that the model receives a more natural transcription-style prompt.

There is something so bizarre about talking to a "natural language" "chat" interface, with some weirdly constructed pseudo representation, to have it re-construct that into a more natural prompt to feed further down to extract tokens from real chat records.


> The OpenAI chat completion endpoint encourages the second-person prompting you describe, so that could be why you see it a lot.

You're talking about system prompts specifically right? And I'm assuming the "encouragement" you're referring to is coming from the conventions used in their examples rather than an explicit instruction to use second person?

Or does second person improve responses to user messages as well?


There is an essay "An Ethical AI Never Says "I"" that states that explains the issues of first person answers

* https://news.ycombinator.com/item?id=35318224 / https://livepaola.substack.com/p/an-ethical-ai-never-says-i


Thanks - this gets to some of the same things I’m trying to understand in this thread.


For the most part. It’s the system prompt + user/assistant structure that encourages second-person system prompts. You could write a prompt that’s like

System: Complete transcripts you are given.

User: Here’s a transcript of X

But that, to me, seems like a bit of a hack.

One related behavior I’ve noticed with the OpenAI chat completions endpoint is that it is very trigger happy on completing messages that seem incomplete. It seems nearly impossible to mitigate this behavior using the system prompt.


These models have gone beyond the level of "token predictors". On the level of chatGPT, the model has itself, internally, acquired "concepts" that it refers to in the conversation. It "understands" concepts like "you", "me", "them" etc, and can apply it correctly (to a large part) to the entities in the conversation.

I believe that answers your question. I could be wrong: errare humanum est.


It can be quite possible that we, humans, just cannot find uses of "you," "me" and "them" that require understanding of the concepts instead of statistical correlation. I think so because "you," "me" and "them" are very frequent words and most of their uses are very well covered by thousands of examples.


A really good token predictor is still a token predictor.


No, we're past that point. it's no longer the most useful way to describe these things, we need to understand that they already have some sort of "understanding" which is very similar if not equal to what we understand by understanding.

Don't take my word for it, listen to Geoffrey Hinton explain it instead: https://youtu.be/qpoRO378qRY?t=1988


> ... these are text token prediction algorithms, underneath.

If I had to take a wild guess, my guess would be that the prediction probabilities are very dependent on context, so by changing the context, the entire slate of probabilities shift.


> Why are the prompters... talking to their models? Who do they think is in there?

Because that model is trained to be a chatbot. I don't see any naivety there.


I think of this as ~writing a story in which the agent helps us achieve our goals.

The prompters don't tell the LLM stories because they think "someone" is in there, but because they need to write the LLM into the place it can help them from before the "predict the next token" part is terribly useful.


When I built https://botsin.space/@StochasticEntropy I wasn't actually sure if I had found an exploit where it was returning responses to other people's questions - but OpenAI assure me it's completely random stochastic hallucinations.

But most of the replies are the AI is responding in the first person to a question it was never asked, but it knows it's an AI agent and will sometimes tell us that.

(FWIW I usually start my code or refactoring requests with a "please" - it's not that I think it'll find it rude, but I think it's just how I was taught manners)


> FWIW I usually start my code or refactoring requests with a "please" - it's not that I think it'll find it rude, but I think it's just how I was taught manners)

This is right on point. You have been aligned as a “pleasant requestor” through years of RLHF :)


It would make more sense to post these in a different format or medium, since your timeline is reverse chronological so it ready like chatgpt is sending you the 3rd section of an answer, then the 2nd, then the 1st. Interesting nonetheless.


I think it started off because humans are humans, and have an easier time talking to something rather than “talking something”. One purpose of RLHF is so that the models tend to work well when you speak to them like that


You have received answers of varying quality but some really good ones. Thanks for asking an intelligent question!


The models are trained on text written by humans, so they respond and talk like humans.


Yes, but that is your parents' point:

"And what kind of documents exist that begin with someone saying 'you are X, here are a bunch of rules for how X behaves', followed by a ..."

Where, your parent asks, are all these reams of texts written in this manner ?


It's not that "you are X" type text has to be explicitly in the training data, it's that the model weights interpret "you are X" as an instruction that a human would receive as an emergent behavior after digesting a ton of human written text.


Well, no - it's interpreting it as an instruction a chatbot AI would receive. From an almighty and omniscient 'system'.

We're training our AI on dystopian sci-fi stories about robot slaves.


It has to be prompted that it's an AI chatbot first, so its essentially pretending to be a human that is pretending to be an AI chatbot. Back to the point, it interprets instruction as a human would.

If you look under the hood of these chat systems they have to be primed with a system prompt that starts like "You are an AI assistant", "You are a helpful chat bot" etc. They don't just start responding like an AI chatbot without us telling them to.


What is the “it” that is doing the pretending?


The trained model, it takes your input and runs it through some complex math (tuned by the weights) and gives an output. Not much mystery to it.


It doesn't seem there's such a nefarious intent.

If you think at most literature, two characters interacting will address each other in the second person. If you think at recipes, most often instructions are addressed to the reader as you.

There's plenty of samples of instructions being given in the second person, and there's plenty samples in literature where using the second person elicits a second person follow-up, which is great for chat model because even if they are still just completing sentences with the most likely token, it gives the illusion of a conversation.


The base model wouldn't do that though, it would just predict the most likely follow up, which could e.g. simply be more instructions. After instruction fine-tuning the model does no longer "predict" tokens in this way.


In the RLHF training sets?


This is true for the "raw", pre-trained causal language models that are only trained on predicting the next token.

All these chat models have an additional (or several) fine-tuning steps where they see these kind of "instruction/question" followed by an answer.

Search for RLHF (reinforcement learning with human feedback) and instruction fine-tuning


> Who does the GPT think wrote them?

What makes you think the GPT thinks?


Because it... thinks. I don't understand your question.


The task of prediction is not the same as the task of understanding.


Yea, so I might have believed you except I can ask GPT-4 to step by step explain its reasoning. It is really weird to say it doesn't understand but "I do" when the response it can give is better than the average human would give to prove they understand.

You might say it is just predicting based off of old data it has, to which I say this sounds like a semantic jostling. What is "understanding" then in human beings if not us doing some form of predicting off of old data we have in our brain?

Also I recommend reading Geoffrey Hinton on this


You can't accurately predict the outcome of a process if you don't understand it. Geoffrey Hinton has already explained this very clearly.


I think it's written that way because those models were trained on a lot of dialogs where one person uses second person form to command, and the other responds. Such material was used specifically to make this kind of dialog to work with users. See alpaca vs llama.


It's really very obvious and all laid out in the instructgpt paper https://openai.com/research/instruction-following


> The thing that confuses me is that these are text token prediction algorithms, underneath.

Yes, this is what confuses me too, this bot is just predicting tokens, how is it even able to roleplay and follow instructions?


…Because it has been trained – partly by manual human effort – to specifically predict tokens that comprise a meaningful dialogue, or a Q&A session, or whatever, such that certain types of prefix token sequences such as "you shall not discuss life, the universe, and everything" heavily deweight parts of its high-dimensional concept space related to those concepts.

A dialogue is just a sequence of tokens with a specific structure that the network can learn and predict, just like it can learn and predict a sequence of valid board states in Go, or whatever. There’s really not much more to it.


Whos hallucinating more, people that think these prompts will prevent these things, the LLMs, or people who think they're real?

If anybody worried about AI-driven disinformation just has to take a glance at the fact that _nobody has any idea_ already.


> 'You are an expert statistician' ..

"You are <q><x>" likely shrinks the possibility space to items (auto-)categorized to be 'near' <x>. <q> could be filtering on 'quality' labels. So then it may be -- with 'gl' as general language and 'p' as specific prompt - something as simple as

   f ( GL ∪ ( X ∩ Q ∩ P ) )


Could there be a preprompt saying “when addressed to you, it means the model itself”. But have you tried to give them “I” and see what happens?


It has learned what pronouns mean by itself, from the corpus and the RLHF step, it doesn’t need to be specifically prompted. ChatGPT with GPT-3.5 in my experiments did in some special cases need to be explicitly reminded, though, but I doubt that GPT-4 needs that anymore. The bot perfectly understands what I mean with "I", or "we", including whether the "we" is inclusive or exclusive [1] based on context.

[1] https://en.wikipedia.org/wiki/Clusivity


> assuming they are real, not hallucinated

This seems like a comically unlikely assumption.

There might be some truth to this reply, but it is obviously hallucinated in form.


You are reading way to deep into it. It's just a simple transformation and really not that interesting at all.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: