
It seems you're already aware LLMs receive tokens, not words.

Does a blind man not understand quantity because you asked him how many apples are in front of him and he failed?



I do, but I think it shows its limitations.

I don't think that test determines his understanding of quantity at all; he has other senses, like touch, to determine the correct answer. He doesn't make up a number and then justify it.

GPT was presented with everything it needed to answer the question.


Nobody said GPT was perfect. Everything has limitations.

>he has other senses like touch to determine the correct answer

And? In my hypothetical, you're not allowing him to use touch.

>I don't think that test determines his understanding of quantity at all

Obviously

>GPT was presented with everything it needed to answer the question.

No, it was not.


How was it not? It's a text interface. It was given text.

The blind-man example now is like asking GPT "What am I pointing at?"


Please try to actually understand what og_kalu is saying instead of being obtuse about something any grade-schooler intuitively grasps.

Imagine a legally blind person: they can barely see anything, just general shapes flowing into one another. In front of them is a table onto which you place a number of objects. The objects are close together and small enough that they merge into one blurred shape for our test person.

Now when you ask the person how many objects are on the table, they won't be able to tell you! But why would that be? After all, all the information is available to them! The photons emitted from the objects hit the person's retina; the person has a visual interface and was given all the visual information they need!

Information lies in differentiation, and if the granularity you require is finer than the granularity of your interface, then it won't matter whether the information is technically present; you won't be able to access it.


I think we agree. ChatGPT can't count, because the granularity that requires is finer than the granularity ChatGPT provides.

Also, the blind person wouldn't answer confidently. A simple "the objects blur together" would be a good answer. ChatGPT gave me 5 different answers back to back above.


No, think about it. The granularity of the interface (the tokenizer) is the problem; the actual model could count just fine.

If the legally blind person had never had good vision or corrective instruments, had never been told that their vision is compromised, and had no other avenue (like touch) to disambiguate and learn, then they would tell you the same thing ChatGPT told you. Saying "the objects blur together" already implies an understanding that the objects are separate.

You can even see this in yourself. If you had no education in physics and were asked how many things a steel cube is made up of, you wouldn't answer that you can't tell. You would just say one, because you don't even know that atoms are a thing.
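
To make the tokenizer point concrete: a minimal sketch, assuming OpenAI's tiktoken package and its cl100k_base vocabulary (the exact splits depend on the vocabulary in use). All the model ever receives is a list of integer token IDs; the letters inside each chunk are not separately represented at its input.

    import tiktoken  # pip install tiktoken; OpenAI's open-source tokenizer

    enc = tiktoken.get_encoding("cl100k_base")  # vocabulary used by GPT-4-era models

    word = "count"
    token_ids = enc.encode(word)

    # What the model is actually given: opaque integer IDs.
    print(token_ids)

    # What each ID stands for: multi-character chunks, not individual letters,
    # so letter-level questions sit below the granularity of the input.
    print([enc.decode([t]) for t in token_ids])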


I agree, but I don't think that changes anything, right?

ChatGPT can't count; the problem is the tokenizer.

I do find it funny we're trying to chat with an AI that is "equivalent to a legally blind person with no correction"

> You would just say one, because you don't even know that atoms are a thing.

My point also. I wouldn't start guessing "10", then "11", then "12" when asked to double-check, only to capitulate when told the correct answer.


You consistently refuse to take the necessary reasoning steps yourself. If your next reply also requires me to lead you every single millimeter to the conclusion you should have reached on your own, then I won't reply again.

First of all, it obviously changes everything. A shortsighted person requires prescription glasses; someone who is fundamentally unable to count is incurable from our perspective. LLMs could do all of these things if we either solved tokenization or simply adapted the tokenizer to relevant tasks. This is already being done for program code; it's just that, aside from gotcha arguments, nobody really cares about letter counting that much.

Secondly, the analogy was meant to convey that the intelligence of a system is not at all related to the problems at its interface. No one would say that legally blind people are less insightful or intelligent; they just require you to transform input into representations that account for their interface problems.

Thirdly, as I thought was obvious, the tokenizer is not a uniform blur. For example, a word like "count" could be tokenized as "c|ount" or " coun|t" (note the space) or ". count", depending on the surrounding context. Each of these versions has tokens of different lengths, and correspondingly different letter counts. If various people had told you that the cube has 10, 11 or 12 trillion constituent parts, depending on the random circumstances in which you talked to them, then you would absolutely start guessing through the common answers you'd been given.
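
You can check this yourself. A small sketch, again assuming the tiktoken package and the cl100k_base vocabulary (the specific splits it prints may differ from the examples above), shows the pieces each variant is cut into:

    import tiktoken  # pip install tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    # The same surface word is cut differently depending on what surrounds it;
    # the exact pieces printed depend on the vocabulary in use.
    for variant in ["count", " count", ". count"]:
        ids = enc.encode(variant)
        pieces = [enc.decode([t]) for t in ids]
        print(repr(variant), "->", pieces)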


I do agree I've been obtuse; apologies. I think I was just being too literal or something, as I do agree with you.


Apologies from me as well. I've been unnecessarily aggressive in my comments. Seeing very uninformed but smug takes on AI here over the last year has made me very wary of interactions like this, but you've been very calm in your replies and I should have been so as well.



