I kind of feel like poking at the failings of ChatGPT misses the point a bit.
Yes, it's certainly not an AGI or even super close, but to even converse with humans at this level is mind-boggling. Ten years before Stable Diffusion, AI could just about label pictures; now it can do, well, Stable Diffusion.
The pace of progress is insane.
Like this, I feel we might keep engaging in a naysaying dialogue with consecutive generations of GPT-like models, finding increasingly minor nitpicks. "Ah, but does it understand diminutives?" "Its handling of sarcasm isn't up to scratch." "I tried conversing in 10 languages and the Esperanto was quite weak."
And then one day we might wake up to a world where we can't really nitpick anymore.
Yes, the AI effect is real. As soon as computers can do a thing it’s no longer “AI”.
But I don’t think this is a nitpick at all. GPT models hallucinate information. They are right surprisingly often, but they’re also wrong quite often. And the problem is they are just as confident in either case.
This is a fundamental, irreconcilable issue with statistical language models. They have no grounding in auditable facts. They can memorize and generate in very plausible ways but they don’t seem to have a concrete model of the world.
Ask ChatGPT to play chess. It can generate a text based board and prompt you for moves, but it can’t reliably update its board correctly or even find legal moves. Note that I don’t expect it to play good moves, but the fact that it can’t even play legal moves should tell us something about its internal state.
Now that GPT3 has trained on the whole internet, we may have reached a practical limit to how far you can get by simply training on more data with 1 or 2 orders of magnitude more parameters. There’s only so far you can get by memorizing the textbook.
At a more practical level, for most professions “pretty good” isn’t good enough. It’s not good enough to have code that’s right 90% of the time but broken (or worse, has subtle bugs) the rest of the time.
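To put rough numbers on that (a toy calculation, assuming purely for illustration that each generated snippet is independently correct with probability 0.9):

    # Toy calculation: if each generated snippet is right with
    # probability 0.9, the chance a task built from n such snippets
    # has no broken piece shrinks fast. Independence is a simplifying
    # assumption, purely for illustration.
    p = 0.9
    for n in (1, 5, 10, 20):
        print(f"{n:2d} snippets -> all correct with probability {p**n:.2f}")

By 10 snippets you're down to about a 35% chance that nothing is broken, and by 20 it's roughly 12%.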
> How many people are there in the world that can do all or even some of the above at a decent level of expertise?
If you tell them what to do, then correct them about all the things they are wrong about, then a lot of people can do all of those as long as they have access to Google.
And then once those people have done that a while they will be able to continue doing those things without your feedback. But ChatGPT can't. This makes it fundamentally different from any human.
If I am understanding correctly, your main point of differentiation is that the language model doesn't learn from its conversations.
Compared to the initial training of the model, this is a trivial amount of engineering effort and is likely something we will see within a year or less.
I disagree, and I have worked on Google search ranking: making models that learn is ridiculously hard. This model is impressive, but it still hasn't solved this part, and until they do solve it the blocker isn't engineering effort but research effort, with unknown timeframes.
When researchers say a model "learns", all they mean is that they put the new data into the model, but the model is still as stupid as before, so it doesn't really achieve the kind of learning humans do, which the model would need in order to be useful here.
After a few days playing with this and using it for real work in some cases (having it bang out some PowerShell based on a description and follow-up modifications), I'm not sure that "the real kind of learning humans do" is even a necessary goal anymore.
Here is a language model that doesn't "know" anything, it doesn't "understand" anything, it has no idea what an AST is or what the code it is producing does… But does it really matter? If that prompt "generate a PowerShell script that does X Y and Z" results in accurate code that meets the stated requirement, how it got there is an implementation detail.
Give me what exists today, give it an ongoing knowledge of the things I am conversing with it on, take off the stupid guardrails and this is something I would gladly pay a significant amount of money every month for.
From my rather limited understanding, "learning from the conversation" is already an existing feature that is simply limited to a "thread" session for users of the current interface. I guess feeding those threads back into the model is ultimately the goal of the current beta test, though; the marketing material hints at it at least.
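For what it's worth, that within-thread "memory" doesn't require the model to learn anything: the usual pattern is for the client to replay the whole transcript to a frozen model on every turn. A minimal sketch of that pattern (call_model is a hypothetical stand-in for whatever API actually serves the model):

    # Minimal sketch of within-thread "memory": the model's weights
    # never change; the client just resends the growing transcript
    # on every turn. call_model() is a hypothetical stand-in for the
    # real serving API.
    transcript: list[str] = []

    def call_model(prompt: str) -> str:
        ...  # send prompt to the frozen language model, return its reply

    def chat(user_message: str) -> str:
        transcript.append(f"User: {user_message}")
        reply = call_model("\n".join(transcript) + "\nAssistant:")
        transcript.append(f"Assistant: {reply}")
        return reply

Drop the transcript and the "learning" is gone, which is consistent with the point upthread that nothing like human learning is happening.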
That’s the rub, though. The bar for most tasks isn’t a “decent” level of expertise. We want genuine expertise. It doesn’t matter if your Rust developer Jerry also knows how to write Italian operas about SpongeBob. He needs to write code that is bug-free, or be able to address bugs as they come up. As long as SOTA models are only “decent”, Jerry keeps his job.
If it sounds like I’m moving the goalposts, I’m not. I acknowledge that this is impressive in the abstract. It’s fun to play around with. But I’m also predicting that we’re at a local maximum: there are diminishing returns to the architectures we’ve developed so far. Throwing more data and compute at them won’t solve the problems we have.
> Ask ChatGPT to play chess. It can generate a text based board and prompt you for moves, but it can’t reliably update its board correctly or even find legal moves. Note that I don’t expect it to play good moves, but the fact that it can’t even play legal moves should tell us something about its internal state.
Incidentally, I tried handing it a few partial games in algebraic notation and asking it to suggest the next move, and it generally suggested legal moves, though with tactical explanations that ranged from plausible to nonsensical. It refused to actually play chess with me though and I guess I just didn't have the right prompt.
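If anyone wants to test this more systematically, checking whether a suggested move is even legal is easy to automate. A rough sketch using the python-chess library (the opening moves and the suggested move are placeholders; substitute whatever the model actually saw and said):

    # Rough sketch: replay a partial game given in algebraic notation,
    # then check whether a model-suggested move is legal there.
    # The moves below are placeholders for whatever the model produced.
    import chess

    board = chess.Board()
    for san in ["e4", "e5", "Nf3", "Nc6", "Bb5"]:  # partial game, SAN
        board.push_san(san)

    suggested = "a6"  # the model's suggested next move
    try:
        move = board.parse_san(suggested)
        print(f"{suggested} is legal here ({move.uci()})")
    except ValueError:
        print(f"{suggested} is not a legal move in this position")

python-chess raises ValueError for both malformed and illegal SAN, so this catches the model inventing moves as well as playing illegal ones.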
> There’s only so far you can get by memorizing the textbook.
If a person does that, they know they're memorising a textbook; it gets a different weight to a pyramid-marketing scheme's monologue (no less sincere, in some cases) about how a crystal can cure all your ailments.
Does ChatGPT know to apply [fallacious!] authority to sources? chess.com is a better source than r/anarchychess, but still, a game between two novices on chess.com wouldn't be a good training guide, et cetera.
A lot of web content is subtly wrong, that's always the challenge when searching ...
Now, 90% sounds pretty good compared to humans ... ?! (Not sure if I'm being sarcastic there or not!)
No doubt the pace of progress has been remarkable.
But I feel like arguments that cite only this progress make the tacit assumption that there's a single intelligence level that's progressing. That is, because large language models are getting better, they must be getting better in all imaginable skills and abilities; because their strengths are getting stronger, they will automatically overcome their weaknesses.
As a counterpoint, I'd mention the failure (so far) of self-driving cars. These constructs were impressive ten years ago, and in various measures I'm sure they have only gotten more impressive, yet they still don't have a level of reliability that would allow them on the road. And in my playing with ChatGPT, it is certainly quite impressive, yet it also puts out some nonsense in nearly every paragraph of its answers to questions, including questions that are in no way "trick questions" (Edit: one could argue that the nitpicks do mask this problem, since one doesn't need trick questions to see it).
Mind you, I'm not saying these systems can't overcome their weaknesses; I'm saying that linear progress by itself doesn't imply they'll overcome their weaknesses.
The self-driving car is a great example; you're right, it was so good and yet never lived up to the hype.
Perhaps one difference is that a human could potentially get extremely good at textual tasks with nothing but text to learn from. You can read how to solve cryptic crosswords, see examples and extrapolate. In that sense language models have a somewhat complete training dataset. Yes this requires an understanding of the material, rather than just parroting, but the signal is there if you can separate it from noise.
Driving a car requires an understanding of a much wider context, which is perhaps hard to acquire with just driving data. Understanding of rain, birds on the road, shaky drivers, balls rolling out from between cars, lane restrictions... You can't just throw petabytes of data at the problem. Training data is limited and expensive, and I believe we are mostly tackling AI-assisted driving with rule-based approaches.
I believe self driving works just fine in simulations where data is effectively unlimited. But then it doesn't generalise to the real world where context matters.
Well, I would say language isn't a single task but a system, process or tool that's flexible enough to aid in many tasks at many different levels. It can be used to signal social status and education, or to guide someone through fixing a flat tire, and often it's used on these multiple levels simultaneously.
One can succeed at one level of using language without succeeding at another. But we humans expect another human to succeed or fail fairly uniformly - or we call them "a bullshit artist". These expectations may not be met by large language models.
> In that sense language models have a somewhat complete training dataset. Yes this requires an understanding of the material, rather than just parroting, but the signal is there if you can separate it from noise.
I'd agree that there's conceivably more that can be done with a language dataset. But the training process of transformer-based models isn't really oriented to engaging in the process you describe. It treats language as data; it's fundamentally a very sophisticated database that only appears to engage in such logic. As it will tell you.
I guess I'd say that a sufficiently (!) accurate language model is indistinguishable from an AGI limited to text. The question is whether that level of accuracy is achievable. I'm not claiming, btw, that we are close to that.
By contrast it feels like driving requires an understanding of effects that are really hard to distill from pure driving data. Not just analytically hard, but requiring an understanding of external context.
I think the problem with self-driving cars is that the driving problem isn't entirely sealed, and success requires a long string of successes (most of which seem trivial). The situation with language is that it too isn't entirely sealed from other parts of reality, but here each immediate success is judged a victory. A lot of self-driving processes are straightforward, "just" adaptive control; minimally self-driving vehicles have existed for a long time. Corner cases are the problem: distinguishing a newspaper floating into your path from a load of bricks falling off a truck in front of you, etc.
For language, a problem to consider for a language using system is "talking a person through" a task. The thing about that is it involves two entities sharing a common model of reality and each updating their model as they listen to the other. And here I think corner cases of reality are basically as likely to show up.
If there were enough cars fitted with sensors that could learn from their drivers' actions and reactions, I am pretty sure an AI system could learn from this huge pool and be a good driver. A good driver also makes mistakes. And this is still nothing related to intelligence; our networks are, as of now, what one calls universal approximators. Actually, replying to so many different queries, whether it is medicine, computer science or history, it almost seems like a better interface to the net than Google.
> Driving a car requires an understanding of a much wider context …
Yeah, it struck me that perhaps a better approach to self-driving would be the road infrastructure and all the vehicles using it cooperating to build a local activity map - that way each individual vehicle doesn’t have to detect, classify and route around every object in real time, most of the work will have been done by previous vehicles and / or stationary traffic cameras.
But self-driving cars are on the roads, and they are expensive hardware that can kill lots of people. Mistake rates need to be exceptionally low, so it takes orders-of-magnitude improvements, with only minor visible changes, before they hit major thresholds.
I'd also argue that LLMs and image AI have grown far beyond linearly over a fairly short time horizon.
Is the amount of compute the issue preventing self-driving cars from being viable? I don't think so. Putting a supercomputer in the car won't bring self-driving cars.
Compute limits what you can run in real time. I'm not saying self-driving cars would necessarily be here today with more compute, but comparing with language models: self-driving car computers could only run models on the scale of GPT-2, and there's quite a big difference between that and ChatGPT. Is self-driving an easier problem that requires less than language models do? Perhaps. Language models somehow try to compress all the knowledge on the internet (not entirely successfully yet), so maybe that needs much more than self-driving; a single human can drive a car, after all.
Sorry, what? With all this excitement, hype and overconfidence because we made notable progress, it is extremely important that we highlight the shortcomings of "AI" by finding striking, easy-to-grasp examples.
When Copilot and now ChatGPT showed up and managed to produce working code snippets for simple text prompts, every manager on this planet with a background in economics probably started having wet dreams about replacing every programmer in their company with AI and getting a golden name plate for their desk with all the money saved. Explaining how the generated code might contain logic bugs or memory-safety bugs is way too abstract for these kinds of people, and therefore tempting to ignore, so you need to demonstrate failure modes in an accessible way.
People are nitpicking in response to a vocal group that loves to spew the doom of every knowledge worker out there, as if this will replace programmers, doctors, writers, copywriters, etc. Looking for reasons why it would not is only natural. And for that purpose, I think it still has fundamental flaws that are not as easily solvable as some seem to believe.
People are also impressed, given how much it is being used. Being impressed myself, I now prefer to know and explore its boundaries. Is this really going to a place where it will replace those workers, or are these limitations a fundamental barrier to what it can do, given the method by which it works?
Its strengths on my tests so far:
* Summary of content for specific questions
* Language learning reference and translation
* Rephrasing and correction of grammar in text (paragraphs at most)
Its weaknesses:
* Trustworthiness of complex responses (clearly wrong answers).
* Giving references.
* Ambiguous questions and clarifications (a nitpick; I think it's fine as it is).
* New ideas, or anything that's not been documented and done before or instructed in the prompt (duh).
This last weakness is the crux of what annoys people so much: it's a predictive language model, not AGI. I don't think it's anywhere near close to replacing any worker; supporters (I am one) should focus on what this really can do, which is to increase productivity, because it's an incredible tool.
PS: I asked it to rewrite this response, and it tends to prefer the passive voice, as if it's writing an article. After a few tries it didn't give me a good result that I could just use to replace what I wrote here. It doesn't really understand what I wrote; it just rephrases it in its preferred form (article-type constructions). It's still super helpful for "unblocking" a hard-to-write paragraph for me, a non-native speaker of English.
> If a tool enables a team of 4 do the same things that previously was done by a team of 5, the tool replaced a worker
Only if you believe that demand for work to be done is fixed as the cost of doing it goes down, a belief that was last reasonable to hold about three centuries ago...
I like how, of the three examples you gave: I don't know what diminutives are, I quite often miss sarcasm and fail to handle it, and I can only speak two languages fluently and three languages very weakly.
I'd say that, while for the HN crowd this isn't an AGI, for the majority of the population it not only is Artificial General Intelligence, for many it's much smarter than them. The only real giveaway is the poor handling of unwanted or misunderstood queries.
Not to mention the correctness of its grammar: while for my language (Slovenian) it still fails at some obvious points, the sentences and structure are already much better than most messages that I've received from high-school and university students.
I once asked an earlier version of GPT a question that it was never asked before, and it will never be asked again, and it gave multiple imaginative and plausible answers to it. It's not a bullshit machine.