Every attempt to formally define "general intelligence" for humans has been a shitshow. IQ tests were literally designed to justify excluding immigrants and sterilizing the "feeble-minded." Modern psychometrics can't agree on whether intelligence is one thing (g factor) or many things, whether it's measurable across cultures, or whether the tests measure aptitude or just familiarity with test-taking and middle-class cultural norms.
Now we're trying to define AGI - artificial general intelligence - when we can't even define the G, much less the I. Is it "general" because it works across domains? Okay, how many domains? Is it "general" because it can learn new tasks? How quickly? With how much training data?
The goalposts have already moved a dozen times. GPT-2 couldn't do X, so X was clearly a requirement for AGI. Now models can do X, so actually X was never that important, real AGI needs Y. It's a vibes-based marketing term - like "artificial intelligence" was (per John McCarthy himself) - not a coherent technical definition.
I think you are overthinking this. The ARC benchmark for fluid abstract reasoning was introduced in 2019 and it still hasn't been 'solved'. So the goalposts aren't moving as much as you think they are.
LLMs, and neural nets in general, have never been good at out-of-distribution tasks.
You've got to look at how it scales. LLMs have already stopped growing in parameter count because scaling them up no longer makes them better. New ideas are needed.
The Turing Test was a thought experiment, not a real benchmark for intelligence. If you read the paper the idea originated from, it is largely philosophical.
As for abstract reasoning, look at ARC-2: current models are barely capable on it, though at least some progress has been made on the ARC-1 benchmark.
I wasn't claiming the Turing Test was a benchmark for intelligence, but the ability to fool a human into thinking a machine is intelligent in conversation is still a significant milestone. I should have said "some abstract reasoning". ARC-2 looks promising.
>I wasn't claiming the Turing Test was a benchmark for intelligence, but the ability to fool a human into thinking a machine is intelligent in conversation is still a significant milestone.
The Turing Test is about whether a machine can fool a human into thinking they are talking to another human, not to an intelligent machine. And ironically, this is becoming less true over time as people get better at spotting the writing tendencies LLMs have, such as their frequent use of dashes or "it's not just X, it's Y" type statements.
This is not how LLMs work. You aren't 'unlocking' the "Truth", because the model doesn't know what the "Truth" is. It is just pattern matching to words that fit the style you are asking for. That may happen to be more accurate for you in some cases, but it is not a "Truth" instruction set, because no such thing exists.
Addendum: the ground truth for an LLM is its training dataset, whereas the ground truth for a human is their own experience/qualia of acting in the world. You may argue that only a few of us are willing to engage with the world, and that we take most things as told, just like the LLMs. Fair enough. But we still have the option to engage with the world, and LLMs don't.
The LLMs we get to use have been prompt-engineered and post-trained so much that I doubt the training data is their main influence anymore. If it were, you couldn't change their entire behaviour by adding a few sentences to the personalisation section.
I'm just an ignorant bystander, but is the training dataset the ground truth?
Kind of feels like calling the fruit you put into the blender the ground truth, but the meaning of the apple is kinda lost in the soup.
Now, I'm not a hater by any means. I am just not sure this is the correct way to describe the structured "meaning" (for lack of a better word) that we see come out of LLM complexity. Training is, I thought, a very lossy operation, so the structure of the inputs may or (more likely) may not yield a like-structured output.
I see. You are correct. Being wrong is merely feedback about something outside of us, not a value judgment of us as people. But education and other systems need us to believe the latter at some low level so they can retain authority.
As a web developer whose first paid web site was in 1998 when I was 10 years old, my favorite thing to do in my spare time is build web frameworks that I will never use.
- I've built CSS frameworks that replicate most of the Bootstrap features I actually use.
- I've made client-side reactive web components (kind of) that almost replaced the parts of React that I like (rough sketch after this list).
- I've built bespoke HTTP servers countless times since the VB6 days.
- And I've written my own MVC engines probably a half dozen times, just to learn a new language or library.
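To give a feel for the "reactive web components (kind of)" bit: something like the sketch below is all it really takes. This is purely illustrative, not the actual code from any of those projects; ReactiveElement and CounterEl are made-up names, and the only trick is a Proxy that re-runs render() whenever state is written.

```typescript
// Hypothetical minimal base class: state writes trigger a re-render.
class ReactiveElement extends HTMLElement {
  // Wrap plain state in a Proxy so any assignment calls render().
  protected state<T extends object>(initial: T): T {
    return new Proxy(initial, {
      set: (target, key, value) => {
        (target as any)[key] = value;
        this.render();
        return true;
      },
    });
  }

  connectedCallback(): void {
    this.render();
  }

  // Subclasses override this with their own template logic.
  protected render(): void {}
}

// Usage: a tiny counter component, no virtual DOM, no build step.
class CounterEl extends ReactiveElement {
  private data = this.state({ count: 0 });

  connectedCallback(): void {
    this.addEventListener("click", () => this.data.count++);
    super.connectedCallback();
  }

  protected render(): void {
    this.textContent = `Clicked ${this.data.count} times`;
  }
}

customElements.define("counter-el", CounterEl);
```

Obviously a real version needs templating, cleanup on disconnect, and batching, but it covers a surprising amount of what I reach for React for.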
All of that to say: it isn't web devs who are threatened; it is developers who don't want to learn the underlying technologies that power the libraries and frameworks they use.
I actually see no fault in being that way. I've known tons of decent-to-good developers who have no desire to understand HTTP or vanilla JavaScript, and they still do great work tying systems together. It's all about the kind of learner you are. Do you want depth, breadth, or a mixture of both (but always lacking in both, aka me)?
Increasingly bloated and complicated frameworks with intangible benefits, used for webpages that are now just training data for LLMs, are much more important.
Very fair! I only came back to edit it because right after leaving that comment I went to see if Best Buy had something I needed locally, clicked into search, typed, hit enter, and it fucking broke. Seemingly entirely: even the search button didn't work, so cmd+A, cmd+C, cmd+R, click in again, paste, enter, and that worked.
I just fucking loathe how common this experience is now. Amazon seems to be the only one that doesn't do it, but I've experienced this exact issue on Best Buy, Target, Etsy, Mercari, and eBay, and it just DRIVES ME UP THE WALL.