Hey, I also read that book, and came to basically the opposite conclusion!
The point of the book is that we've been very bad at testing animal intelligence because of a vast stack of human biases, including things like language and the geometry of our hands.
Animals with different geometries and no language are still intelligent, but we need to test them in ways which recognize their capabilities. Intelligence is general: it's adaptivity within one's set of constraints.
De Waal also points out that the definitions of language and intelligence shifted massively as we became more aware of what animals are capable of.
From this angle, I would say that LLMs are intelligent: they do adapt to their inputs extremely readily, though they have a particular set of constraints (no physical body (usually), for starters). They are, like chimpanzees, smarter and more capable than humans in some ways, and much dumber in others.
Finally, the 'statistical learners can't be intelligent' line of argument is extremely short-sighted. Our brains are bags of electrified meat. Evolution somehow figured out a way to make meat think. No individual neuron is intelligent, yet the collection of cells is. We learn by processing experiences with hormonal signals because those hormonal signals are what the meat is capable of working with. LLMs, by contrast, learn by processing examples with backprop. If anything, the intelligence of meat is more surprising.
Political power is the bottleneck for most shit that matters, not computational power.
Most of the stuff that sucks in the US sucks because of entrenched institutions with perverse incentives (health insurers, tax-filing companies) and congressional paralysis, not computational bottlenecks. Raw intelligence is thus limited in what it can achieve.
It's the next step removed from the tablet-based ordering that has taken over in restaurants. Like those tablets, it won't be everywhere, but it's easy to imagine it being ubiquitous, especially in chain stores.
Block autoregressive generation can give you big speedups.
Consider that outputting two tokens at a time will be a (2-epsilon)x speedup over running one token at a time. As your block size increases, you quickly get fast enough that it doesn't matter so much whether you're doing blocks or true all-at-once generation. What matters, then, is the quality trade-off for moving to block-mode output. And here it sounds like they've minimized that trade-off.
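The amortization argument can be sketched with a toy model. Assume each forward pass costs roughly constant wall-clock time, plus a small per-pass overhead standing in for the "epsilon" above (both numbers are invented for illustration):

```python
# Toy model of block-decoding speedup: each forward pass emits
# `block` tokens instead of 1, so the number of passes needed for a
# fixed output length shrinks by a factor of `block`. `eps` is a
# made-up per-pass overhead (the "epsilon" in the (2 - epsilon)x claim).
def relative_speedup(block, eps=0.05):
    return block / (1.0 + eps)

for block in (1, 2, 4, 8, 16):
    print(block, round(relative_speedup(block), 2))
```

With block size 2 this lands just under 2x, and the curve keeps climbing roughly linearly with block size, which is why the quality trade-off, not the speedup, becomes the interesting question.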
Can it go back and use future blocks as context? That's what I'm most interested in here: fixing line 2 because of a change/discovery we made in the process of writing line 122. I think that problem is a big part of the short-sightedness of current coding models.
Exactly. The current (streaming) way means that once it makes a decision, it's stuck with it. For example, variable naming: once it names a variable something, it's stuck using that name from then on. Whereas a human would just go back and change the name.
Maybe "thinking" will fix this aspect, but I see it as a serious shortcoming.
The special thing is that it’s decentralized. I know this discussion will not resolve and I’m not a blockchain zealot. I do think it’s an elegant decentralized storage system for algorithmic art where you make outputs definitive and collectible after initiating a run.
I think that's more than fair - "I like blockchain for decentralized proof of ownership more than other methods for the same." is as fine a preference as any other, of course.
I'm kind of excited about that, though. What I've come to realize is that automated testing, linting, and good review tools are more important than ever, so we'll probably see some good developments in these areas. This helps both humans and AIs, so it's a win-win. I hope.
> it's looking like assessment and evaluation are massive bottlenecks.
So I think LLMs have moved the effort that used to be spent on the fun part (coding) into the boring part (assessment and evaluation), which is also now a lot bigger.
You could build (code, if you really want) tools to ease the review. Of course we already have many tools for this, but with LLMs you can use their stochastic behavior to discover unexpected problems (something a deterministic solution never can). The author touches on this too with the security review (something I rarely did in the past but do now, and it has really improved the security posture of my systems).
You can also set up far more elaborate verification systems. Don't just do a static analysis of the code, but actually deploy it and let the LLM hammer at it with all kinds of creative paths. Then let it debug why it's broken. It's relentless at debugging - I've found issues in external tools I normally would've let go (maybe created an issue for), that I can now debug and even propose a fix for, without much effort from my side.
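The "stochastic probing" idea can be illustrated with a minimal random-input fuzz loop. The function under test and the input alphabet here are invented for the example; a real setup would point this at your deployed system instead:

```python
import random
import string

def parse_version(s):
    # Toy function under test (invented for illustration):
    # parses "1.2.3"-style strings into a tuple of ints.
    return tuple(int(p) for p in s.split("."))

# Hammer the function with random inputs to surface cases a
# hand-written, deterministic test suite would likely miss
# (empty segments like "1..2", stray letters, etc.).
random.seed(0)
failures = []
for _ in range(1000):
    s = "".join(random.choice(string.digits + ".x") for _ in range(5))
    try:
        parse_version(s)
    except ValueError:
        failures.append(s)

print(f"{len(failures)} surprising inputs out of 1000")
```

The point isn't the fuzz loop itself (tools like hypothesis do this far better); it's that randomness finds the inputs you didn't think to write down, and an LLM can then take each failure and debug it.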
So yeah, I agree that the boring part has become the more important part right now (speccing well and letting it build what you want is pretty much solved), but let's then automate that. Because if anything, that's what I love about this job: I get to automate work, so that my users (often myself) can be lazy and focus on stuff that's more valuable/enjoyable/satisfying.
When writing banal code, you can just ask it to write unit tests for certain conditions and it'll do a pretty good job. The cutting-edge tools will correctly and automatically run and iterate on the unit tests when they don't pass. You can even ask the agent to set up TDD.
Cars removed the fun part (raising and riding horses) and automatic transmissions removed the fun part (manual shifting), but for most people it's just a way to get from point A to B.
It's far more sane to review a complete PR than to verify every small change. They are like dicey new interns - do you want to look over their shoulder all day, or review their code after they've had time to do some meaningful quantum of work?
> It's far more sane to review a complete PR than to verify every small change.
Especially when the harness loop is allowed to do its thing. The first pass might have syntax issues. The loop will catch them, edit the file, and the next thing pops up: linter issues, runtime issues, and so on. Approving and reading every small edit might lead to frustrations that aren't there if you just look at the final product (which is what you care about, anyway).