They'll qualify their answers in English but as the article mentions, if your prompt asks for a confidence score, that "uncertainty" doesn't translate into low numerical confidence.
Quantifying their own confidence is also something they're not good at, and the format would prevent them from refusing to do it, or from prefacing it with a caveat, even if that's what you'd want of them. Particularly since the response format seems backwards: it asks for confidence, then the carbs estimate, then observations/notes, rather than letting the model base the carbs estimate on the observations/notes and then the confidence estimate on both of those.
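To make the ordering point concrete, here's a minimal sketch. The field names ("observations", "carbs_grams", "confidence") are illustrative, not from the article; the point is that an autoregressive model generates each field conditioned only on the fields before it, so putting confidence last lets it be conditioned on the estimate and the notes.

```python
# Hypothetical output-format instructions for a system prompt.
# Field ORDER is the point: each field is generated conditioned on
# the fields emitted before it.

BACKWARDS = ["confidence", "carbs_grams", "observations"]  # confidence guessed blind
REORDERED = ["observations", "carbs_grams", "confidence"]  # confidence sees the rest

def format_instruction(field_order):
    """Build a toy output-format instruction listing keys in a fixed order."""
    lines = [f'"{name}": <{name}>' for name in field_order]
    return "Respond with JSON in exactly this key order:\n{" + ", ".join(lines) + "}"

print(format_instruction(REORDERED))
```

Whether the reordering actually improves calibration would need testing, but it costs nothing to try.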
> They'll qualify their answers in English but [...]
That the default user-facing chat, as a normal user would use it, gives a warning is the key part, IMO. I don't think the expectation that there's no "wrong way" to use the model necessarily extends to API usage with a long custom system prompt and a restricted output format.
If my compiler "went down" I could still think through the problem I was trying to solve, maybe even work out the code on paper. I could reach a point where I would be fairly confident that I had the problem solved, even though I lacked the ability to actually implement the solution.
If my LLM goes down, I have nothing. I guess I could imagine prompts that might get it to do what I want, but there's no guarantee that those would work once it's available again. No amount of thought on my part will get me any closer to the solution, if I'm relying on the LLM as my "compiler".
What stops you from thinking through the problem if an LLM goes down, as you still have its previously produced code in front of you? It's worse if a compiler goes down because you can't even build the program to begin with.
In my opinion, this sort of learned helplessness is harmful for engineers as a whole.
Yeah I actually find writing the prompt itself to be such a useful mechanism of thinking through problems that I will not-infrequently find myself a couple of paragraphs in and decide to just delete everything I've written and take a new tack. Only when you're truly outsourcing your thinking to the AI will you run into the situation that the LLM being down means you can't actually work at all.
An interesting element here, I think, is that writing has always been a good way to force you to organize and confront your thoughts. I've liked working on writing-heavy projects, but in fast-moving environments writing things out before coding is easy to skip. Working with LLMs has sort of inverted that: you have to write to produce code with AI (usually, at least), and the more clarity of thought you put into the writing, the better the outcomes (usually).
Why couldn’t you actually write out the documents and think through the problem? I think my interaction is inverted from yours. I have way more thinking and writing I can do to prep an agent than I can a compiler and it’s more valuable for the final output.
I think if you're vibe coding to the extent that you don't even know the shapes of data your system works with (e.g. the schema if you use a database) you might be outsourcing a bit too much of your thinking.
This. When compilers came along, I believe a bunch of junior engineers just gave up utterly on understanding the shape of the assembly the compiler generated, which was a mistake given that early compilers weren't as effective as they are today. Today's vibe-coders are using these early AI tools, giving up on understanding the shape in the same way, and similarly struggling.
> An interesting side effect might be that only people locked out from using LLMs will learn how to program in the future, as vibe coding doesn't teach you the fundamentals.
While thinking about/working with LLMs, I've been reminded more than once of Asimov's short story Profession (http://employees.oneonta.edu/blechmjb/JBpages/m360/Professio...). In it, no one goes to school: information is just dumped into your brain. You get an initial dump of the basics when you're a kid, and then later all the specialty information for your career (which is chosen for you, based on what your brain layout is most suited to).
The protagonist is one of a number of people who can't get the second dump; his brain just isn't wired right, so he's sent to a Home for the Feeble Minded to be with other people who have to learn the old-fashioned way.
Through various adventures he eventually realizes that everyone who was "taped" is incapable of learning new material at all. His Home for the Feeble Minded is in fact an Institute of Higher Studies, one of only a handful, which are responsible for all the invention and creation that sustains human progress.
> On a phone keyboard, sure, it's as hard as an accent sign (á, for example), difficult but not terrible. But on a keyboard? Yeah, no one is typing in Alt combos when literally any other construction will do.
For me, --- gets converted to an em-dash (—) while typing, if I have my input method (ELatin) enabled. I'm so used to typing it while working in LaTeX that I can easily slip it in elsewhere.
Correct; the ability of a model to reproduce source material verbatim does not necessarily make the model's existence illegal. However, using a model to do just that might very well present a legal liability for the user. I would be interested to see the extent to which models can "recite from memory" source code, e.g., from the various MS code leaks. Put another way, if I'm using LLM code generation extensively, do I need to run a filter on its output to ensure that I don't "accidentally" copy large chunks of the Windows codebase?
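The kind of filter described above could start as something very simple: flag any output that shares a long exact token run with a reference corpus. This is only a sketch under assumed parameters; the 4-token threshold in the test and the idea of tokenizing by whitespace are illustrative, and a real system would need hashing or suffix structures to scale, plus a legally informed threshold.

```python
# Minimal sketch of a verbatim-copy filter: flag generated code that shares
# a long exact token run (an n-gram) with any file in a reference corpus.
# Threshold and tokenization are illustrative assumptions, not a legal standard.

def token_ngrams(text, n):
    """Set of all n-token runs in `text`, tokenized naively on whitespace."""
    toks = text.split()
    return {" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def flags_overlap(generated, corpus_texts, n=30):
    """True if `generated` shares any exact n-token run with the corpus."""
    gen = token_ngrams(generated, n)
    return any(gen & token_ngrams(ref, n) for ref in corpus_texts)
```

This brute-force version is quadratic in corpus size, so it's only workable for spot checks, but it illustrates what such a liability filter would have to do.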
I wonder why that would be? Presumably if the batteries are low then the pressure the machine "thinks" it's inflated the cuffs to is higher than the actual pressure...
Dance along with the characters of the new Series, now streaming on $sponsor, and achieve a score of at least 6/10 to get another door unlock.
---
Your dance was not good enough. Try again, or buy a door unlock with the flash discount code "Distopia" for 99ct.
I miss TkDesk, which I discovered many years ago when I was first trying Linux, partly because it supports unlimited splits, not just two. In fact, if I'm remembering correctly, when navigating to a subdirectory the default was just to open it in a new split. You ended up with splits containing the full path from wherever you started to your eventual subdirectory (you could scroll the view of splits horizontally once there got to be too many).