> I wonder what adaptations will be necessary to make AIs work better on Lisp. S...

surround · 2026-04-05T04:14:20 1775362460

I think you're right. Try asking GPT-5 this:

> Are the parentheses in ((((()))))) balanced?

There was a thread about this the other day [1]. It's the same issue as "count the r's in strawberry": tokenization makes it hard to count characters. If you put that string into OpenAI's tokenizer [2], this is how the characters are grouped:

Token 1: ((((

Token 2: ()))

Token 3: )))

That, of course, isn't at all how our minds would group them in order to keep track of them.

[1] https://news.ycombinator.com/item?id=47615876 [2] https://platform.openai.com/tokenizer

ksaj · 2026-04-05T08:21:49 1775377309

This is mostly because people wrongly assume that LLMs can count things. Just because it looks like it can doesn't mean it can.

Try to get your favourite LLM to read the time from a clock face. It'll fail ridiculously most of the time, and come up with all kinds of wonky reasons for the failures.

It can code things that it's seen the logic for before. That's not the same as counting. That's outputting what it's previously seen as proper code (and even then it often fails, probably 'cos there's a lot of crap code out there).

otterley · 2026-04-05T05:11:33 1775365893

Don’t ask the LLM to do that directly: ask it to write a program to answer the question, then have it run the program. It works much better that way.
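For illustration, a minimal sketch of the kind of program meant here, using the string from the question upthread. The function name `is_balanced` is made up for this example, and it only tracks `(` and `)`, ignoring every other character.

```python
def is_balanced(s: str) -> bool:
    """Return True if every ')' closes an earlier '(' and nothing is left open."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                # A closing paren appeared with nothing open to match it.
                return False
    return depth == 0


# The string from the question: five opening and six closing parentheses.
print(is_balanced("((((())))))"))  # False
```

A single counter is enough because only one kind of bracket is involved; no per-character counting by the model is required once the program exists.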

surround · 2026-04-05T05:35:03 1775367303

But for Lisp, a more complex solution is needed. It's easy for a human Lisp programmer to keep track of which closing parenthesis corresponds to which opening parenthesis because the editor highlights parenthesis pairs as they are typed. How can we give an LLM that kind of feedback as it generates code?
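One rough way to picture that feedback is a checker that reports where the unmatched parentheses are, so the positions could be handed back to the model. This is only a sketch under loud assumptions: `find_paren_errors` is a hypothetical helper, and it does not understand Lisp strings, character literals, or comments, which a real tool would have to skip.

```python
def find_paren_errors(src: str) -> tuple[list[int], list[int]]:
    """Return (stray_closes, unclosed_opens) as lists of 0-based character offsets.

    Assumption: every '(' and ')' in src is code. Parens inside strings,
    character literals, or comments are NOT handled by this sketch.
    """
    open_stack: list[int] = []    # offsets of '(' still waiting for a ')'
    stray_closes: list[int] = []  # offsets of ')' with no matching '('
    for i, ch in enumerate(src):
        if ch == "(":
            open_stack.append(i)
        elif ch == ")":
            if open_stack:
                open_stack.pop()
            else:
                stray_closes.append(i)
    return stray_closes, open_stack


# The outer form is never closed, so offset 0 is reported as unclosed.
print(find_paren_errors("(defun f (x) (* x x)"))  # ([], [0])
```

Returning offsets rather than a yes/no answer is the design choice that matters here: it is the textual equivalent of an editor highlighting the offending parenthesis.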

otterley · 2026-04-05T17:41:54 1775410914

That's a different question than the one you asked. Are you saying LLMs are generating invalid LISP due to paren mismatching?

surround · 2026-04-06T20:17:32 1775506652

That's what the comment I was originally replying to was saying.

xigoi · 2026-04-05T06:20:10 1775370010

If the LLM is intelligent, why can’t it figure out on its own that it needs to write a program?

sph · 2026-04-05T09:56:34 1775382994

The answer is self-evident.

frwrfwrfeefwf · 2026-04-05T05:06:29 1775365589

does the ai performance drop if it uses letters for tokens rather than tokens for tokens?

surround · 2026-04-05T05:46:45 1775368005

Try asking an LLM a question like "H o w T o P r o g r a m I n R u s t ?" - each letter, separated by spaces, will be its own token, and the model will understand just fine. The issue is that computational cost scales quadratically with the number of tokens, so processing "h e l l o" is much more expensive than "hello". "hello" has meaning, "h" has no meaning by itself. The model has to waste a lot of computation forming words from the letters.
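To put a number on the quadratic claim: the quadratic term comes from self-attention, where each token is compared against every token. A back-of-the-envelope sketch, assuming (as above) that "hello" is one token and "h e l l o" is five; real counts depend on the tokenizer, and `attention_pairs` is just an illustrative name.

```python
def attention_pairs(num_tokens: int) -> int:
    """Number of token-to-token comparisons in one full self-attention pass."""
    return num_tokens * num_tokens


word_level = attention_pairs(1)    # assumed: "hello" as a single token
letter_level = attention_pairs(5)  # assumed: "h e l l o" as five tokens

# Five times the tokens means twenty-five times the comparisons.
print(letter_level // word_level)  # 25
```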

Our brains also process text whole words at a time, not letter by letter. The difference is that our brains are much more flexible than a tokenizer, and we can easily switch to letter-by-letter reading when needed, such as when we encounter an unfamiliar word.

mark_l_watson · 2026-04-05T09:47:51 1775382471

I am lazy: when an LLM messes up parentheses when working with any Lisp language, I just quickly fix the mismatch myself rather than try to fix the tooling.

tasty_freeze · 2026-04-05T03:51:21 1775361081

Sometimes LLMs astonish me with the code they can write. Other times I have to laugh or cry.

As an example, I asked Claude 3.5, back when that was the latest, to indent all the code in my file by four more spaces. The file was about 700 lines long. I got a busy spinner for two minutes, then it said, "OK, first 50 lines done, now I'll do the rest." I got another busy spinner, and then it said, "this is taking too long. I'm going to write a program to do it," which of course it had no problem doing. The point is that it is superhuman at some things and completely brain-dead about others, and counting parens is one of those things I wouldn't expect it to be good at.
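For reference, the program in question can be very small. This is not what the model actually produced, just a sketch of the idea; `indent_text` is a made-up name, and leaving blank lines untouched is a choice made for this example.

```python
def indent_text(text: str, spaces: int = 4) -> str:
    """Prefix every non-blank line with `spaces` spaces, preserving line endings."""
    pad = " " * spaces
    return "".join(
        pad + line if line.strip() else line  # blank lines are left as they are
        for line in text.splitlines(keepends=True)
    )


print(indent_text("def f():\n    return 1\n"), end="")
```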

nextos · 2026-04-05T03:57:30 1775361450

I think LLMs are great at compression and information retrieval, but poor at reasoning. They seem to work well with popular languages like Python because they have been trained with a massive amount of real code. As demonstrated by several publications, on niche languages their performance is quite variable.

smackeyacky · 2026-04-05T05:31:47 1775367107

I used to find it better to shortcut the AI by asking it to write Python to do a task. Claude 4.6 seems to do this without prompting.

Edit: working on a lot of legacy code that needs boring refactoring, which Claude is great at.

lagniappe · 2026-04-05T04:15:33 1775362533

That's you at the time not knowing LLM fundamentals with regard to context management.

tasty_freeze · 2026-04-05T13:27:45 1775395665

That was me at the time kicking the tires to understand what it was good at or not. If I actually wanted to indent a file by four spaces it would take me less time in my editor than to prompt the LLM to do it, even if the LLM had been capable of it.

whartung · 2026-04-05T04:50:48 1775364648

I had that issue with the AI while doing some CL dabbling.

Things, on the whole, were fine, save for the occasional, rogue (or not) parentheses.

The AI would just go off the rails trying to solve the problem. I told it that if it ever encountered the problem, it should let me know and not try to fix it; I'd do it myself.