
Or even just a notepad. It's well established that long context histories with scattered information are hard for LLMs to navigate.

To distinguish whether it's using the notes or the context history, you could simply delete the context after each turn. The prompt could be something like: "You're taking over this game from a previous player who has compiled these notes: (insert notes here). Play one turn, then update the notes with any new knowledge you have attained, relationships you have identified, inaccuracies you have confirmed, hypotheses you have, or anything else you think would be useful, so that the next player will be able to use these notes to make the best next move." Then clear the context after each move. Maybe also say there's a limit on the number of words in the notepad, so that it doesn't just flood the notes with irrelevant information.
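A minimal sketch of that stateless loop in Python, assuming a hypothetical `call_llm(prompt) -> str` function that makes a fresh API call with no chat history. The prompt wording, the reply format (move on the first line, notes after), and the 200-word cap are illustrative choices, not taken from any actual experiment:

```python
from typing import Callable

# Hypothetical prompt template paraphrasing the handoff prompt described above.
PROMPT_TEMPLATE = (
    "You're taking over this game from a previous player who has compiled "
    "these notes:\n{notes}\n\n"
    "Current observation: {observation}\n\n"
    "Play one turn. Reply with your move on the first line, then the updated "
    "notes (at most {word_limit} words) on the following lines."
)


def truncate_words(text: str, limit: int) -> str:
    """Hard cap on the notepad size, in case the model ignores the instruction."""
    return " ".join(text.split()[:limit])


def play_one_turn(
    call_llm: Callable[[str], str],
    notes: str,
    observation: str,
    word_limit: int = 200,
) -> tuple[str, str]:
    """Run one turn whose prompt holds only the notes and the latest observation."""
    prompt = PROMPT_TEMPLATE.format(
        notes=notes or "(no notes yet)",
        observation=observation,
        word_limit=word_limit,
    )
    reply = call_llm(prompt)  # fresh call: no chat history carries over
    move, _, new_notes = reply.partition("\n")
    return move.strip(), truncate_words(new_notes, word_limit)
```

Each turn you'd feed the returned notes straight back in (`move, notes = play_one_turn(call_llm, notes, observation)`) and send `move` to the game to get the next observation. The truncation is only a blunt backstop; the word limit stated in the prompt is the main lever.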

For future iterations, maybe also give it a bitmap or SVG canvas, a database, or a code interpreter, and see whether it uses any of those tools at all.



I went ahead and did this, just using a maze where the only actions are "turn left", "turn right", and "go forward". The response is either OK or BLOCKED, plus which directions are open and which are walled in the current cell (relative directions: front, back, left, right; not north, east, south, west).
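For anyone wanting to try something similar, here's a rough sketch of such a maze in Python. The grid encoding (each cell maps to the set of absolute directions with no wall) and the `RelativeMaze` name are my own assumptions, not necessarily how the original experiment was built; the point is that the environment tracks the absolute heading while the agent only ever sees relative sides:

```python
# Absolute headings: 0 = north, 1 = east, 2 = south, 3 = west (y grows southward).
DELTAS = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}
# Offset from the current heading for each relative side.
RELATIVE = {"front": 0, "right": 1, "back": 2, "left": 3}


class RelativeMaze:
    def __init__(self, open_dirs: dict[tuple[int, int], set[int]],
                 start=(0, 0), heading=0):
        self.open_dirs = open_dirs  # cell -> absolute directions with no wall
        self.pos = start
        self.heading = heading

    def observe(self) -> dict[str, bool]:
        """Report which relative sides of the current cell are open."""
        here = self.open_dirs.get(self.pos, set())
        return {name: (self.heading + off) % 4 in here for name, off in RELATIVE.items()}

    def act(self, action: str) -> tuple[str, dict[str, bool]]:
        if action == "turn left":
            self.heading = (self.heading - 1) % 4  # turning never changes position
            status = "OK"
        elif action == "turn right":
            self.heading = (self.heading + 1) % 4
            status = "OK"
        elif action == "go forward":
            if self.heading in self.open_dirs.get(self.pos, set()):
                dx, dy = DELTAS[self.heading]
                self.pos = (self.pos[0] + dx, self.pos[1] + dy)
                status = "OK"
            else:
                status = "BLOCKED"  # wall ahead: nothing changes
        else:
            raise ValueError(f"unknown action: {action!r}")
        return status, self.observe()
```

For example, `RelativeMaze({(0, 0): {1}, (1, 0): {2, 3}, (1, 1): {0}})` is a three-cell L-shaped corridor. Note that the walls have to be listed consistently from both sides of each passage.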

When I had it try to keep track of where it was via chat history alone, it failed: it went back into fully explored paths and explored them again. When I gave it a scratchpad to store knowledge, it didn't do any better. One common failure pattern I noticed was that it got confused and updated its position when issuing a "turn" command. It never created a map, no matter how strongly I recommended doing so in the prompt; instead it generally just stored a list of the cells it went to and what it saw in each.
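The turn-updates-position bug is exactly the kind of bookkeeping a plain program gets right. A minimal dead-reckoning sketch, assuming the action/observation format described above: turns change only the heading, successful forward moves change only the position, and relative observations are converted into a map in absolute coordinates. `DeadReckoningMap` and its methods are hypothetical names:

```python
# Absolute headings relative to the start: 0 = "north" (whatever the agent
# first faced), 1 = east, 2 = south, 3 = west. y grows southward.
DELTAS = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}
RELATIVE_OFFSETS = {"front": 0, "right": 1, "back": 2, "left": 3}


class DeadReckoningMap:
    """Tracks pose from the agent's own actions and builds an absolute map."""

    def __init__(self, first_observation: dict[str, bool]):
        self.pos = (0, 0)  # the start cell defines the origin
        self.heading = 0
        self.open_dirs: dict[tuple[int, int], set[int]] = {}
        self._store(first_observation)

    def _store(self, observation: dict[str, bool]) -> None:
        # Convert relative sides to absolute directions before storing them.
        self.open_dirs[self.pos] = {
            (self.heading + off) % 4
            for name, off in RELATIVE_OFFSETS.items()
            if observation[name]
        }

    def record(self, action: str, status: str, observation: dict[str, bool]) -> None:
        if action == "turn left":
            self.heading = (self.heading - 1) % 4  # turns change heading only
        elif action == "turn right":
            self.heading = (self.heading + 1) % 4
        elif action == "go forward" and status == "OK":
            dx, dy = DELTAS[self.heading]
            self.pos = (self.pos[0] + dx, self.pos[1] + dy)
        # A BLOCKED forward move leaves the pose unchanged.
        self._store(observation)

    def frontier(self) -> set[tuple[int, int]]:
        """Cells seen to be reachable but never entered."""
        unvisited = set()
        for (x, y), dirs in self.open_dirs.items():
            for d in dirs:
                dx, dy = DELTAS[d]
                if (x + dx, y + dy) not in self.open_dirs:
                    unvisited.add((x + dx, y + dy))
        return unvisited
```

`frontier()` tells you where exploration is still unfinished, which is what the model kept losing track of. A harness could also serialize `open_dirs` into the notes each turn instead of hoping the model builds a map on its own.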

I'd have played with it more, and I think I could eventually have gotten it working 99% better, but I'd already spent $12 and I'm unemployed, so.

Anyway, even if I got it working really, really well, it'd still just be another example of "don't use an LLM when a simple state machine would be easier and work better". I think an LLM could be useful if the maze game's responses were in natural language ("You went forward a step, to find yourself at a T-intersection with passages off to each side."), with the LLM translating them into a structured response that could be fed into the state machine. But it's still not reliable enough to serve as the state machine itself.
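A sketch of that split in Python, assuming the LLM is prompted to turn each natural-language description into JSON with a fixed set of boolean fields (`moved`, `front`, `back`, `left`, `right`; this schema is hypothetical). The parser rejects anything malformed rather than guessing, so the state machine only ever sees well-formed input:

```python
import json

# Hypothetical schema the LLM is asked to fill in for each description.
REQUIRED_KEYS = {"moved", "front", "back", "left", "right"}


def parse_observation(llm_reply: str) -> dict[str, bool]:
    """Validate the JSON an LLM produced from a natural-language description.

    Raises ValueError instead of guessing, so a bad translation never
    reaches the state machine.
    """
    try:
        data = json.loads(llm_reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict) or set(data) != REQUIRED_KEYS:
        raise ValueError(f"expected exactly the keys {sorted(REQUIRED_KEYS)}")
    if not all(isinstance(v, bool) for v in data.values()):
        raise ValueError("every field must be true or false")
    return data
```

For the T-intersection example, assuming the passage you came from counts as open behind you, the expected translation would be something like `{"moved": true, "front": false, "back": true, "left": true, "right": true}`. On a `ValueError` you'd re-prompt rather than pass the reply along.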



