For reference, here are the details of how the LLMs are queried:
"How the players work
All players use the same system prompt
Each time it's their turn, or after a hand ends (to write a note), we query the LLM
At each decision point, the LLM sees:
General hand info — player positions, stacks, hero's cards
Player stats across the tournament (VPIP, PFR, 3bet, etc.)
Notes hero has written about other players in past hands
From the LLM, we expect:
Reasoning about the decision
The action to take (executed in the poker engine)
A reasoning summary for the live viewer interface
Models have a maximum token limit for reasoning
If there's a problem with the response (timeout, invalid output), the fallback action is fold"
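For the curious, here's a minimal sketch of what that per-decision loop might look like. None of this is the project's actual code: the `llm.complete` interface, the field names, and the JSON shape are all my own assumptions based on the description above.

```python
import json
from dataclasses import dataclass

# Shared by all players, per the description above (contents assumed).
SYSTEM_PROMPT = "You are a poker player in a tournament..."

@dataclass
class DecisionContext:
    positions: dict[str, str]           # seat -> position (BTN, SB, ...)
    stacks: dict[str, int]              # seat -> chip count
    hero_cards: list[str]               # e.g. ["Ah", "Kd"]
    stats: dict[str, dict[str, float]]  # per-player VPIP/PFR/3bet etc.
    notes: dict[str, str]               # hero's notes on opponents

def decide(llm, ctx: DecisionContext, max_reasoning_tokens: int = 2048) -> dict:
    """Query the model for one action; fall back to fold on any failure."""
    prompt = json.dumps({
        "hand_info": {
            "positions": ctx.positions,
            "stacks": ctx.stacks,
            "hero_cards": ctx.hero_cards,
        },
        "player_stats": ctx.stats,
        "notes": ctx.notes,
    })
    try:
        # `llm.complete` is a stand-in for whatever client the project uses.
        raw = llm.complete(
            system=SYSTEM_PROMPT,
            user=prompt,
            max_tokens=max_reasoning_tokens,  # cap on reasoning length
            timeout_s=30,
        )
        out = json.loads(raw)
        # Expect reasoning, an engine-executable action, and a viewer summary.
        if out["action"] not in {"fold", "check", "call", "bet", "raise"}:
            raise ValueError(f"invalid action: {out['action']}")
        return out
    except Exception:
        # Timeout or invalid output -> fold, per the stated fallback rule.
        return {"action": "fold", "reasoning": "", "summary": "fallback: fold"}
```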
The fact that the models are given stats about the other models is rather disappointing to me; it makes it less interesting. I'd be curious how this would go if the models had only their own notes/context to rely on. Maybe the stats are a way to save on costs, since this could get expensive...
"How the players work
The fact the models are given stats about the other models is rather disappointing to me, makes it less interesting. Would be curious how this would go if the models had to only use notes/context would be more interesting. Maybe it's a way to save on costs, this could get expensive...