>> but I recall there was some paper that showed you can get something like 98% of equilibrium utility in poker subgames, which could make deterministic strategy practical. (Can't find the paper now.)
Yeah, I can see that for sure. That's also the holy grail for poker enthusiasts: "can we please have a non-mixed solution that is close enough?" The problem is that losing 2% or even 1% of equilibrium utility is huge. Professional players are often not happy with solutions that are 0.5% or less from equilibrium (measured by how much the solution can be exploited).
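To make "measured by how much the solution can be exploited" concrete, here is a toy sketch on rock-paper-scissors rather than poker (the game, the payoff table and the `exploitability` helper are illustrative, not taken from any solver). One-sided exploitability is how much a best-responding opponent pushes you below the equilibrium value. A pure strategy is maximally exploitable here, while a small deviation from the equilibrium mix costs only a small amount.

```python
# Rock-paper-scissors payoffs for the row player: rows/cols = rock, paper, scissors.
RPS = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]
GAME_VALUE = 0.0  # symmetric zero-sum game, so the equilibrium value is 0


def exploitability(strategy, payoffs=RPS, value=GAME_VALUE):
    """How far a best-responding opponent pushes `strategy` below the
    equilibrium value (0 means the strategy cannot be exploited)."""
    n_cols = len(payoffs[0])
    # Row player's expected payoff against each pure reply of the opponent.
    col_values = [
        sum(p * payoffs[r][c] for r, p in enumerate(strategy))
        for c in range(n_cols)
    ]
    # The opponent picks the reply that is worst for the row player.
    return value - min(col_values)


print(exploitability([1, 0, 0]))           # 1.0: "always paper" beats "always rock"
print(exploitability([1/3, 1/3, 1/3]))     # the equilibrium mix
print(exploitability([0.34, 0.33, 0.33]))  # slightly off the equilibrium mix
```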
>>Continual resolving done in DeepStack [1]
Right, thank you. I am very used to the term resolving but not "online search".
The idea here is to first approximate the solution using betting abstraction (for example solving with 3 bet sizes) and then hope this gets closer to the real thing if we resolve parts of the tree with more sizes (those parts that become relevant for the current play).
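For readers less familiar with how such solvers produce mixed strategies: solvers for both the abstract solve and the resolving step are typically from the CFR family, which is built on regret matching. A minimal self-play sketch on rock-paper-scissors shows the core update and the key property that the average strategy, not the latest one, approaches equilibrium. This is a one-shot matrix game, so there is no tree, abstraction or gadget here; the game and all names are illustrative.

```python
# Rock-paper-scissors payoffs for player 0 (rows); player 1 gets the negation.
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
N = 3


def regret_matching(regrets):
    """Mix in proportion to positive cumulative regret; uniform if none."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / len(regrets)] * len(regrets)


def action_values(player, opponent_mix):
    """Expected payoff of each pure action against the opponent's mix."""
    if player == 0:
        return [sum(opponent_mix[c] * PAYOFF[a][c] for c in range(N))
                for a in range(N)]
    return [sum(opponent_mix[r] * -PAYOFF[r][a] for r in range(N))
            for a in range(N)]


def solve(iterations=10_000):
    """Self-play regret matching; returns both players' average strategies."""
    # Start player 0 off uniform; from all-zero regrets both players would
    # simply sit at the uniform equilibrium from the first iteration.
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strategy_sums = [[0.0] * N, [0.0] * N]
    for _ in range(iterations):
        mixes = [regret_matching(regrets[0]), regret_matching(regrets[1])]
        for player in range(2):
            values = action_values(player, mixes[1 - player])
            current = sum(s * v for s, v in zip(mixes[player], values))
            for a in range(N):
                regrets[player][a] += values[a] - current
                strategy_sums[player][a] += mixes[player][a]
    # The average strategy, not the last iterate, approaches equilibrium.
    return [[s / iterations for s in sums] for sums in strategy_sums]


avg0, avg1 = solve()
print([round(p, 3) for p in avg0], [round(p, 3) for p in avg1])
```

Individual iterates keep cycling between rock, paper and scissors; only the running average settles near 1/3 each.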
>>Gadget game introduced in [3], used in continual resolving.
I don't see "strategy consistency" in the paper nor a gadget game. Did you mean a different one?
>>Being imprecise like this would arguably not result in a super-human play.
Well, you have noticed that we can get somewhat close with a deterministic strategy and that is one step closer. There is nothing stopping LLMs from giving more precise answers like 70-30 or 90-10 or whatever.
>>But this is in token space. I'd be curious to see a demonstration of sampling from a distribution (e.g. a uniform one) in the "token space", not via external tool calling. Can you make an LLM sample an integer from 1 to 10, or from any other interval, e.g. 223 to 566, without an external tool?
It doesn't have to sample. It just needs to approximate the function that takes a game state and outputs the best move, and that move is a distribution, not a single action. It's purely about pattern recognition (like chess). It could even learn to output colors or whatever (yellow for 100-0, red for 90-10, blue for 80-20, etc.). It doesn't need to do any sampling itself, just recognize patterns.
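A minimal sketch of that split, assuming a hypothetical label vocabulary (the colour mapping from the comment, with "call"/"fold" as assumed action names): the model would only predict a label, and ordinary code outside the model turns it into a concrete action. That outside step is where a tool call would do the actual sampling.

```python
import random

# Hypothetical label vocabulary following the colour example above.
# Each label maps to (call, fold) frequencies; the action names are assumed.
LABEL_TO_MIX = {
    "yellow": (1.0, 0.0),  # 100-0
    "red": (0.9, 0.1),     # 90-10
    "blue": (0.8, 0.2),    # 80-20
}
ACTIONS = ("call", "fold")


def act(label, rng):
    """Turn a predicted label into one concrete action.

    The model only has to output `label`; the randomness lives here,
    in ordinary code, not in the model's tokens."""
    return rng.choices(ACTIONS, weights=LABEL_TO_MIX[label], k=1)[0]


rng = random.Random(0)
calls = sum(act("red", rng) == "call" for _ in range(10_000))
print(calls / 10_000)  # close to 0.9
```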
>>You don't need an LLM under such scheme -- you can do a k-NN or some other simple approximation. But any strategy/value approximation would encounter the very same problem DeepStack had to solve with gadget games about strategy inconsistency [5]. During play, you will enter a subgame which is not covered by your training data very quickly, as poker has ~10^160 states.
Ok, thank you I see what you mean by strategy consistency now.
It's true that generating training data is also computationally expensive if you need resolving (for example, for no-limit poker).
However your point:
>You don't need an LLM under such scheme -- you can do a k-NN or some other simple approximation.
Is not clear to me. You can say that about any other game then, no? The point of LLMs is that they are good at recognizing patterns in a huge space and may be able to approximate games like chess or poker pretty efficiently unlike traditional techniques.
>>How do you define "precision"?
I mean that there are patterns that seem very similar but have completely different correct answers. In chess, a minuscule difference between positions can make the same move winning in one and losing in the other.
In poker, whether you call 25% more or 35% more often when the bet size is 20% smaller is unlikely to result in a huge blunder. Chess is more volatile, so you need more "precision" to tell patterns apart.
I realize it's not a technical term, but it's the one that comes to mind when you think about what LLMs are good and bad at. They are very good at seeing general patterns but weak when they need to be precise.
I agree it is possible to build an LLM to play poker, with appropriate tool calling, in principle.
I think it's useful to distinguish between a) what LLMs can do in theory, b) non-LLM approaches we know work, and c) how to do it with LLMs.
In a) theory, LLMs with "thinking" rollouts are equivalent to a (finite-tape) Turing machine, so they can do anything a computer can, and a solution exists (given a large-enough neural net/rollout). To do the sampling, I agree the LLM can use an external tool call. This is a good start!
For b), to achieve strong performance in poker, we know you can do continual resolving (e.g. search + gadget).
For c), "quantization" as you suggested is an interesting approach, but it goes against the spirit of "let's have a big neural net that can do any general task". You gave an example of how to quantize for a state that has 2 actions. But what about 3? 4? Or N?
So in practice, to achieve such generality, you need to output in the token space.
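One possible answer to "what about 3? 4? Or N?" (my own sketch, not a scheme anyone here proposed): snap an N-action mix to multiples of 10% with largest-remainder rounding. It works for any N, but the number of distinct labels grows as C(steps + N - 1, N - 1), which is part of why you end up emitting numbers in token space rather than a fixed vocabulary of colours. `quantize` and `label_count` are illustrative names.

```python
import math


def quantize(probs, steps=10):
    """Snap an N-action distribution to multiples of 1/steps summing to 1.

    Largest-remainder rounding: floor every entry, then give the leftover
    units to the entries with the largest fractional parts."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    scaled = [p * steps for p in probs]
    units = [math.floor(s) for s in scaled]
    leftover = steps - sum(units)
    by_remainder = sorted(range(len(probs)),
                          key=lambda i: scaled[i] - units[i], reverse=True)
    for i in by_remainder[:leftover]:
        units[i] += 1
    return units  # integer counts out of `steps`


def label_count(n_actions, steps=10):
    """How many distinct quantized mixes (labels) exist for n_actions."""
    return math.comb(steps + n_actions - 1, n_actions - 1)


print(quantize([0.62, 0.27, 0.11]))              # [6, 3, 1]
print([label_count(n) for n in (2, 3, 4, 10)])   # [11, 66, 286, 92378]
```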
On top of that, for poker, you'd need the LLM to somehow implement continual resolving/ReBeL (for equilibrium guarantees). To do all of this, you need either i) the LLM to call the CPU implementation of the resolver or ii) the LLM to execute instructions like a CPU.
I do believe i) is practically doable today, e.g. finetuning an LLM to incorporate a value function in its weights and call a resolver tool, but it's not something ChatGPT and others can do (to come back to my original parent post).
Also, in such a finetuning process, you will likely trade off the LLM's generality for specialization.
> you can do a k-NN or some other simple approximation. [..] You can say that about any other game then, no?
Yes, you can approximate value function with any model (k-NN, neural net, etc).
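As a minimal illustration of "any model", here is a k-NN strategy approximator over a hypothetical two-feature state encoding (hand strength and bet size as a fraction of the pot). The data points are invented purely for illustration, and the same code would work for values instead of call frequencies. Note that, like a neural net, it offers no consistency guarantee once play leaves the states covered by its data.

```python
import math

# Hypothetical training set: ((hand strength, bet size / pot), call frequency).
# The numbers are made up for illustration only.
TRAINING = [
    ((0.9, 0.5), 1.0),
    ((0.9, 1.0), 0.9),
    ((0.5, 0.5), 0.6),
    ((0.5, 1.0), 0.4),
    ((0.1, 0.5), 0.1),
    ((0.1, 1.0), 0.0),
]


def knn_call_frequency(query, data=TRAINING, k=3):
    """Average the call frequency of the k stored states nearest to `query`."""
    nearest = sorted(data, key=lambda item: math.dist(query, item[0]))[:k]
    return sum(freq for _, freq in nearest) / k


print(knn_call_frequency((0.9, 0.8)))
```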
> In poker, whether you call 25% more or 35% more often when the bet size is 20% smaller is unlikely to result in a huge blunder. Chess is more volatile, so you need more "precision" to tell patterns apart.
I see. The same applies to chess, however. You can play mixed strategies there too, with a similar property: you can linearly interpolate expected value between losing (-1) and winning (1).
Overall, I think being able to incorporate a value function within an LLM is super interesting research. There are some works there, e.g. Cicero [6], and certainly more should be done, e.g. a neural net that is both a language model and able to do AlphaZero-style search.
I agree with everything here. Thank you for the interesting references and links as well!
One point I would like to make:
>>On top of that, for poker, you'd need the LLM to somehow implement continual resolving/ReBeL (for equilibrium guarantees). To do all of this, you need either i) the LLM to call the CPU implementation of the resolver or ii) the LLM to execute instructions like a CPU.
Maybe we don't. Maybe there are general patterns that LLM could pick up so it could make good decisions in all branches without resolving anything, just looking at the current state. For example LLM could learn to automatically scale calling/betting ranges depending on the bet size once it sees enough examples of solutions coming from algorithms that use resolving.
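One simple pattern of this kind, as a hedged sketch: the "minimum defense frequency" rule of thumb says how much of your range to continue with so that a pure bluff of size b into a pot p shows no immediate profit. The bluff wins p when you fold (probability f) and loses b otherwise, so it breaks even at f = b / (p + b), giving a defence frequency of p / (p + b). It coincides with equilibrium defence in simple polarized river toy games and is only a guideline elsewhere; it is not a claim about what a solver or an LLM would actually learn.

```python
def minimum_defense_frequency(bet, pot):
    """Fraction of the range to continue with so that a pure bluff of size
    `bet` into `pot` shows no immediate profit: pot / (pot + bet)."""
    return pot / (pot + bet)


for fraction in (0.33, 0.5, 0.8, 1.0, 2.0):
    mdf = minimum_defense_frequency(bet=fraction, pot=1.0)
    print(f"bet {fraction:.0%} pot -> defend {mdf:.1%}")
```

The output changes smoothly with bet size (for example 50% against a pot-size bet, about 55.6% against an 80% pot bet), which is the kind of regularity a pattern learner could plausibly pick up.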
I guess what I am getting at is that, intuitively, there is not that much information in poker solutions compared to chess, so there are more general patterns LLMs could pick up on.
I remember the discussion around the time heads-up limit hold'em was solved, and the arguments that it's bigger than chess. I think it's clear now that the solution to limit hold'em is much smaller than the solution to chess is going to be (and we haven't even started on compression there, which could exploit the internal structure of the game). My intuition is that no-limit might still be smaller than chess.
>>I see. The same applies to chess, however. You can play mixed strategies there too, with a similar property: you can linearly interpolate expected value between losing (-1) and winning (1).
I mean that in chess, the same move in a seemingly similar situation might be completely wrong or exactly right, and a small detail can turn it from the latter into the former. You need very "precise" pattern recognition to distinguish between those situations. In poker, if you know that calling 100% with top pair is right against a pot-size river bet, you will not make a huge mistake if you also call 100% against an 80% pot bet, for example.
When NN-based engines appeared (early versions of Lc0), it was instantly clear that they had amazing positional "understanding" but got lost quickly when the position required a precise sequence of moves.
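A quick pot-odds calculation backs up the top-pair example. This is a sketch that assumes the caller's equity is the same against both bet sizes, which ignores that the bettor's range can change with size. Calling a bet b into a pot p risks b to win p + b, so the break-even equity is b / (p + 2b). A call that is correct against a pot-size bet therefore stays correct against a smaller one, which needs less equity.

```python
def required_equity(bet, pot):
    """Break-even equity for calling `bet` into `pot` (the pot before the bet).

    The caller risks `bet` to win `pot + bet`, so EV = eq * (pot + 2*bet) - bet,
    which is zero at eq = bet / (pot + 2 * bet)."""
    return bet / (pot + 2 * bet)


pot_bet = required_equity(bet=1.0, pot=1.0)      # 1/3
smaller_bet = required_equity(bet=0.8, pot=1.0)  # 0.8 / 2.6
print(f"pot-size bet: {pot_bet:.1%}, 80% pot bet: {smaller_bet:.1%}")
# pot-size bet: 33.3%, 80% pot bet: 30.8%
```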