I wouldn't call a bluff a lie, in the sense that you could honestly tell anyone who asks about your general policy around bluffing and it would not diminish how well your bluffs work. Contrast that with lying, where going around saying "Oh yeah, I tend to lie around 10% of the time" would backfire quite a bit.
In game theory, the point of bluffing is not so much to make money from your bluff directly, but to mask when you are playing a genuinely good hand.
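To make that concrete with a toy calculation (my numbers, not the article's): if you bet a polarized range, the game-theory-optimal fraction of bluffs is whatever makes the caller indifferent between calling and folding.

```python
def optimal_bluff_fraction(pot: float, bet: float) -> float:
    """Fraction of bets that should be bluffs so the caller is
    indifferent: alpha * (pot + bet) - (1 - alpha) * bet = 0."""
    return bet / (pot + 2 * bet)

# With a pot-sized bet, one third of your bets should be bluffs.
print(optimal_bluff_fraction(pot=1.0, bet=1.0))  # 0.333...
```

Tell your opponent that number and it doesn't help them: calling and folding are equally good against you. That's the sense in which a bluff isn't a lie.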
> [...] it's required to play some ranges, sometimes as if they were a different range; [...]
Why the mental gymnastics? Just work out what the optimal play for 'some ranges' is, and then play that. The extra layer of indirection might help human intuition, but I'm not sure the machine needs that dressing up.
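Concretely, something like this (toy numbers, not a real solver's output): the strategy is just a distribution over actions per hand class, and the "bluffs" fall out of the mix with no 'as if' framing needed.

```python
import random

# Toy strategy table for one decision point; probabilities are made up.
STRATEGY = {
    "strong": {"bet": 0.90, "check": 0.10},
    "medium": {"bet": 0.30, "check": 0.70},
    "weak":   {"bet": 0.45, "check": 0.55},  # these bets are the bluffs
}

def play(hand_class: str) -> str:
    dist = STRATEGY[hand_class]
    return random.choices(list(dist), weights=list(dist.values()))[0]
```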
> LLMs produce the most likely response from the input embeddings. [...]
If I wanted an LLM to play poker, I would ask it to suggest probabilities for the available actions, and then sample from those myself, instead of using the LLM's next-token sampler to directly pick the action.
(But I'm not sure that's what the original article is doing.)
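Roughly this kind of loop, where `ask_llm_for_probs` is a hypothetical helper standing in for whatever model call and reply parsing you prefer:

```python
import random

def ask_llm_for_probs(game_state: str) -> dict[str, float]:
    """Hypothetical helper: prompt the model to output a probability
    for each legal action, e.g. {"fold": 0.1, "call": 0.5, "raise": 0.4},
    and parse the reply. Any LLM client would do here."""
    ...

def pick_action(game_state: str) -> str:
    probs = ask_llm_for_probs(game_state)
    actions = list(probs)
    weights = list(probs.values())
    # Sample with an external RNG, rather than letting the model's
    # next-token sampling commit to an action for us.
    return random.choices(actions, weights=weights)[0]
```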
> The problem in poker is that a good amount of the tokens in the sequence are masked and/or controlled by a villain who is actively trying to deceive.
> Human text doesn't have a latent space that's written about enough and thoroughly enough to have poker solved in there.
I agree with both. Though it's still a fun exercise to pit contemporary off-the-shelf LLMs against each other here.
And perhaps add a purpose-built poker bot to the mix as a benchmark. Also try with and without access to an external random sampler (as I suggested above), and with and without the ability to run freshly written Python code.
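Something like this round-robin harness is what I have in mind (all names and the `Player` interface are hypothetical; the poker engine itself is left as a stub):

```python
import itertools

class Player:
    """Hypothetical common interface, so an off-the-shelf LLM wrapper,
    an LLM plus external sampler, and a purpose-built bot all plug in."""
    def __init__(self, name: str):
        self.name = name
    def act(self, game_state: str) -> str:
        raise NotImplementedError

def play_hand(a: Player, b: Player) -> str:
    """Stub for the actual heads-up poker engine; returns the winner's name."""
    raise NotImplementedError

def round_robin(players: list[Player], hands_per_pair: int = 1000) -> dict[str, int]:
    wins = {p.name: 0 for p in players}
    for a, b in itertools.combinations(players, 2):
        for _ in range(hands_per_pair):
            wins[play_hand(a, b)] += 1
    return wins
```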