
As a Texas Hold'em enthusiast, I find some of the hands moronic. I just checked one where Grok wins with A3s because Gemini folds K10 with an Ace and a King on the board, without Grok betting anything. Gemini just folds instead of checking. It's not even GTO, it's just pure hallucination. Meaning: I wouldn't read anything into the fact that Grok leads. These machines are not made to play a game like online poker deterministically and would be CRUSHED in GTO. It would be more interesting to see whether they could play exploitatively.


  > Gemini folds K10 with an Ace and a King on the board, without Grok betting anything. Gemini just folds instead of checking.
It's well known that Gemini has low coding self-esteem. It's hilarious to see that it applies to poker as well.


it's probably trained off my repos then


You're absolutely right! /s


In my experience, their hallucinations when playing poker mostly come from misreading their hand strength in the current state, e.g. thinking they have the nuts when they're actually on a nut draw. They would reason a lot better if you explicitly gave them their hand strength in the prompt.
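
Something like this works as a rough sketch (using the treys hand evaluator, from memory, so treat the exact API as an assumption; the prompt wording is just an illustration):

    from treys import Card, Evaluator

    # Cards loosely based on the hand upthread: Ace and King on board, K-T in hand.
    board = [Card.new('Ah'), Card.new('Kd'), Card.new('7c')]
    hole = [Card.new('Ks'), Card.new('Th')]

    evaluator = Evaluator()
    score = evaluator.evaluate(board, hole)  # lower score = stronger hand
    strength = evaluator.class_to_string(evaluator.get_rank_class(score))  # e.g. "Pair"

    prompt = (
        "Your hole cards: Ks Th. Board: Ah Kd 7c.\n"
        f"Your current made hand: {strength}.\n"
        "Decide: fold, check, call, or raise."
    )

Feeding the evaluator's verdict in like this removes the "do I even have a pair?" step the models keep getting wrong.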


(author of PokerBattle here)

I noticed the same thing and think you're absolutely right. I've thought about adding their current hand / draw, but it was too close to the event to test it properly.


I play PLO and sometimes share hand histories with ChatGPT for fun. It can never successfully parse a starting hand, let alone how it interacts with the board.


> These machines are not made to play games like online poker deterministically

I thought you were supposed to sample from a distribution of decisions to avoid exploitation?


You're correct that theoretically optimal play is entirely statistical. Cepheus provides an approximate solution for heads-up Limit, whereas these LLMs are playing full ring (i.e. 9 players in the same game, not two) and No Limit (i.e. you can pick whatever raise size you like within certain bounds, instead of a fixed raise size). The ideas are the same; it's just that full-ring No Limit is a much more complicated game, and the LLMs are much worse at it.
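
To make "sample from a distribution of decisions" concrete: the mixing step itself is trivial, the hard part is computing the frequencies (the numbers below are made up, not solver output):

    import random

    # Hypothetical mixed strategy for one spot: action -> frequency.
    strategy = {"fold": 0.10, "call": 0.55, "raise_66pct_pot": 0.35}

    actions, weights = zip(*strategy.items())
    action = random.choices(actions, weights=weights, k=1)[0]
    print(action)  # e.g. "call" roughly 55% of the time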


This invites a game where each model runs as several variants with slightly different system prompts. I don't know if they could actually sample from their own output if instructed to, but it would allow iterating on the system prompt to find the best instructions.


You could give it access to a tool call that returns a sample from U[0, 1], or more elaborate tool calls into the Monte Carlo software that humans use. Harnessing, plus providing rules of thumb in context, is going to help a great deal, as we've seen with IMO agents.
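
A minimal sketch of the U[0, 1] idea, with the tool schema written OpenAI-function-style from memory (so the exact shape is an assumption):

    import random

    # Local implementation the harness runs when the model calls the tool.
    def uniform_sample() -> float:
        # The model can then follow its own rule, e.g.
        # "raise if the sample is below 0.35, otherwise call".
        return random.random()

    # Tool definition exposed to the model.
    tools = [{
        "type": "function",
        "function": {
            "name": "uniform_sample",
            "description": "Return a uniform random number in [0, 1) for mixing strategies.",
            "parameters": {"type": "object", "properties": {}},
        },
    }]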


Reminds me of the poker scene in Peep Show.



