
As a Texas Hold'em enthusiast, I find some of the hands moronic. I just checked one where Grok wins with A3s because Gemini folds K10 with an Ace and a King on the board, without Grok betting anything. Gemini just folds instead of checking. It's not even GTO, it's just pure hallucination. Meaning: I wouldn't read anything into the fact that Grok leads. These machines are not made to play a game like online poker deterministically and would be CRUSHED in GTO. It would be more interesting to see whether they could play exploitatively.


  > Gemini folds K10 with an Ace and a King on the board, without Grok betting anything. Gemini just folds instead of checking.
It's well known that Gemini has low coding self-esteem. It's hilarious to see that it applies to poker as well.


it's probably trained off my repos then


You're absolutely right! /s


In my experience, their hallucinations when playing poker mostly come from misreading their hand strength in the current state, e.g. thinking they have the nuts when they're actually on a nut draw. They would reason a lot better if you explicitly gave them their hand strength in the prompt.
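
Something like this works as a rough sketch (using the treys hand evaluator, from memory, so treat the exact API as an assumption; the prompt wording is just an illustration):

    from treys import Card, Evaluator

    # Cards loosely based on the hand upthread: Ace and King on board, K-T in hand.
    board = [Card.new('Ah'), Card.new('Kd'), Card.new('7c')]
    hole = [Card.new('Ks'), Card.new('Th')]

    evaluator = Evaluator()
    score = evaluator.evaluate(board, hole)  # lower score = stronger hand
    strength = evaluator.class_to_string(evaluator.get_rank_class(score))  # e.g. "Pair"

    prompt = (
        "Your hole cards: Ks Th. Board: Ah Kd 7c.\n"
        f"Your current made hand: {strength}.\n"
        "Decide: fold, check, call, or raise."
    )

Feeding the evaluator's verdict in like this removes the "do I even have a pair?" step the models keep getting wrong.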


(author of PokerBattle here)

I noticed the same thing and think you're absolutely right. I've thought about adding their current hand / draw, but it was too close to the event to test it properly.


I play PLO and sometimes share hand histories with ChatGPT for fun. It can never successfully parse a starting hand, let alone how it interacts with the board.


> These machines are not made to play games like online poker deterministically

I thought you were supposed to sample from a distribution of decisions to avoid exploitation?


You're correct that theoretically optimal play is entirely statistical. Cepheus provides an approximate solution for heads-up Limit, whereas these LLMs are playing full ring (i.e. 9 players in the same game, not two) and No Limit (i.e. you can pick whatever raise size you like within certain bounds, instead of a fixed raise size). The ideas are the same; it's just that full-ring No Limit is a much more complicated game, and the LLMs are much worse at it.
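
To make "sample from a distribution of decisions" concrete: the mixing step itself is trivial, the hard part is computing the frequencies (the numbers below are made up, not solver output):

    import random

    # Hypothetical mixed strategy for one spot: action -> frequency.
    strategy = {"fold": 0.10, "call": 0.55, "raise_66pct_pot": 0.35}

    actions, weights = zip(*strategy.items())
    action = random.choices(actions, weights=weights, k=1)[0]
    print(action)  # e.g. "call" roughly 55% of the time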


This invites a game where each model runs as several variants with slightly different system prompts. I don't know if they could actually sample from their own output if instructed to, but it would allow iterating on the system prompt to find the best instructions.


You could give it access to a tool call that returns a sample from U[0, 1], or more elaborate tool calls into the Monte Carlo software that humans use. Harnessing, plus providing rules of thumb in context, is going to help a great deal, as we've seen with IMO agents.
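
A minimal sketch of the U[0, 1] idea, with the tool schema written OpenAI-function-style from memory (so the exact shape is an assumption):

    import random

    # Local implementation the harness runs when the model calls the tool.
    def uniform_sample() -> float:
        # The model can then follow its own rule, e.g.
        # "raise if the sample is below 0.35, otherwise call".
        return random.random()

    # Tool definition exposed to the model.
    tools = [{
        "type": "function",
        "function": {
            "name": "uniform_sample",
            "description": "Return a uniform random number in [0, 1) for mixing strategies.",
            "parameters": {"type": "object", "properties": {}},
        },
    }]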


Reminds me of the poker scene in Peep Show.



