
The tournament measures cumulative winnings. However, because of the variance in how cards are dealt in poker, those can be far from a player's statistical expectation.

To establish a real winner, you need to play many games:

> As seen in the Claudico match (20), even 80,000 games may not be enough to statistically significantly separate players whose skill differs by a considerable margin [1]

It is possible to reduce the number of required games thanks to variance reduction techniques [1], but I don't think this is what the website does.
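To get a feel for the scale, here is a rough back-of-the-envelope sketch. It assumes per-hand results are independent and that a normal approximation holds, and it uses a simple one-sided z-test; the function name `hands_needed` and the numbers in the usage lines are hypothetical illustrations, not measurements from the tournament or from [1].

```python
import math
from statistics import NormalDist


def hands_needed(edge, std_dev, confidence=0.95):
    """Approximate number of hands before a true per-hand edge stands out.

    edge       -- true difference in winnings per hand (e.g. in big blinds)
    std_dev    -- standard deviation of a single hand's result, same units
    confidence -- one-sided confidence level for the z-test

    Assumes i.i.d. hands and a normal approximation (illustrative only).
    """
    z = NormalDist().inv_cdf(confidence)
    # The mean of n hands has standard error std_dev / sqrt(n);
    # we need edge >= z * std_dev / sqrt(n), so solve for n.
    return math.ceil((z * std_dev / edge) ** 2)


# Hypothetical numbers: a small edge against a large per-hand swing.
baseline = hands_needed(edge=0.05, std_dev=10.0)

# Variance reduction effectively shrinks std_dev; halving it cuts the
# required number of hands to roughly a quarter.
reduced = hands_needed(edge=0.05, std_dev=5.0)

print(baseline, reduced)
```

The required sample grows with the square of `std_dev / edge`, which is why a small skill gap needs so many hands and why variance reduction pays off so much.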

To answer the question - "which 'quality' of the LLMs this tournament then actually measures" - since we can't reliably tell who the winner is, I don't think we can make any specific claims about the LLMs.

However, it could be interesting to analyze the play from the "psychology profile" perspective of the dark triad (psychopaths / Machiavellians / narcissists). Essentially, these personality types have been observed to prefer certain strategies, and this preference can be quantified [2].

[1] DeepStack, https://static1.squarespace.com/static/58a75073e6f2e1c1d5b36...

[2] Generation of Games for Opponent Model Differentiation, https://arxiv.org/pdf/2311.16781


