| 1. | | Show HN: LLM Round‑Trip Translation Benchmark (github.com/lechmazur) |
| 6 points by zone411 3 months ago | past |
|
| 2. | | Show HN: LLM Creative Story‑Writing Benchmark V3 (github.com/lechmazur) |
| 8 points by zone411 3 months ago | past |
|
| 3. | | Show HN: Mapping LLM Style and Range in Flash Fiction (github.com/lechmazur) |
| 7 points by zone411 3 months ago | past |
|
| 4. | | Pact: Head-to-head negotiation benchmark for LLMs (github.com/lechmazur) |
| 6 points by zone411 4 months ago | past |
|
| 5. | | Show HN: Bazaar – a new LLM benchmark for economic reasoning under uncertainty (github.com/lechmazur) |
| 8 points by zone411 5 months ago | past | 1 comment |
|
| 6. | | AI Comes Up with Physics Experiments. But They Work (quantamagazine.org) |
| 4 points by zone411 5 months ago | past |
|
| 7. | | Emergent Price-Fixing by LLM Auction Agents (github.com/lechmazur) |
| 7 points by zone411 5 months ago | past |
|
| 8. | | Public Goods Game Benchmark: Contribute and Punish, a Multi-Agent Benchmark (github.com/lechmazur) |
| 7 points by zone411 9 months ago | past |
|
| 9. | | Elimination Game: Multi-Agent LLM Social Reasoning, Strategy, and Deception (github.com/lechmazur) |
| 5 points by zone411 10 months ago | past |
|
| 10. | | SWE-Lancer: a benchmark of freelance software engineering tasks from Upwork (arxiv.org) |
| 111 points by zone411 10 months ago | past | 74 comments |
|
| 11. | | LLM Hallucination Benchmark: R1, o1, o3-mini, Gemini 2.0 Flash Think Exp 01-21 (github.com/lechmazur) |
| 17 points by zone411 10 months ago | past | 3 comments |
|
| 12. | | Multi-Agent Step Race Benchmark: LLM Collaboration and Deception Under Pressure (github.com/lechmazur) |
| 7 points by zone411 11 months ago | past | 1 comment |
|
| 13. | | Show HN: LLM Thematic Generalization Benchmark (github.com/lechmazur) |
| 6 points by zone411 11 months ago | past |
|
| 14. | | Show HN: LLM Creative Story-Writing Benchmark (github.com/lechmazur) |
| 5 points by zone411 11 months ago | past |
|
| 15. | | Show HN: LLM Divergent Thinking Creativity Benchmark (github.com/lechmazur) |
| 8 points by zone411 11 months ago | past |
|
| 16. | | Show HN: LLM Deceptiveness and Gullibility Benchmark (github.com/lechmazur) |
| 7 points by zone411 on Oct 22, 2024 | past | 1 comment |
|
| 17. | | LLM Confabulation (Hallucination) Leaderboard (github.com/lechmazur) |
| 6 points by zone411 on Oct 10, 2024 | past |
|
| 18. | | O1-preview and o1-mini results on NYT Connections (twitter.com/lechmazur) |
| 2 points by zone411 on Sept 13, 2024 | past | 1 comment |
|
| 19. | | Grok is an AI modeled after the Hitchhiker’s Guide to the Galaxy (twitter.com/xai) |
| 213 points by zone411 on Nov 5, 2023 | past | 228 comments |
|
| 20. | | Can you beat a stochastic parrot? ParrotChess.com (parrotchess.com) |
| 3 points by zone411 on Sept 22, 2023 | past | 4 comments |
|
| 21. | | Generative AI while browsing in Chrome (labs.google.com) |
| 3 points by zone411 on Aug 15, 2023 | past |
|
| 22. | | Statement on AI Risk (safe.ai) |
| 341 points by zone411 on May 30, 2023 | past | 921 comments |
|
| 23. | | Google tells staff it plans to limit publishing AI research (businessinsider.com) |
| 63 points by zone411 on May 5, 2023 | past | 28 comments |
|
| 24. | | 4th Gen Intel Xeon Scalable Sapphire Rapids Leaps Forward (servethehome.com) |
| 2 points by zone411 on Jan 10, 2023 | past | 1 comment |
|
| 25. | | Fast and Furious Movie Titles by 'Claude' from Anthropic AI (twitter.com/jayelmnop) |
| 2 points by zone411 on Jan 9, 2023 | past |
|
| 26. | | SatelliteXplorer (esri.com) |
| 2 points by zone411 on Dec 30, 2022 | past |
|
| 27. | | SBF Arrested by Bahamian Authorities (twitter.com/tier10k) |
| 1308 points by zone411 on Dec 12, 2022 | past | 812 comments |
|
| 28. | | Large Language Models Can Self-Improve (openreview.net) |
| 3 points by zone411 on Oct 2, 2022 | past | 1 comment |
|
| 29. | | America Reached One Million Covid Deaths (nytimes.com) |
| 5 points by zone411 on May 14, 2022 | past |
|
| 30. | | Show HN: Catchy melodies made with a diffusion-based neural net assistant (youtube.com) |
| 38 points by zone411 on May 11, 2022 | past | 14 comments |
|