Hacker News

For OpenAI perhaps? Sonnet 3.7 without extended thinking is quite strong. Its SWE-bench scores tie o3.


How do you read those scores? I wanted to see how well 3.7 with thinking did, but I can't even read that table.



