
It should be benchmarked against something like RULER [1].

1: https://github.com/hsiehjackson/RULER (RULER: What’s the Real Context Size of Your Long-Context Language Models)



> To incorporate this, we ask the model to complete a chain of hashes instead (as recently proposed by RULER):

They did mention it, but they didn't provide concrete benchmark results against it.
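For anyone unfamiliar with what "complete a chain of hashes" might look like as a task, here is a rough sketch. It is only an illustration under my own assumptions, not the post's or RULER's actual code: the function names, the `a -> b` line format, and the 8-hex-digit "hashes" are all made up for the example.

```python
import random


def make_hash_chain_prompt(chain_len=4, n_distractors=8, seed=0):
    """Build a toy hash-chain task (hypothetical format, for illustration only).

    Returns (context, question, expected_chain).
    """
    rng = random.Random(seed)

    # Draw unique random 8-hex-digit strings so that no distractor can
    # accidentally link into the real chain.
    total = chain_len + 2 * n_distractors
    hashes, seen = [], set()
    while len(hashes) < total:
        h = f"{rng.getrandbits(32):08x}"
        if h not in seen:
            seen.add(h)
            hashes.append(h)

    chain = hashes[:chain_len]
    rest = hashes[chain_len:]

    # Real links: each hash in the chain points to the next one.
    pairs = list(zip(chain, chain[1:]))
    # Distractor links: unrelated pairs that lead nowhere.
    pairs += [(rest[i], rest[i + 1]) for i in range(0, len(rest), 2)]

    # Shuffle so the chain cannot be read off in order.
    rng.shuffle(pairs)
    context = "\n".join(f"{a} -> {b}" for a, b in pairs)
    question = f"Starting from {chain[0]}, list every hash reached by following the arrows."
    return context, question, chain


def follow_chain(context, start):
    """Reference solver: follow the arrows from `start` until there is no next link."""
    mapping = dict(line.split(" -> ") for line in context.splitlines())
    out = [start]
    # The length guard protects against cycles in hand-written contexts.
    while out[-1] in mapping and len(out) <= len(mapping):
        out.append(mapping[out[-1]])
    return out
```

The model would get `context` plus `question`, and its answer would be scored against `expected_chain`. Because the strings are random, there is nothing semantic to latch onto, which as far as I understand is the point of this kind of test.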



