A Billion Words: Today's language modeling standard should be higher (googleresearch.blogspot.com)
9 points by vikram360 on May 1, 2014 | 2 comments


The GZ file is only 1.7GB. I imagine a densely packed model would almost fit on a machine with 8GB of RAM, which is surprising.

http://www.statmt.org/lm-benchmark/


Along similar lines, all of the English Wikipedia is < 10GB compressed, and about 45GB uncompressed: http://en.wikipedia.org/wiki/Wikipedia:Database_download#Eng.... That omits all the history (just the current pages), but it's still surprising to me how small it seems now.



