
Right. There should really be a vanilla Transformer baseline.

On recurrence: the idea has been around for a while, e.g. Universal Transformers: https://arxiv.org/abs/1807.03819

There are reasons why it hasn't really been picked up at scale, even though the method tends to do well on synthetic tasks.
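
For anyone unsure what "recurrence" means here, a rough sketch of the contrast with a vanilla baseline: the baseline stacks N blocks with separate weights, while the recurrent variant applies one shared block N times. Everything below is a toy assumption for illustration (the names make_block, apply_block, vanilla_forward and recurrent_forward are made up, and the "block" is a single weight matrix with a residual tanh, not real attention). The model in the linked paper has more to it; this only shows the weight-sharing idea.

```python
import math
import random


def make_block(dim, rng):
    """Toy stand-in for a Transformer block: one dim x dim weight matrix."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)]


def apply_block(weights, x):
    """Residual update x + tanh(W x). A real block would use attention + MLP."""
    return [
        xi + math.tanh(sum(w * xj for w, xj in zip(row, x)))
        for xi, row in zip(x, weights)
    ]


def vanilla_forward(blocks, x):
    """Vanilla baseline: each layer has its own weights."""
    for weights in blocks:
        x = apply_block(weights, x)
    return x


def recurrent_forward(block, x, steps):
    """Recurrent variant: the same weights are reused at every step."""
    for _ in range(steps):
        x = apply_block(block, x)
    return x


def param_count(blocks):
    """Number of scalar weights across a list of blocks."""
    return sum(len(row) for weights in blocks for row in weights)


rng = random.Random(0)
dim, depth = 4, 6
x0 = [1.0, -0.5, 0.25, 0.0]

vanilla_blocks = [make_block(dim, rng) for _ in range(depth)]
shared_block = make_block(dim, rng)

y_vanilla = vanilla_forward(vanilla_blocks, x0)
y_recurrent = recurrent_forward(shared_block, x0, depth)

print("vanilla params:  ", param_count(vanilla_blocks))
print("recurrent params:", param_count([shared_block]))
```

Same compute depth in both cases, but the recurrent version has 1/depth of the block parameters, and the number of steps can be changed at inference time without adding weights.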
