
The idea is interesting, but I still don’t understand how this is supposed to solve continual learning in practice.

You’ve got a frozen transformer and a second module still trained with SGD, so how exactly does that solve catastrophic forgetting rather than just relocating it into the trainable module?
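
To make my question concrete, here's roughly the setup I'm picturing, as a minimal PyTorch sketch. The layer sizes, the linear head, and the training loop are my assumptions, not anything from the post:

    import torch
    import torch.nn as nn

    # Hypothetical setup: frozen pretrained backbone + small trainable head.
    backbone = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True),
        num_layers=6,
    )
    for p in backbone.parameters():
        p.requires_grad = False  # the transformer itself never updates

    head = nn.Linear(512, 10)  # the "second module", still trained with SGD

    opt = torch.optim.SGD(head.parameters(), lr=1e-2)

    def train_step(x, y):
        with torch.no_grad():      # features come from the frozen backbone
            h = backbone(x).mean(dim=1)
        loss = nn.functional.cross_entropy(head(h), y)
        opt.zero_grad()
        loss.backward()
        opt.step()                 # plain SGD update on the head's weights
        return loss.item()

Nothing in that loop distinguishes old tasks from new ones, so gradient updates to `head` can still overwrite whatever it learned earlier. That's what I mean by relocating the forgetting rather than solving it.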


