
When you talk about "c" or "scalar memory" in the paper, does that refer to a single unit in the vector usually referred to as c?

So in mLSTM, each unit of the vector c is now a matrix (so a 3d tensor)? And we refer to each matrix as a head?

I'm having a bit of trouble understanding this fundamental part.


You mostly got it right. Usually one has many scalar 'c' cells that talk to each other via memory mixing. In the sLSTM, you group them into heads, and each cell talks only to cells within the same head. We referred to scalar cells here because they are the fundamental building block. Many of them can be, and usually are, combined, and vector notation is useful in that case.
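To make the head grouping concrete, here is a minimal sketch of which cells are allowed to mix when scalar cells are split evenly into heads. The function name `head_mixing_mask` and the sizes are made up for illustration; they are not taken from the paper or any reference code.

```python
def head_mixing_mask(num_cells, num_heads):
    """Return mask[i][j] = True when cell j's previous state may feed cell i.

    Assumption for this sketch: cells are split evenly into contiguous heads,
    and memory mixing happens only inside a head.
    """
    if num_cells % num_heads != 0:
        raise ValueError("num_cells must be divisible by num_heads")
    cells_per_head = num_cells // num_heads
    return [
        [i // cells_per_head == j // cells_per_head for j in range(num_cells)]
        for i in range(num_cells)
    ]


# 6 scalar cells grouped into 3 heads of 2 cells each (illustrative sizes).
mask = head_mixing_mask(num_cells=6, num_heads=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

The printed pattern is block-diagonal: each cell sees only the cells in its own head, which is the "talking only to cells within the same head" part.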

For the matrix 'C' state, there are also heads/cells, in the sense that you have several of them, but they don't talk to each other. So yes, you can view that as a 3D tensor. Here, the matrix is the fundamental building block / concept.
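As a rough sketch of the 3D-tensor view, the state below is a list of per-head matrices with shape [num_heads][d][d], and each head is updated as C = f * C + i * (v k^T) using only its own key and value. This is simplified on purpose: the gates are fixed scalars per head, the normalizer state and gate stabilization are left out, and the names `outer` and `mlstm_step` are hypothetical.

```python
def outer(v, k):
    """Outer product v k^T as a list of rows."""
    return [[vi * kj for kj in k] for vi in v]


def mlstm_step(C, keys, values, f_gate, i_gate):
    """One simplified matrix-memory update per head.

    C:            [num_heads][d][d]  (the "3D tensor")
    keys, values: [num_heads][d]
    f_gate, i_gate: one scalar per head (an assumption of this sketch).
    Heads are updated independently; nothing mixes across heads.
    """
    new_C = []
    for C_h, k_h, v_h, f_h, i_h in zip(C, keys, values, f_gate, i_gate):
        update = outer(v_h, k_h)
        new_C.append([
            [f_h * c + i_h * u for c, u in zip(c_row, u_row)]
            for c_row, u_row in zip(C_h, update)
        ])
    return new_C


num_heads, d = 2, 3  # illustrative sizes
C = [[[0.0] * d for _ in range(d)] for _ in range(num_heads)]
keys = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
values = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

C = mlstm_step(C, keys, values, f_gate=[1.0, 1.0], i_gate=[1.0, 1.0])
shape = (len(C), len(C[0]), len(C[0][0]))
print(shape)
```

Each head's matrix depends only on that head's key and value, so the heads stay separate even though they are stored together as one tensor.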

