Congratulations on the paper. That's some very interesting work!
But you would want to include sLSTM as well to get the best performance, right? How does the speed compare in that case, specifically when scaling up?
Thank you! I can say that speed is not really a limiting factor at the scales reported in the paper: xLSTM[7:1] is pretty much on par with xLSTM[1:0].
As for the first part: yes, we show that sLSTM is helpful on the toy tasks, and it also gives better sequence extrapolation performance, so including it is worthwhile.
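For readers unfamiliar with the notation, xLSTM[a:b] denotes the ratio of mLSTM to sLSTM blocks in the stack, so xLSTM[7:1] places one sLSTM block per seven mLSTM blocks, while xLSTM[1:0] is a pure mLSTM stack. Below is a minimal, purely illustrative sketch of what such a block layout looks like; the helper function and names are hypothetical and not taken from the official codebase:

```python
def block_pattern(num_blocks: int, mlstm_ratio: int, slstm_ratio: int) -> list[str]:
    """Lay out block types for an xLSTM[a:b] stack: a mLSTM blocks per b sLSTM blocks.

    Illustrative only; not the official implementation.
    """
    period = mlstm_ratio + slstm_ratio
    pattern = []
    for i in range(num_blocks):
        # Place the sLSTM block(s) at the end of each period of (a + b) blocks.
        pattern.append("sLSTM" if (i % period) >= mlstm_ratio else "mLSTM")
    return pattern

print(block_pattern(8, 7, 1))  # xLSTM[7:1]: seven mLSTM blocks followed by one sLSTM block
print(block_pattern(8, 1, 0))  # xLSTM[1:0]: pure mLSTM stack
```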