I’m new to this stuff, but as I understand it, the “Attention Is All You Need” paper reported that learned positional embeddings performed nearly identically to the fixed sinusoidal ones for their language/translation models, while later vision transformer papers found that trainable position embeddings performed better.
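
For anyone curious what the two options look like in code, here’s a minimal PyTorch sketch (the sizes and names are illustrative, not taken from either paper):

    import math
    import torch
    import torch.nn as nn

    def sinusoidal_encoding(max_len: int, d_model: int) -> torch.Tensor:
        # Fixed encoding from "Attention Is All You Need":
        # PE(pos, 2i) = sin(pos / 10000^(2i/d_model)), PE(pos, 2i+1) = cos(...)
        pos = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)  # (max_len, 1)
        div = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                        * (-math.log(10000.0) / d_model))              # (d_model/2,)
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(pos * div)  # even dimensions
        pe[:, 1::2] = torch.cos(pos * div)  # odd dimensions
        return pe  # a constant buffer, never updated by the optimizer

    # Learned alternative (the ViT-style approach): the table of position
    # vectors is just a trainable parameter updated by backprop.
    # Sizes here follow ViT-Base: 196 patch tokens + 1 [CLS] token, width 768.
    learned_pe = nn.Parameter(torch.zeros(1, 197, 768))

Either way the result gets added to the token embeddings before the first attention layer; the only difference is whether the optimizer is allowed to change it.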

