This is a field I find fascinating. It's generally the research field of Machine Learning Interpretability. The BlackboxNLP workshop is one of the main venues for investigating this and is a very popular academic workshop: https://blackboxnlp.github.io/

One of the most interesting presentations from the workshop's most recent edition is this talk by David Bau, titled "Direct Model Editing and Mechanistic Interpretability". David and his team locate exactly where a piece of information is stored in the model, and edit it. For example, they edit the stored location of the Eiffel Tower to be Rome, so whenever the model generates anything involving that location (e.g., the view from the top of the tower), it actually describes Rome.

Talk: https://www.youtube.com/watch?v=I1ELSZNFeHc

Paper: https://rome.baulab.info/

Follow-up work: https://memit.baulab.info/
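
The core trick, at a very high level, is to treat an MLP layer as a linear key-value memory and apply a rank-one update to its weights so that a chosen key (e.g., the representation of "Eiffel Tower") now maps to a new value. A toy PyTorch sketch of just that update step (this is not the actual ROME code; the real method also uses causal tracing to pick which layer to edit and a key-covariance estimate to limit collateral damage):

    # Toy rank-one "fact edit" on a linear layer, in the spirit of ROME.
    import torch

    d = 64
    W = torch.randn(d, d)        # pretend this is an MLP output projection

    k = torch.randn(d)           # key: representation of the edited subject
    v_new = torch.randn(d)       # value we want the layer to retrieve instead

    # Rank-one update so the edited layer maps k -> v_new exactly, while
    # changing other directions as little as possible (plain least squares
    # here; ROME additionally whitens by an estimated key covariance).
    W_edited = W + torch.outer(v_new - W @ k, k) / (k @ k)

    print(torch.allclose(W_edited @ k, v_new, atol=1e-4))   # True: fact "edited"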

There is also work on "probing" the representation vectors inside the model to investigate what information is encoded at the various layers. One early Transformer explainability paper (BERT Rediscovers the Classical NLP Pipeline, https://arxiv.org/abs/1905.05950) found that "the model represents the steps of the traditional NLP pipeline in an interpretable and localizable way: POS tagging, parsing, NER, semantic roles, then coreference". That is, representations in the earlier layers encode things like whether a token is a verb or a noun, while later layers encode higher-level information. I've made an intro to these probing methods here: https://www.youtube.com/watch?v=HJn-OTNLnoE
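
The basic probing recipe is simple: freeze the model, take hidden states from a given layer, and train a small classifier on top of them to predict a linguistic property; if the probe does well, that property is (at least linearly) decodable at that layer. A minimal sketch with Hugging Face transformers and scikit-learn (the two toy sentences stand in for a real POS-tagged corpus, and there's no train/test split, just to keep it short):

    # Layer-wise probing sketch: can a linear probe read off POS tags from
    # frozen BERT hidden states, and how does that change across layers?
    import torch
    from transformers import AutoTokenizer, AutoModel
    from sklearn.linear_model import LogisticRegression

    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)

    # Toy stand-in for a POS-tagged corpus: (sentence, per-word tags).
    data = [("the cat sat", ["DET", "NOUN", "VERB"]),
            ("a dog ran",   ["DET", "NOUN", "VERB"])]

    def word_vectors(sentence, layer):
        """Hidden state of the first subword of each word at the given layer."""
        enc = tok(sentence.split(), is_split_into_words=True, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**enc).hidden_states[layer][0]        # (seq_len, dim)
        first = [enc.word_ids().index(i) for i in range(len(sentence.split()))]
        return hidden[first]

    for layer in range(model.config.num_hidden_layers + 1):
        X = torch.cat([word_vectors(s, layer) for s, _ in data]).numpy()
        y = [tag for _, tags in data for tag in tags]
        probe = LogisticRegression(max_iter=1000).fit(X, y)
        print(layer, probe.score(X, y))                          # probe accuracy per layer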

A lot of applied work doesn't require interpretability and explainability at the moment, but I suspect the interest will continue to increase.



Thanks, Jay!

I wasn't aware of that BERT explainability paper - will be reading it, and watching your video.

Are there any more recent Transformer Explainability papers that you would recommend - maybe ones that build on this and look at what's going on in later layers?


Additional ones that come to mind now are:

Transformer Feed-Forward Layers Are Key-Value Memories https://arxiv.org/abs/2012.14913

The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns via Spotlights of Attention https://arxiv.org/abs/2202.05798

TransformerLens (Neel Nanda's library for mechanistic interpretability work on GPT-style models) https://github.com/neelnanda-io/TransformerLens
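
The key-value memories framing in that first paper is roughly: in FFN(x) = f(x @ K.T) @ V, each row of K acts as a key that pattern-matches the incoming representation, and the corresponding row of V is a value that gets added back to the residual stream, weighted by how strongly its key fired. A tiny numerical sketch of that reading (toy dimensions, not tied to any particular checkpoint):

    # The feed-forward block read as a key-value memory: each hidden unit i
    # has a key K[i] and a value V[i]; the output is a sum of values weighted
    # by how strongly each key matched the input x.
    import torch
    import torch.nn.functional as F

    d_model, d_ff = 16, 64
    K = torch.randn(d_ff, d_model)     # "keys"   (the usual W_in)
    V = torch.randn(d_ff, d_model)     # "values" (the usual W_out, transposed)

    x = torch.randn(d_model)           # one token's residual-stream vector

    coeffs = F.relu(x @ K.T)           # how much each memory "fires" for x
    ffn_out = coeffs @ V               # weighted sum of value vectors

    # Same computation written as an explicit sum over memories:
    explicit = sum(coeffs[i] * V[i] for i in range(d_ff))
    print(torch.allclose(ffn_out, explicit, atol=1e-4))   # True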


Another piece of the puzzle seems to be transformer "induction heads", where attention heads in consecutive layers work together to provide a mechanism that is believed to be responsible for much of in-context learning. The idea is that earlier instances of a token pattern/sequence in the context are used to predict the continuation of a similar pattern later on.

In the simplest case this is a copying operation: an early occurrence of AB predicts that a later A should be followed by B. In the more general case this becomes A'B' => AB, which seems to be more of an analogy-type relationship.

https://arxiv.org/abs/2209.11895

https://youtu.be/Vea4cfn6TOA

This is still only a low-level, mechanistic type of operation, but it offers at least a glimpse into how transformers operate at inference time.
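
A common way to see this concretely is to feed a model a random token sequence repeated twice and look for heads whose attention, in the second half, goes back to the token right after the current token's first occurrence (i.e., an offset of seq_len - 1 behind the query). A rough sketch using the TransformerLens library linked above (the 0.4 threshold is just an arbitrary cutoff for "looks like an induction head"):

    # Rough induction-head scan: repeat a random token sequence and measure how
    # much each head attends from position t back to position t - (seq_len - 1),
    # i.e. to the token that followed this same token the first time around.
    import torch
    from transformer_lens import HookedTransformer

    model = HookedTransformer.from_pretrained("gpt2")
    seq_len = 50
    rand = torch.randint(1000, 10000, (1, seq_len))
    tokens = torch.cat([rand, rand], dim=1)          # random sequence, repeated once

    _, cache = model.run_with_cache(tokens, remove_batch_dim=True)

    for layer in range(model.cfg.n_layers):
        pattern = cache["pattern", layer]            # (n_heads, query_pos, key_pos)
        diag = pattern.diagonal(offset=-(seq_len - 1), dim1=-2, dim2=-1)
        score = diag[:, 1:].mean(dim=-1)             # queries in the repeated half only
        for head, s in enumerate(score):
            if s > 0.4:                              # arbitrary "looks inductive" cutoff
                print(f"layer {layer} head {head}: induction score {s.item():.2f}")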


That's great - thank you!



