1. Run the text through a document embedding model and save the embedding.
2. Remove one token at a time, run the text through the model, and compute the cosine similarity of each new embedding with the original one.
3. Compute importance as a function of the change in cosine similarity.
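For concreteness, here is a minimal sketch of those three steps. Everything named here is an assumption for illustration: `toy_embed` is a hypothetical bag-of-words stand-in for a real document embedding model, and `1 - cosine similarity` is just one possible choice for the importance function in step 3.

```python
import hashlib
import math
from typing import Callable, List, Sequence

Vector = List[float]


def toy_embed(tokens: Sequence[str], dim: int = 32) -> Vector:
    """Hypothetical stand-in for a real embedding model (assumption).

    Sums one deterministic hash-derived vector per token, so it ignores
    word order. Swap in a real model to get meaningful scores.
    """
    vec = [0.0] * dim
    for tok in tokens:
        digest = hashlib.sha256(tok.encode("utf-8")).digest()
        for i in range(dim):
            # Map each digest byte to a value in [-1, 1).
            vec[i] += digest[i % len(digest)] / 128.0 - 1.0
    return vec


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen here: an all-zero vector is similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)


def token_importance(
    tokens: Sequence[str],
    embed: Callable[[Sequence[str]], Vector] = toy_embed,
) -> List[float]:
    # Step 1: embed the full text once and keep the result.
    original = embed(tokens)
    scores = []
    for i in range(len(tokens)):
        # Step 2: drop token i, re-embed, compare with the original.
        ablated = list(tokens[:i]) + list(tokens[i + 1:])
        similarity = cosine_similarity(original, embed(ablated))
        # Step 3: importance as the drop in similarity (one possible choice).
        scores.append(1.0 - similarity)
    return scores


if __name__ == "__main__":
    tokens = "the cat sat on the mat".split()
    for token, score in zip(tokens, token_importance(tokens)):
        print(f"{token:>5}  {score:.4f}")
```

Note that this needs one model call per token plus one for the full text, so the cost grows linearly with the number of tokens on top of the per-call cost.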
Nice. I like it and expect it will work well in many scenarios.
Also check out https://github.com/glassroom/heinsen_routing . It takes N embeddings and outputs M embeddings (instead of one), and can optionally give you an N×M matrix with credit assignments, without having to remove tokens one by one, which can be prohibitively slow for long texts.