A Surprisingly Effective Way to Estimate Token Importance in LLM Prompts (watchful.io)
3 points by hobs on Sept 12, 2023 | 1 comment


Simple, and in hindsight, obvious:

1. Run the text through a document embedding model and save the embedding.

2. Remove one token at a time, run each shortened text through the model, and compute the cosine similarity of each new embedding with the original one.

3. Compute importance as a function of the change in cosine similarity.
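
For anyone who wants to try it, here is a rough sketch of the three steps above, not the article's actual code. The names `token_importance` and `toy_embed` are made up for illustration, `toy_embed` is only a deterministic stand-in for a real document embedding model, and scoring importance as 1 minus cosine similarity is just one possible choice for step 3.

```python
import math
import zlib
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def token_importance(tokens: List[str], embed: Callable[[str], Vector]) -> List[float]:
    """Leave-one-out importance: one score per token, higher means more important."""
    # Step 1: embed the full text once and keep the result.
    original = embed(" ".join(tokens))
    scores = []
    for i in range(len(tokens)):
        # Step 2: drop token i, re-embed, and compare with the original embedding.
        ablated = tokens[:i] + tokens[i + 1:]
        similarity = cosine_similarity(original, embed(" ".join(ablated)))
        # Step 3 (assumed scoring function): importance = 1 - cosine similarity.
        scores.append(1.0 - similarity)
    return scores


def toy_embed(text: str, dim: int = 16) -> List[float]:
    """Hypothetical stand-in embedder: hashed bag of words, weighted by word length.

    Swap in a real document embedding model here; this exists only so the
    sketch runs without any dependencies.
    """
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[zlib.crc32(word.encode("utf-8")) % dim] += float(len(word))
    return vec


if __name__ == "__main__":
    prompt = "Summarize the following contract in plain English".split()
    for token, score in zip(prompt, token_importance(prompt, toy_embed)):
        print(f"{token:12s} {score:.4f}")
```

Note that this calls the embedding model once per token plus once for the full text, so the cost grows linearly with prompt length on top of the cost of each embedding call.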

Nice. I like it and expect it will work well in many scenarios.

Also check out https://github.com/glassroom/heinsen_routing . It takes N embeddings and outputs M embeddings (instead of one), and can optionally give you an N×M matrix with credit assignments, without having to remove tokens one by one, which can be prohibitively slow for long texts.



