
mithametacs on Aug 12, 2024 | on: Tree Attention: Topology-Aware Decoding for Long-C...

This is going to shred when it reaches industry. But yeah, very math heavy for a software engineering paper.

ynniv on Aug 12, 2024

How long can a page of Python take? https://github.com/Zyphra/tree_attention/blob/main/tree_shar...

mithametacs on Aug 12, 2024

You seem to know your stuff.

Will this technique work with existing model weights?

kasmura on Aug 12, 2024

Yes, it is just a way of computing the self-attention in a distributed manner.
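
To illustrate why the weights don't need to change, here is a rough sketch in plain Python. It is not the code from the linked repo. It assumes a single decoding query and a naive split of the keys/values into chunks, and the names `chunk_partial`, `combine`, and `chunked_attention` are made up for this example. Each chunk (think: one device) produces a small partial result, and the partials are merged with an associative operation, so they can be reduced in any order (including a tree) and still give the same answer as ordinary softmax attention.

```python
import math
from functools import reduce


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def full_attention(q, keys, values):
    """Reference: ordinary softmax attention for one query vector."""
    scores = [dot(q, k) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


def chunk_partial(q, keys, values):
    """What one chunk computes locally (hypothetical helper):
    (local max score, sum of exp, unnormalized weighted sum of values)."""
    scores = [dot(q, k) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    dim = len(values[0])
    num = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return m, sum(weights), num


def combine(a, b):
    """Associative merge of two partials, rescaled to their shared max."""
    m = max(a[0], b[0])
    sa, sb = math.exp(a[0] - m), math.exp(b[0] - m)
    denom = a[1] * sa + b[1] * sb
    num = [x * sa + y * sb for x, y in zip(a[2], b[2])]
    return m, denom, num


def chunked_attention(q, keys, values, n_chunks):
    """Split keys/values into chunks, compute partials, then reduce them."""
    size = math.ceil(len(keys) / n_chunks)
    partials = [chunk_partial(q, keys[i:i + size], values[i:i + size])
                for i in range(0, len(keys), size)]
    _, denom, num = reduce(combine, partials)
    return [x / denom for x in num]
```

The query, keys, and values come out of the model's existing projections either way. Only the order in which the softmax reduction is carried out changes, so the output matches up to floating-point rounding.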
