Hacker News

Why is flash attention something like 5x slower with variable masking than without it? The lack of good masking support almost cancels out the optimizations.


Where are you seeing these benchmarks?



