
When analyzing the performance of your model, it's very helpful to look at a roofline plot [1]. The roofline plot shows attainable floating-point performance as a function of arithmetic intensity (FLOPs per byte of memory traffic), with the various ops in your model plotted against it. The plot has two regimes: a memory-bound regime on the left and a compute-bound regime on the right. This can help identify memory-bound ops that are taking a significant fraction of run time.

[1]: https://en.wikipedia.org/wiki/Roofline_model
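
To make the two regimes concrete, here is a minimal sketch of the roofline bound itself: attainable performance is the smaller of the compute roof and the memory roof (bandwidth times arithmetic intensity). The hardware numbers and function names below are made up for illustration and do not describe any particular GPU.

```python
def attainable_flops(intensity, peak_flops, mem_bandwidth):
    """Roofline bound in FLOP/s for an op with the given arithmetic
    intensity (FLOPs per byte moved to/from memory)."""
    return min(peak_flops, mem_bandwidth * intensity)


def ridge_point(peak_flops, mem_bandwidth):
    """Arithmetic intensity at which the memory roof meets the compute roof."""
    return peak_flops / mem_bandwidth


# Hypothetical hardware, chosen only to give round numbers.
PEAK_FLOPS = 10e12      # 10 TFLOP/s (assumed)
MEM_BANDWIDTH = 1e12    # 1 TB/s (assumed)

ridge = ridge_point(PEAK_FLOPS, MEM_BANDWIDTH)
for ai in (0.5, 10.0, 100.0):
    bound = attainable_flops(ai, PEAK_FLOPS, MEM_BANDWIDTH)
    regime = "memory-bound" if ai < ridge else "compute-bound"
    print(f"intensity={ai:6.1f} FLOP/byte  bound={bound:.2e} FLOP/s  {regime}")
```

Ops whose intensity falls left of the ridge point are limited by memory bandwidth, so extra compute throughput does not help them.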



Agreed, roofline plots would be quite powerful in this context. From a quick search, seems like the only way to create a roofline plot for your model would be to use Nsight [1]? Would be interested to know if there are any simpler tools, since one of the big benefits of SM efficiency is how easily the metric is accessed.

[1]: https://www.nvidia.com/en-us/on-demand/session/gtcspring21-s...


Depending on the size of your application, you can calculate FLOPs by hand:

https://docs.nersc.gov/tools/performance/roofline/
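
As a rough sketch of what counting by hand looks like, here are two common ops. This assumes each operand is read from memory once and the result is written once (ideal caching), which real kernels may not achieve, so treat the intensities as upper bounds. The function names are hypothetical.

```python
def matmul_counts(m, n, k, bytes_per_elem=4):
    """FLOPs, bytes moved, and arithmetic intensity for C[m,n] = A[m,k] @ B[k,n].

    Assumes A and B are each read once and C is written once (ideal caching).
    """
    flops = 2 * m * n * k  # one multiply and one add per inner-product term
    bytes_moved = (m * k + k * n + m * n) * bytes_per_elem
    return flops, bytes_moved, flops / bytes_moved


def elementwise_add_counts(n, bytes_per_elem=4):
    """Same counts for c = a + b over n elements: two reads, one write."""
    flops = n
    bytes_moved = 3 * n * bytes_per_elem
    return flops, bytes_moved, flops / bytes_moved


for name, counts in [
    ("matmul 1024^3 (fp32)", matmul_counts(1024, 1024, 1024)),
    ("elementwise add, 1M elems (fp32)", elementwise_add_counts(1 << 20)),
]:
    flops, bytes_moved, intensity = counts
    print(f"{name}: {flops} FLOPs, {bytes_moved} bytes, {intensity:.3f} FLOP/byte")
```

Under these assumptions the large matmul lands around 170 FLOP/byte while the elementwise add sits at 1/12 FLOP/byte, which is why elementwise ops tend to show up on the memory-bound side of the plot.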



