Many of the algorithms built on BLAS are not easily parallelized. For example, QR factorization is an inherently sequential algorithm. Optimizing BLAS performance comes mainly from restructuring the sequential algorithm to operate on larger blocks, so that memory is accessed efficiently. As Jim Demmel is fond of saying, floating point operations are cheap; memory movement is expensive.
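The blocking idea can be sketched with a toy blocked matrix multiply in Python/NumPy. This is an illustration of the technique, not how an actual BLAS is written (real implementations are carefully tuned C or assembly); the function name and block size below are illustrative choices.

```python
import numpy as np

def blocked_matmul(A, B, block=64):
    """Compute A @ B one block at a time.

    Each inner update touches only three small submatrices, so the
    operands can stay resident in cache: the same data is reused
    many times per load, trading cheap flops for less memory traffic.
    (Hypothetical sketch; 'block' would be tuned to the cache size.)
    """
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((n, m))
    for i in range(0, n, block):
        for j in range(0, m, block):
            for p in range(0, k, block):
                # Accumulate the (i, j) block of C from one block
                # of A and one block of B.
                C[i:i+block, j:j+block] += (
                    A[i:i+block, p:p+block] @ B[p:p+block, j:j+block]
                )
    return C
```

The loop order and block size do not change the arithmetic at all; they only change the order in which memory is touched, which is exactly where the performance lives.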