Shakedown Social

Dr. Robert M Flight
At what point does setting more threads for OpenBLAS actually help? For example, I have an SVD operation in #RStats on largish matrices (6000 rows and 6000 columns; doing an inverse), where default BLAS on Ubuntu takes ~20 min. OpenBLAS with 1 or 4 threads takes ~2 min (10X speedup!). With 4 threads, I can see the additional usage of cores, but overall time is the same as with 1 thread. Is there some magic size where using more threads for SVD will actually help? #MultiThreading #OpenBLAS
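One rough way to read the timings in the post above is Amdahl's law, which relates the speedup from extra threads to the fraction of the run that actually executes in parallel. The sketch below is only an illustration under that simplified model: the function names are hypothetical, the only measured inputs are the ~2 min timings quoted in the post, and the model ignores threading overhead and memory bandwidth, so it is not a diagnosis of any particular OpenBLAS build.

```python
def amdahl_speedup(parallel_fraction, threads):
    """Ideal speedup under Amdahl's law for a given parallel fraction."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / threads)


def implied_parallel_fraction(measured_speedup, threads):
    """Invert Amdahl's law: the parallel fraction implied by a measured speedup."""
    return (1.0 - 1.0 / measured_speedup) / (1.0 - 1.0 / threads)


if __name__ == "__main__":
    # Timings quoted in the post: ~2 min with 1 thread, ~2 min with 4 threads.
    minutes_1_thread = 2.0
    minutes_4_threads = 2.0
    measured = minutes_1_thread / minutes_4_threads
    p = implied_parallel_fraction(measured, threads=4)
    print(f"measured speedup: {measured:.2f}, implied parallel fraction: {p:.2f}")

    # Hypothetical parallel fractions, to show how much 4 threads could help.
    for hypothetical_p in (0.5, 0.9, 0.99):
        s = amdahl_speedup(hypothetical_p, threads=4)
        print(f"parallel fraction {hypothetical_p:.2f} -> speedup {s:.2f}x")
```

Under this model, identical 1-thread and 4-thread timings imply that almost none of the wall-clock time is spent in code that scales with the thread count, which would suggest timing the individual steps of the SVD rather than looking for a larger matrix size.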

FCLC
Time for an #introduction! I'm a young Canuck with interests/experience in #HPC, #Linux, #BLAS, #SYCL, #C, #AVX512, #Rust, heterogeneous compute & other such things. Currently my personal projects are bringing #FP16 to the #OpenBLAS library, working to standardize what Complex domain BLAS FP16 kernels/implementations should look like, and making sure #SYCL is available everywhere. I also write every now and again.
Here's the tale of AVX512 FP16 on Alderlake: https://gist.github.com/FCLC/56e4b3f4a4d98cfd274d1430fabb9458

#openblas