#openblas

Dr. Robert M Flight:

At what point does setting more threads for OpenBLAS actually help?

For example, I have an SVD operation in #RStats on largish matrices (6000 rows and 6000 columns; doing an inverse), where default BLAS on Ubuntu is ~ 20 min.

OpenBLAS with 1 or 4 threads takes ~ 2 min (10X speedup!). With 4 threads, I can see the additional usage of cores, but overall time is the same as 1 thread.

Is there some magic size where using more threads for SVD will actually help?

#MultiThreading #OpenBLAS
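
One hedged way to read the timings quoted above is to turn them into an estimated serial fraction (the Karp-Flatt metric) and plug that into Amdahl's law. The sketch below is plain arithmetic, not an OpenBLAS or R API: the function names are made up for illustration, and the 120-second figures are an assumption standing in for the post's "~ 2 min" at both 1 and 4 threads.

```python
def measured_speedup(t_one_thread, t_n_threads):
    """Speedup of the n-thread run relative to the 1-thread run."""
    return t_one_thread / t_n_threads


def serial_fraction(speedup, n_threads):
    """Karp-Flatt estimate of the fraction of the run that did not scale."""
    if n_threads < 2:
        raise ValueError("need at least 2 threads to estimate a serial fraction")
    return (1.0 / speedup - 1.0 / n_threads) / (1.0 - 1.0 / n_threads)


def amdahl_speedup(serial_frac, n_threads):
    """Speedup predicted by Amdahl's law for a given serial fraction."""
    return 1.0 / (serial_frac + (1.0 - serial_frac) / n_threads)


if __name__ == "__main__":
    # Assumed values: "~ 2 min" with 1 thread and with 4 threads.
    t1, t4 = 120.0, 120.0
    s = measured_speedup(t1, t4)
    e = serial_fraction(s, 4)
    print(f"speedup on 4 threads: {s:.2f}")
    print(f"estimated serial fraction: {e:.2f}")
    # Under this estimate, adding threads is not predicted to help.
    for n in (8, 16):
        print(f"predicted speedup on {n} threads: {amdahl_speedup(e, n):.2f}")
```

With identical wall times the estimated serial fraction comes out at 1.0, which suggests (under these assumptions) that the part of this SVD run that dominates the time is not benefiting from extra threads at this size. Repeating the measurement at several matrix sizes and feeding the timings through the same functions would show whether that fraction drops as the problem grows.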

FCLC:

Time for an #introduction!
I'm a young Canuck with interests/experience in #HPC, #Linux, #BLAS, #SYCL, #C, #AVX512, #Rust, heterogeneous compute & other such things.

Currently my personal projects are bringing #FP16 to the #OpenBLAS library, working to standardize what Complex domain BLAS FP16 kernels/implementations should look like, and making sure #SYCL is available everywhere.

I also write every now and again.
Here's the tale of AVX512 FP16 on Alder Lake:
https://gist.github.com/FCLC/56e4b3f4a4d98cfd274d1430fabb9458