shakedown.social is one of the many independent Mastodon servers you can use to participate in the fediverse.
A community for live music fans with roots in the jam scene. Shakedown Social is run by a team of volunteers (led by @clifff and @sethadam1) and funded by donations.

Administered by:

Server stats:

292
active users

#multithreading

0 posts0 participants0 posts today

#FluidX3D #CFD v3.2 is out! I've implemented the much requested #GPU summation for object force/torque; it's ~20x faster than #CPU #multithreading. 🖖😋
Horizontal sum in #OpenCL was a nice exercise - first local memory reduction and then hardware-supported atomic floating-point add in VRAM, in a single-stage kernel. Hammering atomics isn't too bad as each of the ~10-340 workgroups dispatched at a time does only a single atomic add.
Also improved volumetric #raytracing!
github.com/ProjectPhysX/FluidX

Remember when I mentioned we had ported our #fire propagation #cellularAutomaton from #Python to #Julia, gaining performance and the ability to parallelize more easily and efficiently?

A couple of days ago we had to run another big batch of simulations and while things progressed well at the beginning, we saw the parallel threads apparently hanging one by one until the whole process sat there doing who know what.

Our initial suspicion was that we had come across some weird #JuliaLang issue with #multithreading, which seemed to be confirmed by some posts we found on the Julia forums. We tried the workarounds suggested there, to no avail. We tried a different number of threads, and this led to the hang occurring after a different percent completion. We tried restarting the simulations skipping the ones already done. It always got stuck at the same place (for the same number of threads).

So, what was the problem?

1/n

Multithreaded CLI developers: let your users configure the number of threads.

Entire classes of use cases are hiding inside that will make your life easier as a dev -- and threads=1 is usually not hard to add.

One example: if your multithreaded tool works significantly faster on a single file when I force your tool to just use a single thread and parallelize it with parallel --pipepart --block instead, then either:

  1. you might decide to develop sharding the I/O of the physical file yourself, or

  2. you might consciously decide to not develop it, and leave that complexity to parallel (which is fine!)

But if your tool has no threads=N option, I have no workaround.

Configurable thread count lets me optimize in the meantime (or instead).

Leslie Lamport, of LaTeX fame, is a very accomplished mathematician and computer scientist with a Turing award for his work on “fundamental contributions to the theory and
practice of distributed and concurrent systems”. He just published a draft of his new book:

"A science of concurrent programs"

lamport.azurewebsites.net/tla/

True to his pedagogic approach to everything he does, "The book assumes only that you know the math one learns before entering a university." Even the appendices are fantastic. Can only wish I'll remain this lucid at his 82 years old.

At what point does setting more threads for OpenBLAS actually help?

For example, I have an SVD operation in #RStats on largish matrices (6000 rows and 6000 columns; doing an inverse), where default BLAS on Ubuntu is ~ 20 min.

OpenBLAS with 1 or 4 threads takes ~ 2 min (10X speedup!). With 4 threads, I can see the additional usage of cores, but overall time is the same as 1 thread.

Is there some magic size where using more threads for SVD will actually help?