#opencl


As usual, getting something like cross-platform GPU compute working is a mess because everyone likes to do their own thing and reinvent the wheel.

I would like something that is [modern] macOS and FreeBSD compatible, but doesn't look like that's possible since Apple deprecated OpenCL.

(Also, could Apple have picked a less searchable term for their new GPU framework?)

It's again looking like the best way to be cross platform is to use JS+browser.

Or am I missing some library?

Who is using CL_sRGBA images with #OpenCL, specifically to write to it (cl_khr_srgb_image_writes)?

There is limited hw support for writing to sRGBA images and I'm now curious what even uses that feature.

It was apparently important enough to require support for it for OpenCL 2.0, but... that's not telling me much.

Is it possible to run AMD+Intel+Nvidia #GPU​s in the same PC? Yes! 🖖😋
Got this RDNA3 chonker for free from 11 bit studios contest! It completes my 36GB VRAM RGB SLI abomination setup:
- 🟥 #AMD Radeon RX 7700 XT 12GB
- 🟦 #Intel Arc B580 12GB
- 🟩 #Nvidia Titan Xp 12GB
The drivers all work together in #Linux Ubuntu 24.04.2. Backbone is an ASUS ProArt Z790 with i7-13700K and 64GB, PCIe 4.0 x8/x8 + 3.0 x4 - plenty interconnect bandwidth.
Finally I can develop and test #OpenCL on all major platforms!

I'm liking the class this year. Students are attentive and participating, and the discussion is always productive.

We were discussing the rounding up of the launch grid in #OpenCL to avoid the catastrophic performance drops that come from the inability to divide the “actual” work size by anything smaller than the maximum device local work size, and how to compute the “rounded up” work size.

The idea is this: given the worksize N and the local size L, we have to round N to the smallest multiple of L that is not smaller than N. This effectively means computing D = ceili(N/L) and then using D*L.

There are several ways to compute D, but on the computer, working only with integers and knowing that integer division always rounds down, what is the “best way”?

D = N/L + 1 works well if N is not a multiple of L, but gives us 1 more than the intended result if N *is* a multiple of L. So we want to add the extra 1 only if N is not a multiple. This can be achieved for example with

D = N/L + !!(N % L)

which leverages the fact that !! (double logical negation) turns any non-zero value into 1, leaving zero as zero. So we round *down* (which is what the integer division does) and then add 1 if (and only if) there is a remainder to the division.

This is ugly not so much because of the !!, but because the modulus operation % is slow.

1/n
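
A minimal C sketch of the rounding discussed in the post above. The function names (`div_up_mod`, `div_up_add`, `round_up`) are hypothetical and chosen for illustration. `div_up_mod` is the formula from the post; `div_up_add` is a commonly used alternative that avoids `%` by adding L - 1 before dividing. It is shown here as an assumption about where such a discussion usually leads, not as the continuation of the original thread.

```c
#include <assert.h>
#include <stddef.h>

/* D = N/L + !!(N % L): round down, then add 1 only if there is a remainder. */
static size_t div_up_mod(size_t n, size_t l)
{
    assert(l > 0);
    return n / l + !!(n % l);
}

/* Alternative without the modulus: adding l - 1 first pushes every
 * non-multiple past the next multiple of l, so the truncating division
 * lands on the ceiling. Assumes n + l - 1 does not overflow size_t. */
static size_t div_up_add(size_t n, size_t l)
{
    assert(l > 0);
    return (n + l - 1) / l;
}

/* Smallest multiple of l that is not smaller than n, i.e. D * L. */
static size_t round_up(size_t n, size_t l)
{
    return div_up_add(n, l) * l;
}
```

For example, with N = 1000 and L = 256 both variants give D = 4, so the rounded-up work size is 1024. Because the rounded-up grid launches more work-items than there are elements, the kernel would also need a bounds check against N before touching memory.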

Thought of the day: I should make more use of lifetimes in Rust to express dependencies between API objects in #Rusticl

So far I haven't as using Arc is a good enough solution here. But I'm getting to the point where it's getting in the way.

The main reason I haven't is that API objects are managed by the application, and given that #OpenCL is a C API, there isn't much I can really do if the application destroys objects in a weird order.

So that's kinda annoying.

I've finally patched/enabled #FP16 vector arithmetic support for my #OpenCL-Benchmark on Nvidia #GPU​s that support it with Nvidia's NVVM-7.0-updated drivers. That is Pascal, Volta, Turing, Ampere, Ada, Hopper, Blackwell and future.
Interesting find: Nvidia Ada has cut FP16 vector throughput in half, to only 1:1 FP16:FP32 ratio instead of 2:1. And A100 has 4:1 ratio.
github.com/ProjectPhysX/OpenCL

GitHub: Release OpenCL-Benchmark v1.5 · ProjectPhysX/OpenCL-Benchmark
enabled benchmarking FP16 vector arithmetic on Nvidia Pascal and newer GPUs with Nvidia driver 520 or newer; removed wait() call at the end of the benchmark on Linux ...

I'm gonna fix the most annoying and most weird bug I have atm inside #rusticl : "volatile private" variables getting optimized to registers.

And you might think "why does this even remotely matter?" Turns out the #OpenCL CTS checks with a simple kernel having two constants in volatile private memory how rounding actually works on hardware.

However, we constant fold it all away, so the detection fails, which also means that "fma" tests are failing on Apple M1/M2 in weird corner cases 🙃

#OpenCL has a compiler flag -cl-fp32-correctly-rounded-divide-sqrt. If you don't pass this, then divisions and square roots are incorrectly rounded. Shouldn't this be the other way around? How many other flags do I need to pass in order for arithmetic to be correct?

Passively participating in #Genuary2024 - Day 8 Chaotic System. In 2012/13 I designed an award-winning audioreactive brand identity system for Leeds College Of Music based on the DeJong strange attractor with tens and hundreds of millions of particles per frame. This massive almost 1 year project consisted of a Mac/PC desktop app (written in Clojure, OpenCL & OpenGL) for exploring the attractor, creating presets and scheduling render jobs for super hi-res print assets (which would take hours to render and were the biggest image sizes I _ever_ had to deal with, up to 3x3 meters @ 150 dpi). I also had to develop an entire AWS based ad-hoc render farm and asset & user management system for the school to generate personalized video assets, allowing each student to upload their own music, handle audio FFT analysis and beat detection/mapping (all in Clojure) and to create individual sound-responsive clips for their in-school digital signage system and for sharing on social media... Most key aspects were handled via various old thi.ng libraries (e.g. thi.ng/simplecl for OpenCL interop). The server app also handled transcoding to dozens of video formats (via ffmpeg) and semi-automatic provisioning of EC2 machines for render/transcoding jobs...

An example video is below (music: Heyoka, Blue Towel)

A week ago was the 1st anniversary of this solo instance & more generally of my fulltime move to Mastodon. A good time for a more detailed intro, partially intended as CV thread (pinned to my profile) which I will add to over time (also to compensate the ongoing lack of a proper website)... Always open to consulting offers, commissions and/or suitable remote positions...

Hi, I'm Karsten 👋 - indy software engineer, researcher, #OpenSource author of hundreds of projects (since ~1999), computational/generative artist/designer, landscape photographer, lecturer, outdoor enthusiast, on the ND spectrum. Main interest in transdisciplinary research, tool making, exploring techniques, projects & roles amplifying the creative, educational, expressive and inspirational potential of (personal) computation, code as material, combining this with generative techniques of all forms (quite different to what is now called and implied by "generative AI").

Much of my own practice & philosophy is about #BottomUpDesign, interconnectedness, simplicity and composability as key enablers of emergent effects (also in terms of workflow & tool/system design). Been adopting a round-robin approach to cross-pollinate my work & learning, spending periods going deep into various fields to build up and combine experience in (A-Z order): API design, audio/DSP, baremetal (mainly STM32), computer vision/image processing, compiler/DSL/VM impl, databases/linked data/query engines, data structures impl, dataviz, fabrication (3DP, CNC, knit, lasercut), file formats & protocols (as connective tissue), "fullstack" webdev (front/back/AWS), generative & evolutionary algorithms/art/design/aesthetics/music, geometry/graphics, parsers, renderers, simulation (agents/CFD/particles/physics), shaders, typography, UI/UX/IxD...

Since 2018 my main endeavor has been thi.ng/umbrella, a "jurassic" (as it's been called) monorepo of ~185 code libraries, addressing many of the above topics (plus ~150 examples to illustrate usage). More generally, for the past decade my OSS work has been focused on #TypeScript, #C, #Zig, #WebAssembly, #Clojure, #ClojureScript, #GLSL, #OpenCL, #Forth, #Houdini/#VEX. Earlier on, mainly Java (~15 years, since 1996).

Formative years in the deep end of the #Atari 8bit demoscene (Chip Special Software) & game dev (eg. The Brundles, 1993), B&W dark room lab (since age 10), music production/studio (from 1993-2003), studied media informatics, moved to London initially as web dev, game dev (Shockwave 3D, ActionScript), interaction designer, information architect. Branched out, more varied clients/roles/community for my growing collection of computational design tools, which I've been continuously expanding/updating for the past 20+ years, and which have been the backbone of 99% of my work since ~2006 (and which helped countless artists/designers/students/studios/startups). Creator of thi.ng (since 2011), toxiclibs (2006-2013), both large-scale, multi-faceted library collections. Early contributor to Processing (2003-2005, pieces of core graphics API).

Worked on dozens of interactive installations/exhibitions, public spaces & mediafacades (own projects and many collabs, several award winning), large-scale print on-demand projects (>250k unique outputs), was instrumental in creating some of the first generative brand identity systems (incl. cloud infrastructure & asset management pipelines), collaborated with architects, artists, agencies, hardware engineers, had my work shown at major galleries/museums worldwide, taught 60+ workshops at universities, institutions and companies (mainly in EMEA). Was algorithm design lead at Nike's research group for 5 years, working on novel internal design tools, workflows, methods of make, product design (footwear & apparel) and team training. After 23 years in London, my family decided on a lifestyle change and so currently based in the beautiful Allgäu region in Southern Germany.

Do I have anyone in my wider network with skills in programming CUDA, SYCL, and OpenCL?

We want to determine feasibility of migrating CUDA-only code to SYCL (via SYCLomatic?): OpenCV feature detection/extraction modules (SIFT, HAGOG, ORB, AKAZE).

The intent is to upstream all feasible work.

This, hopefully, should stand to benefit everyone instead of being limited to NVIDIA.

Currently in info gathering/people connecting phase, not yet funded & ready to go.

#CUDA #SYCL #OpenCL

Got #TornadoVM installed and running on my local Linux laptop, a #Lenovo 14s Thinkpad with a 10th generation Intel® Core™ CPU and an integrated Intel® UHD graphics card.

Took a bit of futzing around with runtime dependencies, but the required packages (for Ubuntu Jammy) were:

intel-basekit
intel-oneapi-runtime-opencl
intel-opencl-icd

I also had to explicitly set

LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/intel/oneapi/compiler/2023.2.1/linux/compiler/lib/intel64_lin

#Java #Graal #JIT

As I've been updating the build files for my various #ziglang projects & templates, also learned that quite a few of them have to be overhauled/refactored due to syntax changes and a more strict compiler. One example is this #WASM #voxel #renderer from 1.5 years ago which doesn't build anymore without major code updates, but the old build still works:

demo.thi.ng/zig/voxel-trace/

Reload for random views. Press `x` to export current frame. The renderer is incremental (never finishes) and slowly reduces pixel size from 8 down to 1. It would be much faster, but I had some ideas for creating a more stylistic output and in this current state it only renders a fixed area per frame...

The 2-bit 512^3 voxel model was generated with a custom fork of @R4_Unit's voxel automata... 🥰

Other renders & process on my old Twitter:

twitter.com/search?q=from%3A%4

Ps. This renderer is heavily using this #SIMD vector library:

github.com/thi-ng/zig-thing/tr

...and is a rewrite of my 2013 hybrid #OpenCL #Clojure voxel renderer:

github.com/thi-ng/raymarchcl