**Martin Boller** @itisiboller@infosec.exchange · Aug 3

Short write-up on running Hashcat 7 (or older) with OpenCL on CPUs and/or using the Nouveau FOSS driver for NVIDIA cards.

www.infosecworrier.dk/blog/2025/08/opencl/

All the good stuff is from @tychotithonus's original post. The rest is just me standing on his shoulders.

Infosec Worrier · Aug 3 · Hashcat on CPUs and older NVIDIA GPUs · Running OpenCL with the Nouveau drivers

#Hashcat #OpenCL #Nouveau

**John-Mark Gurney** @encthenet@flyovercountry.social · Jun 23

As usual, getting something like cross-platform GPU compute working is a mess because everyone likes to do their own thing and reinvent the wheel.

I would like something that is [modern] macOS and FreeBSD compatible, but doesn't look like that's possible since Apple deprecated OpenCL.

(Also, could Apple have picked a less searchable term for their new GPU framework?)

It's again looking like the best way to be cross platform is to use JS+browser.

Or am I missing some library?

#OpenCL #GPUCompute #FreeBSD

**karolherbst** @karolherbst@chaos.social · Jun 16

Who is using CL_sRGBA images with #OpenCL, specifically to write to it (cl_khr_srgb_image_writes)?

There is limited hw support for writing to sRGBA images and I'm now curious what even uses that feature.

It was apparently important enough to require support for it for OpenCL 2.0, but... that's not telling me much.

**Dr. Moritz Lehmann** @ProjectPhysX@mast.hpc.social · Jun 16 *

Is it possible to run AMD+Intel+Nvidia #GPUs in the same PC? Yes!
Got this RDNA3 chonker for free from 11 bit studios contest! It completes my 36GB VRAM RGB SLI abomination setup:
- #AMD Radeon RX 7700 XT 12GB
- #Intel Arc B580 12GB
- #Nvidia Titan Xp 12GB
The drivers all work together in #Linux Ubuntu 24.04.2. Backbone is an ASUS ProArt Z790 with i7-13700K and 64GB, PCIe 4.0 x8/x8 + 3.0 x4 - plenty interconnect bandwidth.
Finally I can develop and test #OpenCL on all major platforms!

**kandid** @kandid@chaos.social · May 21

A little bit like #fractalFlame
Made with #openFrameworks and #OpenCL

**Giuseppe Bilotta** @giuseppebilotta@fediscience.org · Mar 25 *

I'm liking the class this year. Students are attentive and participating, and the discussion is always productive.

We were discussing the rounding up of the launch grid in #OpenCL to avoid the catastrophic performance drops that come from the inability to divide the “actual” work size by anything smaller than the maximum device local work size, and how to compute the “rounded up” work size.

The idea is this: given the worksize N and the local size L, we have to round N to the smallest multiple of L that is not smaller than N. This effectively means computing D = ceili(N/L) and then using D*L.

There are several ways to compute D, but on the computer, working only with integers and knowing that integer division always rounds down, what is the “best way”?

D = N/L + 1 works well if N is not a multiple of L, but gives us 1 more than the intended result if N *is* a multiple of L. So we want to add the extra 1 only if N is not a multiple. This can be achieved for example with

D = N/L + !!(N % L)

which leverages the fact that !! (double logical negation) turns any non-zero value into 1, leaving zero as zero. So we round *down* (which is what the integer division does) and then add 1 if (and only if) there is a remainder to the division.

This is ugly not so much because of the !!, but because the modulus operation % is slow.

1/n
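A minimal C sketch of the rounding discussed in the post above. The function names (`div_up_mod`, `div_up_add`, `round_up`) are hypothetical and chosen only for illustration. `div_up_mod` is the `N/L + !!(N % L)` form from the post. `div_up_add` is the widely used `(N + L - 1) / L` form, which is not taken from the post; the thread continues beyond this excerpt, so it may or may not be what the author goes on to recommend. Both assume unsigned operands and L > 0, and the second also assumes that N + L - 1 does not overflow.

```c
#include <assert.h>
#include <stddef.h>

/* Ceiling division as written in the post: integer division rounds
 * down, then 1 is added only if there is a remainder.
 * Assumes l > 0. */
static size_t div_up_mod(size_t n, size_t l)
{
    assert(l > 0);
    return n / l + !!(n % l);
}

/* Common alternative that avoids the modulus (an assumption, not from
 * the post): bias the numerator by l - 1 before dividing.
 * Assumes l > 0 and that n + l - 1 does not wrap around. */
static size_t div_up_add(size_t n, size_t l)
{
    assert(l > 0);
    return (n + l - 1) / l;
}

/* Smallest multiple of l that is not smaller than n, i.e. D * L. */
static size_t round_up(size_t n, size_t l)
{
    return div_up_mod(n, l) * l;
}
```

For example, with N = 1000 work-items and L = 256, both division helpers give D = 4, so the rounded-up global work size is 1024; with N = 1024 the result stays at 1024. The kernel then needs a bounds check so that the extra work-items past N do nothing.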

**karolherbst** @karolherbst@chaos.social · Sep 6, 2024

Thought of the day: I should make more use of lifetimes in Rust to express dependencies between API objects in #Rusticl

So far I haven't as using Arc is a good enough solution here. But I'm getting to the point where it's getting in the way.

The main reason I haven't is that API objects are managed by the application, and given that #OpenCL is a C API, there isn't much I can do if the application destroys objects in a weird order.

So that's kinda annoying.

**karolherbst** @karolherbst@chaos.social · Aug 27, 2024

aaaaannnd.. the second #OpenCL CTS submission for rusticl this month

I wonder if I should go for a third or if it's enough for now...

**Dr. Moritz Lehmann** @ProjectPhysX@mast.hpc.social · Aug 18, 2024

I've finally patched/enabled #FP16 vector arithmetic support for my #OpenCL-Benchmark on Nvidia #GPUs that support it with Nvidia's NVVM-7.0-updated drivers. That is Pascal, Volta, Turing, Ampere, Ada, Hopper, Blackwell and future.
Interesting find: Nvidia Ada has cut FP16 vector throughput in half, to only 1:1 FP16:FP32 ratio instead of 2:1. And A100 has 4:1 ratio.
https://github.com/ProjectPhysX/OpenCL-Benchmark/releases/tag/v1.5

GitHub · Release OpenCL-Benchmark v1.5 · ProjectPhysX/OpenCL-Benchmark: enabled benchmarking FP16 vector arithmetic on Nvidia Pascal and newer GPUs with Nvidia driver 520 or newer; removed wait() call at the end of the benchmark on Linux ...

**karolherbst** @karolherbst@chaos.social · Aug 6, 2024

I'm gonna fix the most annoying and most weird bug I have atm inside #rusticl : "volatile private" variables getting optimized to registers.

And you might think "why does this even remotely matter?" Turns out the #OpenCL CTS checks with a simple kernel having two constants in volatile private memory how rounding actually works on hardware.

However, we constant fold it all away, so the detection fails, which also means that "fma" tests are failing on Apple M1/M2 in weird corner cases

**Troels** @athas@freeradical.zone · May 25, 2024

Is there a way to query the GPU L2 cache size (if any) in #OpenCL? Both HIP and CUDA provide this, so the hardware/driver facility exists.

**Troels** @athas@freeradical.zone · May 24, 2024

#OpenCL has a compiler flag -cl-fp32-correctly-rounded-divide-sqrt. If you don't pass this, then divisions and square roots are incorrectly rounded. Shouldn't this be the other way around? How many other flags do I need to pass in order for arithmetic to be correct?

Replied in thread

**Ian Brown** @igb@mastodon.hccp.org · May 5, 2024

@petrillic @igb@books.hccp.org

avoid writing code in hardware-specific "DSLs" like #CUDA and #OpenCL through the magic of JIT in managed runtimes

**karolherbst** @karolherbst@chaos.social · Apr 7, 2024

So, I'll be giving my talk about Rusticl, compute on the Linux desktop and other related topics at IWOCL next week.

Any specific topics you want me to cover?

#iwocl #opencl #rusticl

Continued thread

**Karsten Schmidt** @toxi@mastodon.thi.ng · Jan 9, 2024

Some more screenshots of the LCOM desktop app in action...

#Genuary2024 #GenerativeArt #Vintage

**Karsten Schmidt** @toxi@mastodon.thi.ng · Jan 9, 2024

Passively participating in #Genuary2024, Day 8: Chaotic System. In 2012/13 I designed an award-winning audioreactive brand identity system for Leeds College Of Music based on the DeJong strange attractor with tens and hundreds of millions of particles per frame. This massive almost 1 year project consisted of a Mac/PC desktop app (written in Clojure, OpenCL & OpenGL) for exploring the attractor, creating presets and scheduling render jobs for super hi-res print assets (which would take hours to render and were the biggest image sizes I _ever_ had to deal with, up to 3x3 meters @ 150 dpi). I also had to develop an entire AWS based ad-hoc render farm and asset & user management system for the school to generate personalized video assets, allowing each student to upload their own music, handle audio FFT analysis and beat detection/mapping (all in Clojure) and to create individual sound-responsive clips for their in-school digital signage system and for sharing on social media... Most key aspects were handled via various old thi.ng libraries (e.g. https://thi.ng/simplecl for OpenCL interop). The server app also handled transcoding to dozens of video formats (via ffmpeg) and semi-automatic provisioning of EC2 machines for render/transcoding jobs...

An example video is below (music: Heyoka, Blue Towel)

#GenerativeArt #Vintage #StrangeAttractor

**Karsten Schmidt** @toxi@mastodon.thi.ng · Nov 19, 2023

A week ago was the 1st anniversary of this solo instance & more generally of my fulltime move to Mastodon. A good time for a more detailed intro, partially intended as CV thread (pinned to my profile) which I will add to over time (also to compensate the ongoing lack of a proper website)... Always open to consulting offers, commissions and/or suitable remote positions...

Hi, I'm Karsten, indy software engineer, researcher, #OpenSource author of hundreds of projects (since ~1999), computational/generative artist/designer, landscape photographer, lecturer, outdoor enthusiast, on the ND spectrum. Main interest in transdisciplinary research, tool making, exploring techniques, projects & roles amplifying the creative, educational, expressive and inspirational potential of (personal) computation, code as material, combining this with generative techniques of all forms (quite different to what is now called and implied by "generative AI").

Much of my own practice & philosophy is about #BottomUpDesign, interconnectedness, simplicity and composability as key enablers of emergent effects (also in terms of workflow & tool/system design). Been adopting a round-robin approach to cross-pollinate my work & learning, spending periods going deep into various fields to build up and combine experience in (A-Z order): API design, audio/DSP, baremetal (mainly STM32), computer vision/image processing, compiler/DSL/VM impl, databases/linked data/query engines, data structures impl, dataviz, fabrication (3DP, CNC, knit, lasercut), file formats & protocols (as connective tissue), "fullstack" webdev (front/back/AWS), generative & evolutionary algorithms/art/design/aesthetics/music, geometry/graphics, parsers, renderers, simulation (agents/CFD/particles/physics), shaders, typography, UI/UX/IxD...

Since 2018 my main endeavor has been https://thi.ng/umbrella, a "jurassic" (as it's been called) monorepo of ~185 code libraries, addressing many of the above topics (plus ~150 examples to illustrate usage). More generally, for the past decade my OSS work has been focused on #TypeScript, #C, #Zig, #WebAssembly, #Clojure, #ClojureScript, #GLSL, #OpenCL, #Forth, #Houdini/#VEX. Earlier on, mainly Java (~15 years, since 1996).

Formative years in the deep end of the #Atari 8bit demoscene (Chip Special Software) & game dev (eg. The Brundles, 1993), B&W dark room lab (since age 10), music production/studio (from 1993-2003), studied media informatics, moved to London initially as web dev, game dev (Shockwave 3D, ActionScript), interaction designer, information architect. Branched out, more varied clients/roles/community for my growing collection of computational design tools, which I've been continuously expanding/updating for the past 20+ years, and which have been the backbone of 99% of my work since ~2006 (and which helped countless artists/designers/students/studios/startups). Creator of thi.ng (since 2011), toxiclibs (2006-2013), both large-scale, multi-faceted library collections. Early contributor to Processing (2003-2005, pieces of core graphics API).

Worked on dozens of interactive installations/exhibitions, public spaces & mediafacades (own projects and many collabs, several award winning), large-scale print on-demand projects (>250k unique outputs), was instrumental in creating some of the first generative brand identity systems (incl. cloud infrastructure & asset management pipelines), collaborated with architects, artists, agencies, hardware engineers, had my work shown at major galleries/museums worldwide, taught 60+ workshops at universities, institutions and companies (mainly in EMEA). Was algorithm design lead at Nike's research group for 5 years, working on novel internal design tools, workflows, methods of make, product design (footwear & apparel) and team training. After 23 years in London, my family decided on a lifestyle change and so currently based in the beautiful Allgäu region in Southern Germany.

**Brett Edmond Carlock** @Brett_E_Carlock@mastodon.online · Oct 16, 2023 *

Do I have anyone in my wider network with skills in programming CUDA, SYCL, and OpenCL?

We want to determine feasibility of migrating CUDA-only code to SYCL (via SYCLomatic?): OpenCV feature detection/extraction modules (SIFT, HAGOG, ORB, AKAZE).

The intent is to upstream all feasible work.

This, hopefully, should stand to benefit everyone instead of being limited to NVIDIA.

Currently in info gathering/people connecting phase, not yet funded & ready to go.

#CUDA #SYCL #OpenCL

**Ian Brown** @igb@mastodon.hccp.org · Sep 28, 2023 *

Got #TornadoVM installed and running on my local Linux laptop, a #Lenovo 14s Thinkpad with a 10th generation Intel® Core™ CPU and an integrated Intel® UHD graphics card.

Took a bit of futzing around with runtime dependencies, but the required packages (for Ubuntu Jammy) were:

intel-basekit
intel-oneapi-runtime-opencl
intel-opencl-icd

I also had to explicitly set

LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/intel/oneapi/compiler/2023.2.1/linux/compiler/lib/intel64_lin

#Java #Graal #JIT

**Karsten Schmidt** @toxi@mastodon.thi.ng · Jul 13, 2023 *

As I've been updating the build files for my various #ziglang projects & templates, also learned that quite a few of them have to be overhauled/refactored due to syntax changes and a more strict compiler. One example is this #WASM #voxel #renderer from 1.5 years ago which doesn't build anymore without major code updates, but the old build still works:

https://demo.thi.ng/zig/voxel-trace/

Reload for random views. Press `x` to export current frame. The renderer is incremental (never finishes) and slowly reduces pixel size from 8 down to 1. It would be much faster, but I had some ideas for creating a more stylistic output and in this current state it only renders a fixed area per frame...

The 2-bit 512^3 voxel model was generated with a custom fork of @R4_Unit's voxel automata...

Other renders & process on my old Twitter:

https://twitter.com/search?q=from%3A%40toxi+voxel+ziglang

Ps. This renderer is heavily using this #SIMD vector library:

https://github.com/thi-ng/zig-thing/tree/main/vectors

...and is a rewrite of my 2013 hybrid #OpenCL #Clojure voxel renderer:

https://github.com/thi-ng/raymarchcl

#GenerativeArt #ThingUmbrella
