
#duckdb

Olivier Leroy:
My talk at @useR_conf is here: https://defuneste.codeberg.page/useR_2025/

tl;dr: I think storing "big" data as parquet files in S3, accessed with DuckDB and wrapped in an R package, is a nice way to save some of your sanity.

Now that we know that DuckDB is great, let's start showing how R can take it into production! 😉

Side notes: loved using {litedown} and Codeberg for the prez. Mermaid.js, you are also great, but I am not ready!

#Rstats #DuckDB
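A minimal sketch of that parquet-on-S3 pattern in plain DuckDB SQL (the bucket, columns, and credentials below are placeholders, not from the talk):

```sql
INSTALL httpfs;
LOAD httpfs;

-- register S3 credentials (values are placeholders)
CREATE SECRET (
    TYPE s3,
    KEY_ID 'AKIA...',
    SECRET '...',
    REGION 'us-east-1'
);

-- query the parquet data in place on S3: no download, no load step
SELECT station, avg(temp_c) AS mean_temp
FROM read_parquet('s3://my-bucket/weather/*.parquet')
GROUP BY station;
```

An R package, as the talk suggests, would then just wrap queries like this behind friendlier functions.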
Ryan Kennedy:
I've always known that the #DuckDB appender interface was the way to go for bulk loading data. But today I had reason to write a #Golang benchmark to see just how much faster it is, and discovered it's at least 250x faster (on my laptop) at inserting a bigint into a table.

I tested both in-memory and on-disk, as well as testing INSERT with auto-commit and with batched commits at various batch sizes.

https://gist.github.com/rkennedy-argus/9e9b2a9fe79d7b098ff40bfb4ffc0384

I suppose I should test INSERTs with prepared statements, too. But I doubt they'll put much of a dent in that difference.
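The appender is a client-API feature (Go, C, etc.), so it can't be shown in SQL itself, but the auto-commit vs. batched-commit comparison from the benchmark can be; a sketch with a hypothetical table:

```sql
CREATE TABLE nums (n BIGINT);

-- auto-commit: every INSERT is its own transaction (slowest path)
INSERT INTO nums VALUES (1);
INSERT INTO nums VALUES (2);

-- batched commits: many INSERTs amortized over one transaction
BEGIN TRANSACTION;
INSERT INTO nums VALUES (3);
INSERT INTO nums VALUES (4);
-- ... thousands more rows ...
COMMIT;
```

The appender wins on top of this by also skipping per-row SQL parsing and binding.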
Ryan Kennedy:
@christoffel66 Exactly: the "highest" available in the list provided. So far the ORDER BY with list_position seems to be the clearest winner in terms of readability and not repeating itself.

This is using #DuckDB.
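A minimal sketch of that list_position pattern, with a hypothetical table and preference list:

```sql
-- pick the "highest" available format, per the preference list
SELECT *
FROM available_formats
ORDER BY list_position(['flac', 'alac', 'mp3-320', 'mp3-128'], format)
LIMIT 1;
```

Values not in the list get NULL from list_position, and DuckDB sorts NULLs last by default, so unknown formats automatically lose.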
boB Rudis 🇺🇦 🇬🇱 🇨🇦:
I track @stratosphere's posts, and their bot has a daily top-10 sketchy IPs list. My 🧠 kept 👀 lots of "*.100" IPs, and I was curious how frequently they showed up.

Went back 200 posts with GH:McKael/madonctl, using both R and DuckDB.

Def block these.

— #DuckDB: https://ray.so/SdMcBZa
— #RStats: https://ray.so/naTBBMS
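The counting side of such an analysis is a short DuckDB query; a hypothetical sketch, assuming the posts have already been parsed into a table of IPs:

```sql
-- how often does each *.100 address show up in the daily top-10 lists?
SELECT ip, count(*) AS sightings
FROM sketchy_ips
WHERE ip LIKE '%.100'
GROUP BY ip
ORDER BY sightings DESC;
```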
Data Quine:
Discovering DuckDB Use Cases via GitHub - Petrica Leuca

"TL;DR: In this post, we use the GitHub API to find repositories that mention DuckDB, then use DuckDB itself to parse and query the results efficiently with SQL."

I think a lot of projects/organisations would like some of the techniques shown in this post for finding out who else is using their code on GitHub, and for what purpose.

https://duckdb.org/2025/06/27/discovering-w-github

#OpenSource #GitHub #DuckDB
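The general technique from the post (fetch from the GitHub API, let DuckDB parse the JSON) looks roughly like this; a hedged sketch, not the article's actual code:

```sql
INSTALL httpfs;
LOAD httpfs;

-- fetch one page of (unauthenticated, rate-limited) search results
-- and unnest the repository records
SELECT repo.full_name, repo.stargazers_count
FROM (
    SELECT unnest(items) AS repo
    FROM read_json_auto('https://api.github.com/search/repositories?q=duckdb')
)
ORDER BY repo.stargazers_count DESC
LIMIT 10;
```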
aerique:
Great talk by Hannes Mühleisen of #DuckDB about tables being a technology fundamental to civilization, and about not dismissing databases, SQL, and ACID just because some implementations are getting long in the tooth.

DuckDB sounds awesome, and I know @bert_hubert is a big fan.

#JoyOfCoding #JoyOfCoding2025
hrbrmstr's Daily Drop:
Drop #669 (2025-06-23): Monday Morning (Barely) Grab Bag

Rube Goldberg X-traction Pipeline; fplot; Color Everything in CSS

Something for (hopefully) everyone as we start off this brutally hot (in many parts of the northern hemisphere) terminal week of June.

Stay safe out there.

TL;DR

(This is an LLM/GPT-generated summary of today's Drop using Ollama + Qwen 3 and a custom prompt.)

- A Rube Goldberg-inspired data pipeline archives X posts into a DuckDB database, using XCancel, Inoreader, and a DuckDB script for automation (https://en.wikipedia.org/wiki/Rube_Goldberg)
- The {fplot} R package automates the creation of distribution plots by detecting data types and selecting appropriate visualizations, with options for global relabeling of variables (https://lrberge.github.io/fplot/)
- The CSS-Tricks article "Color Everything in CSS" provides an in-depth look at color spaces, models, and gamuts in modern web development, offering a comprehensive guide to advanced CSS color techniques (https://css-tricks.com/color-everything-in-css/)

Rube Goldberg X-traction Pipeline

I don't see many mentions of Rube Goldberg (https://en.wikipedia.org/wiki/Rube_Goldberg) in pop-culture settings anymore, which is a shame, since I used to enjoy poring over them in my younger days. Perhaps the reason for the lack of mentions is that many data pipelines have much in common with those complex, over-"engineered" contraptions.

Case in point for a recent "need" of mine: I wanted a way to store posts from users on X in a DuckDB database, for archival and research purposes. I already use XCancel's (https://xcancel.com) ability to generate an RSS feed for an account/search, which I yank into Inoreader for the archival part (the section header shows the XCancel-generated RSS feed for the White House's other, even more MAGA, propaganda account).

Inoreader's API is…not great. It can most certainly be machinated (I have an R package with the function I need in it), but I really wanted a solution that let me just use DuckDB for all the work.

Then I remembered: if you put feeds in Inoreader folders, you can turn a folder into a JSON feed that gets updated every ~30 minutes or so. The one shown here is for a series of feeds related to what's going on in the Middle East right now.

With that JSON URL in hand, it's as basic as:

```bash
#!/usr/bin/env bash

# for cache busting
epoch=$(date +%s)

duckdb articles.ddb <<EOQ
LOAD json;
INSTALL shellfs FROM community;
LOAD shellfs;

CREATE TABLE IF NOT EXISTS broadcast_feed_items (
  url            VARCHAR PRIMARY KEY,
  title          VARCHAR,
  content_html   VARCHAR,
  date_published VARCHAR,
  tags           VARCHAR[],
  authors        JSON
);

-- this is where the update magic happens
INSERT OR IGNORE INTO broadcast_feed_items
FROM read_json('curl -s https://www.inoreader.com/stream/user/##########/tag/broadcast/view/json?since=${epoch} | jq .items[] |')
SELECT url, title, content_html, date_published, tags, authors;

-- Thinned-out JSON content for the viewing app
COPY (
  FROM broadcast_feed_items
  SELECT
    content_html,
    -- "title" is useless for the most part since this is an X post
    date_published AS "timestamp",
    regexp_replace(authors.name, '"', '', 'g') AS handle
) TO 'posts.json' (FORMAT JSON, ARRAY true);
EOQ
```

There are other ways to unnest the data than using jq and the shellfs DuckDB extension, but the more RG the better (for this post)!

So the final path is:

X -> XCancel -> XCancel RSS -> Inoreader -> Inoreader JSON -> jq -> DuckDB

with virtually no code (save for the snippet above).

I've got this running as a systemd timer/service every 30 minutes.

Later this week (when I'm done hand-coding it; yes, sans-Claude), I'll have a Lit-based vanilla HTML/CSS/JS viewer app in one of the Drops.

fplot

(This is an #RStats section, so def move along if that is not your cuppa.)

My daily git-stalking led me to this gem of an R package.

{fplot} (https://lrberge.github.io/fplot/; GH: https://github.com/lrberge/fplot) is designed to automate and simplify the visualization of data distributions (something I have to do every. single. day.). Its core mission is to let folks quickly generate meaningful and aesthetically pleasing distribution plots, regardless of the underlying data type (it supports continuous, categorical, or skewed), by making spiffy choices about the appropriate graphical representation for each variable.

Functions in the package detect the nature of your data (e.g., categorical vs. continuous, skewed or not) and automatically select the most suitable plot type. For example, it will not use the same visualization for a categorical variable as it would for a continuous one, and it adapts further if the data is heavily skewed.

Ergonomics are pretty dope, since you only need a single line of code to generate a plot, with the package handling the details of layout and type selection. This is particularly useful for exploratory data analysis, or for folks who want quick, visually appealing graphics without extensive customization.

Tools are provided to globally relabel variable names for all plots. This is managed via the setFplot_dict() function, which lets us map cryptic, gosh-awful, or overly technical variable names to more readable labels that will appear in all subsequent plots.

Example usage:

```r
setFplot_dict(c(
  Origin       = "Exporting Country",
  Destination  = "Importing Country",
  Euros        = "Exports Value in €",
  jnl_top_25p  = "Pub. in Top 25% journal",
  jnl_top_5p   = "Publications in Top 5% journal",
  journal      = "Journal",
  institution  = "U.S. Institution",
  Petal.Length = "Petal Length"
))
```

The typical workflow with fplot is straightforward:

1. Load your data.
2. Optionally set global variable labels using setFplot_dict().
3. Call the fplot function on your variable(s) of interest.
4. The package automatically determines the best plot type and layout for your data.

The same function call can yield different types of plots depending on the data provided, streamlining the process of distributional analysis and visualization.

A gallery of examples and a more detailed walk-through are available on the package's website.

Color Everything in CSS

The CSS-Tricks article "Color Everything in CSS" (https://css-tricks.com/color-everything-in-css/) offers a comprehensive, up-to-date exploration of how color works in CSS, moving beyond the basics of color and background-color to cover the deeper technical landscape of color on the web. The article introduces essential concepts like color spaces, color models, and color gamuts, which are foundational for understanding how colors are represented, manipulated, and rendered in browsers today.

We've covered many of these individual topics before, but this is a well-crafted all-in-one that does such a good job, I do not wish to steal any of its thunder. Head on over to level up your CSS skills.

FIN

Remember, you can follow and interact with the full text of The Daily Drop's free posts on:

- 🐘 Mastodon via @dailydrop.hrbrmstr.dev@dailydrop.hrbrmstr.dev
- 🦋 Bluesky via https://bsky.app/profile/dailydrop.hrbrmstr.dev.web.brid.gy

☮️

#duckdb #RStats
boB Rudis 🇺🇦 🇬🇱 🇨🇦:
Never gonna not 💙 #DuckDB's built-in bar chart function.
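For anyone who hasn't seen it, bar() renders a text bar chart right in the result set; a tiny self-contained example with made-up numbers:

```sql
-- bar(value, min, max, width) draws the chart inline
SELECT genre, plays, bar(plays, 0, 500, 30) AS chart
FROM (VALUES ('jam', 480), ('funk', 310), ('folk', 220)) AS t(genre, plays)
ORDER BY plays DESC;
```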
Francis 🏴‍☠️ Gulotta:
Oops, #til to use #duckdb to query a CSV, generate date ranges, use window functions to backfill data, and use pivot functions to make data that you can easily graph in a spreadsheet.

Based upon:
- average solar radiation distribution over the year for my area
- my actual kWh production and usage for the last month (which #homeassistant gives as data-change events, not hourly or daily reporting)
- the kWh I've spent on AC, which I expect to increase over the summer

I'm operating at 85% capacity 🙌
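All of those pieces fit in a couple of DuckDB statements. A hedged sketch, assuming a hypothetical energy_events.csv with ts and kwh columns (not the poster's actual schema):

```sql
-- 1) generate a complete hourly range, then backfill the gaps
--    between irregular change events with a window function
CREATE TEMP TABLE shaped AS
WITH hours AS (
    SELECT unnest(generate_series(
        TIMESTAMP '2025-06-01', TIMESTAMP '2025-06-08',
        INTERVAL 1 HOUR)) AS h
)
SELECT date_trunc('day', h) AS day,
       hour(h) AS hr,
       -- carry the last reported reading forward across gaps
       last_value(e.kwh IGNORE NULLS) OVER (ORDER BY h) AS kwh
FROM hours
LEFT JOIN read_csv('energy_events.csv') e
       ON date_trunc('hour', e.ts) = h;

-- 2) pivot days x hours into a grid that graphs easily in a spreadsheet
PIVOT shaped ON hr USING sum(kwh) GROUP BY day;
```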
Christos Argyropoulos MD PhD:
Since we can't use the cloud to automate our #EHR analysis projects, we tried #nextflow (traditionally used in bioinformatics: https://www.nextflow.io/), and it worked like a charm coordinating #duckdb, #R, #python, and within-node tasks.
Next in line is (R)?ex: https://www.rexify.org/
Michael Simons:
> Anyone who has worked for more than 5 minutes in an enterprise more than 30 miles outside San Francisco knows that the vast majority of information in the enterprise is cataloged and transacted via Excel spreadsheets. And if you're lucky, these spreadsheets are accessible to more than one person at a time via platforms like SharePoint.

This is large.

Or shall I say x-large?

It basically excels, and will essentially solve all German Enterprise IT issues:

https://github.com/gregwdata/ducklakexl

#DuckDB
Yazad:
A general #softwareengineering question: when using the word #embedded, how do you distinguish between embedded as in #firmware and embedded as in in-process, like a programming language such as #lua used in a game engine, or a database like #sqlite or #duckdb that (potentially) runs inside the host application? Do you use different terms like #embeddedsystems or #EmbeddedSoftware to refer to one or the other? Or am I completely off base here?

howdy, #hachyderm!

over the last week or so, we've been preparing to move hachy's #DNS zones from #AWS route 53 to bunny DNS.

since this could be a pretty scary thing -- going from one geo-DNS provider to another -- we want to make sure *before* we move that records are resolving in a reasonable way across the globe.

to help us do this, we've started a small, lightweight tool that we can deploy to a provider like bunny's magic containers to quickly get DNS resolution info from multiple geographic regions. we then write this data to a backend S3 bucket, at which point we can use a tool like #duckdb to analyze the results and find records we need to tweak to improve performance. all *before* we make the change.

then, after we've flipped the switch and while DNS is propagating -- :blobfoxscared: -- we can watch in real-time as different servers begin flipping over to the new provider.

we named the tool hachyboop and it's available publicly --> github.com/hachyderm/hachyboop

please keep in mind that it's early in the booper's life, and there's a lot we can do, including cleaning up my hacky code. :blobfoxlaughsweat:

attached is an example of a quick run across 17 regions for a few minutes. the data is spread across multiple files but duckdb makes it quite easy for us to query everything like it's one table.
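the "query many files as one table" trick is just a glob pattern in the reader function. a hypothetical sketch (hachyboop's actual file layout and column names may differ):

```sql
LOAD httpfs;  -- for s3:// paths

-- every region's output file, queried as a single logical table
SELECT region, qname, count(*) AS answers
FROM read_json_auto('s3://hachyboop-results/**/*.json')
GROUP BY region, qname
ORDER BY region, qname;
```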

I'm James, and I've only recently started using Mastodon, mainly after #cohost shut down.

I love #music. I play guitar and sing, solo and in a band. I live in the #bluemountains in Australia.

I'm also a software dev who's pretty active in the #JS ecosystem. I love #postgres and #duckdb. I used to be more active in #fp spaces, but not so much now.

Completely as a hobby, I'm interested in #gamedev / #gamedesign.

I'm fairly #left leaning but I don't get too political online.

I liked using #polars for my recent analysis of those large parquet files, but #duckdb did the same core computation (filtering and counting with a group by, directly from the parquet files) about 50% faster: 224 ms vs 361 ms. (EDITED, as my earlier numbers were not from the same calculations…)

But the subsequent steps (normalizing the counts, creating cumulative sums) are just much more intuitive for me to write in Python with Polars than as more SQL.
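For reference, the core DuckDB computation described above is roughly this shape (file and column names are hypothetical, not from the actual analysis):

```sql
-- filter + count with a group by, straight off the parquet files
SELECT category, count(*) AS n
FROM read_parquet('events/*.parquet')
WHERE kind = 'target'
GROUP BY category;
```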