shakedown.social is one of the many independent Mastodon servers you can use to participate in the fediverse.
A community for live music fans with roots in the jam scene. Shakedown Social is run by a team of volunteers (led by @clifff and @sethadam1) and funded by donations.

#computerscience

AI has a greater chance of failure the longer it works on a task

I was browsing this research paper by Toby Ord, which compares an LLM working on a lengthy task to a human working on the same task. The longer the task takes, the greater the odds that the LLM fails; in fact, the probability of success decays exponentially with time. Naturally, larger LLMs with more computing power can keep going for longer, but the exponential decay over time holds no matter the model.

So, for example, you might get an 8-billion-parameter model to work on a task for 1 hour and expect it to succeed 50% of the time. Maybe you can get 2 hours out of a 16-billion-parameter model at the same 50% success rate (I am guessing these parameter sizes). Beyond that “half-life”, the odds of success keep halving, decaying toward zero.
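To make that concrete, here is a minimal sketch of the constant-hazard model Ord describes, with a made-up 60-minute half-life (the paper fits real half-lives per model; these numbers are purely illustrative):

```python
import math

def success_probability(task_minutes: float, half_life_minutes: float) -> float:
    """Constant per-minute failure rate (Ord's model): the success
    probability halves with every additional half-life of work."""
    failure_rate = math.log(2) / half_life_minutes   # per-minute hazard
    return math.exp(-failure_rate * task_minutes)    # same as 0.5 ** (t / h)

# Illustration only: a hypothetical agent with a 60-minute half-life.
for t in (30, 60, 120, 240):
    print(f"{t:>3}-minute task: {success_probability(t, 60):.1%} success")
# 30-minute task: 70.7%, 60: 50.0%, 120: 25.0%, 240: 6.2%
```

Note that the curve never actually hits zero; it just halves again with every additional hour of work.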

I haven’t read the finer details yet, like how you judge when the task is done (I presume when the LLM claims it has finished the job), or whether these tests allow multiple prompts (I presume it is fed an input once and allowed to churn on that input until it believes it is finished). So that makes me wonder: could you solve this problem, and increase the AI’s rate of success, by combining it with classical computing methods? For example, perhaps you could ask the LLM to list the steps it would perform to complete a task, parse that list into a set of new prompts, then feed each of those prompts back to the LLM, each one producing another list of subtasks. Could you keep breaking down the tasks and feeding them back into the AI to increase its odds of success?
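Here is a rough sketch of that decomposition loop, purely as a thought experiment; `ask_llm` is a hypothetical stand-in for whatever model API you would use, and the plan parsing is deliberately naive:

```python
def ask_llm(prompt: str) -> str:
    """Hypothetical model call; plug in your own API client here."""
    raise NotImplementedError

def solve(task: str, depth: int = 0, max_depth: int = 3) -> str:
    """Recursively split a task into subtasks so each LLM call stays
    short, then stitch the partial results back together."""
    if depth >= max_depth:
        # Leaf: the task is (hopefully) small enough to attempt directly.
        return ask_llm(f"Complete this task and report the result: {task}")
    plan = ask_llm(f"List the steps, one per line, needed to: {task}")
    steps = [line.lstrip("-*0123456789. ").strip()
             for line in plan.splitlines() if line.strip()]
    # Each step becomes its own, shorter task fed back to the model.
    return "\n".join(solve(step, depth + 1, max_depth) for step in steps)
```

The hope, under the half-life framing, is that many short calls each stay on the favorable side of the curve, though you would still need a reliable way to judge when a step is done and to merge the partial results.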

It is an interesting research question. I am also interested to see how much energy and water this takes compared to a human working on the same task, including the caloric intake of food, and perhaps the energy used to harvest, process, and deliver that food.

Toby Ord · Is there a Half-Life for the Success Rates of AI Agents?
Building on the recent empirical work of Kwa et al. (2025), I show that within their suite of research-engineering tasks the performance of AI agents on longer-duration tasks can be explained by an extremely simple mathematical model: a constant rate of failing during each minute a human would take
#tech #AI #LLM

This is for the super nerds, so don't feel bad if you don't get it.

I asked ChatGPT to design a menu with Dutch food influences for an Edsger W. Dijkstra-themed restaurant based upon his work. I then asked it to create the LaTeX code to generate a printable version of the menu.

No notes. Perfection. Lost in the PDF generation was that drinks were labeled as “Side Effects (Handled)”, which is divine.

🧬 The Future of Discovery: What AlphaEvolve Tells Us About the Future of Human Knowledge
buzzsprout.com/2405788/episode
helioxpodcast.substack.com/pub

The kind of breakthrough that makes you wonder what else we've been missing, what other solutions have been hiding in plain sight, waiting for the right kind of intelligence to find them.

#AlphaEvolve #DeepMind #Google #AI #MachineLearning #OpenScience #TechEthics #AlgorithmicDiscovery #ComputerScience #Innovation #TechCriticism

#softwareEngineering #computerScience #programming #lisp #commonLisp #interview #macro #discussion with historical notes:

screwlisp.small-web.org/show/V

My quick notes on the downloadable interview discussion with @vnikolov and @kentpitman about Vassil’s assertables, a classed, toggleable assertion macro design.

It provokes lots of fascinating historical notes from Kent about what the ANSI CL and earlier standardisation efforts were doing and had in mind.

screwlisp.small-web.org · Vassil Nikolov’s assertables with Kent Pitman

#computerScience #engineering #commonLisp #show #live #lispyGopherClimate communitymedia.video/w/uBZexon

#climateCrisis #haiku @kentpitman

We have @vnikolov talking about Common Lisp and type-checking macros.

Plus:
We do not have the incredible artist @shizamura, whose fourth #scifi comic volume has just finished being funded or something (?) sarilho.net/en/ (the English version, if you don’t read Portuguese).
She promises to record something about semantics for us in the future.

#lambdaMOO live chat

Computer engineer and Apple veteran William "Bill" Atkinson has died of pancreatic cancer at age 74. Atkinson created the QuickDraw graphics engine, which made the Macintosh interface possible and, says @arstechnica's @benjedwards, "transformed abstract computer science into intuitive visual experiences that millions would use daily."

"I say this with no hyperbole: Bill Atkinson may well have been the best computer programmer who ever lived," wrote veteran Apple analyst @gruber on his Daring Fireball blog. "Without question, he's on the short list. What a man, what a mind, what gifts to the world he left us." Here's Edwards' story; find Gruber's full tribute at the second link.

flip.it/UJO0cf

flip.it/ImgyIy

Bill Atkinson in 1987.
Ars Technica · Bill Atkinson, architect of the Mac’s graphical soul, dies at 74 · By Benj Edwards

I watched a video on YouTube in which Dr. Kelsey Houston-Edwards (PhD in Mathematics, Cornell) explained a recent discovery in CS by Ryan Williams involving computational complexity.

She started out by explaining Turing Machines, to give some intuition.

Anyway, the comment section is filled with “duh obviously Turing machines use tapes? of course modern computers with disks are faster idiot” and/or “quantum woo says time isn’t real so this is meaningless” and it’s giving me a headache.

How Accurately Do Large Language Models Understand Code?

arxiv.org/html/2504.04372v1

"This paper presents the first large-scale empirical investigation into the ability of LLMs to understand code. Inspired by mutation testing, we use an LLM’s ability to find faults as a proxy for its deep understanding of code. This approach is based on the insight that a model capable of identifying subtle functional discrepancies must understand the code well."

It appears that coding LLMs are vulnerable to misleading code comments, misleading variable names, and misleading dead code. They still have a shallow understanding of code, based on syntax and tokenization designed for natural languages, rather than analysis of code semantics. Writing a lot of incorrect comments can confuse them 😉
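To picture what such a mutation-style probe might look like, here is my own toy example (not one from the paper): a one-character mutant hidden under documentation that still describes the original behavior.

```python
def first_positive_index(xs: list[int]) -> int:
    """Return the index of the first strictly positive element, or -1."""
    for i, x in enumerate(xs):
        if x >= 0:  # mutant: the original condition was `x > 0`
            return i
    return -1

# The docstring still promises "strictly positive", but the mutant
# silently accepts zero -- exactly the kind of subtle functional
# discrepancy the paper uses as a proxy for understanding.
print(first_positive_index([-1, 0, 3]))  # prints 1; the correct code prints 2
```

A model that reasons about the semantics should flag the operator; one leaning on the docstring will wave it through.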

arxiv.org · How Accurately Do Large Language Models Understand Code?

I asked a Harvard postdoc which skills are essential to thrive as a researcher in AI & Biomedicine:

His answer:
:blobcoffee: Cultivate abstract thinking.
:blobcoffee: Build solid foundations instead of chasing hypes.
:blobcoffee: Think independently, embrace a do-it mindset, stay curious and persistent.

Knowledge is accessible. Thinking is up to us.

What skills are you trying to develop?