shakedown.social is one of the many independent Mastodon servers you can use to participate in the fediverse.
A community for live music fans with roots in the jam scene. Shakedown Social is run by a team of volunteers (led by @clifff and @sethadam1) and funded by donations.

#llms

28 posts · 27 participants · 1 post today
Maybe one positive outcome of this AI bubble:

The one good thing that could come of this latest #AI craze is that people are finally getting used to the idea that their computer can do more for them than just follow simple point-and-click or touch-and-swipe commands. The computer can actually do more intelligent things for you.

The thing is, computers have always been able to do “intelligent” things for you, ever since the first mainstream commercial GUIs of the 1980s. You have always had the ability to just open up a command line and enter commands, or open up a BASIC interpreter and enter a program, and have its commands affect the GUI environment in which you are working. You can still do that nowadays with the POSIX shell (e.g. #Bash or #Zsh), and with tools that facilitate the use of command lines like #Emacs and #Vim. People think #LLMs can do that for them now, but if you learn a computer language yourself, you can control the computer yourself without needing to trust an AI to get it right for you, which it often does not.

So really, we app developers need to rebrand the idea of a command line as a “chat log,” and rebrand scripting as “AI.” Thanks to AI, ordinary tech consumers may finally see the virtue of having interactive notebooks and start demanding notebook features in more of their apps. People don’t want to learn shell scripting, but if you package shell scripting as “AI,” you can get ordinary consumers to demand it of you.
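
To make the rebranding concrete, here is a toy sketch in Python of a “chat log” that is secretly just a command line: each message you type is run as a shell command and the “reply” is its output. This is purely illustrative and not any particular app's design.

```python
# Toy sketch: a "chat log" that is secretly a command line.
# Each message is executed as a shell command; the reply is its output.
# Illustrative only -- not any particular app's design.
import subprocess

def chat_repl() -> None:
    while True:
        message = input("you> ").strip()
        if message in {"exit", "quit"}:
            break
        result = subprocess.run(
            message, shell=True, capture_output=True, text=True
        )
        # The "assistant reply" is just whatever the command printed.
        print("computer>", (result.stdout or result.stderr).rstrip())

if __name__ == "__main__":
    chat_repl()
```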

Is #AI Making Us Stupider? This Study Certainly Thinks So

New research highlights the impact of AI on #CriticalThinking skills.

Posted August 21, 2025 | Reviewed by Michelle Quirk

Key points
- Using AI to do the thinking impacts not only quality of work but also the long-term acquisition of skills.
- AI can have a place in education, but it needs to be carefully employed.
- Employing AI is having a detrimental impact on our ability to think critically.

Excerpt: "A new study by #MIT has published initial results highlighting a likely connection between large language models (#LLMs)—colloquially grouped under the banner of AI—and a direct cognitive cost, particularly when it comes to our ability to think critically. They conducted a study using a pool of 54 participants, divided into three groups. The groups were asked to deliver an SAT-grade essay, with the first group granted access to #ChatGPT, the second to traditional #GoogleSearch, and the third with no search resources at all. Group 3 was referred to as the '#BrainOnly' group.

"Over time, each participant had to produce several essays, and each time was attached to an EEG to record brain activity across 32 regions. Consistently, ChatGPT users had the lowest brain engagement for every essay, and worryingly showed decreasing brain activity over time, as they progressed through the essay assignments. What this represents is the ChatGPT participants getting '#lazier' with each subsequent essay, with many resorting to copy-and-paste content by the end of the study. The study also recorded that this group 'consistently underperformed at #neural, #linguistic, and behavioral levels.' The findings make sense; with ChatGPT doing much of the work, it's easier on our #cognitive processes, and it is unsurprising that it takes less mental effort.

"Perhaps the more concerning aspect of the findings is the connection between #ReducedBrainActivity and impact on long-term learning and memory. As well as recording the participants' level of satisfaction, #curiosity, #creativity and #memory as part of the study—which were all high for both the brain-only and the search assisted groups (groups 2 and 3)—a later part of the study asked each group to reproduce one of their earlier essays, with the ChatGPT group asked to do so without access to any tools. There was little evidence of this group's participants remembering much of their previous essay, highlighting the very necessary link between #CognitiveEngagement and memory and, therefore, the potential impact on #LongTermLearning and gaining new skills. Conversely, and perhaps reassuringly, however, the brain-only group was then permitted to rewrite their essay with access to ChatGPT, with their efforts showing higher levels of creativity and stronger arguments, while retaining original thinking and unique language. This could present reassurance that, employed properly, AI has a place in enhancing learning without diminishing cognitive capability.

"Despite awaiting peer review and using only a small participant sample, the research team decided to release their findings in advance of peer feedback, in what they highlight as an urgent consideration of #LearningWithAI."

Read more:
psychologytoday.com/us/blog/th

arXiv.org · Beyond GPT-5: Making LLMs Cheaper and Better via Performance-Efficiency Optimized Routing

Balancing performance and efficiency is a central challenge in large language model (LLM) advancement. GPT-5 addresses this with test-time routing, dynamically assigning queries to either an efficient or a high-capacity model during inference. In this work, we present Avengers-Pro, a test-time routing framework that ensembles LLMs of varying capacities and efficiencies, providing a unified solution for all performance-efficiency tradeoffs. The Avengers-Pro embeds and clusters incoming queries, then routes each to the most suitable model based on a performance-efficiency score. Across 6 challenging benchmarks and 8 leading models -- including GPT-5-medium, Gemini-2.5-pro, and Claude-opus-4.1 -- Avengers-Pro achieves state-of-the-art results: by varying a performance-efficiency trade-off parameter, it can surpass the strongest single model (GPT-5-medium) by +7% in average accuracy. Moreover, it can match the average accuracy of the strongest single model at 27% lower cost, and reach ~90% of that performance at 63% lower cost. Last but not least, it achieves a Pareto frontier, consistently yielding the highest accuracy for any given cost, and the lowest cost for any given accuracy, among all single models. Code is available at https://github.com/ZhangYiqun018/AvengersPro.
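
For a feel of how that kind of performance-efficiency routing could work, here is a rough sketch; the scoring formula, numbers, and per-cluster statistics below are illustrative assumptions, not the paper's actual method.

```python
# Rough sketch of performance-efficiency routing (illustrative assumptions,
# not the Avengers-Pro implementation): for the cluster a query falls into,
# pick the model with the best blend of accuracy and cost.
from dataclasses import dataclass

@dataclass
class ModelProfile:
    name: str
    accuracy: float  # observed accuracy on this query cluster (0 to 1)
    cost: float      # relative cost per query (0 to 1, lower is cheaper)

def route(profiles: list[ModelProfile], alpha: float) -> str:
    """alpha = 1.0 favors accuracy only; alpha = 0.0 favors cost only."""
    return max(
        profiles,
        key=lambda m: alpha * m.accuracy - (1 - alpha) * m.cost,
    ).name

# Hypothetical per-cluster statistics for three of the routed models:
profiles = [
    ModelProfile("gpt-5-medium", accuracy=0.82, cost=1.00),
    ModelProfile("gemini-2.5-pro", accuracy=0.78, cost=0.60),
    ModelProfile("claude-opus-4.1", accuracy=0.80, cost=0.85),
]
print(route(profiles, alpha=1.0))  # strongest model when only accuracy counts
print(route(profiles, alpha=0.5))  # a cheaper model once cost matters
```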
Continued thread

Honestly, that quote is one of the most disturbing things I've seen in years, in "#AI" / #LLMs or almost anything else. And yet it just encapsulates a belief common among a whole class of developers.

Every prediction of hyperbolic or super-exponential computational scaling - every single one - has been wrong. People routinely underestimate the practical limits imposed by physics, economics, and engineering.

The prediction that #ArtificialIntelligence would result in super intelligence once an #LLM could build #AI was transparent horseshit if you know anything about the history of computing.

#MooresLaw, that is, exponential scaling of computation, predates the transistor. The transistor didn't change the game; it rescued computers from the limits of vacuum tubes.

The problem is that as a technology advances in capability, it also advances in complexity. The potential hyperbolic growth is limited by exponential scaling in complexity.

Turns out that there's now a proof that error-correcting LLMs is a computationally hard problem. So the previously (already bad, IMO) scaling laws for #LLMs are actually _even_ worse. And you can see this in the response to #OpenAI and their new #GPT5.

But even without that, the energy requirements were always going to be a huge limit. A human is calorically way cheaper than a 1-trillion-parameter model. A brain runs a machine many orders of magnitude more complex on less power than a lightbulb.

However... don't take this to mean AI is dead. The transformer architecture's demise has been obvious to researchers like me for a long time, but it occupies nowhere near the whole of the possible space to explore in AI.

In the meantime, anticipate people like me to be pushing to get AI out of the control of megacorporations and into the hands of everyone without it acting as spyware.

Government provides vital services. Its AI strategy should be to go slowly and *not* break things. I don't see how you can be confident that automated translation won't make serious mistakes with legal implications. #LLMs

(Quote posted to keep general automation frustration out of @stephanie's personal news) ottawa.place/@stephanie/115062

ottawa.place · Stéphanie (@stephanie@ottawa.place): Got the dreaded email at work... [...] are now preparing to roll out GC Translate to other departments [in line with the AI Strategy]

New #research by #Salesforce about #LLMs and their tool use via #MCP 🚨

MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers:

arxiv.org/abs/2508.14704

Reliability of LLMs using MCP for real-world tasks:
- GPT-5 => 43.72%
- Grok-4 => 33.33%
- Claude-4.0-Sonnet => 29.44%

Glad there are no negative percentages, I guess!?😌

So let me get this straight: we are burning the planet for... not even a coinflip!? 🪙 🎲

🎈 💥

arXiv.org · MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers

The Model Context Protocol has emerged as a transformative standard for connecting large language models to external data sources and tools, rapidly gaining adoption across major AI providers and development platforms. However, existing benchmarks are overly simplistic and fail to capture real application challenges such as long-horizon reasoning and large, unfamiliar tool spaces. To address this critical gap, we introduce MCP-Universe, the first comprehensive benchmark specifically designed to evaluate LLMs in realistic and hard tasks through interaction with real-world MCP servers. Our benchmark encompasses 6 core domains spanning 11 different MCP servers: Location Navigation, Repository Management, Financial Analysis, 3D Design, Browser Automation, and Web Searching. To ensure rigorous evaluation, we implement execution-based evaluators, including format evaluators for agent format compliance, static evaluators for time-invariant content matching, and dynamic evaluators that automatically retrieve real-time ground truth for temporally sensitive tasks. Through extensive evaluation of leading LLMs, we find that even SOTA models such as GPT-5 (43.72%), Grok-4 (33.33%) and Claude-4.0-Sonnet (29.44%) exhibit significant performance limitations. In addition, our benchmark poses a significant long-context challenge for LLM agents, as the number of input tokens increases rapidly with the number of interaction steps. Moreover, it introduces an unknown-tools challenge, as LLM agents often lack familiarity with the precise usage of the MCP servers. Notably, enterprise-level agents like Cursor cannot achieve better performance than standard ReAct frameworks. Beyond evaluation, we open-source our extensible evaluation framework with UI support, enabling researchers and practitioners to seamlessly integrate new agents and MCP servers while fostering innovation in the rapidly evolving MCP ecosystem.
#LLM #AI #AIHype
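
To give a flavour of what “execution-based evaluators” might mean in practice, here is a toy sketch of the three evaluator types the abstract names; all names and interfaces are my own assumptions for illustration, not the benchmark's actual code.

```python
# Toy sketch of the three evaluator types (format / static / dynamic).
# Names and interfaces are assumptions for illustration, not MCP-Universe's code.
import json
from typing import Callable

def format_evaluator(agent_output: str) -> bool:
    """Did the agent answer in the required JSON format?"""
    try:
        return "answer" in json.loads(agent_output)
    except json.JSONDecodeError:
        return False

def static_evaluator(agent_output: str, expected: str) -> bool:
    """Compare against fixed, time-invariant ground truth."""
    return json.loads(agent_output)["answer"].strip() == expected

def dynamic_evaluator(agent_output: str, fetch_truth: Callable[[], str]) -> bool:
    """Fetch real-time ground truth (e.g. a live value) and compare."""
    return json.loads(agent_output)["answer"].strip() == fetch_truth()

# Hypothetical task: "Which city hosts the Eiffel Tower?"
output = '{"answer": "Paris"}'
print(format_evaluator(output))                    # True
print(static_evaluator(output, "Paris"))           # True
print(dynamic_evaluator(output, lambda: "Paris"))  # True
```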

Damn, I was just about to offer myself as an AI critic to Zuck, for only $ 500,000,000 a year…

Mark Zuckerberg freezes AI hiring amid bubble fears

telegraph.co.uk/business/2025/

The Telegraph · Zuckerberg freezes AI hiring amid bubble fears · By James Titcomb
Replied in thread

@david_chisnall

In animal brains, constant updating is a feature. For that purpose there are short-, mid-, and long-term memory modules, all interrelated, feeding into each other, with many variants of each type with different memory consolidation and extinction dynamics. That's because the world is constantly changing, with different aspects changing at different rates.

The only animals that don't have an updating brain are internal parasites that have largely dispensed with one: they don't expect the environment to change.

This limitation of #LLMs, namely that the expensive part is the weight updating, fundamentally limits their usefulness in practice. Time to go back to working on the architecture side of things instead. Much to learn from #neuroscience.

Continued thread

Coolest thing we got was Esperanza Spalding and sinking threes at the WH.

Popularity contests, like #LLMs and #Reddit upvotes, select for mid mush. Zohran’s supposedly radical ideas are pretty tame, globally.

Great literature coming through that helps to further constrain what #LLMs can and cannot do and how they work. Key points: pattern matching is the main mechanism of action in LLMs and in "CoT"; test inputs outside the training distribution lead to performance degradation; and fine-tuning is a patch, not a panacea. All of which further lends credence to the idea that pattern matching, not abstract reasoning, is what's happening.

LLMs generate 'fluent nonsense' when reasoning outside their training zone | VentureBeat venturebeat.com/ai/llms-genera

VentureBeat · LLMs generate ‘fluent nonsense’ when reasoning outside their training zone · By Ben Dickson

LLMs are harmful. Not just because of their output, but also because of the output their owners explicitly design them not to produce.

For example, try asking Meta's LLM what the company does to prevent their tech from having inappropriate conversations with minors and see corporate censorship in action.
--
#IrresponsibleTech #LLMs #Meta