
youtube.com/watch?v=UGO_Ehywux
Absolutely amazing talk on #LLM "Mechanistic Interpretability", or in layman's terms: how to untangle LLMs, understand how they "think", and figure out which neurons are responsible for something like 'doubt'.

I really like that they use approachable, not overly technical language in this. I can really recommend watching it if you want to understand more about how the #Transformer architecture works.
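A taste of the simplest version of that kind of analysis (my own toy illustration, not something from the talk): compare neuron activations on texts that express doubt against neutral texts, and rank neurons by the difference.

```python
# Toy "which neurons respond to doubt" probe on synthetic activations;
# real mechanistic interpretability goes much deeper than mean differences.
import numpy as np

rng = np.random.default_rng(0)
n_neurons = 512

# Stand-ins for an LM's hidden activations on two sets of sentences.
doubt_acts = rng.normal(size=(100, n_neurons))
neutral_acts = rng.normal(size=(100, n_neurons))
doubt_acts[:, 42] += 2.0  # plant a fake "doubt neuron" at index 42

diff = doubt_acts.mean(axis=0) - neutral_acts.mean(axis=0)
print(np.argsort(-np.abs(diff))[:5])  # neuron 42 should rank first
```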

A while ago I wrote here about a paper that found that LLMs store some knowledge as reproductions of simple linear functions. Not all knowledge, but relational facts like "what is the capital of Germany". [0]
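To make "knowledge as a linear function" concrete, here is a toy numpy sketch (my own illustration, not the paper's code): if a relation really is affine in the hidden states, a single map o ≈ W·s + b fitted on a few (subject, object) pairs should decode held-out pairs too.

```python
# Toy version of the linear-relation idea from [0]: one affine map
# decodes a relation from subject hidden states to object hidden states.
# All shapes and data here are synthetic, for illustration only.
import numpy as np

d = 8                                   # toy hidden-state size
rng = np.random.default_rng(0)

# Stand-ins for LM hidden states of subjects ("Germany", "France", ...)
# and the matching objects ("Berlin", "Paris", ...); affine by construction.
subjects = rng.normal(size=(20, d))
W_true = rng.normal(size=(d, d))
b_true = rng.normal(size=d)
objects = subjects @ W_true.T + b_true

# Fit one affine map o = W s + b from 12 pairs via least squares ...
X = np.hstack([subjects[:12], np.ones((12, 1))])   # append bias column
coef, *_ = np.linalg.lstsq(X, objects[:12], rcond=None)
W_hat, b_hat = coef[:d].T, coef[d]

# ... and check it decodes the held-out pairs as well.
pred = subjects[12:] @ W_hat.T + b_hat
print(np.allclose(pred, objects[12:]))  # True
```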

There is something called Kolmogorov-Arnold Networks (KANs) [1], which build the network directly out of small learnable mathematical functions: each edge carries its own learnable univariate function (a spline, combined with a base activation) instead of a fixed scalar weight.
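To illustrate the core idea (my own toy version, not the pykan API; pykan uses B-splines where I use per-edge polynomials):

```python
# Toy KAN-style layer: every edge has its own small learnable univariate
# function (here a degree-3 polynomial); nodes just sum the edge outputs.
import torch
import torch.nn as nn

class ToyKANLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int, degree: int = 3):
        super().__init__()
        # One coefficient vector per (output, input) edge.
        self.coeffs = nn.Parameter(0.1 * torch.randn(out_dim, in_dim, degree + 1))
        self.degree = degree

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, in_dim)
        # Powers x^0..x^degree of each input: (batch, in_dim, degree + 1)
        powers = torch.stack([x ** k for k in range(self.degree + 1)], dim=-1)
        # Evaluate each edge's polynomial, then sum over incoming edges.
        edge_out = torch.einsum('bik,oik->boi', powers, self.coeffs)
        return edge_out.sum(dim=-1)                      # (batch, out_dim)

layer = ToyKANLayer(2, 1)
print(layer(torch.randn(4, 2)).shape)  # torch.Size([4, 1])
```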

So now I am asking myself what a mixed network would be able to do; I imagine it would be quite performant to store linear functions directly as such, rather than having them simulated by a group of neurons (rough sketch below).
But then again, getting the model to store this information directly in such functions is maybe impossible without some resource-usage reward... which has the potential to introduce a whole extra set of problems.
And then there is the problem of how to integrate such a thing in the first place.
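To pin down what I mean by "mixed", here is a purely speculative sketch (all names hypothetical, not from [0] or [1]): a block where a plain linear path sits next to an ordinary MLP path, so an affine relation could in principle be stored in the linear path instead of being simulated by neurons. Whether training would actually route it there is exactly the open question above.

```python
# Speculative sketch of a "mixed" block: a direct linear path that can
# store an affine map exactly, summed with an ordinary MLP path.
import torch
import torch.nn as nn

class MixedBlock(nn.Module):
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.linear_path = nn.Linear(dim, dim)  # could hold "capital-of"-style affine maps directly
        self.mlp_path = nn.Sequential(          # regular neurons for everything non-linear
            nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear_path(x) + self.mlp_path(x)

block = MixedBlock(dim=16, hidden=64)
print(block(torch.randn(4, 16)).shape)  # torch.Size([4, 16])
```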

[0] arxiv.org/pdf/2308.09124

[1] github.com/KindXiaoming/pykan

#AI #LLM #KAN