
[LangExtract](developers.googleblog.com/en/i) has got me curious, but I don't get what makes it different from a [spacy-llm/prodigy](prodi.gy/docs/large-language-m) setup. Is it just that it spares me the effort of chunking long inputs and constructing the output JSON from entities and offsets, instead of writing the corresponding Python code myself?...

Ah, one more difference is that langextract is #OpenSource whereas prodigy is not (?). (On the other hand, prodigy has a better integration with a correction+training workflow.)


This is a Ficaria verna (formerly Ranunculus ficaria L.), or lesser celandine or pilewort (as per Wikipedia and other sites).

I have tested the gemma 3 4b-it-q4_0 multimodal vision model for a while: it is not accurate and can't be trusted.
For instance, it identifies this plant as Ranunculus acris (it isn't). Results are really hit and miss with this model. I guess it could still be useful for providing some clues or vocabulary.

🤔 What is #NLP research 𝘳𝘦𝘢𝘭𝘭𝘺 about?
We analyzed 29k+ papers to find out! 📚🔍

📌 Our NLPContributions dataset, from the ACL Anthology, reveals what authors actually contribute—artifacts, insights, and more.

📈 Trends show a swing back towards language & society. Curious where you fit in?

🎁 Tools, data, and analysis await you:

📄 Paper: arxiv.org/abs/2409.19505
🌐 Project: ukplab.github.io/acl25-nlp-con
💻 Code: github.com/UKPLab/acl25-nlp-co
💾 Data: tudatalib.ulb.tu-darmstadt.de/

(1/🧵)

VLAI: A RoBERTa-Based Model for Automated Vulnerability Severity Classification.

This paper presents VLAI, a transformer-based model that predicts software vulnerability severity levels directly from text descriptions. Built on RoBERTa, VLAI is fine-tuned on over 600,000 real-world vulnerabilities and achieves over 82% accuracy in predicting severity categories, enabling faster and more consistent triage ahead of manual CVSS scoring. The model and dataset are open-source and integrated into the Vulnerability-Lookup service.
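For context on the severity categories mentioned above: VLAI's labels correspond to the qualitative CVSS rating scale, which the standard defines as fixed buckets over the 0.0–10.0 base score. A minimal sketch of that bucketing (the function name is mine, the thresholds come from the CVSS v3.x spec):

```python
def cvss_severity(score: float) -> str:
    """Map a CVSS v3.x base score (0.0-10.0) to its qualitative severity."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS base scores range from 0.0 to 10.0")
    if score == 0.0:
        return "None"
    if score <= 3.9:
        return "Low"
    if score <= 6.9:
        return "Medium"
    if score <= 8.9:
        return "High"
    return "Critical"

print(cvss_severity(9.8))  # a typical critical RCE score -> Critical
```

A text-based classifier like VLAI predicts these same categories directly from the vulnerability description, before any manual base score exists.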

We (@cedric and I) decided to write a paper to better document how VLAI is implemented. We hope it gives others ideas for improvements to models like this.

#vulnerability #cybersecurity #vulnerabilitymanagement #ai #nlp #opensource

@circl

🔗 arxiv.org/abs/2507.03607


How big of a deal would it be if someone developed a language model (kind of like ChatGPT) which didn't hallucinate, didn't use prodigious amounts of electricity/water/compute/memory, which ran locally or on a distributed user mesh instead of corporate server farms, and which remembered and learned from what you say if you want it to? Something which was reliable and testable and even interpretable -- meaning you could pop the hood and see what it's really doing. Would you be inclined to use a system like this? Are there other things you'd still take issue with?

#LLM
#ChatGPT
#NLP
#NLU

Continued thread

The only actual machine learning the system uses, aside from its purely emergent ability to learn language from context, is in the parser, where I adjust the probability of matching certain patterns based on prior success in understanding the user.
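A scheme like that could be sketched as a smoothed success-rate ranking over parse patterns — this is my own toy illustration (names, and the Laplace smoothing, are assumptions, not the author's actual code):

```python
from collections import defaultdict

class PatternRanker:
    """Rank parser patterns by their smoothed prior success rate."""

    def __init__(self):
        self.successes = defaultdict(int)
        self.attempts = defaultdict(int)

    def record(self, pattern: str, understood: bool) -> None:
        """Record whether matching this pattern led to understanding the user."""
        self.attempts[pattern] += 1
        if understood:
            self.successes[pattern] += 1

    def score(self, pattern: str) -> float:
        # Laplace smoothing: an unseen pattern scores 0.5 rather than 0,
        # so new patterns still get tried.
        return (self.successes[pattern] + 1) / (self.attempts[pattern] + 2)

    def ranked(self, patterns):
        # Try the historically most successful patterns first.
        return sorted(patterns, key=self.score, reverse=True)

ranker = PatternRanker()
ranker.record("SVO", True)
ranker.record("SVO", True)
ranker.record("VSO", False)
print(ranker.ranked(["VSO", "SVO"]))  # -> ['SVO', 'VSO']
```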

#NLP
#NLU

Continued thread

I wrote this code in my spare time, over about a decade. I wrote the parser, the knowledge graph, the conversational engine, everything from scratch in pure python. It starts out literally knowing nothing except English grammar, and learns everything from interactions with the user. These tests I posted... It is literally learning the words from context as I use them. It had no idea what a dog or an apple or the color red is until I tell it things about them.

#NLP
#NLU

Continued thread

Here's another passing unit test. Again, this is stuff I wrote almost a decade ago, before LLMs were even invented.

> The brown dog bites me.
> The black dog barks at me.
> The dog that bites me hates me.
> Which dog hates me?
< The brown dog.
> Does the brown dog bite me?
< Yes.
> Does the brown dog hate me?
< Yes.
> Does the black dog hate me?
< I don't know.
> Does the dog that barks at me hate me?
< I don't know.
> Is the brown dog the dog that bites me?
< Yes.

#NLP
#NLU

Just dug out some of my code from almost a decade ago, pre-LLM, and got it working again. Some examples of what it could do, taken from actual passing unit tests, are below. Lines starting with > are what the user says, and lines starting with < are what the system says.

> The apple is red.
> The worm is in the apple.
> The apple is in the basket.
> Where is the worm?
< In the apple.
> Where is the apple?
< In the basket.
> Is the worm in the apple?
< Yes.
> Is the apple in the basket?
< Yes.
> Is the worm in the basket?
< I don't know.
> What is in the apple?
< The worm.
> Is the apple red?
< Yes.
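Behavior like the transcript above can be approximated with a tiny store of (subject, relation, object) facts that answers "Yes" only for what was stated and "I don't know" otherwise — a toy sketch of the idea, not the author's implementation (all names are mine):

```python
class FactStore:
    """Store (subject, relation, object) triples; answer only what is known."""

    def __init__(self):
        self.facts = set()

    def tell(self, subj: str, rel: str, obj: str) -> None:
        self.facts.add((subj, rel, obj))

    def ask(self, subj: str, rel: str, obj: str) -> str:
        # "Yes" only for explicitly stated facts; no transitive inference,
        # so "Is the worm in the basket?" honestly returns "I don't know".
        return "Yes" if (subj, rel, obj) in self.facts else "I don't know"

    def where(self, subj: str) -> str:
        for s, rel, obj in self.facts:
            if s == subj and rel == "in":
                return f"In the {obj}."
        return "I don't know."

kb = FactStore()
kb.tell("worm", "in", "apple")
kb.tell("apple", "in", "basket")
print(kb.where("worm"))               # -> In the apple.
print(kb.ask("worm", "in", "apple"))  # -> Yes
print(kb.ask("worm", "in", "basket")) # -> I don't know
```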

#NLP
#NLU