New research: The Passive Voice in Artificial Intelligence Language
AI systems often rely on passive voice to construct the illusion of neutrality.
SSRN → https://papers.ssrn.com/abstract=5263305
Zenodo → https://zenodo.org/records/15616036
Hello! I’m a computational linguist exploring NLP, fact-checking, and multilingual learning. Curious to connect with others working on language + AI!
#Introduction #NLP #ComputationalLinguistics #AIethics
https://www.europesays.com/2329211/ The Essential AI Tools for Business Analysts in 2025 » World Business Outlook #AI #NLP #business #DataAnalysis #PowerBI
[LangExtract](https://developers.googleblog.com/en/introducing-langextract-a-gemini-powered-information-extraction-library/) has got me curious, but I don't get what makes it different from a [spacy-llm/prodigy](https://prodi.gy/docs/large-language-models) setup. Is it just that I'm spared the effort of chunking long input and constructing output JSON from entities and offsets, instead of writing the corresponding Python code myself?
Ah, one more difference: LangExtract is #OpenSource, whereas Prodigy is not (?). (On the other hand, Prodigy integrates better with a correction-and-training workflow.)
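For context on the chunking/offset bookkeeping in question, here's a minimal sketch (my own illustration, not LangExtract's or spacy-llm's actual API) of what doing it by hand looks like: splitting a long document into overlapping windows and shifting the chunk-local entity spans each call returns back to document-level offsets.

```python
def chunk_text(text, max_len=120, overlap=24):
    """Split text into overlapping windows, keeping each window's start offset."""
    step = max_len - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append((start, text[start:start + max_len]))
        if start + max_len >= len(text):
            break
        start += step
    return chunks

def to_doc_offsets(chunk_start, spans):
    """Map (begin, end, label) spans from chunk-local to document-level offsets."""
    return [(b + chunk_start, e + chunk_start, lab) for b, e, lab in spans]

# Pretend this document is too long for a single model call.
doc = "Alice met Bob in Paris. " * 40
windows = chunk_text(doc, max_len=120, overlap=24)
# Each window would go to the LLM/NER step; suppose the second window
# returns a chunk-local span (0, 5, "PER") — shift it back:
(b, e, lab), = to_doc_offsets(windows[1][0], [(0, 5, "PER")])
assert doc[b:e] == "Alice"
```

The overlap is there so entities straddling a chunk boundary appear whole in at least one window; deduplicating the resulting spans is the other piece of boilerplate such a setup makes you write.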
This is a Ficaria verna (formerly Ranunculus ficaria L.), or lesser celandine or pilewort (as per Wikipedia and other sites).
I have been testing the Gemma 3 4B-it-q4_0 multimodal vision model for a while: it is not accurate and can't be trusted.
For instance, it thinks this plant is a Ranunculus acris (it isn't). It's really hit and miss with this model. I guess it could still be useful for providing some clues or vocabulary.
What is #NLP research 𝘳𝘦𝘢𝘭𝘭𝘺 about?
We analyzed 29k+ papers to find out!
Our NLPContributions dataset, from the ACL Anthology, reveals what authors actually contribute—artifacts, insights, and more.
Trends show a swing back towards language & society. Curious where you fit in?
Tools, data, and analysis await you:
Paper: https://arxiv.org/abs/2409.19505
Project: https://ukplab.github.io/acl25-nlp-contributions/
Code: https://github.com/UKPLab/acl25-nlp-contributions
Data: https://tudatalib.ulb.tu-darmstadt.de/handle/tudatalib/4678
(1/)
Context Rot: How increasing input tokens impacts LLM performance
VLAI: A RoBERTa-Based Model for Automated Vulnerability Severity Classification.
This paper presents VLAI, a transformer-based model that predicts software vulnerability severity levels directly from text descriptions. Built on RoBERTa, VLAI is fine-tuned on over 600,000 real-world vulnerabilities and achieves over 82% accuracy in predicting severity categories, enabling faster and more consistent triage ahead of manual CVSS scoring. The model and dataset are open-source and integrated into the Vulnerability-Lookup service.
We (@cedric and I) decided to write a paper to better document how VLAI is implemented. We hope it will give others ideas for improvements to models like this.
#vulnerability #cybersecurity #vulnerabilitymanagement #ai #nlp #opensource
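The post doesn't list VLAI's severity categories, but assuming they correspond to the standard CVSS v3.x qualitative ratings, the buckets a text-based prediction would be triaged against can be sketched as follows (score ranges taken from the CVSS v3.x specification; the function name is mine):

```python
def cvss_v3_severity(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative severity rating,
    per the ranges in the CVSS v3.x specification."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS base score must be in [0.0, 10.0]")
    if score == 0.0:
        return "None"
    if score < 4.0:   # 0.1 – 3.9
        return "Low"
    if score < 7.0:   # 4.0 – 6.9
        return "Medium"
    if score < 9.0:   # 7.0 – 8.9
        return "High"
    return "Critical"  # 9.0 – 10.0
```

The appeal of a model like VLAI is producing one of these labels straight from a vulnerability's text description, before anyone has computed a CVSS vector by hand.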
How big of a deal would it be if someone developed a language model (kind of like ChatGPT) that didn't hallucinate, didn't use prodigious amounts of electricity, water, compute, or memory, ran locally or on a distributed user mesh instead of corporate server farms, and remembered and learned from what you say if you want it to? Something reliable, testable, and even interpretable, meaning you could pop the hood and see what it's really doing. Would you be inclined to use a system like this? Are there other things you'd still take issue with?
What I told Siri to say (“I hate this sap smear across our windshield”) vs what Siri said.
I wrote this code in my spare time, over about a decade. I wrote the parser, the knowledge graph, the conversational engine, everything from scratch in pure Python. It starts out literally knowing nothing except English grammar and learns everything from interactions with the user. In the tests I posted, it is literally learning words from context as I use them. It had no idea what a dog or an apple or the color red is until I told it things about them.
Here's another passing unit test. Again, this is stuff I wrote almost a decade ago, before LLMs were even invented.
> The brown dog bites me.
> The black dog barks at me.
> The dog that bites me hates me.
> Which dog hates me?
< The brown dog.
> Does the brown dog bite me?
< Yes.
> Does the brown dog hate me?
< Yes.
> Does the black dog hate me?
< I don't know.
> Does the dog that barks at me hate me?
< I don't know.
> Is the brown dog the dog that bites me?
< Yes.
Just dug out some of my code from almost a decade ago, pre-LLM, and got it working again. Some examples of what it could do, taken from actual passing unit tests, are below. Lines starting with > are what the user says, and lines starting with < are what the system says.
> The apple is red.
> The worm is in the apple.
> The apple is in the basket.
> Where is the worm?
< In the apple.
> Where is the apple?
< In the basket.
> Is the worm in the apple?
< Yes.
> Is the apple in the basket?
< Yes.
> Is the worm in the basket?
< I don't know.
> What is in the apple?
< The worm.
> Is the apple red?
< Yes.
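This isn't the author's code, but the three-valued behavior in these transcripts can be sketched with a toy fact store: asserted facts answer "Yes", and anything not derivable answers "I don't know" rather than "No". Note the transcript does not close "in" transitively (worm-in-apple plus apple-in-basket does not yield worm-in-basket), so neither does this sketch.

```python
class TinyKB:
    """Toy open-world fact store: stored triples answer Yes;
    everything else answers 'I don't know.' (never 'No')."""

    def __init__(self):
        self.facts = set()  # (subject, relation, object) triples

    def tell(self, subj, rel, obj):
        self.facts.add((subj, rel, obj))

    def ask(self, subj, rel, obj):
        return "Yes." if (subj, rel, obj) in self.facts else "I don't know."

    def where(self, subj):
        # Return the first stored location fact; no transitive closure over "in".
        for s, rel, obj in self.facts:
            if s == subj and rel == "in":
                return f"In the {obj}."
        return "I don't know."

kb = TinyKB()
kb.tell("worm", "in", "apple")
kb.tell("apple", "in", "basket")
```

With these two facts, `kb.ask("worm", "in", "apple")` yields "Yes." while `kb.ask("worm", "in", "basket")` yields "I don't know.", matching the open-world answers in the transcript above.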
Question for the digital humanities people:
Is there any good #OpenSource graphical tool for natural language processing that is both easy to use and performs a reasonable number of analyses?
I am looking for something that the average lawyer or student could operate with a couple of weeks' training.
Thanks!