shakedown.social is one of the many independent Mastodon servers you can use to participate in the fediverse.
A community for live music fans with roots in the jam scene. Shakedown Social is run by a team of volunteers (led by @clifff and @sethadam1) and funded by donations.


#datasets


#Toronto #Canada #crowdsourcing #datasets #OpenAccess #OpenData

'...what if Toronto’s open data portal could go beyond simply offering city-owned data and start hosting datasets created by community organizations, researchers, and academic institutions? That’s the big idea the Open Data team has been exploring. Bringing community-generated data into the City’s open data portal could lead to positive impacts for civic engagement, innovation, and collaboration.'

open.toronto.ca/exploring-the-

open.toronto.ca · Exploring the Future of Open Data: Hosting Community-Generated Datasets on Toronto’s Open Data Portal

This data may vanish under Trump, so we charted it
Some of the most valuable #datasets in human history vanished from #US #government websites; it felt like watching the Library of Alexandria go up in smoke
Many have gone on record describing the #Census Bureau’s #American Community Survey as a wonder of the modern world
Another loss? The #HouseholdPulse survey, an online survey that provided week-by-week data on income losses, economic struggles and precarious mental health
washingtonpost.com/business/20
archive.ph/mB512

The Washington Post · This data may vanish under Trump, so we charted it · By Andrew Van Dam

"On Friday, numerous essential #datasets were #purged from federal agency websites, including #data from #CDC PLACES (Population Level Analysis and Community Estimates), the Social Vulnerability Index (SVI), and the Climate and Economic Justice Screening Tool (CEJST)—to name just a few. While we don’t know when or if this data will return, we want to assure you that they are still accessible on our platform." policymap.com/blog/purged-fede #PolicyMap #PublicHealth #USPol #Project2025 #CivilRights

PolicyMap · Purged Federal Agency Data Available on PolicyMap

#Archivists Work to Identify and Save the Thousands of #Datasets Disappearing From Data.gov

Datasets aggregated on data.gov, the largest repository of U.S. government open data on the internet, are being deleted, according to the website’s own information. Since Donald #Trump was inaugurated as president, more than 2,000 datasets have disappeared from the database.
#archive

404media.co/archivists-work-to

404 Media · Archivists Work to Identify and Save the Thousands of Datasets Disappearing From Data.gov · More than 2,000 datasets have disappeared from data.gov since Trump was inaugurated. But analyzing exactly what happened and where it went is going to take some time.

For The Love Of The Web. Posting Publicly Is Going To Get Used In Some Way

Sam Cole over at 404 Media wrote an article about a Hugging Face Machine Learning Librarian making a public data set of 1 million Bluesky posts available to everyone for Machine Learning.

People were, of course, outraged. After all, it’s the Internet. People thrive on being outraged, pissed off, and otherwise salty.

What people seem to miss is that what they’re posting on Bluesky is public and scrapable.

The way this guy made the data set was a bit sloppy and, in my opinion, irresponsible. He didn’t anonymize the data and left personally identifiable information in the data set. He also didn’t get consent from people first.

Yea, I agree it feels a bit icky that this was done, mostly without consent or anonymizing the data. But for the love of the Web, what you put online publicly is — PUBLIC. People will see it and possibly use it for whatever they want. How hard is this to grasp?

According to Sam’s article, this kind of collection is also in a legal gray area right now and is going through the courts around the world.

To give some credit to the librarian, he took down the data set after getting quite a bit of “feedback.” 😵‍💫😜

But that didn’t stop the trolls from making even bigger data sets and putting them out online.

I really do in fact understand why people are upset, but those posts are public. Don’t post stuff and expect it to be private when it’s PUBLIC!

Honestly, I’m fine with the content I post publicly being used to train LLMs and AI, because it will improve the technology that I benefit from.

I agree with Rand Fishkin, the founder of Moz and Sparktoro.

He posted on Bluesky:

I know others are probably upset about this, but LLM training is, for me, a benefit of participating in spaces like this. I *want* my word usage, brands, and content to be part of how AI answers questions in the future. Just like I wanted Google to index my websites.

— Rand Fishkin (@randfish.bsky.social) December 8, 2024 at 4:06 PM

I don’t think that’s a crazy desire. Right? Am I completely off-base? What do you think?

#AI #Bluesky #Data

"Some federal #datasets are nearly irreplaceable. Hurricane Helene helped drive that fact home in September, when it flooded much of western North Carolina and temporarily knocked NOAA’s NCEI headquarters in Asheville offline. Scientists found they were unable to complete certain kinds of analyses until the databases were back up and running."

scientificamerican.com/article

Scientific American · Scientists Scramble to Save Climate Data from Trump—Again · By Chelsea Harvey

#Datasets obtained and analyzed by Rappler show how the drug war pushed an already overwhelmed #PrisonSystem to its breaking point. #JailPopulations soared beyond the 100,000 mark, and drug defendants – and other defendants who became collateral damage – were held hostage by delayed trials. #Philippines

#Duterte’s #DrugWar pushes #prisons to a breaking point
rappler.com/newsbreak/investig

The Institute for Dissent and Datalove is a loose collective comprised of hackers, artists, activists and tinkerers. It overlaps with networks of solidarities involved in active defense of free speech and free/libre technologies, technology critics and political interventions.

The Institute for Dissent and Datalove has so far mostly been used for operations of de/re-contextualization of large datasets, de-formatting of formats and playful use of liberating algorithms.

It tries to criticize and deconstruct itself, while remaining grounded in uncompromising collective practices of autonomy and solidarity.

We even have a website: dissent-and-datalove.institute

#Trans identities are missing from our #datasets, meaning that their experiences in a number of domains cannot be studied quantitatively. The way in which many of our datasets are constructed reinforces a cis-normative understanding of the world, where people are pushed into the false #binary of describing themselves as either male or female.
#SocialScience #LGBTQ #sflorg
sflorg.com/2024/03/ss03292401.

www.sflorg.com · True trans visibility requires better data · Trans identities are missing from our datasets

In shock news, #AI is intrinsically #Racist, and it is only going to get worse as more and more #Data is scraped.

"...Many #Tech companies have operated under the assumption that training #ArtificialIntelligence on more data can help fix the ongoing problem of #AIs replicating #Human #Prejudices. But a study has found that AIs trained on increasingly larger #DataSets can produce even more racist results..."

#News #ChatBots #SiliconValley #IT #Tech #Technology

newscientist.com/article/23816

New Scientist · Using bigger AI training data sets may produce more racist results · By Jeremy Hsu

Comedian and author Sarah Silverman and authors Christopher Golden and Richard Kadrey are #suing #OpenAI and Meta, each in a #US #DistrictCourt, over dual claims of #copyrightinfringement.
The suits allege that #OpenAI’s #ChatGPT and Meta’s #LLaMA were trained on #illegally-acquired #datasets containing their works, which they say were acquired from “shadow library” websites like #Bibliotik, #LibraryGenesis, #ZLibrary, and others, noting the books are “available in bulk via torrent systems.” theverge.com/2023/7/9/23788741

The Verge · Sarah Silverman is suing OpenAI and Meta for copyright infringement. · By Wes Davis

Hey @Jkerski, I've been playing around with your #PowerBI #Datasets & #Dataflows refresh monitor report (kerski.tech/bringing-dataops-t), and I'm loving it so far! Thanks for sharing this powerful tool with the community!

One thing I haven't figured out yet is how to make it show me the results from ALL workspaces in my tenant, rather than just a handful of workspaces that match the "Workspace(s) Query" parameter. Any tips on how to make that work?

John Kerski’s Blog · Part 26: Bringing DataOps to Power BI · DataOps Principle #16 - Monitor Quality and Performance