**Calishat** @researchbuzz@researchbuzz.masto.host · 14h

#research #science #BigData #DataAnalysis #datasets

'Two hundred forty-six researchers in the fields of ecology and evolutionary biology — including two from Clemson University — worked in 174 teams to answer two different research questions based on the same unpublished data sets.

They came up with a strikingly variable range of answers, including some that were direct opposites of each other.'

https://news.clemson.edu/study-researchers-choices-could-result-in-different-conclusions-from-the-same-data/?utm_source=rss

Clemson News · 1d · Study: Researchers’ choices could result in different conclusions from the same data · The leaders of the study say the results ring true to a growing recognition among scientists that the many choices researchers must make — such as which statistical methods to apply — can lead to divergent conclusions even when the different options are all reasonable.

**Calishat** @researchbuzz@researchbuzz.masto.host · Mar 18

@simon Thank you so much for writing about this new resource. I'm not quite sure it's in his wheelhouse but @maproomblog might find it interesting. #maps #travel #datasets

https://sno.ws/opentimes/

sno.ws · OpenTimes: Free travel times between U.S. Census geographies · The blog, photos, and book reviews of Dan Snow

**Greg Cocks** @GregCocks@techhub.social · Mar 18

New Map Of Landscape Beneath Antarctica Unveiled
--
https://phys.org/news/2025-03-landscape-beneath-antarctica-unveiled.html <-- shared technical article
--
https://doi.org/10.1038/s41597-025-04672-y <-- shared paper
--

#GIS #spatial #mapping #Bedmap3 #icebed #surface #thickness #gridded #datasets #Antarctica #raster #model #modeling #landscape #elevation #icesheet #survey #remotesensing #earthobservation #climatechange #warming #climate #melt #melting #seafloor #subglacial #geophysical #survey #topography #geology #bathymetry #topobathy #BritishAntarcticSurvey
@BritishAntarcticSurvey

schematic / flow chart - Schematic showing the Bedmap3 source datasets (white boxes) combining to make intermediate products (blue boxes) and ultimately the final set of grids (orange boxes) and their uncertainties (yellow boxes). Note that the surface grid has a uniform estimated uncertainty (Section Uncertainty estimates).

maps - Bedmap3 grids of (a) bed topography and (b) surface elevation, in metres above sea level (g104c geoid), and (c) ice thickness in metres. Locations labelled in (b) are referred to in the text

The BedMachine v3 bed has some subjective advantages over Bedmap3 where glacier flow is relatively fast, but survey data are sparse. (a) Surface flow speed [43]; (b) BedMachine v3 bed topography; (c) Bedmap3 bed topography, both overlaid on shaded relief and with survey data shown as grey lines. The black dotted line highlights a trough under slow-flowing ice that is more smoothly continuous in Bedmap3. The white oval highlights the trough of a relatively fast-flowing ice stream that in BedMachine v3 is more smoothly streamlined and subjectively more realistic than in Bedmap3, given the typical streamlined form of deglaciated ice stream landscapes. An objective test of bed accuracy in this area requires more survey data.

maps - The Bedmap3 survey dataset allows us to calibrate the more continuous and extensive freeboard-based ice shelf thickness grid. (a) Pointwise ice thickness biases between 119,000 survey measurements and the freeboard-derived ice thickness grid [47] on the Ross Ice Shelf, showing coherent patterns of bias; (b) these pointwise offsets median-filtered over a 20 km radius; (c) spline-interpolated calibration grid of the filtered offsets in (b); (d) bias-corrected version of the freeboard-derived ice thickness grid after subtraction of the calibration grid in (c).

**Calishat** @researchbuzz@researchbuzz.masto.host · Mar 3 *

#Toronto #Canada #crowdsourcing #datasets #OpenAccess #OpenData

'...what if Toronto’s open data portal could go beyond simply offering city-owned data and start hosting datasets created by community organizations, researchers, and academic institutions? That’s the big idea the Open Data team has been exploring. Bringing community-generated data into the City’s open data portal could lead to positive impacts for civic engagement, innovation, and collaboration.'

https://open.toronto.ca/exploring-the-future-of-open-data-hosting-community-generated-datasets-on-torontos-open-data-portal/

open.toronto.ca · Exploring the Future of Open Data: Hosting Community-Generated Datasets on Toronto’s Open Data Portal

**Benjamin Carr, Ph.D.** @BenjaminHCCarr@hachyderm.io · Feb 18

This data may vanish under Trump, so we charted it
Some of most valuable #datasets in human history vanished from #US #government websites, felt like watching Library of Alexandria go up in smoke
Many have gone on record describing #Census Bureau’s #American Community Survey as wonder of modern world
Another loss? #HouseholdPulse survey, online survey that provided week-by-week data on income losses, economic struggles and precarious mental health
https://www.washingtonpost.com/business/2025/02/14/this-data-may-vanish-under-trump-so-we-charted-it/
https://archive.ph/mB512

The Washington Post · Feb 14 · This data may vanish under Trump, so we charted it · By Andrew Van Dam

**notes** @notes@social.coop · Feb 14

"On Friday, numerous essential #datasets were #purged from federal agency websites, including #data from #CDC PLACES (Population Level Analysis and Community Estimates), the Social Vulnerability Index (SVI), and the Climate and Economic Justice Screening Tool (CEJST)—to name just a few. While we don’t know when or if this data will return, we want to assure you that they are still accessible on our platform." https://www.policymap.com/blog/purged-federal-agency-data-available #PolicyMap #PublicHealth #USPol #Project2025 #CivilRights

PolicyMap · Feb 4 · Purged Federal Agency Data Available on PolicyMap · On Friday, numerous essential datasets were purged from federal agency websites, including data from CDC PLACES (Population Level Analysis and Community Estimates), the Social Vulnerability Index (SVI), and the Climate...

**PrivacyDigest** @PrivacyDigest@mas.to · Jan 30

#Archivists Work to Identify and Save the Thousands of #Datasets Disappearing From Data.gov

Datasets aggregated on data.gov, the largest repository of U.S. government open data on the internet, are being deleted, according to the website’s own information. Since Donald #Trump was inaugurated as president, more than 2,000 datasets have disappeared from the database.
#archive

https://www.404media.co/archivists-work-to-identify-and-save-the-thousands-of-datasets-disappearing-from-data-gov/

404 Media · Jan 30 · Archivists Work to Identify and Save the Thousands of Datasets Disappearing From Data.gov · More than 2,000 datasets have disappeared from data.gov since Trump was inaugurated. But analyzing exactly what happened and where it went is going to take some time.

**Games at Work dot biz** @gamesatwork_biz@mastodon.social · Dec 23, 2024

e494 with Michael, Michael and Andy on #digital #storage for a century, #AI #datasets, videos & new #Oreo flavors, #hacking digital #LicensePlates and #robots, and a whole lot more!

https://gamesatwork.biz/2024/12/23/e494-license-to-brick/

Games At Work dot Biz | Play games with us! · Dec 23, 2024 · e494 — License to Brick | Games At Work dot Biz

**Seth Goldstein** @seth@sethgoldstein.me · Dec 8, 2024 *

For The Love Of The Web. Posting Publicly Is Going To Get Used In Some Way

Sam Cole over at 404 Media wrote an article about a Hugging Face Machine Learning Librarian making a public data set of 1 million Bluesky posts available to everyone for Machine Learning.

People were of course outraged. After all, it’s the Internet. People thrive on being outraged, pissed off, and otherwise salty.

What people seem to miss is that what they’re posting on Bluesky is public and scrapable.

The way this guy made the data set was a bit sloppy and, in my opinion, irresponsible. He didn’t anonymize the data and left personally identifiable information in the data set. He also didn’t get consent from people first.

Yea, I agree it feels a bit icky that this was done, mostly without consent or anonymizing the data. But for the love of the Web, what you put online publicly is — PUBLIC. People will see it and possibly use it for whatever they want. How hard is this to grasp?

This collection, according to Sam’s article, is also in a legal gray area right now and is going through the courts around the world.

To give some credit to the librarian, he took down the data set after getting quite a bit of “feedback.”

But that didn’t stop the trolls from making even bigger data sets and putting them out online.

I really do in fact understand why people are upset, but those posts are public. Don’t post stuff and expect it to be private when it’s PUBLIC!

Honestly, I’m fine with the content that I post publicly being used to train LLMs and AI, because it will improve the technology that I benefit from.

I agree with Rand Fishkin, the founder of Moz and Sparktoro.

He posted on Bluesky:

I know others are probably upset about this, but LLM training is, for me, a benefit of participating in spaces like this. I *want* my word usage, brands, and content to be part of how AI answers questions in the future. Just like I wanted Google to index my websites.
— Rand Fishkin (@randfish.bsky.social) December 8, 2024 at 4:06 PM

I don’t think that’s a crazy desire. Right? Am I completely off-base? What do you think?

white cloud sky - Photo by Kumiko SHIMIZU on UnSplash.com

#AI #Bluesky #Data

**CelloMom On Cars** @CelloMomOnCars@mastodon.social · Nov 23, 2024

"Some federal #datasets are nearly irreplaceable. Hurricane Helene helped drive that fact home in September, when it flooded much of western North Carolina and temporarily knocked NOAA’s NCEI headquarters in Asheville offline. Scientists found they were unable to complete certain kinds of analyses until the databases were back up and running."

https://www.scientificamerican.com/article/scientists-scramble-to-save-climate-data-from-trump-again/

Scientific American · Nov 22, 2024 · Scientists Scramble to Save Climate Data from Trump—Again · By Chelsea Harvey

**openSUSE Linux** @opensuse@fosstodon.org · May 24, 2024

Have you checked out @opensuse's new space on @huggingface? Discover the #AI models & #datasets we already have for #cavil's #opensource #licensing. #collaborate & #contribute! #AI #openSUSE #HuggingFace https://news.opensuse.org/2024/05/24/osp-listed-as-org-on-ai-hugging-faces/

openSUSE News · openSUSE Project Listed as Organization on Hugging Face · The openSUSE Project has an official space on Hugging Face, which is a popular platform offering a range of open-source Artificial Intelligence models, tools...

**Norobiik @Norobiik@noc.social** @Norobiik@noc.social · May 19, 2024

#Datasets obtained and analyzed by Rappler show how the drug war pushed an already overwhelmed #PrisonSystem to its breaking point. #JailPopulations soared beyond the 100,000 mark, and drug defendants – and other defendants who became collateral damage – were held hostage by delayed trials. #Philippines

#Duterte’s #DrugWar pushes #prisons to a breaking point
https://www.rappler.com/newsbreak/investigative/duterte-drug-war-pushes-prisons-breaking-point/

CROWDED. The small facility of the Dasmariñas City Jail female dormitory is divided into three cells that cram 312 PDLs.

ULYSIS PONTANARES/RAPPLER

**Institute for Dissent&Datalove** @dissentanddatalove@post.lurk.org · May 7, 2024

The Institute for Dissent and Datalove is a loose collective comprised of hackers, artists, activists and tinkerers. It overlaps with networks of solidarities involved in active defense of free speech and free/libre technologies, technology critics and political interventions.

The Institute for Dissent and Datalove has so far mostly been used for operations of de/re-contextualization of large datasets, de-formatting of formats and playful use of liberating algorithms.

It tries to criticize and deconstruct itself, while remaining grounded in uncompromising collective practices of autonomy and solidarity.

We even have a website: https://dissent-and-datalove.institute/

zoomed-in header of the website of the Institute for Dissent and Datalove, displaying its name and sections of the site "the artworks" / "events/expos" / "about" / "contact" on a black background

#introduction #art #datasets

**Adrianna Tan** @skinnylatte@hachyderm.io · Apr 6, 2024

Someone I follow just launched this site. Looks pretty interesting.

https://www.dataforindia.com/

Data For India · Data For India · Insights, charts and data to create shared knowledge and expand our understanding of India

#India #Asia #Data

**Scientific Frontline** @sflorg@mastodon.social · Mar 29, 2024

#Trans identities are missing from our #datasets, meaning that their experiences in a number of domains cannot be studied quantitatively. The way in which many of our datasets are constructed reinforces a cis-normative understanding of the world, where people are pushed into the false #binary of describing themselves as either male or female.
#SocialScience #LGBTQ #sflorg
https://www.sflorg.com/2024/03/ss03292401.html

www.sflorg.com · True trans visibility requires better data · Trans identities are missing from our datasets

**ReDATA, University of Arizona** @redata@mastodon.social · Jan 30, 2024 *

ReDATA staff have carefully curated and published hundreds of #datasets and other materials from #uarizona researchers. Follow us to get updates on newly published submissions as well as updates on the service itself

**danmcquillan** @danmcquillan@kolektiva.social · Sep 27, 2023

Re-reading 'On the genealogy of machine learning datasets: A critical history of ImageNet' by @alexhanna. So clear the LLM debacle goes back to the start of the DL boom; it's data fetish, flat universalism, social illiteracy & contempt for workers https://journals.sagepub.com/doi/full/10.1177/20539517211035955
#AI #datasets #Imagenet #resistingAI

**Alexander Hay** @alexanderhay@mastodon.social · Jul 13, 2023

In shock news, #AI is intrinsically #Racist, and it is only going to get worse as more and more #Data is scraped.

"...Many #Tech companies have operated under the assumption that training #ArtificialIntelligence on more data can help fix the ongoing problem of #AIs replicating #Human #Prejudices. But a study has found that AIs trained on increasingly larger #DataSets can produce even more racist results..."

#News #ChatBots #SiliconValley #IT #Tech #Technology

https://www.newscientist.com/article/2381644-using-bigger-ai-training-data-sets-may-produce-more-racist-results/

New Scientist · Using bigger AI training data sets may produce more racist results · By Jeremy Hsu

**beSpacific** @bespacific@newsie.social · Jul 10, 2023

Comedian, author Sarah Silverman, authors Christopher Golden, Richard Kadrey are #suing #OpenAI & Meta each in #US #DistrictCourt over dual claims of #copyrightinfringement.
Suits allege that #OpenAI’s #ChatGPT and Meta’s #LLaMA were trained on #illegally-acquired #datasets containing their works, which they say were acquired from “shadow library” websites like #Bibliotik, #LibraryGenesis, #ZLibrary, and others, noting the books are “available in bulk via torrent systems.” https://www.theverge.com/2023/7/9/23788741/sarah-silverman-openai-meta-chatgpt-llama-copyright-infringement-chatbots-artificial-intelligence-ai

The Verge · Sarah Silverman is suing OpenAI and Meta for copyright infringement. · By Wes Davis

**James Bartlett** @JamesDBartlett3@techhub.social · May 5, 2023

Hey @Jkerski, I've been playing around with your #PowerBI #Datasets & #Dataflows refresh monitor report (https://www.kerski.tech/bringing-dataops-to-power-bi-part26/), and I'm loving it so far! Thanks for sharing this powerful tool with the community!

One thing I haven't figured out yet is how to make it show me the results from ALL workspaces in my tenant, rather than just a handful of workspaces that match the "Workspace(s) Query" parameter. Any tips on how to make that work?

John Kerski’s Blog · Part 26: Bringing DataOps to Power BI · DataOps Principle #16 - Monitor Quality and Performance
