#robotstxt

Kevin Karhan :verified:<p><span class="h-card" translate="no"><a href="https://daedal.io/@thomzane" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>thomzane</span></a></span> the problem with <a href="https://robotstxt.org" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="">robotstxt.org</span><span class="invisible"></span></a> / <a href="https://infosec.space/tags/robotstxt" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>robotstxt</span></a> is that <a href="https://social.tchncs.de/@cgudrian/114980303354716730" rel="nofollow noopener" target="_blank">it's a mere <em>ask</em></a> and not something enforced.</p><ul><li>And with <em>"enforced"</em> I mean <code>.htaccess</code> or even better <a href="https://infosec.space/tags/iptables" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>iptables</span></a> at the hoster's site.</li></ul><p>Personally I tend to go <em>full "asshole mode"</em> on <a href="https://infosec.space/tags/hostile" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>hostile</span></a> actors like <a href="https://infosec.space/tags/NSAbook" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>NSAbook</span></a>!</p>
Kevin Karhan :verified:<p><span class="h-card" translate="no"><a href="https://ohai.social/@lina" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>lina</span></a></span> np. </p><p>I think it's important to highlight such projects like <a href="https://cuiiliste.de" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="">cuiiliste.de</span><span class="invisible"></span></a> and actually point people at it in the hopes that <span class="h-card" translate="no"><a href="https://mastodon.social/@ooni" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>ooni</span></a></span> and <span class="h-card" translate="no"><a href="https://mastodon.social/@citizenlab" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>citizenlab</span></a></span> add it to their <a href="https://github.com/citizenlab/test-lists/" rel="nofollow noopener" target="_blank">test lists...</a></p><ul><li>Personally I did add it to my <a href="https://github.com/greyhat-academy/lists.d/blob/cacf5b9fbd80affd34d760c50b0b7333def000bf/blocklists.list.tsv#L28" rel="nofollow noopener" target="_blank">blocklist lists</a> not as an endorsement [far from it!] but as a point of reference for <a href="https://infosec.space/tags/CUII" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>CUII</span></a>... 
</li></ul><p>Maybe someday I (or someone else) can get around to making a <a href="https://infosec.space/tags/VirusTotal" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>VirusTotal</span></a>-style <a href="https://infosec.space/tags/URL" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>URL</span></a> / <a href="https://infosec.space/tags/Domain" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Domain</span></a> checker that provides more than a mere <em>"is clean"</em> / <em>"is sus"</em> / <em>"is dangerous"</em> assessment and instead gives more nuanced answers like:</p><ul><li><em>"This domain is blocked by CUII for alleged copyright infringement"</em> </li></ul><p>or</p><ul><li><em>"This domain belongs to <a href="https://infosec.space/tags/NSAbook" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>NSAbook</span></a>"</em> </li></ul><p>and allows for granular, informed decisions (e.g. allowing access to it as a client, but blocking the entire <a href="https://infosec.space/tags/ASN" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ASN</span></a> from accessing one's servers and thus blocking non-consensual <a href="https://infosec.space/tags/Scrapers" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Scrapers</span></a> that violate <a href="https://infosec.space/tags/RobotsTXT" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>RobotsTXT</span></a>.) 
</p><ul><li>I'm just shit at programming anything with a <a href="https://infosec.space/tags/GUI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>GUI</span></a> but hey, I'm just a <a href="https://infosec.space/tags/Sysadmin" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Sysadmin</span></a> not a <a href="https://infosec.space/tags/Frontend" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Frontend</span></a> person...</li></ul>
Thom Zane<p>How do we get Crawl-delay into the RFC 9309 robots.txt standard? Big tech hides behind RFC 9309 as an excuse to ignore Crawl-delay because they want to crawl as fast as they want.</p><p><a href="https://daedal.io/tags/robotstxt" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>robotstxt</span></a> <a href="https://daedal.io/tags/sysadmin" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>sysadmin</span></a></p>
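<p>For context on the question above: RFC 9309 does not define Crawl-delay, but some parsers read it as an extension. A minimal sketch, assuming Python's standard <code>urllib.robotparser</code> and a made-up robots.txt (the file contents and the <code>ExampleBot</code> name are illustrative only), of how a cooperative crawler could look the value up:</p>

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for illustration.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /private/
"""


def read_crawl_delay(robots_txt: str, user_agent: str):
    """Return the Crawl-delay (seconds) that applies to user_agent, or None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


# "ExampleBot" is a placeholder crawler name, not a real product.
delay = read_crawl_delay(ROBOTS_TXT, "ExampleBot")
print(f"Wait {delay} seconds between requests")
```

<p>Nothing here forces a crawler to honor the value; it only shows that reading the directive is cheap for anyone who chooses to.</p>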
teufelswerk<p>Semrush is one of the best-known SEO analysis tools on the market. It regularly crawls websites with its bot (SemrushBot) to collect and analyze data such as keywords, backlinks, rankings, and much more from your website. Here are 5 effective, quick-to-implement methods for blocking Semrush from your website. 👇 </p><p><a href="https://teufelswerk.net/semrushbot-blockieren-so-schuetzt-du-jede-website-egal-ob-wordpress-joomla-typo3-oder-statisch/" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">teufelswerk.net/semrushbot-blo</span><span class="invisible">ckieren-so-schuetzt-du-jede-website-egal-ob-wordpress-joomla-typo3-oder-statisch/</span></a></p><p><a href="https://social.tchncs.de/tags/SEO" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>SEO</span></a> <a href="https://social.tchncs.de/tags/semrush" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>semrush</span></a> <a href="https://social.tchncs.de/tags/botblocker" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>botblocker</span></a> <a href="https://social.tchncs.de/tags/bots" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>bots</span></a> <a href="https://social.tchncs.de/tags/website" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>website</span></a> <a href="https://social.tchncs.de/tags/websecurity" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>websecurity</span></a> <a href="https://social.tchncs.de/tags/cybersecurity" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>cybersecurity</span></a> <a href="https://social.tchncs.de/tags/wordpress" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>wordpress</span></a> <a href="https://social.tchncs.de/tags/joomla" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>joomla</span></a> <a href="https://social.tchncs.de/tags/typo3" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>typo3</span></a> <a href="https://social.tchncs.de/tags/nginx" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>nginx</span></a> <a href="https://social.tchncs.de/tags/robotstxt" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>robotstxt</span></a> <a href="https://social.tchncs.de/tags/htaccess" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>htaccess</span></a></p>
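<p>The post does not list the article's five methods. As a sketch of one server-side approach, assuming the site is a Python WSGI application (the <code>block_user_agents</code>, <code>demo_app</code>, and <code>status_for</code> names are hypothetical), requests can be refused by User-Agent instead of merely asked to stay away:</p>

```python
def block_user_agents(app, blocked=("SemrushBot",)):
    """Wrap a WSGI app and answer 403 when the User-Agent contains a blocked name."""
    needles = tuple(name.lower() for name in blocked)

    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(needle in user_agent for needle in needles):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everyone else reaches the wrapped application unchanged.
        return app(environ, start_response)

    return middleware


def demo_app(environ, start_response):
    """Stand-in application used only for this example."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


def status_for(app, user_agent):
    """Call the app once with a fake request and return the status line."""
    seen = []

    def start_response(status, headers):
        seen.append(status)

    b"".join(app({"HTTP_USER_AGENT": user_agent}, start_response))
    return seen[0]


protected = block_user_agents(demo_app)
print(status_for(protected, "Mozilla/5.0 (compatible; SemrushBot)"))
print(status_for(protected, "Mozilla/5.0 Firefox"))
```

<p>A User-Agent header is self-reported, so this only stops bots that identify themselves honestly.</p>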
Christian Gudrian<p>“robots.txt” is like leaving your front door open with a “please don’t steal here” sign next to it. </p><p><a href="https://social.tchncs.de/tags/WebDev" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>WebDev</span></a> <a href="https://social.tchncs.de/tags/robotstxt" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>robotstxt</span></a></p>
Jonathan Bailey<p>I asked ChatGPT about the recent copyright news. It rehashed my latest column and misconstrued the facts. But why was it on my site at all?</p><p><a href="https://www.plagiarismtoday.com/2025/07/23/chatgpt-ignores-robots-txt-rehashes-my-column/" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">plagiarismtoday.com/2025/07/23</span><span class="invisible">/chatgpt-ignores-robots-txt-rehashes-my-column/</span></a></p><p><a href="https://mastodon.world/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AI</span></a> <a href="https://mastodon.world/tags/ChatGPT" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ChatGPT</span></a> <a href="https://mastodon.world/tags/OpenAI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>OpenAI</span></a> <a href="https://mastodon.world/tags/RobotsTxt" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>RobotsTxt</span></a></p>
George E. 🇺🇸♥🇺🇦🇵🇸🏳️‍🌈🏳️‍⚧️<p>Here's <a href="https://bofh.social/tags/Cloudflare" rel="nofollow noopener" target="_blank">#Cloudflare</a>'s <a href="https://bofh.social/tags/robots-txt" rel="nofollow noopener" target="_blank">#robots-txt</a><span> file:<br></span></p><pre><code># Cloudflare Managed Robots.txt to block AI related bots.
User-agent: AI2Bot
Disallow: /
User-agent: Amazonbot
Disallow: /
User-agent: amazon-kendra
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: Applebot
Disallow: /
User-agent: Applebot-Extended
Disallow: /
User-agent: AwarioRssBot
Disallow: /
User-agent: AwarioSmartBot
Disallow: /
User-agent: bigsur.ai
Disallow: /
User-agent: Brightbot
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: Diffbot
Disallow: /
User-agent: DigitalOceanGenAICrawler
Disallow: /
User-agent: DuckAssistBot
Disallow: /
User-agent: FacebookBot
Disallow: /
User-agent: FriendlyCrawler
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: GPTBot
Disallow: /
User-agent: iaskspider/2.0
Disallow: /
User-agent: ICC-Crawler
Disallow: /
User-agent: img2dataset
Disallow: /
User-agent: Kangaroo Bot
Disallow: /
User-agent: LinerBot
Disallow: /
User-agent: MachineLearningForPeaceBot
Disallow: /
User-agent: Meltwater
Disallow: /
User-agent: meta-externalagent
Disallow: /
User-agent: meta-externalfetcher
Disallow: /
User-agent: Nicecrawler
Disallow: /
User-agent: OAI-SearchBot
Disallow: /
User-agent: omgili
Disallow: /
User-agent: omgilibot
Disallow: /
User-agent: PanguBot
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: Perplexity-User
Disallow: /
User-agent: PetalBot
Disallow: /
User-agent: PiplBot
Disallow: /
User-agent: QualifiedBot
Disallow: /
User-agent: Scoop.it
Disallow: /
User-agent: Seekr
Disallow: /
User-agent: SemrushBot-OCOB
Disallow: /
User-agent: Sidetrade indexer bot
Disallow: /
User-agent: Timpibot
Disallow: /
User-agent: VelenPublicWebCrawler
Disallow: /
User-agent: Webzio-Extended
Disallow: /
User-agent: YouBot
Disallow: /</code></pre><p></p>
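<p>A file like the one above is repetitive enough to generate. A minimal sketch, assuming you maintain your own list of bot names (the three names below are taken from the list above; <code>build_robots_txt</code> is a hypothetical helper), that builds the file and checks it with Python's standard <code>urllib.robotparser</code>:</p>

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(bot_names):
    """Return robots.txt text that disallows every path for each named bot."""
    groups = [f"User-agent: {name}\nDisallow: /" for name in bot_names]
    return "\n\n".join(groups) + "\n"


# A short sample of names; extend the list as needed.
AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot"]
robots_txt = build_robots_txt(AI_BOTS)

# Parse the generated text to confirm it says what we intended.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(robots_txt)
print(parser.can_fetch("GPTBot", "https://example.com/page"))
```

<p>The check only confirms what the file requests; whether a given crawler complies is a separate matter.</p>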
smeg<p>I've had the robots.txt to block ChatGPT from touching my site in place for months. Yet it's a referrer?</p><p><a href="https://assortedflotsam.com/tags/chatgpt" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>chatgpt</span></a> <a href="https://assortedflotsam.com/tags/llm" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>llm</span></a> <a href="https://assortedflotsam.com/tags/privacy" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>privacy</span></a> <a href="https://assortedflotsam.com/tags/robotstxt" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>robotstxt</span></a></p>
zeyus :gopnik_implant_b:‎<p>Hey, does anyone know if there's still a working zip-bomb-style exploit that can be deployed on a static site/JS (or as an asset/resource)? Specifically to target web scrapers and AI bullshit? The second any server goes online now, it's immediately bombarded by stupid numbers of requests.</p><p><a href="https://corteximplant.com/tags/hacking" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>hacking</span></a> <a href="https://corteximplant.com/tags/aislop" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>aislop</span></a> <a href="https://corteximplant.com/tags/crawlers" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>crawlers</span></a> <a href="https://corteximplant.com/tags/webscraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>webscraping</span></a> <a href="https://corteximplant.com/tags/webcrawler" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>webcrawler</span></a> <a href="https://corteximplant.com/tags/robots" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>robots</span></a> <a href="https://corteximplant.com/tags/zipbomb" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>zipbomb</span></a> <a href="https://corteximplant.com/tags/zipbombing" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>zipbombing</span></a> <a href="https://corteximplant.com/tags/exploit" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>exploit</span></a> <a href="https://corteximplant.com/tags/robotstxt" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>robotstxt</span></a> <a href="https://corteximplant.com/tags/server" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>server</span></a> <a href="https://corteximplant.com/tags/scraper" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>scraper</span></a></p>
Dawn Tåke 🌙 :sparkletrans:<p>Hi, got a question.</p><p>Is there a standard for Anti-AI/Anti-SEO etc robots.txt file? Or a trustworthy site that explains how to build one if prefab isn't available? Is there anything else I should consider? </p><p>Thanks.</p><p><a href="https://tech.lgbt/tags/AskFedi" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AskFedi</span></a> <a href="https://tech.lgbt/tags/TechHelp" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>TechHelp</span></a> <a href="https://tech.lgbt/tags/RobotsTXT" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>RobotsTXT</span></a> <a href="https://tech.lgbt/tags/RobotsDotTXT" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>RobotsDotTXT</span></a></p>
Preston Maness ☭<p><a href="https://www.tiktok.com/@alberta.nyc/video/7465916806939659563?lang=en" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">tiktok.com/@alberta.nyc/video/</span><span class="invisible">7465916806939659563?lang=en</span></a></p><p><a href="https://tenforward.social/tags/DeepSeek" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DeepSeek</span></a> <a href="https://tenforward.social/tags/ai" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ai</span></a> <a href="https://tenforward.social/tags/OpenAI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>OpenAI</span></a> <a href="https://tenforward.social/tags/China" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>China</span></a> <a href="https://tenforward.social/tags/robotstxt" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>robotstxt</span></a></p>
Kevin Karhan :verified:<p><span class="h-card" translate="no"><a href="https://infosec.space/@fennix" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>fennix</span></a></span> the fact that neither <span class="h-card" translate="no"><a href="https://social.bund.de/@bsi" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>bsi</span></a></span> nor <span class="h-card" translate="no"><a href="https://ec.social-network.europa.eu/@EUCommission" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>EUCommission</span></a></span> makes honoring <a href="https://infosec.space/tags/RobotsTXT" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>RobotsTXT</span></a> legally mandatory under penalty of fines <em>and</em> forced disconnects is a problem.</p><p><a href="https://infosec.space/tags/WhatYouAllowIsWhatWillContinue" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>WhatYouAllowIsWhatWillContinue</span></a> applies here, and I know some folks intend to literally ban entire ASNs for hosting crawlers, because those literally <a href="https://infosec.space/tags/DDoS" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DDoS</span></a> sites offline, and criminally incompetent, value-removing middlemen like <a href="https://infosec.space/tags/ClownFlare" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ClownFlare</span></a> <a href="https://www.youtube.com/watch?v=Hi5sd3WEh0c" rel="nofollow noopener" target="_blank">do jack shit about it even when tasked to do so.</a></p><p><a href="https://infosec.space/tags/sarcasm" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>sarcasm</span></a> <a href="https://infosec.space/tags/vent" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>vent</span></a> <a href="https://infosec.space/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AI</span></a> <a href="https://infosec.space/tags/LLM" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>LLM</span></a> <a href="https://infosec.space/tags/Enshittification" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Enshittification</span></a></p>
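<p>Banning an entire ASN comes down to refusing every address inside the prefixes that network announces. A minimal sketch, assuming you have already obtained that prefix list from some routing data source (the ranges below are reserved documentation prefixes standing in for a hypothetical ASN, and <code>is_blocked</code> is an illustrative helper):</p>

```python
import ipaddress

# Documentation-only ranges (RFC 5737 and RFC 3849), used here as stand-ins
# for the prefixes announced by a hypothetical ASN.
BLOCKED_PREFIXES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_blocked(client_ip: str, prefixes=BLOCKED_PREFIXES) -> bool:
    """Return True if client_ip falls inside any blocked prefix."""
    address = ipaddress.ip_address(client_ip)
    # Membership is False automatically when IP versions differ.
    return any(address in network for network in prefixes)


for ip in ("203.0.113.7", "198.51.100.1", "2001:db8::1"):
    print(ip, "blocked" if is_blocked(ip) else "allowed")
```

<p>In practice the same list would more likely be loaded into a firewall than checked per request in application code; the lookup logic is the same either way.</p>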
Kevin Karhan :verified:<p><span class="h-card" translate="no"><a href="https://mastodon.neilzone.co.uk/@neil" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>neil</span></a></span> <span class="h-card" translate="no"><a href="https://kind.social/@ThreeGerbilsInACoat" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>ThreeGerbilsInACoat</span></a></span> also note that <a href="https://infosec.space/tags/InternetArchive" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>InternetArchive</span></a> disregards the <a href="https://infosec.space/tags/RobotsTXT" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>RobotsTXT</span></a> file...</p>
Ecologia Digital<p><a href="https://mato.social/tags/Robotstxt" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Robotstxt</span></a> <a href="https://mato.social/tags/CrawlerBacklash" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>CrawlerBacklash</span></a> Trickle-down effects: "people start blocking all crawlers, and some crawlers are very important, for search indexing, internet archiving, some are used for academic research, and so the bad behaviours of all these <a href="https://mato.social/tags/AIcompanies" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AIcompanies</span></a>, and the backlash to it, is kind of fundamentally changing how the Internet works, how it is remembered and indexed..."<br><a href="https://pca.st/yto6v3il?t=11m34s" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="">pca.st/yto6v3il?t=11m34s</span><span class="invisible"></span></a></p>
🧿🪬🍄🌈🎮💻🚲🥓🎃💀🏴🛻🇺🇸<p>Y'all really putting a file on your webserver that says "don't look *here* if you're a bot!" and expecting people not to look there first 🤡 </p><p><a href="https://mastodon.social/tags/webDev" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>webDev</span></a> <a href="https://mastodon.social/tags/robotsTxt" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>robotsTxt</span></a> <a href="https://mastodon.social/tags/bots" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>bots</span></a> <a href="https://mastodon.social/tags/search" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>search</span></a> <a href="https://mastodon.social/tags/ai" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ai</span></a> <a href="https://mastodon.social/tags/llm" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>llm</span></a> <a href="https://mastodon.social/tags/scraper" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>scraper</span></a> <a href="https://mastodon.social/tags/privacy" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>privacy</span></a> <a href="https://mastodon.social/tags/web" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>web</span></a></p>
Ecologia Digital<p>"…the <a href="https://mato.social/tags/backlash" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>backlash</span></a> to AI tools from content creators and website owners who do not want their work to be used for AI training purposes without permission or compensation is not only real but is becoming increasingly widespread. The analysis also highlights the limitations of robots.txt—while many companies respect robots.txt instructions, some do not. Perplexity have been caught circumventing &amp; ignoring <a href="https://mato.social/tags/robotstxt" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>robotstxt</span></a>."</p><p><a href="https://www.404media.co/the-backlash-against-ai-scraping-is-real-and-measurable/" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">404media.co/the-backlash-again</span><span class="invisible">st-ai-scraping-is-real-and-measurable/</span></a></p>
Ecologia Digital<p>"…researchers estimate that in the 3 data sets—called C4, RefinedWeb and Dolma—5% of all data, and 25% of data from the highest-quality sources, has been restricted…set up through the <a href="https://mato.social/tags/RobotsExclusionProtocol" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>RobotsExclusionProtocol</span></a>, a method for website owners to prevent automated bots from crawling their pages using a file called <a href="https://mato.social/tags/robotstxt" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>robotstxt</span></a>."</p><p><a href="https://www.nytimes.com/2024/07/19/technology/ai-data-restrictions.html?unlocked_article_code=1.8k0.8eMA.cGAaZ0i10aZE&amp;smid=nytcore-ios-share&amp;referringSource=articleShare" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">nytimes.com/2024/07/19/technol</span><span class="invisible">ogy/ai-data-restrictions.html?unlocked_article_code=1.8k0.8eMA.cGAaZ0i10aZE&amp;smid=nytcore-ios-share&amp;referringSource=articleShare</span></a></p>
🧿🪬🍄🌈🎮💻🚲🥓🎃💀🏴🛻🇺🇸<p>Web scrapers work by finding URLs in a page and then visiting those URLs to find more URLs recursively.</p><p>What's stopping us from serving them infinite trees of URLs, filled with random garbage?</p><p><a href="https://mastodon.social/tags/webDev" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>webDev</span></a> <a href="https://mastodon.social/tags/robotstxt" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>robotstxt</span></a> <a href="https://mastodon.social/tags/scrapers" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>scrapers</span></a> <a href="https://mastodon.social/tags/scraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>scraping</span></a></p>
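<p>The "infinite tree of URLs" idea needs no storage if each page's links are derived from its own path. A minimal sketch, assuming a hypothetical <code>/maze</code> prefix that your server routes to this function (the <code>maze_page</code> name and its parameters are illustrative):</p>

```python
import hashlib


def maze_page(path: str, links_per_page: int = 5) -> str:
    """Return an HTML fragment whose links all lead one level deeper."""
    base = path.rstrip("/")
    items = []
    for index in range(links_per_page):
        # Hash the parent path so the same URL always yields the same children.
        token = hashlib.sha256(f"{path}#{index}".encode()).hexdigest()[:12]
        items.append(f'<li><a href="{base}/{token}">{token}</a></li>')
    return "<ul>" + "".join(items) + "</ul>"


# Every generated link can be fed back in as a new path, without end.
print(maze_page("/maze"))
```

<p>Listing the prefix under <code>Disallow</code> in robots.txt would keep compliant crawlers out, so only bots that ignore the file wander in. Serving the pages still costs your own bandwidth and CPU.</p>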
🧿🪬🍄🌈🎮💻🚲🥓🎃💀🏴🛻🇺🇸<p>Blocking robots.txt is not very cyberpunk</p><p><a href="https://mastodon.social/tags/ai" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ai</span></a> <a href="https://mastodon.social/tags/llm" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>llm</span></a> <a href="https://mastodon.social/tags/cyberpunk" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>cyberpunk</span></a> <a href="https://mastodon.social/tags/openAI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>openAI</span></a> <a href="https://mastodon.social/tags/perplexity" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>perplexity</span></a> <a href="https://mastodon.social/tags/anthropic" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>anthropic</span></a> <a href="https://mastodon.social/tags/gemini" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>gemini</span></a> <a href="https://mastodon.social/tags/claude" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>claude</span></a> <a href="https://mastodon.social/tags/stableDiffusion" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>stableDiffusion</span></a> <a href="https://mastodon.social/tags/aiArt" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>aiArt</span></a> <a href="https://mastodon.social/tags/llms" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>llms</span></a> <a href="https://mastodon.social/tags/chatGPT" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>chatGPT</span></a> <a href="https://mastodon.social/tags/copilot" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>copilot</span></a> <a href="https://mastodon.social/tags/robots" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>robots</span></a> <a 
href="https://mastodon.social/tags/robotstxt" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>robotstxt</span></a> <a href="https://mastodon.social/tags/scraper" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>scraper</span></a> <a href="https://mastodon.social/tags/search" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>search</span></a> <a href="https://mastodon.social/tags/google" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>google</span></a> <a href="https://mastodon.social/tags/crawler" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>crawler</span></a></p>
🧿🪬🍄🌈🎮💻🚲🥓🎃💀🏴🛻🇺🇸<p>AI scraped all your photos so that I could look up the names of flowers which completely justifies the scraping in my mind.</p><p><a href="https://mastodon.social/tags/robotstxt" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>robotstxt</span></a> <a href="https://mastodon.social/tags/ai" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ai</span></a> <a href="https://mastodon.social/tags/llm" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>llm</span></a> <a href="https://mastodon.social/tags/stableDiffusion" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>stableDiffusion</span></a> <a href="https://mastodon.social/tags/openai" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>openai</span></a> <a href="https://mastodon.social/tags/copilot" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>copilot</span></a> <a href="https://mastodon.social/tags/gemini" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>gemini</span></a> <a href="https://mastodon.social/tags/ip" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ip</span></a> <a href="https://mastodon.social/tags/copyright" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>copyright</span></a></p>