@khobochka guess why I maintain a #Scraper #blocklist?
http://hil-speed.hetzner.com/10GB.bin
as an extra middle finger!

The #NewYorkTimes has blocked #OpenAI’s #webcrawler, meaning OpenAI can’t use content from the publication to train its AI models. If you check the NYT’s robots.txt, you can see that it disallows #GPTBot, the crawler OpenAI introduced earlier this month. Based on the #InternetArchive’s #WaybackMachine, it appears the NYT blocked the crawler as early as August 17th. https://www.theverge.com/2023/8/21/23840705/new-york-times-openai-web-crawler-ai-gpt #copyright #legalresearch
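If you want to check a site's block yourself, here is a minimal sketch using only Python's standard library robots.txt parser. The NYT URL and path are just illustrative, and the result depends on whatever the site happens to be serving when you fetch it.

from urllib import robotparser

# Point the stdlib parser at the site's robots.txt and fetch it.
rp = robotparser.RobotFileParser("https://www.nytimes.com/robots.txt")
rp.read()

# can_fetch() returns False when a "User-agent: GPTBot" / "Disallow: /" rule applies.
print(rp.can_fetch("GPTBot", "https://www.nytimes.com/"))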
#OpenAI IP block ranges, if you want to keep them off your instance and stop them scraping your content. I saw the Mastodon devs added something a few days ago to block #GPTBot via robots.txt. Here are the IP ranges (one way to use them is sketched after the links below):
20.15.240.64/28
20.15.240.80/28
20.15.240.96/28
20.15.240.176/28
20.15.241.0/28
20.15.242.128/28
20.15.242.144/28
20.15.242.192/28
40.83.2.64/28
https://openai.com/gptbot-ranges.txt
https://www.theverge.com/2023/8/7/23823046/openai-data-scrape-block-ai
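If you'd rather enforce the block at the application layer instead of (or in addition to) robots.txt, here's a minimal sketch using Python's ipaddress module to check a client IP against the ranges above. The is_gptbot_ip helper and the sample addresses are made up for illustration; wire it into whatever request handling you actually run.

import ipaddress

# The /28 blocks from https://openai.com/gptbot-ranges.txt (as listed above).
GPTBOT_RANGES = [ipaddress.ip_network(cidr) for cidr in (
    "20.15.240.64/28", "20.15.240.80/28", "20.15.240.96/28",
    "20.15.240.176/28", "20.15.241.0/28", "20.15.242.128/28",
    "20.15.242.144/28", "20.15.242.192/28", "40.83.2.64/28",
)]

def is_gptbot_ip(client_ip: str) -> bool:
    # True if the address falls inside any published GPTBot range.
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in GPTBOT_RANGES)

print(is_gptbot_ip("20.15.240.70"))  # True  -> return 403 / drop the request
print(is_gptbot_ip("203.0.113.5"))   # False -> serve normally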
Sites scramble to block ChatGPT web crawler after instructions emerge
Without announcement, OpenAI re... - https://arstechnica.com/?p=1960108 #machinelearning #webscraping #webcrawling #aiethics #chatgpt #chatgtp #biz #gptbot #openai #tech #ai