a little tool I built to fight linkrot and save our sources from the memory hole → https://sij.law/deepciter

been trying to archive all outlinks from macwright.com with #archivebox and results are decidedly mixed: tasks keep getting stuck in a 'pending' state with no feedback on whether anything is actually happening.
I've mirrored a relatively simple website (redsails.org; it's mostly text, with some images) for posterity via #wget. However, I also wanted to grab snapshots of any outlinks (of which there are many, as citations/references). I couldn't figure out a configuration where wget would do that out of the box without endlessly, recursively spidering the whole internet, so I ended up making a kind-of poor man's #ArchiveBox instead:
for i in $(cat others.txt) ; do
  # name each snapshot directory after the hash of its URL
  dirname=$(echo "$i" | sha256sum | cut -d' ' -f 1)
  mkdir -p "$dirname"
  wget --span-hosts --page-requisites --convert-links --backup-converted \
    --adjust-extension --tries=5 --warc-file="$dirname/$dirname" \
    --execute robots=off --wait 1 --waitretry=5 --timeout 60 \
    -o "$dirname/wget-$dirname.log" --directory-prefix="$dirname/" "$i"
done
Basically, there's a list of bookmarks^W URLs in others.txt that I grabbed from the initial mirror of the website with some #grep foo. I want to make as good a mirror/snapshot of each specific URL as I can, without spidering/mirroring endlessly all over the web. So I hash each URL and kick off a dedicated wget job for it that will span hosts, but only to make that specific URL as usable locally/offline as possible. I know from experience that this isn't perfect, but it'll be good enough for my purposes. I'm also stashing a WARC file; probably a bit overkill, but I figure it might be nice to have.
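The #grep foo itself was nothing fancy; something along these lines will pull the external hrefs out of a mirrored tree and into others.txt (treat the paths and patterns as a rough sketch rather than the exact command I ran):
grep -rhoE 'href="https?://[^"]+"' redsails.org/ \
  | sed -E 's/^href="//; s/"$//' \
  | grep -v '://redsails\.org' \
  | sort -u > others.txt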
of course, neofeudal lords are looking to #wikipedia and #internetArchive with arson in their hearts, as they always do with the great libraries
between this and the web continuing to enshittify with AI slop and a critical mass of advertising, it's probably time to start thinking about things in terms of offline-first
make local copies of resources that are important, get your personal content off of cloud providers, and archive everything you can
old phones, random flash drives, unused laptops - all of that can be put to good use as self-sovereign libraries. and if you have the financial means, seriously consider building or investing in a NAS
we have plenty of tools to make this possible:
kiwix is an offline reader for Wikipedia, Project Gutenberg, and several other online sources - there's even a method to turn a raspi into a hotspot that serves the archived content: https://kiwix.org/en/how-to-set-up-kiwix-hotspot/
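a minimal sketch of what serving a kiwix archive yourself looks like (the ZIM filename here is just an example; browse https://download.kiwix.org/zim/ for current builds):
wget https://download.kiwix.org/zim/wikipedia/wikipedia_en_all_mini_2024-01.zim
kiwix-serve --port 8080 wikipedia_en_all_mini_2024-01.zim
# then point a browser at http://localhost:8080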
youtube-dl is a program that you can use to download content from youtube, including full channels: https://ytdl-org.github.io/youtube-dl/
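e.g. to mirror an entire channel into per-uploader folders and skip anything already grabbed (the channel URL is a placeholder):
youtube-dl \
  --download-archive downloaded.txt \
  -o '%(uploader)s/%(upload_date)s - %(title)s.%(ext)s' \
  'https://www.youtube.com/c/SomeChannel/videos'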
the Internet Archive also has a command line utility to bulk download content: https://archive.org/developers/internetarchive/cli.html
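a sketch of chaining search and bulk download with the ia tool (the collection name is just an example):
ia search 'collection:librivoxaudio' --itemlist > items.txt
ia download --itemlist=items.txt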
and take a look at #archiveBox - a self-hosted project that takes in urls and downloads relevant content while stripping out all of the extra shit you don't need: https://archivebox.io/
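basic usage is only a few commands; roughly, assuming a fresh directory and a urls.txt of links you care about:
mkdir archive && cd archive
archivebox init
archivebox add < urls.txt          # or: archivebox add 'https://example.com'
archivebox server 0.0.0.0:8000     # web UI for browsing the snapshots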
surviving and healthy bookmark-archiver tools from the last time i looked into replacing Pocket include LinkAce and ArchiveBox. after two minutes of browsing docs i don't see a clear winner, except on implementation-language vibes: ArchiveBox is Python, while LinkAce is PHP.