#ArchiveBox

I've mirrored a relatively simple website (redsails.org; it's mostly text, some images) for posterity via #wget. However, I also wanted to grab snapshots of any outlinks (of which there are many, as citations/references). I couldn't figure out a configuration where wget would do that out of the box without endlessly, recursively spidering the whole internet. I ended up making a kind-of poor man's #ArchiveBox instead:

# One wget job per URL in others.txt: hash the URL into a directory name,
# then mirror just that page (plus its page requisites, even cross-host)
# into that directory, keeping a log and a WARC alongside.
while IFS= read -r url; do
  dirname=$(echo "$url" | sha256sum | cut -d' ' -f1)
  mkdir -p "$dirname"
  wget --span-hosts --page-requisites --convert-links --backup-converted \
       --adjust-extension --tries=5 --warc-file="$dirname/$dirname" \
       --execute robots=off --wait=1 --waitretry=5 --timeout=60 \
       -o "$dirname/wget-$dirname.log" --directory-prefix="$dirname/" \
       "$url"
done < others.txt

Basically, there's a list of bookmarks^W URLs in others.txt that I grabbed from the initial mirror of the website with some #grep foo (sketched below). I want as good a mirror/snapshot of each URL as I can get, without spidering/mirroring endlessly all over. So I hash the URL and kick off a dedicated wget job for it that will span hosts, but only for the purpose of making that one URL as usable locally/offline as possible. I know from experience that this isn't perfect, but it'll be good enough for my purposes. I'm also stashing a WARC file for each job. Probably a bit overkill, but I figure it might be nice to have.
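For reference, the #grep foo was roughly this. A sketch, assuming the mirror lives in a redsails.org/ directory; the regex only catches double-quoted absolute hrefs:

grep -rhoE 'href="https?://[^"]+"' redsails.org/ \
  | sed -E 's/^href="//; s/"$//' \
  | grep -v '://redsails\.org' \
  | sort -u > others.txt

And if the WARCs ever need replaying, pywb can do it (wget compresses them to .warc.gz by default; the collection name here is illustrative):

pip install pywb
wb-manager init redsails-outlinks
wb-manager add redsails-outlinks */*.warc.gz
wayback   # serves the collection at http://localhost:8080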


of course, neofeudal lords are looking to #wikipedia and #internetArchive with arson in their hearts, as they always do with the great libraries

between this and the web continuing to enshittify with AI slop and a critical mass of advertising, it's probably time to start thinking about things in terms of offline-first

make local copies of resources that are important, get your personal content off of cloud providers, and archive everything you can

old phones, random flash drives, unused laptops - all of that can be put to good use as self-sovereign libraries. and if you have the financial means, seriously consider building or investing in a NAS

we have plenty of tools to make this possible:

kiwix is an offline reader for Wikipedia, Project Gutenberg, and several other online sources - there's even a method to turn a raspi into a hotspot that serves the archived content: kiwix.org/en/how-to-set-up-kiw
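for instance, once you've grabbed a ZIM file from download.kiwix.org/zim/, serving it on your LAN is one command (the filename here is illustrative):

kiwix-serve --port=8080 wikipedia_en_all_maxi.zim
# then point any browser on the network at http://<host>:8080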

youtube-dl is a program that you can use to download content from youtube, including full channels: ytdl-org.github.io/youtube-dl/
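e.g., something like this grabs a whole channel and remembers what it already fetched, so re-runs only pick up new uploads (the channel URL is a placeholder):

youtube-dl --download-archive downloaded.txt \
  -o '%(uploader)s/%(title)s-%(id)s.%(ext)s' \
  'https://www.youtube.com/c/SomeChannel'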

the Internet Archive also has a command line utility to bulk download content: archive.org/developers/interne
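the `ia` command ships with the internetarchive Python package; roughly (identifier and collection names are placeholders):

pip install internetarchive
ia download some-item-identifier                 # grab a single item
ia search 'collection:some-collection' --itemlist \
  | xargs -n1 ia download                        # bulk-download a collection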

and take a look at #archiveBox - a self-hosted project that takes in urls and downloads relevant content while stripping out all of the extra shit you don't need: archivebox.io/
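a minimal sketch of getting it running (urls.txt stands in for your own list, one URL per line):

pip install archivebox
mkdir my-archive && cd my-archive
archivebox init
archivebox add 'https://example.com'     # a single URL
archivebox add < urls.txt                # or a whole list from stdin
archivebox server 0.0.0.0:8000           # browse snapshots in a web UI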


surviving and healthy bookmark-archiver tools from the last time i looked into replacing Pocket include LinkAce and ArchiveBox. i don't see a clear winner based on two minutes of browsing docs, except on implementation-language vibes: ArchiveBox is Python, while LinkAce is PHP.
