This makes it much easier to save content to the Wayback Machine:
[WayBack] GitHub – pastpages/savepagenow: A simple Python wrapper for archive.org’s “Save Page Now” capturing service
A poor man's alternative is the bash script below, from [WayBack] Saving of public Google+ content at the Internet Archive's Wayback Machine by the Archive Team has begun : plexodus:
For Linux, macOS / OS X, BSD, and other Unix-like operating systems (including Android with Termux, or Windows with a Unix/Linux environment), the following script (I've saved this as archive-url) will archive the requested URL:
#!/bin/bash
# archive-url
# Archive selected URL at the Internet Archive.
# Request a capture, then extract the archived URL from the x-cache-key
# response header: the sed re-inserts "://" after "https" and trims
# everything after the requested URL.
curl -s -I -H "Accept: application/json" "https://web.archive.org/save/${1}" |
  grep '^x-cache-key:' | sed "s,https,&://,; s,\(${1}\).*$,\1,"
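To see what the grep/sed pipeline does without a network round trip, here is a sketch that feeds it a fabricated x-cache-key header (an assumption for illustration; the real header the Wayback Machine returns may differ in its details):

```shell
#!/bin/sh
# Offline illustration of the header rewrite in archive-url, using a
# made-up x-cache-key value standing in for the curl response.
url="https://plus.google.com/communities/112164273001338979772"
header="x-cache-key: httpsweb.archive.org/web/20190402/${url}XX"
echo "$header" |
  grep '^x-cache-key:' | sed "s,https,&://,; s,\(${url}\).*$,\1,"
# Prints: x-cache-key: https://web.archive.org/web/20190402/https://plus.google.com/communities/112164273001338979772
```

The first sed expression turns the leading `https` into `https://` (the header value arrives without the separator), and the second discards everything after the first occurrence of the URL you asked for, leaving the archived snapshot address.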
Save that to a directory on your execution path (I've chosen ~/bin; you might use /usr/local/bin or another location on your $PATH), and invoke it as, say (again referring to the G+MM homepage):
$ archive-url https://plus.google.com/communities/112164273001338979772
If you have a list of URLs in a file (or piped in from command output), you can request that all of them be archived in one go. I'm using xargs here to run ten simultaneous requests from the file gplus-urllist:
xargs -I{} -P 10 archive-url {} < gplus-urllist
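Since that pipeline hits the live Save Page Now endpoint, a dry run is useful for checking the xargs fan-out first. Here is a sketch with echo standing in for archive-url, and a throwaway URL list in place of the real gplus-urllist file:

```shell
#!/bin/sh
# Dry run of the xargs fan-out: echo stands in for archive-url, so no
# requests are actually sent. Each input line becomes one invocation.
printf '%s\n' \
  "https://plus.google.com/communities/112164273001338979772" \
  "https://plus.google.com/+example" > /tmp/urls-dryrun
xargs -I{} -P 10 echo "archive-url {}" < /tmp/urls-dryrun
```

With -P 10, up to ten invocations run concurrently, so the output order is not guaranteed to match the input order; for real archiving runs that doesn't matter, since each request is independent.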
I’ve run this on over 10,000 URLs over a modest residential broadband connection in a hair over two hours.
Note that such requests trigger an archive by the Internet Archive from one of its archiving nodes; you're not sending the page to the Archive yourself. In particular, archiving from regions defaulting to another language may result in the Google+ site content (but not posts or comments) being in a different language. I've frequently seen my pages turn up in Japanese, for instance.
–jeroen