Archive for the ‘WayBack machine’ Category
Posted by jpluimers on 2021/09/14
Besides manual upload at [Archive.is] Upload to Internet Archive, there are also automated ways of uploading content.
One day I will need this to archive pages or sites into the WayBack machine; see [WayBack] Overview of Client Libraries · Internet Archive (most of them are Python based).
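Archiving a single page needs no client library: the Save Page Now endpoint takes `https://web.archive.org/save/` followed by the URL to archive. Below is a minimal standard-library sketch. It assumes a plain GET against that endpoint is enough, as it is from a browser. The endpoint is rate limited, so the client libraries from the overview may be more robust. The function names and the User-Agent string are hypothetical.

```python
import urllib.request

# Public Save Page Now prefix; the helper names below are hypothetical.
SAVE_PAGE_NOW_PREFIX = "https://web.archive.org/save/"


def save_page_now_url(url: str) -> str:
    """Return the Save Page Now URL that asks the WayBack machine to archive `url`."""
    return SAVE_PAGE_NOW_PREFIX + url


def save_page_now(url: str, timeout: float = 120.0) -> str:
    """Request an archive of `url` and return the final URL after redirects.

    Assumption: a plain GET is sufficient. Network and HTTP errors propagate.
    """
    request = urllib.request.Request(
        save_page_now_url(url),
        headers={"User-Agent": "link-archiver-sketch/0.1"},  # hypothetical UA string
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # After a successful save this typically is the snapshot URL.
        return response.geturl()


if __name__ == "__main__":
    # Only builds the URL; call save_page_now() to actually submit.
    print(save_page_now_url("https://wiert.me/"))
```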
Posted in Bookmarklet, Development, Internet, InternetArchive, Power User, Python, Scripting, Software Development, WayBack machine, Web Browsers
Posted by jpluimers on 2021/06/16
On my list of things to play with: [WayBack] GitHub – jjjake/internetarchive: A Python and Command-Line Interface to Archive.org.
Via:
Related:
- [WayBack] The Internet Archive Python Library — Internet Archive item APIs 1.8.5 documentation
- [WayBack] Command-Line Interface — Internet Archive item APIs 1.8.5 documentation
- [WayBack] Quickstart — Internet Archive item APIs 1.8.5 documentation, including:
Configuring
Certain functionality of the internetarchive Python library requires your archive.org credentials. Your IA-S3 keys are required for uploading, searching, and modifying metadata, and your archive.org logged-in cookies are required for downloading access-restricted content and viewing your task history. To automatically create a config file with your archive.org credentials, you can use the ia command-line tool:
$ ia configure
Enter your archive.org credentials below to configure 'ia'.
Email address: user@example.com
Password:
Config saved to: /home/user/.config/ia.ini
Your config file will be saved to $HOME/.config/ia.ini, or $HOME/.ia if you do not have a .config directory in $HOME. Alternatively, you can specify your own path to save the config to via ia --config-file '~/.ia-custom-config' configure.
If you have a netrc file with your archive.org credentials in it, you can simply run ia configure --netrc. Note that Python’s netrc library does not currently support passphrases, or passwords with spaces in them, so these are not currently supported here either.
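The config-file location rule above and the saved keys can be checked with a few lines of standard-library Python. This is a sketch. It assumes `ia.ini` keeps the IA-S3 keys in an `[s3]` section as `access` and `secret`, which is an assumption about the file layout. In real code, prefer the library's own configuration handling. Both function names are hypothetical.

```python
import configparser
from pathlib import Path
from typing import Optional, Tuple

# Hypothetical helpers; the [s3]/access/secret layout is an assumption.


def default_ia_config_path(home: Path) -> Path:
    """Mirror the documented rule: $HOME/.config/ia.ini if $HOME/.config exists, else $HOME/.ia."""
    config_dir = home / ".config"
    if config_dir.is_dir():
        return config_dir / "ia.ini"
    return home / ".ia"


def read_s3_keys(config_path: Path) -> Optional[Tuple[str, str]]:
    """Return the (access, secret) IA-S3 keys, or None when they are absent."""
    # interpolation=None so values containing '%' (keys, cookies) are read verbatim.
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(config_path):
        return None  # file missing or unreadable
    access = parser.get("s3", "access", fallback=None)
    secret = parser.get("s3", "secret", fallback=None)
    if not access or not secret:
        return None
    return access, secret


if __name__ == "__main__":
    path = default_ia_config_path(Path.home())
    print(path, "has S3 keys" if read_s3_keys(path) else "has no S3 keys")
```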
–jeroen
Posted in Development, Internet, InternetArchive, Power User, Python, Scripting, Software Development, WayBack machine
Posted by jpluimers on 2021/06/07
For my link archive, some tweets. [WayBack] Mark Graham is the person to contact in case archiving a link in the WayBack machine fails.
These are the steps for my link archival:
- check if it saves and renders with the WayBack machine, if so, copy the saved URL and the original URL
- check if it saves and renders with archive.is, if so, copy the saved URL and the original URL
- if neither saved it, use the original URL and link text, but note that it could not be saved; otherwise prepend the original URL and link text with [WayBack] or [Archive.is] linking to the saved URL
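The three steps above boil down to a small decision. Here is a Python sketch of it. It assumes the saving and render checks happen elsewhere and that a prefix is added for every service that succeeded. Markdown link syntax is used only for illustration. The function name and the "(unsavable)" wording are hypothetical.

```python
from typing import Optional

# Hypothetical helper; Markdown output format chosen only for illustration.


def format_archived_link(
    original_url: str,
    link_text: str,
    wayback_url: Optional[str] = None,
    archive_is_url: Optional[str] = None,
) -> str:
    """Format one link-archive entry from the outcome of the save attempts."""
    original = f"[{link_text}]({original_url})"
    prefixes = []
    if wayback_url:  # step 1: saved and rendered by the WayBack machine
        prefixes.append(f"[[WayBack]]({wayback_url})")
    if archive_is_url:  # step 2: saved and rendered by archive.is
        prefixes.append(f"[[Archive.is]]({archive_is_url})")
    if not prefixes:  # step 3: neither worked, keep the original but flag it
        return f"{original} (unsavable)"
    return " ".join(prefixes + [original])
```

Keeping the original URL next to the saved one means the entry stays useful even if an archive copy disappears later.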
Reporting history gist: https://gist.github.com/jpluimers/6115b3cd6dab568ebd1c10ebddfaf140
–jeroen
Posted in Internet, InternetArchive, Power User, WayBack machine
Posted by jpluimers on 2021/05/05
A while ago I wrote about Helping the WayBack ArchiveTeam team: running their Warrior virtual appliance on ESXi.
Since it was scheduled before my cancer treatment started and got posted while I was still recovering from it, I missed that version 3.2 of the [Wayback] ArchiveTeam Warrior appliance appeared in the [Wayback] Releases · ArchiveTeam/Ubuntu-Warrior at [Wayback] Release v3.2 · ArchiveTeam/Ubuntu-Warrior. You can download it from these places:
These two sites have not yet been updated, so they contain the older versions:
The source code now has been moved three times:
Posted in *nix, *nix-tools, ArchiveTeamWarrior, Cloud, Containers, diff, Docker, ESXi5, ESXi5.1, ESXi5.5, ESXi6, ESXi6.5, ESXi6.7, ESXi7, Infrastructure, Internet, InternetArchive, Kubernetes (k8n), KVM Kernel-based Virtual Machine, patch, Power User, VirtualBox, Virtualization, VMware, VMware ESXi, VMware Workstation, WayBack machine
Posted by jpluimers on 2021/03/19
The [WayBack] Archiveteam helps feed new content into the WayBack machine.
You can help that team by running one or more “warrior” virtual machine instances. The VM is distributed as a virtual appliance in an ova file according to the Open Virtualization Format.
That format sounds more generic than it actually is: the (at the time of writing) archiveteam-warrior-v3-20171013.ova file at [WayBack] Index of /downloads/warrior3/ was created for VirtualBox.
This means that running it on VMware ESXi or VMware vSphere takes a few steps: patching it, then uploading it to your VMware host.
Since I might want to run the appliance in multiple places or as multiple instances, I wanted a ready-to-go solution, so I created a git repository with both the patch instructions and the update at [WayBack] wiert.me / public / ova / archiveteam-warrior-v3-20171013.ESXi · GitLab.
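An .ova file is a tar archive holding the .ovf descriptor, a .mf manifest with checksums, and the disk images. ESXi often rejects VirtualBox-made appliances because of the `virtualbox-2.2` virtual system type in the descriptor. Once that is changed, the .ovf checksum in the manifest must be recomputed. The sketch below shows only that one change. It is not necessarily what the patch instructions in the repository do. The `vmx-07` target hardware version and the function names are assumptions.

```python
import hashlib
import re

# Assumption: vmx-07 as a conservative VMware hardware version; helper names are hypothetical.
VMWARE_SYSTEM_TYPE = "vmx-07"

_SYSTEM_TYPE_RE = re.compile(
    r"(<vssd:VirtualSystemType>)virtualbox-[^<]*(</vssd:VirtualSystemType>)"
)
# Accepts both "SHA1(f)= x" and "SHA1 (f) = x" manifest line styles.
_MANIFEST_LINE_RE = re.compile(r"^(SHA1|SHA256)\s*\((.+?)\)\s*=\s*([0-9a-fA-F]+)\s*$")


def patch_ovf_text(ovf_text: str) -> str:
    """Replace a VirtualBox virtual system type with a VMware hardware version."""
    return _SYSTEM_TYPE_RE.sub(rf"\g<1>{VMWARE_SYSTEM_TYPE}\g<2>", ovf_text)


def update_manifest(manifest_text: str, file_name: str, new_content: bytes) -> str:
    """Recompute the checksum line for `file_name`, keeping its hash algorithm."""
    out_lines = []
    for line in manifest_text.splitlines():
        match = _MANIFEST_LINE_RE.match(line)
        if match and match.group(2) == file_name:
            algorithm = match.group(1)
            digest = hashlib.new(algorithm.lower(), new_content).hexdigest()
            line = f"{algorithm}({file_name})= {digest}"  # rewritten as "ALG(name)= digest"
        out_lines.append(line)
    return "\n".join(out_lines) + "\n"
```

When re-creating the .ova with tar afterwards, keep the .ovf descriptor as the first file in the archive.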
Posted in ArchiveTeamWarrior, Cloud, Containers, Docker, Infrastructure, Internet, InternetArchive, Kubernetes (k8n), Power User, WayBack machine
Posted by jpluimers on 2020/11/13
Archiving Google Product Forum URLs is a pain in the butt for a couple of reasons:
So the trick for saving is:
- Get from the /forum/#!topic/ based URL to the /d/topic/ based one
- Put it after archive.is/?run=1&url=, then save
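Both steps are plain string manipulation. Here is a Python sketch. It assumes the forum URLs contain the literal `/forum/#!topic/` segment and that the `https://` scheme works for archive.is. The URL is appended unencoded, exactly as in the trick above. The example host and topic ID in the usage line are made up, and the function names are hypothetical.

```python
# Hypothetical helpers implementing the two-step trick.
FORUM_SEGMENT = "/forum/#!topic/"
DIRECT_TOPIC_SEGMENT = "/d/topic/"
ARCHIVE_IS_RUN_PREFIX = "https://archive.is/?run=1&url="


def to_direct_topic_url(forum_url: str) -> str:
    """Rewrite a /forum/#!topic/ URL into its /d/topic/ form; other URLs pass through."""
    return forum_url.replace(FORUM_SEGMENT, DIRECT_TOPIC_SEGMENT, 1)


def archive_is_save_url(forum_url: str) -> str:
    """Put the /d/topic/ variant after archive.is/?run=1&url= (not percent-encoded)."""
    return ARCHIVE_IS_RUN_PREFIX + to_direct_topic_url(forum_url)


if __name__ == "__main__":
    print(archive_is_save_url("https://productforums.google.com/forum/#!topic/chrome/abc123"))
```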
--jeroen
Posted in Conference Topics, Conferences, Event, Internet, InternetArchive, Power User, WayBack machine
Posted by jpluimers on 2019/10/19
Got this a while ago while saving a bunch of links for my blog; unfortunately, my request for more information to the email address in the message got no response:
Too Many Requests
We are limiting the number of URLs you can submit to be Archived to the Wayback Machine, using the Save Page Now features, to no more than 15 per minute.
If you submit more than that we will block Save Page Now requests from your IP number for one day.
Please feel free to write to us at info@archive.org if you have questions about this. Please include your IP address and any URLs in the email so we can provide you with better service.
I wish there were a queue service that makes you wait longer but still fulfills the request.
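Until such a service exists, the queueing can happen on the client side. Keep the timestamps of recent submissions and sleep until the oldest one leaves the one-minute window. Here is a minimal sliding-window sketch in Python. It assumes the 15-per-minute limit quoted above is the only limit that applies. `SlidingWindowThrottle` and `submit_all` are hypothetical names, and the actual submission function is left to the caller.

```python
import collections
import time
from typing import Callable, Deque, Iterable, List

# Hypothetical client-side throttle; assumes the quoted 15 URLs per minute limit.


class SlidingWindowThrottle:
    """Allow at most `max_calls` calls to wait() per `period` seconds, sleeping otherwise."""

    def __init__(
        self,
        max_calls: int = 15,
        period: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._stamps: Deque[float] = collections.deque()  # times of recent calls

    def wait(self) -> None:
        """Block until another call fits inside the window, then record it."""
        while True:
            now = self._clock()
            # Forget submissions that are at least one full period old.
            while self._stamps and now - self._stamps[0] >= self.period:
                self._stamps.popleft()
            if len(self._stamps) < self.max_calls:
                self._stamps.append(now)
                return
            # Sleep until the oldest recorded submission leaves the window.
            self._sleep(self.period - (now - self._stamps[0]))


def submit_all(
    urls: Iterable[str], submit: Callable[[str], None], throttle: SlidingWindowThrottle
) -> List[str]:
    """Submit every URL in order, waiting as needed; return the URLs submitted."""
    done = []
    for url in urls:
        throttle.wait()
        submit(url)
        done.append(url)
    return done
```

The client cannot see the server's exact window, so staying a bit under the limit (for instance `max_calls=12`) leaves some margin against a day-long block.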
–jeroen
Posted in Internet, InternetArchive, Power User, WayBack machine
Posted by jpluimers on 2019/08/16
When archiving pages in the WayBack machine, truckloads of cookies still got set, despite Privacy Badger having been set to “save no cookies”.
So I used the Chrome settings in chrome://settings/content/cookies to disable cookies and now everything is fine.
–jeroen
Posted in Chrome, Google, Internet, InternetArchive, Power User, Privacy, WayBack machine
Posted by jpluimers on 2019/05/27
When you get the response “web.archive.org unexpectedly closed the connection” without even returning an HTTP code, but:
- it works in anonymous mode
- it works with all extensions turned off
then there are likely too many cookies for archive.org and/or web.archive.org: in my case, I had 90 cookies.
Cleaning these cookies out resolved the problem (I used [WayBack] Awesome Cookie Manager for this).
Edit 20231230: Awesome Cookie Manager source repository at [Wayback/Archive] Phatsuo/awesome-cookie-manager: Awesome Cookie Manager.

--jeroen
Posted in Chrome, Google, Internet, InternetArchive, Power User, WayBack machine