The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests

Helping the WayBack ArchiveTeam team: running their Warrior virtual appliance on ESXi

Posted by jpluimers on 2021/03/19

The [WayBack] ArchiveTeam helps the Wayback Machine by feeding it new content.

You can help that team by running one or more “warrior” virtual machine instances. The VM is distributed as a virtual appliance in an ova file according to the Open Virtualization Format.

That format sounds more generic than it actually is: the (at the time of writing) archiveteam-warrior-v3-20171013.ova file at [WayBack] Index of /downloads/warrior3/ was created for VirtualBox.

This means that running it on VMware ESXi or VMware vSphere takes a few steps: first patching it, then uploading it to your VMware host.

Since I might want to run the appliance in multiple places or as multiple instances, I wanted a ready-to-go solution. So I created a git repository with both the patch instructions and the update at [WayBack] wiert.me / public / ova / archiveteam-warrior-v3-20171013.ESXi · GitLab.

Both of them are here:

Thanks to these for both the patch and for getting me to write instructions:

  1. [WayBack] www.archiveteam.org/index.php?title=Deploy_OVA_to_VMware_ESXi
  2. [WayBack] edvoncken.net/2014/08/archiveteam-warrior-on-esxi
  3. [WayBack] gist.github.com/Kipari/c32192de52c2a86d56ee0d67472d4dc6

I raised the memory of each instance to 2 gibibytes (from the default 400 mebibytes) and the CPU allocation to 3.6 GHz (from the default 1 core). This was more than enough: the instances used less than 60% of those maximums.

 

That way I could run each instance at the maximum allowed settings on the right:

  • 6 concurrent items
  • 3 rsync threads

Together they fluctuated around 50 mebibit/second of combined downstream and upstream throughput: about half the capacity of one of my home fiber connections.

More on the warrior

The download above is a virtual appliance that deploys as an Alpine Linux instance with an embedded docker container.

You can scale it up either by running multiple virtual appliances (the easiest way for just a few) or by running many orchestrated Docker instances (if you already have a Docker infrastructure running: grab the [Archive.is] archiveteam/warrior-dockerfile – Docker Hub).

More on that, and how the warriors fit in the archiving scheme:

Docker

If you have Docker installed, the following command will help preserve…:

    docker run --publish 127.0.0.1:8001:8001 --restart always archiveteam/warrior-dockerfile

https://twitter.com/SteveMcLaugh/status/1112433277359529984

– Edward Morbius – Google+
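For the "many Docker instances" route, here is a minimal sketch of how the quoted command could be repeated for several containers. It only builds the argument lists and does not start anything. The image name, the container port 8001 and the `--publish`/`--restart always` options come from the quoted command. The function name `warrior_commands`, the `warrior-N` container names, the added `--detach` flag and the scheme of incrementing the host port per instance are my own assumptions.

```python
def warrior_commands(count, first_host_port=8001):
    """Build one `docker run` argument list per warrior container.

    Hypothetical helper: each container gets its own host port
    (8001, 8002, ...) mapped to the container's port 8001.
    """
    commands = []
    for index in range(count):
        host_port = first_host_port + index
        commands.append([
            "docker", "run",
            "--detach",                        # assumption: run in the background
            "--name", f"warrior-{index + 1}",  # assumption: illustrative names
            "--publish", f"127.0.0.1:{host_port}:8001",
            "--restart", "always",
            "archiveteam/warrior-dockerfile",
        ])
    return commands


# Print the commands so they can be reviewed before running them by hand.
for command in warrior_commands(3):
    print(" ".join(command))
```

Each list can be passed to `subprocess.run(command, check=True)` once you are happy with it. Printing first makes it easy to check the port mapping.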

Via:

–jeroen


PS

PS: I wrote this when Google+ (G+) was a few days from being shut down, but there are always sites going down that need archiving.

The Wayback Archive Team Warriors have a site listing projects, but since they are better at archiving and tool development than at writing wiki content, the easiest approach is to have your warrior instance run the “ArchiveTeam’s Choice” project: it works on the most pressing downloads and continues even after one project is finished.

On Google+ and the G+ archiving project

Some stats:

The more interesting project tracker, showing updates in realtime, is: http://tracker.archiveteam.org/googleplus/

Note that this shows only 1/50th of the total project at a time. “Items” are sitemap subsets of 100 profiles, and 50 batches of 1,000 sitemaps at a time, each with about 680 or so items, will be processed over the course of this archival. The tracker shows the status only of the current batch. Total profiles archived are 50 batches * 1,000 sitemaps/batch * 680 items/sitemap * 100 profiles/item = 3.4 billion profiles, or the total number of Google+ profiles (as of March, 2017). There will be 34 million items, total, in the overall process.
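The arithmetic in that quote can be checked with a few lines of Python. This is only a sketch: the four input numbers come from the quote, "680" is the approximate figure it gives, and the variable names are mine.

```python
BATCHES = 50
SITEMAPS_PER_BATCH = 1_000
ITEMS_PER_SITEMAP = 680    # "about 680 or so" in the quote, so approximate
PROFILES_PER_ITEM = 100

# Items shown by the tracker at any one time (a single batch).
items_per_batch = SITEMAPS_PER_BATCH * ITEMS_PER_SITEMAP

# Totals over the whole archival run.
total_items = BATCHES * items_per_batch
total_profiles = total_items * PROFILES_PER_ITEM

print(f"items per batch: {items_per_batch:,}")   # 680,000
print(f"total items:     {total_items:,}")       # 34,000,000
print(f"total profiles:  {total_profiles:,}")    # 3,400,000,000
```

The totals match the 34 million items and 3.4 billion profiles mentioned in the quote, and one batch is indeed 1/50th of all items.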

Some Twitter tid-bits on the last day of archival:

Some of my old G+ starting places

On VMware and uploading OVA/OVF files

Quoted in part because the VMware documentation site hates to be archived

[empty WayBack/empty Archive.is] OVF and OVA Limitations for the VMware Host Client

To deploy a large OVA file, VMware recommends to first extract the OVA on your system by running the command tar -xvf <file.ova>. Then you can provide the deployment wizard with the OVF and VMDKs as separate files.
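An OVA file is a plain tar archive, so the same extraction step can be scripted. Below is a minimal Python sketch using the standard `tarfile` module. The function name `extract_ova` and the idea of grouping the results by file extension are my own assumptions, not part of the VMware documentation.

```python
import shutil
import tarfile
from pathlib import Path


def extract_ova(ova_path, dest_dir):
    """Unpack an OVA (a tar archive) into dest_dir.

    Returns a dict mapping lower-case file suffix (".ovf", ".vmdk", ...)
    to the list of extracted paths with that suffix.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = {}
    with tarfile.open(ova_path, "r") as archive:
        for member in archive.getmembers():
            # Only regular files are extracted; links and directories are skipped.
            if not member.isfile():
                continue
            # Use only the base name, so paths stored in the archive
            # cannot write outside dest_dir.
            target = dest / Path(member.name).name
            with archive.extractfile(member) as source, open(target, "wb") as out:
                shutil.copyfileobj(source, out)
            extracted.setdefault(target.suffix.lower(), []).append(target)
    return extracted
```

Calling `extract_ova("archiveteam-warrior-v3-20171013.ova", "warrior")` would then give you the `.ovf` and `.vmdk` paths to hand to the deployment wizard as separate files.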
