The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests

Archive for the ‘WayBack machine’ Category

GitHub – jjjake/internetarchive: A Python and Command-Line Interface to Archive.org

Posted by jpluimers on 2021/06/16

On my list of things to play with: [WayBack] GitHub – jjjake/internetarchive: A Python and Command-Line Interface to Archive.org.

Via:

Related:

  • [WayBack] The Internet Archive Python Library — Internet Archive item APIs 1.8.5 documentation
  • [WayBack] Command-Line Interface — Internet Archive item APIs 1.8.5 documentation
  • [WayBack] Quickstart — Internet Archive item APIs 1.8.5 documentation, including:

    Configuring

    Certain functionality of the internetarchive Python library requires your archive.org credentials. Your IA-S3 keys are required for uploading, searching, and modifying metadata, and your archive.org logged-in cookies are required for downloading access-restricted content and viewing your task history. To automatically create a config file with your archive.org credentials, you can use the ia command-line tool:

    $ ia configure
    Enter your archive.org credentials below to configure 'ia'.
    
    Email address: user@example.com
    Password:
    
    Config saved to: /home/user/.config/ia.ini
    

    Your config file will be saved to $HOME/.config/ia.ini, or $HOME/.ia if you do not have a .config directory in $HOME. Alternatively, you can specify your own path to save the config to via ia --config-file '~/.ia-custom-config' configure.

    If you have a .netrc file with your archive.org credentials in it, you can simply run ia configure --netrc. Note that Python’s netrc library does not currently support passphrases, or passwords with spaces in them, and these are therefore not currently supported here.
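The config-file location rule quoted above can be expressed in a few lines of Python. This is a minimal sketch that only mirrors the two locations named in the quoted 1.8.5 documentation; `default_ia_config_path` is a hypothetical helper, not part of the internetarchive library, whose own lookup may check more places.

```python
from pathlib import Path


def default_ia_config_path(home: Path) -> Path:
    """Return where `ia configure` saves its config, per the quoted docs.

    Hypothetical helper: it mirrors the documented rule only, not the
    internetarchive library's actual lookup code.
    """
    config_dir = home / ".config"
    if config_dir.is_dir():
        # A .config directory exists in $HOME: use $HOME/.config/ia.ini
        return config_dir / "ia.ini"
    # No .config directory in $HOME: fall back to $HOME/.ia
    return home / ".ia"
```

Calling `default_ia_config_path(Path.home())` tells you which file to inspect after running ia configure.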

–jeroen

Posted in Development, Internet, InternetArchive, Power User, Python, Scripting, Software Development, WayBack machine

Check if this still happens: some Twitter content in the WayBack machine gets a slash in the URL removed during rendering on Chrome

Posted by jpluimers on 2021/06/11

From my research list; check if this still happens: [WayBack] Saving Twitter content in the WayBack archive: the fully loaded page has a wrong trailing URL (missing the second slash before the authority) · GitHub

  1. Visited https://twitter.com/MarkGraham
  2. Saved it using https://web.archive.org/save/https://twitter.com/MarkGraham
  3. Waited for the save to complete and the page to fully load and got https://web.archive.org/web/20190607081047/https:/twitter.com/MarkGraham
  4. Observed that the trailing part, https:/twitter.com/MarkGraham, is no longer a valid URL: it is missing the second slash before the authority (see https://en.wikipedia.org/wiki/Uniform_Resource_Identifier#Generic_syntax)
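The check in step 4 can be automated on the URL text alone. A minimal Python sketch, assuming snapshot URLs of the form https://web.archive.org/web/<timestamp>/<original URL> (optionally with a modifier such as id_ after the timestamp); `trailing_url_is_mangled` is a hypothetical helper, not a WayBack machine API, and it does not fetch anything.

```python
import re

# WayBack snapshot URL: /web/<timestamp><optional modifier>/<original URL>
WAYBACK_SNAPSHOT = re.compile(
    r"^https?://web\.archive\.org/web/(\d{1,14})[a-z_]*/(.+)$")

# A URI scheme followed by ':' but NOT by '//' (e.g. 'https:/twitter.com')
SCHEME_WITHOUT_AUTHORITY_SLASHES = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:(?!//)")


def trailing_url_is_mangled(snapshot_url: str) -> bool:
    """True when the archived (trailing) URL lacks the '//' before its authority."""
    match = WAYBACK_SNAPSHOT.match(snapshot_url)
    if match is None:
        raise ValueError(f"not a WayBack snapshot URL: {snapshot_url}")
    original = match.group(2)
    return SCHEME_WITHOUT_AUTHORITY_SLASHES.match(original) is not None
```

Running it over the URLs in a link archive would flag every snapshot that shows the missing-slash behaviour.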

This might be a Twitter.com thing:

Notes:

  • I have only tested this with my Chrome configurations on various machines (both regular and incognito tabs) over at least a year; I still need to figure out what happens when using different browsers.
  • It does not always happen.

Via: [WayBack] Jeroen Pluimers on Twitter: “I understand that the sites themselves pay a big role in this. That’s why I have the mangling of URLs that sometimes happens on my research list. I made this quick summary: …”

–jeroen

Posted in Internet, InternetArchive, Power User, SocialMedia, Twitter, WayBack machine

Contact for when WayBack internet archival fails to grab content

Posted by jpluimers on 2021/06/07

For my link archive, some tweets: [WayBack] Mark Graham is the person to contact when archiving a link in the WayBack machine fails.

These are the steps for my link archival:

  1. check if it saves and renders with the WayBack machine, if so, copy the saved URL and the original URL
  2. check if it saves and renders with archive.is, if so, copy the saved URL and the original URL
  3. if neither saved, use the original URL and link text, but note that the page was unsavable; otherwise prepend the original URL and link text with a [WayBack] or [Archive.is] link to the saved URL
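The decision in step 3 can be sketched as a small Python function. This is an illustration under assumptions: `archived_link` is a hypothetical name, the HTML output and the "(unsavable)" note wording are made up for the example, and because the steps do not say which archive to prefer when both succeed, the sketch prefers the WayBack copy.

```python
from html import escape
from typing import Optional


def archived_link(original_url: str, link_text: str,
                  wayback_url: Optional[str] = None,
                  archive_is_url: Optional[str] = None) -> str:
    """Build the HTML for one archived link, following steps 1 to 3.

    Assumption: when both archives saved the page, the WayBack copy wins.
    """
    original = f'<a href="{escape(original_url)}">{escape(link_text)}</a>'
    if wayback_url:
        return f'<a href="{escape(wayback_url)}">[WayBack]</a> {original}'
    if archive_is_url:
        return f'<a href="{escape(archive_is_url)}">[Archive.is]</a> {original}'
    # Neither archive could save the page: keep the original link, marked unsavable
    return f"{original} (unsavable)"
```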

Reporting history gist: https://gist.github.com/jpluimers/6115b3cd6dab568ebd1c10ebddfaf140

–jeroen

Posted in Internet, InternetArchive, Power User, WayBack machine

Running ArchiveTeam Warrior version 3.2 on ESXi

Posted by jpluimers on 2021/05/05

A while ago I wrote about Helping the WayBack ArchiveTeam team: running their Warrior virtual appliance on ESXi.

Since that post was scheduled before my cancer treatment started and got published while I was still recovering from it, I missed that version 3.2 of the [Wayback] ArchiveTeam Warrior appliance had appeared under [Wayback] Releases · ArchiveTeam/Ubuntu-Warrior as [Wayback] Release v3.2 · ArchiveTeam/Ubuntu-Warrior. You can download it from these places:

These two sites have not yet been updated, so they contain the older versions:

The source code has now lived in three places:

  1. [Wayback] ArchiveTeam/warrior-code
  2. [Wayback] ArchiveTeam/warrior-code2 · GitHub
  3. [Wayback] ArchiveTeam/Ubuntu-Warrior at master (this is version 3 and up)

The docker container

The new version of ArchiveTeam Warrior is now basically a shell around [Wayback] Watchtower and the [Wayback] ArchiveTeam/warrior-dockerfile: A Dockerfile for the ArchiveTeam Warrior docker container. This makes updating the core much easier.

More on the docker container (in case you want to run it yourself) is at [Wayback] ArchiveTeam Warrior – Archiveteam – Installing and running with Docker:

You’ll need Docker (open source) and the Warrior Docker image.

  1. Download Docker from the link above and install it.
  2. Open your terminal. On Windows, you can use either Command Prompt (CMD) or PowerShell. On macOS and Linux you can use Terminal (Bash).
  3. Use the following command to start the Warrior as well as Watchtower, which will automatically keep your Warrior updated:
    docker run --detach --name watchtower --restart=on-failure --volume /var/run/docker.sock:/var/run/docker.sock containrrr/watchtower --label-enable --cleanup --interval 3600 && docker run --detach --name archiveteam-warrior --label=com.centurylinklabs.watchtower.enable=true --restart=on-failure --publish 8001:8001 atdr.meo.ws/archiveteam/warrior-dockerfile

    (For a full explanation of this command, see items 3 and 4 here.)

  4. Using your regular web browser, visit http://localhost:8001/.
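Right after the docker run command, the Warrior web UI may need some time before port 8001 answers. A minimal Python sketch that polls the URL from step 4 until it returns HTTP 200 or a timeout expires; `wait_for_web_ui` and its timeout defaults are hypothetical, not part of the Warrior, Watchtower or Docker tooling.

```python
import http.client
import time
import urllib.request


def wait_for_web_ui(url: str = "http://localhost:8001/",
                    timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Poll url until it answers HTTP 200; give up after timeout seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused, timeouts, HTTP errors (URLError/HTTPError are
            # OSError subclasses) or malformed replies: the UI is not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Call `wait_for_web_ui()` after starting the containers; once it returns True, opening http://localhost:8001/ in the browser should show the Warrior UI.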

The virtual appliance

The virtual appliance is released aimed at VirtualBox by default; steps to run it with VMware are at [Wayback] ArchiveTeam Warrior – Archiveteam.

Totally agreeing with Kristian Köhntopp, I do not understand why people use VirtualBox at all: I just run into too many issues, like [Archive.is] Kristian Köhntopp on Twitter: “Hint: If installing a Linux distro in VirtualBox fails with varying, unknown errors, it helps to try VMware Workstation or kvm instead. In my case it then worked every single time with the same ISO.” (translated from German).

I inspected the .ova file, which is basically a tar archive containing an OVF directory, as per Open Virtualization Format: Design – Wikipedia:

The entire directory can be distributed as an Open Virtual Appliance (OVA) package, which is a tar archive file with the OVF directory inside.
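Because an .ova is a tar archive, Python's standard tarfile module can list its contents without extracting anything. A minimal sketch, assuming an .ova file on disk whose path you pass in; `inspect_ova` is a hypothetical helper. For every .vmdk member it also reads the first four bytes and compares them, as a little-endian 32-bit integer, with the sparse VMDK magic number 0x564d444b.

```python
import sys
import tarfile
from typing import List, Tuple

# Sparse/stream-optimized VMDK magic number 0x564d444b ('VMDK'), stored
# little-endian, so such a file starts with the bytes 4b 44 4d 56 ("KDMV").
VMDK_SPARSE_MAGIC = 0x564D444B


def inspect_ova(path: str) -> List[Tuple[str, int, bool]]:
    """Return (member name, size in bytes, has sparse VMDK magic) per file."""
    results = []
    with tarfile.open(path) as ova:  # default mode "r" also copes with compression
        for member in ova.getmembers():
            if not member.isfile():
                continue
            has_magic = False
            if member.name.lower().endswith(".vmdk"):
                stream = ova.extractfile(member)
                head = stream.read(4) if stream is not None else b""
                has_magic = (len(head) == 4 and
                             int.from_bytes(head, "little") == VMDK_SPARSE_MAGIC)
            results.append((member.name, member.size, has_magic))
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    for name, size, magic in inspect_ova(sys.argv[1]):
        print(f"{name}\t{size}\t{'sparse VMDK' if magic else ''}")
```

Run it as `python inspect_ova.py archiveteam-warrior-v3-20171013.ova` (the script file name is up to you) to see the .ovf descriptor and disk image inside the appliance.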

Inspecting the disk image inside the directory taught me that pure single-file binary VMDK disk images start with the four bytes 4b 44 4d 56 (ASCII KDMV): the magic number 0x564d444b (ASCII VMDK) stored in little-endian byte order. More on the VMDK file format can be found in these links (all via [Wayback] vmdk file format specification – Google Search):

So here are some steps to get the .ova image running on ESXi. I think they should work for ESXi 5.1 and up, but I have only tested them on ESXi 6.7:

Posted in *nix, *nix-tools, Cloud, Containers, diff, Docker, ESXi5, ESXi5.1, ESXi5.5, ESXi6, ESXi6.5, ESXi6.7, ESXi7, Infrastructure, Internet, InternetArchive, Kubernetes (k8n), patch, Power User, VirtualBox, Virtualization, VMware, VMware ESXi, VMware Workstation, WayBack machine

Helping the WayBack ArchiveTeam team: running their Warrior virtual appliance on ESXi

Posted by jpluimers on 2021/03/19

The [WayBack] Archiveteam helps the WayBack machine by feeding it new content.

You can help that team by running one or more “warrior” virtual machine instances. The VM is distributed as a virtual appliance in an ova file according to the Open Virtualization Format.

That format sounds more generic than it actually is: the (at the time of writing) archiveteam-warrior-v3-20171013.ova file at [WayBack] Index of /downloads/warrior3/ was created for VirtualBox.

This means that running it on VMware ESXi or VMware vSphere takes a few steps: patching it, then uploading it to your VMware host.

Since I might want to run the appliance in multiple places or as multiple instances, I wanted a ready-to-go solution, so I created a git repository with both the patch instructions and the update at [WayBack] wiert.me / public / ova / archiveteam-warrior-v3-20171013.ESXi · GitLab.

Posted in Cloud, Containers, Docker, Infrastructure, Internet, InternetArchive, Kubernetes (k8n), Power User, WayBack machine