The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests


Archive for the ‘InternetArchive’ Category

Bookmarklet to navigate from a page to the most recent saved WayBack machine entry

Posted by jpluimers on 2023/10/04

A while ago, while writing last week's post XPath based bookmarklets for Archive.is: more JavaScript fiddling!, I needed the most recent WayBack Machine archived version of

https://developer.mozilla.org/en-US/docs/Web/XPath/Introduction_to_using_XPath_in_JavaScript

I vaguely remembered that you can replace the normal timestamp with a 3 followed by 13 zeros, so I tried this:

https://web.archive.org/web/30000000000000/https://developer.mozilla.org/en-US/docs/Web/XPath/Introduction_to_using_XPath_in_JavaScript

And indeed, it returned an HTTP 302 redirect to

https://web.archive.org/web/20220312161117/https://developer.mozilla.org/en-US/docs/Web/XPath/Introduction_to_using_XPath_in_JavaScript

So I quickly made this bookmarklet:

javascript:location.href='https://web.archive.org/web/30000000000000/'+document.location.href;
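The same URL construction as a plain function, for instance to reuse it outside a bookmarklet. This is a minimal sketch; the names LATEST_TIMESTAMP and latestWaybackUrl are hypothetical. It relies only on the behaviour described above: a timestamp in the year 3000 makes the WayBack Machine redirect to the capture closest to it, which is the most recent one.

```javascript
// Far-future "timestamp" (year 3000 in yyyymmddhhmmss form): the WayBack
// Machine redirects to the capture closest to it, i.e. the most recent one.
const LATEST_TIMESTAMP = "30000000000000";

// Hypothetical helper: build the "most recent capture" URL for a page.
function latestWaybackUrl(pageUrl) {
  // The original URL is appended unchanged after the timestamp segment,
  // just like the bookmarklet does with document.location.href.
  return `https://web.archive.org/web/${LATEST_TIMESTAMP}/${pageUrl}`;
}

console.log(
  latestWaybackUrl(
    "https://developer.mozilla.org/en-US/docs/Web/XPath/Introduction_to_using_XPath_in_JavaScript"
  )
);
```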

Then I created another one for getting the screenshot:

javascript:location=location.href.replace(/^https:\/\/web\.archive\.org\/save\/http/,'https://web.archive.org/web/30000000000000/http://web.archive.org/screenshot/http')

That works for screenshots archived with a Wayback Machine account, as their URLs relate to the page URL through the inserted http://web.archive.org/screenshot/ fragment.

Since the Wayback Machine always looks for the closest saved timestamp, it does not matter that the timestamps of these archived pages differ slightly.
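The screenshot bookmarklet's regular expression, rewritten as a function you can test. A minimal sketch; saveUrlToScreenshotUrl is a hypothetical name. The pattern ends in `http` and so does the replacement, so whatever follows (`s://…` or `://…`) is kept and both http and https pages work; URLs that do not start with the /save/ prefix are returned unchanged.

```javascript
// Hypothetical helper mirroring the screenshot bookmarklet above.
function saveUrlToScreenshotUrl(href) {
  return href.replace(
    // Only rewrites URLs starting with the WayBack Machine "save" prefix.
    /^https:\/\/web\.archive\.org\/save\/http/,
    // Latest capture of the screenshot URL for the same page.
    "https://web.archive.org/web/30000000000000/http://web.archive.org/screenshot/http"
  );
}

console.log(saveUrlToScreenshotUrl("https://web.archive.org/save/https://example.com/"));
// https://web.archive.org/web/30000000000000/http://web.archive.org/screenshot/https://example.com/
```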

Memory lane

20231006: I edited this section to refer to two prior blog posts instead of one because of [Wayback/Archive] pbeccard: “@wiert @oliof You can also use…” – Mastodon (which clearly shows that Mastodon, like any social media platform, mangles backtick-quoted code):

@wiert @oliof You can also use `javascript:location.href=’web.archive.org/web/*/’+docume to get the overview. I find this quite useful since I often want an older version of a page.

And later in the reply chain:

[Wayback/Archive] pbeccard: “@wiert @oliof Ah, I thought b…” – Mastodon

@wiert @oliof Ah, I thought by now that maybe Markdown is supported. I pulled the bookmarklet out of my bookmarklet bookmark folder. Here is a copy: https://gist.github.com/corppneq/d61e3…

[Wayback/Archive] Gist: Bookmarklets

I also found two earlier blog posts:

  1. Need to write a proper bookmarklet for the wayback archive (: which mentions many useful Wayback Machine JavaScript bookmarklets from my gist [Wayback/Archive] Ideas/inspiration for writing a proper WayBack archive.org bookmarklet, including this one:

    [Wayback/Archive] http://www.gyford.com/misc/wayback.html

      • WayBack:

        javascript:location.href='http://web.archive.org/web/*/'+document.location.href;
        

    I also archived this referred page: [Wayback/Archive] Bookmarklets.com – What’s New.

  2. JavaScript bookmarklet to replace part of the WayBack machine URL, which has a bookmarklet replacing the /save/ part of a WayBack Machine URL:

    A bookmarklet that goes to the latest rendered saved version (sometimes saved versions have not been rendered yet, so you get the latest available render):

    javascript:location=location.href.replace(/^https:\/\/web\.archive\.org\/save\/http/,'https://web.archive.org/web/30000000000000/http')

    The WayBack Machine uses a 14-digit ID and tries to find the closest render. This is the format of the ID:

    yyyymmddhhmmss

    This is granular enough, as the WayBack Machine usually only allows new saves that are 30+ minutes apart.

    (Note that this period now seems to have increased from 30+ minutes to 45+ minutes.)
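To make “closest” concrete, here is a sketch that parses the 14-digit yyyymmddhhmmss ID and picks, from a list of captures, the one nearest to a requested timestamp. It assumes the timestamps are UTC and only illustrates the idea; it is not the WayBack Machine's actual implementation. The function names and the capture list (timestamps borrowed from this post) are hypothetical.

```javascript
// Parse a 14-digit yyyymmddhhmmss WayBack ID into milliseconds since the epoch.
// Assumes UTC. Date.UTC normalises out-of-range parts such as month 00.
function parseWaybackTimestamp(ts) {
  const m = /^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})$/.exec(ts);
  if (m === null) throw new Error(`not a 14-digit timestamp: ${ts}`);
  const [, year, month, day, hour, minute, second] = m.map(Number);
  return Date.UTC(year, month - 1, day, hour, minute, second);
}

// Return the capture whose timestamp is nearest to the requested one.
function closestCapture(captures, requested) {
  const target = parseWaybackTimestamp(requested);
  let best = null;
  let bestDistance = Infinity;
  for (const ts of captures) {
    const distance = Math.abs(parseWaybackTimestamp(ts) - target);
    if (distance < bestDistance) {
      best = ts;
      bestDistance = distance;
    }
  }
  return best;
}

const captures = ["20130117041323", "20140123114343", "20220312161117"];
console.log(closestCapture(captures, "30000000000000")); // 20220312161117
console.log(closestCapture(captures, "20130601000000")); // 20130117041323
```

With the far-future ID the newest capture always wins, which is why the 3 followed by 13 zeros trick lands on the most recent save.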

Searching also found this post containing the same huge number: 0.30000000000000004.com. How cool is WordPress search (:

–jeroen

Posted in Bookmarklet, Development, Internet, InternetArchive, JavaScript/ECMAScript, Power User, Scripting, Software Development, WayBack machine, Web Browsers | Leave a Comment »

Bookmarklet for Archive.is to navigate to the canonical link

Posted by jpluimers on 2023/08/15

This is a follow-up to Bookmarklets for Archive.is and the WayBack Machine to go to the original page.

Archive.is has two kinds of URLs:

  1. The encoded version is the short form without any metadata.
  2. The canonical version is the long form, containing metadata: the archive date and time, and the archived URL.

You get the first URL both after archiving and when browsing from one archived page to another archived page (if the target page is not archived, you will go to the unarchived full page URL).
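A sketch of how such a bookmarklet could work, assuming the archived page carries its long form in a `<link rel="canonical" href="…">` element; that is an assumption of this sketch, not something stated above. canonicalHref is a hypothetical name, and it takes the document as a parameter so it can be tried outside a browser.

```javascript
// Return the href of the page's canonical link, or null if there is none.
// Assumption: Archive.is pages expose the long form as <link rel="canonical">.
function canonicalHref(doc) {
  const link = doc.querySelector('link[rel="canonical"]');
  return link ? link.href : null;
}

// As a one-line bookmarklet (browser only):
// javascript:(()=>{const l=document.querySelector('link[rel="canonical"]');if(l)location.href=l.href;})();
```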

Read the rest of this entry »

Posted in archive.is / archive.today, Development, Internet, InternetArchive, JavaScript/ECMAScript, Power User, Scripting, Software Development, WayBack machine | Leave a Comment »

DPReview archives: how accessible will they be?

Posted by jpluimers on 2023/04/10

There are various posts indicating that part or all of DPReview will be archived:

  1. [Wayback/Archive] DPReview closure: an update: Digital Photography Review
  2. [Wayback/Archive] The Wayback Machine on Twitter: “@jpluimers @geerlingguy @internetarchive We are “on it””
  3. [Wayback/Archive] DPReview – Archiveteam
  4. [Wayback/Archive] Digicam Finder · The most complete and accurate digital camera data source on the internet (1994 — 2023), which is open source at [Wayback/Archive] open-product-data/digital-cameras: The most complete and accurate digital camera* data on the internet, assembled and maintained by the community. (via [Wayback/Archive] Good news — the camera feature search and all data is saved | Migration | DPRevived)

I wonder how accessible each form of archive will be. The last entry in the above list is very accessible, but only has the camera data (which is a very important aspect, but do not underestimate the forum with millions of posts either).

–jeroen

Posted in ArchiveTeamWarrior, Internet, InternetArchive, Photography, Power User | Leave a Comment »

Working around Archive.is/.today/.ph/.li/.vn/.fo/.md eternal spinner “Loading” when trying to archive a page

Posted by jpluimers on 2023/01/13

Over the last few months, I have run into the Archive.is “Loading” spinner below, without any progress indication, on a couple of URLs. I think it is tied to having special characters in the URL to be archived.

My usual workaround was to first archive the page in the Wayback Machine, then archive the resulting URL in Archive.is, as it would automatically follow the path up to the original URL.

That of course failed when https://web.archive.org/web/*/vx-underground.org would not save in Archive.is: no matter which browser I used, and whether I used the escaped %2A or the plain *, it gave an eternal spinner on the “Loading” page.
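For reference, the two URL variants tried above differ only in how the asterisk is written. JavaScript's encodeURIComponent() leaves * unescaped, so producing the %2A form takes an explicit replacement; escapeAsterisks is a hypothetical helper in this minimal sketch.

```javascript
// encodeURIComponent() does not escape "*" (it is one of the characters it keeps).
console.log(encodeURIComponent("*")); // *

// Hypothetical helper: percent-encode every asterisk as %2A.
function escapeAsterisks(url) {
  return url.replace(/\*/g, "%2A");
}

console.log(escapeAsterisks("https://web.archive.org/web/*/vx-underground.org"));
// https://web.archive.org/web/%2A/vx-underground.org
```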

Read the rest of this entry »

Posted in archive.is / archive.today, Conference Topics, Conferences, Event, Internet, InternetArchive, LifeHacker, Power User, WayBack machine | Leave a Comment »

Interactive @waybackmachine achievement unlocked while manually archiving 4 pages.: HTTP 429 Too Many Requests

Posted by jpluimers on 2022/06/20

[Wayback/Archive] Jeroen Wiert Pluimers on Twitter: “Interactive @waybackmachine achievement unlocked while manually archiving 4 pages. web.archive.org/429.html.

It took a few hours to recover from the error below. When I checked, the submitted URLs had indeed already been archived.

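It was about the URLs in my blog post earlier today: Vanaf 1 juli kost opheffen oude spaarrekening EUR 75, dus wees er snel bij: Beëindig je oude spaarproduct – ING – Sparen (From 1 July, closing an old savings account costs EUR 75, so be quick: Terminate your old savings product – ING – Savings).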

I really wish Archive.org had a status page showing system status, as right now you have to guess their status from pages like the one below.

You can find the error page at [Archive] https://web.archive.org/429.html (but not all HTTP response codes have pages like this, and some respond differently, like [Archive] https://web.archive.org/404.html).
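Checking whether a URL already has a capture before submitting it again can save some of these requests. A minimal sketch, assuming the WayBack Machine availability API at https://archive.org/wayback/available?url=… and its JSON response with an archived_snapshots.closest object. The network call is only shown as a comment so the parsing can be tried offline, and the helper names are hypothetical.

```javascript
// Build the availability API request URL for a page.
function availabilityUrl(pageUrl) {
  return "https://archive.org/wayback/available?url=" + encodeURIComponent(pageUrl);
}

// Extract the closest capture from the API's JSON response, or null if none.
function closestSnapshot(json) {
  const closest = json?.archived_snapshots?.closest;
  return closest && closest.available
    ? { url: closest.url, timestamp: closest.timestamp }
    : null;
}

// Usage in a browser console or Node 18+ (global fetch), inside an async function:
// const res = await fetch(availabilityUrl("https://example.com/"));
// console.log(closestSnapshot(await res.json()));
```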

Read the rest of this entry »

Posted in Internet, InternetArchive, Power User, WayBack machine | Leave a Comment »

Wayback machine and VMware KB links

Posted by jpluimers on 2022/03/22

The VMware KB is notoriously bad at being saved in the WayBack Machine: saved links hardly render at all because of the dynamic page loading structure of the VMware KB.

But VMware KB articles expire, so many web pages point to non-existing links that, through redirections, end up at [Archive.is] https://kb.vmware.com/s/pagenotfound.

Below are a few link forms of the same VMware KB 2011818 article that vanished from the regular web. The first is saved in the WayBack Machine but does not render. The second is saved and does render after a redirect to a saved third form. The most recent saved fourth form is actually a 404 error redirecting to a prior third form.

  1. https://kb.vmware.com/s/article/2011818
  2. http://kb.vmware.com/kb/2011818
  3. http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2011818
  4. http://kb.vmware.com:80/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2011818

The first link form does archive as a rendered page in Archive.is, if it gets archived while the article still exists. It wasn’t, so the current archived version points to the “pagenotfound” page mentioned above.
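When hunting for a vanished article, it helps to try every link form at once. A minimal sketch that generates the four forms listed above for a KB article ID, plus the WayBack Machine “most recent capture” URL for each (using the 30000000000000 timestamp trick). The function names are hypothetical; only the URL patterns come from this post.

```javascript
// Hypothetical helper: the four link forms of a VMware KB article listed above.
function vmwareKbLinkForms(kbId) {
  const search =
    "selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=" + kbId;
  return [
    `https://kb.vmware.com/s/article/${kbId}`,
    `http://kb.vmware.com/kb/${kbId}`,
    `http://kb.vmware.com/${search}`,
    `http://kb.vmware.com:80/${search}`,
  ];
}

// Most recent WayBack Machine capture URL for each link form.
function latestCaptureUrls(kbId) {
  return vmwareKbLinkForms(kbId).map(
    (url) => `https://web.archive.org/web/30000000000000/${url}`
  );
}

for (const url of latestCaptureUrls("2011818")) {
  console.log(url);
}
```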

Sometimes you have to dig deeper, as not all rendering archived versions contain actual content.

Here the first one is not even archived; the other ones are, but none of them has actual usable content:

  1. https://kb.vmware.com/s/article/2007922
  2. http://kb.vmware.com/kb/2007922
  3. http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2007922
  4. http://kb.vmware.com:80/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2007922

This means you have to dig further back in history:

  1. https://web.archive.org/web/20140123114343/http://kb.vmware.com:80/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2007922 indicates not authorized
  2. https://web.archive.org/web/20130117041323/http://kb.vmware.com:80/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2007922 shows the actual content.
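Digging back through history capture by capture can be shortened by listing all captures of a URL first. A minimal sketch, assuming the WayBack Machine CDX server API (https://web.archive.org/cdx/search/cdx with the url and output=json parameters), whose JSON output is an array of rows with a header row first. The field names timestamp, original and statuscode come from that assumed header, and the helper names are hypothetical. As the 2014 capture above shows, a capture can exist without useful content, so each candidate still needs a look.

```javascript
// Build a CDX query URL listing all captures of a page as JSON.
function cdxQueryUrl(pageUrl) {
  return "https://web.archive.org/cdx/search/cdx?output=json&url=" + encodeURIComponent(pageUrl);
}

// Turn [[header...], [row...], ...] into objects keyed by the header names.
function parseCdxJson(rows) {
  if (!Array.isArray(rows) || rows.length === 0) return [];
  const [header, ...data] = rows;
  return data.map((row) => Object.fromEntries(header.map((name, i) => [name, row[i]])));
}

// Replay URLs, in the order the API returned them, for captures stored with HTTP status 200.
function replayUrlsWithStatus200(rows) {
  return parseCdxJson(rows)
    .filter((capture) => capture.statuscode === "200")
    .map((capture) => `https://web.archive.org/web/${capture.timestamp}/${capture.original}`);
}
```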

–jeroen

Posted in Internet, InternetArchive, link rot, Power User, WayBack machine, WWW - the World Wide Web of information | Leave a Comment »

Digital accessibility is hard; Wayback archival of: Formulieren – CIZ

Posted by jpluimers on 2022/03/17

I know that digital accessibility does not come for free, but in Europe it is mandatory, at least for documents and websites provided by government and semi-government organisations, as per [Wayback] EN 301 549 – Wikipedia:

EN 301 549 is a European standard for digital accessibility. It specifies requirements for information and communications technology to be accessible for people with disabilities.

I bumped into numerous tab-order issues when filling out CIZ forms. This makes it way harder for me, as now I need a mouse despite having had RSI symptoms for some 30+ years.

So, for my link archive, and so I can document that all these forms have severe tab-order issues (some fields are not accessible by keyboard, are emptied when you leave them, or are not even accessible by mouse): [Wayback] Formulieren – CIZ (Forms – CIZ)

Are you submitting an application to the CIZ? On this page you will find an overview of our forms, such as an authorisation form and the Wlz application form.

Hopefully by now the forms have been fixed.

Via:

Read the rest of this entry »

Posted in About, InternetArchive, LifeHacker, Personal, Power User, WayBack machine | Leave a Comment »

ESXi: some notes on .vswp files; there are actually two types of them!

Posted by jpluimers on 2022/02/23

Earlier this month, I ended ESXi: editing /etc/vmware/hostd/vmInventory.xml to fix the datastore UUID for unavailable VMs part 2 with this:

A final note: I need to check out if .vswp files need to be there at all, as my ESXi servers have plenty of physical memory in order not to swap out to disk. More on that in a future blog post.

Browsing back through my blog posts, I found I had mentioned .vswp files before, but never really dug into them:

Read the rest of this entry »

Posted in ArchiveTeamWarrior, ESXi6, ESXi6.5, ESXi6.7, ESXi7, Internet, InternetArchive, Power User, Virtualization, VMware, VMware ESXi, WayBack machine | Leave a Comment »

Archive.is is more like a thread unroll service than an archival service

Posted by jpluimers on 2022/02/14

An interesting take from a while ago on [Wayback] Archive.is blog — People often compare various features of…

People often compare various features of archive.is to those of archive.org being mistaken by name similarity (and recently added “save a page” function to archive.org).

This project is different in at least two respects:

  1. We have no goal to save the entire Internet. Only manually submitted pages which may be deleted/altered soon. We are about 100x smaller than archive.org in the storage space (700TB vs. 70PB) and expenses (X,000 $/mo vs. X00,000 $/mo).
  2. The pages are not saved in their network form. Archive.today launches real browsers (not even headless) and tries to load lazy images, unroll folded content, login into accounts if prompted with login form, remove “subscribe our maillist” modals, … So archive.today is not suitable for making notarized or digitally signed snapshots.

It would be more correct to compare it with other thread unrollers.

The RSS feed of blog.archive.today is at blog.archive.today/rss

Read the rest of this entry »

Posted in archive.is / archive.today, Bookmarklet, Conference Topics, Conferences, Development, Event, Internet, InternetArchive, JavaScript/ECMAScript, Power User, Scripting, Software Development, Web Browsers | Leave a Comment »

When high SEO ranking fails to give you a reliable result: IsItDownRightNow.com failed to detect the WayBack Machine outage

Posted by jpluimers on 2022/02/11

A high SEO ranking does not automatically indicate a reliable result.

When the WayBack Machine was down a while ago (it responded to traceroute UDP requests, but would not establish TCP connections on ports 80 and 443), the first Google hit for detecting down status (searching for [Archive.is] waybackmachine down – Google Search) failed miserably because it redirected web.archive.org (which fails) to http://www.archive.org (which succeeds):

IsItDownRightNow failing to detect web.archive.org downtime


Luckily, when I asked around on Twitter:

  • others were experiencing the same problem, not just in The Netherlands, but also in other countries
  • after I had tried a few things, the WayBack Machine came back up [Archive.is] before I could try cURL.
  • I got pointed at www.uptrends.com/tools/uptime, which correctly checks the right subdomain and showed it was down from many locations.
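The IsItDownRightNow failure boils down to a check that follows a redirect to another host and then reports on that host. A minimal sketch of a check that only counts a response from the requested host, assuming Node 18+ or a browser (global fetch and AbortSignal.timeout); sameHost and isHostUp are hypothetical names. In a browser, cross-origin restrictions may make the fetch fail, so Node is the easier place to run it.

```javascript
// True when both URLs point at the same host name.
function sameHost(requestedUrl, finalUrl) {
  return new URL(requestedUrl).hostname === new URL(finalUrl).hostname;
}

// Up only if the request succeeds *and* the final response came from the
// host that was asked for; a redirect to www.archive.org does not count.
async function isHostUp(url, timeoutMs = 10000) {
  try {
    const res = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
    // res.url is the final URL after any redirects were followed.
    return res.ok && sameHost(url, res.url);
  } catch (e) {
    // DNS failure, refused TCP connection, timeout, ...
    return false;
  }
}

// Usage: isHostUp("https://web.archive.org/").then(console.log);
```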

Read the rest of this entry »

Posted in *nix, cURL, Infrastructure, Internet, InternetArchive, LifeHacker, Power User, WayBack machine | Leave a Comment »