The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests

Archive for the ‘InternetArchive’ Category

Interactive @waybackmachine achievement unlocked while manually archiving 4 pages.: HTTP 429 Too Many Requests

Posted by jpluimers on 2022/06/20

[Wayback/Archive] Jeroen Wiert Pluimers on Twitter: “Interactive @waybackmachine achievement unlocked while manually archiving 4 pages. web.archive.org/429.html”.

The error below took a few hours to recover from. When I checked afterwards, the submitted URLs turned out to have been archived after all.

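It was about the URLs in my blog post earlier today: Vanaf 1 juli kost opheffen oude spaarrekening EUR 75, dus wees er snel bij: Beëindig je oude spaarproduct – ING – Sparen (Dutch for: from 1 July, closing an old savings account costs EUR 75, so be quick: terminate your old savings product – ING – Savings).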

I really wish Archive.org had a status page showing system status; right now you have to infer their status from error pages like the one below.

You can find the error page at [Archive] https://web.archive.org/429.html (but not all HTTP response codes have pages like this and some respond in a different way like [Archive] https://web.archive.org/404.html).
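When scripting archival requests rather than submitting them by hand, the usual way to react to an HTTP 429 is to wait before retrying. Below is a minimal sketch of how such a wait time could be computed. It assumes the response may carry a `Retry-After` header in delta-seconds form (whether the Wayback Machine actually sends one is not confirmed here), and it falls back to exponential backoff with jitter. The function name `retry_delay` and the `base` and `cap` values are hypothetical choices for illustration.

```python
import random


def retry_delay(headers, attempt, base=60.0, cap=3600.0):
    """Return the number of seconds to wait before retrying after an HTTP 429.

    headers: plain dict of response headers (assumed to use the exact key
             "Retry-After"; real HTTP headers are case-insensitive).
    attempt: zero-based count of retries already made.
    base, cap: hypothetical defaults, not values documented by archive.org.
    """
    value = headers.get("Retry-After", "").strip()
    if value.isdigit():
        # The server told us how long to wait (delta-seconds form only;
        # the HTTP-date form of Retry-After is not handled in this sketch).
        return min(float(value), cap)
    # No usable hint: exponential backoff, capped, with jitter so that
    # parallel clients do not all retry at the same moment.
    delay = min(base * (2 ** attempt), cap)
    return delay * random.uniform(0.5, 1.0)
```

A caller would sleep for `retry_delay(response_headers, attempt)` seconds after each 429 and give up after a fixed number of attempts.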

Read the rest of this entry »

Posted in Internet, InternetArchive, Power User, WayBack machine | Leave a Comment »

Wayback machine and VMware KB links

Posted by jpluimers on 2022/03/22

The VMware KB is notoriously bad at being saved in the WayBack Machine: saved links hardly render at all because of the VMware KB dynamic page loading structure.

But VMware KB articles expire, so many web pages point to links that no longer exist and end up, through redirects, at [Archive.is] https://kb.vmware.com/s/pagenotfound.

Below are four link forms of the same VMware KB 2011818 article that vanished from the regular web. The first is saved in the WayBack Machine but does not render. The second is saved and does render after a redirect to a saved copy of the third form. The most recent saved copy of the fourth form is actually a 404 error that redirects to an earlier copy of the third form.

  1. https://kb.vmware.com/s/article/2011818
  2. http://kb.vmware.com/kb/2011818
  3. http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2011818
  4. http://kb.vmware.com:80/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2011818
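The four link forms above differ only in the KB article number, so they can be generated before checking each one against an archive. This is a small sketch; the function name `kb_link_forms` is made up for illustration, and the URL patterns are taken only from the list above, not from any VMware documentation.

```python
def kb_link_forms(kb_id):
    """Return the four known URL forms for a VMware KB article number."""
    kb = str(kb_id)
    # Query-string form shared by the third and fourth variants.
    search = (
        "selfservice/microsites/search.do"
        "?language=en_US&cmd=displayKC&externalId=" + kb
    )
    return [
        "https://kb.vmware.com/s/article/" + kb,  # current dynamic form
        "http://kb.vmware.com/kb/" + kb,          # short form
        "http://kb.vmware.com/" + search,         # search.do form
        "http://kb.vmware.com:80/" + search,      # same, explicit port 80
    ]


if __name__ == "__main__":
    for url in kb_link_forms(2011818):
        print(url)
```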

The first link form does archive as a rendered page in Archive.is if it is archived. It wasn’t, so the current archived version points to the “pagenotfound” page mentioned above.

Sometimes you have to dig deeper, as not all archived versions that render contain actual content.

Here the first one is not even archived; the others are, but none of them has actual usable content:

  1. https://kb.vmware.com/s/article/2007922
  2. http://kb.vmware.com/kb/2007922
  3. http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2007922
  4. http://kb.vmware.com:80/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2007922

This means you have to dig further in history:

  1. https://web.archive.org/web/20140123114343/http://kb.vmware.com:80/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2007922 indicates not authorized
  2. https://web.archive.org/web/20130117041323/http://kb.vmware.com:80/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2007922 shows the actual content.
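Digging through history like this can be partly scripted. The sketch below only builds URLs and makes no network requests. `snapshot_url` follows the `web.archive.org/web/<timestamp>/<url>` pattern visible in the two links above. `cdx_query_url` targets the Wayback CDX search endpoint; the endpoint path and the parameter names (`url`, `output`, `fl`, `limit`) are given as best recalled from the CDX server documentation and should be treated as assumptions to verify. Both function names are hypothetical.

```python
from urllib.parse import urlencode


def snapshot_url(timestamp, original_url):
    """Build the Wayback Machine URL for one capture.

    timestamp: 14-digit capture time, e.g. "20130117041323".
    """
    return "https://web.archive.org/web/" + timestamp + "/" + original_url


def cdx_query_url(original_url, limit=50):
    """Build a CDX query URL that lists captures of original_url.

    Assumption: the endpoint and parameter names match the public
    Wayback CDX server API; check its documentation before relying on it.
    """
    params = {
        "url": original_url,
        "output": "json",
        "fl": "timestamp,statuscode",  # fields to return per capture
        "limit": str(limit),
    }
    return "https://web.archive.org/cdx/search/cdx?" + urlencode(params)
```

Fetching the CDX URL should yield a list of capture timestamps, each of which can be passed to `snapshot_url` and inspected by hand until one shows the actual content.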

–jeroen

Posted in Internet, InternetArchive, link rot, Power User, WayBack machine, WWW - the World Wide Web of information | Leave a Comment »

Digital accessibility is hard; Wayback archival of: Formulieren – CIZ

Posted by jpluimers on 2022/03/17

I know that digital accessibility does not come for free, but in Europe it is mandatory, at least for documents and websites provided by government and semi-government bodies, as per [Wayback] EN 301 549 – Wikipedia:

EN 301 549 is a European standard for digital accessibility. It specifies requirements for information and communications technology to be accessible for people with disabilities.

I bumped into numerous tab-order issues when filling out CIZ forms. This makes it way harder for me, as now I require a mouse despite having had RSI symptoms for some 30+ years.

So, for my link archive, and to document that all these forms have severe tab-order issues (some fields are not even accessible by keyboard, are emptied when you leave the field, or are not even accessible by mouse): [Wayback] Formulieren – CIZ

Are you submitting an application to the CIZ? On this page you will find an overview of our forms, such as an authorization form and the Wlz application form.

Hopefully by now the forms have been fixed.

Via:

Read the rest of this entry »

Posted in About, InternetArchive, LifeHacker, Personal, Power User, WayBack machine | Leave a Comment »

ESXi: some notes on .vswp files; there are actually two types of them!

Posted by jpluimers on 2022/02/23

Earlier this month, I ended ESXi: editing /etc/vmware/hostd/vmInventory.xml to fix the datastore UUID for unavailable VMs part 2 with this:

A final note: I need to check out if .vswp files need to be there at all, as my ESXi servers have plenty of physical memory in order not to swap out to disk. More on that in a future blog post.

Browsing back through my blog posts, I mentioned .vswp files before, but never really dug into them:

Read the rest of this entry »

Posted in Power User, VMware, Internet, VMware ESXi, Virtualization, ESXi6, InternetArchive, WayBack machine, ESXi6.5, ESXi6.7, ESXi7, ArchiveTeamWarrior | Leave a Comment »

Archive.is is more like a thread unroll service than an archival service

Posted by jpluimers on 2022/02/14

An interesting take a while ago on [Wayback] Archive.is blog — People often compare various features of…

People often compare various features of archive.is to those of archive.org being mistaken by name similarity (and recently added “save a page” function to archive.org).

This project is different in at least two respects:

  1. We have no goal to save the entire Internet. Only manually submitted pages which may be deleted/altered soon. We are about 100x smaller than archive.org in the storage space (700TB vs. 70PB) and expenses (X,000 $/mo vs. X00,000 $/mo).
  2. The pages are not saved in their network form. Archive.today launches real browsers (not even headless) and tries to load lazy images, unroll folded content, login into accounts if prompted with login form, remove “subscribe our maillist” modals, … So archive.today is not suitable for making notarized or digitally signed snapshots.

It would be more correct to compare it with other thread unrollers.

The RSS feed of blog.archive.today is at blog.archive.today/rss

Read the rest of this entry »

Posted in archive.is / archive.today, Bookmarklet, Internet, InternetArchive, Power User, Web Browsers | Leave a Comment »

 