The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests

Google Search teamed up with the Internet Archive’s Wayback Machine: the good, the bad, the ugly

Posted by jpluimers on 2024/09/16

TL;DR: Google Search now also shows (after 3+ manual steps) the most recent Wayback Machine archived page for a web-page search result. That helps tremendously for pages that are temporarily off-line (everyone knows how stable the cloud – someone else’s computers – or on-premise computing is), but it takes too many steps and Google still doesn’t index the full Wayback Machine.

But there is a Clint Eastwood movie title in here, even apart from the devastating fact that Google now off-loads its Google Cache to the Wayback Machine (by which many sites refuse to be archived), as per [Wayback/Archive] Google will no longer back up the Internet: Cached webpages are dead | Ars Technica:

The good

Many posted the links to the big news last week:

  • [Wayback/Archive] Learn search tips & how results relate to your search on Google – Google Search Help:

    Find search information about a result

    1. Start a search on Google.
    2. After the URL for a search result, select More.
    3. In the “About this result” panel, scroll to “Your search & this result.”
    Tip: To find Search tips in the “Your search & this result” section, hover over or tap the underlined terms.

    Despite the screenshot in the Tweet below that referenced it, the above was not updated yet to include Wayback Machine information.

  • [Wayback/Archive] New Feature Alert: Access Archived Webpages Directly Through Google Search | Internet Archive Blogs

    In a significant step forward for digital preservation, Google Search is now making it easier than ever to access the past. Starting today, users everywhere can view archived versions of webpages directly through Google Search, with a simple link to the Internet Archive’s Wayback Machine.

    How It Works

    To access this new feature, conduct a search on Google as usual. Next to each search result, you’ll find three dots—clicking on these will bring up the “About this Result” panel. Within this panel, select “More About This Page” to reveal a link to the Wayback Machine page for that website.

    Through this direct link, you’ll be able to view previous versions of a webpage via the Wayback Machine, offering a snapshot of how it appeared at different points in time.

The last point in the second bullet isn’t true, as you can see in…

The bad

It is at least 3 clicks and some browsing away:

  1. Click on the ellipsis (three dots …) next to a result.
  2. Choose “More about this page” (when available).
  3. Scroll down and click “See previous versions” to view the most recently archived Wayback Machine entry for the page.

Note that, contrary to what the Internet Archive post above explains, it actually only shows the latest archived page, without an overlay offering more history.

Example for www.google.com/search?q=EProgrammerNotFound, showing the Wayback Machine archive of the Embarcadero docwiki, as that site tends to go down irregularly, sometimes for quite long, with content getting destroyed.

  1. https://www.google.com/search?q=EProgrammerNotFound

    https://web.archive.org/web/20240912185353if_/https://private-user-images.githubusercontent.com/2033367/367023048-90f7a2e8-e329-4a3f-bfcf-e7005f0294e2.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjYxNjc1MDIsIm5iZiI6MTcyNjE2NzIwMiwicGF0aCI6Ii8yMDMzMzY3LzM2NzAyMzA0OC05MGY3YTJlOC1lMzI5LTRhM2YtYmZjZi1lNzAwNWYwMjk0ZTIucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDkxMiUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDA5MTJUMTg1MzIyWiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9NmZkMmMxMDg1NTM4NzkxY2Q4MGUyZGVlNmRiNDgzZDFiMGQyZTM0MTlkY2JmM2MwMzRkNTI3MTdlMzAzNjI3ZiZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.D96jPER5M9mKnHplXims9r4FUYXwS731iFwxn_QQXLs

    [Wayback/Archive] 367023048-90f7a2e8-e329-4a3f-bfcf-e7005f0294e2.png (692×723)

  2. https://www.google.com/search?q=EProgrammerNotFound#vhid=zephyr:0&vssid=atritem-http://docwiki.embarcadero.com/Libraries/Athens/en/System.SysUtils.EProgrammerNotFound

    https://web.archive.org/web/20240912185412if_/https://private-user-images.githubusercontent.com/2033367/367023545-44d091d2-d482-48e8-bad6-059e151710b5.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjYxNjc1MDIsIm5iZiI6MTcyNjE2NzIwMiwicGF0aCI6Ii8yMDMzMzY3LzM2NzAyMzU0NS00NGQwOTFkMi1kNDgyLTQ4ZTgtYmFkNi0wNTllMTUxNzEwYjUucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDkxMiUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDA5MTJUMTg1MzIyWiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9MDdlZGMwYTQxODE4ODY2ZWJlNTZiMTExN2RhOGYzNDJlNzg0YjBhNDdmM2VhYTdjMmEwNjMyZTYzMjAxN2Y0YyZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.F4GF_bB5wgVnzKXAF3vUt8Z9CkSiJsraoyy8rWEug14

    [Wayback/Archive] 367023545-44d091d2-d482-48e8-bad6-059e151710b5.png (948×727)

  3. https://www.google.com/search?q=About+http://docwiki.embarcadero.com/Libraries/Athens/en/System.SysUtils.EProgrammerNotFound&tbm=ilp&ctx=atr&sa=X&ved=2ahUKEwiuxK7j3L-IAxV37rsIHRakAHcQv5AHegQIABAD

    https://web.archive.org/web/20240912185433if_/https://private-user-images.githubusercontent.com/2033367/367023859-8dc3da08-b93f-42cc-b296-3f09767ccb9d.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjYxNjc1MDIsIm5iZiI6MTcyNjE2NzIwMiwicGF0aCI6Ii8yMDMzMzY3LzM2NzAyMzg1OS04ZGMzZGEwOC1iOTNmLTQyY2MtYjI5Ni0zZjA5NzY3Y2NiOWQucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDkxMiUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDA5MTJUMTg1MzIyWiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9NTM5YzhjMWFmMDVjYTJmNWI0ZDc4ZWMwY2RiMDE2YTg3YjEzY2ExZWEyYWIxN2YwMGZmMjAwNGFmNTRlNDYzNiZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.iOGz1CAu2RwVowecK8Xsb8Fz5qeqeQ0ZWTKP7QGt_m0

    [Wayback/Archive] 367023859-8dc3da08-b93f-42cc-b296-3f09767ccb9d.png (658×1311)

  4. https://web.archive.org/web/2if_/http://docwiki.embarcadero.com/Libraries/Athens/en/System.SysUtils.EProgrammerNotFound -> https://web.archive.org/web/20240912141605if_/https://docwiki.embarcadero.com/Libraries/Athens/en/System.SysUtils.EProgrammerNotFound

    [Wayback/Archive] 367024073-11acdceb-b878-42bd-b354-925dc267953f.png (970×1064)

Note that the “About” URL can be shortened to https://www.google.com/search?q=http://docwiki.embarcadero.com/Libraries/Athens/en/System.SysUtils.EProgrammerNotFound&tbm=ilp where the &tbm=ilp part switches on the “About” behaviour.
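The shortened “About” URL above can be built for any page with a few lines of Python. This is only a minimal sketch: the function name about_this_page_url is made up for illustration, and the tbm=ilp parameter is simply the behaviour observed in this post, not a documented Google API, so it may change or disappear.

```python
from urllib.parse import quote


def about_this_page_url(page_url: str) -> str:
    """Build the shortened Google "About" URL for a page (hypothetical helper).

    Assumption: appending &tbm=ilp to a search for the page URL opens the
    "About" view, as observed in the post; Google does not document this.
    """
    # Keep ":" and "/" readable, as in the post's example, but percent-encode
    # characters such as "&" or "#" that would otherwise break the query string.
    encoded = quote(page_url, safe=":/")
    return f"https://www.google.com/search?q={encoded}&tbm=ilp"
```

For the docwiki page used in this post, the result is identical to the shortened URL shown above.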

The reason for not showing more Wayback Machine archived history of the web page is the “if_” modifier encoded in the URL: it displays the plain archived page without the archival history overlay at the top.

In a future blog post I will elaborate more on those embedded parameters in URLs on the Wayback Machine, but if you remove the “if_”, you get the archived page including that overlay, and the URL becomes:

  1. https://web.archive.org/web/2/https://docwiki.embarcadero.com/Libraries/Athens/en/System.SysUtils.EProgrammerNotFound -> https://web.archive.org/web/20240912141605/https://docwiki.embarcadero.com/Libraries/Athens/en/System.SysUtils.EProgrammerNotFound

    [Wayback/Archive] 367262584-6313af88-304d-4042-913d-9554e6018b23.png (966×1064)
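Switching between the two URL forms can be automated. The sketch below is an illustration under assumptions: the helper names without_if and with_if are hypothetical, and the regular expression only covers the URL shape seen in this post (web.archive.org/web/, a numeric timestamp, an optional “if_”, then the original URL).

```python
import re

# Matches https://web.archive.org/web/<digits>[if_]/<original URL>.
# Only the "if_" modifier from the post is handled; other modifiers are ignored.
_WAYBACK = re.compile(r"^(https?://web\.archive\.org/web/\d+)(if_)?/(.+)$")


def _split(url: str) -> tuple[str, str]:
    """Return (prefix including timestamp, original URL) or raise ValueError."""
    match = _WAYBACK.match(url)
    if match is None:
        raise ValueError(f"not a recognised Wayback Machine URL: {url}")
    return match.group(1), match.group(3)


def without_if(url: str) -> str:
    """Drop "if_" so the page is shown with the archival history overlay."""
    prefix, original = _split(url)
    return f"{prefix}/{original}"


def with_if(url: str) -> str:
    """Add "if_" so the plain archived page is shown without the overlay."""
    prefix, original = _split(url)
    return f"{prefix}if_/{original}"
```

Both helpers accept URLs with or without the modifier, so calling without_if on a URL that has no “if_” returns it unchanged.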

The ugly

A long-standing wish of mine is that Google Search indexes the vast Wayback Machine content, as it is currently tough to query: you really have to know what you are searching for and where it might have been stored in the past.

Google still removes pages that have disappeared from its search index, even if their content is still current.

People passing away, sites being sunset, companies that went belly up while many still use their products, and more: if the pages or sites get removed, Google removes them from the index, despite the content being in the Wayback Machine and the information being relevant.

Hopefully the latter will change some day, making the vast content in the Wayback Machine easier to query: currently Google Search only has some 100 indexed pages on the Wayback Machine.

Note that the Wayback Machine cannot save Google Search results about itself in the normal way. When you try, you get a result like this:

Sorry

This URL is in the Save Page Now service block list and cannot be captured. Please email us at “info@archive.org” if you would like to discuss this more.

I found this out when archiving the first and last Google Search result pages for the query “site:web.archive.org”:

  1. https://web.archive.org/save/https://www.google.com/search?q=site%3Aweb.archive.org
  2. https://web.archive.org/save/https://www.google.com/search?q=site:web.archive.org&start=100

Luckily these two function OK after encoding the final . into %2E (yes: URL-encoding does not need to be limited to characters that have to be encoded):

  1. https://web.archive.org/save/https://www.google.com/search?q=site%3Aweb.archive%2Eorg
  2. https://web.archive.org/save/https://www.google.com/search?q=site%3Aweb.archive%2Eorg&start=100
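The workaround above, percent-encoding the final dot even though it does not need encoding, can be sketched in Python. The function name save_page_now_url and its start parameter are hypothetical names for illustration, and whether the Save Page Now block list keeps accepting such URLs is not guaranteed.

```python
from typing import Optional
from urllib.parse import quote


def save_page_now_url(query: str, start: Optional[int] = None) -> str:
    """Build a Save Page Now URL for a Google Search query (hypothetical helper).

    The final "." of the encoded query is replaced by "%2E", mirroring the
    workaround described in the post.
    """
    # Encode everything that must be encoded (":" becomes "%3A"); "." stays as is.
    encoded = quote(query, safe="")
    # Replace only the last "." with its percent-encoded form.
    head, dot, tail = encoded.rpartition(".")
    if dot:
        encoded = f"{head}%2E{tail}"
    search_url = f"https://www.google.com/search?q={encoded}"
    if start is not None:
        search_url += f"&start={start}"
    return "https://web.archive.org/save/" + search_url
```

Calling save_page_now_url("site:web.archive.org") and save_page_now_url("site:web.archive.org", start=100) reproduces the two working URLs listed above.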

The results:

  1. [Wayback/Archive] site:web.archive.org – Google Search
  2. [Wayback/Archive] site:web.archive.org – Google Search – page 11
  3. [Wayback/Archive] site:web.archive.org – Google Search – page 14

It looks like the archived results have slightly more entries than what Google Search shows me, apparently a localisation issue, but they are still limited to 139 entries: far fewer than the entries present in the Wayback Machine (some 866*10^9).

Reasoning

The reasoning for building this Google Search feature addition is really good as mentioned in [Wayback/Archive] New Feature Alert: Access Archived Webpages Directly Through Google Search | Internet Archive Blogs:

As Mark Graham, director of the Wayback Machine, explains:
“The web is aging, and with it, countless URLs now lead to digital ghosts. Businesses fold, governments shift, disasters strike, and content management systems evolve—all erasing swaths of online history. Sometimes, creators themselves hit delete, or bow to political pressure. Enter the Internet Archive’s Wayback Machine: for more than 25 years, it’s been preserving snapshots of the public web. This digital time capsule transforms our “now-only” browsing into a journey through internet history. And now, it’s just a click away from Google search results, opening a portal to a fuller, richer web—one that remembers what others have forgotten.”

Via

Many pointed me to this, some of their links:

--jeroen
