Bookmarklets for Archive.is and the WayBack Machine to go to the original page
Posted by jpluimers on 2023/06/07
Quite often, when browsing an archived page on Archive.is or the WayBack Machine, I want to check the current status of the original page.
So I wrote a few Bookmarklets.
Archive.is
Default field
Any Archive.is page has a Saved from field, which is an input HTML element with a name attribute of q and a value property containing the URL. The bookmarklet below passes that URL to open().
So my goto Bookmarklet is this one:
javascript:open(document.getElementsByName("q")[0]?.value)
It uses [0]?. because there is no getElementByName (singular); there is only [Wayback/Archive] Document.getElementsByName() – Web APIs | MDN, as name values need not be unique, whereas id values have to be.
Other Archive.is fields
The above works on all Archive.is page types:
- search pages like https://archive.is/https://example.org
- actual archived pages like https://archive.is/LkpeZ and https://archive.ph/2022.01.22-165646/https://example.org/
  - these only have a Saved from field.
- redirected archived pages like https://archive.ph/UEQeg and https://archive.ph/2013.01.03-111457/http://www.iana.org/domains/example/
  - these both have Saved from and Redirected from fields.
- complex pages like https://archive.ph/5iVVH and https://archive.ph/2015.11.14-044109/http://www.example.org/
  - those have even more fields: in addition to the Saved from and Redirected from fields, the Via and Original fields are also added.
To read the additional fields, we need to figure out a way to access them.
All these additional fields lack name or id attributes, but we might have a selection criterion as they are always readonly:
- [Wayback/Archive] <input>: The Input (Form Input) element; readonly attribute – HTML: HyperText Markup Language | MDN
First option for alternate fields: CSS selectors
The additional fields are also always in the same order, so we can use [Wayback/Archive] Element.querySelector() – Web APIs | MDN for the first one, or [Wayback/Archive] Document.querySelectorAll() – Web APIs | MDN for any of them from first to last (both use [Wayback/Archive] CSS selectors – CSS: Cascading Style Sheets | MDN) to access them:
- The Redirected from field:
  javascript:open(document.querySelector('input[readonly]')?.value)
  or
  javascript:open(document.querySelectorAll('input[readonly]')[0]?.value)
- The Via field:
  javascript:open(document.querySelectorAll('input[readonly]')[1]?.value)
- The Original field:
  javascript:open(document.querySelectorAll('input[readonly]')[2]?.value)
Note I tried document.querySelector('input[readonly]:nth-of-type(2)') to access the Via field (and :nth-of-type(3) for the Original field), but both failed (because each input has a div as parent). The failure is explained by [Wayback/Archive] James Donnelly answering [Wayback/Archive] html – Matching the first/nth element of a certain type in the entire document – Stack Overflow (thanks [Wayback/Archive] user3289092 for asking):
With CSS alone this unfortunately isn’t possible. The documentation for the :first-of-type pseudo-class [Wayback/Archive] states:
The :first-of-type pseudo-class represents an element that is the first sibling of its type in the list of children of its parent element.
This means that :first-of-type is applied to the first element of its type relative to its parent and not the document’s root (or the body element, in this case).
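To make the quoted explanation concrete, here is a small sketch in plain JavaScript that models the situation without a browser. The tree is a made-up stand-in for the Archive.is markup: the only detail taken from the real page is that each input sits in its own div. The helper names nthOfTypeMatch and inDocumentOrder are hypothetical and only imitate what :nth-of-type and querySelectorAll do.

```javascript
// Hypothetical, simplified model of the page: every input has its own div parent.
const body = {
  tag: "body",
  children: [
    { tag: "div", children: [{ tag: "input", value: "redirected", children: [] }] },
    { tag: "div", children: [{ tag: "input", value: "via", children: [] }] },
    { tag: "div", children: [{ tag: "input", value: "original", children: [] }] },
  ],
};

// Imitates :nth-of-type(n): counts only same-tag siblings under ONE parent.
function nthOfTypeMatch(parent, tag, n) {
  const sameType = parent.children.filter((child) => child.tag === tag);
  return sameType[n - 1]; // undefined when the parent has fewer than n such children
}

// Imitates querySelectorAll: collects matching nodes in document order.
function inDocumentOrder(node, tag, found = []) {
  for (const child of node.children) {
    if (child.tag === tag) found.push(child);
    inDocumentOrder(child, tag, found);
  }
  return found;
}

// No div holds a second input, so the nth-of-type(2) lookup finds nothing.
const viaByNthOfType = body.children
  .map((div) => nthOfTypeMatch(div, "input", 2))
  .find((hit) => hit !== undefined);

// Indexing the document-order list does work.
const viaByIndex = inDocumentOrder(body, "input")[1];
```

In this model viaByNthOfType stays undefined while viaByIndex is the second input, which is why the bookmarklets index into the querySelectorAll result instead of using :nth-of-type.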
Combining CSS selectors with the Document Object Model (DOM) means you have to know both the HTML elements and how they map to their DOM equivalents. Links like these are a big help:
- [Wayback/Archive] HTML elements reference – HTML: Hypertext Markup Language | MDN
- [Wayback/Archive] <input>: The Input (Form Input) element – HTML: HyperText Markup Language | MDN
- [Wayback/Archive] HTMLInputElement – Web APIs | MDN
- [Wayback/Archive] HTMLElement – Web APIs | MDN
- [Wayback/Archive] Element – Web APIs | MDN
Notes:
- The four bookmarklets for the additional fields depend on the order of elements in the page; the bookmarklet for the Saved from field depends on the naming of the input element.
- All bookmarklets will open an undefined page (usually about:blank) if the underlying element is not found. This is the result of how the ?. operator works (see [Wayback/Archive] Optional chaining (?.) – JavaScript | MDN) and could be worked around with the ?? operator (see [Wayback/Archive] Nullish coalescing operator (??) – JavaScript | MDN).
- The first bookmarklet fails for https://archive.is/example.org (it just reloads the page). I will need to figure out a way to prepend a protocol to the URI when there is none (as document.querySelectorAll('input[readonly]')[0]?.value now returns example.org, which has no protocol).
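The ?? workaround from the second note could look roughly like the sketch below. The helper name fieldValueOr and the choice of fallback are my own assumptions for illustration; the function takes any array-like list of elements, so it can be tried outside a browser as well.

```javascript
// Hypothetical helper: value of the index-th field, or a fallback when it is missing.
function fieldValueOr(fields, index, fallback) {
  // ?. yields undefined when fields[index] does not exist;
  // ?? then substitutes the fallback for that undefined (or a null value).
  return fields[index]?.value ?? fallback;
}

// Possible bookmarklet form (browser only), assuming the current page address
// is an acceptable fallback:
// javascript:open(document.querySelectorAll('input[readonly]')[1]?.value??location.href)
```

Note that ?? only reacts to null and undefined, so a field that exists but has an empty value still yields an empty string.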
Second option for alternate fields: XPath queries
This is a cool solution, but given that XPath is totally different from CSS selectors, I am going to cover it in a future blog post (a few months from now, as the queue has already filled up quite a bit since I started writing the current post).
Hopefully the page title will stay the same, but here is the link already: XPath based bookmarklets for Archive.is: more JavaScript fiddling!
WayBack Machine search page
javascript:open(document.querySelectorAll('input[type=text]')[0]?.value)
This is for pages like https://web.archive.org/web/*/nu.nl. It has an input HTML element without an id or name attribute, but with a type of text, so it is queryable with document.querySelectorAll('input[type=text]').
WayBack Machine archived page
This was hard, as WayBack Machine archived pages like https://web.archive.org/web/20220122112002/https://example.org/ “hide” their input elements in a closed Shadow DOM to keep them separate from the regular DOM, likely to prevent side effects from these elements on the archived page and vice versa.
Whereas an open Shadow DOM can be retrieved from a web page, a closed Shadow DOM cannot. So I did the next best thing, which is to search/replace the current URL and remove the archive prefix: in https://web.archive.org/web/20220122112002/https://example.org/ that is the https://web.archive.org/web/20220122112002/ part. Since the WayBack Machine can use both http and https, this is the bookmarklet:
javascript:open(location.href.replace(/^http[s]?:\/\/web\.archive\.org\/web\/\d{14}\/http/,'http'))
That looks a bit similar to the solution in JavaScript bookmarklet to replace part of the WayBack machine URL.
Note that, like the Archive.is bookmarklets, the above one fails for https://web.archive.org/web/*/example.org/. Here too I need to think about how to prepend a missing protocol.
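To see what the regular expression in this bookmarklet does, here is the same replacement as a plain function that can be run outside the browser. The names WAYBACK_PREFIX and originalFromWayback are made up for this sketch; the pattern itself is the one from the bookmarklet.

```javascript
// Same pattern as the bookmarklet: scheme, host, /web/, a 14-digit timestamp,
// a slash, and the first four characters ("http") of the original URL.
const WAYBACK_PREFIX = /^http[s]?:\/\/web\.archive\.org\/web\/\d{14}\/http/;

function originalFromWayback(href) {
  // The match consumes the leading "http" of the original URL, so it is put back.
  return href.replace(WAYBACK_PREFIX, "http");
}

// An archived page: the prefix is stripped.
const stripped = originalFromWayback(
  "https://web.archive.org/web/20220122112002/https://example.org/"
);

// A search page: "*" is not a 14-digit timestamp, so nothing is replaced.
const untouched = originalFromWayback("https://web.archive.org/web/*/example.org/");
```

The second call shows the failure described above: the URL comes back unchanged, so the bookmarklet would just open the WayBack Machine page again.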
Fixing the protocol issue
Besides http:// and https://, I also need to consider ftp://, ftps:// and sftp:// as ftp://ftp.adobe.com/pub/adobe/acrobat/win/11.x/11.0.23/misc/ is archived in the WayBack Machine as seen by https://web.archive.org/web/*/ftp://ftp.adobe.com/pub/adobe/acrobat/win/11.x/11.0.23/misc/.
These links might help me fix the protocol issue eventually:
- [Wayback/Archive] String.prototype.replace() – JavaScript | MDN
- Negative lookahead:
  - [Wayback/Archive] regex – Javascript regular expression to add protocol to url string – Stack Overflow
  - [Wayback/Archive] Regex Tutorial – Lookahead and Lookbehind Zero-Length Assertions
  - [Wayback/Archive] “negative lookahead” add missing protocol to uri – Google Search
  - [Wayback/Archive] php – Add http:// prefix to URL when missing – Stack Overflow (thanks [Wayback/Archive] DiegoP., [Wayback/Archive] Evan Kennedy and [Wayback/Archive] dhaupin)
    Your regex is smart that it preserves https:// however it doesn’t work with relative urls such as //www.example.com. It also appends http:// onto ftp:// or other protocols. Try this instead: preg_replace('/^\/\/|^(?!https?:)(?!ftps?:)/', 'http://', $src) (note: you can remove the check for ftp if you don’t need it)
  - [Wayback/Archive] .net – How can I apply a negative lookahead to a whole capture group? – Stack Overflow
  - [Wayback/Archive] Url Regex with mandatory Protocol – Regex Tester/Debugger
Basically the original -> replaced string testcases would be these:
- ftp://example.org -> ftp://example.org
- ftps://example.org -> ftps://example.org
- sftp://example.org -> sftp://example.org
- http://example.org -> http://example.org
- https://example.org -> https://example.org
- example.org -> https://example.org
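As a starting point, the negative lookahead idea from the PHP answer can be carried over to JavaScript. This is only a sketch and not a finished fix: the function name ensureProtocol, the https:// default and the handling of //-style URLs are my assumptions, chosen so that the test cases listed above hold.

```javascript
// Hypothetical helper: prepend a protocol only when none of the known ones is present.
function ensureProtocol(url, defaultProtocol = "https://") {
  // Protocol-relative URL such as //www.example.com: swap the leading slashes.
  if (url.startsWith("//")) return defaultProtocol + url.slice(2);
  // ^ followed by a negative lookahead gives a zero-length match at the start,
  // but only when the string does NOT begin with http, https, ftp, ftps or sftp.
  // Replacing that empty match inserts the default protocol in front.
  return url.replace(/^(?!(?:https?|s?ftps?):\/\/)/i, defaultProtocol);
}

// The test cases from the list above, as [original, expected] pairs.
const cases = [
  ["ftp://example.org", "ftp://example.org"],
  ["ftps://example.org", "ftps://example.org"],
  ["sftp://example.org", "sftp://example.org"],
  ["http://example.org", "http://example.org"],
  ["https://example.org", "https://example.org"],
  ["example.org", "https://example.org"],
];
const allPass = cases.every(([input, expected]) => ensureProtocol(input) === expected);
```

In a bookmarklet the same replace call could wrap the value that is passed to open(), but I have not wired that in here.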
Helpful links
The links below helped me tremendously in writing this blog post.
- [Wayback/Archive] Accessing elements by type in JavaScript – Stack Overflow (thanks [Wayback/Archive] Mic)
- [Wayback/Archive] html – What is shadow root – Stack Overflow (thanks [Wayback/Archive] Garbee and [Wayback/Archive] Aniket Thakur)
- [Wayback/Archive] java – Need help to click on the element under the shadow Root (closed) type – Stack Overflow (thanks [Wayback/Archive] Y_Sh and [Wayback/Archive] undetected Selenium)
- [Wayback/Archive] Open vs. Closed Shadow DOM. Out of the four specifications created… | by Leon Revill | RevillWeb
- [Wayback/Archive] Shadow DOM v1: Self-Contained Web Components | Web Fundamentals | Google Developers
- [Wayback/Archive] javascript – Check if element contains #shadow-root – Stack Overflow (thanks [Wayback/Archive] Chase and [Wayback/Archive] KevBot)
–jeroen





