The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests


The JavaScript bookmarklets that saved me a lot of time documenting the Embarcadero docwiki outage

Posted by jpluimers on 2023/09/28

In the winter of 2022, the Embarcadero docwiki (their most active site, which contains all documentation for all their products) was down. Twice. First for a week, then parts of it for almost a week, after which only parts of the Alexandria documentation came back up in a stable way.

Back then I published "The Delphi documentation site docwiki.embarcadero.com has been down/up oscillating for 4 days is now down for almost a day." The product and library documentation for the most recent version got back up in a week, but the Code Examples and older product versions took much longer.

Usually one learns way more about a system when it is failing than when it is working. That was the case for this system as well.

Documenting the failing system took considerable time, but would have taken way more if not for these two JavaScript browser bookmarklets:

  1. Archiving a page in Archive.is (as the Wayback Machine does not archive web pages throwing HTTP errors):
    javascript:void(open('https://archive.is/?run=1&url='+encodeURIComponent(document.location)))
  2. From the archived page, create an HTML list item with links to the archived and the actual page, plus some information from the page:
    javascript:{
    function x(xpath, parent) {
      result = document.evaluate(xpath, parent || document, null, XPathResult.ORDERED_NODE_ITERATOR_TYPE, null);
      nodes = []
      while (node = result.iterateNext())
        nodes.push(node);
      return nodes;
    }
    aua=document.createElement("a");
    oua=document.createElement("a");
    aua.href=document.querySelector('link[rel="canonical"]')?.href;
    o=document.querySelectorAll('input[readonly]')[0]?.value;
    o=o??document.getElementsByName("q")[0]?.value;
    oua.href=o;
    aua.text="Archive";
    oua.text=document.title;
    aua.target="_blank";
    oua.target="_blank";
    aua.rel="noopener";
    oua.rel="noopener";
    ouaps=oua.pathname.split("/");
    r=new Intl.DisplayNames(['en'], {type: 'language'});
    l=ouaps[3];
    l=r.of(l==="e"?"en":l);
    s=ouaps[4]??'';
    if(!!s){s=" "+decodeURI(s)};
    divs=x('//div[contains(., "1146")]');
    d=divs[divs.length-1]?.textContent.split("`")[1]??'';
    if(!!d){d=` <code>${d}</code>`};
    li=`<li>[${aua.outerHTML}] ${oua.outerHTML} ${l} ${ouaps[2]} ${ouaps[1]}${s}${d}</li>`;
    prompt("li", li);
    }

I mentioned the first one in "Archive.is is more like a thread unroll service than an archival service". It is pretty straightforward: it converts the current URL (which is in document.location) to URL-encoded form (using encodeURIComponent), appends it to https://archive.is/?run=1&url= and opens a new browser tab with the result.

Opening a new tab is important: when archiving more than a few pages at a time, Archive.is will show a captcha (sometimes again after a bunch of pages) and won’t save any page until you have solved it.
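As a rough sketch of the URL building in the first bookmarklet: the snippet below runs outside the browser, so the page URL is passed in as a parameter instead of being read from document.location. The function name buildArchiveUrl is made up for this illustration.

```javascript
// Hypothetical helper: builds the same URL that the first bookmarklet opens.
function buildArchiveUrl(pageUrl) {
  // encodeURIComponent escapes characters like ":", "/", "?" and "&", so the
  // whole page URL survives as one single value of the url= query parameter.
  return 'https://archive.is/?run=1&url=' + encodeURIComponent(pageUrl);
}

console.log(buildArchiveUrl('https://docwiki.embarcadero.com/RADStudio/XE8/en/Main_Page'));
// https://archive.is/?run=1&url=https%3A%2F%2Fdocwiki.embarcadero.com%2FRADStudio%2FXE8%2Fen%2FMain_Page
```

In the bookmarklet itself, open(...) then loads that URL in a new tab, and void(...) keeps the current page from being replaced by the return value.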

The second bookmarklet is way more complex. First, it uses abbreviated variable names to keep it short. In retrospect, I should have made them longer, so here is the translation table:

Short  Long                            Description
aua    archivedUrlAnchor               HTML anchor having the Archive.is canonical URL of the page
oua    originalUrlAnchor               HTML anchor having the original URL of the page
ouaps  originalUrlAnchorPathsSplitted  Path portion of the original URL of the page, split on / boundaries
r      regionHelper                    Helper to convert a 2-character language code into readable English form
l      language
s      suffix
d      databaseName                    Database name of the l10n_cache database (used for localisation/l10n)

Second, it uses quite a few JavaScript language tricks and framework knowledge to keep things short.

Let’s dig into these.

Getting data from the document

Filling the URLs is done in the code below, which I already explained in Source: Bookmarklet for Archive.is to navivate to the canonical link and in Bookmarklets for Archive.is and the WayBack Machine to go to the original page:

aua.href=document.querySelector('link[rel="canonical"]')?.href;
o=document.querySelectorAll('input[readonly]')[0]?.value;
o=o??document.getElementsByName("q")[0]?.value;
oua.href=o;

The third line can also be replaced with this one:

o??=document.getElementsByName("q")[0]?.value;

This uses three operators that have to do with the JavaScript concept Nullish. Before digging into Nullish however, first let me point back to the two query functions I explained for Archive.is in Bookmarklets for Archive.is and the WayBack Machine to go to the original page.

The operators are:

  • ?. (optional chaining)
  • ?? (nullish coalescing)
  • ??= (nullish coalescing assignment)

I will dig into Nullish in a few headings.

This bit uses two JavaScript operators for handling undefined/null values, which JavaScript together calls Nullish and which are a subset of Falsy.

In addition to the ?. optional chaining operator to handle the nullish cases, this code also uses the [Wayback/Archive] Nullish coalescing operator (??) – JavaScript | MDN:

The nullish coalescing operator (??) is a logical operator that returns its right-hand side operand when its left-hand side operand is null or undefined, and otherwise returns its left-hand side operand.

I wrote a tiny bit about both operators before in Bookmarklets for Archive.is and the WayBack Machine to go to the original page, but it is worth repeating in more detail here as the concept is crucial and used often.
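A small sketch of how these operators behave, using plain arrays as stand-ins for the DOM query results so it runs outside a browser. The array names and the example URL are assumptions made up for this illustration.

```javascript
// Stand-in for an empty document.querySelectorAll('input[readonly]') result.
const readonlyInputs = [];
// Stand-in for document.getElementsByName("q"); the value is made up.
const queryInputs = [{ value: 'https://example.org/page' }];

// ?. yields undefined instead of throwing when the element is missing.
let o = readonlyInputs[0]?.value;
console.log(o); // undefined

// ??= assigns only when the left-hand side is null or undefined.
o ??= queryInputs[0]?.value;
console.log(o); // https://example.org/page

// ?? falls back only for null/undefined; || falls back for every falsy value.
const emptyWithNullish = '' ?? 'fallback'; // stays the empty string
const emptyWithOr = '' || 'fallback';      // becomes 'fallback'
const zeroWithNullish = 0 ?? 42;           // stays 0
const zeroWithOr = 0 || 42;                // becomes 42
console.log([emptyWithNullish, emptyWithOr, zeroWithNullish, zeroWithOr]);
```

The last four lines show why Nullish is a subset of Falsy: the empty string and 0 are falsy but not nullish, so only || replaces them.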

Postprocessing the data: the right format part 1

The docwiki URLs use standard 2-character language codes, which you can look up in List of ISO 639-1 codes – Wikipedia. The conversion is done through [Wayback/Archive] Intl.DisplayNames – JavaScript | MDN:

The Intl.DisplayNames object enables the consistent translation of language, region and script display names.

Sometimes however, the docwiki uses invalid language codes:

A URL like https://docwiki.embarcadero.com/CodeExamples/Sydney/de/Special:Search/Main%20Page redirects to https://docwiki.embarcadero.com/CodeExamples/Sydney/e/Special:Search/Main%20Page which uses e as language code (which in turn is not a valid language).

Sometimes this reflects in the Archive.is archival, for instance compare these two:

Solving this is done in this line:

l=r.of(l==="e"?"en":l);

It uses both the operator [Wayback/Archive] Strict equality (===) – JavaScript | MDN

The strict equality operator (===) checks whether its two operands are equal, returning a Boolean result. Unlike the equality operator, the strict equality operator always considers operands of different types to be different.

and the [Wayback/Archive] Conditional (ternary) operator – JavaScript | MDN

The conditional (ternary) operator is the only JavaScript operator that takes three operands: a condition followed by a question mark (?), then an expression to execute if the condition is truthy followed by a colon (:), and finally the expression to execute if the condition is falsy. This operator is frequently used as an alternative to an if...else statement.
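Putting the two operators together with Intl.DisplayNames, the language lookup can be sketched as below. The helper name languageName is an assumption for this illustration; the expression inside it is the one from the bookmarklet. As far as I can tell, passing "e" straight to of() raises a RangeError because it is not a well-formed language tag, which is why the substitution happens before the lookup.

```javascript
const languageNames = new Intl.DisplayNames(['en'], { type: 'language' });

// Hypothetical helper mirroring l=r.of(l==="e"?"en":l);
function languageName(code) {
  // Map the invalid docwiki code "e" to "en" before asking for a display name.
  return languageNames.of(code === 'e' ? 'en' : code);
}

console.log(languageName('de')); // German
console.log(languageName('en')); // English
console.log(languageName('e'));  // English
```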

Lax typing is both a strength and weakness of JavaScript. Hence the usage of both the === operator and if(!!s) trick in my code.

Let’s dig into the latter now.

Postprocessing: more nullish values

Depending on the original page, not all bits are present in the archived page. When they are not, they are either undefined or null, which JavaScript collectively calls Nullish and which is very similar to Falsy (for both terms, see the references below).

Furthermore, the information might not be in the right format.

Both are solved with these JavaScript tricks:

if(!!s){s=" "+decodeURI(s)};

Sometimes s is of the form Main%20Page which is Main Page in URL-encoding and can be decoded using [Wayback/Archive] decodeURI() – JavaScript | MDN.

Sometimes s has no value. The if(!!s) trick covers that (it is based on Falsy, which is explained below) and executes the {s=" "+decodeURI(s)} part only if s has a value. The first bit can also be if(Boolean(s)). Handling Nullish values is a common problem and is explained by [Wayback/Archive] karthick.sk in [Wayback/Archive] How can I check for an empty/undefined/null string in JavaScript? – Stack Overflow (thanks [Wayback/Archive] casademora for asking!):

All the previous answers are good, but this will be even better. Use dual NOT operators (!!):
if (!!str) {
    // Some code here
}
Or use type casting:
if (Boolean(str)) {
    // Code here
}
Both do the same function. Typecast the variable to Boolean, where str is a variable.
  • It returns false for null, undefined, 0, 000, "", false.
  • It returns true for all string values other than the empty string (including strings like "0" and " ")

There are many other ways to check for this. Some performance measurements have been done by [Wayback/Archive] Kamil Kiełczewski at [Wayback/Archive] How can I check for an empty/undefined/null string in JavaScript? – Stack Overflow:

I perform tests on macOS v10.13.6 (High Sierra) for 18 chosen solutions. Solutions works slightly different (for corner-case input data) which was presented in the snippet below.
Conclusions
  • the simple solutions based on !str,==,=== and length are fast for all browsers (A,B,C,G,I,J)
  • the solutions based on the regular expression (test,replace) and charAt are slowest for all browsers (H,L,M,P)
  • the solutions marked as fastest was fastest only for one test run – but in many runs it changes inside ‘fast’ solutions group
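The suffix handling can be tried outside the browser with a sketch like the one below. The helper name formatSuffix is made up for this illustration; its body combines the s=ouaps[4]??''; and if(!!s) lines from the bookmarklet.

```javascript
// Hypothetical helper wrapping the suffix handling from the bookmarklet.
function formatSuffix(pathPart) {
  let s = pathPart ?? '';            // a missing path part becomes ''
  if (!!s) { s = ' ' + decodeURI(s); } // only decode and prefix when there is a value
  return s;
}

console.log(JSON.stringify(formatSuffix('Main%20Page'))); // " Main Page"
console.log(JSON.stringify(formatSuffix(undefined)));     // ""
console.log(JSON.stringify(formatSuffix('')));            // ""

// !!value and Boolean(value) agree for every input, falsy or not.
for (const value of [null, undefined, 0, '', false, '0', ' ']) {
  console.log(String(value), !!value, Boolean(value));
}
```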

Back to my code.

Getting the database name

That is: if there is one. Not all saved pages had an error on them indicating the database name. The ones that did look like [Archive] Internal error – RAD Studio: XE8 main page:

[53d58941e2d881306538a66d] /RADStudio/XE8/en/Main_Page Wikimedia\Rdbms\DBQueryError from line 1457 of /var/www/html/shared/BaseWiki31/includes/libs/rdbms/database/Database.php: A database query error has occurred. Did you forget to run your application's database schema updater after upgrading?
Query: SELECT lc_value FROM `rad_xe8_en_l10n_cache` WHERE lc_lang = 'en' AND lc_key = 'deps' LIMIT 1
Function: LCStoreDB::get
Error: 1146 Table 'wikidb.rad_xe8_en_l10n_cache' doesn't exist (10.50.1.120)
Backtrace:

This line gets the div elements (see HTML element: div – Wikipedia) containing 1146:

divs=x('//div[contains(., "1146")]');

The x function is a condensed version of the getElementsByXPath I described last week in XPath based bookmarklets for Archive.is: more JavaScript fiddling!.

Note it uses the || operator to perform default assignment, as shown in [Wayback/Archive] 3 Ways to Set Default Value in JavaScript | SamanthaMing.com.
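A minimal sketch of that default assignment, compared with a default parameter. The object fallbackRoot and both function names are hypothetical stand-ins so the snippet runs without a browser; in the bookmarklet the fallback is the real document.

```javascript
// Hypothetical stand-in for the real document object.
const fallbackRoot = { name: 'document' };

// Same idea as "parent || document" inside x(): any falsy parent falls back.
function rootWithOr(parent) {
  return parent || fallbackRoot;
}

// Default parameter: only a missing or undefined argument falls back.
function rootWithDefault(parent = fallbackRoot) {
  return parent;
}

const other = { name: 'some other node' };
console.log(rootWithOr(other).name);  // some other node
console.log(rootWithOr().name);       // document
console.log(rootWithOr(null).name);   // document
console.log(rootWithDefault().name);  // document
console.log(rootWithDefault(null));   // null
```

For a bookmarklet the || form is the shorter of the two and also covers an explicit null.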

More Nullish undefined/null handling

There is another bit which has to do with Nullish values:

d=divs[divs.length-1]?.textContent.split("`")[1]??'';

One statement having both the ?. and ?? operator to handle both cases of Nullish: the div not existing, or the split not returning enough elements.
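Both cases can be sketched with plain objects standing in for the div nodes, since only textContent matters here. The helper name nameInBackticks is made up for this illustration; the sample text is the Query line from the error shown earlier.

```javascript
// Stand-in for a div node returned by x(...).
const errorDiv = {
  textContent: "Query: SELECT lc_value FROM `rad_xe8_en_l10n_cache` WHERE lc_lang = 'en' AND lc_key = 'deps' LIMIT 1",
};

// Hypothetical helper wrapping the d=... line from the bookmarklet.
function nameInBackticks(divs) {
  return divs[divs.length - 1]?.textContent.split('`')[1] ?? '';
}

console.log(nameInBackticks([errorDiv]));                    // rad_xe8_en_l10n_cache
// No div at all: ?. short-circuits the rest of the chain, ?? supplies ''.
console.log(JSON.stringify(nameInBackticks([])));            // ""
// A div without backticks: split returns one element, so [1] is undefined.
console.log(JSON.stringify(nameInBackticks([{ textContent: 'no ticks' }]))); // ""
```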

Wut, no regex?

When you look at the above code, there are no regular expressions in it.

I hesitated a bit when writing this part:

d=divs[divs.length-1]?.textContent.split("`")[1]??'';

It assumes the empirically observed pattern that the last div in the result contains the database name within backticks, if there is one at all. Doing it in regex would be at least as complex and require yet another language skill.
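For comparison, a regex variant could look like the sketch below. It is only an illustration of the trade-off: the bookmarklet itself sticks with split, and both function names are made up.

```javascript
// The approach from the bookmarklet: take whatever follows the first backtick.
function nameViaSplit(text) {
  return text.split('`')[1] ?? '';
}

// Regex alternative: capture the text between the first pair of backticks.
function nameViaRegex(text) {
  // match() returns null when nothing matches, hence the ?.[1] and ?? ''.
  return text.match(/`([^`]*)`/)?.[1] ?? '';
}

const sample = "Query: SELECT lc_value FROM `rad_xe8_en_l10n_cache` WHERE lc_lang = 'en'";
console.log(nameViaSplit(sample)); // rad_xe8_en_l10n_cache
console.log(nameViaRegex(sample)); // rad_xe8_en_l10n_cache
```

The regex version needs the same Nullish handling plus the pattern itself, so it does not come out shorter.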

Anchor assembly

The easiest way to build an HTML anchor is by using an instance of [Archive/Archive] HTMLAnchorElement – Web APIs | MDN. These lines use the various properties of it:

aua.href=document.querySelector('link[rel="canonical"]')?.href;
oua.href=o;
aua.text="Archive";
oua.text=document.title;
aua.target="_blank";
oua.target="_blank";
aua.rel="noopener";
oua.rel="noopener";
ouaps=oua.pathname.split("/");

Creation is easily done through [Wayback/Archive] Document.createElement() – Web APIs | MDN, but you have to know that the tag name a maps to HTMLAnchorElement:

aua=document.createElement("a");
oua=document.createElement("a");
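The index positions used on ouaps follow from how pathname splits. The sketch below uses the URL class as a stand-in for the anchor element, on the assumption that its pathname property behaves the same way for this purpose; it runs outside the browser.

```javascript
// URL exposes a pathname property just like an anchor element does.
const original = new URL('https://docwiki.embarcadero.com/RADStudio/XE8/en/Main_Page');
const parts = original.pathname.split('/');

// The path starts with "/", so the first element is an empty string.
console.log(JSON.stringify(parts)); // ["","RADStudio","XE8","en","Main_Page"]

console.log(parts[1]); // RADStudio
console.log(parts[2]); // XE8
console.log(parts[3]); // en
console.log(parts[4]); // Main_Page
```

That is why the bookmarklet reads the language from ouaps[3] and the optional suffix from ouaps[4].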

String assembly

Assembling strings can be a tedious job. I prefer to use backticks for this, as they allow embedding JavaScript expressions within the template string; these are all the ${...} parts below:

li=`<li>[${aua.outerHTML}] ${oua.outerHTML} ${l} ${ouaps[2]} ${ouaps[1]}${s}${d}</li>`;

The mechanism is called [Wayback/Archive] Template literals (Template strings) – JavaScript | MDN

Template literals are string literals allowing embedded expressions. You can use multi-line strings and string interpolation features with them.
They were called “template strings” in prior editions of the ES2015 specification.

I briefly mentioned them 5 years ago in I wish the Delphi language supported multi-line strings and were part of a few examples in the more recent post Source: For my reading list: some links on Twitter bookmarklets.

They are cool and make code a lot more readable: it is immediately clear how the List Element (see HTML element: li – Wikipedia) is being built.
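To see the difference in readability, here is a sketch that builds the same list item once with a template literal and once with plain concatenation. All values are made-up stand-ins for aua.outerHTML, oua.outerHTML and the other pieces, including the archive.is/example address.

```javascript
// Made-up stand-ins for the values the bookmarklet collects.
const archiveAnchor = '<a href="https://archive.is/example">Archive</a>';
const originalAnchor = '<a href="https://docwiki.embarcadero.com/RADStudio/XE8/en/Main_Page">Main Page</a>';
const language = 'English';
const version = 'XE8';
const product = 'RADStudio';
const suffix = ' Main_Page';
const code = ' <code>rad_xe8_en_l10n_cache</code>';

// Template literal: the shape of the resulting HTML is visible at a glance.
const li = `<li>[${archiveAnchor}] ${originalAnchor} ${language} ${version} ${product}${suffix}${code}</li>`;

// The same string with classic concatenation.
const liConcat = '<li>[' + archiveAnchor + '] ' + originalAnchor + ' ' + language + ' ' +
  version + ' ' + product + suffix + code + '</li>';

console.log(li === liConcat); // true
console.log(li);
```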

--jeroen



