XPath based bookmarklets for Archive.is: more JavaScript fiddling!


Posted by jpluimers on 2023/09/20

As I promised a few months back in Bookmarklets for Archive.is and the WayBack Machine to go to the original page, moar JavaScript fiddling, this time with XPath based bookmarklets to navigate from Archive.is pages to Saved From, Redirected from, Via and Original pages.

An alternative would be using XPath as the additional fields are always structured in a table like the html below (taking complex pages like https://archive.ph/5iVVH and https://archive.ph/2015.11.14-044109/http://www.example.org/ as an example).

I was prompted to use XPath by this answer from [Wayback/Archive] gdyrrahitis at [Wayback/Archive] Javascript .querySelector find by innerTEXT – Stack Overflow (thanks [Wayback/Archive] passwd for asking):

OP’s question is about plain JavaScript and not jQuery. Although there are plenty of answers and I like @Pawan Nogariya answer, please check this alternative out.

You can use XPATH in JavaScript. More info on the MDN article here.

The document.evaluate() method evaluates an XPATH query/expression. So you can pass XPATH expressions there, traverse into the HTML document and locate the desired element.

In XPATH you can select an element, by the text node like the following, which gets the div that has the following text node.
//div[text()="Hello World"]
To get an element that contains some text use the following:
//div[contains(., 'Hello')]
The contains() method in XPATH takes a node as first parameter and the text to search for as second parameter.

Check this plunk here, this is an example use of XPATH in JavaScript

Here is a code snippet:
var headings = document.evaluate("//h1[contains(., 'Hello')]", document, null, XPathResult.ANY_TYPE, null );
var thisHeading = headings.iterateNext();

console.log(thisHeading); // Prints the html element in console
console.log(thisHeading.textContent); // prints the text content in console

thisHeading.innerHTML += "<br />Modified contents";
As you can see, I can grab the HTML element and modify it as I like.

The same question also has answers showing you can do the search in a mix of CSS selectors, pure (sometimes functional) JavaScript coding and some even regular expressions. These exactly show the reason why I opted for XPath: there the whole query is done in one language and is not spread over two.

More on non-XPath based solutions

Some examples (note that anything selecting all div elements will usually be slow as the outer div will likely contain various inner div elements; see also my /html/body remark on XPath below):

Answer by [Wayback/Archive] Pawan Nogariya
Since you have asked it in javascript so you can have something like this
```
function contains(selector, text) {
  var elements = document.querySelectorAll(selector);
  return Array.prototype.filter.call(elements, function(element){
    return RegExp(text).test(element.textContent);
  });
}
```
And then call it like this
```
contains('div', 'sometext'); // find "div" that contain "sometext"
contains('div', /^sometext/); // find "div" that start with "sometext"
contains('div', /sometext$/i); // find "div" that end with "sometext", case-insensitive
```
Note that similar contains functions are present in various other answers, for instance [Wayback/Archive] Native javascript equivalent of jQuery :contains() selector – Stack Overflow (answered by [Wayback/Archive] elclanrs , asked by [Wayback/Archive] coulbourne) which has an interesting remark by [Wayback/Archive] avalanche1:

This is incorrect because it also includes results for all child nodes. I.e. if child node of element will contain text – element will be included into contains result; which is wrong.

by anonymous:

You could use this pretty simple solution:

Array.from(document.querySelectorAll('div'))
  .find(el => el.textContent === 'SomeText, text continues.');

by [Wayback/Archive] Andrew Willems:
This solution does the following:
- Uses the ES6 spread operator to convert the NodeList of all divs to an array.
- Provides output if the div contains the query string, not just if it exactly equals the query string (which happens for some of the other answers). e.g. It should provide output not just for ‘SomeText’ but also for ‘SomeText, text continues’.
- Outputs the entire div contents, not just the query string. e.g. For ‘SomeText, text continues’ it should output that whole string, not just ‘SomeText’.
- Allows for multiple divs to contain the string, not just a single div.
```
[...document.querySelectorAll('div')]      // get all the divs in an array
  .map(div => div.innerHTML)               // get their contents
  .filter(txt => txt.includes('SomeText')) // keep only those containing the query
  .forEach(txt => console.log(txt));       // output the entire contents of those
```
```
<div>SomeText, text continues.</div>
<div>Not in this div.</div>
<div>Here is more SomeText.</div>
```
and a similar one by [Wayback/Archive] Redu:
You best see if you have a parent element of the div you are querying. If so get the parent element and perform an element.querySelectorAll("div"). Once you get the nodeList apply a filter on it over the innerText property. Assume that a parent element of the div that we are querying has an id of container. You can normally access container directly from the id but let’s do it the proper way.
```
var conty = document.getElementById("container"),
     divs = conty.querySelectorAll("div"),
    myDiv = [...divs].filter(e => e.innerText == "SomeText");
```

The last one – despite not my preference – did teach me a few things I did not have much experience with:

[Wayback/Archive] Spread syntax (...) – JavaScript | MDN

Spread syntax (...) allows an iterable such as an array expression or string to be expanded in places where zero or more arguments (for function calls) or elements (for array literals) are expected, or an object expression to be expanded in places where zero or more key-value pairs (for object literals) are expected.
[Wayback/Archive] Array.prototype.map() – JavaScript | MDN

The map() method creates a new array populated with the results of calling a provided function on every element in the calling array.
[Wayback/Archive] Array.prototype.forEach() – JavaScript | MDN

The forEach() method executes a provided function once for each array element.
[Wayback/Archive] Array – JavaScript | MDN
- JavaScript arrays are resizable and can contain a mix of different data types. (When those characteristics are undesirable, use typed arrays instead.)
- JavaScript arrays are not associative arrays and so, array elements cannot be accessed using strings as indexes, but must be accessed using integers as indexes.
- JavaScript arrays are zero-indexed: the first element of an array is at index 0, the second is at index 1, and so on — and the last element is at the value of the array’s length property minus 1.
- JavaScript array-copy operations create shallow copies. (All standard built-in copy operations with any JavaScript objects create shallow copies, rather than deep copies).
[Wayback/Archive] => Arrow function expressions – JavaScript | MDN
An arrow function expression is a compact alternative to a traditional function expression, but is limited and can’t be used in all situations.

Differences & Limitations:
- Does not have its own bindings to this or super, and should not be used as methods.
- Does not have new.target keyword.
- Not suitable for call, apply and bind methods, which generally rely on establishing a scope.
- Can not be used as constructors.
- Can not use yield, within its body.
[Wayback/Archive] Function expression – JavaScript | MDN
The function keyword can be used to define a function inside an expression.

You can also define functions using the Function constructor and a function declaration.
```
const getRectArea = function(width, height) {
  return width * height;
};

console.log(getRectArea(3, 4));
// expected output: 12
```
[Wayback/Archive] Array.prototype.filter() – JavaScript | MDN

The filter() method creates a new array with all elements that pass the test implemented by the provided function.
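
These array helpers can be tried without a browser. A minimal sketch, assuming plain strings stand in for the innerHTML of the three example divs; the `divContents` array is my own stand-in, not a real NodeList:

```javascript
// Stand-in for the innerHTML of the three example divs (assumption: no DOM here).
const divContents = [
  'SomeText, text continues.',
  'Not in this div.',
  'Here is more SomeText.',
];

// Spread copies any iterable into a new array; a Set is used here the same
// way [...nodeList] would be used for a NodeList.
const asArray = [...new Set(divContents)];

// map() builds a new array; filter() keeps only the elements passing the test.
const matches = asArray
  .map(txt => txt.trim())
  .filter(txt => txt.includes('SomeText'));

// forEach() runs the callback once per remaining element.
matches.forEach(txt => console.log(txt));
```

In a browser, replacing `divContents` with `document.querySelectorAll('div')` and `txt.trim()` with `div.innerHTML` gets back to the shape of the answer quoted above.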

I found another archived page having these 4 fields as well: [Wayback/Archive] Get Windows Terminal – Microsoft Store.

The common aspect with the previous page is that both come from WayBack Machine links:

[Wayback/Archive] Jeroen Wiert Pluimers on Twitter: “Cool, some @archiveis archived pages have not 1, not 2, but 4 fields!”

Back to XPath and the HTML I based the XPath expressions on:

<table border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td rowspan="6">
<div>archive.today</div>
<div>webpage capture</div>
</td>
<td>Saved from</td>
<td><form action="https://archive.fo/search/" method="get">
<table border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td><input name="q" type="text" value="http://web.archive.org/web/20151114044109/http://www.example.org/" /><input name="t" type="hidden" value="1447631011200" /><input name="id" type="hidden" value="5iVVH" /></td>
<td><input tabindex="-1" type="submit" value="search" /></td>
</tr>
</tbody>
</table>
<textarea name=""></textarea></form></td>
<td rowspan="2"><time datetime="2015-11-15T23:43:31Z">15 Nov 2015 23:43:31 UTC</time></td>
</tr>
<tr>
<td>Redirected from</td>
<td>
<div><input readonly="readonly" type="text" value="http://web.archive.org/web/2/example.org" /></div>
</td>
</tr>
<tr>
<td>Via</td>
<td>
<div><input readonly="readonly" type="text" value="http://www.example.org/" /></div>
</td>
<td><time datetime="2015-11-14T04:41:09Z">14 Nov 2015 04:41:09 UTC</time></td>
</tr>
<tr>
<td>Original</td>
<td>
<div><input readonly="readonly" type="text" value="http://example.org/" /></div>
</td>
<td><time datetime="2015-11-14T04:41:09Z">14 Nov 2015 04:41:09 UTC</time></td>
</tr>
</tbody>
</table>

On the one hand, XPath requires knowing yet another language. On the other hand: it is very well supported in web browsers, see:

The table in Comparison of web browsers: JavaScript support – Wikipedia, and the remarks right above it:

Information about what JavaScript technologies the browsers support. Note that although XPath is used by XSLT, it is only considered here if it can be accessed using JavaScript. External links lead to information about support in future versions of the browsers or extensions that provide such functionality, e.g., Babel.
[Wayback/Archive] Introduction to using XPath in JavaScript – XPath | MDN
[Wayback/Archive] Document.evaluate() – Web APIs | MDN

Returns an XPathResult based on an XPath expression and other given parameters.

Note that the returned XPathResult will never be null.

[Wayback/Archive] Document.evaluate(): values of resultType – Web APIs | MDN

These are supported values for the resultType parameter of the evaluate method:

Result Type Value Description

ANY_TYPE 0 Whatever type naturally results from the given expression.

NUMBER_TYPE 1 A result set containing a single number. Useful, for example, in an XPath expression using the count() function.

STRING_TYPE 2 A result set containing a single string.

BOOLEAN_TYPE 3 A result set containing a single boolean value. Useful, for example, an XPath expression using the not() function.

UNORDERED_NODE_ITERATOR_TYPE 4 A result set containing all the nodes matching the expression. The nodes in the result set are not necessarily in the same order they appear in the document.

ORDERED_NODE_ITERATOR_TYPE 5 A result set containing all the nodes matching the expression. The nodes in the result set are in the same order they appear in the document.

UNORDERED_NODE_SNAPSHOT_TYPE 6 A result set containing snapshots of all the nodes matching the expression. The nodes in the result set are not necessarily in the same order they appear in the document.

ORDERED_NODE_SNAPSHOT_TYPE 7 A result set containing snapshots of all the nodes matching the expression. The nodes in the result set are in the same order they appear in the document.

ANY_UNORDERED_NODE_TYPE 8 A result set containing any single node that matches the expression. The node is not necessarily the first node in the document that matches the expression.

FIRST_ORDERED_NODE_TYPE 9 A result set containing the first node in the document that matches the expression.

Results of NODE_ITERATOR types contain references to nodes in the document. Modifying a node will invalidate the iterator. After modifying a node, attempting to iterate through the results will result in an error.

Results of NODE_SNAPSHOT types are snapshots, which are essentially lists of matched nodes. You can make changes to the document by altering snapshot nodes. Modifying the document doesn’t invalidate the snapshot; however, if the document is changed, the snapshot may not correspond to the current state of the document, since nodes may have moved, been changed, added, or removed.

For read-only access to nodes, use NODE_ITERATOR (with XPathResult.iterateNext() until that returns null).
For read-write access, use NODE_SNAPSHOT (with XPathResult.snapshotItem() and XPathResult.snapshotLength)
- or ANY_UNORDERED_NODE / FIRST_ORDERED_NODE (these last two with XPathResult.singleNodeValue), see [Wayback/Archive] javascript – How to modify the content of a div with no id and no class? – Stack Overflow (thanks [Wayback/Archive] Elie and [Wayback/Archive] Arash Oshnoudi).
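
A sketch of the two multi-node access patterns: iterator results are walked with iterateNext(), snapshot results are indexed with snapshotItem(). The function names and the `doc` parameter are my own additions (passing the document in makes the functions usable with a stub outside a browser); the numeric constants are the values from the resultType table.

```javascript
// Numeric values from the resultType table; in a browser these are also
// available as XPathResult.ORDERED_NODE_ITERATOR_TYPE and so on.
const ORDERED_NODE_ITERATOR_TYPE = 5;
const ORDERED_NODE_SNAPSHOT_TYPE = 7;

// Read-only walk: call iterateNext() until it returns null.
function nodesViaIterator(doc, xpath) {
  const result = doc.evaluate(xpath, doc, null, ORDERED_NODE_ITERATOR_TYPE, null);
  const nodes = [];
  for (let node = result.iterateNext(); node; node = result.iterateNext()) {
    nodes.push(node);
  }
  return nodes;
}

// Snapshot walk: snapshotLength plus snapshotItem(i); the snapshot stays
// valid even when the document is modified inside the loop.
function nodesViaSnapshot(doc, xpath) {
  const result = doc.evaluate(xpath, doc, null, ORDERED_NODE_SNAPSHOT_TYPE, null);
  const nodes = [];
  for (let i = 0; i < result.snapshotLength; i++) {
    nodes.push(result.snapshotItem(i));
  }
  return nodes;
}
```

In a browser console this would be called as `nodesViaSnapshot(document, '//td')`.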

[Wayback/Archive] XPathResult – Web APIs | MDN

The XPathResult interface represents the results generated by evaluating an XPath expression within the context of a given node.

Since XPath expressions can result in a variety of result types, this interface makes it possible to determine and handle the type and value of the result.
[Wayback/Archive] XPathResult.singleNodeValue – Web APIs | MDN
[Wayback/Archive] XPathResult.iterateNext() – Web APIs | MDN
[Wayback/Archive] XPathResult.snapshotLength – Web APIs | MDN
[Wayback/Archive] XPathResult.snapshotItem() – Web APIs | MDN

The drawback of XPath is similar to CSS selectors: yet another language to master. So for XPath, I wrote two helper functions separating the JavaScript framework bits and the XPath bits:

function getElementByXPath(xpath, parent) { // use when expecting zero or one result; default value for parent is document
  const result = document.evaluate(xpath, parent || document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null);
  return result.singleNodeValue;
}

function getArchiveIsInputValue(tdText) {
  const input = getElementByXPath(`/html/body//td[text()="${tdText}"]/following-sibling::td/div/input`);
  return input?.value;
}

First the getElementByXPath method. It calls document.evaluate to execute the xpath expression and get an XPathResult into the result variable. The parent parameter will be passed as document when null or undefined via the || trick. I have underlined the non-Boolean usage of the [Wayback/Archive] Logical OR (||) – JavaScript | MDN

The logical OR (||) operator (logical disjunction) for a set of operands is true if and only if one or more of its operands is true. It is typically used with boolean (logical) values. When it is, it returns a Boolean value. However, the || operator actually returns the value of one of the specified operands, so if this operator is used with non-Boolean values, it will return a non-Boolean value.

…

The following code shows examples of the || (logical OR) operator.
o1  = true  || true       // t || t returns true
o2  = false || true       // f || t returns true
o3  = true  || false      // t || f returns true
o4  = false || (3 == 4)   // f || f returns false
o5  = 'Cat' || 'Dog'      // t || t returns "Cat"
o6  = false || 'Cat'      // f || t returns "Cat"
o7  = 'Cat' || false      // t || f returns "Cat"
o8  = ''    || false      // f || f returns false
o9  = false || ''         // f || f returns ""
o10 = false || varObject // f || object returns varObject

The above use of || shows you can use it as a kind of default operator as shown in [Wayback/Archive] 3 Ways to Set Default Value in JavaScript | SamanthaMing.com:

My go-to has always been the ternary operator for assigning a value to a variable conditionally. But ever since I discovered that || can be used as a selector operator, I’ve been using that more. I find my code so much easier to read 👍

Yes, it takes some time to wrap your head around it. But once you grasp the concept, it’s super handy. Now I don’t think less code makes your code better. But in this instance, I prefer the || operator 🤩
let isHappyHour = '🍺';

// Logical Operator
isHappyHour = isHappyHour || '🍵'; // '🍺'

// Ternary
isHappyHour = isHappyHour ? isHappyHour : '🍵'; // '🍺'

// If/Else
if (isHappyHour) {
  isHappyHour = isHappyHour;
} else {
  isHappyHour = '🍵';
}

console.log(isHappyHour); // '🍺'
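
One caveat of the || trick is that it falls back for every falsy value, not only null or undefined. A small sketch comparing it with the nullish coalescing operator (??); the function names are made up for the illustration:

```javascript
// The || trick falls back for every falsy value ('' , 0, false, null, undefined, NaN).
function withOr(value, fallback) {
  return value || fallback;
}

// Nullish coalescing (??) falls back only for null and undefined.
function withNullish(value, fallback) {
  return value ?? fallback;
}

console.log(withOr(undefined, 'document'));   // 'document'
console.log(withOr('', 'document'));          // 'document'
console.log(withNullish('', 'document'));     // ''
console.log(withNullish(null, 'document'));   // 'document'
```

For a parameter like parent this makes no practical difference, since a DOM node is always truthy.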

Passing XPathResult.FIRST_ORDERED_NODE_TYPE ensures that the result will be the first matching node in the order of the matches in the browser DOM, which is the order you see in the HTML source code. Since XPathResult.FIRST_ORDERED_NODE_TYPE is passed, result will have a valid singleNodeValue property, which is returned. It contains the actual HTML element we are after, or null if the HTML element cannot be found.

Then to the second bit: the XPath expression in the getArchiveIsInputValue method. It passes a template literal `/html/body//td[text()="${tdText}"]/following-sibling::td/div/input` which starts with /html/body to speed up XPath query processing considerably (otherwise the whole document tree needs to be walked for td nodes; now only the ones in the <body> of the <html>). The expression also depends on the value of the tdText parameter. These are valid values for the tdText parameter and the resulting XPath expression that is being assembled:

Redirected from field:

/html/body//td[text()="Redirected from"]/following-sibling::td/div/input

Via field:

/html/body//td[text()="Via"]/following-sibling::td/div/input

Original field:

/html/body//td[text()="Original"]/following-sibling::td/div/input
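
The string assembly can be checked on its own, without evaluating anything. A sketch with a hypothetical helper `buildArchiveIsXPath` (not part of the bookmarklets) that also rejects labels that would break the expression:

```javascript
// Hypothetical helper: only assembles the XPath string, it does not evaluate it.
function buildArchiveIsXPath(tdText) {
  // XPath 1.0 string literals have no escape mechanism, so a label containing
  // a double quote would break the expression; reject it up front.
  if (tdText.includes('"')) {
    throw new Error(`label must not contain a double quote: ${tdText}`);
  }
  return `/html/body//td[text()="${tdText}"]/following-sibling::td/div/input`;
}

for (const label of ['Redirected from', 'Via', 'Original']) {
  console.log(buildArchiveIsXPath(label));
}
```

The three printed lines are the three expressions listed above.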

This will first match <td> elements inside the /html/body path of the document that have the text() function (see below) match the passed value (either "Redirected from", "Via" or "Original").

From that match (since we passed XPathResult.FIRST_ORDERED_NODE_TYPE), it looks for the /following-sibling::td to get the first sibling <td> element and inside that, it follows /div/input to get the <input> element we want.

These helped me understand following-sibling:: :

[Wayback/Archive] javascript – Get the text value of nextsibling through document.evaluate – Stack Overflow (thanks [Wayback/Archive] schnydszch), introducing me to following-sibling::td.
[Wayback/Archive] How to select following sibling/XML tag using XPath – Stack Overflow (thanks [Wayback/Archive] Corey Farwell , [Wayback/Archive] Dimitre Novatchev, [Wayback/Archive] Philipp, and [Wayback/Archive] Milan) explaining me the difference between following-sibling::td, following-sibling::td[1] and following-sibling::*.

You see that XPath is already fairly complex. Luckily, this is XPath version 1.0 as virtually no web browser supports any higher XPath versions, see what [Wayback/Archive] Andriy Ivaneyko answered in [Wayback/Archive] xml – What browsers support Xpath 2.0? – Stack Overflow (question by [Wayback/Archive] User Default – Stack Overflow):

Majority of the browsers do not support XPATH 2.0, please see Comparison of layout engines to get more information.

On the text() function:

There is a nice short description on XPath: Node tests – Wikipedia for it:

text() finds a node of type text excluding any children, e.g. the hello in <k>hello<m> world</m></k>
[Wayback/Archive] XML Path Language (XPath): location paths has the more formal
- child::text() selects all text node children of the context node

[Wayback/Archive] XML Path Language (XPath): location paths also has a few bits on following-sibling:::
- following-sibling::chapter[position()=1] selects the next chapter sibling of the context node
- the following-sibling axis contains all the following siblings of the context node; if the context node is an attribute node or namespace node, the following-sibling axis is empty
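
To get a feel for the axis, here is a rough model of the "Via" row from the HTML above, with plain objects instead of DOM nodes. The `row` array and the `followingSiblings` function are illustrative assumptions, not a real XPath engine:

```javascript
// Cells of the "Via" row, reduced to plain objects (assumption: no DOM here).
const row = [
  { tag: 'td', text: 'Via' },
  { tag: 'td', inputValue: 'http://www.example.org/' },
  { tag: 'td', text: '14 Nov 2015 04:41:09 UTC' },
];

// Models following-sibling::td : every later sibling with the given tag,
// in document order.
function followingSiblings(siblings, contextIndex, tag) {
  return siblings.slice(contextIndex + 1).filter(cell => cell.tag === tag);
}

const allFollowing = followingSiblings(row, 0, 'td'); // like following-sibling::td
const firstFollowing = allFollowing[0];               // like following-sibling::td[1]

console.log(allFollowing.length);        // 2
console.log(firstFollowing.inputValue);  // http://www.example.org/
```

Both following cells are on the axis, but only the first one holds an input, which is why appending /div/input still ends up at a single element.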

You see that the XPath selection language is big. It might still be bigger than CSS selectors. Even on MDN there is much information:

I have the luck of having prior knowledge of doing XPath related work with XML in the past. Many are not so lucky, hence the documentation references I made here.

MDN is not even the standard, which is at [Wayback/Archive] XML Path Language (XPath). That is incomplete too, hence a truckload of books, videos and web-sites cover it. The MDN links above reference a few good ones. Enjoy them!

Be prepared for oddities like explained by [Wayback/Archive] Mathias Müller in [Wayback/Archive] selenium – XPath: difference between dot and text() – Stack Overflow (question by [Wayback/Archive] Andersson):

the meaning of the two predicates (everything between [ and ]) is different. [text()="Ask Question"] actually means: return true if any of the text nodes of an element contains exactly the text “Ask Question”. On the other hand, [.="Ask Question"] means: return true if the string value of an element is identical to “Ask Question”.

In the XPath model, text inside XML elements can be partitioned into a number of text nodes if other elements interfere with the text,
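
The difference can be mimicked with a toy tree. A sketch, assuming an element is a plain object whose children are either strings (text nodes) or nested elements; this models the idea only and is not how a browser implements XPath:

```javascript
// <a>Ask <b>Question</b></a> versus <a>Ask Question</a>
const nested = { tag: 'a', children: ['Ask ', { tag: 'b', children: ['Question'] }] };
const plain = { tag: 'a', children: ['Ask Question'] };

// Roughly what "." compares: the string value, all descendant text concatenated.
function stringValue(node) {
  return node.children
    .map(child => (typeof child === 'string' ? child : stringValue(child)))
    .join('');
}

// Roughly what text() selects: only the text node children of the element itself.
function ownTextNodes(node) {
  return node.children.filter(child => typeof child === 'string');
}

// Models [.="Ask Question"]
const matchesDot = node => stringValue(node) === 'Ask Question';
// Models [text()="Ask Question"]
const matchesText = node => ownTextNodes(node).some(t => t === 'Ask Question');

console.log(matchesDot(plain), matchesText(plain));   // true true
console.log(matchesDot(nested), matchesText(nested)); // true false
```

The nested element has the same string value, but none of its own text nodes equals the full text, so only the dot predicate matches it.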

In the same question, [Wayback/Archive] ggorlen answers [Wayback/Archive] this:

Although many browsers have $x(xPath) as a console built-in, here’s an aggregation of the useful-but-hardcoded snippets from [Wayback/Archive] Introduction to using XPath in JavaScript ready for use in scripts:

Snapshot

This gives a one-off snapshot of the xpath result set. Data may be stale after DOM mutations.
const $x = xp => {
  const snapshot = document.evaluate(
    xp, document, null, 
    XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null
  );
  return [...Array(snapshot.snapshotLength)]
    .map((_, i) => snapshot.snapshotItem(i))
  ;
};

console.log($x('//h2[contains(., "foo")]'));
<h2>foo</h2>
<h2>foobar</h2>
<h2>bar</h2>
First ordered node
const $xOne = xp => 
  document.evaluate(
    xp, document, null,
    XPathResult.FIRST_ORDERED_NODE_TYPE, null
  ).singleNodeValue
;

console.log($xOne('//h2[contains(., "foo")]'));
<h2>foo</h2>
<h2>foobar</h2>
<h2>bar</h2>
Iterator

Note however, that if the document is mutated (the document tree is modified) between iterations that will invalidate the iteration and the invalidIteratorState property of XPathResult is set to true, and a NS_ERROR_DOM_INVALID_STATE_ERR exception is thrown.
function *$xIter(xp) {
  const iter = document.evaluate(
    xp, document, null, 
    XPathResult.ORDERED_NODE_ITERATOR_TYPE, null
  );

  for (;;) {
    const node = iter.iterateNext();
    
    if (!node) {
      break;
    }
    
    yield node;
  }
}

// dump to array
console.log([...$xIter('//h2[contains(., "foo")]')]);

// return next item from generator
const xpGen = $xIter('//h2[text()="foo"]');
console.log(xpGen.next().value);
<h2>foo</h2>
<h2>foobar</h2>
<h2>bar</h2>

Back to the code (:

Based on the above two helper methods, I made these three bookmarklets:

Redirected from field:

javascript:{
function getElementByXPath(xpath, parent) { // use when expecting zero or one result; default value for parent is document
  const result = document.evaluate(xpath, parent || document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null);
  return result.singleNodeValue;
}

function getArchiveIsInputValue(tdText) {
  const input = getElementByXPath(`/html/body//td[text()="${tdText}"]/following-sibling::td/div/input`);
  return input?.value;
}

open(getArchiveIsInputValue("Redirected from"))
}

Via field:

javascript:{
function getElementByXPath(xpath, parent) { // use when expecting zero or one result; default value for parent is document
  const result = document.evaluate(xpath, parent || document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null);
  return result.singleNodeValue;
}

function getArchiveIsInputValue(tdText) {
  const input = getElementByXPath(`/html/body//td[text()="${tdText}"]/following-sibling::td/div/input`);
  return input?.value;
}

open(getArchiveIsInputValue("Via"))
}

Original field:

javascript:{
function getElementByXPath(xpath, parent) { // use when expecting zero or one result; default value for parent is document
  const result = document.evaluate(xpath, parent || document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null);
  return result.singleNodeValue;
}

function getArchiveIsInputValue(tdText) {
  const input = getElementByXPath(`/html/body//td[text()="${tdText}"]/following-sibling::td/div/input`);
  return input?.value;
}

open(getArchiveIsInputValue("Original"))
}
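
Not every archived page has all fields. When a field is absent, getArchiveIsInputValue returns undefined and the bookmarklet still calls open. A possible guard, sketched with a hypothetical `openIfFound` wrapper; the `opener` parameter is my addition so the function can be tried with a stub outside a browser:

```javascript
// Hypothetical wrapper; `opener` defaults to the browser's global open()
// and can be replaced by a stub when no browser is around.
function openIfFound(url, opener = globalThis.open) {
  if (!url) {
    return false; // field missing on this page: nothing to open
  }
  opener(url);
  return true;
}

// input?.value yields undefined when the XPath lookup returned null.
const missingInput = null;
console.log(openIfFound(missingInput?.value, () => {})); // false
```

In a bookmarklet the last line would become something like `openIfFound(getArchiveIsInputValue("Via"))`.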

This was a lot more work than the CSS selector based code!

Here too, the mapping between HTML elements and the DOM interface is crucial. A few more links here:

JSFiddle and gist!

The project to fiddle around with the base functions: [Wayback/Archive] https://jsfiddle.net/h0495krm/

I also saved the base functions in [Wayback/Archive] Browser JavaScript getElement(s)ByXPath functions.

Moar references

If your HTML is XHTML (i.e. XML that is at least well-formed), then [Wayback/Archive] XPath online real-time tester, evaluator and generator for XML & HTML is cool (contrary to the title: if your HTML is not well-formed XML, then it only works on the bits that are well-formed).

Just a few questions are tagged as [Wayback/Archive] ‘document.evaluate‘ Questions – Stack Overflow.

These are in the gist but not in the above post:

[Wayback/Archive] javascript – Evaluating an XPath with document.evaluate() to get an array of nodes – Code Review Stack Exchange (thanks [Wayback/Archive] alecxe and [Wayback/Archive] Jonah)
[Wayback/Archive] Is there a way to get element by XPath using JavaScript in Selenium WebDriver? – Stack Overflow (thanks [Wayback/Archive] undetected Selenium for answering and [Wayback/Archive] pMan for asking)
To identify a WebElement using xpath and javascript you have to use the evaluate() method which evaluates an xpath expression and returns a result.

document.evaluate()

document.evaluate() returns an XPathResult based on an XPath expression and other given parameters.

The syntax is:
```
var xpathResult = document.evaluate(
  xpathExpression,
  contextNode,
  namespaceResolver,
  resultType,
  result
);
```
Where:
- xpathExpression: The string representing the XPath to be evaluated.
- contextNode: Specifies the context node for the query. Common practice is to pass document as the context node.
- namespaceResolver: The function that will be passed any namespace prefixes and should return a string representing the namespace URI associated with that prefix. It will be used to resolve prefixes within the XPath itself, so that they can be matched with the document. null is common for HTML documents or when no namespace prefixes are used.
- resultType: An integer that corresponds to the type of result XPathResult to return using named constant properties, such as XPathResult.ANY_TYPE, of the XPathResult constructor, which correspond to integers from 0 to 9.
- result: An existing XPathResult to use for the results. null is the most common and will create a new XPathResult
Demonstration

As an example the Search Box within the Google Home Page which can be identified uniquely using the xpath as //*[@name='q'] can also be identified using the google-chrome-devtools Console by the following command:
```
$x("//*[@name='q']")
```

The same element can also be identified using document.evaluate() and the xpath expression as follows:
```
document.evaluate("//*[@name='q']", document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue;
```

–jeroen

XPathFunctions.js:

// based on:
// - https://codereview.stackexchange.com/questions/167571/evaluating-an-xpath-with-document-evaluate-to-get-an-array-of-nodes
// - https://stackoverflow.com/questions/10596417/is-there-a-way-to-get-element-by-xpath-using-javascript-in-selenium-webdriver
// - https://developer.mozilla.org/en-US/docs/Web/XPath/Introduction_to_using_XPath_in_JavaScript
// - https://developer.mozilla.org/en-US/docs/Web/API/Document/evaluate#result_types
// - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Logical_OR

function getElementsByXPath(xpath, parent) { // use when expecting zero or more results; default value for parent is document
  const result = document.evaluate(xpath, parent || document, null, XPathResult.ORDERED_NODE_ITERATOR_TYPE, null);
  const nodes = [];
  let node;
  while ((node = result.iterateNext())) // iterateNext() returns null after the last match
    nodes.push(node);
  return nodes;
}

function getElementByXPath(xpath, parent) { // use when expecting zero or one result; default value for parent is document
  const result = document.evaluate(xpath, parent || document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null);
  return result.singleNodeValue;
}

This entry was posted on 2023/09/20 at 12:00 and is filed under Agile, Bookmarklet, Code Quality, Code Review, Development, HTML, JavaScript/ECMAScript, Power User, Scripting, Software Development, Web Browsers, Web Development, XML/XSD, XPath.