OP’s question is about plain JavaScript and not jQuery. Although there are plenty of answers and I like @Pawan Nogariya's answer, please check this alternative out.
You can use XPATH in JavaScript. More info on the MDN article here.
The document.evaluate() method evaluates an XPATH query/expression. So you can pass XPATH expressions there, traverse into the HTML document and locate the desired element.
In XPATH you can select an element by its text node. The following expression gets the div that has exactly this text node:
//div[text()="Hello World"]
To get an element that contains some text use the following:
//div[contains(., 'Hello')]
The contains() method in XPATH takes a node as first parameter and the text to search for as second parameter.
Check this plunk here, this is an example use of XPATH in JavaScript
Here is a code snippet:
var headings = document.evaluate("//h1[contains(., 'Hello')]", document, null, XPathResult.ANY_TYPE, null );
var thisHeading = headings.iterateNext();
console.log(thisHeading); // prints the HTML element in the console
console.log(thisHeading.textContent); // prints the text content in the console
thisHeading.innerHTML += "<br />Modified contents";
As you can see, I can grab the HTML element and modify it as I like.
The same question also has answers showing you can do the search in a mix of CSS selectors, pure (sometimes functional) JavaScript coding and some even regular expressions. These exactly show the reason why I opted for XPath: there the whole query is done in one language and is not spread over two.
More on non-XPath based solutions
Some examples (note that anything selecting all div elements will usually be slow, as the outer div will likely contain various inner div elements; see also my /html/body remark on XPath below):
Since you have asked it in javascript so you can have something like this
function contains(selector, text) {
  var elements = document.querySelectorAll(selector);
  return Array.prototype.filter.call(elements, function(element){
    return RegExp(text).test(element.textContent);
  });
}
And then call it like this
contains('div', 'sometext'); // find "div" that contain "sometext"
contains('div', /^sometext/); // find "div" that start with "sometext"
contains('div', /sometext$/i); // find "div" that end with "sometext", case-insensitive
This is incorrect because it also includes results for all child nodes. That is, if a child node of an element contains the text, the element itself is included in the contains result, which is wrong.
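One way to address the objection above is to test only the text nodes that are direct children of each element. The following is a minimal sketch, not the original answer's code: ownText and containsOwnText are hypothetical names, and root stands for document or any other object that offers querySelectorAll.

```javascript
// Node.TEXT_NODE has the numeric value 3 in the DOM specification.
const TEXT_NODE = 3;

// Concatenate only the text nodes that are direct children of `element`,
// ignoring text that belongs to nested elements.
function ownText(element) {
  return Array.from(element.childNodes)
    .filter(node => node.nodeType === TEXT_NODE)
    .map(node => node.nodeValue)
    .join('');
}

// Hypothetical variant of contains(): keeps an element only when its own
// text matches. `pattern` may be a string or a RegExp.
function containsOwnText(root, selector, pattern) {
  const regex = pattern instanceof RegExp ? pattern : new RegExp(pattern);
  return Array.from(root.querySelectorAll(selector))
    .filter(element => regex.test(ownText(element)));
}
```

In a browser this would be called as containsOwnText(document, 'div', 'sometext'). An outer div that merely wraps a matching inner div is then no longer part of the result.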
by anonymous:
You could use this pretty simple solution:
Array.from(document.querySelectorAll('div'))
.find(el => el.textContent === 'SomeText, text continues.');
Uses the ES6 spread operator to convert the NodeList of all divs to an array.
Provides output if the div contains the query string, not just if it exactly equals the query string (which happens for some of the other answers). e.g. It should provide output not just for ‘SomeText’ but also for ‘SomeText, text continues’.
Outputs the entire div contents, not just the query string. e.g. For ‘SomeText, text continues’ it should output that whole string, not just ‘SomeText’.
Allows for multiple divs to contain the string, not just a single div.
[...document.querySelectorAll('div')] // get all the divs in an array
.map(div => div.innerHTML) // get their contents
.filter(txt => txt.includes('SomeText')) // keep only those containing the query
.forEach(txt => console.log(txt)); // output the entire contents of those
<div>SomeText, text continues.</div><div>Not in this div.</div><div>Here is more SomeText.</div>
You best see if you have a parent element of the div you are querying. If so get the parent element and perform an element.querySelectorAll("div"). Once you get the nodeList apply a filter on it over the innerText property. Assume that a parent element of the div that we are querying has an id of container. You can normally access container directly from the id but let’s do it the proper way.
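A minimal sketch of the approach just described, under the assumption that the parent element has the id container. The helper name findDivsByText is hypothetical, and it receives the container as a parameter so that the lookup by id stays explicit at the call site.

```javascript
// Hypothetical helper: returns the div elements inside `container`
// whose innerText includes `text`.
function findDivsByText(container, text) {
  return Array.from(container.querySelectorAll('div'))
    .filter(div => div.innerText.includes(text));
}
```

In a browser it would be used as findDivsByText(document.getElementById('container'), 'SomeText'). Note that innerText also covers the text of nested elements, so a div wrapping a matching div is returned as well.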
Spread syntax (...) allows an iterable such as an array expression or string to be expanded in places where zero or more arguments (for function calls) or elements (for array literals) are expected, or an object expression to be expanded in places where zero or more key-value pairs (for object literals) are expected.
JavaScript arrays are zero-indexed: the first element of an array is at index 0, the second is at index 1, and so on — and the last element is at the value of the array’s length property minus 1.
Information about what JavaScript technologies the browsers support. Note that although XPath is used by XSLT, it is only considered here if it can be accessed using JavaScript. External links lead to information about support in future versions of the browsers or extensions that provide such functionality, e.g., Babel.
These are supported values for the resultType parameter of the evaluate method:
Result Type | Value | Description
ANY_TYPE | 0 | Whatever type naturally results from the given expression.
NUMBER_TYPE | 1 | A result set containing a single number. Useful, for example, in an XPath expression using the count() function.
STRING_TYPE | 2 | A result set containing a single string.
BOOLEAN_TYPE | 3 | A result set containing a single boolean value. Useful, for example, in an XPath expression using the not() function.
UNORDERED_NODE_ITERATOR_TYPE | 4 | A result set containing all the nodes matching the expression. The nodes in the result set are not necessarily in the same order they appear in the document.
ORDERED_NODE_ITERATOR_TYPE | 5 | A result set containing all the nodes matching the expression. The nodes in the result set are in the same order they appear in the document.
UNORDERED_NODE_SNAPSHOT_TYPE | 6 | A result set containing snapshots of all the nodes matching the expression. The nodes in the result set are not necessarily in the same order they appear in the document.
ORDERED_NODE_SNAPSHOT_TYPE | 7 | A result set containing snapshots of all the nodes matching the expression. The nodes in the result set are in the same order they appear in the document.
ANY_UNORDERED_NODE_TYPE | 8 | A result set containing any single node that matches the expression. The node is not necessarily the first node in the document that matches the expression.
FIRST_ORDERED_NODE_TYPE | 9 | A result set containing the first node in the document that matches the expression.
Results of NODE_ITERATOR types contain references to nodes in the document. Modifying a node will invalidate the iterator. After modifying a node, attempting to iterate through the results will result in an error.
Results of NODE_SNAPSHOT types are snapshots, which are essentially lists of matched nodes. You can make changes to the document by altering snapshot nodes. Modifying the document doesn’t invalidate the snapshot; however, if the document is changed, the snapshot may not correspond to the current state of the document, since nodes may have moved, been changed, added, or removed.
For read-only access to nodes, use NODE_ITERATOR (with XPathResult.iterateNext() until that returns null).
For read-write access, use NODE_SNAPSHOT (with XPathResult.snapshotItem() and XPathResult.snapshotLength).
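As a rough sketch of how the two result families are usually walked: iterator results are consumed with iterateNext() until it returns null, while snapshot results are indexed with snapshotLength and snapshotItem(). The helper names below are hypothetical; each takes an XPathResult (or any object with the same members) as its argument.

```javascript
// Collect the nodes of a NODE_ITERATOR result. The document must not be
// modified while this loop runs, or the iterator becomes invalid.
function iteratorToArray(result) {
  const nodes = [];
  let node;
  while ((node = result.iterateNext()) !== null) {
    nodes.push(node);
  }
  return nodes;
}

// Collect the nodes of a NODE_SNAPSHOT result. The snapshot stays usable
// even when the document changes afterwards.
function snapshotToArray(result) {
  const nodes = [];
  for (let i = 0; i < result.snapshotLength; i++) {
    nodes.push(result.snapshotItem(i));
  }
  return nodes;
}
```

Copying the nodes into an array first makes it safe to modify them afterwards in either case, because the iteration is already finished.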
The XPathResult interface represents the results generated by evaluating an XPath expression within the context of a given node.
Since XPath expressions can result in a variety of result types, this interface makes it possible to determine and handle the type and value of the result.
The drawback of XPath is similar to CSS selectors: yet another language to master. So for XPath, I wrote two helper functions separating the JavaScript framework bits and the XPath bits:
function getElementByXPath(xpath, parent) { // use when expecting zero or one result; default value for parent is document
  const result = document.evaluate(xpath, parent || document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null);
  return result.singleNodeValue;
}
function getArchiveIsInputValue(tdText) {
  const input = getElementByXPath(`/html/body//td[text()="${tdText}"]/following-sibling::td/div/input`);
  return input?.value;
}
First the getElementByXPath method. It calls document.evaluate to execute the XPath expression and stores the resulting XPathResult in the result variable. When parent is null or undefined (or any other falsy value), the || trick passes document instead. This relies on the non-Boolean usage described in [Wayback/Archive] Logical OR (||) – JavaScript | MDN:
The logical OR (||) operator (logical disjunction) for a set of operands is true if and only if one or more of its operands is true. It is typically used with boolean (logical) values. When it is, it returns a Boolean value. However, the || operator actually returns the value of one of the specified operands, so if this operator is used with non-Boolean values, it will return a non-Boolean value.
…
The following code shows examples of the || (logical OR) operator.
o1 = true || true // t || t returns true
o2 = false || true // f || t returns true
o3 = true || false // t || f returns true
o4 = false || (3 == 4) // f || f returns false
o5 = 'Cat' || 'Dog' // t || t returns "Cat"
o6 = false || 'Cat' // f || t returns "Cat"
o7 = 'Cat' || false // t || f returns "Cat"
o8 = '' || false // f || f returns false
o9 = false || '' // f || f returns ""
o10 = false || varObject // f || object returns varObject
My go-to has always been the ternary operator for assigning a value to a variable conditionally. But ever since I discovered that || can be used as a selector operator, I’ve been using that more. I find my code so much easier to read 👍
Yes, it takes some time to wrap your head around it. But once you grasp the concept, it’s super handy. Now I don’t think less code makes your code better. But in this instance, I prefer the || operator 🤩
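One caveat is worth a small illustration: || falls back on every falsy left operand (0, '', false, NaN, null, undefined), whereas the nullish coalescing operator ?? falls back only on null and undefined. For a parent node parameter the difference does not matter, since a node is never falsy. The two function names below are hypothetical and exist only to compare the operators.

```javascript
// Fallback with logical OR: any falsy `value` is replaced.
function pickWithOr(value, fallback) {
  return value || fallback;
}

// Fallback with nullish coalescing: only null and undefined are replaced.
function pickWithNullish(value, fallback) {
  return value ?? fallback;
}

console.log(pickWithOr(undefined, 'doc'));    // "doc"
console.log(pickWithNullish(undefined, 'doc')); // "doc"
console.log(pickWithOr(0, 5));                // 5
console.log(pickWithNullish(0, 5));           // 0
console.log(pickWithOr('', 'Cat'));           // "Cat"
console.log(pickWithNullish('', 'Cat'));      // ""
```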
Passing XPathResult.FIRST_ORDERED_NODE_TYPE ensures that the result will be the first matching node in the order of the matches in the browser DOM, which is the order you see in the HTML source code. Since XPathResult.FIRST_ORDERED_NODE_TYPE is passed, result will have a valid singleNodeValue field, which is returned. It contains the actual HTML element we are after, or null if the HTML element cannot be found.
Then on to the second bit: the XPath expression in the getArchiveIsInputValue method. It passes a string template `/html/body//td[text()="${tdText}"]/following-sibling::td/div/input` which has the /html/body bit to very much speed up XPath query processing (otherwise the whole document tree needs to be walked for td nodes, now only the ones in the <body> of the <html>). It also depends on the value of the tdText parameter. These are valid values for the tdText parameter and the resulting XPath expression that is being assembled:
This will first match <td> elements inside the /html/body path of the document whose text() (see below) matches the passed value (either "Redirected from", "Via" or "Original").
From that match (since we passed XPathResult.FIRST_ORDERED_NODE_TYPE), it looks for the /following-sibling::td to get the first sibling <td> element and inside that, it follows /div/input to get the <input> element we want.
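The string template works for the three fixed values, but it would break for a tdText that itself contains a double quote, because XPath 1.0 string literals have no escape character. A minimal sketch of a workaround follows. Both xpathLiteral and buildInputXPath are hypothetical helpers that are not part of the bookmarklets. The idea is to pick whichever quote character the text does not contain, and to fall back to the XPath concat() function when it contains both.

```javascript
// Turn arbitrary text into a valid XPath 1.0 string expression.
function xpathLiteral(text) {
  if (!text.includes('"')) {
    return `"${text}"`;
  }
  if (!text.includes("'")) {
    return `'${text}'`;
  }
  // Both quote kinds occur: split on double quotes, wrap each piece in
  // double quotes, and join the pieces with a single-quoted double quote.
  const separator = `, '"', `;
  const parts = text.split('"').map(part => `"${part}"`);
  return `concat(${parts.join(separator)})`;
}

// Assemble the same expression shape as getArchiveIsInputValue uses.
function buildInputXPath(tdText) {
  return `/html/body//td[text()=${xpathLiteral(tdText)}]/following-sibling::td/div/input`;
}

console.log(buildInputXPath('Via'));
// /html/body//td[text()="Via"]/following-sibling::td/div/input
```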
following-sibling::chapter[position()=1] selects the next chapter sibling of the context node
the following-sibling axis contains all the following siblings of the context node; if the context node is an attribute node or namespace node, the following-sibling axis is empty
You see that the XPath selection language is big. It might still be bigger than CSS selectors. Even on MDN there is much information:
I am lucky to have prior experience with XPath from XML-related work in the past. Many are not so lucky, hence the documentation references I made here.
MDN is not even the standard, which is at [Wayback/Archive] XML Path Language (XPath). That is incomplete too, hence a truckload of books, videos and web-sites cover it. The MDN links above reference a few good ones. Enjoy them!
the meaning of the two predicates (everything between [ and ]) is different. [text()="Ask Question"] actually means: return true if any of the text nodes of an element contains exactly the text “Ask Question”. On the other hand, [.="Ask Question"] means: return true if the string value of an element is identical to “Ask Question”.
In the XPath model, text inside XML elements can be partitioned into a number of text nodes if other elements interfere with the text,
Note however, that if the document is mutated (the document tree is modified) between iterations that will invalidate the iteration and the invalidIteratorState property of XPathResult is set to true, and a NS_ERROR_DOM_INVALID_STATE_ERR exception is thrown.
function *$xIter(xp) {
const iter = document.evaluate(
xp, document, null,
XPathResult.ORDERED_NODE_ITERATOR_TYPE, null
);
for (;;) {
const node = iter.iterateNext();
if (!node) {
break;
}
yield node;
}
}
// dump to array
console.log([...$xIter('//h2[contains(., "foo")]')]);
// return next item from generator
const xpGen = $xIter('//h2[text()="foo"]');
console.log(xpGen.next().value);
<h2>foo</h2><h2>foobar</h2><h2>bar</h2>
Back to the code (:
Based on the above two helper methods, I made these three bookmarklets:
Redirected from field:
javascript:{
  function getElementByXPath(xpath, parent) { /* use when expecting zero or one result; default value for parent is document */
    const result = document.evaluate(xpath, parent || document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null);
    return result.singleNodeValue;
  }
  function getArchiveIsInputValue(tdText) {
    const input = getElementByXPath(`/html/body//td[text()="${tdText}"]/following-sibling::td/div/input`);
    return input?.value;
  }
  open(getArchiveIsInputValue("Redirected from"))
}
Via field:
javascript:{
  function getElementByXPath(xpath, parent) { /* use when expecting zero or one result; default value for parent is document */
    const result = document.evaluate(xpath, parent || document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null);
    return result.singleNodeValue;
  }
  function getArchiveIsInputValue(tdText) {
    const input = getElementByXPath(`/html/body//td[text()="${tdText}"]/following-sibling::td/div/input`);
    return input?.value;
  }
  open(getArchiveIsInputValue("Via"))
}
Original field:
javascript:{
  function getElementByXPath(xpath, parent) { /* use when expecting zero or one result; default value for parent is document */
    const result = document.evaluate(xpath, parent || document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null);
    return result.singleNodeValue;
  }
  function getArchiveIsInputValue(tdText) {
    const input = getElementByXPath(`/html/body//td[text()="${tdText}"]/following-sibling::td/div/input`);
    return input?.value;
  }
  open(getArchiveIsInputValue("Original"))
}
This was a lot more work than the CSS selector based code!
Here too, the mapping between HTML elements and the DOM interface is crucial. A few more links here:
var xpathResult = document.evaluate(
xpathExpression,
contextNode,
namespaceResolver,
resultType,
result
);
Where:
xpathExpression: The string representing the XPath to be evaluated.
contextNode: Specifies the context node for the query. Common practice is to pass document as the context node.
namespaceResolver: The function that will be passed any namespace prefixes and should return a string representing the namespace URI associated with that prefix. It will be used to resolve prefixes within the XPath itself, so that they can be matched with the document. null is common for HTML documents or when no namespace prefixes are used.
resultType: An integer that corresponds to the type of result XPathResult to return using named constant properties, such as XPathResult.ANY_TYPE, of the XPathResult constructor, which correspond to integers from 0 to 9.
result: An existing XPathResult to use for the results. null is the most common and will create a new XPathResult
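To make the five parameters concrete, here is a small sketch of a wrapper that returns all matches as a plain array. The name xpathAll is hypothetical, and the document object is passed in as doc rather than taken from the global scope, which is a choice made for this sketch only. The constant 7 is the value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE.

```javascript
// Numeric value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE.
const ORDERED_NODE_SNAPSHOT_TYPE = 7;

// Evaluate `xpathExpression` and return the matching nodes in document order.
function xpathAll(doc, xpathExpression, contextNode) {
  const result = doc.evaluate(
    xpathExpression,            // the XPath to evaluate
    contextNode || doc,         // context node; defaults to the document
    null,                       // namespaceResolver: not needed for plain HTML
    ORDERED_NODE_SNAPSHOT_TYPE, // resultType
    null                        // result: let evaluate create a new XPathResult
  );
  const nodes = [];
  for (let i = 0; i < result.snapshotLength; i++) {
    nodes.push(result.snapshotItem(i));
  }
  return nodes;
}
```

In a browser, xpathAll(document, "//*[@name='q']") would return an array with every element whose name attribute is q.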
Demonstration
As an example, the Search Box on the Google Home Page, which can be identified uniquely using the XPath //*[@name='q'], can also be identified in the Google Chrome DevTools Console with the following command:
$x("//*[@name='q']")
The same element can also be identified using document.evaluate() and the XPath expression as follows: