The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests

  • My badges

  • Twitter Updates

  • My Flickr Stream

  • Pages

  • All categories

  • Enter your email address to subscribe to this blog and receive notifications of new posts by email.

    Join 1,854 other subscribers

Only available on Window, but sometimes useful, the Excel FILTERXML function

Posted by jpluimers on 2026/04/09

Some links on the [Wayback/Archive] FILTERXML function – Microsoft Support.

It is only available on Windows (because of the underlying XPath libraries used, I think it is MSXML), and “only” as of Excel 2013, but still can be useful.

Some links below on FILTERXML and related XPath information so I can more easily find their content back.

Notes:

  • FILTERXML only supports XPath 1.0
  • The quotes are huge, for one because I don’t use Excel enough to be an expert, but have enough software experience to sometimes want to use complex concepts in Excel. Having all this in one place helps me with that goal.
  • You need to ensure your data is either XML in a well-formed document format, or you can translate your data to well-formed XML.

The links and quotes starting with the question that sparked my interest:

[Wayback/Archive] How to extract the last part of the string in Excel after the last underscore – Stack Overflow

(thanks [Wayback/Archive] cyborg and [Wayback/Archive] JvdV)

Q

I have the following example data:
1. animated_brand_300x250
2. animated_brand_300x600
3. customaffin_greenliving_solarhome_anim_outage_offer
How to extract the string from the last underscore in Microsoft Excel?

A

Some other options could be:
=TRIM(RIGHT(SUBSTITUTE(A1,"_",REPT(" ",100)),100))
Or using FILTERXML, being theoretically a better option:
=FILTERXML("<t><s>"&SUBSTITUTE(A1,"_","</s><s>")&"</s></t>","//s[last()]")
An ExcelO365 exclusive method could even be to use XLOOKUP:
=REPLACE(A1,1,XLOOKUP("_",MID(A1,SEQUENCE(LEN(A1)),1),SEQUENCE(LEN(A1)),,0,-1),"")
Or:
=RIGHT(A1,MATCH("_",MID(A1,SEQUENCE(LEN(A1),,LEN(A1),-1),1),0)-1)

[Wayback/Archive] FILTERXML function – Microsoft Support

The FILTERXML function returns specific data from XML content by using the specified xpath.

Notes: 

  • The FILTERXML function is not available in Excel for the web and Excel for Mac.
  • This function may appear in the function gallery in Excel for Mac, but it relies on features of the Windows operating system, so it will not return results on Mac.

  1. You can also use FILTERXML in conjunction with the ENCODEURL and WEBSERVICE functions. For example, you can use the markitondemand.com api service to return information about a company stock:=WEBSERVICE(“http://dev.markitondemand.com/MODApis/Api/Quote/xml?symbol=”&ENCODEURL(C2))Where you have a stock ticker in cell C2.You can then use the FILTERXML function to extract specific data about that stock. For example, to get the Last Price of the stock ticker in cell C2, you can use:=FILTERXML(WEBSERVICE(“http://dev.markitondemand.com/MODApis/Api/Quote/xml?symbol=”&ENCODEURL(C2)),”//QuoteApiModel/Data/LastPrice”)

Referenced links:

[Wayback/Archive] arrays – Excel – Extract substring(s) from string using FILTERXML – Stack Overflow

(thanks [Wayback/Archive] JvdV)

Q

Background

Lately I’ve been trying to get more familiar with the concept of changing a delimited string into an XML to parse with Excel’s FILTERXML and retrieve those substrings that are of interest. Please note that this function came available from Excel 2013 and is not available on Excel for Mac nor Excel Online.

With a delimited string, I meant anything from a normal sentence using spaces as delimiters or any other combination of characters that could be used to define substrings within a string. For example let’s imagine the following:

ABC|123|DEF|456|XY-1A|ZY-2F|XY-3F|XY-4f|xyz|123

Question

So, where a lot of people know how to get the nth element (e.g.: =TRIM(MID(SUBSTITUTE(A1,"|",REPT(" ",LEN(A1))),3*LEN(A1)+1,LEN(A1))) to retrieve 456). Or other combinationes with LEN()MID()FIND() and all those constructs, how do we use FILTERXML to use more specific criteria to extract substrings of concern and clean up the full string? For example, how to retrieve:

  • elements by position
  • numeric or non-numeric elements
  • elements that contain a substring on their own
  • elements that start or end with a substring
  • elements that are upper- or lowercase
  • elements holding numbers
  • unique values

A

Excel’s FILTERXML uses XPATH 1.0 which unfortunately means it is not as diverse as we would maybe want it to be. Also, Excel seems to not allow returning reworked node values and exclusively allows you to select nodes in order of appearance. However there is a fair share of functions we can still utilize. More information about that can be found here.

The function takes two parameters: =FILTERXML(<A string in valid XML format>,<A string in valid XPATH format>)

Let’s say cell A1 holds the string: ABC|123|DEF|456|XY-1A|ZY-2F|XY-3F|XY-4f|xyz|123. To create a valid XML string we use SUBSTITUTE to change the delimiter to valid end- and start-tag constructs. So to get a valid XML construct for the given example we could do:

"<t><s>"&SUBSTITUTE(A1,"|","</s><s>")&"</s></t>"

For readability reasons I’ll refer to the above construct with the word <XML> as a placeholder. Below you’ll find different usefull XPATH functions in a valid construct to filter nodes:


1) All Elements:

=FILTERXML(<XML>,"//s")

Returns: ABC123DEF456XY-1AZY-2FXY-3FXY-4fxyz and 123 (all nodes)


2) Elements by position:

=FILTERXML(<XML>,"//s[position()=4]")

Or:

=FILTERXML(<XML>,"//s[4]")

Returns: 456 (node on index 4)

=FILTERXML(<XML>,"//s[position()<4]")

Returns: ABC123 and DEF (nodes on index < 4)

=FILTERXML(<XML>,"//s[position()=2 or position()>5]")

Returns: 123ZY-2FXY-3FXY-4fxyz and 123 (nodes on index 2 or > 5)

=FILTERXML(<XML>,"//s[last()]")

Returns: 123 (node on last index)

=FILTERXML(<XML>,"//s[position() mod 2 = 1]")

Returns: ABCDEFXY-1AXY-3F and xyz (odd nodes)

=FILTERXML(<XML>,"//s[position() mod 2 = 0]")

Returns: 123456ZF-2FXY-4f and 123 (even nodes)


3) (Non) numeric elements:

=FILTERXML(<XML>,"//s[number()=.]")

Or:

=FILTERXML(<XML>,"//s[.*0=0]")

Returns: 123456, and 123 (numeric nodes)

=FILTERXML(<XML>,"//s[not(number()=.)]")

Or:

=FILTERXML(<XML>,"//s[.*0!=0)]")

Returns: ABCDEFXY-1AZY-2FXY-3FXY-4f and xyz (non-numeric nodes)


4) Elements that (not) contain:

=FILTERXML(<XML>,"//s[contains(., 'Y')]")

Returns: XY-1AZY-2FXY-3F and XY-4f (containing ‘Y’, notice XPATH is case sensitive, exclusing xyz)

=FILTERXML(<XML>,"//s[not(contains(., 'Y'))]")

Returns: ABC123DEF456xyz and 123 (not containing ‘Y’, notice XPATH is case sensitive, including xyz)


5) Elements that (not) start or/and end with:

=FILTERXML(<XML>,"//s[starts-with(., 'XY')]")

Returns: XY-1AXY-3F and XY-4f (starting with ‘XY’)

=FILTERXML(<XML>,"//s[not(starts-with(., 'XY'))]")

Returns: ABC123DEF456ZY-2Fxyz and 123 (don’t start with ‘XY’)

=FILTERXML(<XML>,"//s[substring(., string-length(.) - string-length('F') +1) = 'F']")

Returns: DEFZY-2F and XY-3F (end with ‘F’, notice XPATH 1.0 does not support ends-with)

=FILTERXML(<XML>,"//s[not(substring(., string-length(.) - string-length('F') +1) = 'F')]")

Returns: ABC123456XY-1AXY-4fxyz and 123 (don’t end with ‘F’)

=FILTERXML(<XML>,"//s[starts-with(., 'X') and substring(., string-length(.) - string-length('A') +1) = 'A']")

Returns: XY-1A (start with ‘X’ and end with ‘A’)


6) Elements that are upper- or lowercase:

=FILTERXML(<XML>,"//s[translate(.,'abcdefghijklmnopqrstuvwxyz','ABCDEFGHIJKLMNOPQRSTUVWXYZ')=.]")

Returns: ABC123DEF456XY-1AZY-2FXY-3F and 123 (uppercase nodes)

=FILTERXML(<XML>,"//s[translate(.,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz')=.]")

Returns: 123456xyz and 123 (lowercase nodes)

NOTE: Unfortunately XPATH 1.0 does not support upper-case() nor lower-case() so the above is a workaround. Add special characters if need be.


7) Elements that (not) contain any number:

=FILTERXML(<XML>,"//s[translate(.,'1234567890','')!=.]")

Returns: 123456XY-1AZY-2FXY-3FXY-4f and 123 (contain any digit)

=FILTERXML(<XML>,"//s[translate(.,'1234567890','')=.]")

Returns: ABCDEF and xyz (don’t contain any digit)

=FILTERXML(<XML>,"//s[translate(.,'1234567890','')!=. and .*0!=0]")

Returns: XY-1AZY-2FXY-3F and XY-4f (holding digits but not a a number on it’s own)


8) Unique elements or duplicates:

=FILTERXML(<XML>,"//s[preceding::*=.]")

Returns: 123 (duplicate nodes)

=FILTERXML(<XML>,"//s[not(preceding::*=.)]")

Returns: ABC123DEF456XY-1AZY-2FXY-3FXY-4f and xyz (unique nodes)

=FILTERXML(<XML>,"//s[not(following::*=. or preceding::*=.)]")

Returns: ABCDEF456XY-1AZY-2FXY-3F and XY-4f (nodes that have no similar sibling)


9) Elements of certain length:

=FILTERXML(<XML>,"//s[string-length()=5]")

Returns: XY-1AZY-2FXY-3F and XY-4f (5 characters long)

=FILTERXML(<XML>,"//s[string-length()<4]")

Returns: ABC123DEF456xyz and 123 (shorter than 4 characters)


10) Elements based on preceding/following:

=FILTERXML(<XML>,"//s[preceding::*[1]='456']")

Returns: XY-1A (previous node equals ‘456’)

=FILTERXML(<XML>,"//s[starts-with(preceding::*[1],'XY')]")

Returns: ZY-2FXY-4f, and xyz (previous node starts with ‘XY’)

=FILTERXML(<XML>,"//s[following::*[1]='123']")

Returns: ABC, and xyz (following node equals ‘123’)

=FILTERXML(<XML>,"//s[contains(following::*[1],'1')]")

Returns: ABC456, and xyz (following node contains ‘1’)

=FILTERXML(<XML>,"//s[preceding::*='ABC' and following::*='XY-3F']")

Or:

=FILTERXML(<XML>,"//s[.='ABC']/following::s[following::s='XY-3F']")    

Returns: 123DEF456XY-1A and ZY-2F (everything between ‘ABC’ and ‘XY-3F’)


11) Elements based on sub-strings:

=FILTERXML(<XML>,"//s[substring-after(., '-') = '3F']")

Returns: XY-3F (nodes ending with ‘3F’ after hyphen)

=FILTERXML(<XML>,"//s[contains(substring-after(., '-') , 'F')]")

Returns: ZY-2F and XY-3F (nodes containing ‘F’ after hyphen)

=FILTERXML(<XML>,"//s[substring-before(., '-') = 'ZY']")

Returns: ZY-2F (nodes starting with ‘ZY’ before hyphen)

=FILTERXML(<XML>,"//s[contains(substring-before(., '-'), 'Y')]")

Returns: XY-1AZY-2FXY-3F and XY-4f (nodes containing ‘Y’ before hyphen)


12) Elements based on concatenation:

=FILTERXML(<XML>,"//s[concat(., '|', following::*[1])='ZY-2F|XY-3F']")

Returns: ZY-2F (nodes when concatenated with ‘|’ and following sibling equals ‘ZY-2F|XY-3F’)

=FILTERXML(<XML>,"//s[contains(concat(., preceding::*[2]), 'FA')]")

Returns: DEF (nodes when concatenated with sibling two indices to the left contains ‘FA’)


13) Empty vs. Non-empty:

=FILTERXML(<XML>,"//s[count(node())>0]")

Or:

=FILTERXML(<XML>,"//s[node()]")

Returns: ABC123DEF456XY-1AZY-2FXY-3FXY-4fxyz and 123 (all nodes that are not empty)

=FILTERXML(<XML>,"//s[count(node())=0]")

Or:

=FILTERXML(<XML>,"//s[not(node())]")

Returns: None (all nodes that are empty)


14) Preceding or Following:

=FILTERXML(<XML>,"//s[substring(., string-length(.) - string-length('F') +1) = 'F'][last()]/following::*")

Returns: XY-4fxyz and 123 (all nodes to the right of the last node that ends with an uppercase ‘F’)

=FILTERXML(<XML>,"//s[substring(., string-length(.) - string-length('F') +1) = 'F'][1]/preceding::*")

Returns: ABC and 123 (all nodes to the left of the first node that ends with an uppercase ‘F’)


15) (Preceding or Following) and self:

=FILTERXML(<XML>,"(//s[.*0!=0][last()]|//s[.*0!=0][last()]/preceding::*)")

Returns: ABC123DEF456XY-1AZY-2FXY-3FXY-4f and xyz (trim all numeric nodes from the right)††

=FILTERXML(<XML>,"(//s[.*0=0][1]|//s[.*0=0][1]/following::*)")

Returns: 123DEF456XY-1AZY-2FXY-3FXY-4fxyz and 123 (trim all non-numeric nodes from the left)


16) Maximum or Minimum:

=FILTERXML(<XML>,"(//s[.*0=0][not(.<//s[.*0=0])])[1]")

Returns: 456 (The maximum value looking at numeric nodes)

=FILTERXML(<XML>,"(//s[.*0=0][not(.>//s[.*0=0])])[1]")

Returns: 123 (The minimum value looking at numeric nodes)

NOTE: This is the equivalent to returning all numeric nodes as per #3 and post-process the array using Excel’s MIN() and MAX() functions.


Now obviously the above is a demonstration of possibilities with XPATH 1.0 functions and you can get a whole range of combinations of the above and more! I tried to cover most commonly used string functions. If you are missing any please feel free to comment.

Whereas the question is quite broad on itself, I was hoping to give some general direction on how to use FILTERXML for the queries at hand. The formula returns an array of nodes to be used in any other way. A lot of the times I would use it in TEXTJOIN() or INDEX(). But I guess other options would be new DA-functions to spill results.

Be alert that while parsing a string through FILTERXML(), the ampersand character (&) and the left angle bracket (<) must not appear in their literal form. They will respectively need to be substituted with either &amp; or &lt;. Another option would be to use their numeric ISO/IEC 10646 character code being & or < respectively. After parsing, the function will return these characters back to you in their literal form. Needless to say that splitting a string by the semi-colon therefor just became tricky.


 Each predicate, the structure between the opening and closing square brackets, is a filter of a given nodelist. To write multiple of these structures is in fact anding such predicates.

‡‡ There isn’t really an easy, following/preceding siblings and self, construct. Therefor I used the Union operator. This, however, requires multiple expressions to be inside paranthesis. Much like alternations within a capture group if one would think about regular expressions.

Referenced links from this Q&A:

[Wayback/Archive] html – XPath: What do nested square brackets mean? – Stack Overflow

(thanks [Wayback/Archive] blablaalb and [Wayback/Archive] kjhughes)

Q

I’m learning XPath for web scraping and stumbled across these two XPath examples:
//div[@class="head"][@id="top"]
and
//div[@class='canvas- graph']//a[@href='/accounting.html'][i[@class='icon-usd']]/following-sibling::h4
I wonder what does the div[@class="head"][@id="top"] mean. Does it mean that the @id=top property belongs to the div element? Is it the same as //div[@class="head" and @id="top"]?
And what does it mean when square brackets are nested inside another as in the second example? What would the HTML DOM look like for the second xpath expression to match it?

A

Square brackets delimit predicates, and predicates filter items††.

You anticipate two ways in which predicates can be combined:

  1. Consecutively: Yes, this is equivalent to logically anding the predicates. So, correct, //div[@class="head"][@id="top"] is equivalent to //div[@class="head" and @id="top"].
  2. Recursively: Yes, XPath allows predicates within predicates (nesting, as you observe).So, a[@href='/accounting.html'][i[@class='icon-usd']] filters those a elements with an @href attribute value equal to '/accounting.html' and a child i element with a @class attribute value equal to 'icon-usd'.

Together these composition mechanisms provide a powerful means of building predicates out of more basic conditions.


 Predicate references: XPath 1.0XPath 3.1.
†† Node-sets in XPath 1.0; sequences in XPath 2.0+.

Referenced links in the answer:

[Wayback/Archive] .net – xpath: preceding-sibling-or-self – Stack Overflow

(thanks [Wayback/Archive] toddmo)

Q

I’d like preceding-sibling-or-self functionality in my xpath and I’m using ../(self::*|preceding-sibling::*) and it’s giving me Expression must evaluate to a node-set., so I must be doing it wrong.

This bombs no matter what context node you are on or what the xml looks like. I want to use this functionality after a / so I need the right syntax for that.

The xpath expression (self::*|preceding-sibling::*) does not give an error, so it has to do with the place in the xpath which I’m trying to use it.

EDIT:

My mistake is actually more basic. You can’t even do ../(.|.) without that error. In general, I want to go to a node, then look through a set of nodes from there. The error seems to correlate with trying to use the union operator | after a slash /.

A

In XPath, if you want to use the union operator, you must use it at the beginning of the path.

(self::*|preceding-sibling::*)/Child is ok.

If you need to do Something/(self::*|preceding-sibling::*)/Child, you have to expand the left part out like this:

(Something/self::*|Something/preceding-sibling::*)/Child

[Wayback/Archive] excel – Divide numbers into unique sorted digits displayed in a label on a userform – Stack Overflow

(thanks [Wayback/Archive] JLuc01 and [Wayback/Archive] T.M.)

Q

I want to divide numbers into unique sorted digits. For example, the number can be 127425 and I would like 12457 as the result, meaning sorted and duplicate removed. I think the best is to explain with example:

+---------+--------+
| Number  | Result |
+---------+--------+
| 127425  | 12457  |
+---------+--------+
| 2784425 | 24578  |
+---------+--------+
| 121     | 12     |
+---------+--------+
| 22222   | 2      |
+---------+--------+
| 9271    | 1279   |
+---------+--------+

The longest result can be only 123456789.

I don’t think we need an array for that (no delimiter), but the use of substring could probably do the job. I just don’t know where to begin, hence no code.

Any ideas are welcome. Thanks.

A

Alternative to the newer dynamic array functions

Loving the above nice solutions it’s always a challenge to think over additional approaches (via Byte array, Filter() and FilterXML() function):

Function UniqueDigits(ByVal txt) As String
    Dim by() As Byte: by = txt
    Dim digits: digits = Array(49, 50, 51, 52, 53, 54, 55, 56, 57, 48, 0) ' equalling Asc values
'a) create 1-based 1-dim array with digit positions
    Dim tmp: tmp = Filter(Application.Match(by, digits, 0), 11, False)
'b) get uniques
    tmp = Uniques(tmp)
'c) sort it (don't execute before getting uniques)
    BubbleSort tmp
'd) return function result
    UniqueDigits = Join(tmp, "")
End Function
Function Uniques(arr)
'Note: using FilterXML() available since vers. 2013+
    Dim content As String       ' replacing "10" referring to zero indexed as 10th digit
    content = Replace("<t><s>" & Join(arr, "</s><s>") & "</s></t>", "10", "0")
    arr = WorksheetFunction.FilterXML(content, "//s[not(preceding::*=.)]")
    Uniques = Application.Transpose(arr)
End Function
Sub BubbleSort(arr)
    Dim cnt As Long, nxt As Long, temp
    For cnt = LBound(arr) To UBound(arr) - 1
        For nxt = cnt + 1 To UBound(arr)
            If arr(cnt) > arr(nxt) Then
                temp = arr(cnt)
                arr(cnt) = arr(nxt)
                arr(nxt) = temp
            End If
        Next nxt
    Next cnt
End Sub

Further hints :-) tl;dr

…explaining

a) how to transform a string to a digits array, b) how to get uniques via FilterXML instead of a dictionary c) (executing BubbleSort needs no further explanation).

ad a) the tricky way to get a pure digits array Transforming a string of digits into an array of single characters may need some explanation.

  1. A string (here txt) can assigned easily to a byte array via Dim by() As Byte: by = txt. (Note that classical characters would be represented in a byte array by a pair of Asc values, where the second value mostly is 0; so digit 1 is represented by 49 and 0, 2 by 50 and 0 up to 9 by 57 and 0).

Digits are defined in a 1-based Asc value array from 1~>49 to 9~>57, followed by the 10th item 0~>48 and eventually the Asc value 0 as 11th item related to each second byte pair.

Dim digits: digits = Array(49, 50, 51, 52, 53, 54, 55, 56, 57, 48, 0) ' equalling Asc values
  1. Usually the Match() function searches for a specified item in order to get its relative position within an array (here digits) and would be executed by the following syntax: “.

MATCH(lookup_value, lookup_array, [match_type]) where the lookup_value argument can be a value (number, text, or logical value) or a cell reference to a number, text, or logical value.

An undocumented feature is that instead searching e.g. 2 in the lookup_array digits via Application.Match(2, digits,0) you can use the byte array as first argument serving as 1-based array pattern where VBA replaces the current Asc values by their position found within the digits array.

Application.Match(by, digits, 0)

Finally a negative filtering removes the companion Asc 0 values (11 plus argument False) via

Dim tmp: tmp = Filter(Application.Match(by, digits, 0), 11, False)

ad b) get uniques via FilterXML

Help reference for the WorksheetFunction.FilterXML method demands two string parameters

FilterXML(XMLContentString, XPathQueryString)

The first argument doesn’t reference a file, but needs a valid (“wellformed”) XML text string starting with a root node (DocumentElement) which is roughly comparable to a html structure starting with the enclosing pair of <html>...</html> tags.

So a wellformed content string representing e.g. number 121 could be:

<t>
    <s>1</s>
    <s>2</s>
    <s>1</s>
</t>

The second argument (limited to 1024 characters) must be a valid XPath query string like the following find non-duplicates

"//s[not(preceding::*=.)]"

where the double slash // allows to find s nodes at any hierarchy level and under the condition that it is not preceded by any nodes * with the same value content =.

Recommended readings

@RonRosenfeld is a pioneer author of numerous answers covering the FilterXML method, such as e.g. Split string cell….

@JvDV wrote a nearly encyclopaedic overview at Extract substrings from string using FilterXML.

Note the last referenced answer got deleted because the question to it got deleted, so I included it as the bottom most question below.

Referenced in this Q&A:

[Wayback/Archive] Split a string (cell) in Excel without VBA (e.g. for array formula) – Stack Overflow

(thanks [Wayback/Archive] terry87, [Wayback/Archive] Vafā Sarmast and [Wayback/Archive] Ron Rosenfeld)

Q

I have data in an Excel spreadsheet with delimited strings. For simplicity, think of something like "4#8#10", with an arbitrary number of pieces.

Is there any way to split this into an array for use with an array formula? For example, something like SUM(SplitStr(A1,"#")). (This does not work – it just returns "4".)

Is there a good way of approaching this? My ideal solution would end up with an array, as if the user entered {4,8,10} for use in array formulas.

The application of my sheet does not allow for delimiter-based text to columns and summing that. I would prefer to not use VBA if possible as I will not be the primary consumer of the sheet.

A

Use XML functionality:

={SUM(FILTERXML("<t><s>" & SUBSTITUTE(A1, "#", "</s><s>") & "</s></t>", "//s"))}

Update April 2022: Use new TEXTSPLIT function.

A

To sum the entries 4,8,10 you could use something like:

=SUMPRODUCT(1*TRIM(MID(SUBSTITUTE(A1,"#",REPT(" ",99)),(ROW(OFFSET($A$1,,,LEN(A1)-LEN(SUBSTITUTE(A1,"#",""))+1))-1)*99+((ROW(OFFSET($A$1,,,LEN(A1)-LEN(SUBSTITUTE(A1,"#",""))+1)))=1),99)))

The array that is returned is a text array, so the 1* at the beginning is one way to convert them to numbers

This part:

TRIM(MID(SUBSTITUTE(A1,"#",REPT(" ",99)),(ROW(OFFSET($A$1,,,LEN(A1)-LEN(SUBSTITUTE(A1,"#",""))+1))-1)*99+((ROW(OFFSET($A$1,,,LEN(A1)-LEN(SUBSTITUTE(A1,"#",""))+1)))=1),99))

returns the array:

{"4";"8";"10"}

And with 1* before it:

{4;8;10}

Edit After six years, more succinct formulas are available to create the array:

With Excel O365 and the SEQUENCE function:

=1*(TRIM(MID(SUBSTITUTE(A1,"#",REPT(" ",99)),IF(SEQUENCE(LEN(A1)-LEN(SUBSTITUTE(A1,"#",""))+1)=1,1,(SEQUENCE(LEN(A1)-LEN(SUBSTITUTE(A1,"#",""))+1)-1)*99),99)))

With Excel 2010+ for Windows (not MAC versions) with the FILTERXML funcdtion:

=FILTERXML("<t><s>" & SUBSTITUTE(A1,"#","</s><s>") & "</s></t>","//s")

Note that the FILTERXML function extracts the values as numbers, whereas the first formula extracts the numbers as text which must be converted to numbers before being used in a numeric formula.

Referenced from this Q&A:

[Wayback/Archive] excel – How to use a string from a single cell as an array in a function without VBA – Stack Overflow

(thanks [Wayback/Archive] Matthias Tidlund and [Wayback/Archive] Ron Rosenfeld)

Q

How do I convert a string from a cell to a text array inside of a function, that should use the array, without using VBA and without adding the array into any other part of the document? It will be one of these arrays on more than 1000 rows. The string format is ^[a-zA-Z0-9.,-]*$ with “,” as delimiter.

This is the functionality I would like to achieve

I have an excel table with the following columns
A: ID numbers to compare, separated by comma (delimiter can be changed if needed). About 100 ID’s would be good to support at least.
B: ID (Each value on the rows in the column are unique but not sorted and can’t be sorted because sorting is needed based on other criterias)
C: Value (Several rows in the column can have the same value)
D: Output the one ID of the comma separated ID’s that has the highest value on its row

The problem part of the output
So far I have made a function which find the correct ID in column B based on the values in column C but only if I enter the string from column A as an array constant manually within the function. I have not managed to get the function to create the array itself from column A, which is required.

Working part of the code

In this code I have entered the values from column A manually and this is working as it should.

=INDEX({"1.01-1","1.01-3","1.08-1","1.01-1-1A"},MATCH(MAX(INDEX(C$10:C$20,N(IF(1,MATCH({"1.01-1","1.01-3","1.08-1","1.01-1-1A"},B$10:B$20,0))))),INDEX(C$10:C$20,N(IF(1,MATCH({"1.01-1","1.01-3","1.08-1","1.01-1-1A"},B$10:B$20,0)))),0))

Note that the start row is not the first row and the array is used 3 times in the function.

Code to try to convert the string to a text array

Not working but if wrapped in SUMPRODUCT() it provide an array to the SUMPRODUCT() function, of course not usable since I then can’t pass on the array. The background to this code can be found in question Split a string (cell) in Excel without VBA (e.g. for array formula)!.

=TRIM(MID(SUBSTITUTE(A10,",",REPT(" ",99)),(ROW(OFFSET($A$1,,,LEN(A10)-LEN(SUBSTITUTE(A10,",",""))+1))-1)*99+((ROW(OFFSET($A$1,,,LEN(A10)-LEN(SUBSTITUTE(A10,",",""))+1)))=1),99))

The second code output the first item of the array and inserted in the first code do not change this result as it did when wrapping the second code in SUMPRODUCT().

Here is a picture of my simplified test setup in Excel for this case, identical to what is described above.

Simplified test setup

A

I’m not really sure what you are doing with your formula. But to convert contents of a cell to a comma separated text array to be used as the array argument to the INDEX or MATCH functions, you can use the FILTERXML function. You’ll need to educate yourself about XML and XPATH to understand what’s going on, but there are plenty of web resource for this.

For example, with

A10:  "1.01-1","1.01-3","1.08-1","1.01-1-1A"

The formula below will return 1.08-1. Note the 3 for the row argument to the INDEX function.

=INDEX(FILTERXML("<t><s>" & SUBSTITUTE(SUBSTITUTE(A10,"""",""), ",", "</s><s>") & "</s></t>", "//s"),3)

C

It worked as it should to insert the FILTERXML function and enter the function as an array function with CTRL+SHIFT+ENTER. The resulting function then became "{=IFERROR(INDEX(FILTERXML("<t><s>" & SUBSTITUTE(SUBSTITUTE(A10,"""",""), ",", "</s><s>") & "</s></t>", "//s"),MATCH(MAX(INDEX(C$10:C$20,N(IF(1,MATCH(FILTERXML("<t><s>" & SUBSTITUTE(SUBSTITUTE(A10,"""",""), ",", "</s><s>") & "</s></t>", "//s"),B$10:B$20,0))))),INDEX(C$10:C$20,N(IF(1,MATCH(FILTERXML("<t><s>" & SUBSTITUTE(SUBSTITUTE(A10,"""",""), ",", "</s><s>") & "</s></t>", "//s"),B$10:B$20,0)))),0)),"")}"

This is the “Simplified test setup”:

 

 

 

 

 

 

[Wayback/Archive] XPath Operators

 

 

 

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

 

 

 

 

[Wayback/Archive] excel – How to find an element in an array through a specific character in the element? – Stack Overflow

This shows FilterXML is accessible from VBA as well. Slim chance I will ever use it (because of the abuse of VBA by hostile actors), but just in case…

(thanks [Wayback/Archive] nufabiaz and [Wayback/Archive] Demi Dee)

Q

I have list of elements in array. For example:
arr = ("numbers", "matriculation", "123", "444.909", "465", "897.907")
From the array, I would like to find the elements (index/position) that contains “.90” . When I tried, it returned empty/nonexistant.
From what I found from other websites, you can only find specific elements when it is exactly the same. For example:
arr = ("numbers", "matriculation", "123", "444.909", "465", "897.907", "123")
If  arr("123") = "123" 
strMsg = "True"

A

In order to complete @PEH ‘s valid solutions an approach via FilterXML() (available since vers. 2013); by default it returns a vertical array (procedure chkResults provides for special cases of non-findings or single values) with the additional benefit that it recognizes numeric values as such .

Note that the results returned by the following Filt() function always return a 2-dim array. An example call is listed eventually.

Function Filt(flatArray, ByVal srch As String)
'a) define wellformed xml string
    Dim xml As String
    xml = "<ii><i>" & Join(flatArray, "</i><i>") & "</i></ii>"
'b) define XPath search string
    Dim xPth As String
    xPth = "//i[contains(.,'" & srch & "')]"
'c) filter xml and return findings
    Filt = Application.FilterXML(xml, xPth)
    chkResults Filt ' check for special cases
End Function

Help procedure chkResults

Sub chkResults(ByRef FilterXMLResults)
'Purp:  provide for special cases zero or 1 finding
'Note:  by default FilterXML returns a 2-dim array for more findings,
'       otherwise results have to be redimmed
    Select Case VarType(FilterXMLResults)
        Case Is >= vbArray: Exit Sub    ' 2 or more findings
        Case vbError                    ' 0 findings
            ReDim xtra(0 To 0, 1 To 1)
        Case Else                       ' 1 finding only
            ReDim xtra(1 To 1, 1 To 1)
            xtra(LBound(xtra), 1) = FilterXMLResults
    End Select
    FilterXMLResults = xtra ' return special cases as 2-dim array, too
End Sub

Example call

Sub testFilt()
Const srch As String = ".90"     ' ".90x"
'1. define data
    Dim data() As Variant
    data = Array("numbers", "matriculation", "123", "444.909", "465", "897.907")
'2. get results
    Dim results As Variant
    results = Filt(data, srch)      ' << help function Filt() via FilterXML
'3. show results
    Debug.Print UBound(results) & " item(s) found:"
    Dim i As Long
    For i = LBound(results) To UBound(results)
        Debug.Print i, results(i, 1)
    Next i
End Sub

Example results in VB Editor’s immediate window

'2 item(s) found:
1             444.909
2             897.907

Extended version //Edit as of 2022-09-27

You might want to apply Filt() similar to VBA’s Filter() function (VBA.Strings.Filter) to include the possibilities of

  • negative filtering via additional boolean argument Include:=False as well as
  • case insensitive filtering via boolean argument TxtComp:=True

Caveat using FilterXML

Note that the XPath search expressions seem to require the xmldom functions themselves – like translate()etc. – to be written in lower casesso a FilterXML xpath starting with e.g. //i[contains(Translate(., ...),'xyz')] would fail.

Function Filt(flatArray, ByVal srch As String, _
    Optional Include As Boolean = True, _
    Optional TxtComp As Boolean = False)
'a) define wellformed xml string
    Dim xml As String
    xml = "<ii><i>" & Join(flatArray, "</i><i>") & "</i></ii>"
'b) define XPath search string
    Dim xPth As String
    If TxtComp Then srch = LCase(srch)
    If Include Then
        xPth = "//i[contains(.,'" & srch & "')]"
    Else
        xPth = "//i[not(contains(.,'" & srch & "'))]"
    End If
    If TxtComp Then ' extend above XPath by xmldom translate function
        Dim ABC As String: ABC = UCase(srch)
        Dim translate As String
        translate = "translate(.,'" & ABC & "','" & LCase(ABC) & "')"
        xPth = Replace(xPth, "(.,", "(" & translate & ",")
    End If
'c) filter xml and return findings
    Filt = Application.FilterXML(xml, xPth)
    chkResults Filt ' check for special cases
End Function

Related helpful links

Referenced from the answer:

[Wayback/Archive] excel – Extract sentence containing specific word – Stack Overflow

(thanks [Wayback/Archive] Demi Dee, [Wayback/Archive] Harun24hr  and [Wayback/Archive] T.M.)

Q

I’ve seen a variation of this question being asked; however, I cannot find my exact problem. I am trying to extract every sentence that contains a specific word and paste the sentence to the column on the right of A1. In the example below, the key word is cold.

Example

Column A1 – (What I have):
It is very cold outside. I want to go skiing. I love a cold vacation. I love the snow.

Column A2 – (what I want to see):
It is very cold outside. I love a cold vacation.

Can anyone assist? VBA appears to be best. I also wouldn’t mind typing in my keyword in a cell and have a VBA code that extracts every sentence containing the keyword. But anything helps!

A

FILTERXML() will assist you. Try-

=TEXTJOIN(". ",TRUE,FILTERXML("<t><s>"&SUBSTITUTE(A1,".","</s><s>")&"</s></t>","//s[contains(., 'cold')]"))

More about FILTERXML() here by JvdV.

A

Allow case insensitive search via FilterXML

Just as helpful extension to Harun24HR ‘s FilterXML() solution, you may use the XMLDom function Translate() within the XPath expression to define a node value output as lower (or upper) case.

a) To find not only the lower-cased string cold, but also Cold (Camel-cased) it would suffice to include a translation pattern “change any character within the current node . equalling uppercase C to lowercase c via "//s[contains(translate(.,'C','c'), 'cold')]"

=TEXTJOIN(". ",TRUE,FILTERXML("<t><s>"&SUBSTITUTE(A1,".","</s><s>")&"</s></t>","//s[contains(translate(.,'C','c'), 'cold')]"))

Alternatively you might include an or condition to the XPath expression:

=TEXTJOIN(". ",TRUE,FILTERXML("<t><s>"&SUBSTITUTE(A1,".","</s><s>")&"</s></t>","//s[contains(., 'cold') or contains(.,'Cold')]"))

b) Allow completely case insensitive search

To include also COLD or mixtures like cOLd you’d have to list all needed characters via translate(.,'COLD','cold'); if more than a few it’s preferrable to alphabetisize (see 2nd formula):

=TEXTJOIN(". ",TRUE,FILTERXML("<t><s>"&SUBSTITUTE(A1,".","</s><s>")&"</s></t>","//s[contains(translate(.,'COLD','cold'), 'cold')]"))
=TEXTJOIN(". ",TRUE,FILTERXML("<t><s>"&SUBSTITUTE(A1,".","</s><s>")&"</s></t>","//s[contains(translate(.,'CDLO','cdlo'), 'cold')]"

Referenced from the answers:

[Wayback/Archive] excel – Getting 3 digits from a set of numbers using RegEx for VBA – Stack Overflow

This too shows FilterXML is accessible from VBA as well.

(thanks [Wayback/Archive] Arkaelum and [Wayback/Archive] T.M.)

Q

I have a set of numbers (15+ digits) and need to get the value of the 10th – 11th digit. I am working on using RegEx but so far have only made it to select the first 9 digits and can’t go further from there.

^[1-9\][0-9\][0-9\][0-9\][0-9\][0-9\][0-9\][0-9\][0-9\]

Sample digits are:

188202143002267.50 (need to get only 002)
188202143-002267.50 (need to get only 002 —- already resolved this)

Or if there is a better way to do this? I am working on Excel VBA.

A

Alternative via FilterXML()

My intention is not to show a better or more elegant way by this alternative to RegEx, but to point out how one can use FilterXML (available after all since 2013+) in VBA to return filtered results.

Since the handling of XML is still seen as a bit cryptic, I would like to contribute with some hints to a better understanding of how the FilterXML function works.

What does FilterXML:

    • The function returns specific data
    • from a (wellformed) XML content string (roughly comparable to a html-tag structure with individual, but case sensitive node names)
    • by using a specified XPath expression string (c.f. @JvdV ‘s excellent examples at Extract substrings ..)

Syntax – see section 2

    FILTERXML(xml, xpath)

Argument 1: XML content – see section 1

The XML standard allows to design use a customized markup language by your own. If you regard each character of an assumed input “A123” as node element, you could

  • name the starting DocumentElement e.g. r (symbolizing root) and
  • the individual nodes containing the input characters e.g. i
  • (and take care of correctly set closing tags

This results in a simple well formed hierarchy like

    <r>
        <i>A</i>
        <i>1</i>
        <i>2</i>
        <i>3</i>
    </r>

Note that the conversion to a valid xml content in this example is executed by a tricky splitting into single characters plus eventual joining together to insert the wanted node names (~ “tags”) into the final xml content string.

Argument 2: XPath expression – see section 2

The XML hierarchy allows XPath expressions to search within any indicated node and sub-node references. Thus such an XPath expression string to isolate and return digits only like in the posted question could be

    "//i[.*0=0]"

where //i indicates to search occurrencies of i at any hierarchy level, the brackets [] enclose the filter condition to return only digits, expressed by the multiplication of the referenced node content . with zero if resulting in zero .*0=0, thus excluding other elements than digits.

In contrast to XMLDoM (Document Object Model) coding, filtering via Application.FilterXML ‘s XPath expressions does not return node objects (or defined node types), but nothing other than the “assigned” node content values (textual child elements in XMLDoM), which Excel for this special question would interpret automatically as numbers (here even: Double!)

*) XMLDoM is a cross-platform and language-independent interface treating the document as a tree structure and allowing programmatic access to the tree

Returned results – see section 3

Results are returned as follows:

  • a) in the case of several results as a vertical 2-dim generally variant array (in tabular Excel 2019+ as dynamic spill range),
  • b) if there is only one result: as a single value,
  • c) if there is no result (or an incorrect node query): as an error value (the worksheet function would return #VALUE!, the late-bound Application.FilterXML() function Error 2015)

Before transforming these different filtering results to any wanted output (e.g. by transposing or type transformations) you would check them via VarType() – see section 2) in the following code example:

Function GetDigitsOnly()

Example call due to OP: Mid$(DigitsOnly(s), 10, 2) where s represents a string input

Function DigitsOnly(ByVal s As String) As String
    If Not Len(s) Then Exit Function        ' escape if empty string
'1) Get XML content (as FilterXML's first argument)
    'a) make string splitable via vbNullChar delimiter
    s = StrConv(s, vbUnicode)
    'b) atomize via Split()
    Dim tmp
    tmp = Split(Left(s, Len(s) - 1), vbNullChar, Len(s) \ 2 + 1)
    'c) get wellformed (html-like) xml content
    tmp = "<r><i>" & Join(tmp, "</i><i>") & "</i></r>"
'2) apply FilterXML({content},{XPath}) to get digits only
    tmp = Application.FilterXML(tmp, "//i[.*0=0]")
'3) return string result
    Select Case VarType(tmp)
        Case vbError                                 ' not any digit found
            Exit Function
        Case vbArray + vbVariant                      ' vertical digits array
            DigitsOnly = Join(Application.Transpose(tmp), vbNullString)
        Case Else                                    ' single (here e.g.: vbDouble) result
            DigitsOnly = tmp
    End Select
End Function

I quoted the relevant bits above as the question regrettably has been deleted and not available in archived form, but the answer shows a very interesting approach.

Related blog post

Excel: operations involving the last occurence of a substring (with examples getting the current Worksheet, Workbook and file/directory path).

--jeroen

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.