RegEx character classes in “Searching | Notepad++ User Manual” « The Wiert Corner

RegEx character classes in “Searching | Notepad++ User Manual”

Posted by jpluimers on 2022/02/03

I needed to search for IBAN numbers in documents and used this regular expression: [a-zA-Z]{2}[0-9]{2} ?[a-zA-Z0-9]{4} ?[0-9]{4} ?[0-9]{4} ?[0-9]{2}. It allows the usual optional spaces between the groups, as in NL12 INGB 0345 6789 01.
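As a quick sanity check outside Notepad++, the same pattern can be tried in Python's re module, which accepts this syntax unchanged. This is only a sketch: the names IBAN_PATTERN, find_ibans and sample are made up for illustration, the pattern hard-codes the 18-character layout of the example above, and it does not verify the IBAN check digits.

```python
import re

# The pattern from the post, unchanged. It expects exactly the
# 18-character layout of the example (2 letters, 2 digits, 4 alphanumerics,
# then 4 + 4 + 2 digits), with an optional single space between groups.
IBAN_PATTERN = re.compile(
    r"[a-zA-Z]{2}[0-9]{2} ?[a-zA-Z0-9]{4} ?[0-9]{4} ?[0-9]{4} ?[0-9]{2}"
)


def find_ibans(text: str) -> list[str]:
    """Return every substring of text that matches the pattern (hypothetical helper)."""
    return IBAN_PATTERN.findall(text)


sample = "Pay to NL12 INGB 0345 6789 01 or NL12INGB0345678901 today."
print(find_ibans(sample))
# → ['NL12 INGB 0345 6789 01', 'NL12INGB0345678901']
```

Both the spaced and the compact notation are found, because every space in the pattern is optional.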

It is based on the handy table of RegEx character classes that Notepad++ supports, found at [Wayback] Searching | Notepad++ User Manual:

Character Classes

[set] ⇒ This indicates a set of characters; for example, [abc] means any of the literal characters a, b or c. You can also specify ranges by putting a hyphen between characters, for example [a-z] for any character from a to z. You can use a collating sequence in character ranges, as in [[.ch.]-[.ll.]] (these are collating sequences in Spanish).

[^set] ⇒ The complement of the characters in the set. For example, [^A-Za-z] means any character except an alphabetic character. Care should be taken with a complement list, as regular expressions are always multi-line, and hence [^ABC]* will match until the first A, B or C (or a, b or c if match case is off), including any newline characters. To confine the search to a single line, include the newline characters in the exception list, e.g. [^ABC\r\n].
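The effect is easy to reproduce in a small sketch. Python's re module is a different engine from the one in Notepad++, but it treats negated sets the same way in this respect; the variable names below are illustrative only.

```python
import re

text = "first line\nsecond line with A in it\n"

# [^ABC]* runs straight across the line break and stops only at the first A, B or C.
greedy = re.match(r"[^ABC]*", text).group()

# Adding \r and \n to the excluded set confines the match to a single line.
single_line = re.match(r"[^ABC\r\n]*", text).group()

print(repr(greedy))       # → 'first line\nsecond line with '
print(repr(single_line))  # → 'first line'
```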

Please note that the complement of a character set is often many more characters than you expect: (?-s)[^x]+ will match 1 or more instances of any non-x character, including newlines: the (?-s) search modifier turns off “dot matches newlines”, but the [^x] is not a dot ., so that class is still allowed to match newlines.
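A minimal sketch of the same distinction, again using Python's re module as a stand-in for the Notepad++ engine. In Python, "dot matches newlines" is already off by default, so no modifier is needed to show the difference.

```python
import re

text = "ab\ncdxef"

# With "dot matches newlines" off, "." stops at the line break...
dot_matches = re.findall(r".+", text)

# ...but a negated class is not a dot, so it still crosses the line break.
negated_matches = re.findall(r"[^x]+", text)

print(dot_matches)      # → ['ab', 'cdxef']
print(negated_matches)  # → ['ab\ncd', 'ef']
```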

[[:name:]] or [[:☒:]] ⇒ The whole character class named name. For many, there is also a single-letter “short” class name, ☒. Please note: the [:name:] and [:☒:] must be inside a character class [...] to have their special meaning.

short	full name	description	equivalent character class
	alnum	letters and digits
	alpha	letters
h	blank	spacing which is not a line terminator	[\t\x20\xA0]
	cntrl	control characters	[\x00-\x1F\x7F\x81\x8D\x8F\x90\x9D]
d	digit	digits
	graph	graphical character, so essentially any character except for control chars, \x7F, \x80
l	lower	lowercase letters
	print	printable characters	[\s[:graph:]]
	punct	punctuation characters	[!"#$%&'()*+,\-./:;<=>?@\[\\\]^_`{|}~]
s	space	whitespace (word or line separator)	[\t\n\x0B\f\r\x20\x85\xA0\x{2028}\x{2029}]
u	upper	uppercase letters
	unicode	any character with code point above 255	[\x{0100}-\x{FFFF}]
w	word	word characters	[_\d\l\u]
	xdigit	hexadecimal digits	[0-9A-Fa-f]

Note that letters include any Unicode letters (ASCII letters, accented letters, and letters from a variety of other writing systems); digits include ASCII numeric digits and anything else in Unicode that’s classified as a digit (like superscript numbers ¹²³…).

Note that those character class names may be written in upper or lower case without changing the results. So [[:alnum:]] is the same as [[:ALNUM:]] or the mixed-case [[:AlNuM:]].

As stated earlier, the [:name:] and [:☒:] (note the single brackets) must be a part of a surrounding character class. However, you may combine them inside one character class, such as [_[:d:]x[:upper:]=], which is a character class that would match any digit, any uppercase, the lowercase x, and the literal _ and = characters. These named classes won’t always appear with the double brackets, but they will always be inside of a character class.
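Python's re module has no [:name:] classes, so the combined class above cannot be run there as written. As a rough, ASCII-oriented approximation (an assumption made for illustration, not what Notepad++ does internally), \d can stand in for [:d:] and A-Z for [:upper:]:

```python
import re

# Approximation of [_[:d:]x[:upper:]=] for an engine without named classes:
#   \d   stands in for [:d:]      (digits)
#   A-Z  stands in for [:upper:]  (ASCII uppercase letters only)
combined = re.compile(r"[_\dxA-Z=]")

print(combined.findall("a_b 7 x y Q = q"))
# → ['_', '7', 'x', 'Q', '=']
```

Lowercase letters other than x, and the spaces, fall outside the set and are skipped.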

If the [:name:] or [:☒:] is accidentally not contained inside a surrounding character class, it loses its special meaning. For example, [:upper:] is the character class matching :, u, p, e, and r, whereas [[:upper:]] is similar to [A-Z] (plus other Unicode uppercase letters).
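The "lost meaning" half of this pitfall can be shown in any engine, because a bare [:upper:] is just an ordinary set of characters. The sketch below uses Python's re module; since it has no named classes, [A-Z] is used as an ASCII-only stand-in for [[:upper:]].

```python
import re

text = "Upper: A"

# Without the outer brackets this is just the set {':', 'u', 'p', 'e', 'r'}.
bare = re.findall(r"[:upper:]", text)

# ASCII-only stand-in for [[:upper:]], since re has no named classes.
upper = re.findall(r"[A-Z]", text)

print(bare)   # → ['p', 'p', 'e', 'r', ':']
print(upper)  # → ['U', 'A']
```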

[^[:name:]] or [^[:☒:]] ⇒ The complement of character class named name or ☒ (matching anything not in that named class). This uses the same long names, short names, and rules as mentioned in the previous description.


–jeroen

This entry was posted on 2022/02/03 at 06:00 and is filed under Development, Notepad++, Power User, RegEx, Software Development, Text Editors.
