The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests

Archive for the ‘RegEx’ Category

windows – Is there any sed like utility for cmd.exe? – Stack Overflow

Posted by jpluimers on 2021/07/19

[WayBack] windows – Is there any sed like utility for cmd.exe? – Stack Overflow

TL;DR: many people suggest using PowerShell, but there is GNU sed in Chocolatey.

The chocolatey part:

The PowerShell part: read the other answers from the above question.

–jeroen

Posted in *nix, *nix-tools, CommandLine, Power User, PowerShell, RegEx, sed, Windows

CloudFlare knows how to do public postmortems on outages

Posted by jpluimers on 2021/07/16

Everyone can learn from an outage. CloudFlare shows how to do it right, for instance on the RegEx-going-wild downtime 2 years ago.

So it’s time to link to that one again: [WayBack] Details of the Cloudflare outage on July 2, 2019

More like these at [WayBack] Post Mortem – The Cloudflare Blog.

More on evaluating regular expressions in linear time:

Via [WayBack] Details of the Cloudflare outage on July 2, 2019 | Hacker News

–jeroen

Posted in Algorithms, Development, Power User, RegEx, Software Development

Regex for a file name without an extension – Stack Overflow

Posted by jpluimers on 2021/06/30

For me this unaccepted answer from [WayBack] Regex for a file name without an extension – Stack Overflow by [WayBack] Bohemian worked best:

Assuming the extensions are up to 4 chars in length (so filenames like mr.smith aren’t considered as having an extension, but mr.smith.doc and mr.smith.html are considered as having extensions):

^.*[^.]{5}$

No need to capture a group, as the whole expression is what you want – ie group 0.

Depending on the extension length, increase 5 to like 7 for 6 character extensions (it’s always N+1 when you want to match extensions of N characters).
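A quick way to check the expression is to feed it a few names through grep -E. This is only a sketch: the first three names come from the quoted answer, and readme is an extra name I added for illustration.

```shell
# Sample names: the first three come from the quoted answer; "readme" is an added example.
names='mr.smith
mr.smith.doc
mr.smith.html
readme'

# Keep only the names whose last 5 characters contain no dot,
# i.e. names without an extension of up to 4 characters.
no_ext=$(printf '%s\n' "$names" | grep -E '^.*[^.]{5}$')
printf '%s\n' "$no_ext"
```

This prints mr.smith and readme. One side effect worth knowing: the expression needs at least 5 characters, so a short name like abc never matches even though it has no extension.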

–jeroen

Posted in Development, RegEx, Software Development

VMware ESXi console: viewing all VMs, suspending and waking them up: part 1

Posted by jpluimers on 2021/04/22

I think the easiest way to list all VMs is the vim-cmd vmsvc/getallvms command, but it has a big downside: the output is a mess.

The reason is that the output has a lot of columns (Vmid, Name, Datastore, File, Guest OS, Version, Annotation), more than 500 characters per line (eat that 1080p monitor!), and potentially more than one line per VM as the Annotation is a free-text field that can have newlines.

Example output on one of my machines:

Vmid Name File Guest OS Version Annotation
10 X9SRI-3F-W10P-EN-MEDIA [EVO860_500GB] VM/X9SRI-3F-W10P-EN-MEDIA/X9SRI-3F-W10P-EN-MEDIA.vmx windows9_64Guest vmx-14
5 PPB Local_Virtual Machine_v4.0 [EVO860_500GB] VM/PPB-Local_Virtual-Machine_v4.0/PPB Local_Virtual Machine_v4.0.vmx centos64Guest vmx-11 PowerPanel Business software(Local) provides the service which communicates
with the UPS through USB or Serial cable and relays the UPS state to each Remote on other computers
via a network.
It also monitors and logs the UPS status. The computer which has been installed the Local provides
graceful,
unattended shutdown in the event of the power outage to protect the hosted computer.

As an alternative, you could use esxcli vm process list, but that gives IDs that are way harder to remember:

PPB Local_Virtual Machine_v4.0
World ID: 2099719
Process ID: 0
VMX Cartel ID: 2099713
UUID: 56 4d 74 f8 c8 22 41 27-a3 88 49 df 8b dc d6 63
Display Name: PPB Local_Virtual Machine_v4.0
Config File: /vmfs/volumes/5d35e7d8-e8df636f-46b9-0025907d9d5c/VM/PPB-Local_Virtual-Machine_v4.0/PPB Local_Virtual Machine_v4.0.vmx
X9SRI-3F-W10P-EN-MEDIA
World ID: 2099728
Process ID: 0
VMX Cartel ID: 2099717
UUID: 56 4d 51 ac f6 cf e4 0b-b6 86 2f 53 a2 8a 4b ea
Display Name: X9SRI-3F-W10P-EN-MEDIA
Config File: /vmfs/volumes/5d35e7d8-e8df636f-46b9-0025907d9d5c/VM/X9SRI-3F-W10P-EN-MEDIA/X9SRI-3F-W10P-EN-MEDIA.vmx

I got both of the above commands from [Wayback] VMware Knowledge Base: Performing common virtual machine-related tasks with command-line utilities (2012964).

Back to the columns that vim-cmd vmsvc/getallvms returns:

  • Vmid is an unsigned integer
  • Name can have spaces
  • Datastore has square brackets [ and ] around it
  • File can contain spaces
  • Guest OS is an identifier without spaces (it is a value from [Wayback] the vSphere API VcVirtualMachineGuestOsIdentifier)
  • Version looks like vmx-# where # is an unsigned integer
  • Annotation is multi-line free-form text, so a continuation line could start with digits just like a Vmid, but the chance that such a line looks exactly like a non-annotated VM line is very low

So let’s find a grep or sed filter to get just the lines without annotation continuations. In general I try to avoid regular expressions as they are hard to both write and read, but with Busybox there is not much choice.

I chose sed, just in case I wanted to do some manipulation in addition to matching.

Busybox sed

Though the source code at [Wayback] sed.c\editors – busybox – BusyBox: The Swiss Army Knife of Embedded Linux describes itself as “sed.c - very minimalist version of sed”, the implementation is actually reasonably feature rich, just not feature complete. That’s OK given the aim of Busybox to be small.

Luckily, deep in the busybox sed code, it indicates that extended regular expressions are supported. The support is in [Wayback] /uClibc/plain/libc/misc/regex/regcomp.c (look for regcomp; do not get confused by xregcomp on call sites, as that is [Wayback] just a tiny wrapper to call regcomp).

The support has become better over time, as [Wayback] gnu – sed Command on BusyBox expects different syntax? – Super User shows.

This means far less escaping than with basic regular expressions. Capture groups are supported, as are character classes (so you can write [[:digit:]], which is more readable than [0-9]), and + is supported to match one or more times (so [0-9]+ means one or more digits, as does [[:digit:]]+, but [d]+ or \d+ don’t). Unfortunately, named capture groups are not supported, so documenting parts of the regular expression like (?<Vmid>^[[:digit:]]+) is not possible: it will give you the error [Wayback] Invalid preceding regular expression.

But first a few of the sed commandline options and their order:

vim-cmd vmsvc/getallvms | sed -n -E -e '/(^[[:digit:]]+)/p'
  1. -n suppresses the default printing of every line, so only lines matched by a p print command are output
  2. -E allows extended regular expressions (you can also use -r for that)
  3. -e adds a sed script (in this case an extended regular expression match followed by a print command)
  4. '/(^[[:digit:]]+)/p' is the extended regular expression embedded in quotes
    1. / at the start indicates that sed should match the regular expression on each line it parses
    2. /p at the end indicates the matching line should be printed
    3. Parentheses ( and ) surround a capture group
    4. ^[[:digit:]]+ matches 1 or more digits at the start of the line
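The effect of -n combined with p is easiest to see on a few lines of made-up text, without needing an ESXi host. A minimal sketch (the three sample lines are invented for illustration):

```shell
# Invented sample text: two lines start with digits, one does not.
text='12 first
note
7 second'

# With -n only the lines matched by the p command are printed.
with_n=$(printf '%s\n' "$text" | sed -n -E -e '/^[[:digit:]]+/p')

# Without -n every line is printed once by default, and matching lines a second time by p.
without_n=$(printf '%s\n' "$text" | sed -E -e '/^[[:digit:]]+/p')

printf 'with -n:\n%s\n\nwithout -n:\n%s\n' "$with_n" "$without_n"
```

With -n you get the two lines starting with digits; without it you get five lines, as both matching lines are duplicated.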

The grep command is indeed much shorter, but does not allow post-editing:

vim-cmd vmsvc/getallvms | grep -E '(^[[:digit:]]+)'

Building a sed filter

I came up with the below sed regular expression to keep only lines that:

  1. start with a Vmid unsigned integer
  2. have a [Datastore] before the File
  3. have a Guest OS identifier after File
  4. have a Version matching vmx-# after Guest OS, where # is an unsigned integer
  5. optionally have an Annotation after Version
vim-cmd vmsvc/getallvms | sed -n -E -e "/^([[:digit:]]+)(\s+)((\S.+\S)?)(\s+)(\[\S+\])(\s+)(.+\.vmx)(\s+)(\S+)(\s+)(vmx-[[:digit:]]+)(\s*?)((\S.+)?)$/p"
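To see why the stricter filter is worth the effort, here is a sketch that compares it with the simple starts-with-digits filter. Assumptions: the sample output is invented (VM names, datastore name, and an annotation continuation line that happens to start with a number), and the strict pattern is a simplified variant of mine that uses POSIX classes like [[:space:]] instead of \s and \S and has no capture groups, so it is not the exact expression above.

```shell
# Invented stand-in for `vim-cmd vmsvc/getallvms` output; the last line is an
# annotation continuation that happens to start with digits.
sample='Vmid Name File Guest OS Version Annotation
10 vm-one [datastore1] vm-one/vm-one.vmx windows9_64Guest vmx-14
5 vm two [datastore1] vm two/vm two.vmx centos64Guest vmx-11 First annotation line
10 hosts are monitored by this appliance.'

# Loose filter: anything starting with digits.
loose=$(printf '%s\n' "$sample" | sed -n -E -e '/^[[:digit:]]+/p')

# Strict filter: Vmid, [Datastore], a .vmx File, Guest OS, vmx-# Version, optional Annotation.
strict=$(printf '%s\n' "$sample" | sed -n -E -e '/^[[:digit:]]+[[:space:]]+.*\[[^[:space:]]+\][[:space:]]+.+\.vmx[[:space:]]+[^[:space:]]+[[:space:]]+vmx-[[:digit:]]+([[:space:]].*)?$/p')

loose_count=$(printf '%s\n' "$loose" | wc -l | tr -d ' ')
strict_count=$(printf '%s\n' "$strict" | wc -l | tr -d ' ')
printf 'loose: %s lines, strict: %s lines\n' "$loose_count" "$strict_count"
```

The loose filter keeps three lines (including the stray annotation line), the strict one keeps only the two real VM lines.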

A longer expression that I used to fiddle around with is at regex101.com/r/A7MfKu and contains named capture groups. I had to nest a few groups and use the ? non-greedy (or lazy) operator a few times to ensure the fields would not include the spaces between the columns.

Others use different expressions as for instance explained in [Wayback] Get all VMs with “vmware-vim-cmd vmsvc/getallvms” – VMware Technology Network VMTN:

Output from “vim-cmd vmsvc/getallvms” is really challenging to process. Our normal approaches such as awk column indexes, character index, and regular expression are all error prone here. The character index of each column varies depending on maximum field length of, for example, VM name. And the presence of spaces in VM names throws off processing as awk columns. And VM name could contain almost any character, foiling regex’s.

Printing capture groups

The cool thing is that it is straightforward to modify the expression to print any of the capture groups in the order you wish: you convert the match expression (/match/p) into a replacement expression (s/match/replace/p) and print the required capture groups in the replace part. A short example is at [Wayback] regex – How to output only captured groups with sed? – Stack Overflow.
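A small sketch of that conversion, on a single invented line (the VM name web01 and datastore name datastore1 are made up, and the pattern is deliberately simpler than the full expression):

```shell
# Invented single line in the shape of a getallvms row.
line='10 web01 [datastore1] web01/web01.vmx centos64Guest vmx-11'

# s/match/replace/p: capture Vmid as \1 and Name as \2, then print them in a different order.
reordered=$(printf '%s\n' "$line" | sed -n -E -e 's/^([[:digit:]]+) ([^ ]+) .*$/Name:\2 Vmid:\1/p')
printf '%s\n' "$reordered"
```

This prints Name:web01 Vmid:10. Because of -n and the p flag, lines that do not match the pattern are not printed at all.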

There is one gotcha though: Busybox sed only allows single-digit capture group numbers (\1 through \9), and the expression above has 15 capture groups. This fails and prints 0 after the output of capture group 1 instead of printing capture group 10, and similarly prints 2 after group 1 instead of printing group 12:

vim-cmd vmsvc/getallvms | sed -n -E -e "s/^([[:digit:]]+)(\s+)((\S.+\S)?)(\s+)(\[\S+\])(\s+)(.+\.vmx)(\s+)(\S+)(\s+)(vmx-[[:digit:]]+)(\s*?)((\S.+)?)$/Vmid:\1 Guest:\10 Version:\12 Name:\3 Datastore:\6 File:\8/p"
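The single-digit limit can be reproduced without any VM output. A minimal sketch with ten one-letter capture groups (the input string is just the first ten letters of the alphabet):

```shell
# Ten capture groups, one per letter; \10 is read as \1 followed by a literal 0.
result=$(printf 'abcdefghij\n' | sed -E -e 's/(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)/[\10]/')
printf '%s\n' "$result"
```

This prints [a0], not [j]: the tenth group is simply not reachable from the replacement part.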

So we need to cut down on capture groups first by removing all capture groups around the \s white-space matching:

vim-cmd vmsvc/getallvms | sed -n -E -e  "/^([[:digit:]]+)\s+((\S.+\S)?)\s+(\[\S+\])\s+(.+\.vmx)\s+(\S+)\s+(vmx-[[:digit:]]+)\s*?((\S.+)?)$/p"

Then we get this to print some of the capture groups:

vim-cmd vmsvc/getallvms | sed -n -E -e "s/^([[:digit:]]+)\s+((\S.+\S)?)\s+(\[\S+\])\s+(.+\.vmx)\s+(\S+)\s+(vmx-[[:digit:]]+)\s*?((\S.+)?)$/Vmid:\1 Guest:\6 Version:\7 Name:\3 Datastore:\4 File:\5 Annotation:\8/p"

With this output:

Vmid:10 Guest:windows9_64Guest Version:vmx-14 Name:X9SRI-3F-W10P-EN-MEDIA Datastore:[EVO860_500GB] File:VM/X9SRI-3F-W10P-EN-MEDIA/X9SRI-3F-W10P-EN-MEDIA.vmx Annotation:
Vmid:5 Guest:centos64Guest Version:vmx-11 Name:PPB Local_Virtual Machine_v4.0 Datastore:[EVO860_500GB] File:VM/PPB-Local_Virtual-Machine_v4.0/PPB Local_Virtual Machine_v4.0.vmx Annotation:PowerPanel Business software(Local) provides the service which communicates

Figuring out power state for each VM

This will be in the next installment, as by now this already has become a big blog-post (:

–jeroen

Posted in *nix, *nix-tools, ash/dash, ash/dash development, Development, ESXi6, ESXi6.5, ESXi6.7, ESXi7, Power User, RegEx, Scripting, Software Development, Virtualization, VMware, VMware ESXi

Delphi TRegExOption: Where is description of roNotEmpty option? What does this option do? – Jacek Laskowski – Google+

Posted by jpluimers on 2020/12/10

I really dislike using regular expressions, mainly because every time I bump into code using them either:

  • I cannot decipher them any more
  • They are used for things they are not suited for (like parsing JSON or XML: please don’t!)

For more background on when NOT to use regular expressions, remember they describe a regular grammar, and can be implemented by a finite state machine (a state machine that is in exactly one state out of a finite set of states).
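Nesting is the classic illustration: a finite state machine cannot count how deep it is, so a regular expression cannot check that parentheses are balanced. A small sketch, where the pattern is just one naive attempt of mine at "some opening parentheses followed by some closing ones":

```shell
# Naive attempt: any number of ( followed by any number of ).
pattern='^\(*\)*$'

accepted=''
for s in '()' '(())' '(()'; do
  # grep -q only sets the exit status; collect every string the pattern accepts.
  if printf '%s\n' "$s" | grep -E -q "$pattern"; then
    accepted="$accepted $s"
  fi
done
printf 'accepted:%s\n' "$accepted"
```

All three strings are accepted, including the unbalanced (() one: the pattern has no way to remember how many opening parentheses it has seen.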

As soon as you need to parse something that needs multiple states at once, or the number of states becomes infinite, regular expressions are the wrong tool.

Some background reading:


Posted in Delphi, Development, RegEx, Software Development