The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests

Space matching with sed is different from PCRE or other common regular expression parsers

Posted by jpluimers on 2021/07/14

On my research list: find out what causes the difference below between `[^: ]` and `[^:\s]`. The Windows batch and Linux versions behave the same; only the quoting around the echo argument differs.

Windows statements:

echo cs:2648@rep:MyRepository@repserver:ssl://plastic.example.org:8080 | sed -E -r -n "s/^cs:(.*?)@rep:(.*?)@repserver:([a-zA-Z][a-zA-Z+.-]*?):\/\/(\w[^: ]*?)(:\d*)?.*$/https:\/\/\4\/webui\/repos\/\2\/diff\/changeset\/\1/p"
echo cs:2648@rep:MyRepository@repserver:ssl://plastic.example.org:8080| sed -E -r -n "s/^cs:(.*?)@rep:(.*?)@repserver:([a-zA-Z][a-zA-Z+.-]*?):\/\/(\w[^: ]*?)(:\d*)?.*$/https:\/\/\4\/webui\/repos\/\2\/diff\/changeset\/\1/p"
echo cs:2648@rep:MyRepository@repserver:ssl://plastic.example.org | sed -E -r -n "s/^cs:(.*?)@rep:(.*?)@repserver:([a-zA-Z][a-zA-Z+.-]*?):\/\/(\w[^: ]*?)(:\d*)?.*$/https:\/\/\4\/webui\/repos\/\2\/diff\/changeset\/\1/p"
echo cs:2648@rep:MyRepository@repserver:ssl://plastic.example.org| sed -E -r -n "s/^cs:(.*?)@rep:(.*?)@repserver:([a-zA-Z][a-zA-Z+.-]*?):\/\/(\w[^: ]*?)(:\d*)?.*$/https:\/\/\4\/webui\/repos\/\2\/diff\/changeset\/\1/p"
echo failure with [:\s]*?
echo cs:2648@rep:MyRepository@repserver:ssl://plastic.example.org:8080 | sed -E -r -n "s/^cs:(.*?)@rep:(.*?)@repserver:([a-zA-Z][a-zA-Z+.-]*?):\/\/(\w[^:\s]*?)(:\d*)?.*$/https:\/\/\4\/webui\/repos\/\2\/diff\/changeset\/\1/p"
echo cs:2648@rep:MyRepository@repserver:ssl://plastic.example.org:8080| sed -E -r -n "s/^cs:(.*?)@rep:(.*?)@repserver:([a-zA-Z][a-zA-Z+.-]*?):\/\/(\w[^:\s]*?)(:\d*)?.*$/https:\/\/\4\/webui\/repos\/\2\/diff\/changeset\/\1/p"
echo cs:2648@rep:MyRepository@repserver:ssl://plastic.example.org | sed -E -r -n "s/^cs:(.*?)@rep:(.*?)@repserver:([a-zA-Z][a-zA-Z+.-]*?):\/\/(\w[^:\s]*?)(:\d*)?.*$/https:\/\/\4\/webui\/repos\/\2\/diff\/changeset\/\1/p"
echo cs:2648@rep:MyRepository@repserver:ssl://plastic.example.org| sed -E -r -n "s/^cs:(.*?)@rep:(.*?)@repserver:([a-zA-Z][a-zA-Z+.-]*?):\/\/(\w[^:\s]*?)(:\d*)?.*$/https:\/\/\4\/webui\/repos\/\2\/diff\/changeset\/\1/p"

Linux statements:

echo "cs:2648@rep:MyRepository@repserver:ssl://plastic.example.org:8080 "| sed -E -r -n "s/^cs:(.*?)@rep:(.*?)@repserver:([a-zA-Z][a-zA-Z+.-]*?):\/\/(\w[^: ]*?)(:\d*)?.*$/https:\/\/\4\/webui\/repos\/\2\/diff\/changeset\/\1/p"
echo "cs:2648@rep:MyRepository@repserver:ssl://plastic.example.org:8080"| sed -E -r -n "s/^cs:(.*?)@rep:(.*?)@repserver:([a-zA-Z][a-zA-Z+.-]*?):\/\/(\w[^: ]*?)(:\d*)?.*$/https:\/\/\4\/webui\/repos\/\2\/diff\/changeset\/\1/p"
echo "cs:2648@rep:MyRepository@repserver:ssl://plastic.example.org "| sed -E -r -n "s/^cs:(.*?)@rep:(.*?)@repserver:([a-zA-Z][a-zA-Z+.-]*?):\/\/(\w[^: ]*?)(:\d*)?.*$/https:\/\/\4\/webui\/repos\/\2\/diff\/changeset\/\1/p"
echo "cs:2648@rep:MyRepository@repserver:ssl://plastic.example.org"| sed -E -r -n "s/^cs:(.*?)@rep:(.*?)@repserver:([a-zA-Z][a-zA-Z+.-]*?):\/\/(\w[^: ]*?)(:\d*)?.*$/https:\/\/\4\/webui\/repos\/\2\/diff\/changeset\/\1/p"
echo failure with [:\s]*?
echo "cs:2648@rep:MyRepository@repserver:ssl://plastic.example.org:8080 "| sed -E -r -n "s/^cs:(.*?)@rep:(.*?)@repserver:([a-zA-Z][a-zA-Z+.-]*?):\/\/(\w[^:\s]*?)(:\d*)?.*$/https:\/\/\4\/webui\/repos\/\2\/diff\/changeset\/\1/p"
echo "cs:2648@rep:MyRepository@repserver:ssl://plastic.example.org:8080"| sed -E -r -n "s/^cs:(.*?)@rep:(.*?)@repserver:([a-zA-Z][a-zA-Z+.-]*?):\/\/(\w[^:\s]*?)(:\d*)?.*$/https:\/\/\4\/webui\/repos\/\2\/diff\/changeset\/\1/p"
echo "cs:2648@rep:MyRepository@repserver:ssl://plastic.example.org "| sed -E -r -n "s/^cs:(.*?)@rep:(.*?)@repserver:([a-zA-Z][a-zA-Z+.-]*?):\/\/(\w[^:\s]*?)(:\d*)?.*$/https:\/\/\4\/webui\/repos\/\2\/diff\/changeset\/\1/p"
echo "cs:2648@rep:MyRepository@repserver:ssl://plastic.example.org"| sed -E -r -n "s/^cs:(.*?)@rep:(.*?)@repserver:([a-zA-Z][a-zA-Z+.-]*?):\/\/(\w[^:\s]*?)(:\d*)?.*$/https:\/\/\4\/webui\/repos\/\2\/diff\/changeset\/\1/p"

Output:

echo cs:2648@rep:MyRepository@repserver:ssl://plastic.example.org:8080 | sed -E -r -n "s/^cs:(.*?)@rep:(.*?)@repserver:([a-zA-Z][a-zA-Z+.-]*?):\/\/(\w[^: ]*?)(:\d*)?.*$/https:\/\/\4\/webui\/repos\/\2\/diff\/changeset\/\1/p"
https://plastic.example.org/webui/repos/MyRepository/diff/changeset/2648

echo cs:2648@rep:MyRepository@repserver:ssl://plastic.example.org:8080| sed -E -r -n "s/^cs:(.*?)@rep:(.*?)@repserver:([a-zA-Z][a-zA-Z+.-]*?):\/\/(\w[^: ]*?)(:\d*)?.*$/https:\/\/\4\/webui\/repos\/\2\/diff\/changeset\/\1/p"
https://plastic.example.org/webui/repos/MyRepository/diff/changeset/2648

echo cs:2648@rep:MyRepository@repserver:ssl://plastic.example.org | sed -E -r -n "s/^cs:(.*?)@rep:(.*?)@repserver:([a-zA-Z][a-zA-Z+.-]*?):\/\/(\w[^: ]*?)(:\d*)?.*$/https:\/\/\4\/webui\/repos\/\2\/diff\/changeset\/\1/p"
https://plastic.example.org/webui/repos/MyRepository/diff/changeset/2648

echo cs:2648@rep:MyRepository@repserver:ssl://plastic.example.org| sed -E -r -n "s/^cs:(.*?)@rep:(.*?)@repserver:([a-zA-Z][a-zA-Z+.-]*?):\/\/(\w[^: ]*?)(:\d*)?.*$/https:\/\/\4\/webui\/repos\/\2\/diff\/changeset\/\1/p"
https://plastic.example.org/webui/repos/MyRepository/diff/changeset/2648

echo failure with [:\s]*?
failure with [:\s]*?

echo cs:2648@rep:MyRepository@repserver:ssl://plastic.example.org:8080 | sed -E -r -n "s/^cs:(.*?)@rep:(.*?)@repserver:([a-zA-Z][a-zA-Z+.-]*?):\/\/(\w[^:\s]*?)(:\d*)?.*$/https:\/\/\4\/webui\/repos\/\2\/diff\/changeset\/\1/p"
https://pla/webui/repos/MyRepository/diff/changeset/2648

echo cs:2648@rep:MyRepository@repserver:ssl://plastic.example.org:8080| sed -E -r -n "s/^cs:(.*?)@rep:(.*?)@repserver:([a-zA-Z][a-zA-Z+.-]*?):\/\/(\w[^:\s]*?)(:\d*)?.*$/https:\/\/\4\/webui\/repos\/\2\/diff\/changeset\/\1/p"
https://pla/webui/repos/MyRepository/diff/changeset/2648

echo cs:2648@rep:MyRepository@repserver:ssl://plastic.example.org | sed -E -r -n "s/^cs:(.*?)@rep:(.*?)@repserver:([a-zA-Z][a-zA-Z+.-]*?):\/\/(\w[^:\s]*?)(:\d*)?.*$/https:\/\/\4\/webui\/repos\/\2\/diff\/changeset\/\1/p"
https://pla/webui/repos/MyRepository/diff/changeset/2648

echo cs:2648@rep:MyRepository@repserver:ssl://plastic.example.org| sed -E -r -n "s/^cs:(.*?)@rep:(.*?)@repserver:([a-zA-Z][a-zA-Z+.-]*?):\/\/(\w[^:\s]*?)(:\d*)?.*$/https:\/\/\4\/webui\/repos\/\2\/diff\/changeset\/\1/p"
https://pla/webui/repos/MyRepository/diff/changeset/2648

–jeroen

One Response to “Space matching with sed is different from PCRE or other common regular expression parsers”

  1. Jürgen Krämer said

    Character classes like \s and \d don’t keep their meaning inside a collection, that is, a bracket expression such as [...]. (That’s at least the way sed and vim treat them.) Depending on the exact variant of regular expression engine, they are interpreted either as a backslash and a letter or as just the letter. That means [^\s] matches either all characters except backslash and lowercase s, or all characters except lowercase s.
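
Following the comment above, here is a minimal sketch of one possible workaround, assuming a sed that accepts `-E` (GNU and BSD sed both do). It swaps `\s` for the POSIX class `[:space:]` inside the bracket expression. The function names `host_backslash_s`, `host_posix_class` and `changeset_url` are hypothetical helpers made up for this illustration. The last one also replaces the `.*?` groups with `[^@]*`, on the assumption that the changeset and repository names never contain `@`. It is not necessarily the fix the author ended up with.

```shell
#!/bin/sh
# Sample input from the post, including the trailing space.
spec='cs:2648@rep:MyRepository@repserver:ssl://plastic.example.org:8080 '

# Hypothetical helper: \s inside the bracket expression, as in the failing commands.
# Here \s is not "whitespace", so the host is cut off at the first "s".
host_backslash_s() {
  sed -E -n 's/^.*:\/\/([^:\s]*).*$/\1/p'
}

# Hypothetical helper: the POSIX class [:space:] inside the bracket expression.
host_posix_class() {
  sed -E -n 's/^.*:\/\/([^:[:space:]]*).*$/\1/p'
}

# Hypothetical helper: the full rewrite from the post, using [^@]* instead of .*?
# (assumes names without "@") and [^:[:space:]]* for the host.
changeset_url() {
  sed -E -n 's/^cs:([^@]*)@rep:([^@]*)@repserver:[a-zA-Z][a-zA-Z+.-]*:\/\/([^:[:space:]]*).*$/https:\/\/\3\/webui\/repos\/\2\/diff\/changeset\/\1/p'
}

printf '%s\n' "$spec" | host_backslash_s   # truncated host, "pla" with the sed used in the post
printf '%s\n' "$spec" | host_posix_class   # plastic.example.org
printf '%s\n' "$spec" | changeset_url      # full https://... changeset URL
```

The single quotes around the sed scripts are for a POSIX shell; in a Windows batch file the double quotes from the statements above would be needed instead.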
