Space matching with sed is different from PCRE or other common regular expression parsers
Posted by jpluimers on 2021/07/14
On my research list: find out what causes the difference below (Windows batch and Linux behave the same; only the quoting around the echo argument
is different):
Windows statements:
echo cs:2648@rep:MyRepository@repserver:ssl://plastic.example.org:8080 | sed -E -r -n "s/^cs:(.*?)@rep:(.*?)@repserver:([a-zA-Z][a-zA-Z+.-]*?):\/\/(\w[^: ]*?)(:\d*)?.*$/https:\/\/\4\/webui\/repos\/\2\/diff\/changeset\/\1/p"
echo cs:2648@rep:MyRepository@repserver:ssl://plastic.example.org:8080| sed -E -r -n "s/^cs:(.*?)@rep:(.*?)@repserver:([a-zA-Z][a-zA-Z+.-]*?):\/\/(\w[^: ]*?)(:\d*)?.*$/https:\/\/\4\/webui\/repos\/\2\/diff\/changeset\/\1/p"
echo cs:2648@rep:MyRepository@repserver:ssl://plastic.example.org | sed -E -r -n "s/^cs:(.*?)@rep:(.*?)@repserver:([a-zA-Z][a-zA-Z+.-]*?):\/\/(\w[^: ]*?)(:\d*)?.*$/https:\/\/\4\/webui\/repos\/\2\/diff\/changeset\/\1/p"
echo cs:2648@rep:MyRepository@repserver:ssl://plastic.example.org| sed -E -r -n "s/^cs:(.*?)@rep:(.*?)@repserver:([a-zA-Z][a-zA-Z+.-]*?):\/\/(\w[^: ]*?)(:\d*)?.*$/https:\/\/\4\/webui\/repos\/\2\/diff\/changeset\/\1/p"
echo failure with [:\s]*?
echo cs:2648@rep:MyRepository@repserver:ssl://plastic.example.org:8080 | sed -E -r -n "s/^cs:(.*?)@rep:(.*?)@repserver:([a-zA-Z][a-zA-Z+.-]*?):\/\/(\w[^:\s]*?)(:\d*)?.*$/https:\/\/\4\/webui\/repos\/\2\/diff\/changeset\/\1/p"
echo cs:2648@rep:MyRepository@repserver:ssl://plastic.example.org:8080| sed -E -r -n "s/^cs:(.*?)@rep:(.*?)@repserver:([a-zA-Z][a-zA-Z+.-]*?):\/\/(\w[^:\s]*?)(:\d*)?.*$/https:\/\/\4\/webui\/repos\/\2\/diff\/changeset\/\1/p"
echo cs:2648@rep:MyRepository@repserver:ssl://plastic.example.org | sed -E -r -n "s/^cs:(.*?)@rep:(.*?)@repserver:([a-zA-Z][a-zA-Z+.-]*?):\/\/(\w[^:\s]*?)(:\d*)?.*$/https:\/\/\4\/webui\/repos\/\2\/diff\/changeset\/\1/p"
echo cs:2648@rep:MyRepository@repserver:ssl://plastic.example.org| sed -E -r -n "s/^cs:(.*?)@rep:(.*?)@repserver:([a-zA-Z][a-zA-Z+.-]*?):\/\/(\w[^:\s]*?)(:\d*)?.*$/https:\/\/\4\/webui\/repos\/\2\/diff\/changeset\/\1/p"
Linux statements:
echo "cs:2648@rep:MyRepository@repserver:ssl://plastic.example.org:8080 "| sed -E -r -n "s/^cs:(.*?)@rep:(.*?)@repserver:([a-zA-Z][a-zA-Z+.-]*?):\/\/(\w[^: ]*?)(:\d*)?.*$/https:\/\/\4\/webui\/repos\/\2\/diff\/changeset\/\1/p"
echo "cs:2648@rep:MyRepository@repserver:ssl://plastic.example.org:8080"| sed -E -r -n "s/^cs:(.*?)@rep:(.*?)@repserver:([a-zA-Z][a-zA-Z+.-]*?):\/\/(\w[^: ]*?)(:\d*)?.*$/https:\/\/\4\/webui\/repos\/\2\/diff\/changeset\/\1/p"
echo "cs:2648@rep:MyRepository@repserver:ssl://plastic.example.org "| sed -E -r -n "s/^cs:(.*?)@rep:(.*?)@repserver:([a-zA-Z][a-zA-Z+.-]*?):\/\/(\w[^: ]*?)(:\d*)?.*$/https:\/\/\4\/webui\/repos\/\2\/diff\/changeset\/\1/p"
echo "cs:2648@rep:MyRepository@repserver:ssl://plastic.example.org"| sed -E -r -n "s/^cs:(.*?)@rep:(.*?)@repserver:([a-zA-Z][a-zA-Z+.-]*?):\/\/(\w[^: ]*?)(:\d*)?.*$/https:\/\/\4\/webui\/repos\/\2\/diff\/changeset\/\1/p"
echo failure with [:\s]*?
echo "cs:2648@rep:MyRepository@repserver:ssl://plastic.example.org:8080 "| sed -E -r -n "s/^cs:(.*?)@rep:(.*?)@repserver:([a-zA-Z][a-zA-Z+.-]*?):\/\/(\w[^:\s]*?)(:\d*)?.*$/https:\/\/\4\/webui\/repos\/\2\/diff\/changeset\/\1/p"
echo "cs:2648@rep:MyRepository@repserver:ssl://plastic.example.org:8080"| sed -E -r -n "s/^cs:(.*?)@rep:(.*?)@repserver:([a-zA-Z][a-zA-Z+.-]*?):\/\/(\w[^:\s]*?)(:\d*)?.*$/https:\/\/\4\/webui\/repos\/\2\/diff\/changeset\/\1/p"
echo "cs:2648@rep:MyRepository@repserver:ssl://plastic.example.org "| sed -E -r -n "s/^cs:(.*?)@rep:(.*?)@repserver:([a-zA-Z][a-zA-Z+.-]*?):\/\/(\w[^:\s]*?)(:\d*)?.*$/https:\/\/\4\/webui\/repos\/\2\/diff\/changeset\/\1/p"
echo "cs:2648@rep:MyRepository@repserver:ssl://plastic.example.org"| sed -E -r -n "s/^cs:(.*?)@rep:(.*?)@repserver:([a-zA-Z][a-zA-Z+.-]*?):\/\/(\w[^:\s]*?)(:\d*)?.*$/https:\/\/\4\/webui\/repos\/\2\/diff\/changeset\/\1/p"
Output:
echo cs:2648@rep:MyRepository@repserver:ssl://plastic.example.org:8080 | sed -E -r -n "s/^cs:(.*?)@rep:(.*?)@repserver:([a-zA-Z][a-zA-Z+.-]*?):\/\/(\w[^: ]*?)(:\d*)?.*$/https:\/\/\4\/webui\/repos\/\2\/diff\/changeset\/\1/p"
https://plastic.example.org/webui/repos/MyRepository/diff/changeset/2648
echo cs:2648@rep:MyRepository@repserver:ssl://plastic.example.org:8080| sed -E -r -n "s/^cs:(.*?)@rep:(.*?)@repserver:([a-zA-Z][a-zA-Z+.-]*?):\/\/(\w[^: ]*?)(:\d*)?.*$/https:\/\/\4\/webui\/repos\/\2\/diff\/changeset\/\1/p"
https://plastic.example.org/webui/repos/MyRepository/diff/changeset/2648
echo cs:2648@rep:MyRepository@repserver:ssl://plastic.example.org | sed -E -r -n "s/^cs:(.*?)@rep:(.*?)@repserver:([a-zA-Z][a-zA-Z+.-]*?):\/\/(\w[^: ]*?)(:\d*)?.*$/https:\/\/\4\/webui\/repos\/\2\/diff\/changeset\/\1/p"
https://plastic.example.org/webui/repos/MyRepository/diff/changeset/2648
echo cs:2648@rep:MyRepository@repserver:ssl://plastic.example.org| sed -E -r -n "s/^cs:(.*?)@rep:(.*?)@repserver:([a-zA-Z][a-zA-Z+.-]*?):\/\/(\w[^: ]*?)(:\d*)?.*$/https:\/\/\4\/webui\/repos\/\2\/diff\/changeset\/\1/p"
https://plastic.example.org/webui/repos/MyRepository/diff/changeset/2648
echo failure with [:\s]*?
failure with [:\s]*?
echo cs:2648@rep:MyRepository@repserver:ssl://plastic.example.org:8080 | sed -E -r -n "s/^cs:(.*?)@rep:(.*?)@repserver:([a-zA-Z][a-zA-Z+.-]*?):\/\/(\w[^:\s]*?)(:\d*)?.*$/https:\/\/\4\/webui\/repos\/\2\/diff\/changeset\/\1/p"
https://pla/webui/repos/MyRepository/diff/changeset/2648
echo cs:2648@rep:MyRepository@repserver:ssl://plastic.example.org:8080| sed -E -r -n "s/^cs:(.*?)@rep:(.*?)@repserver:([a-zA-Z][a-zA-Z+.-]*?):\/\/(\w[^:\s]*?)(:\d*)?.*$/https:\/\/\4\/webui\/repos\/\2\/diff\/changeset\/\1/p"
https://pla/webui/repos/MyRepository/diff/changeset/2648
echo cs:2648@rep:MyRepository@repserver:ssl://plastic.example.org | sed -E -r -n "s/^cs:(.*?)@rep:(.*?)@repserver:([a-zA-Z][a-zA-Z+.-]*?):\/\/(\w[^:\s]*?)(:\d*)?.*$/https:\/\/\4\/webui\/repos\/\2\/diff\/changeset\/\1/p"
https://pla/webui/repos/MyRepository/diff/changeset/2648
echo cs:2648@rep:MyRepository@repserver:ssl://plastic.example.org| sed -E -r -n "s/^cs:(.*?)@rep:(.*?)@repserver:([a-zA-Z][a-zA-Z+.-]*?):\/\/(\w[^:\s]*?)(:\d*)?.*$/https:\/\/\4\/webui\/repos\/\2\/diff\/changeset\/\1/p"
https://pla/webui/repos/MyRepository/diff/changeset/2648
Related:
- PCRE example: regex101.com/r/6XYaSe/8
- [WayBack] Reinserting Text Matched By Capturing Groups in The Replacement Text
- [WayBack] javascript – Non-capturing group ignored by regex101.com – Stack Overflow
- sed regex101 – Google Search
–jeroen
Jürgen Krämer said
Character classes like \s and \d don’t keep their meaning inside a collection. (That’s at least the way sed and vim treat them.) Depending on the exact variant of the regular expression engine, they are interpreted either as a backslash and a letter or as just the letter. That means [^\s] matches either all characters except backslash and lowercase s, or all characters except lowercase s.
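A minimal sketch of the point above, assuming a sed that accepts -E (GNU sed or a BSD sed). The first two commands use a simplified pattern on just the host part: with [^:\s] the capture stops at the first s of "plastic", which matches the "pla" seen in the output of the post, while the POSIX class [[:space:]] excludes whitespace inside the brackets as intended. The third command is one possible rewrite of the full expression, not the author's: it swaps \w for [[:alnum:]_], \d for [0-9], and uses plain * where the post had *?, on the assumption that POSIX extended regular expressions have no lazy quantifiers. The variable names (input, broken, fixed, url) are made up for illustration.

```shell
#!/bin/sh
# Sample host part with a trailing space, as in the first test case of the post.
input='plastic.example.org:8080 '

# Inside a bracket expression \s is not "whitespace": the class excludes
# ':' and the letter 's' (and possibly the backslash), so the capture
# ends right before the first 's'.
broken=$(printf '%s\n' "$input" | sed -E 's/^([^:\s]*).*$/\1/')

# [[:space:]] is the POSIX way to name whitespace inside brackets.
fixed=$(printf '%s\n' "$input" | sed -E 's/^([^:[:space:]]*).*$/\1/')

# Full expression rewritten with POSIX classes only; '|' is used as the
# s-command delimiter so the slashes need no escaping.
url=$(printf '%s\n' 'cs:2648@rep:MyRepository@repserver:ssl://plastic.example.org:8080 ' |
  sed -E -n 's|^cs:(.*)@rep:(.*)@repserver:([a-zA-Z][a-zA-Z+.-]*)://([[:alnum:]_][^:[:space:]]*)(:[0-9]*)?.*$|https://\4/webui/repos/\2/diff/changeset/\1|p')

printf 'broken: %s\n' "$broken"
printf 'fixed:  %s\n' "$fixed"
printf 'url:    %s\n' "$url"
```

The original [^: ] variant works for the same reason the [[:space:]] variant does: a literal space inside brackets is just a character, so no escape sequence is involved.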