aha (Ansi HTML Adapter) with clickable URIs
Posted by jpluimers on 2018/10/02
aha is great for generating HTML from ANSI text (i.e. the coloured output on a Linux console).
But it doesn’t generate clickable URIs (it can’t do that by itself yet, as it only looks one character ahead).
The thread at https://github.com/theZiz/aha/issues/20 suggested a case-insensitive regex through sed but the exact suggestion failed for a few reasons I will explain below.
First the bash function (requires both aha and perl):
#!/usr/bin/env bash
# based on https://github.com/theZiz/aha/issues/20#event-797466520
aha-with-expanded-http-https-urls()
{
  aha | perl -C -Mutf8 -pe 's,([^"])((https?|s?ftp|ftps?|file)://.*?)([\s]|\&quot;\s),$1<a href="$2">$2</a>$4,gi'
}
The above script is a gist as WordPress regularly fucks up text that remotely resembles html.
The drawbacks of the original solution (sed replacement before running aha):
- aha would replace the generated < and > characters in the anchor element with &lt; and &gt; so the regular expression would not work
- after moving aha in front of sed I found out that on Mac OS X, the I option is not supported: you will get a bad flag in substitute command: 'I' when executing sed 's,\(https\?://[^ ]*\),<a href="\1">\1</a>,gI'
- after an initial port of the regular expression replacement to perl I found out it replaced too much (as it now operated on aha generated html) which made even perl -C -Mutf8 -pe 's,([^"])((https?|s?ftp|ftps?|file)://[^\s]*),$1<a href="$2">$2</a>,gi' fail
To cut a long story short, here is a bash function that works and you can pipe ANSI output through:
aha-with-expanded-http-https-urls()
{
  aha | perl -C -Mutf8 -pe 's,([^"])((https?|s?ftp|ftps?|file)://.*?)([\s]|\&quot;\s),$1<a href="$2">$2</a>$4,gi'
}
It doesn’t take into account RFC URI checking by regex as that’s way too convoluted. If anyone wants that, adapt it according to the answers at http://stackoverflow.com/questions/161738/what-is-the-best-regular-expression-to-check-if-a-string-is-a-valid-url
The biggest problem was to ensure it would skip the &quot; terminating a URI at the end of the line. This occurs in the testssl.sh output upon a 302 redirect. So the solution is somewhat tailored to testssl.sh output piped through aha.
A lot of digging finally resulted in the expression at https://regex101.com/r/zF3zQ2/2 (shown in full below). Note that the site forgets about the , as search separator, but that’s OK: you can use the drop-down to choose another one, or paste this full expression and it will happily use the , separator:
s,([^"])((https?|s?ftp|ftps?|file)://.*?)([\s]|\&quot;\s),$1<a href="$2">$2</a>$4,gi
Getting there, one of the things I tried was negative lookahead but that failed. I tried following the example at for instance http://stackoverflow.com/questions/11028336/regex-to-match-a-pattern-and-exclude-list-of-string
So in the above solution, I went for a non-greedy .*? expression followed by matching either whitespace or the &quot; followed by whitespace.
These are the separator, search and modifier part of the above expression:
,([^"])((https?|s?ftp|ftps?|file)://.*?)([\s]|\&quot;\s),gi
Note the 2nd capturing group cannot do without the 3rd in order to match multiple protocols.
This is how it’s assembled:
- 1st Capturing group ([^"])
  - [^"] match a single character not present in the list below
    - " a single character in the list " literally (case insensitive)
- 2nd Capturing group ((https?|s?ftp|ftps?|file)://.*?)
  - 3rd Capturing group (https?|s?ftp|ftps?|file)
    - 1st Alternative: https?
      - http matches the characters http literally (case insensitive)
      - s? matches the character s literally (case insensitive)
        - Quantifier: ? Between zero and one time, as many times as possible, giving back as needed [greedy]
    - 2nd Alternative: s?ftp
      - s? matches the character s literally (case insensitive)
        - Quantifier: ? Between zero and one time, as many times as possible, giving back as needed [greedy]
      - ftp matches the characters ftp literally (case insensitive)
    - 3rd Alternative: ftps?
      - ftp matches the characters ftp literally (case insensitive)
      - s? matches the character s literally (case insensitive)
        - Quantifier: ? Between zero and one time, as many times as possible, giving back as needed [greedy]
    - 4th Alternative: file
      - file matches the characters file literally (case insensitive)
  - :// matches the characters :// literally
  - .*? matches any character (except newline)
    - Quantifier: *? Between zero and unlimited times, as few times as possible, expanding as needed [lazy]
- 4th Capturing group ([\s]|\&quot;\s)
  - 1st Alternative: [\s]
    - [\s] match a single character present in the list below
      - \s match any white space character [\r\n\t\f ]
  - 2nd Alternative: \&quot;\s
    - \& matches the character & literally
    - quot; matches the characters quot; literally (case insensitive)
    - \s match any white space character [\r\n\t\f ]
- g modifier: global. All matches (don’t return on first match)
- i modifier: insensitive. Case insensitive match (ignores case of [a-zA-Z])
For replacement it’s important to ensure all unique capturing groups end up in the output. Which means you can skip $3 (as it’s part of $2) but have to include the others.
Which gets me to the replacement part of the expression:
$1<a href="$2">$2</a>$4
Test input:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!-- This file was created with the aha Ansi HTML Adapter. http://ziz.delphigl.com/tool_aha.php -->
<html xmlns="http://www.w3.org/1999/xhtml">
testssl.sh 2.7dev from https://testssl.sh/dev/
<span style="font-weight:bold;"> OCSP URI </span>http://clients1.google.com/ocsp
<span style="font-weight:bold;"> HTTP Status Code </span> 302 Found, redirecting to &quot;https://www.google.nl/?gfe_rd=cr&amp;ei=ZWjmV86hE5LH8AeFmaP4Bg&quot;
Test output:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!-- This file was created with the aha Ansi HTML Adapter. <a href="http://ziz.delphigl.com/tool_aha.php">http://ziz.delphigl.com/tool_aha.php</a> -->
<html xmlns="http://www.w3.org/1999/xhtml">
testssl.sh 2.7dev from <a href="https://testssl.sh/dev/">https://testssl.sh/dev/</a>
<span style="font-weight:bold;"> OCSP URI </span><a href="http://clients1.google.com/ocsp">http://clients1.google.com/ocsp</a>
<span style="font-weight:bold;"> HTTP Status Code </span> 302 Found, redirecting to &quot;<a href="https://www.google.nl/?gfe_rd=cr&amp;ei=ZWjmV86hE5LH8AeFmaP4Bg">https://www.google.nl/?gfe_rd=cr&amp;ei=ZWjmV86hE5LH8AeFmaP4Bg</a>&quot;
Test matches:
MATCH 1
1. [168-169] ` `
2. [169-205] `http://ziz.delphigl.com/tool_aha.php`
3. [169-173] `http`
4. [205-206] ` `
MATCH 2
1. [286-287] ` `
2. [287-310] `https://testssl.sh/dev/`
3. [287-292] `https`
4. [310-311] `
`
MATCH 3
1. [379-380] `>`
2. [380-411] `http://clients1.google.com/ocsp`
3. [380-384] `http`
4. [411-412] `
`
MATCH 4
1. [512-513] `;`
2. [513-575] `https://www.google.nl/?gfe_rd=cr&amp;ei=ZWjmV86hE5LH8AeFmaP4Bg`
3. [513-518] `https`
4. [575-582] `&quot;
`
Enjoy!
–jeroen
via:
- running testssh.sh through aha while expanding http/https URI entries
- It would be nice if `aha` could render URLs as `a href` · Issue #20 · theZiz/aha





