aha (Ansi HTML Adapter) with clickable URIs
Posted by jpluimers on 2018/10/02
aha is great to generate HTML from ANSI text (i.e. the coloured output on a Linux console).
But it doesn’t generate clickable URIs (it can’t yet by itself as it only looks one character ahead).
The thread at https://github.com/theZiz/aha/issues/20 suggested a case-insensitive regex through sed but the exact suggestion failed for a few reasons I will explain below.
First the bash function (requires both aha and perl):
#!/usr/bin/env bash
# based on https://github.com/theZiz/aha/issues/20#event-797466520
aha-with-expanded-http-https-urls()
{
  aha | perl -C -Mutf8 -pe 's,([^"])((https?|s?ftp|ftps?|file)://.*?)([\s]|\&quot;\s),$1<a href="$2">$2</a>$4,gi'
}
The above script is a gist as WordPress regularly fucks up text that remotely resembles html.
The drawbacks of the original solution (sed replacement before running aha):
- aha would replace the generated < and > characters in the anchor element with &lt; and &gt; so the regular expression would not work
- after moving aha in front of sed I found out that on Mac OS X, the I option is not supported: you will get a bad flag in substitute command: 'I' when executing sed 's,\(https\?://[^ ]*\),<a href="\1">\1</a>,gI'
- after an initial port of the regular expression replacement to perl I found out it replaced too much (as it now operated on aha generated html) which made even perl -C -Mutf8 -pe 's,([^"])((https?|s?ftp|ftps?|file)://[^\s]*),$1<a href="$2">$2</a>,gi' fail
To cut a long story short, here is a bash function that works and you can pipe Ansi output through:
aha-with-expanded-http-https-urls()
{
  aha | perl -C -Mutf8 -pe 's,([^"])((https?|s?ftp|ftps?|file)://.*?)([\s]|\&quot;\s),$1<a href="$2">$2</a>$4,gi'
}
It doesn’t take into account RFC URI checking by regex as that’s way too convoluted. If anyone wants that, adapt it according to the answers at http://stackoverflow.com/questions/161738/what-is-the-best-regular-expression-to-check-if-a-string-is-a-valid-url
The biggest problem was to ensure it would skip the &quot; terminating a URI at the end of the line. This can occur in the testssl.sh output upon a 302-redirect. So the solution is somewhat tailored to testssl.sh output piped through aha.
A lot of digging finally resulted in this expression at https://regex101.com/r/zF3zQ2/2. Note that site forgets about the , as search separators, but that’s OK: you can use the drop-down to choose another one or paste this full expression and it will happily use the , separator:
s,([^"])((https?|s?ftp|ftps?|file)://.*?)([\s]|\&quot;\s),$1<a href="$2">$2</a>$4,gi
Getting there, one of the things I tried was negative lookahead but that failed. I tried following the example at for instance http://stackoverflow.com/questions/11028336/regex-to-match-a-pattern-and-exclude-list-of-string
So in the above solution, I went for a non-greedy .*? expression followed by matching either whitespace or the &quot; followed by whitespace.
These are the separator, search and modifier parts of the above expression:
,([^"])((https?|s?ftp|ftps?|file)://.*?)([\s]|\&quot;\s),gi
Note the 2nd capturing group cannot do without the 3rd in order to match multiple protocols.
This is how it’s assembled:
- 1st Capturing group ([^"])
  - [^"] match a single character not present in the list below
    - " a single character in the list " literally (case insensitive)
- 2nd Capturing group ((https?|s?ftp|ftps?|file)://.*?)
  - 3rd Capturing group (https?|s?ftp|ftps?|file)
    - 1st Alternative: https?
      - http matches the characters http literally (case insensitive)
      - s? matches the character s literally (case insensitive)
        - Quantifier: ? Between zero and one time, as many times as possible, giving back as needed [greedy]
    - 2nd Alternative: s?ftp
      - s? matches the character s literally (case insensitive)
        - Quantifier: ? Between zero and one time, as many times as possible, giving back as needed [greedy]
      - ftp matches the characters ftp literally (case insensitive)
    - 3rd Alternative: ftps?
      - ftp matches the characters ftp literally (case insensitive)
      - s? matches the character s literally (case insensitive)
        - Quantifier: ? Between zero and one time, as many times as possible, giving back as needed [greedy]
    - 4th Alternative: file
      - file matches the characters file literally (case insensitive)
  - :// matches the characters :// literally
  - .*? matches any character (except newline)
    - Quantifier: *? Between zero and unlimited times, as few times as possible, expanding as needed [lazy]
- 4th Capturing group ([\s]|\&quot;\s)
  - 1st Alternative: [\s]
    - [\s] match a single character present in the list below
      - \s match any white space character [\r\n\t\f ]
  - 2nd Alternative: \&quot;\s
    - \& matches the character & literally
    - quot; matches the characters quot; literally (case insensitive)
    - \s match any white space character [\r\n\t\f ]
- g modifier: global. All matches (don’t return on first match)
- i modifier: insensitive. Case insensitive match (ignores case of [a-zA-Z])
For replacement it’s important to ensure all unique capturing groups end up in the output. Which means you can skip $3 (as it’s part of $2) but have to include the others.
Which gets me to the replacement part of the expression:
$1<a href="$2">$2</a>$4
Test input:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!-- This file was created with the aha Ansi HTML Adapter. http://ziz.delphigl.com/tool_aha.php -->
<html xmlns="http://www.w3.org/1999/xhtml">
testssl.sh 2.7dev from https://testssl.sh/dev/
<span style="font-weight:bold;"> OCSP URI </span>http://clients1.google.com/ocsp
<span style="font-weight:bold;"> HTTP Status Code </span> 302 Found, redirecting to &quot;https://www.google.nl/?gfe_rd=cr&amp;ei=ZWjmV86hE5LH8AeFmaP4Bg&quot;
Test output:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!-- This file was created with the aha Ansi HTML Adapter. <a href="http://ziz.delphigl.com/tool_aha.php">http://ziz.delphigl.com/tool_aha.php</a> -->
<html xmlns="http://www.w3.org/1999/xhtml">
testssl.sh 2.7dev from <a href="https://testssl.sh/dev/">https://testssl.sh/dev/</a>
<span style="font-weight:bold;"> OCSP URI </span><a href="http://clients1.google.com/ocsp">http://clients1.google.com/ocsp</a>
<span style="font-weight:bold;"> HTTP Status Code </span> 302 Found, redirecting to &quot;<a href="https://www.google.nl/?gfe_rd=cr&amp;ei=ZWjmV86hE5LH8AeFmaP4Bg">https://www.google.nl/?gfe_rd=cr&amp;ei=ZWjmV86hE5LH8AeFmaP4Bg</a>&quot;
Test matches:
MATCH 1
1. [168-169] ` `
2. [169-205] `http://ziz.delphigl.com/tool_aha.php`
3. [169-173] `http`
4. [205-206] ` `
MATCH 2
1. [286-287] ` `
2. [287-310] `https://testssl.sh/dev/`
3. [287-292] `https`
4. [310-311] `
`
MATCH 3
1. [379-380] `>`
2. [380-411] `http://clients1.google.com/ocsp`
3. [380-384] `http`
4. [411-412] `
`
MATCH 4
1. [512-513] `;`
2. [513-575] `https://www.google.nl/?gfe_rd=cr&amp;ei=ZWjmV86hE5LH8AeFmaP4Bg`
3. [513-518] `https`
4. [575-582] `&quot;
`
Enjoy!
–jeroen
via:
- running testssh.sh through aha while expanding http/https URI entries
- It would be nice if `aha` could render URLs as `a href` · Issue #20 · theZiz/aha