Archive for the ‘Software Development’ Category
Posted by jpluimers on 2022/02/08
As a precursor to tomorrow's post showing that serving UTF-8 does not mean organisations are free of Unicode problems, first some statistics.
The first Unicode ideas were drafted some 35 years ago, in 1987. In 1991, more than 30 years ago, the Unicode Consortium saw the light. Nowadays more than 95% of web pages (close to 100% when you include plain ASCII) are served using the UTF-8 encoding.
It means that nowadays there is a very small chance you will see mangled characters (what Japanese call mojibake) when you’re surfing the web.
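For readers who have never seen mojibake: a minimal Python sketch of how it arises, assuming the classic case of UTF-8 bytes being decoded as Latin-1. The function name `mojibake` is made up for this illustration.

```python
def mojibake(text, wrong_encoding="latin-1"):
    """Encode text as UTF-8, then decode those bytes with the wrong encoding."""
    return text.encode("utf-8").decode(wrong_encoding)


# The two UTF-8 bytes of "é" (0xC3 0xA9) turn into two separate characters.
print(mojibake("café"))  # cafÃ©
```

Reversing the two steps (encode as Latin-1, decode as UTF-8) restores the original text, which is why this kind of damage is often repairable.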
Some nice graphs of Unicode growth are at these locations:
I think especially important are 2008 (when UTF-8 had outgrown all other individual encodings) and slightly after 2010, when UTF-8 alone covered more than 50% of the pages served. These exclude ASCII-only pages. Adding those would make the figures even larger.




Historical yearly trends in the usage statistics of character encodings for websites, June 2021
–jeroen
Posted in Development, Encoding, Software Development, UTF-8, UTF8, Web Development
Posted by jpluimers on 2022/02/03
I needed to search for IBAN numbers in documents and used this regular expression: [a-zA-Z]{2}[0-9]{2} ?[a-zA-Z0-9]{4} ?[0-9]{4} ?[0-9]{4} ?[0-9]{2} which supports the usual optional whitespace like in NL12 INGB 0345 6789 01.
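A quick way to try the expression outside Notepad++ is a small Python sketch like the one below. The pattern is copied unchanged; the helper name `find_ibans` and the sample text are made up for illustration. Note that the pattern only describes the 18-character layout of the example and does not verify the check digits.

```python
import re

# The pattern from the post, unchanged.
IBAN_PATTERN = re.compile(
    r"[a-zA-Z]{2}[0-9]{2} ?[a-zA-Z0-9]{4} ?[0-9]{4} ?[0-9]{4} ?[0-9]{2}"
)


def find_ibans(text):
    """Return every substring of text that matches the pattern."""
    return IBAN_PATTERN.findall(text)


sample = "Pay to NL12 INGB 0345 6789 01 or NL12INGB0345678901 before Friday."
print(find_ibans(sample))
# ['NL12 INGB 0345 6789 01', 'NL12INGB0345678901']
```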
The regular expression is based on the nice table of Notepad++ RegEx character classes at [Wayback] Searching | Notepad++ User Manual:
Character Classes
[set] ⇒ This indicates a set of characters, for example, [abc] means any of the literal characters a, b or c. You can also use ranges by doing a hyphen between characters, for example [a-z] for any character from a to z. You can use a collating sequence in character ranges, like in [[.ch.]-[.ll.]] (these are collating sequence in Spanish).
[^set] ⇒ The complement of the characters in the set. For example, [^A-Za-z] means any character except an alphabetic character. Care should be taken with a complement list, as regular expressions are always multi-line, and hence [^ABC]* will match until the first A, B or C (or a, b or c if match case is off), including any newline characters. To confine the search to a single line, include the newline characters in the exception list, e.g. [^ABC\r\n].
Please note that the complement of a character set is often many more characters than you expect: (?-s)[^x]+ will match 1 or more instances of any non-x character, including newlines: the (?-s) search modifier turns off “dot matches newlines”, but the [^x] is not a dot ., so that class is still allowed to match newlines.
[[:name:]] or [[:☒:]] ⇒ The whole character class named name. For many, there is also a single-letter “short” class name, ☒. Please note: the [:name:] and [:☒:] must be inside a character class [...] to have their special meaning.
| short | full name | description | equivalent character class |
| | alnum | letters and digits | |
| | alpha | letters | |
| h | blank | spacing which is not a line terminator | [\t\x20\xA0] |
| | cntrl | control characters | [\x00-\x1F\x7F\x81\x8D\x8F\x90\x9D] |
| d | digit | digits | |
| | graph | graphical character, so essentially any character except for control chars, \0x7F, \x80 | |
| l | lower | lowercase letters | |
| | print | printable characters | [\s[:graph:]] |
| | punct | punctuation characters | [!"#$%&'()*+,\-./:;<=>?@\[\\\]^_{ |
| s | space | whitespace (word or line separator) | [\t\n\x0B\f\r\x20\x85\xA0\x{2028}\x{2029}] |
| u | upper | uppercase letters | |
| | unicode | any character with code point above 255 | [\x{0100}-\x{FFFF}] |
| w | word | word characters | [_\d\l\u] |
| | xdigit | hexadecimal digits | [0-9A-Fa-f] |
Note that letters include any unicode letters (ASCII letters, accented letters, and letters from a variety of other writing systems); digits include ASCII numeric digits, and anything else in Unicode that’s classified as a digit (like superscript numbers ¹²³…).
Note that those character class names may be written in upper or lower case without changing the results. So [[:alnum:]] is the same as [[:ALNUM:]] or the mixed-case [[:AlNuM:]].
As stated earlier, the [:name:] and [:☒:] (note the single brackets) must be a part of a surrounding character class. However, you may combine them inside one character class, such as [_[:d:]x[:upper:]=], which is a character class that would match any digit, any uppercase, the lowercase x, and the literal _ and = characters. These named classes won’t always appear with the double brackets, but they will always be inside of a character class.
If the [:name:] or [:☒:] are accidentally not contained inside a surrounding character class, they will lose their special meaning. For example, [:upper:] is the character class matching :, u, p, e, and r; whereas [[:upper:]] is similar to [A-Z] (plus other unicode uppercase letters)
[^[:name:]] or [^[:☒:]] ⇒ The complement of character class named name or ☒ (matching anything not in that named class). This uses the same long names, short names, and rules as mentioned in the previous description.
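The warning about complement sets matching newlines is easy to check. Below is a minimal sketch using Python's `re` module rather than Notepad++'s own regex engine, on the assumption that negated classes treat newlines the same way in both; the function names and sample text are made up for illustration.

```python
import re

text = "xyz\nsecond line A rest"


def match_negated(subject):
    """[^ABC]* keeps matching across the line break until the first A, B or C."""
    return re.match(r"[^ABC]*", subject).group(0)


def match_negated_single_line(subject):
    """Adding \\r and \\n to the exception list confines the match to one line."""
    return re.match(r"[^ABC\r\n]*", subject).group(0)


print(repr(match_negated(text)))              # 'xyz\nsecond line '
print(repr(match_negated_single_line(text)))  # 'xyz'
```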
–jeroen
Posted in Development, Notepad++, Power User, RegEx, Software Development, Text Editors
Posted by jpluimers on 2022/02/02
Interesting project at [Wayback] Open Source Insights
Open Source Insights is an experimental project by Google.
Hopefully by now it is supporting more than just npm/golang/maven and by the time it sunsets, other projects take over.
The introduction was some 9 months ago: [Wayback] Introducing the Open Source Insights Project | Google Open Source Blog
–jeroen
Posted in Development, Go (golang), JavaScript/ECMAScript, Node.js, Power User, Scripting, Security, Software Development
Posted by jpluimers on 2022/02/02
TL;DR:
- Windows has CON: which is an equivalent for /dev/tty
- Windows has no equivalent for /dev/stdout (the standard output stream)
- There is a C# PipeServer.cs proof-of-concept that simulates /dev/stdout through a temporary named pipe
- Windows pipe names start with \\.\pipe\ for names on the local machine
- The above for /dev/stdout on Windows also holds for /dev/stdin (the standard input stream)
All via [Wayback] pipe – Windows how to redirect file parameter to stdout? (Windows equivalent of /dev/stdout) – Super User.
Posted in .NET, C#, Development, Software Development, Windows Development
Posted by jpluimers on 2022/02/01
[Wayback] Jeroen Wiert Pluimers on Twitter: “"Too special" password character password woes at @HORNBACH_NL : [ The password must be at least eight characters long and contain at least one number and one letter (a-zA-Z). The following special characters are allowed: !"#$%&'()*+,.:;?@_|} ] 1/”
I wonder what kind of parser they use, as these printable special ASCII characters are forbidden:
- \-/[\]^`{~
- space (0x20)
- tab (0x9)
- line feed (0xa)
- carriage return (0xd)
- vertical tab (0xb)
- form feed (0xc)
Seems no JSON or SQL to me: there I would expect other limitations.
What would break if you use them in other fields or pass them in an HTTP POST request?
I mean: these passwords should be salted and hashed immediately when the HTTP POST request is received, so certainly they would not be stored somewhere or passed many layers into code, right?
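To illustrate why salting and hashing makes the character restriction pointless, here is a minimal Python sketch using PBKDF2 from the standard library. It is not how Hornbach does it (that is unknown to me); the function names and the iteration count are assumptions for the example.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor, not a recommendation


def hash_password(password, salt=None):
    """Return (salt, digest) for the password, creating a random salt if needed."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password, salt, expected):
    """Re-hash with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


# The "forbidden" characters are just bytes to the hash function.
salt, stored = hash_password("pw\\-/[]^`{~ \t\n")
print(verify_password("pw\\-/[]^`{~ \t\n", salt, stored))  # True
```

Only the salt and the digest would need to be stored, so no later layer ever sees the original characters.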
Oh, in order to activate an account there, you need to accept some 40+ A4-sized pages of legal stuff. It would take a brave Dutch judge to rule all of that in favour of Hornbach.
–jeroen
Posted in Development, LifeHacker, Power User, Security, Software Development, Web Development
Posted by jpluimers on 2022/02/01
Sometimes it is easier to have current and public CA signed TLS certificates for internal servers than to set up and maintain an internal CA and register it on all affected browsers (including mobile phones).
One of my reasons to investigate this is that Chrome refuses to save credentials on servers that have no verifiable TLS certificate, see my post Some links on Chrome not prompting to save passwords (when Firefox and Safari do) about a week ago.
Below are some links for my link archive that hopefully will allow me to do this with Let’s Encrypt (most via [Wayback/Archive] letsencrypt for internal servers – Google Search):
Posted in Cloud, Cloudflare, Development, Encryption, ESXi6, ESXi6.5, ESXi6.7, ESXi7, Fritz!, Fritz!Box, Fritz!WLAN, Infrastructure, Internet, Let's Encrypt (letsencrypt/certbot), Power User, Security, Software Development, Virtualization, VMware, VMware ESXi, Web Development
Posted by jpluimers on 2022/01/31
Small cd-to-file.bat tip:
pushd %~dp1
–jeroen
Posted in Batch-Files, Power User, Scripting, Software Development, Windows
Posted by jpluimers on 2022/01/27
First the script that displays messages for all virtual machines, vim-cmd-display-messages-for-all-VMs.sh:
#!/bin/sh
# Collect the numeric ids of all registered VMs (first column of getallvms).
vmids=`vim-cmd vmsvc/getallvms | sed -n -E -e "s/^([[:digit:]]+)\s+((\S.+\S)?)\s+(\[\S+\])\s+(.+\.vmx)\s+(\S+)\s+(vmx-[[:digit:]]+)\s*?((\S.+)?)$/\1/p"`
for vmid in ${vmids} ; do
  # Drop the first output line of power.getstate, keeping the state itself.
  powerState=`vim-cmd vmsvc/power.getstate ${vmid} | sed '1d'`
  # Extract the display name and the .vmx path from the VM configuration.
  name=`vim-cmd vmsvc/get.config ${vmid} | sed -n -E -e '/\(vim.vm.ConfigInfo\) \{/,/files = \(vim.vm.FileInfo\) \{/ s/^ +name = "(.*)",.*?/\1/p'`
  vmPathName=`vim-cmd vmsvc/get.config ${vmid} | sed -n -E -e '/files = \(vim.vm.FileInfo\) \{/,/tools = \(vim.vm.ToolsConfigInfo\) \{/ s/^ +vmPathName = "(.*)",.*?/\1/p'`
  echo "Messages for VM with id ${vmid} which has power state ${powerState} (name = ${name}; vmPathName = ${vmPathName})."
  # Show any pending message for this VM.
  vim-cmd vmsvc/message ${vmid}
done
exit 0
It is very similar to vim-cmd-reload-all-VM-vmx-configurations.sh from Source: ESXi: reloading all virtual machines from their (potentially) vmx files.
The messages I know of either equal “No message.” or are about “This virtual machine may have been moved or copied.”
If there is no available message, then you always get the stock message No message., so this is something you can use as a check in scripts.
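Such a check could look like the minimal Python sketch below. It assumes the output of vim-cmd vmsvc/message has already been captured as text and that the stock reply is exactly the line mentioned above; the function name `has_pending_message` is made up for illustration.

```python
STOCK_MESSAGE = "No message."


def has_pending_message(output):
    """True when the captured output is anything other than the stock reply."""
    return output.strip() != STOCK_MESSAGE


print(has_pending_message("No message.\n"))  # False
```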
Posted in *nix, *nix-tools, ArchiveTeamWarrior, ash/dash, ash/dash development, Development, ESXi6, ESXi6.5, ESXi6.7, ESXi7, Power User, Scripting, Software Development, Virtualization, VMware, VMware ESXi
Posted by jpluimers on 2022/01/26
I’ve been agile all my (not just programming) life, and only figured out this century that there is a vocabulary for that, containing the words agile, extreme programming, feature-driven and many more.
Now with the passing of the years, I also realise I have been trying to do “slow and smooth” all my life, and that with age (and less adrenaline) this becomes easier and easier.
I think “slow and smooth” goes well with “agile”, especially when you keep the focus on “doing things right” (and trying to do them right the first time, and keeping them right in incremental steps).
It often reminds me of the Dutch phrase “heeft u haast, gaat dan zitten”, which is often attributed to Chinese proverbs. It roughly translates to “when in a hurry, take a seat”, and suggests taking a step back to think when under pressure. Maybe this English version of a Chinese proverb comes close: “When you are in a hurry, the horse holds back”.
For me it is intriguing that mainly Chinese, but in a broader sense Asian, proverbs play such an important role, whereas Western proverbs become less and less important. Informal knowledge seems to diminish in Western culture, which I think is a pity.
Maybe all these vocabulary things that started to make sense long after my puberty also have to do with being diagnosed autistic at 50. That too made a lot of puzzle pieces suddenly fall into place.
Below are the links that inspired me to write this blog post in the first place:
–jeroen
Posted in Agile, Conference Topics, Conferences, Development, Event, LifeHacker, Power User, Software Development