Archive for the ‘Encoding’ Category
Posted by jpluimers on 2022/11/22
I have been running into more and more Mojibake example pages, like [Wayback] Mojibake: Question Marks, Strange Characters and Other Issues | GPI:
Have you ever found strange characters like these ��� when viewing content in applications or websites in other languages?
They made me realise that all these (including the Mojibake examples on my blog) are just artifacts, but the real list of examples is the set of ftfy test cases at [Wayback/Archive.is] python-ftfy/test_cases.json at master · LuminosoInsight/python-ftfy
I was reminded of this when Waternet moved from paper mail using “Pyreneeën” to email using “Pyreneeën”. Not as bad as what Waterschap AGV did earlier: they took it one level further and made “Pyreneeën” out of it; see Last year, a classic Mojibake was introduced when Waterschap Amstel, Gooi en Vecht redesigned their IT systems.
This seems like a trend where newer systems perform worse than older systems. I wonder why that is.
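For the simplest kind of Mojibake (UTF-8 bytes decoded as a single-byte code page, possibly more than once), the damage can often be undone by reversing the wrong step until it no longer applies. Below is a minimal Python sketch of that idea. It is not how ftfy works internally (ftfy applies far more heuristics), and the function name `undo_mojibake` and the default of Windows-1252 as the wrongly assumed encoding are assumptions made for illustration.

```python
def undo_mojibake(text, wrong="cp1252", right="utf-8", max_levels=5):
    """Reverse 'bytes in `right` were decoded as `wrong`' until it stops applying.

    Returns the repaired text and the number of levels that were undone.
    Hypothetical helper: it only handles this one family of Mojibake.
    """
    levels = 0
    while levels < max_levels:
        try:
            # Get back the original bytes, then decode them the right way.
            candidate = text.encode(wrong).decode(right)
        except (UnicodeEncodeError, UnicodeDecodeError):
            break  # the text is no longer (or never was) this kind of Mojibake
        if candidate == text:
            break  # pure ASCII: nothing left to repair
        text = candidate
        levels += 1
    return text, levels
```

Under these assumptions, `undo_mojibake("PyreneeÃ«n")` returns `("Pyreneeën", 1)`, and a doubly garbled string needs two levels.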
BTW: the trick on the [Wayback/Archive] Python.org shell to run ftfy (which is not installed by default) is first dropping to the shell (see my post How do I drop a bash shell from within Python? – Stack Overflow), then starting python again:
Read the rest of this entry »
Posted in CP850, Development, Encoding, ftfy, ISO-8859, Mojibake, Python, Scripting, Software Development, Unicode, UTF-8, UTF8
Posted by jpluimers on 2022/09/01
Geocities is long dead, but luckily a lot has been archived: [Wayback] Archive.is: History of ASCII Art, with a very comprehensive history ranging from ancient hand-painted art to contemporary computer-made illustrations.
Via: [Wayback/Archive.is] ASCII art: The roots of ASCII art
–jeroen
Posted in ASCII, ASCII art / AsciiArt, Development, Encoding, Fun, History, Power User, Retrocomputing, Software Development
Posted by jpluimers on 2022/07/06
Early June, I blogged about Wake-on-LAN from a Windows machine.
My plan was to adapt [Wayback/Archive.is] Wake.ps1 into Wake-on-LAN.ps1 (as naming is important).
One of the goals was to support multiple hardware MAC address formats, especially as Wake.ps1 had the comment below even though it did support the AA-BB-CC-DD-EE-FF format, just not the AA:BB:CC:DD:EE:FF one:
<#
...
.NOTES
Make sure the MAC addresses supplied don't contain "-" or ".".
#>
A colon-separated hardware MAC address would result in this error inside the call to the [Wayback/Archive.is] PhysicalAddress.Parse Method (System.Net.NetworkInformation) | Microsoft Docs:
Send-Packet : Exception calling "Parse" with "1" argument(s): "An invalid physical address was specified."
So I did some digging, starting inside the above mentioned blog post, and adding more:
- Wake.ps1 uses the [Wayback/Archive.is] Parse method in the [Wayback/Archive.is] PhysicalAddress.cs source code in C# .NET, which contains code like this:
//has dashes?
if (address.IndexOf('-') >= 0 ){
    hasDashes = true;
    buffer = new byte[(address.Length+1)/3];
}
- The Perl script at [Wayback/Archive.is] wakeonlan/wakeonlan at master · jpoliv/wakeonlan that started my first blog post in this series which mentions:
xx:xx:xx:xx:xx:xx (canonical)
xx-xx-xx-xx-xx-xx (Windows)
xxxxxx-xxxxxx (Hewlett-Packard switches)
xxxxxxxxxxxx (Intel Landesk)
I should rename the second one IEEE 802, as per this:
- The MAC address: Notational conventions – Wikipedia
The standard (IEEE 802) format for printing EUI-48 addresses in human-friendly form is six groups of two hexadecimal digits, separated by hyphens (-) in transmission order (e.g. 01-23-45-67-89-AB). This form is also commonly used for EUI-64 (e.g. 01-23-45-67-89-AB-CD-EF).[2] Other conventions include six groups of two hexadecimal digits separated by colons (:) (e.g. 01:23:45:67:89:AB), and three groups of four hexadecimal digits separated by dots (.) (e.g. 0123.4567.89AB); again in transmission order.[30]
The latter is used by Cisco (see for instance [Wayback/Archive.is] Cisco DCNM Security Configuration Guide, Release 4.0 – Configuring MAC ACLs [Support] – Cisco and [Wayback/Archive.is] Cisco IOS LAN Switching Command Reference – mac address-group through revision [Support] – Cisco), so another format to add:
- [Wayback/Archive.is] PhysicalAddress.Parse Method (System.Net.NetworkInformation) | Microsoft Docs remarks:
The address parameter must contain a string that can only consist of numbers and letters as hexadecimal digits. Some examples of string formats that are acceptable are as follows:
001122334455
00-11-22-33-44-55
0011.2233.4455
00:11:22:33:44:55
F0-E1-D2-C3-B4-A5
f0-e1-d2-c3-b4-a5
Use the GetAddressBytes method to retrieve the address from an existing PhysicalAddress instance.
- After a bit more digging via [Wayback/Archive.is] “three groups of four hexadecimal digits separated by dots” – Google Search, I found that even more hardware MAC address formats are in use as per [Wayback/Archive.is] What are the various standard and industry practice ways to express a 48-bit MAC address? – Network Engineering Stack Exchange.
I really do not have all the sources for the various representations for 48-bit MAC addresses, but I have seen them variously used:
AA-BB-CC-DD-EE-FF
AA.BB.CC.DD.EE.FF
AA:BB:CC:DD:EE:FF
AAA-BBB-CCC-DDD
AAA.BBB.CCC.DDD
AAA:BBB:CCC:DDD
AAAA-BBBB-CCCC
AAAA.BBBB.CCCC
AAAA:BBBB:CCCC
AAAAAA-BBBBBB
AAAAAA.BBBBBB
AAAAAA:BBBBBB
From the last list, which is far more complete than the others, I recognise quite a few from tools I used in the past, but I too forgot the actual sources. So I took the full list from there and tried to name each format in parentheses, based on the links I found above and on what I remembered:
AABBCCDDEEFF (Bare / Landesk)
AA-BB-CC-DD-EE-FF (IEEE 802 / Windows)
AA.BB.CC.DD.EE.FF (???)
AA:BB:CC:DD:EE:FF (Linux / BSD / MacOS)
AAA-BBB-CCC-DDD (???)
AAA.BBB.CCC.DDD (Cisco?)
AAA:BBB:CCC:DDD (???)
AAAA-BBBB-CCCC (???)
AAAA.BBBB.CCCC (Cisco / Brocade)
AAAA:BBBB:CCCC (???)
AAAAAA-BBBBBB (Hewlett-Packard networking)
AAAAAA.BBBBBB (???)
AAAAAA:BBBBBB (???)
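The formats in the list above differ only in separator and group size, so one way to accept them all is to strip the separators and validate what is left. A minimal Python sketch under that assumption follows. The names `normalize_mac`, `format_mac` and `magic_packet` are hypothetical and not taken from Wake.ps1 or the Perl script, and the sketch is deliberately lenient: it also accepts mixed or oddly placed separators. The packet layout used (6 bytes of 0xFF followed by 16 repetitions of the MAC address) is the usual Wake-on-LAN magic packet.

```python
import re

_SEPARATORS = re.compile(r"[-.:]")          # separators seen in the list above
_HEX12 = re.compile(r"[0-9A-Fa-f]{12}")     # 48 bits = 12 hexadecimal digits


def normalize_mac(text):
    """Return the MAC address as 12 uppercase hex digits (the 'bare' format)."""
    bare = _SEPARATORS.sub("", text.strip())
    if not _HEX12.fullmatch(bare):
        raise ValueError(f"not a 48-bit MAC address: {text!r}")
    return bare.upper()


def format_mac(text, group=2, sep="-"):
    """Re-emit a MAC address with the given group size and separator."""
    bare = normalize_mac(text)
    return sep.join(bare[i:i + group] for i in range(0, 12, group))


def magic_packet(text):
    """Build the Wake-on-LAN payload: 6 x 0xFF, then the MAC 16 times."""
    mac = bytes.fromhex(normalize_mac(text))
    return b"\xff" * 6 + mac * 16
```

With this, `format_mac("AAAA.BBBB.CCCC")` gives `AA-AA-BB-BB-CC-CC`, and `format_mac("AABBCCDDEEFF", group=4, sep=".")` gives `AABB.CCDD.EEFF`.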
Some links in addition to the ones above:
–jeroen
Posted in .NET, CommandLine, Development, Encoding, HEX encoding, Network-and-equipment, Power User, PowerShell, Scripting, Software Development
Posted by jpluimers on 2022/06/30
Even when a batch file is saved as UTF-8 (with or without BOM), by default it does not show most non-ASCII Unicode characters correctly.
The reason is that the default console codepage is usually not UTF-8 but an OEM one like codepage 437.
Thanks [Wayback] niutech for answering [Wayback/Archive.is] Unicode symbols in a batch file – Stack Overflow:
You can manually set the codepage to UTF-8 by typing chcp 65001 at the top of your batch file.
Codepage 65001 is Windows speak for the UTF-8 code page. I have some more blog entries mentioning codepage 65001.
An example where I needed this was to show how to address the localghost from a batch file (see The spookback localghost address to resolve 👻). This was the resulting UTF-8 saved batch file:
chcp 65001
ping 👻
ping xn--9q8h
For single-byte non-ASCII characters, you can usually get away with setting the encoding of your batch file to your default code page as mentioned in [Wayback/Archive.is] cmd – Using box-drawing Unicode characters in batch files – Stack Overflow.
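Why the codepage matters can also be shown outside of cmd.exe. The sketch below (Python, with hypothetical helper names) decodes the UTF-8 bytes of 👻 the way a console in a given codepage would interpret them, and derives the Punycode host name used in the batch file. It uses Python's built-in `punycode` codec directly rather than full IDNA processing, which is a simplification.

```python
GHOST = "\U0001F47B"  # the ghost character from the batch file


def as_seen_in_codepage(text, codepage):
    """Decode the UTF-8 bytes of `text` as if they were written in `codepage`."""
    return text.encode("utf-8").decode(codepage, errors="replace")


def to_punycode_label(text):
    """Return the ASCII-compatible form of a single non-ASCII host label."""
    return "xn--" + text.encode("punycode").decode("ascii")
```

Here `as_seen_in_codepage(GHOST, "cp437")` gives `≡ƒæ╗` instead of the ghost, `as_seen_in_codepage(GHOST, "utf-8")` gives the ghost back, and `to_punycode_label(GHOST)` gives `xn--9q8h`.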
–jeroen
Posted in Batch-Files, Development, Encoding, Scripting, Software Development, Unicode, UTF-8, Windows Development
Posted by jpluimers on 2022/06/23
[Wayback/Archive.is] CyberChef:
a simple, intuitive web app for carrying out all manner of “cyber” operations within a web browser. These operations include simple encoding like XOR or Base64, more complex encryption like AES, DES and Blowfish, creating binary and hexdumps, compression and decompression of data, calculating hashes and checksums, IPv6 and X.509 parsing, changing character encodings, and much more.
Source code at [Wayback/Archive.is] gchq/CyberChef: The Cyber Swiss Army Knife – a web app for encryption, encoding, compression and data analysis.
Via [Archive.is] Jilles🏳️‍🌈 on Twitter: “Hidden in plain sight. Rot13 cross word. Hidden Barcodes. Qr codes. Barely any InfoSec skill required. Still a hand full. Usually my to go place is: Cyberchef. I did a fun one for cyberklaas using ansi art.… “
Jilles also pointed to the solving part in [Archive.is] Jilles🏳️‍🌈 on Twitter: “See also, for solving: SCWF… “
The [Wayback/Archive.is] Solve Crypto with Force! needs to run without most script blockers, so best run it in an anonymous/private browser window.
Source code for SCWF is at [Wayback/Archive.is] DaWouw/SCWF: CTF tool for identifying, brute forcing and decoding encryption schemes in an automated way.
Screen shot of Cyberchef example “Perform AES decryption, extracting the IV from the beginning of the cipher stream” [Archive.is]:
Read the rest of this entry »
Posted in Cyberchef, Development, Encoding, Encryption, Hashing, Power User, Security, Software Development
Posted by jpluimers on 2022/06/09
Like me, [Archive.is] Kristian Köhntopp is a nerd.
Unlike me, Kris bumped into character encoding issues for just about all his digital life. That started about the same time as mine, but again unlike me: he was way more involved in the technical aspects of it.
First a series of Tweets:
Read the rest of this entry »
Posted in ASCII, C++, Development, Encoding, EPS/PostScript, Font, ISO-8859, ISO8859, Power User, Software Development, Times New Roman
Posted by jpluimers on 2022/03/16
Last year, Waterschap Amstel, Gooi en Vecht sent me a paper letter notifying me that the yearly water bill was going to be late, as they were redesigning their IT systems.
Their letter introduced a classic Mojibake that had not been present in any of their older paper letters.
- Street name on a letter via the old IT systems is
"Pyreneeën":

- Street name on a letter via the new IT systems is
"Pyreneeën":
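A classic Mojibake like this typically appears when text written as UTF-8 is read back as a single-byte encoding. The sketch below reproduces that in Python. Which encodings the old and new IT systems really used is not known, so UTF-8 and Windows-1252 (and the helper name `garble`) are assumptions made for illustration.

```python
def garble(text, levels=1, written_as="utf-8", read_as="cp1252"):
    """Simulate Mojibake: encode with one encoding, decode with another.

    Each level repeats the mistake on the already garbled text.
    """
    for _ in range(levels):
        text = text.encode(written_as).decode(read_as)
    return text


street = "Pyreneeën"
once = garble(street)             # 'PyreneeÃ«n'
twice = garble(street, levels=2)  # 'PyreneeÃƒÂ«n'
```

Pure ASCII text survives unchanged, which is why such problems often go unnoticed until a name with a diacritic shows up.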

Read the rest of this entry »
Posted in Development, Encoding, ftfy, Mojibake, Python, Software Development, Unicode, UTF-8, UTF8
Posted by jpluimers on 2022/03/10
When writing this, [Wayback/Archive.is] ftfy · PyPI:history indicates ftfy was already at 6.0.3.
It is still my go-to tool for figuring out the cause of Mojibake. I remember first writing about it in 2016 (see the ftfy category), when it was already at version 3.0; I discovered it after a few Mojibake posts.
By now it even understands right-to-left Mojibake garbage:
[Archive.is] Elia Robyn Speer on Twitter: “ftfy 5.8 is out! … A user reported that Hebrew text wasn’t being fixed, and this made me think about how to expand some of the trickier cases to non-Latin alphabets.”
Mojibake mishaps still happen a lot, so by now I hope I will have done a Mojibake-themed Delphi talk at one or more conferences.
Read the rest of this entry »
Posted in !!con (bangbangcon), About, Autistic Spectrum/Autism, Cancer, Conference Topics, Conferences, Development, Encoding, Event, ftfy, Mojibake, Personal, Python, Rectum cancer, Scripting, Software Development, Unicode
Posted by jpluimers on 2022/02/16
So I can find them again later:
- SMS: Short Message Service. Messages limited to 140 octets (160 7-bit characters, 140 8-bit characters or 70 16-bit characters) sent mainly over the GSM or UMTS mobile networks.
- Concatenated SMS or Multipart SMS: a way to send messages longer than 140 octets. Works on most devices and with most operators. Each part is billed separately.
- MSISDN: a number uniquely identifying a subscription in a GSM or a UMTS mobile network. Always starts with the country code. Never includes a prefix (like 00 or +).
- SMPP: Short Message Peer-to-Peer.
- HLR: Home Location Register.
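The character limits above follow directly from the 140 octet payload, and the same arithmetic shows how many parts a concatenated SMS needs. A minimal Python sketch follows. It assumes each part of a concatenated message spends 6 octets on a header (a common value that is not stated in the list above), ignores GSM 7-bit characters that take two septets, and uses hypothetical function names.

```python
SMS_PAYLOAD_OCTETS = 140
CONCAT_HEADER_OCTETS = 6  # assumption: per-part header of a concatenated SMS


def chars_per_part(bits_per_char, header_octets=0):
    """How many characters of the given width fit in one SMS payload."""
    return (SMS_PAYLOAD_OCTETS - header_octets) * 8 // bits_per_char


def parts_needed(char_count, bits_per_char):
    """How many (separately billed) parts a message of char_count needs."""
    if char_count <= chars_per_part(bits_per_char):
        return 1  # fits in a single, non-concatenated SMS
    per_part = chars_per_part(bits_per_char, CONCAT_HEADER_OCTETS)
    return -(-char_count // per_part)  # ceiling division
```

`chars_per_part(7)`, `chars_per_part(8)` and `chars_per_part(16)` give the 160, 140 and 70 from the list. Under the 6 octet assumption, a 161 character 7-bit message already needs 2 parts of at most 153 characters each.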
An interesting party with some public SMS APIs is MessageBird. You can compare their old and new ones:
Read the rest of this entry »
Posted in Development, Encoding, Software Development