Archive for the ‘Encoding’ Category
Posted by jpluimers on 2022/06/09
Like me, [Archive.is] Kristian Köhntopp is a nerd.
Unlike me, Kris has bumped into character encoding issues for just about all of his digital life. That life started at about the same time as mine, but, again unlike me, he was far more involved in the technical aspects of it.
First a series of Tweets:
Read the rest of this entry »
Posted in ASCII, C++, Development, Encoding, EPS/PostScript, Font, ISO-8859, ISO8859, Power User, Software Development, Times New Roman
Posted by jpluimers on 2022/03/16
Last year, Waterschap Amstel, Gooi en Vecht sent me a paper letter notifying me that the yearly water bill was going to be late because they were redesigning their IT systems.
Their letter introduced a classic Mojibake that had not been present in any of their older paper letters.
- Street name on a letter via the old IT systems is
"Pyreneeën":

- Street name on a letter via the new IT systems is
"Pyreneeën":

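How such a garbled street name typically comes about can be sketched in a few lines of Python. This is only an illustration under an assumption: that the new system wrote UTF-8 bytes and a later step read them as Windows-1252, which is the most common variant of this bug. The letter's actual processing chain is not known.

```python
# Assumption: UTF-8 bytes misread as Windows-1252 (cp1252).
# The actual pipeline behind the letter is unknown.
street = "Pyreneeën"

# Step 1: a system encodes the name correctly as UTF-8.
utf8_bytes = street.encode("utf-8")

# Step 2: another layer decodes those bytes with the wrong code page.
garbled = utf8_bytes.decode("cp1252")
print(garbled)  # PyreneeÃ«n

# The damage is reversible as long as no bytes were lost along the way.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired)  # Pyreneeën
```

The two bytes that UTF-8 uses for "ë" each become a separate Windows-1252 character, which is why one letter turns into two.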
Read the rest of this entry »
Posted in Development, Encoding, ftfy, Mojibake, Python, Software Development, Unicode, UTF-8, UTF8
Posted by jpluimers on 2022/03/10
When writing this, [Wayback/Archive.is] ftfy · PyPI:history indicates ftfy was already at 6.0.3.
It is still my go-to tool for figuring out the cause of Mojibake. I remember writing about it for the first time in 2016 (see the ftfy category), when it was already at version 3.0, having discovered it after a few Mojibake posts.
By now it even understands right-to-left Mojibake garbage:
[Archive.is] Elia Robyn Speer on Twitter: “ftfy 5.8 is out! … A user reported that Hebrew text wasn’t being fixed, and this made me think about how to expand some of the trickier cases to non-Latin alphabets.”
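To give an idea of what "figuring out the cause" means, here is a deliberately naive sketch using only the Python standard library. It is not how ftfy works internally; the function name and the list of candidate code pages are made up for illustration. It tries to undo the garbling for each candidate and keeps the ones that yield valid UTF-8.

```python
from __future__ import annotations

# Hypothetical candidate list: single-byte code pages that often show up
# as the "wrong" decoder. ftfy itself uses far more refined heuristics.
CANDIDATES = ("cp1252", "latin-1", "cp850", "cp437")


def guess_mojibake_cause(garbled: str) -> list[tuple[str, str]]:
    """Return (wrong_codec, repaired_text) pairs that undo the garbling."""
    results = []
    for codec in CANDIDATES:
        try:
            # Reverse the wrong decode, then decode as UTF-8.
            repaired = garbled.encode(codec).decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue  # this code page cannot explain the text
        if repaired != garbled:
            results.append((codec, repaired))
    return results


for codec, repaired in guess_mojibake_cause("PyreneeÃ«n"):
    print(codec, repaired)
```

More than one candidate can come back, and not every result is meaningful text, so a human (or a smarter tool like ftfy) still has to pick the plausible one.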
Mojibake mishaps still happen a lot, so by now I hope I will have done a Mojibake-themed Delphi talk at one or more conferences.
Read the rest of this entry »
Posted in !!con (bangbangcon), About, Autistic Spectrum/Autism, Cancer, Conference Topics, Conferences, Development, Encoding, Event, ftfy, Mojibake, Personal, Python, Rectum cancer, Scripting, Software Development, Unicode
Posted by jpluimers on 2022/02/16
So I can find them again later:
- SMS: Short Message Service. Messages limited to 140 octets (160 7-bit characters, 140 8-bit characters or 70 16-bit characters) sent mainly over the GSM or UMTS mobile networks.
- Concatenated SMS or Multipart SMS: a way to send messages longer than 140 octets. Works on most devices and with most operators. Each part is billed separately.
- MSISDN: a number uniquely identifying a subscription in a GSM or a UMTS mobile network. Always starts with the country code. Never includes a prefix (like 00 or +).
- SMPP: Short Message Peer-to-Peer.
- HLR: Home Location Register.
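The character limits in the SMS entry follow directly from the 140-octet payload, which a small sketch can make concrete. The function names are made up for illustration. The 6-octet concatenation header per part is an assumption that is not in the list above (it is the common header size for multipart messages), and the sketch ignores 7-bit characters that need an escape sequence and therefore count double.

```python
SMS_PAYLOAD_OCTETS = 140  # payload size from the SMS definition above

# Assumption, not from the list above: each part of a concatenated SMS
# spends 6 octets on a header that says which part it is.
CONCAT_HEADER_OCTETS = 6


def chars_per_message(bits_per_char: int, header_octets: int = 0) -> int:
    """How many characters of the given width fit in one message."""
    usable_bits = (SMS_PAYLOAD_OCTETS - header_octets) * 8
    return usable_bits // bits_per_char


def parts_needed(char_count: int, bits_per_char: int) -> int:
    """How many (separately billed) parts a message of this length takes."""
    if char_count <= chars_per_message(bits_per_char):
        return 1  # fits in a single, header-less message
    per_part = chars_per_message(bits_per_char, CONCAT_HEADER_OCTETS)
    return -(-char_count // per_part)  # ceiling division


print(chars_per_message(7), chars_per_message(8), chars_per_message(16))
# 160 140 70
print(parts_needed(161, 7))
# 2
```

Under that header assumption, one character too many in a 7-bit message doubles the bill, and a single non-7-bit character that forces 16-bit encoding more than halves the capacity per part.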
An interesting party with some public SMS APIs is MessageBird. You can compare their old and new ones:
Read the rest of this entry »
Posted in Development, Encoding, Software Development
Posted by jpluimers on 2022/02/15
[Wayback] WILT: XML encode a string in .net « Benoit MARTIN’s Weblog:
Always wondered why I couldn’t find a method that would XML encode a string, effectively escaping the 5 illegal characters for XML. There is such a method but its location in the API is not intuitive at all. It’s in the System.Security namespace: [Wayback] SecurityElement.Escape(String) Method (System.Security) | Microsoft Docs
public static string? Escape (string? str);
Its usage is:
tagText = System.Security.SecurityElement.Escape(tagText);
This will escape the 5 characters <, >, &, " and '
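For comparison outside .NET, roughly the same escaping can be sketched in Python with the standard library. The wrapper name `xml_escape` is made up for illustration. `xml.sax.saxutils.escape` only handles &, < and > by default, so the two quote characters are passed in as extra entities.

```python
from xml.sax.saxutils import escape

# escape() covers &, < and > on its own; the quotes are added explicitly.
_QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def xml_escape(text: str) -> str:
    """Escape the five XML special characters in text."""
    return escape(text, _QUOTE_ENTITIES)


print(xml_escape("<a b=\"c\">R&D 'x'</a>"))
# &lt;a b=&quot;c&quot;&gt;R&amp;D &apos;x&apos;&lt;/a&gt;
```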
–jeroen
Posted in .NET, Development, Encoding, Software Development, XML, XML escapes, XML/XSD
Posted by jpluimers on 2022/02/15
From my Windows XP days (which are long gone), but still historically relevant: the answer by [Wayback] Remy Lebeau to [Wayback] DELPHI : EEncodingError – Invalid code page on windows xp embedded – Stack Overflow:
The TEncoding.ASCII property uses codepage 20127, which is not installed on XP Embedded by default. You have to install it manually. The TEncoding class does not exist in D2006.
Are you using Indy 10, by chance? It uses TEncoding.ASCII by default for its string encodings. This exact error has been known to occur when using Indy on XP Embedded.
–jeroen
Posted in ASCII, Delphi, Development, Encoding, Power User, Software Development, XP-embedded
Posted by jpluimers on 2022/02/10
I will likely need some of these links in the future:
–jeroen
Posted in Apple, Development, Encoding, Mac, Mac OS X / OS X / MacOS, Power User, Software Development, Unicode
Posted by jpluimers on 2022/02/09
Nowadays, some 35 years after the first Unicode ideas got drafted and 30+ years after the Unicode Consortium saw the light, UTF-8 is served by more than 95% of the web, as shown in yesterday’s post UTF-8 web adoption is huge, closing 100%, but only soured up since around 2006.
I mentioned this:
It means that nowadays there is a very small chance you will see mangled characters (what Japanese call mojibake) when you’re surfing the web.
Serving UTF-8 does not mean there are no Unicode problems.
Below are some issues that happened not too long ago and still happen. I have reported them to all parties involved through web-care, but got no response whatsoever. That is bad: Unicode support beyond basic ASCII in the systems below is still broken, even for relatively simple non-ASCII characters that consist of a diacritic decorating a standard ASCII character.
Yes, I know the realm of encoding and code pages is a mess, especially when handling data in multiple layers of an application stack. That’s why I wrote this post in the first place, and have a whole encoding category of blog posts plus a Mojibake subset.
Read the rest of this entry »
Posted in Communications Development, CP850, Dark Pattern, Development, Encoding, ISO-8859, ISO8859, Mojibake, Software Development, Unicode, User Experience (ux), UTF-16, UTF-8, Windows-1252
Posted by jpluimers on 2022/02/09
Note: Notepad cannot always guess the encoding correctly, see “The Old New Thing”: [Wayback] Some files come up strange in Notepad | The Old New Thing (talking about ANSI a.k.a. Windows-1252, UTF-16LE, UTF-16BE, UTF-8 and UTF-7, some with and some without BOM, as Notepad does not understand all permutations)
David Cumps discovered that certain text files come up strange in Notepad. The reason is that Notepad has to edit files in a variety of encodings, and when its back against the wall, sometimes it’s forced to guess.
[Wayback] C# Effective way to find any file’s Encoding – Stack Overflow shows how to detect various byte order marks in C#.
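The same byte order mark check can be sketched in Python with the constants from the standard `codecs` module. This is a minimal sketch, not a port of the Stack Overflow answer; the function name `detect_bom` is made up, and UTF-7 is left out. Like any BOM check it says nothing about files without a BOM, which is exactly where Notepad has to guess.

```python
from __future__ import annotations

import codecs

# Order matters: the UTF-32 LE BOM (FF FE 00 00) starts with the
# UTF-16 LE BOM (FF FE), so the longer marks must be tested first.
BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def detect_bom(data: bytes) -> str | None:
    """Return the encoding implied by a leading BOM, or None if absent."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


print(detect_bom(codecs.BOM_UTF8 + "ë".encode("utf-8")))      # utf-8-sig
print(detect_bom(codecs.BOM_UTF16_BE + "ë".encode("utf-16-be")))  # utf-16-be
print(detect_bom(b"no BOM here"))                             # None
```

One remaining ambiguity: UTF-16 LE text with a BOM whose first character is U+0000 starts with the same four bytes as the UTF-32 LE BOM, so even this check is a heuristic.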
–jeroen
Posted in ASCII, Development, Encoding, Software Development, Unicode, UTF-16, UTF-32, UTF-8, UTF16, UTF32, UTF8