Get the new [Wayback/Archive] Delphi Thread Safety Patterns eBook at a discount while it is hot:
Use Coupon Code: DTSPATT10 at checkout to get a $10 discount.
This promotional offer is valid through June 14.
Posted by jpluimers on 2022/06/01
Get the new [Wayback/Archive] Delphi Thread Safety Patterns eBook at a discount while it is hot:
Use Coupon Code: DTSPATT10 at checkout to get a $10 discount.
This promotional offer is valid through June 14.
Posted in Delphi, Development, Encoding, ISO-8859, ISO8859, Mojibake, Multi-Threading / Concurrency, Software Development, UTF-8, Windows-1252 | Leave a Comment »
Posted by jpluimers on 2022/03/16
Last year, Waterschap Amstel, Gooi en Vecht sent me a paper letter notifying the yearly water bill was going to be late as they were redesigning their IT systems.
Their letter introduced a classic Mojibake that had not been present in all their older paper letter communication.
"Pyreneeën"
:"Pyreneeën"
:Posted in Development, Encoding, ftfy, Mojibake, Python, Software Development, Unicode, UTF-8, UTF8 | Leave a Comment »
Posted by jpluimers on 2022/03/10
When writing this, [Wayback/Archive.is] ftfy · PyPI:history indicates ftfy was already at 6.0.3.
It is still my goto tool for figuring out the cause of Mojibake. I remember writing about it the first time in 2016 (see the ftfy category) when it was already at version 3.0, discovering it after a few Mojibake posts.
By now it even understands right-to-left Mojibake garbage: [Archive.is] Elia Robyn Speer on Twitter: “ftfy 5.8 is out! … A user reported that Hebrew text wasn’t being fixed, and this made me think about how to expand some of the trickier cases to non-Latin alphabets.”
Mojibake mishaps still happen a lot, so by now I hope I will have done a Mojibake themed Delphi talk at one or more conferences.
Posted in !!con (bangbangcon), About, Autistic Spectrum/Autism, Cancer, Conference Topics, Conferences, Development, Encoding, Event, ftfy, Mojibake, Personal, Python, Rectum cancer, Scripting, Software Development, Unicode | Leave a Comment »
Posted by jpluimers on 2022/02/09
Nowadays, some 35 years after the first Unicode ideas got drafted and 30+ years after the Unicode Consortium saw the light, UTF-8 is served my more than 95% of the web as shown in yesterday’s post UTF-8 web adoption is huge, closing 100%, but only soured up since around 2006..
I mentioned this:
It means that nowadays there is a very small chance you will see mangled characters (what Japanese call mojibake) when you’re surfing the web.
Below are some issues that happened not too long ago and still happen. I have reported them to all parties involved through web-care, but no response whatsoever, and this is bad: Unicode support beyond basic ASCII for the below systems are still broken even for relatively simple non-ASCII characters based in diacritics decorating a standard ASCII character.
Yes, I know the realm of encoding and code pages is a mess, especially when handling data in multiple layers of an application stack. That’s why I wrote this post in the first place, and have a whole encoding category of blog posts plus a Mojibake subset.
Posted in Communications Development, CP850, Dark Pattern, Development, Encoding, ISO-8859, ISO8859, Mojibake, Software Development, Unicode, User Experience (ux), UTF-16, UTF-8, Windows-1252 | Leave a Comment »
Posted by jpluimers on 2019/06/17
I live in a street that has a non-ASCII character in it: Pyreneeën.
I’ve reverted back to entering the street name as plain ASCII for a simple reason:
Too often the ë gets mangled into encoding gibberish, similar to the é example in [WayBack] When Good Characters Go Bad: A Guide to Diagnosing Character Display Problems as these characters are very near both in UTF-8 and in the [WayBack] Unicode Characters in the Latin-1 Supplement Block:
I’ve seen these encodings, where only the top encoding is correct; the degeneration gets worse moving downwards, a classic Mojibake:
# encoded UTF-8 (hex.) 0 ë 0xC3 0xAB
1 ë 0xC3 0x83 0xC2 0xAB
2 ë 0xC3 0x83 0xC2 0x83 0xC3 0x82 0xC2 0xAB
3 ÃÂë 0xC3 0x83 0xC2 0x83 0xC3 0x82 0xC2 0x83 0xC3 0x83 0xC2 0x82 0xC3 0x82 0xC2 0xAB
4 ÃÂÃÂÃÂë 0xC3 0x83 0xC2 0x83 0xC3 0x82 0xC2 0x83 0xC3 0x83 0xC2 0x82 0xC3 0x82 0xC2 0x83 0xC3 0x83 0xC2 0x83 0xC3 0x82 0xC2 0x82 0xC3 0x83 0xC2 0x82 0xC3 0x82 0xC2 0xAB
5 ë 0x26 0x65 0x75 0x6d 0x6c 0x3b
The last one seldomly happens, the first one relatively often, just like [Archive.is] fd.nl did a while on their finanancial pages.
These mistakes become sort of understandable (but not forgivable) when you look at the below table-fragment (the full table is at[WayBack] Unicode/UTF-8-character table – starting from code position 0080).
Posted in Development, Encoding, Mojibake, Power User, Software Development, Unicode, Web Browsers | Leave a Comment »