The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests

Archive for the ‘ftfy’ Category

A while ago I bumped into some GPI Mojibake examples, but soon found out I should use the ftfy test cases

Posted by jpluimers on 2022/11/22

I have been bumping into more and more Mojibake example pages like [Wayback] Mojibake: Question Marks, Strange Characters and Other Issues | GPI:

Have you ever found strange characters like these ���  when viewing content in applications or websites in other languages?

They made me realise that all these (including the Mojibake examples on my blog) are just artifacts, but the real list of examples is the set of ftfy test cases at [Wayback/Archive.is] python-ftfy/test_cases.json at master · LuminosoInsight/python-ftfy

I got reminded when Waternet moved from paper mail using “Pyreneeën” to email using “PyreneeÃ«n”. Not as bad as Waterschap AGV did earlier: they took it one level further and made “PyreneeÃƒÂ«n” out of it, see Last year, a classic Mojibake was introduced when Waterschap Amstel, Gooi en Vecht redesigned their IT systems.
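The difference between one and two levels of Mojibake can be reproduced with the Python standard library alone. The sketch below assumes the usual cause (UTF-8 bytes read back as Windows-1252). The helper names `garble` and `ungarble` are made up for this illustration, and nothing here is confirmed to be what the Waternet or AGV systems actually do.

```python
def garble(text: str, levels: int = 1) -> str:
    """Encode as UTF-8, then wrongly decode as Windows-1252, `levels` times."""
    for _ in range(levels):
        text = text.encode("utf-8").decode("cp1252")
    return text


def ungarble(text: str, levels: int = 1) -> str:
    """Reverse garble(): encode as Windows-1252, then decode as UTF-8."""
    for _ in range(levels):
        text = text.encode("cp1252").decode("utf-8")
    return text


street = "Pyreneeën"
once = garble(street, 1)
twice = garble(street, 2)

print(once)                # PyreneeÃ«n
print(twice)               # PyreneeÃƒÂ«n
print(ungarble(twice, 2))  # Pyreneeën
```

Each extra level makes the text longer, because every non-ASCII character from the previous round becomes two or more bytes again.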

This seems like a trend where newer systems perform worse than older systems. I wonder why that is.

BTW: the trick on the [Wayback/Archive] Python.org shell to run ftfy (which is not installed by default) is first dropping to the shell (see my post How do I drop a bash shell from within Python? – Stack Overflow), then starting python again:

Read the rest of this entry »

Posted in CP850, Development, Encoding, ftfy, ISO-8859, Mojibake, Python, Scripting, Software Development, Unicode, UTF-8, UTF8 | Leave a Comment »

Last year, a classic Mojibake was introduced when Waterschap Amstel, Gooi en Vecht redesigned their IT systems

Posted by jpluimers on 2022/03/16

Last year, Waterschap Amstel, Gooi en Vecht sent me a paper letter notifying me that the yearly water bill was going to be late as they were redesigning their IT systems.

Their letter introduced a classic Mojibake that had not been present in any of their older paper letters.

  • Street name on a letter via the old IT systems is "Pyreneeën":

    Pyreneeën printed correctly.

  • Street name on a letter via the new IT systems is "PyreneeÃƒÂ«n":

    Pyreneeën printed with Mojibake distortions.

Read the rest of this entry »

Posted in Development, Encoding, ftfy, Mojibake, Python, Software Development, Unicode, UTF-8, UTF8 | Leave a Comment »

The things I didn’t notice during cancer survival: ftfy 6.0 and more versions got released during my recovery (including the poem “Ode to a Shipping Label”)

Posted by jpluimers on 2022/03/10

When writing this, [Wayback/Archive.is] ftfy · PyPI:history indicates ftfy was already at 6.0.3.

It is still my go-to tool for figuring out the cause of Mojibake. I remember writing about it the first time in 2016 (see the ftfy category) when it was already at version 3.0, discovering it after a few Mojibake posts.

By now it even understands right-to-left Mojibake garbage: [Archive.is] Elia Robyn Speer on Twitter: “ftfy 5.8 is out! … A user reported that Hebrew text wasn’t being fixed, and this made me think about how to expand some of the trickier cases to non-Latin alphabets.”

Mojibake mishaps still happen a lot, so by now I hope I will have done a Mojibake themed Delphi talk at one or more conferences.

Read the rest of this entry »

Posted in !!con (bangbangcon), About, Autistic Spectrum/Autism, Cancer, Conference Topics, Conferences, Development, Encoding, Event, ftfy, Mojibake, Personal, Python, Rectum cancer, Scripting, Software Development, Unicode | Leave a Comment »

Encoding is hard… so how did the single quote become a circumflexed a followed by Euro sign and trade mark?

Posted by jpluimers on 2016/10/04

A while ago (in fact more than a year), I posted Encoding is hard… to G+ with the picture below.

[Wayback] ftfy (“fixes text for you”, a parody on “fixed that for you”) [Wayback] fixes it, but:

How did the single quote become “â€™”?

Actually, because of a common “beautification” of many Office suites (Microsoft and Open alike), the single quote was a special one: a Unicode Character ‘RIGHT SINGLE QUOTATION MARK’ (U+2019) which in UTF-8 is encoded as 0xE2 0x80 0x99.
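Those three bytes account for the three garbage characters. A small sketch using only Python's built-in codecs, assuming the receiving side read the UTF-8 bytes as Windows-1252 (an assumption based on the categories of this post, not something shown in this excerpt):

```python
quote = "\u2019"  # RIGHT SINGLE QUOTATION MARK

utf8_bytes = quote.encode("utf-8")
print(utf8_bytes.hex(" "))  # e2 80 99

# Decode each byte on its own as Windows-1252 to see which character it becomes.
for byte in utf8_bytes:
    print(f"0x{byte:02X} -> {bytes([byte]).decode('cp1252')}")

mojibake = utf8_bytes.decode("cp1252")
print(mojibake)  # â€™
```

So 0xE2 becomes the circumflexed a, 0x80 the Euro sign, and 0x99 the trade mark sign.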

Read the rest of this entry »

Posted in Development, Encoding, ftfy, ISO-8859, ISO8859, Mojibake, Software Development, Unicode, UTF-8, UTF8, Windows-1252 | Leave a Comment »

installing the UTF-8 encoding ftfy (fixes text for you) – via version 3.0 | Luminoso Blog

Posted by jpluimers on 2016/09/06

Simple if you know it:

pip install ftfy

That installs it as a command which is a lot easier than using it from GitHub at [Wayback] https://github.com/LuminosoInsight/python-ftfy

It knows how to solve the encoding issues in [Archive.is]  the future of publishing at W3C explaining about WTF-8 and Unicode history.

It didn’t solve my non-Unicode encoding issue: [Wayback] “v3/43/4r” -> “v¾¾r” -> “vóór”.
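The middle step of that chain can be reproduced with the standard library. This is a guess rather than something the post confirms: the sketch assumes “vóór” was stored as ISO-8859-1 and then read as the DOS code page CP850. The further step from “¾” to “3/4” looks like an ASCII transliteration and is not covered here.

```python
original = "vóór"

# ó is the single byte 0xF3 in ISO-8859-1 ...
raw = original.encode("iso-8859-1")

# ... and 0xF3 is the character ¾ in CP850.
garbled = raw.decode("cp850")
print(garbled)  # v¾¾r

# Reversing the same two steps restores the word.
restored = garbled.encode("cp850").decode("iso-8859-1")
print(restored)  # vóór
```

No UTF-8 is involved at any point, which would fit with a UTF-8 oriented fixer leaving the text alone.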

Read the rest of this entry »

Posted in Development, Encoding, ftfy, Mojibake, Software Development, Unicode, UTF-8, UTF8 | 4 Comments »