The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests

installing the UTF-8 encoding ftfy (fixes text for you) – via version 3.0 | Luminoso Blog

Posted by jpluimers on 2016/09/06

Simple if you know it:

pip install ftfy

That installs it as a command, which is a lot easier than using it from GitHub at [Wayback] https://github.com/LuminosoInsight/python-ftfy

It knows how to solve the encoding issues in [Archive.is] the future of publishing at W3C, which explains WTF-8 and Unicode history.

It didn’t solve my non-Unicode encoding issue: [Wayback] “v3/43/4r” -> “v¾¾r” -> “vóór”.

That was caused by an infamous Western Latin character set confusion, in this case an ISO-8859/Windows- versus CP850/CP858 encoding mix-up (so no Unicode involved at all, nor CP437, as it doesn’t have ¾).
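A minimal sketch of that last step ("v¾¾r" -> "vóór"), using only Python's standard codecs. It assumes the intended encoding was ISO-8859-1 (Latin-1) and that the bytes were wrongly decoded as CP850; the function name undo_codepage_mixup is made up for this illustration and is not part of ftfy.

```python
def undo_codepage_mixup(garbled: str, wrong: str = "cp850", intended: str = "latin-1") -> str:
    """Reverse a single wrong decode: get the original bytes back, then decode them as intended."""
    # Assumption: 'wrong' is the codec that was mistakenly used to decode the bytes,
    # and 'intended' is the codec the text was originally written in.
    return garbled.encode(wrong).decode(intended)


print(undo_codepage_mixup("v¾¾r"))  # vóór

# CP437 cannot have been the wrong codec: it has no ¾, so encoding fails.
try:
    "¾".encode("cp437")
except UnicodeEncodeError:
    print("cp437 has no ¾")
```

Passing wrong="cp858" gives the same result for this word, since CP850 and CP858 differ only in one code point (the euro sign).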

So [Wayback] I put in a suggestion for ftfy to support finding the above.

PS (20220424): I found the offending document again at [Wayback] g428-1.pdf

–jeroen

via

PS: these manglings are called Mojibake

4 Responses to “installing the UTF-8 encoding ftfy (fixes text for you) – via version 3.0 | Luminoso Blog”

  1. ruurd said

    Nah. Just use utf-8 instead of them funky charsets from last century. Stupid Windows…
