Installing the UTF-8 encoding fixer ftfy (fixes text for you) – via version 3.0 | Luminoso Blog
Posted by jpluimers on 2016/09/06
Simple if you know it:
pip install ftfy
That installs it as a command, which is a lot easier than using it from GitHub at [Wayback] https://github.com/LuminosoInsight/python-ftfy
It knows how to solve the encoding issues in [Archive.is] the future of publishing at W3C, which explains WTF-8 and Unicode history.
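For a feel of the kind of mix-up ftfy targets, here is a minimal sketch using only the Python standard library. It is not how ftfy works internally; it only shows UTF-8 bytes being misread as Latin-1 and then repaired by reversing that step. The sample word and the two helper names are my own choices for illustration.

```python
def garble_utf8_as_latin1(text: str) -> str:
    """Simulate the mix-up: text saved as UTF-8 but read as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair_utf8_read_as_latin1(text: str) -> str:
    """Reverse the mix-up: recover the raw bytes, then decode them as UTF-8."""
    return text.encode("latin-1").decode("utf-8")


original = "vóór"
garbled = garble_utf8_as_latin1(original)      # each ó turns into two characters
repaired = repair_utf8_read_as_latin1(garbled)

print(garbled)
print(repaired)
```

With ftfy installed, `ftfy.fix_text` is the call meant for this kind of input; unlike the sketch, it has to guess which mix-up happened instead of being told.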
It didn’t solve my non-Unicode encoding issue: [Wayback] “v3/43/4r” -> “v¾¾r” -> “vóór”.
That was caused by an infamous Western Latin character set confusion, in this case an ISO-8859–/Windows- versus CP850/CP858 encoding issue (so no Unicode involved at all, nor CP437, as it doesn’t have ¾).
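The mix-up can be reproduced, and the culprit pair of code pages searched for, with the Python standard library alone. This is a sketch under assumptions: I take the ISO-8859 variant to be ISO-8859-1 (Python's `latin-1`) and the Windows one to be `cp1252`, which the text above does not spell out. The candidate list and the function name `find_codec_pairs` are made up for this example.

```python
from itertools import permutations

# Candidate code pages to try; latin-1 and cp1252 are assumptions (see above).
CANDIDATES = ["latin-1", "cp1252", "cp850", "cp858", "cp437"]


def find_codec_pairs(intended, observed, codecs=CANDIDATES):
    """Return (written_as, read_as) pairs that turn `intended` into `observed`."""
    matches = []
    for written_as, read_as in permutations(codecs, 2):
        try:
            misread = intended.encode(written_as).decode(read_as)
        except UnicodeError:
            # The text cannot be written in, or read back from, this code page.
            continue
        if misread == observed:
            matches.append((written_as, read_as))
    return matches


pairs = find_codec_pairs("vóór", "v¾¾r")
for written_as, read_as in pairs:
    print(f"written as {written_as}, read as {read_as}")

# Undo the damage by reversing one of the matching pairs.
fixed = "v¾¾r".encode("cp850").decode("latin-1")
print(fixed)
```

Only the combinations that write with `latin-1` or `cp1252` and read with `cp850` or `cp858` match; `cp437` never shows up, in line with it not having ¾.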
So [Wayback] I put in a suggestion for ftfy to support finding the above.
PS (20220424): I found the offending document again at [Wayback] g428-1.pdf
–jeroen
via
- [Wayback] ftfy (fixes text for you) version 3.0 | Luminoso Blog.
- [Wayback] Which encoding failure did encode “vóór” into “v3/43/4r”? – Stack Overflow.
- [Archive.is] The WTF-8 encoding | Hacker News.
PS: these manglings are called Mojibake
ruurd said
Nah. Just use utf-8 instead of them funky charsets from last century. Stupid Windows…
jpluimers said
I wish encoding mix-ups were Windows only, but you forgot to read the WTF-8 encoding link https://news.ycombinator.com/item?id=9611710
ruurd said
Yay. But I found something totally different while reading this: https://github.com/mrThe/to_nil
jpluimers said
I love that repo!