The things I didn’t notice during cancer survival: ftfy 6.0 and more versions got released during my recovery
Posted by jpluimers on 2022/03/10
When writing this, [Wayback/Archive.is] ftfy · PyPI:history indicates ftfy was already at 6.0.3.
It is still my go-to tool for figuring out the cause of Mojibake. I first wrote about it in 2016 (see the ftfy category), when it was already at version 3.0, having discovered it after a few Mojibake posts.
By now it even understands right-to-left Mojibake garbage: [Archive.is] Elia Robyn Speer on Twitter: “ftfy 5.8 is out! … A user reported that Hebrew text wasn’t being fixed, and this made me think about how to expand some of the trickier cases to non-Latin alphabets.”
Mojibake mishaps still happen a lot, so I hope that by now I will have given a Mojibake-themed Delphi talk at one or more conferences.
That would mean that “still recovering while writing this” has changed into “up and running after having recovered”.
The working title is “Let's talk about 文字化け (Mojibake: how to not handle Unicode and other encodings)”.
As part of that talk, I plan to run all of my Mojibake posts through ftfy.
It should work, as ftfy already fixed the [Archive.is] “Ode to a Shipping Label” in 2014:
ODE TO A SHIPPING LABEL
Once there was a little o,
with an accent on top like só.
It started out as UTF8,
(universal since '98),
but the program only knew latin1,
and changed little ó to "Ã³" for fun.
A second program saw the "Ã³"
and said "I know HTML entity!"
So "Ã³" was smartened to "&Atilde;&sup3;"
and passed on through happily.
Another program saw the tangle
(more precisely, ampersands to mangle)
and thus the humble "&Atilde;&sup3;"
became "&amp;Atilde;&amp;sup3;"
See?
>>> ftfy.fix_text('L&amp;Atilde;&amp;sup3;pez')
'López'
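The chain of mishaps the poem describes (UTF-8 read as Latin-1, then HTML entities, then escaped ampersands) can be replayed with only the Python standard library. This is a minimal sketch: the names `mangle` and `unmangle` are made up for illustration and are not part of ftfy, and the naive reversal below only works because the exact chain is known in advance, whereas ftfy has to detect it heuristically.

```python
import html
from html.entities import codepoint2name


def mangle(text: str) -> list[str]:
    """Return every stage of the poem's mangling chain, clean text first."""
    # Stage 1: UTF-8 bytes are misread as Latin-1.
    misdecoded = text.encode("utf-8").decode("latin-1")
    # Stage 2: each non-ASCII character becomes a named HTML entity.
    entities = "".join(
        f"&{codepoint2name[ord(ch)]};"
        if ord(ch) > 127 and ord(ch) in codepoint2name
        else ch
        for ch in misdecoded
    )
    # Stage 3: the ampersands themselves get escaped once more.
    double_escaped = entities.replace("&", "&amp;")
    return [text, misdecoded, entities, double_escaped]


def unmangle(text: str) -> str:
    """Naively reverse the chain above (hypothetical helper, not ftfy)."""
    previous = None
    # Peel off entity layers until the text stops changing.
    while text != previous:
        previous, text = text, html.unescape(text)
    # Undo the Latin-1 misreading of the UTF-8 bytes.
    return text.encode("latin-1").decode("utf-8")


stages = mangle("López")
for stage in stages:
    print(stage)
print(unmangle(stages[-1]))
```

Printing the stages shows the text getting longer and uglier at every step, which is exactly why guessing the way back without knowing the chain is the hard part that ftfy takes care of.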
A few links:
- [Archive.is] Elia Robyn Speer on Twitter: “Though the timing is bittersweet, I’m very excited to release version 6.0 of my Unicode-fixing library for Python, “ftfy”. It’s gained some new abilities, and I think the best way to appreciate it is through its shiny new documentation page: …”
- Repository: [Wayback/Archive.is] LuminosoInsight/python-ftfy: Fixes mojibake and other glitches in Unicode text, after the fact. (and the Archive.is archival back when I saved at version 3.0)
- Documentation: [Wayback/Archive.is] Home – ftfy: fixes text for you
- Mojibake talk:
- [Archive.is] Elia Robyn Speer on Twitter: “I’m presenting a lightning talk on mojibake and how to fix it at !!Con, on May 17!… “
- [Wayback/Archive.is] python-ftfy/ftfy talk.ipynb at master · LuminosoInsight/python-ftfy
- [Archive.is] Elia Robyn Speer on Twitter: “In about an hour I’m going to be giving my !!Con talk about ftfy (“fixes text for you”) and mojibake! If you want to follow along with the presentation as a Jupyter notebook: …”
- Ode to a Shipping Label:
- [Wayback/Archive.is] It even successfully un-mangles the shipping label from Ode to a Shipping Label:… | Hacker News
- [Archive.is] Ode to a Shipping Label : ProgrammerHumor
- [Archive.is] Elia Robyn Speer on Twitter: “I’ve just noticed that the shipping label in “Ode to a Shipping Label”, a little poem of mojibake lore, gets fixed automatically by ftfy 5.7 or later:
>>> ftfy.fix_text("LóPEZ") 'LóPEZ'
… “
- Keeping track of ftfy activity: [Archive.is] @r_speer ftfy – Twitter Search
- I need to check out Jupyter: [Archive.is] Elia Robyn Speer on Twitter: “I didn’t realize until now that Jupyter has a presentation mode built in you’re telling me I can turn my hacky demo code _directly_ into slides? I don’t have to keep trying to explain to Google Slides what code is? heck yeah let’s DO this”
Oh, and !!Con (bangbangcon), filled with excellent 10-minute talks, is everywhere:
- Youtube: [Archive.is] Youtube !!con channel (it sometimes takes a while for the playlist of the most recent conference to appear)
- Website: [Wayback] The joy, excitement, and surprise of computing – !!Con 2021
- Github: [Wayback/Archive.is] bangbangcon/bangbangcon.github.io
- Twitter: [Archive.is] !!Con (@bangbangcon) | Twitter
Playing around with ftfy
Oh, the trick to playing around with ftfy on the [Archive.is] Welcome to Python.org: interactive Python shell is to run these commands:
- From within Python, drop down to the sh shell:
import os
os.system('sh')
- From the shell, install ftfy:
pip3 install ftfy
- From the shell, start a new Python instance:
python
Example:
Python 3.9.5 (default, May 27 2021, 19:45:35)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ftfy
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'ftfy'
>>> import sys
>>> print(sys.path)
['', '/usr/local/lib/python39.zip', '/usr/local/lib/python3.9', '/usr/local/lib/python3.9/lib-dynload', '/usr/local/lib/python3.9/site-packages']
>>> import os
>>> os.system('sh')
$ pip3 install ftfy
Defaulting to user installation because normal site-packages is not writeable
Looking in links: /usr/share/pip-wheels
Collecting ftfy
  Downloading ftfy-6.0.3.tar.gz (64 kB)
     |████████████████████████████████| 64 kB 4.2 MB/s
Requirement already satisfied: wcwidth in /usr/local/lib/python3.9/site-packages (from ftfy) (0.2.5)
Building wheels for collected packages: ftfy
  Building wheel for ftfy (setup.py) ... done
  Created wheel for ftfy: filename=ftfy-6.0.3-py3-none-any.whl size=41913 sha256=949203f29e2bf608be70310d3dd571ec8052101111ef66ae348d4000c51f6ef4
  Stored in directory: /home/.anon-4bb0da142f894cc484ad9625/.cache/pip/wheels/3d/ee/4b/03a4e2e591ea56687aff999edc83827a2ace523baab75b8e41
Successfully built ftfy
Installing collected packages: ftfy
Successfully installed ftfy-6.0.3
$ python
Python 3.9.5 (default, May 27 2021, 19:45:35)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ftfy
>>> ftfy.fix_text("What the h—ck happened to this text?")
'What the h—ck happened to this text?'
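The same install step can probably also be done without dropping to sh, by calling pip through the running interpreter. This is a sketch under assumptions: the helper names `is_installed`, `pip_install_command` and `ensure_installed` are hypothetical, and whether the freshly installed package is importable in the same session depends on the user site-packages directory already being on `sys.path` (in the transcript above it was not, which is why a new Python instance was started).

```python
import importlib
import importlib.util
import subprocess
import sys


def is_installed(module_name: str) -> bool:
    """True when a top-level module can be found by the current interpreter."""
    return importlib.util.find_spec(module_name) is not None


def pip_install_command(package: str) -> list[str]:
    """Build the pip command for the interpreter that is running right now."""
    return [sys.executable, "-m", "pip", "install", package]


def ensure_installed(package: str) -> bool:
    """Install the package when missing; report whether it is importable now."""
    if is_installed(package):
        return True
    # Needs network access; raises CalledProcessError when pip fails.
    subprocess.run(pip_install_command(package), check=True)
    # Make the import system notice newly created directories.
    importlib.invalidate_caches()
    return is_installed(package)


# Only report the current state here; nothing gets installed by this line.
print("ftfy importable:", is_installed("ftfy"))
```

Calling `ensure_installed("ftfy")` performs the actual installation. If it returns False right after a successful install, starting a new Python instance as in the steps above is the simplest fix.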
–jeroen