The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests


The things I didn’t notice during cancer survival: ftfy 6.0 and more versions got released during my recovery

Posted by jpluimers on 2022/03/10

At the time of writing, [Wayback/Archive.is] ftfy · PyPI:history showed that ftfy was already at 6.0.3.

It is still my go-to tool for figuring out the cause of Mojibake. I first wrote about it in 2016 (see the ftfy category), after a few Mojibake posts, and by then it was already at version 3.0.
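Figuring out the cause usually boils down to one question: which wrong encoding was used to decode the original UTF-8 bytes? Below is a minimal brute-force sketch of that idea. It assumes a single layer of damage (UTF-8 decoded with one single-byte codec). The helper name `guess_mojibake_cause` and the candidate list are hypothetical and not part of ftfy:

```python
from __future__ import annotations

# Hypothetical shortlist of common single-byte codecs that UTF-8 often gets misread as
CANDIDATES = ("latin-1", "cp1252", "cp437", "mac_roman")


def guess_mojibake_cause(garbled: str, candidates=CANDIDATES) -> list[tuple[str, str]]:
    """Return (wrong_codec, repaired_text) for every codec that undoes the damage."""
    results = []
    for codec in candidates:
        try:
            # Turn the garbage back into the bytes the wrong codec produced it from,
            # then decode those bytes the way they were meant to be decoded: as UTF-8
            repaired = garbled.encode(codec).decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue  # this codec cannot explain the garbage
        if repaired != garbled:  # plain ASCII round-trips unchanged: nothing to explain
            results.append((codec, repaired))
    return results


print(guess_mojibake_cause("\u00e2\u20ac\u201d"))  # a mangled em dash (U+2014)
```

More than one codec can match: Latin-1 and Windows-1252 are identical in the 0xA0 to 0xFF range, so both explain `"Ã³"`. That ambiguity hints at why a tool like ftfy relies on heuristics rather than on plain trial decoding.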

By now it even understands right-to-left Mojibake garbage: [Archive.is] Elia Robyn Speer on Twitter: “ftfy 5.8 is out! … A user reported that Hebrew text wasn’t being fixed, and this made me think about how to expand some of the trickier cases to non-Latin alphabets.”

Mojibake mishaps still happen a lot, so by the time this post appears I hope to have given a Mojibake-themed Delphi talk at one or more conferences.

That would mean my “still recovering while writing this” has turned into “up and running after having recovered”.

The working title is “Let's talk about 文字化け (Mojibake: how to not handle Unicode and other encodings)”.


As part of that talk, I plan to run all of my Mojibake posts through ftfy.

It should work, as ftfy already fixed the [Archive.is] “Ode to a Shipping Label” in 2014:

ODE TO A SHIPPING LABEL
Once there was a little o,
with an accent on top like só.

It started out as UTF8,
(universal since '98),
but the program only knew latin1,
and changed little ó to "Ã³" for fun.

A second program saw the "Ã³"
and said "I know HTML entity!"
So "Ã³" was smartened to "&Atilde;³"
and passed on through happily.

Another program saw the tangle
(more precisely, ampersands to mangle)
and thus the humble "&Atilde;³"
became "&amp;Atilde;³"

See?

    >>> ftfy.fix_text('López')
    'López'
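Each stanza of the ode describes a separate mistake. Below is a minimal standard-library-only sketch that replays the three mistakes and then undoes them in reverse order. The helper names and the "only non-ASCII letters get entities" rule are assumptions chosen to reproduce the poem. They are not how ftfy works internally:

```python
import html
from html.entities import codepoint2name


def misread_as_latin1(text: str) -> str:
    # Stanza 2: UTF-8 bytes decoded as Latin-1, so "ó" (bytes C3 B3) becomes "Ã³"
    return text.encode("utf-8").decode("latin-1")


def smarten_letters(text: str) -> str:
    # Stanza 3 (hypothetical rule): non-ASCII *letters* become named HTML entities,
    # so "Ã" becomes "&Atilde;" while "³" (not a letter) is left alone
    return "".join(
        f"&{codepoint2name[ord(c)]};"
        if ord(c) > 127 and c.isalpha() and ord(c) in codepoint2name
        else c
        for c in text
    )


def escape_ampersands(text: str) -> str:
    # Stanza 4: HTML-escaping already-escaped text, so "&" becomes "&amp;"
    return html.escape(text, quote=False)


def ship(label: str) -> str:
    return escape_ampersands(smarten_letters(misread_as_latin1(label)))


def unship(label: str) -> str:
    # Undo in reverse order: two rounds of entity decoding, then the encoding mix-up
    decoded = html.unescape(html.unescape(label))
    return decoded.encode("latin-1").decode("utf-8")


mangled = ship("López")
print(mangled)
print(unship(mangled))
```

The hand-written `unship` only works because it knows exactly which layers were applied. ftfy's value is that it has to infer those layers from the garbage alone.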


A few links:

Oh, the !!con (bangbangcon), filled with excellent 10-minute talks, is everywhere:

Playing around with ftfy

Oh, the trick to playing around with ftfy in the interactive Python shell at [Archive.is] Welcome to Python.org is to run these commands:

  1. From within Python, drop down to the sh shell:
    import os
    os.system('sh')
  2. From the shell, install ftfy:
    pip3 install ftfy
  3. From the shell, start a new Python instance:
    python
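Starting a fresh interpreter is probably needed for two reasons. First, pip falls back to a user installation, as the "Defaulting to user installation" line in the example session shows. Second, Python only puts the user site directory on `sys.path` at startup, and only if that directory already exists. The `sys.path` printed in the session indeed lacks it.

Below is a sketch of doing the same inside a single Python session. It assumes pip is available and you are not inside a virtual environment, where `--user` is refused. `ensure_package` and `pip_install_command` are hypothetical helper names:

```python
from __future__ import annotations

import importlib
import site
import subprocess
import sys
from types import ModuleType


def pip_install_command(package: str) -> list[str]:
    # Run pip through the current interpreter, so the package lands where *this* Python looks
    return [sys.executable, "-m", "pip", "install", "--user", package]


def ensure_package(name: str) -> ModuleType:
    """Import `name`, pip-installing it into the user site directory first if needed."""
    try:
        return importlib.import_module(name)
    except ModuleNotFoundError:
        pass
    subprocess.run(pip_install_command(name), check=True)
    # The user site directory is only added to sys.path at startup if it already existed;
    # a first-ever --user install creates it too late, hence the manual append
    user_site = site.getusersitepackages()
    if user_site not in sys.path:
        sys.path.append(user_site)
    importlib.invalidate_caches()  # make the import system notice the new files
    return importlib.import_module(name)


# Usage (needs network access):
# ftfy = ensure_package("ftfy")
```

Using `sys.executable -m pip` instead of a bare `pip3` avoids installing into a different Python than the one you are typing in.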

Example:

Python 3.9.5 (default, May 27 2021, 19:45:35) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ftfy
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'ftfy'
>>> import sys
>>> print(sys.path)
['', '/usr/local/lib/python39.zip', '/usr/local/lib/python3.9', '/usr/local/lib/python3.9/lib-dynload', '/usr/local/lib/python3.9/site-packages']
>>> import os
>>> os.system('sh')
$ pip3 install ftfy
Defaulting to user installation because normal site-packages is not writeable
Looking in links: /usr/share/pip-wheels
Collecting ftfy
  Downloading ftfy-6.0.3.tar.gz (64 kB)
     |████████████████████████████████| 64 kB 4.2 MB/s 
Requirement already satisfied: wcwidth in /usr/local/lib/python3.9/site-packages (from ftfy) (0.2.5)
Building wheels for collected packages: ftfy
  Building wheel for ftfy (setup.py) ... done
  Created wheel for ftfy: filename=ftfy-6.0.3-py3-none-any.whl size=41913 sha256=949203f29e2bf608be70310d3dd571ec8052101111ef66ae348d4000c51f6ef4
  Stored in directory: /home/.anon-4bb0da142f894cc484ad9625/.cache/pip/wheels/3d/ee/4b/03a4e2e591ea56687aff999edc83827a2ace523baab75b8e41
Successfully built ftfy
Installing collected packages: ftfy
Successfully installed ftfy-6.0.3
$ python
Python 3.9.5 (default, May 27 2021, 19:45:35) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ftfy
>>> ftfy.fix_text("What the h—ck happened to this text?")
'What the h—ck happened to this text?'
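The session above ends by feeding ftfy a sentence with a mangled em dash. The em dash (U+2014) is a classic victim. Its UTF-8 bytes E2 80 94, decoded as Windows-1252, become "â", "€" and a right double quotation mark, and repeating the mistake stacks up more layers.

Below is a naive sketch that produces such layers and peels them off again. It assumes Windows-1252 was the wrong codec every time. `mojibake` and `peel` are hypothetical names, not ftfy functions. The layering part uses "é" because strict Windows-1252 leaves a few bytes (such as 0x9D) undefined and raises an error on a doubly-mangled em dash:

```python
def mojibake(text: str, layers: int = 1, wrong_codec: str = "cp1252") -> str:
    # Each layer: encode as UTF-8, then wrongly decode those bytes with a single-byte codec
    for _ in range(layers):
        text = text.encode("utf-8").decode(wrong_codec)
    return text


def peel(text: str, wrong_codec: str = "cp1252") -> str:
    # Reverse one layer at a time until the round trip fails or stops changing anything
    while True:
        try:
            fixed = text.encode(wrong_codec).decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            return text
        if fixed == text:
            return text
        text = fixed


broken = mojibake("What the h\u2014ck happened to this text?")
print(broken)
print(peel(broken))
print(peel(mojibake("\u00e9", layers=2)))
```

This loop is deliberately naive. It also "repairs" text that was never broken, such as a string that really is meant to contain "Ã©". ftfy instead uses heuristics to judge whether text looks like Mojibake before changing it.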

–jeroen
