The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests

Archive for the ‘ftfy’ Category

Encoding is hard… so how did the single quote become a circumflexed a followed by Euro sign and trade mark?

Posted by jpluimers on 2016/10/04

A while ago (in fact more than a year), I posted Encoding is hard… to G+ with the below picture.

[Wayback] ftfy (“fixes text for you”, a parody on “fixed that for you”) [Wayback] fixes it, but:

How did the single quote become “â€™”?

Actually, because of a common “beautification” in many Office suites (Microsoft and Open alike), the single quote was a special one: the Unicode character ‘RIGHT SINGLE QUOTATION MARK’ (U+2019), which in UTF-8 is encoded as 0xE2 0x80 0x99. Read as Windows-1252 instead of UTF-8, those three bytes show up as â, € and ™.
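A minimal Python sketch of that round trip, using only the standard codecs. It assumes the text was mis-decoded as Windows-1252 (`cp1252`), which matches the characters in the title. The variable names are just for illustration.

```python
# U+2019 RIGHT SINGLE QUOTATION MARK, the "smart" quote Office suites insert.
quote = "\u2019"

# Encoding it as UTF-8 gives the three bytes 0xE2 0x80 0x99.
utf8_bytes = quote.encode("utf-8")

# Mis-reading those bytes as Windows-1252 (an assumption about the
# mangling step) yields three separate characters instead of one.
mangled = utf8_bytes.decode("cp1252")

# Undoing the damage is the same trip in reverse.
repaired = mangled.encode("cp1252").decode("utf-8")

print(utf8_bytes.hex(" "))
print(mangled)
print(repaired == quote)
```

This is roughly the kind of repair ftfy automates, including guessing which wrong decoding happened.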


Posted in Development, Encoding, ftfy, ISO-8859, ISO8859, Mojibake, Software Development, Unicode, UTF-8, UTF8, Windows-1252 | Leave a Comment »

installing the UTF-8 encoding ftfy (fixes text for you) – via version 3.0 | Luminoso Blog

Posted by jpluimers on 2016/09/06

Simple if you know it:

pip install ftfy

That installs it as a command, which is a lot easier than using it from GitHub at [Wayback] https://github.com/LuminosoInsight/python-ftfy

It knows how to solve the encoding issues in [Archive.is] the future of publishing at W3C, explaining WTF-8 and Unicode history.

It didn’t solve my non-Unicode encoding issue: [Wayback] “v3/43/4r” -> “v¾¾r” -> “vóór”.

That was caused by an infamous Western Latin character set confusion: in this case an ISO-8859/Windows- versus CP850/CP858 encoding issue (so no Unicode involved at all, and not CP437 either, as it doesn’t have ¾).
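The mix-up can be reproduced with Python’s built-in codecs. This is a sketch that assumes Windows-1252 (`cp1252`) on the Windows side, since the exact code page is not spelled out above. ISO-8859-1 gives the same byte for ó. The helper `can_encode` is hypothetical and only there for the CP437 check.

```python
# "vóór" written with escapes so the example does not depend on the
# source file's own encoding.
original = "v\u00f3\u00f3r"

# Stored as single-byte Windows-1252 (assumed), ó becomes byte 0xF3.
raw = original.encode("cp1252")

# Read back as CP850, byte 0xF3 is the three-quarters sign.
mangled = raw.decode("cp850")

# Reversing the two steps restores the original word.
repaired = mangled.encode("cp850").decode("cp1252")


def can_encode(text, codec):
    """Hypothetical helper: True if every character exists in the codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


print(raw)
print(mangled)
print(repaired)
print(can_encode("\u00be", "cp850"), can_encode("\u00be", "cp437"))
```

CP858 behaves like CP850 for this byte, so decoding `raw` with `cp858` gives the same mangled text.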

So [Wayback] I put in a suggestion for ftfy to support finding the above.

–jeroen

PS: these manglings are called Mojibake.

Posted in Development, Encoding, ftfy, Mojibake, Software Development, Unicode, UTF-8, UTF8 | 4 Comments »
