The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests


Archive for the ‘Unicode’ Category

Encoding is hard… so how did the single quote become a circumflexed a followed by Euro sign and trade mark?

Posted by jpluimers on 2016/10/04

A while ago (in fact more than a year), I posted Encoding is hard… to G+ with the below picture.

ftfy (fixes text for you) fixes it, but:

How did the single quote become “â€™”?

Actually, because of a common “beautification” in many Office suites (Microsoft and Open alike), the single quote was a special one: the Unicode character ‘RIGHT SINGLE QUOTATION MARK’ (U+2019), which in UTF-8 is encoded as 0xE2 0x80 0x99.
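The mechanics can be reproduced in a few lines of Python. This is only a sketch: it assumes the receiving side misread the UTF-8 bytes as Windows-1252 (the usual culprit for this pattern), and the variable names are just for illustration.

```python
# Sketch: how U+2019 turns into three characters when its UTF-8 bytes
# are decoded with the wrong code page (assumption: Windows-1252).
quote = "\u2019"  # RIGHT SINGLE QUOTATION MARK

utf8_bytes = quote.encode("utf-8")
print(utf8_bytes.hex(" "))  # e2 80 99

# Each byte is now read as one Windows-1252 character:
# 0xE2 -> a with circumflex, 0x80 -> Euro sign, 0x99 -> trade mark sign.
garbled = utf8_bytes.decode("cp1252")
print([f"U+{ord(ch):04X}" for ch in garbled])  # ['U+00E2', 'U+20AC', 'U+2122']

# Undoing the wrong decode step restores the original character.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == quote)  # True
```

The repair in the last step is lossless here only because all three bytes happen to map to defined Windows-1252 characters.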


Posted in Development, Encoding, ISO-8859, ISO8859, Software Development, Unicode, UTF-8, UTF8, Windows-1252 | Leave a Comment »

installing the UTF-8 encoding ftfy (fixes text for you) – via version 3.0 | Luminoso Blog

Posted by jpluimers on 2016/09/06

Simple if you know it:

pip install ftfy

That installs it as a command, which is a lot easier than using it from GitHub.

It knows how to solve the encoding issues in ÃƒÆ’ƒâ€šÃ‚ the future of publishing at W3C.

It didn’t solve my non-Unicode encoding issue: “v3/43/4r” -> “v¾¾r” -> “vóór”.

That was caused by the infamous Western Latin character set confusion, in this case an ISO-8859/Windows- versus CP850/CP858 encoding issue (so no Unicode involved at all, nor CP437, as it doesn’t have ¾).
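A minimal Python sketch of that confusion, under a stated assumption: the text was stored as Windows-1252 bytes (for ó that is the same byte, 0xF3, as in ISO-8859-1) and then read back as CP850. The variable names are hypothetical.

```python
# Sketch: the "vóór" confusion between single-byte code pages.
# Assumption: written as Windows-1252, read back as CP850.
original = "v\u00f3\u00f3r"  # "vóór"

raw = original.encode("cp1252")  # b'v\xf3\xf3r'
misread = raw.decode("cp850")    # byte 0xF3 is the three-quarters sign in CP850
print(ascii(misread))            # 'v\xbe\xber'

# Reversing the wrong decode step restores the original word.
repaired = misread.encode("cp850").decode("cp1252")
print(repaired == original)      # True

# CP437 has no three-quarters sign, so it cannot be the code page involved.
try:
    "\u00be".encode("cp437")
    cp437_has_three_quarters = True
except UnicodeEncodeError:
    cp437_has_three_quarters = False
print(cp437_has_three_quarters)  # False
```

No UTF-8 is involved at any point, so a fixer that only looks for UTF-8 mix-ups would not recognise this pattern.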

So I put in a suggestion for ftfy to support finding the above.



Posted in Development, Encoding, Software Development, Unicode, UTF-8, UTF8 | 4 Comments »

Some interesting encoding/Unicode/text articles on kunststube and links for test files of various encodings

Posted by jpluimers on 2016/08/17

After yesterday’s post on Testing and static methods don’t go well together, I read around on Source (kunststube [WayBack]) a bit more and found these very nice articles on encoding, Unicode and text:

Related to those, some other nice reads:


Posted in Ansi, ASCII, CP437/OEM 437/PC-8, Development, EBCDIC, Encoding, ISO-8859, ISO8859, Shift JIS, Software Development, Unicode, UTF-16, UTF-8, UTF16, UTF8, Windows-1252 | Leave a Comment »

Graphical emoji are killing Unicode

Posted by jpluimers on 2016/08/05

Unicode is about glyphs that are used in writing. Have you ever seen the emoji on the right being written like this?

This has been bothering me for a while and gets worse over time.

According to: Microsoft just changed its toy gun emoji to a real pistol:

Looks like Microsoft and Apple may not be on the same page about firearm emojis afterall. Right after Apple changed its gun emoji to a water pistol in iOS 10, Microsoft replaced its toy pistol emoji with an actual revolver.

While Apple and Microsoft have gone back to edit their symbols, Google continues to use a pistol in Android keyboards and doesn’t appear to have plans to change this. None of the companies in question have adjusted their knife, sword, bomb, poison and coffin emojis, so… ¯\_(ツ)_/¯

When vendors start prescribing what emojis must look like (influenced by all sorts of emotions) without allowing the user to choose how they look (via a font: that’s what fonts are for!), it invalidates the whole Unicode principle:

Unicode is a computing industry standard for the consistent encoding, representation, and handling of text expressed in most of the world’s writing systems.

These emoji aren’t text and should be gone from the Unicode standard before they can do more harm.

Will the next step be that vendors define their own colours for certain characters in fonts? On Windows, in Times New Roman A becomes red, B green and C yellow, but in Courier New these colours get permuted, and every operating system and version makes its own random colour choices.



Posted in Development, Encoding, Opinions, Software Development, Unicode | Leave a Comment »

ASCII and Unicode. Love and hate?

Posted by jpluimers on 2015/10/13

So I won’t forget:

Even though this does not work on most USA T-Shirt sites, it works on this Dutch one: T-Shirt Ontwerpen – t-shirt zelf ontwerpen | Spreadshirt.




Posted in ASCII, Development, Encoding, Software Development, Unicode | Leave a Comment »
