The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests

Archive for the ‘Unicode’ Category

installing the UTF-8 encoding ftfy (fixes text for you) – via version 3.0 | Luminoso Blog

Posted by jpluimers on 2016/09/06

Simple if you know it:

pip install ftfy

That installs it as a command, which is a lot easier than using it from GitHub at

It knows how to solve the encoding issues in ÃƒÆ’ƒâ€šÃ‚ the future of publishing at W3C.

It didn’t solve my non-Unicode encoding issue: “v3/43/4r” -> “v¾¾r” -> “vóór”.

That was caused by an infamous Western Latin character set confusion, in this case an ISO-8859/Windows- versus CP850/CP858 encoding mix-up (so no Unicode involved at all, nor CP437, as it doesn’t have ¾).

So I put in a suggestion for ftfy to support finding the above.
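The confusion described above can be reproduced with nothing but Python's standard codecs. This is a minimal sketch, not how ftfy works: the function names `garble` and `repair` are made up for illustration, and it assumes the text was written as ISO-8859-1 (`latin-1`) and misread as CP850.

```python
# Sketch: reproduce the ISO-8859-1 versus CP850 mix-up with stdlib codecs only.
# garble() and repair() are hypothetical helper names, not part of ftfy.

def garble(text: str) -> str:
    """Encode as ISO-8859-1, then misread the resulting bytes as CP850."""
    return text.encode("latin-1").decode("cp850")


def repair(text: str) -> str:
    """Undo garble(): re-encode as CP850, then decode as ISO-8859-1."""
    return text.encode("cp850").decode("latin-1")


if __name__ == "__main__":
    # The byte 0xF3 is "ó" in ISO-8859-1 but "¾" in CP850.
    print(garble("vóór"))
    print(repair("v¾¾r"))
```

The round trip only works for characters that exist in both code pages, so a general fixer would still have to guess which pair of encodings was confused.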



Posted in Development, Encoding, Software Development, Unicode, UTF-8, UTF8 | 2 Comments »

Some interesting encoding/Unicode/text articles on kunststube and links for test files of various encodings

Posted by jpluimers on 2016/08/17

After yesterday’s post on Testing and static methods don’t go well together, I read around on Source (kunststube [WayBack]) a bit more and found these very nice articles on encoding, Unicode and text:

Related to those, some other nice reads:


Posted in Ansi, ASCII, CP437/OEM 437/PC-8, Development, EBCDIC, Encoding, ISO-8859, ISO8859, Shift JIS, Software Development, Unicode, UTF-16, UTF-8, UTF16, UTF8, Windows-1252 | Leave a Comment »

Graphical emoji are killing Unicode

Posted by jpluimers on 2016/08/05

Unicode is about glyphs that are used in writing. Have you ever seen the emoji on the right being written like this?

This has been bothering me for a while and it gets worse over time.

According to: Microsoft just changed its toy gun emoji to a real pistol:

Looks like Microsoft and Apple may not be on the same page about firearm emojis afterall. Right after Apple changed its gun emoji to a water pistol in iOS 10, Microsoft replaced its toy pistol emoji with an actual revolver.

While Apple and Microsoft have gone back to edit their symbols, Google continues to use a pistol in Android keyboards and doesn’t appear to have plans to change this. None of the companies in question have adjusted their knife, sword, bomb, poison and coffin emojis, so… ¯\_(ツ)_/¯

When vendors start prescribing what emojis must look like (influenced by all sorts of emotions) without allowing the user to choose how they look (via a font – that’s what fonts are for!), it invalidates the whole Unicode principle:

Unicode is a computing industry standard for the consistent encoding, representation, and handling of text expressed in most of the world’s writing systems.

These emoji aren’t text and should be gone from the Unicode standard before they can do more harm.

Will the next step be that vendors define their own colours for certain characters in fonts? For Windows, Times New Roman A becomes red, B green and C yellow, but in Courier New we’ll permute these colours, and all operating systems and versions will make different random colour choices.



Posted in Development, Encoding, Opinions, Software Development, Unicode | Leave a Comment »

ASCII and Unicode. Love and hate?

Posted by jpluimers on 2015/10/13

So I won’t forget:

Even though this does not work on most USA T-Shirt sites, it works on this Dutch one: T-Shirt Ontwerpen – t-shirt zelf ontwerpen | Spreadshirt.



Read the rest of this entry »

Posted in ASCII, Development, Encoding, Software Development, Unicode | Leave a Comment »

Jon Skeet’s speech “Back to basics” is really a good watch – via Jørn Einar Angeltveit G+

Posted by jpluimers on 2015/07/15

Thanks Jørn Einar Angeltveit for sharing this a while ago:

A session by Jon Skeet and Tony the Pony (which has strong teeth) presented during the Polish DevDay 2013 in Kraków, Poland.

+Jon Skeet’s speech “Back to basics” is really a good watch.

In a funny way, he explains why the simplest fundamentals of computer software (text, dates and numbers) can cause some real headaches for the programmer…

In case you didn’t know: Jon Skeet is “Chuck Norris” on

The subtitle is “the mess we’ve made of our fundamental data types”.

Some of the topics covered:

  • people
  • numbers and storage formats
  • strings and encodings
  • dates, times and time zones
  • scope things narrowly (YAGNI) in a conscious way, and understand beyond what you implement

For instance, he shows that the fundamentals are both largely unknown to many of us and less universal than we think.


via +Jon Skeet’s speech “Back to basics” is really a good watch. In a funny way,….

Read the rest of this entry »

Posted in .NET, C#, Delphi, Development, Encoding, i18n internatiolanization and L10 Localization, Java, Java Platform, Pascal, Scripting, Software Development, Unicode | 2 Comments »
