The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests

  • My badges

  • Twitter Updates

  • My Flickr Stream

  • Pages

  • All categories

  • Enter your email address to subscribe to this blog and receive notifications of new posts by email.

    Join 1,860 other subscribers

Archive for the ‘UTF8’ Category

Encoding is hard… so how did the single quote become a circumflexed a followed by Euro sign and trade mark?

Posted by jpluimers on 2016/10/04

A while ago (in fact more than a year), I posted Encoding is hard…  go G+ with the below picture.

[Wayback] ftfy (“fixes text for you”, a parody on “fixed that for you”) [Wayback] fixes it, but:

How did the single quote become “’“?

Actually, because of a a common “beautification” of many Office suites (Microsoft and Open alike), the single quote was a special one: a Unicode Character ‘RIGHT SINGLE QUOTATION MARK’ (U+2019) which in UTF-8 is encoded as 0xE2 0x80 0x99.

Read the rest of this entry »

Posted in Development, Encoding, ftfy, ISO-8859, ISO8859, Mojibake, Software Development, Unicode, UTF-8, UTF8, Windows-1252 | Leave a Comment »

installing the UTF-8 encoding ftfy (fixes text for you) – via version 3.0 | Luminoso Blog

Posted by jpluimers on 2016/09/06

Simple if you know it:

pip install ftfy

That installs it as a command which is a lot easier than using it from Github at [Waybackhttps://github.com/LuminosoInsight/python-ftfy

It knows how to solve the encoding issues in [Archive.is]  the future of publishing at W3C explaining about WTF-8 and Unicode history.

It didn’t solve my non-Unicode encoding issue: [Wayback] “v3/43/4r” -> “v¾¾r” -> “vóór”.

Read the rest of this entry »

Posted in Development, Encoding, ftfy, Mojibake, Software Development, Unicode, UTF-8, UTF8 | 4 Comments »

Some interesting encoding/Unicode/text articles on kunststube and links for test files of various encodings

Posted by jpluimers on 2016/08/17

After yesterdays post on Testing and static methods don’t go well together, I read around on Source (kunststube [WayBack]) a bit more and found these very nice articles on encoding,Unicode and text:

Related on those, some other nice readings:

–jeroen

Posted in Ansi, ASCII, CP437/OEM 437/PC-8, Development, EBCDIC, Encoding, ISO-8859, ISO8859, Shift JIS, Software Development, Unicode, UTF-16, UTF-8, UTF16, UTF8, Windows-1252 | Leave a Comment »

[NL] encoding blijft moeilijk, waarom toch? (dit keer in een brief van @xs4all)

Posted by jpluimers on 2015/02/24

Hoe moeilijk kan het toch zijn om je encoding goed te doen.

Deze keer uit een brief van xs4all:

Mojibake encoding probleem

Mojibake encoding probleem

Als je een trema in een brief zet, dan controleer je toch even dat die ook goed op de brief wordt afgedrukt?

Read the rest of this entry »

Posted in Development, Encoding, ISO-8859, Mojibake, Software Development, Unicode, UTF-8, UTF8 | Leave a Comment »

Great Unicode presentation by

Posted by jpluimers on 2015/01/21

Stefan Heymann did a great presentation Character Sets and Unicode in Firebird at fbcon11. About 90% of it is not about Firebird, but about Unicode: a highly recommended presentation.

There is also a PDF version of the same presentation for easier reading/searching.

If you like Firebird, there is a whole bunch of Firebird related presentations from various authors shared by MindTheBird.

–jeroen

Posted in Ansi, Database Development, Development, Encoding, Firebird, ISO-8859, ISO8859, Software Development, Unicode, UTF-8, UTF8 | Leave a Comment »

Delphi hinting directives: deprecated, experimental, library and platform

Posted by jpluimers on 2014/10/01

I’ve been experimenting with the Delphi hinting directives lately to make it easier to migrate some libraries to newer versions of Delphi and newer platforms.

Hinting directives (deprecated, experimental, library and platform) were – like the $MESSAGE directive – added to Delphi 6.

Up to Delphi 5 you didn’t have any means to declare code obsolete. You had to find clever ways around it.

Warnings for hinting directives

When referring to identifiers marked with a hinting directive, you can get various warning messages that depend on the kind of identifier: unit, or other symbol. Read the rest of this entry »

Posted in Apple Pascal, Borland Pascal, DEC Pascal, Delphi, Delphi 2005, Delphi 2006, Delphi 2007, Delphi 2009, Delphi 2010, Delphi 6, Delphi 7, Delphi 8, Delphi XE, Delphi XE2, Delphi XE3, Delphi XE4, Development, Encoding, FreePascal, ISO-8859, ISO8859, Java, Lazarus, MQ Message Queueing/Queuing, QC, Reflection, Software Development, Sybase, Unicode, UTF-8, UTF8 | 2 Comments »

Recommended reads when dealing with Character Encodings in software

Posted by jpluimers on 2014/05/06

Apart from the mandatory Joel on Software article about Unicode and Character sets, these two articles are of great value too:

Fun to read from that blog is the Historical Technology  section including this article:

–jeroen

PS: The mandatory one is The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) – Joel on Software.

 

Posted in .NET, Ansi, ASCII, CP437/OEM 437/PC-8, Delphi, Development, EBCDIC, Encoding, ISO-8859, ISO8859, Shift JIS, Software Development, Unicode, UTF-8, UTF8, Windows-1252 | Leave a Comment »

Some Unicode links

Posted by jpluimers on 2013/12/30

I see a lot of programmers struggle with Unicode and think it is difficult as getting the encoding decoding hassle right can take quite a bit of effort. There is a lot of fun in using Unicode as well, as the number of code points (in laymen speak: characters) is huge and the Unicode code points are well organized into various planes (or blocks) with related code points. I like Charbase: A visual unicode database a lot especially as they have pictograms of all code points that always show a picture, even if you don’t have a font that your browser can use to display the character belonging to the code point. Here are a few links from to characters and blocks of characters in their database that I like a lot: Read the rest of this entry »

Posted in Development, Encoding, Software Development, Unicode, UTF-8, UTF8 | Leave a Comment »

▶ Characters, Symbols and the Unicode Miracle – Computerphile – YouTube

Posted by jpluimers on 2013/12/23

Brilliant:

UTF-8 explained in less than 9 minutes.

The diagram fits almost on the back of a napkin, so he explained it with a big marker on classic 132-column fan fold green-bar continuous form  paper (we used to call it zebra-paper).

I’m definitely going to follow the Compuerphile videos and watch more of them.

Definitely a great addition to my UTF-8 posting category.

Thanks for everyone that pointed me to this video!

–jeroen 

via: ▶ Characters, Symbols and the Unicode Miracle – Computerphile – YouTube.

Posted in Development, Encoding, Software Development, Unicode, UTF-8, UTF8 | Leave a Comment »

Delphi XE3/XE4: removing empty .VLB files; XE5 update 2 and special offers are out. #codingindelphi

Posted by jpluimers on 2013/12/19

Even when not using Visual Live Binding, Delphi generates empty .VLB files in both Delphi XE3 (virtually always) and Delphi XE4 (most of the time).

Visual Live Binding is one way of binding data to UI in FireMonkey and can also be used in VCL, but does not have to (Alister Christie made a nice video ▶ Delphi Training Tutorial #77 – Visual Live Bindings – YouTube about it).

Empty VLB files, and a batch file to delete them

The “empty” VLB files are almost empty, as they are exactly 3 bytes long and contain the byte sequence EF BB BF which is the Unicode BOM (byte order mark) for the UTF-8 encoding. Read the rest of this entry »

Posted in Delphi, Delphi XE3, Delphi XE4, Delphi XE5, Development, Encoding, QC, Software Development, Unicode, UTF-8, UTF8 | Tagged: , | Leave a Comment »