The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests

  • My badges

  • Twitter Updates

  • My Flickr Stream

  • Pages

  • All categories

  • Enter your email address to subscribe to this blog and receive notifications of new posts by email.

    Join 4,262 other subscribers

Archive for the ‘Mojibake’ Category

I’ve given up on entering non-ASCII characters when entering data on-line

Posted by jpluimers on 2019/06/17

I live in a street that has a non-ASCII character in it: Pyreneeën.

I’ve reverted back to entering the street name as plain ASCII for a simple reason:

Too often the ë gets mangled into encoding gibberish, similar to the é example in [WayBackWhen Good Characters Go Bad: A Guide to Diagnosing Character Display Problems as these characters are very near both in UTF-8 and in the [WayBackUnicode Characters in the Latin-1 Supplement Block:

I’ve seen these encodings, where only the top encoding is correct; the degeneration gets worse moving downwards, a classic Mojibake:

# encoded UTF-8 (hex.)
0 ë 0xC3 0xAB
1 ë 0xC3 0x83 0xC2 0xAB
2 ë 0xC3 0x83 0xC2 0x83 0xC3 0x82 0xC2 0xAB
3 ë 0xC3 0x83 0xC2 0x83 0xC3 0x82 0xC2 0x83 0xC3 0x83 0xC2 0x82 0xC3 0x82 0xC2 0xAB
4 ë 0xC3 0x83 0xC2 0x83 0xC3 0x82 0xC2 0x83 0xC3 0x83 0xC2 0x82 0xC3 0x82 0xC2 0x83 0xC3 0x83 0xC2 0x83 0xC3 0x82 0xC2 0x82 0xC3 0x83 0xC2 0x82 0xC3 0x82 0xC2 0xAB
5 ë 0x26 0x65 0x75 0x6d 0x6c 0x3b

The last one seldomly happens, the first one relatively often, just like [Archive.is] fd.nl did a while on their finanancial pages.

These mistakes become sort of understandable (but not forgivable) when you look at the below table-fragment (the full table is at[WayBack] Unicode/UTF-8-character table – starting from code position 0080).

Read the rest of this entry »

Posted in Development, Encoding, Mojibake, Power User, Software Development, Unicode, Web Browsers | Leave a Comment »

Do not use non-ASCII characters as identifiers – not all your tools support them well enough

Posted by jpluimers on 2018/04/05

For a very long time I’ve discouraged people from using non-ASCII characters in identifiers. It still holds.

In the past, transliterations messed things up. Even with increased support for Unicode, tools still screw non-ASCII characters up.

Delphi is not alone in this (the most important one is the DFM view as text support), see this report: [RSP-16767] Viewing a form as text fails with non ascii control or event names – Embarcadero Technologies (you need an account for this, but the report is visible for anyone):

Viewing a form as text fails with non ascii control or event names Comment

Steps:

  1. create a new VCL forms application
  2. drop a label onto the form
  3. change the name of that label to lblÜberfall (note the U-umlaut)
  4. switch to view as text
  • exp: DFM content shown as text
  • act: first line is shown incorrectly (see screenhsot)

–jeroen

Source: [RSP-16767] Viewing a form as text fails with non ascii control or event names – Embarcadero Technologies

via: [WayBack] Code of the day – – Thomas Mueller (dummzeuch) – Google+:

function TNameGenerator.StrasseToStrasse(const _Strasse: string): string;
begin
Result := _Strasse;
end;

Strasse := StrasseToStrasse(_Strasse);

Read the rest of this entry »

Posted in ASCII, Conference Topics, Conferences, Delphi, Delphi 10 Seattle, Delphi 10.1 Berlin (BigBen), Delphi 2005, Delphi 2006, Delphi 2007, Delphi 2009, Delphi 2010, Delphi XE, Delphi XE2, Delphi XE3, Delphi XE4, Delphi XE5, Delphi XE6, Delphi XE7, Delphi XE8, Development, Encoding, Event, Mojibake, Software Development | Leave a Comment »

Encoding is hard… so how did the single quote become a circumflexed a followed by Euro sign and trade mark?

Posted by jpluimers on 2016/10/04

A while ago (in fact more than a year), I posted Encoding is hard…  go G+ with the below picture.

[Wayback] ftfy (“fixes text for you”, a parody on “fixed that for you”) [Wayback] fixes it, but:

How did the single quote become “’“?

Actually, because of a a common “beautification” of many Office suites (Microsoft and Open alike), the single quote was a special one: a Unicode Character ‘RIGHT SINGLE QUOTATION MARK’ (U+2019) which in UTF-8 is encoded as 0xE2 0x80 0x99.

Read the rest of this entry »

Posted in Development, Encoding, ftfy, ISO-8859, ISO8859, Mojibake, Software Development, Unicode, UTF-8, UTF8, Windows-1252 | Leave a Comment »

installing the UTF-8 encoding ftfy (fixes text for you) – via version 3.0 | Luminoso Blog

Posted by jpluimers on 2016/09/06

Simple if you know it:

pip install ftfy

That installs it as a command which is a lot easier than using it from Github at [Waybackhttps://github.com/LuminosoInsight/python-ftfy

It knows how to solve the encoding issues in [Archive.is]  the future of publishing at W3C explaining about WTF-8 and Unicode history.

It didn’t solve my non-Unicode encoding issue: [Wayback] “v3/43/4r” -> “v¾¾r” -> “vóór”.

Read the rest of this entry »

Posted in Development, Encoding, ftfy, Mojibake, Software Development, Unicode, UTF-8, UTF8 | 4 Comments »

[NL] encoding blijft moeilijk, waarom toch? (dit keer in een brief van @xs4all)

Posted by jpluimers on 2015/02/24

Hoe moeilijk kan het toch zijn om je encoding goed te doen.

Deze keer uit een brief van xs4all:

Mojibake encoding probleem

Mojibake encoding probleem

Als je een trema in een brief zet, dan controleer je toch even dat die ook goed op de brief wordt afgedrukt?

Read the rest of this entry »

Posted in Development, Encoding, ISO-8859, Mojibake, Software Development, Unicode, UTF-8, UTF8 | Leave a Comment »