The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests

Archive for the ‘Mojibake’ Category

Do not use non-ASCII characters as identifiers – not all your tools support them well enough

Posted by jpluimers on 2018/04/05

For a very long time I’ve discouraged people from using non-ASCII characters in identifiers. It still holds.

In the past, transliterations messed things up. Even with increased support for Unicode, tools still screw non-ASCII characters up.

Delphi is not alone in this, though its most important problem is the “view as text” support for DFM files; see this report: [RSP-16767] Viewing a form as text fails with non ascii control or event names – Embarcadero Technologies (you need an account for this, but the report is visible to anyone):

Viewing a form as text fails with non ascii control or event names

Steps:

  1. create a new VCL forms application
  2. drop a label onto the form
  3. change the name of that label to lblÜberfall (note the U-umlaut)
  4. switch to view as text
  • exp: DFM content shown as text
  • act: first line is shown incorrectly (see screenshot)
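
A quick way to spot such names before a tool trips over them is to scan the source text for identifier-like tokens that contain non-ASCII characters. The following is only a rough sketch in Python, not something from the report: the function name `find_non_ascii_identifiers` and the sample line are made up for illustration, and the regular expression also matches words inside comments and string literals, so treat the result as a list of candidates.

```python
import re

# Identifier-like token: a letter or underscore followed by word characters.
# In Python 3, \w is Unicode-aware, so "lblÜberfall" is matched as one token.
IDENTIFIER = re.compile(r"[^\W\d]\w*")


def find_non_ascii_identifiers(source):
    """Return identifier-like tokens that contain at least one non-ASCII character."""
    return [name for name in IDENTIFIER.findall(source) if not name.isascii()]


# Hypothetical line as it might appear in a DFM shown as text.
dfm_line = "object lblÜberfall: TLabel"
print(find_non_ascii_identifiers(dfm_line))  # ['lblÜberfall']
```

Running something like this over `.pas` and `.dfm` files before check-in would flag `lblÜberfall` while leaving `TLabel` alone.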

–jeroen

Source: [RSP-16767] Viewing a form as text fails with non ascii control or event names – Embarcadero Technologies

via: [WayBack] Code of the day – Thomas Mueller (dummzeuch) – Google+:

function TNameGenerator.StrasseToStrasse(const _Strasse: string): string;
begin
  Result := _Strasse;
end;

Strasse := StrasseToStrasse(_Strasse);

Read the rest of this entry »

Posted in ASCII, Conference Topics, Conferences, Delphi, Delphi 10 Seattle, Delphi 10.1 Berlin (BigBen), Delphi 2005, Delphi 2006, Delphi 2007, Delphi 2009, Delphi 2010, Delphi XE, Delphi XE2, Delphi XE3, Delphi XE4, Delphi XE5, Delphi XE6, Delphi XE7, Delphi XE8, Development, Encoding, Event, Mojibake, Software Development | Leave a Comment »

Encoding is hard… so how did the single quote become a circumflexed a followed by Euro sign and trade mark?

Posted by jpluimers on 2016/10/04

A while ago (in fact more than a year ago), I posted “Encoding is hard…” to G+ with the picture below.

[Wayback] ftfy (“fixes text for you”, a parody on “fixed that for you”) [Wayback] fixes it, but:

How did the single quote become “â€™”?

Actually, because of a common “beautification” in many Office suites (Microsoft and Open alike), the single quote was a special one: a Unicode Character ‘RIGHT SINGLE QUOTATION MARK’ (U+2019), which in UTF-8 is encoded as 0xE2 0x80 0x99.
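
The round trip can be reproduced in a few lines of Python. This is a minimal sketch, assuming the three UTF-8 bytes were misread as Windows-1252 (which matches the circumflexed a, Euro sign and trade mark in the title); the variable names are only for illustration.

```python
RIGHT_SINGLE_QUOTE = "\u2019"  # RIGHT SINGLE QUOTATION MARK

# Step 1: the character is stored as three UTF-8 bytes.
utf8_bytes = RIGHT_SINGLE_QUOTE.encode("utf-8")
print(utf8_bytes)  # b'\xe2\x80\x99'

# Step 2: a reader that assumes Windows-1252 turns each byte into its own character:
# 0xE2 is a with circumflex, 0x80 is the Euro sign, 0x99 is the trade mark sign.
mojibake = utf8_bytes.decode("cp1252")
print(mojibake)  # â€™

# Step 3: undoing the wrong decode recovers the original character.
repaired = mojibake.encode("cp1252").decode("utf-8")
print(repaired == RIGHT_SINGLE_QUOTE)  # True
```

Step 3 only works while every byte survives the detour; once a tool has replaced or dropped one of the three characters, the original quote cannot be recovered this way.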

Read the rest of this entry »

Posted in Development, Encoding, ftfy, ISO-8859, ISO8859, Mojibake, Software Development, Unicode, UTF-8, UTF8, Windows-1252 | Leave a Comment »

installing the UTF-8 encoding ftfy (fixes text for you) – via version 3.0 | Luminoso Blog

Posted by jpluimers on 2016/09/06

Simple if you know it:

pip install ftfy

That installs it as a command, which is a lot easier than using it from GitHub at [Wayback] https://github.com/LuminosoInsight/python-ftfy

It knows how to solve the encoding issues in [Archive.is] the future of publishing at W3C, which explains WTF-8 and Unicode history.

It didn’t solve my non-Unicode encoding issue: [Wayback] “v3/43/4r” -> “v¾¾r” -> “vóór”.
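
One plausible path for this chain, sketched in Python: the excerpt does not say which code pages were involved, so the choice of Windows-1252 and code page 850 below is an assumption. It happens to fit because ó is byte 0xF3 in Windows-1252 and that same byte is ¾ in code page 850; the final step to plain ASCII “3/4” is modelled as a simple text replacement.

```python
original = "vóór"

# Assumption: the text was written as Windows-1252, where ó is the single byte 0xF3.
raw = original.encode("cp1252")
print(raw)  # b'v\xf3\xf3r'

# Assumption: the bytes were then read as code page 850, where 0xF3 is ¾.
misread = raw.decode("cp850")
print(misread)  # v¾¾r

# Assumption: a later step flattened the fraction to plain ASCII.
flattened = misread.replace("¾", "3/4")
print(flattened)  # v3/43/4r

# Reversing the wrong decode restores the word, as long as the ¾ characters survived.
repaired = misread.encode("cp850").decode("cp1252")
print(repaired)  # vóór
```

No UTF-8 is involved anywhere in this path, which would be consistent with a UTF-8-oriented fixer not recognising it.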

Read the rest of this entry »

Posted in Development, Encoding, ftfy, Mojibake, Software Development, Unicode, UTF-8, UTF8 | 4 Comments »

[NL] encoding remains hard, but why? (this time in a letter from @xs4all)

Posted by jpluimers on 2015/02/24

How hard can it be to get your encoding right?

This time from a letter from xs4all:

Mojibake encoding problem

If you put a diaeresis in a letter, surely you check that it is also printed correctly on the letter?

Read the rest of this entry »

Posted in Development, Encoding, ISO-8859, Mojibake, Software Development, Unicode, UTF-8, UTF8 | Leave a Comment »

Wrong error message @heldenvannu (registration: Pakketten | HELDEN VAN . NU)

Posted by jpluimers on 2013/03/29

If you enter postal code “1060 NP” for the connection and “1170 AB” for billing, you get these incorrect error messages:

  • Organisatie postcode is ongeldig (organisation postal code is invalid)
  • Facturatie gegevens postcode is ongeldig (billing details postal code is invalid)

A bit odd, because ever since postal codes were introduced in the Netherlands in 1978, a postal code contains 1 space, and there are 2 spaces between the postal code and the town.

Diaereses go wrong too: postal code 1060 NP belongs to the street Pyreneeën in Amsterdam, but at Heldenvan.nu it becomes this Mojibake:
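
A form could accept both spellings and normalise to the official one. The following Python sketch is an illustration only, not the code behind that form: the pattern is a simplified assumption (four digits with a non-zero first digit, an optional single space, two letters), and the name `normalize_postcode` is made up.

```python
import re

# Simplified pattern (assumption): four digits, the first one non-zero,
# an optional single space, then two letters.
POSTCODE = re.compile(r"[1-9][0-9]{3} ?[A-Za-z]{2}")


def normalize_postcode(text):
    """Return the postal code as 'NNNN LL', or None if it does not match the pattern."""
    candidate = text.strip()
    if not POSTCODE.fullmatch(candidate):
        return None
    compact = candidate.replace(" ", "").upper()
    return compact[:4] + " " + compact[4:]


print(normalize_postcode("1060 NP"))  # 1060 NP
print(normalize_postcode("1170ab"))   # 1170 AB
print(normalize_postcode("0123 AB"))  # None
```

Normalising on input means the user never sees an error for a postal code written the official way.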

Read the rest of this entry »

Posted in Development, Encoding, Mojibake, Opinions, Power User, Software Development, Unicode | Leave a Comment »

StUF – receiving data from a provider where UTF-8 is in fact ISO-8859

Posted by jpluimers on 2009/05/08

Recently, when receiving information from a StUF web service created by a large Dutch provider of government IT systems, we had an issue with characters having their high bit set.

Although the web service claimed to send its information as UTF-8, it was in fact encoding it using a form of ISO-8859.

The most likely character set they used is ISO-8859-1 (since that is the default encoding for the HTTP protocol), but it might also be ISO-8859-15, an adaptation of ISO-8859-1 that trades some typographic characters for the euro sign, some characters from French, and some characters used for the transliteration of Russian, Finnish and Estonian.
(Note that the printable characters of both ISO-8859-1 and ISO-8859-15 can be displayed by the Windows-1252 code page.)
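
To make the difference between these candidates concrete, here is a small Python sketch. It shows that bytes with the high bit set are usually not valid UTF-8, that ISO-8859-1 and ISO-8859-15 disagree on a byte such as 0xA4, and how a receiver might decode with an encoding chosen per provider instead of the declared one. The provider name, the `PROVIDER_ENCODINGS` table and the sample text are hypothetical; nothing here is taken from the StUF specification.

```python
# Hypothetical table: which encoding a given provider really uses (assumption).
PROVIDER_ENCODINGS = {"example-provider": "iso-8859-1"}


def is_valid_utf8(raw):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def decode_payload(raw, provider):
    """Decode with the encoding configured for this provider, defaulting to UTF-8."""
    encoding = PROVIDER_ENCODINGS.get(provider, "utf-8")
    return raw.decode(encoding)


# A payload that claims to be UTF-8 but was written as ISO-8859-1.
sample = "Pyreneeën".encode("iso-8859-1")
print(is_valid_utf8(sample))                       # False
print(decode_payload(sample, "example-provider"))  # Pyreneeën

# The same byte reads differently under the two ISO-8859 variants.
print(b"\xa4".decode("iso-8859-1"))   # ¤
print(b"\xa4".decode("iso-8859-15"))  # €
```

A failed UTF-8 decode only shows that the label is wrong; it does not reveal which single-byte encoding was really used, since any byte sequence decodes without error as ISO-8859-1.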

Since it is not possible to reliably “guess” the right encoding (there are way too many possibilities; even IsTextUnicode, which is used by Notepad, fails, see below), the only way is to use a fixed re-encoding that depends on the StUF data provider.

Read the rest of this entry »

Posted in Development, Encoding, ISO-8859, ISO8859, Mojibake, Software Development, The Old New Thing, Unicode, UTF-8, UTF8, Windows Development, XML, XML/XSD | 5 Comments »