Archive for the ‘Mojibake’ Category

I’ve given up on entering non-ASCII characters when entering data on-line

Posted by jpluimers on 2019/06/17

I live in a street that has a non-ASCII character in it: Pyreneeën.

I’ve gone back to entering the street name as plain ASCII for a simple reason:

Too often the ë gets mangled into encoding gibberish, similar to the é example in [WayBack] When Good Characters Go Bad: A Guide to Diagnosing Character Display Problems. The two characters sit very close together, both in UTF-8 and in the [WayBack] Unicode Characters in the Latin-1 Supplement Block:

UTF-8 0xC3 0xA9: [WayBack] Unicode Character ‘LATIN SMALL LETTER E WITH ACUTE’ (U+00E9)
UTF-8 0xC3 0xAB: [WayBack] Unicode Character ‘LATIN SMALL LETTER E WITH DIAERESIS’ (U+00EB)
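
To see why the two characters end up so close in UTF-8, here is a minimal sketch that applies the two-byte UTF-8 layout (110xxxxx 10xxxxxx) to both code points by hand. The helper name `utf8_two_byte` is made up for illustration; Python's built-in codec is only used as a cross-check.

```python
def utf8_two_byte(code_point: int) -> bytes:
    """Encode a code point in the range U+0080..U+07FF as two UTF-8 bytes."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("only code points U+0080..U+07FF use two bytes")
    lead = 0xC0 | (code_point >> 6)     # 110xxxxx: the top 5 bits
    trail = 0x80 | (code_point & 0x3F)  # 10xxxxxx: the low 6 bits
    return bytes([lead, trail])


for char in "éë":
    encoded = utf8_two_byte(ord(char))
    assert encoded == char.encode("utf-8")  # cross-check against the built-in codec
    print(f"U+{ord(char):04X}", " ".join(f"0x{b:02X}" for b in encoded))
```

Both characters share the lead byte 0xC3 and differ only in the trail byte, so they get mangled in the same way.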

I’ve seen these encodings, where only the top encoding is correct; the degeneration gets worse moving downwards, a classic Mojibake:

#	encoded	UTF-8 (hex.)
0	ë	`0xC3 0xAB`
1	Ã«	`0xC3 0x83 0xC2 0xAB`
2	ÃÂ«	`0xC3 0x83 0xC2 0x83 0xC3 0x82 0xC2 0xAB`
3	ÃÂÃÂ«	`0xC3 0x83 0xC2 0x83 0xC3 0x82 0xC2 0x83 0xC3 0x83 0xC2 0x82 0xC3 0x82 0xC2 0xAB`
4	ÃÂÃÂÃÂÃÂ«	`0xC3 0x83 0xC2 0x83 0xC3 0x82 0xC2 0x83 0xC3 0x83 0xC2 0x82 0xC3 0x82 0xC2 0x83 0xC3 0x83 0xC2 0x83 0xC3 0x82 0xC2 0x82 0xC3 0x83 0xC2 0x82 0xC3 0x82 0xC2 0xAB`
5	&euml;	`0x26 0x65 0x75 0x6d 0x6c 0x3b`

The last one seldom happens, the first one relatively often, as [Archive.is] fd.nl did for a while on their financial pages.
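
The byte sequences in rows 0 to 4 can be reproduced with a short sketch, assuming each round of damage is "encode as UTF-8, then wrongly decode as Latin-1 (ISO-8859-1)". The function names `mangle` and `hex_bytes` are made up for illustration. Row 5 is a different mistake: the HTML entity name ends up as literal text.

```python
from html.entities import codepoint2name


def mangle(text: str, rounds: int) -> str:
    """Encode as UTF-8 and wrongly decode as Latin-1, `rounds` times."""
    for _ in range(rounds):
        text = text.encode("utf-8").decode("latin-1")
    return text


def hex_bytes(data: bytes) -> str:
    """Format bytes the way the table does, e.g. '0xC3 0xAB'."""
    return " ".join(f"0x{b:02X}" for b in data)


# Rows 0..4: the byte count doubles with every round of mis-decoding.
for n in range(5):
    print(n, hex_bytes(mangle("ë", n).encode("utf-8")))

# Row 5: the entity name for U+00EB, sent as plain ASCII text.
entity = f"&{codepoint2name[ord('ë')]};"
print(5, entity, hex_bytes(entity.encode("ascii")))
```

Note that Latin-1 maps bytes 0x82 and 0x83 to invisible control characters, which is why the garbled strings look shorter on screen than their byte counts suggest.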

These mistakes become sort of understandable (but not forgivable) when you look at the table fragment below (the full table is at [WayBack] Unicode/UTF-8-character table – starting from code position 0080).

Read the rest of this entry »

Posted in Development, Encoding, Mojibake, Power User, Software Development, Unicode, Web Browsers | Leave a Comment »

Do not use non-ASCII characters as identifiers – not all your tools support them well enough

Posted by jpluimers on 2018/04/05

For a very long time I’ve discouraged people from using non-ASCII characters in identifiers. It still holds.

In the past, transliterations messed things up. Even with increased support for Unicode, tools still screw non-ASCII characters up.

Delphi is not alone in this; its most important problem is the DFM “view as text” support. See this report: [RSP-16767] Viewing a form as text fails with non ascii control or event names – Embarcadero Technologies (you need an account for this, but the report is visible to anyone):

Viewing a form as text fails with non ascii control or event names

Steps:

create a new VCL forms application

drop a label onto the form

change the name of that label to lblÜberfall (note the U-umlaut)

switch to view as text

exp: DFM content shown as text

act: first line is shown incorrectly (see screenshot)

–jeroen

Source: [RSP-16767] Viewing a form as text fails with non ascii control or event names – Embarcadero Technologies

via: [WayBack] Code of the day – – Thomas Mueller (dummzeuch) – Google+:

function TNameGenerator.StrasseToStrasse(const _Strasse: string): string;
begin
  Result := _Strasse;
end;

…

Strasse := StrasseToStrasse(_Strasse);
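
One way to act on this advice is to scan source text for identifier-like words that contain non-ASCII characters. The sketch below is a rough illustration, not a Delphi parser: the function name `non_ascii_identifiers` and the sample text are made up, and the regular expression also matches words inside string literals and comments.

```python
import re

# Unicode-aware identifier pattern: a letter or underscore, then word characters.
IDENTIFIER = re.compile(r"[^\W\d]\w*")


def non_ascii_identifiers(source: str) -> list[str]:
    """Return identifier-like words containing non-ASCII characters, in order of first use."""
    words = IDENTIFIER.findall(source)
    return list(dict.fromkeys(w for w in words if not w.isascii()))


# Hypothetical form text, loosely modelled on the label name from the report.
sample = "object lblÜberfall: TLabel\n  Caption = 'Label1'\nend"
print(non_ascii_identifiers(sample))
```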

Read the rest of this entry »

Posted in ASCII, Conference Topics, Conferences, Delphi, Delphi 10 Seattle, Delphi 10.1 Berlin (BigBen), Delphi 2005, Delphi 2006, Delphi 2007, Delphi 2009, Delphi 2010, Delphi XE, Delphi XE2, Delphi XE3, Delphi XE4, Delphi XE5, Delphi XE6, Delphi XE7, Delphi XE8, Development, Encoding, Event, Mojibake, Software Development | Leave a Comment »

Encoding is hard… so how did the single quote become a circumflexed a followed by Euro sign and trade mark?

Posted by jpluimers on 2016/10/04

A while ago (in fact more than a year), I posted Encoding is hard… to G+ with the below picture.

[Wayback] ftfy (“fixes text for you”, a parody on “fixed that for you”) [Wayback] fixes it, but:

How did the single quote become “â€™”?

Actually, because of a common “beautification” in many Office suites (Microsoft and Open alike), the single quote was a special one: a Unicode Character ‘RIGHT SINGLE QUOTATION MARK’ (U+2019), which in UTF-8 is encoded as 0xE2 0x80 0x99.
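
The three characters from the title can be reproduced with a small sketch, assuming the UTF-8 bytes were decoded as Windows-1252 (the code page this post is tagged with). The variable names are made up for illustration.

```python
quote = "\u2019"  # RIGHT SINGLE QUOTATION MARK

# The correct UTF-8 encoding is three bytes.
utf8 = quote.encode("utf-8")
print(utf8.hex(" "))  # e2 80 99

# Decoding those bytes as Windows-1252 turns each byte into its own character:
# 0xE2 -> a with circumflex, 0x80 -> Euro sign, 0x99 -> trade mark sign.
garbled = utf8.decode("cp1252")
print(garbled)

# Reversing the wrong step repairs the text, as long as nothing was lost on the way.
repaired = garbled.encode("cp1252").decode("utf-8")
assert repaired == quote
```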

Read the rest of this entry »

Posted in Development, Encoding, ftfy, ISO-8859, ISO8859, Mojibake, Software Development, Unicode, UTF-8, UTF8, Windows-1252 | Leave a Comment »

installing the UTF-8 encoding ftfy (fixes text for you) – via version 3.0 | Luminoso Blog

Posted by jpluimers on 2016/09/06

Simple if you know it:

pip install ftfy

That installs it as a command, which is a lot easier than using it from GitHub at [Wayback] https://github.com/LuminosoInsight/python-ftfy

It knows how to solve the encoding issues in [Archive.is] ÃƒÆ’Ã‚Æ’ÃƒÂ¢Ã‚â‚¬Ã‚Å¡ÃƒÆ’Ã‚â€šÃƒâ€šÃ‚Â the future of publishing at W3C explaining about WTF-8 and Unicode history.

It didn’t solve my non-Unicode encoding issue: [Wayback] “v3/43/4r” -> “v¾¾r” -> “vóór”.
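
One plausible reading of that chain, which this excerpt does not confirm: “vóór” was stored as Latin-1 bytes, displayed using code page 850 (where byte 0xF3 is ¾), and the ¾ was then transliterated to ASCII as “3/4”. The sketch below only shows that this assumption reproduces all three strings; the variable names are made up.

```python
original = "vóór"

# Assumption: Latin-1 bytes (ó is 0xF3) read back as code page 850 (0xF3 is ¾).
as_cp850 = original.encode("latin-1").decode("cp850")
print(as_cp850)

# Assumption: a later step flattened the fraction character to plain ASCII.
flattened = as_cp850.replace("¾", "3/4")
print(flattened)

# Undoing the code page mix-up is possible; undoing the ASCII flattening
# safely is not, because "3/4" can also be legitimate text.
assert as_cp850.encode("cp850").decode("latin-1") == original
```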

Read the rest of this entry »

Posted in Development, Encoding, ftfy, Mojibake, Software Development, Unicode, UTF-8, UTF8 | 4 Comments »

[NL] Encoding remains hard, but why? (this time in a letter from @xs4all)

Posted by jpluimers on 2015/02/24

How hard can it be to get your encoding right?

This time from a letter from xs4all:

Mojibake encoding problem

If you put a diaeresis in a letter, surely you check that it also gets printed correctly on that letter?

Read the rest of this entry »

Posted in Development, Encoding, ISO-8859, Mojibake, Software Development, Unicode, UTF-8, UTF8 | Leave a Comment »


The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests
