I live in a street that has a non-ASCII character in it: Pyreneeën.
I’ve reverted to entering the street name as plain ASCII for a simple reason:
Too often the ë gets mangled into encoding gibberish, similar to the é example in [WayBack] When Good Characters Go Bad: A Guide to Diagnosing Character Display Problems, as these characters sit very close together both in UTF-8 and in the [WayBack] Unicode Characters in the Latin-1 Supplement Block.
I’ve seen these encodings, where only the top encoding is correct; the degeneration gets worse moving downwards, a classic Mojibake:
| # | encoded | UTF-8 (hex.) |
|---|---------|--------------|
| 0 | ë | 0xC3 0xAB |
| 1 | Ã« | 0xC3 0x83 0xC2 0xAB |
| 2 | ÃÂ« | 0xC3 0x83 0xC2 0x83 0xC3 0x82 0xC2 0xAB |
| 3 | ÃÂÃÂ« | 0xC3 0x83 0xC2 0x83 0xC3 0x82 0xC2 0x83 0xC3 0x83 0xC2 0x82 0xC3 0x82 0xC2 0xAB |
| 4 | ÃÂÃÂÃÂÃÂ« | 0xC3 0x83 0xC2 0x83 0xC3 0x82 0xC2 0x83 0xC3 0x83 0xC2 0x82 0xC3 0x82 0xC2 0x83 0xC3 0x83 0xC2 0x83 0xC3 0x82 0xC2 0x82 0xC3 0x83 0xC2 0x82 0xC3 0x82 0xC2 0xAB |
| 5 | `&euml;` | 0x26 0x65 0x75 0x6d 0x6c 0x3b |
The last one seldom happens, the first mangling relatively often, just like [Archive.is] fd.nl did for a while on their financial pages.
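The byte sequences in rows 1 to 4 are what you get when UTF-8 bytes are read back as ISO-8859-1 (Latin-1) and then saved as UTF-8 again, once per row. The sketch below reproduces that chain in Python. The helper names `mangle`, `unmangle` and `hex_bytes` are my own, purely for illustration, and I am assuming Latin-1 as the wrong encoding because that is what the hex values above match; the real systems involved may well have done something different internally.

```python
import html


def mangle(text: str, rounds: int) -> str:
    """Simulate `rounds` passes of 'UTF-8 bytes misread as Latin-1'."""
    for _ in range(rounds):
        text = text.encode("utf-8").decode("latin-1")
    return text


def unmangle(text: str, rounds: int) -> str:
    """Undo `rounds` passes; only valid if exactly that many happened."""
    for _ in range(rounds):
        text = text.encode("latin-1").decode("utf-8")
    return text


def hex_bytes(text: str) -> str:
    """Format the UTF-8 encoding of `text` like the table's hex column."""
    return " ".join(f"0x{b:02X}" for b in text.encode("utf-8"))


if __name__ == "__main__":
    # Rows 0 to 4: each round doubles the number of bytes.
    for n in range(5):
        print(n, hex_bytes(mangle("ë", n)))

    # Row 5 is a different failure: an HTML entity that was never decoded.
    print(5, hex_bytes("&euml;"), "->", html.unescape("&euml;"))
```

Because Latin-1 maps every byte to exactly one character, the damage is reversible as long as you know how many rounds took place: `unmangle(mangle("Pyreneeën", 3), 3)` gives back the original street name.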
These mistakes become sort of understandable (but not forgivable) when you look at the table fragment below (the full table is at [WayBack] Unicode/UTF-8-character table – starting from code position 0080).
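To see why the mix-up is so easy, a few rows of such a table can be regenerated with a short Python sketch. The function name `table_rows` and the chosen range (é through ë) are just my picks for illustration. Every character in this block encodes to the lead byte 0xC3 plus one more byte, and both bytes are perfectly valid Latin-1 characters on their own, so nothing complains when the wrong decoder is used.

```python
def table_rows(first: int, last: int):
    """Return (code point, character, UTF-8 hex, UTF-8 misread as Latin-1)."""
    rows = []
    for code_point in range(first, last + 1):
        char = chr(code_point)
        utf8 = char.encode("utf-8")
        hex_column = " ".join(f"0x{b:02X}" for b in utf8)
        # Reading the UTF-8 bytes with the wrong (Latin-1) decoder:
        misread = utf8.decode("latin-1")
        rows.append((f"U+{code_point:04X}", char, hex_column, misread))
    return rows


if __name__ == "__main__":
    for row in table_rows(0xE9, 0xEB):  # é, ê, ë
        print(" | ".join(row))
```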