In this day and age, web sites with delivery back-ends still have Unicode issues: at least @Woonveilig, @Medireva and @PostNL still have trouble

Posted by jpluimers on 2022/02/09

Nowadays, some 35 years after the first Unicode ideas got drafted and 30+ years after the Unicode Consortium saw the light, UTF-8 is served by more than 95% of the web, as shown in yesterday’s post UTF-8 web adoption is huge, closing 100%, but only soured up since around 2006.

I mentioned this:

It means that nowadays there is a very small chance you will see mangled characters (what Japanese call mojibake) when you’re surfing the web.

Serving UTF-8 does not mean there are no Unicode problems.

Below are some issues that happened not too long ago and still happen. I have reported them to all parties involved through web-care, but got no response whatsoever, and this is bad: Unicode support beyond basic ASCII in the systems below is still broken, even for relatively simple non-ASCII characters that consist of a standard ASCII character decorated with a diacritic.

Yes, I know the realm of encoding and code pages is a mess, especially when handling data in multiple layers of an application stack. That’s why I wrote this post in the first place, and have a whole encoding category of blog posts plus a Mojibake subset.

Woonveilig back-end serving non-ASCII while the front-end limits to an ASCII-subset

This is the series of tweets:

[Archive.is] Jeroen Wiert Pluimers on Twitter: “Funny: when you fill in a correct postal code and house number in the @WoonVeilig shop, but your street name contains a diacritic such as a diaeresis or an accent, you get "De Straatnaam bevat onjuiste tekens." (the street name contains invalid characters). No it doesn't: your front-end does not understand your back-end response. … “
[Archive.is] Jeroen Wiert Pluimers on Twitter: “Some other street names that also go wrong are listed in … Apparently Woonveilig and @MediReva are not alone in having systems that have problems with diacritics in street names.”
- [Wayback] Stempassen Rotterdam foutgedrukt: ä, é en ï verdwenen – Rijnmond
  
  Part of the Rotterdam voting passes for the Ukraine referendum has been misprinted. In the street names and locations of the polling stations, letters such as ï, ä and é were changed into unreadable characters.
  
  The so-called diacritics, such as the diaeresis and the acute accent, were printed incorrectly. In total, 13,558 voting passes failed.
  
  Among others, the characters disappeared for Buurthuis De Mozaïek, Wijkcentrum Oriënt, Pniël and primary school Minister Marga Klompé. Isaäc Hubertstraat, Rösener Manzstraat and Hammarskjöldplaats are missing their diacritics too.
[Archive.is] Jeroen Wiert Pluimers on Twitter: “According to the BAG (Basisregistratie Adressen en Gebouwen), all of UTF-8 is possible for street names, so systems should simply be able to handle that and be tested for it across the whole chain. The page .. contains a short extract from … about this.”

I summarised what happened in Dutch at [Wayback/Archive.is] Woonveilig postcode check geeft straatnaam met diakriet, maar user-interface validatie keurt dat af.:

Postal code/house number URL:

[Wayback/Archive.is] https://www.woonveilig.nl/engine/woonveilig__website__shop__order__address_c?building=1&zip=2717+BA

Validation result:

{"street":"C\u00e9sar Franckrode","houseNumber":1,"houseNumberAddition":"","postcode":"2717BA","city":"Zoetermeer","municipality":"Zoetermeer","province":"Zuid-Holland","rdX":91967,"rdY":453852,"latitude":52.06936633,"longitude":4.46788106,"bagNumberDesignationId":"0637200000256226","bagAddressableObjectId":"0637010000256227","addressType":"building","purposes":["residency"],"surfaceArea":103,"houseNumberAdditions":[""]}

Validation script URL:

[Wayback/Archive.is] https://www.woonveilig.nl/ENGINE/JAVASCRIPTS/WOONVEILIG/validation_c.js

Validation script fragment:

   this.setStreetNameBlur = function(element_o)
    {
        var valid_b = this.setValidationByRegExp(element_o, '^[A-Za-z\\. ]+$', 'De Straatnaam bevat onjuiste tekens.');

        if (valid_b)
            element_o.value = element_o.value.charAt(0).toUpperCase() + element_o.value.toLowerCase().substr(1);

In other words: the back-end correctly returns Unicode, but the frontend limits input to uppercase A-Z, lowercase a-z, period and space. Inconvenient, because technically speaking, Dutch street names are allowed to use any printable Unicode code point.
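The effect of that pattern can be reproduced in isolation. The sketch below uses the exact pattern string from the script fragment; `permissivePattern` and `isAccepted` are my own hypothetical names, and the permissive pattern is only an assumption about what a more tolerant check could look like, not something taken from the Woonveilig code or the BAG rules.

```javascript
// The pattern string exactly as it appears in the validation script fragment.
const originalPattern = new RegExp('^[A-Za-z\\. ]+$');

// Hypothetical alternative (an assumption, not Woonveilig code): accept any
// Unicode letter, combining mark or digit, plus period, apostrophes, space
// and hyphen. The trailing "-" in the class is a literal hyphen.
const permissivePattern = /^[\p{L}\p{M}\p{N}.'’ -]+$/u;

// Returns true when the street name passes the given pattern.
function isAccepted(pattern, streetName) {
    return pattern.test(streetName);
}

console.log(isAccepted(originalPattern, 'C\u00E9sar Franckrode'));   // false
console.log(isAccepted(originalPattern, 'Van Leeuwenhoeklaan'));     // true
console.log(isAccepted(permissivePattern, 'C\u00E9sar Franckrode')); // true
```

The `u` flag is what makes `\p{L}` and friends available, so this needs an engine with Unicode property escape support.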

An example is again in Zoetermeer with the valid street César Franckrode (not my street: I don’t live in Zoetermeer), which the back-end correctly returns as escaped Unicode characters and also ends up with "De Straatnaam bevat onjuiste tekens." in the user interface. [Wayback/Archive.is] www.woonveilig.nl/engine/woonveilig__website__shop__order__address_c?building=1&zip=2717+BA

{"street":"C\u00e9sar Franckrode","houseNumber":1,"houseNumberAddition":"","postcode":"2717BA","city":"Zoetermeer","municipality":"Zoetermeer","province":"Zuid-Holland","rdX":91967,"rdY":453852,"latitude":52.06936633,"longitude":4.46788106,"bagNumberDesignationId":"0637200000256226","bagAddressableObjectId":"0637010000256227","addressType":"building","purposes":["residency"],"surfaceArea":103,"houseNumberAdditions":[""]}
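That the escaped response is perfectly fine can be checked with a standard JSON parser. This is a minimal sketch on a shortened copy of the response above; the variable names are mine.

```javascript
// Shortened copy of the back-end response; "\\u00e9" in this source text is
// the six-character JSON escape \u00e9, just like on the wire.
const responseText = '{"street":"C\\u00e9sar Franckrode","houseNumber":1,"postcode":"2717BA","city":"Zoetermeer"}';

// JSON.parse turns the escape into the real character U+00E9.
const address = JSON.parse(responseText);

console.log(address.street);                             // César Franckrode
console.log(address.street.codePointAt(1).toString(16)); // e9
```

So the é only becomes a problem after parsing, when the street name meets the ASCII-only validation.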

Failing not just with non-ASCII printables

The same result occurs for a different street, this time in the city of Utrecht. Back in the day, the city replaced spaces with apostrophes: [Wayback/Archive.is] www.woonveilig.nl/engine/woonveilig__website__shop__order__address_c?building=1&zip=3543+EL

{"street":"Nat'King'Colestraat","houseNumber":1,"houseNumberAddition":"","postcode":"3543EL","city":"Utrecht","municipality":"Utrecht","province":"Utrecht","rdX":131693,"rdY":457113,"latitude":52.1017777,"longitude":5.04704835,"bagNumberDesignationId":"0344200000183532","bagAddressableObjectId":"0344010000180443","addressType":"building","purposes":["residency"],"surfaceArea":142,"houseNumberAdditions":[""]}

You guessed it: "De Straatnaam bevat onjuiste tekens."

Utrecht probably tried to avoid spaces in a street name, thereby offending the late Nat King Cole and his family by renaming him Nat’King’Cole (no high hopes that WordPress refrains from messing with the apostrophes here). The sign with the street name was correct though: the people making it did have sense. After a complaint that the street name was spelled incorrectly and didn’t match the signage, the city council indicated they would change the signage. [Archive.is] Google Street View indicated that in 2015 this had not happened yet:

Nat King Colestraat signage in Utrecht which on-line incorrectly is named Nat’King’Colestraat

Basically Utrecht traded a space problem for a quoting problem, waiting for a system that correctly works around the quoting problem. That will be a challenge, for instance with a street name like “d'Argtagnanstraat“. In that sense, Almere does better with [Wayback] Postcode Nat King Colestraat in Almere – Postcode bij adres.

Note that when writing this, I knew that there was a [Wayback] “d'Argtagnanstraat” in Wezet, Belgium, but missed that there was a [Wayback] “d'Artagnanlaan” in Maastricht, The Netherlands: [Wayback] Postcode d’Artagnanlaan in Maastricht – Postcode bij adres

Woonveilig mangles even more

Of course the above is not the only street name problem where the front-end disagrees with the back-end. The JavaScript front-end code forces the first letter of the street name to be uppercase and everything else to be lowercase, even if the back-end returned it correctly, or the user filled it in correctly.

In many cases, this does not hold, especially with street names containing spaces. An example with another street name in Zoetermeer is [Wayback/Archive.is] www.woonveilig.nl/engine/woonveilig__website__shop__order__address_c?building=150&zip=2713+RD:

{"street":"Van Leeuwenhoeklaan","houseNumber":150,"houseNumberAddition":"","postcode":"2713RD","city":"Zoetermeer","municipality":"Zoetermeer","province":"Zuid-Holland","rdX":93134,"rdY":452495,"latitude":52.05730863,"longitude":4.48514815,"bagNumberDesignationId":"0637200000209394","bagAddressableObjectId":"0637010000209395","addressType":"building","purposes":["residency"],"surfaceArea":79,"houseNumberAdditions":[""]}

The user interface incorrectly morphs this into “Van leeuwenhoeklaan” (note the lowercase l instead of the uppercase L): [Archive.is] Jeroen Wiert Pluimers on Twitter: “Indeed: “De Straatnaam bevat onjuiste tekens.”. It gets even crazier. … which neatly returns “Van Leeuwenhoeklaan” and number “150”, and that of course becomes… Indeed, no house number and the L becomes lowercase: “Van Leeuwenhoeklaan”… “
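The casing behaviour follows directly from the expression in the validation script fragment. Here it is isolated in a function; the name `normaliseLikeFrontEnd` is hypothetical, the expression itself is the one from the script.

```javascript
// Same expression as in the validation script fragment: uppercase the first
// character, then append the lowercased rest of the value.
function normaliseLikeFrontEnd(value) {
    return value.charAt(0).toUpperCase() + value.toLowerCase().substr(1);
}

console.log(normaliseLikeFrontEnd('Van Leeuwenhoeklaan')); // Van leeuwenhoeklaan
console.log(normaliseLikeFrontEnd('Nat King Colestraat')); // Nat king colestraat
```

Any street name with more than one capitalised word loses its capitals this way, no matter how correct the back-end response was.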

Medireva -> PostNL export

I tried the various tables with encoding issues from Mojibake – Wikipedia, but had a hard time finding the cause for this one:

[Archive.is] Jeroen Wiert Pluimers on Twitter: “Still need to figure out what went wrong here between the @MediReva and @PostNL systems where “PYRENEEËN” on the medireva side (including package label) becomes “PYRENEEÓN” on the PostNL side. Luckily the postal code and house number are correct, so delivery is OK.… “

At first I thought the Ë (U+00CB) to Ó (U+00D3) mapping was odd, as it does not match any of the popular character sets:

Ë – Wikipedia

Character information
Preview	Ë	ë
Unicode name	LATIN CAPITAL LETTER E WITH DIAERESIS	LATIN SMALL LETTER E WITH DIAERESIS
Encodings	decimal	hex	decimal	hex
Unicode	203	U+00CB	235	U+00EB
UTF-8	195 139	C3 8B	195 171	C3 AB
Numeric character reference	&#203;	&#xCB;	&#235;	&#xEB;
Named character reference	&Euml;	&euml;
Windows 1252; ISO 8859–1, 2, 3, 4, 9, 10, 14, 15, 16	203	CB	235	EB

Ó – Wikipedia

Character information
Preview	Ó	ó
Unicode name	LATIN CAPITAL LETTER O WITH ACUTE	LATIN SMALL LETTER O WITH ACUTE
Encodings	decimal	hex	decimal	hex
Unicode	211	U+00D3	243	U+00F3
UTF-8	195 147	C3 93	195 179	C3 B3
Numeric character reference	&#211;	&#xD3;	&#243;	&#xF3;
Named character reference	&Oacute;	&oacute;
EBCDIC family	238	EE	206	CE
Windows 1252; ISO 8859–1/2/3/9/10/13/14/15/16	211	D3	243	F3
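The Unicode and UTF-8 values in these character tables are easy to verify. A small sketch, assuming an environment with the standard `TextEncoder` (modern browsers and Node.js have it); the helper name `describe` is mine.

```javascript
// TextEncoder always encodes to UTF-8.
const encoder = new TextEncoder();

// Formats the code point and the UTF-8 bytes of a single character as hex.
function describe(character) {
    const codePoint = character.codePointAt(0).toString(16).toUpperCase().padStart(4, '0');
    const utf8 = Array.from(
        encoder.encode(character),
        byte => byte.toString(16).toUpperCase().padStart(2, '0')
    ).join(' ');
    return `U+${codePoint} UTF-8: ${utf8}`;
}

console.log(describe('\u00CB')); // U+00CB UTF-8: C3 8B
console.log(describe('\u00D3')); // U+00D3 UTF-8: C3 93
```

In ISO 8859-1 the single byte value equals the code point, so Ë is byte 0xCB and Ó is byte 0xD3 there.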

From the above tables, I concluded I was basically looking for character sets that had either:

Ë (U+00CB) at code point 0xD3 instead of 0xCB
Ó (U+00D3) at code point 0xCB instead of 0xD3

Then I realised this might be based on some old software with DOS or Windows heritage dating back to DOS code pages.

In Western Europe, during the DOS age, both Code page 850 and Code page 858 were popular. Code page 858 basically is code page 850 with code point 0xD5 (the dotless ı) replaced by the € Euro currency sign.

On Windows, the most popular code page was ISO 8859-1 (before the introduction of the Euro currency), which was succeeded by ISO 8859-15 after Euro notes and coins were introduced on 2002-01-01. Windows itself has a kind of superset of ISO 8859-1 in the form of Windows 1252. The differences between these code pages are as follows:

Byte	ISO 8859-1	ISO 8859-15	Windows 1252
80	N/A	N/A	€ 20AC
82	N/A	N/A	‚ 201A
83	N/A	N/A	ƒ 0192
84	N/A	N/A	„ 201E
85	N/A	N/A	… 2026
86	N/A	N/A	† 2020
87	N/A	N/A	‡ 2021
88	N/A	N/A	ˆ 02C6
89	N/A	N/A	‰ 2030
8A	N/A	N/A	Š 0160
8B	N/A	N/A	‹ 2039
8C	N/A	N/A	Œ 0152
8E	N/A	N/A	Ž 017D
91	N/A	N/A	‘ 2018
92	N/A	N/A	’ 2019
93	N/A	N/A	“ 201C
94	N/A	N/A	” 201D
95	N/A	N/A	• 2022
96	N/A	N/A	– 2013
97	N/A	N/A	— 2014
98	N/A	N/A	˜ 02DC
99	N/A	N/A	™ 2122
9A	N/A	N/A	š 0161
9B	N/A	N/A	› 203A
9C	N/A	N/A	œ 0153
9E	N/A	N/A	ž 017E
9F	N/A	N/A	Ÿ 0178
A4	¤ 00A4	€ 20AC	¤ 00A4
A6	¦ 00A6	Š 0160	¦ 00A6
A8	¨ 00A8	š 0161	¨ 00A8
B4	´ 00B4	Ž 017D	´ 00B4
B8	¸ 00B8	ž 017E	¸ 00B8
BC	¼ 00BC	Œ 0152	¼ 00BC
BD	½ 00BD	œ 0153	½ 00BD
BE	¾ 00BE	Ÿ 0178	¾ 00BE

Note that all other byte values are left out, as they map to the same code points in all three code pages.

For non-Unicode software on Windows, code page Windows 1252 still is the most popular one. But what if the underlying software was older and assumed a DOS code page, or was historically forced to use a DOS code page?

Bingo: both Code page 850 and Code page 858 have Ë at code point 0xD3, whereas Windows 1252, ISO 8859-1 and ISO 8859-15 have it at 0xCB. In the tables below, position 0xD3 holds Ë, which has Unicode code point U+00CB:

Code page 850
	_0	_1	_2	_3	_4	_5	_6	_7	_8	_9	_A	_B	_C	_D	_E	_F
8_ 128	Ç 00C7	ü 00FC	é 00E9	â 00E2	ä 00E4	à 00E0	å 00E5	ç 00E7	ê 00EA	ë 00EB	è 00E8	ï 00EF	î 00EE	ì 00EC	Ä 00C4	Å 00C5
9_ 144	É 00C9	æ 00E6	Æ 00C6	ô 00F4	ö 00F6	ò 00F2	û 00FB	ù 00F9	ÿ 00FF	Ö 00D6	Ü 00DC	ø 00F8	£ 00A3	Ø 00D8	× 00D7	ƒ 0192
A_ 160	á 00E1	í 00ED	ó 00F3	ú 00FA	ñ 00F1	Ñ 00D1	ª 00AA	º 00BA	¿ 00BF	® 00AE	¬ 00AC	½ 00BD	¼ 00BC	¡ 00A1	« 00AB	» 00BB
B_ 176	░ 2591	▒ 2592	▓ 2593	│ 2502	┤ 2524	Á 00C1	Â 00C2	À 00C0	© 00A9	╣ 2563	║ 2551	╗ 2557	╝ 255D	¢ 00A2	¥ 00A5	┐ 2510
C_ 192	└ 2514	┴ 2534	┬ 252C	├ 251C	─ 2500	┼ 253C	ã 00E3	Ã 00C3	╚ 255A	╔ 2554	╩ 2569	╦ 2566	╠ 2560	═ 2550	╬ 256C	¤ 00A4
D_ 208	ð 00F0	Ð 00D0	Ê 00CA	Ë 00CB	È 00C8	ı 0131	Í 00CD	Î 00CE	Ï 00CF	┘ 2518	┌ 250C	█ 2588	▄ 2584	¦ 00A6	Ì 00CC	▀ 2580
E_ 224	Ó 00D3	ß 00DF	Ô 00D4	Ò 00D2	õ 00F5	Õ 00D5	µ 00B5	þ 00FE	Þ 00DE	Ú 00DA	Û 00DB	Ù 00D9	ý 00FD	Ý 00DD	¯ 00AF	´ 00B4
F_ 240	SHY 00AD	± 00B1	‗ 2017	¾ 00BE	¶ 00B6	§ 00A7	÷ 00F7	¸ 00B8	° 00B0	¨ 00A8	· 00B7	¹ 00B9	³ 00B3	² 00B2	■ 25A0	NBSP 00A0

Code page 858
	_0	_1	_2	_3	_4	_5	_6	_7	_8	_9	_A	_B	_C	_D	_E	_F
8_ 128	Ç 00C7	ü 00FC	é 00E9	â 00E2	ä 00E4	à 00E0	å 00E5	ç 00E7	ê 00EA	ë 00EB	è 00E8	ï 00EF	î 00EE	ì 00EC	Ä 00C4	Å 00C5
9_ 144	É 00C9	æ 00E6	Æ 00C6	ô 00F4	ö 00F6	ò 00F2	û 00FB	ù 00F9	ÿ 00FF	Ö 00D6	Ü 00DC	ø 00F8	£ 00A3	Ø 00D8	× 00D7	ƒ 0192
A_ 160	á 00E1	í 00ED	ó 00F3	ú 00FA	ñ 00F1	Ñ 00D1	ª 00AA	º 00BA	¿ 00BF	® 00AE	¬ 00AC	½ 00BD	¼ 00BC	¡ 00A1	« 00AB	» 00BB
B_ 176	░ 2591	▒ 2592	▓ 2593	│ 2502	┤ 2524	Á 00C1	Â 00C2	À 00C0	© 00A9	╣ 2563	║ 2551	╗ 2557	╝ 255D	¢ 00A2	¥ 00A5	┐ 2510
C_ 192	└ 2514	┴ 2534	┬ 252C	├ 251C	─ 2500	┼ 253C	ã 00E3	Ã 00C3	╚ 255A	╔ 2554	╩ 2569	╦ 2566	╠ 2560	═ 2550	╬ 256C	¤ 00A4
D_ 208	ð 00F0	Ð 00D0	Ê 00CA	Ë 00CB	È 00C8	€ 20AC	Í 00CD	Î 00CE	Ï 00CF	┘ 2518	┌ 250C	█ 2588	▄ 2584	¦ 00A6	Ì 00CC	▀ 2580
E_ 224	Ó 00D3	ß 00DF	Ô 00D4	Ò 00D2	õ 00F5	Õ 00D5	µ 00B5	þ 00FE	Þ 00DE	Ú 00DA	Û 00DB	Ù 00D9	ý 00FD	Ý 00DD	¯ 00AF	´ 00B4
F_ 240	SHY 00AD	± 00B1	‗ 2017	¾ 00BE	¶ 00B6	§ 00A7	÷ 00F7	¸ 00B8	° 00B0	¨ 00A8	· 00B7	¹ 00B9	³ 00B3	² 00B2	■ 25A0	NBSP 00A0

So the problem likely is that the Ë is sent in code page 850/858 encoding as byte 0xD3, while the receiving system assumes ISO 8859-1 or ISO 8859-15 and interprets byte 0xD3 as Ó.
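This hypothesis can be replayed in a few lines. The sketch below is only an illustration of the suspected mechanism, not a reconstruction of the actual Medireva or PostNL software: `CP850_ENCODE` is a hand-typed excerpt of row D_ of the code page 850 table, and the function names are mine.

```javascript
// Excerpt of code page 850 (row D_ of the table): character -> byte value.
// Only a few entries are included; this is not a full code page.
const CP850_ENCODE = new Map([
    ['\u00CA', 0xD2], // Ê
    ['\u00CB', 0xD3], // Ë
    ['\u00C8', 0xD4], // È
]);

// Encodes text as CP850 bytes, for ASCII plus the excerpt above.
function encodeCp850(text) {
    return Array.from(text, character => {
        const codePoint = character.codePointAt(0);
        if (codePoint < 0x80) return codePoint; // ASCII is the same in CP850
        if (CP850_ENCODE.has(character)) return CP850_ENCODE.get(character);
        throw new Error(`No CP850 mapping in this excerpt for ${character}`);
    });
}

// ISO 8859-1 maps each byte to the Unicode code point with the same value.
function decodeLatin1(bytes) {
    return bytes.map(byte => String.fromCharCode(byte)).join('');
}

const sent = 'PYRENEE\u00CBN';                      // PYRENEEËN
const received = decodeLatin1(encodeCp850(sent));
console.log(received);                              // PYRENEEÓN
```

Byte 0xD3 also decodes to Ó in ISO 8859-15 and Windows 1252, so this alone does not tell which of the three the receiving side assumed.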

xs4all

As a side note: I never got a response to [NL] encoding blijft moeilijk, waarom toch? (dit keer in een brief van @xs4all) either, but also did not have the opportunity to re-test that myself.

I tried to figure out the cause, but these are so distinct that I have not found it yet.

There is still a plan to switch my fibre and VoIP ISP from [Archive.is] @xs4all to [Archive.is] @FreedomNetNL, so maybe one day…

Late 1960s: the first diacritic in a Dutch street name

Back in the late 1960s, Tilburg wanted to name a street after the composer Georg Friedrich Händel (yup, the English Wikipedia page incorrectly rewrote it as George Frideric Handel).

This was the era where (then still 7-bit!) EBCDIC ruled, Unicode was not even in its infancy, and ASCII only became a standard in 1969 – the year I was born.

Luckily, back then the German language already had digraph replacements (because of the origins of the German umlaut) like ä → ae, ö → oe, ü → ue, so Händel became Haendel, and in 1969 the street was named Haendellaan, as Tilburg was afraid “the Computer” (note singular!) would not cope well with diacritics.
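For what it is worth, that digraph replacement is trivial to express in code. A minimal sketch covering only the three lowercase mappings mentioned above; the names `DIGRAPHS` and `toDigraphs` are hypothetical.

```javascript
// The three replacements mentioned above: ä -> ae, ö -> oe, ü -> ue.
const DIGRAPHS = { '\u00E4': 'ae', '\u00F6': 'oe', '\u00FC': 'ue' };

// Normalises to NFC first, so a letter typed as base character plus
// combining diaeresis is matched as well.
function toDigraphs(text) {
    return text.normalize('NFC').replace(/[\u00E4\u00F6\u00FC]/g, match => DIGRAPHS[match]);
}

console.log(toDigraphs('H\u00E4ndel'));      // Haendel
console.log(toDigraphs('H\u00E4ndellaan'));  // Haendellaan
```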

With the advent of the Dutch postal code system in 1977 (yes, in addition to a phone book, in 1978 everybody got a “postcodeboek” as well), the street ended up with just one postal code: [Wayback] Postcode Haendellaan in Tilburg – Postcode bij adres.

[Archive.is] straatnamen on Twitter: “That is of course an – easily preventable – fault of their website and not of your street name. (As an aside: when the municipality of Tilburg wanted to name a street after Händel in 1969, people were afraid that ‘the computer’ would not know what to do with that ä, so they turned it into Haendellaan.)… “
[Archive.is] Jeroen Wiert Pluimers on Twitter: “1969, when EBCDIC was still dominant, ASCII had just been born and I came into the world. … “
- [Wayback] How it was: ASCII, EBCDIC, ISO, and Unicode – EDN
  
  There’s an old engineering joke that says: “Standards are great … everyone should have one!” The problem is that – very often – everyone does.
  
  EBCDIC character codes.

Buildings and address registration in The Netherlands: BAG

The BAG public register (Basisregistratie Adressen en Gebouwen – Wikipedia) has current information on all addresses and buildings.

FAQ: [Wayback] Basisregistraties Adressen en Gebouwen – PDF Gratis download [Wayback] 17470107.pdf
Small extract about legal code points in street names: [Wayback] Welke lettertekens mogen in de BAG gebruikt worden voor woonplaatsnamen en straatnamen? | test

–jeroen

This entry was posted on 2022/02/09 at 18:00 and is filed under Communications Development, CP850, Dark Pattern, Development, Encoding, ISO-8859, ISO8859, Mojibake, Software Development, Unicode, User Experience (ux), UTF-16, UTF-8, Windows-1252.

The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests
