The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests

  • My badges

  • Twitter Updates

  • My Flickr Stream

  • Pages

  • All categories

  • Enter your email address to subscribe to this blog and receive notifications of new posts by email.

    Join 1,887 other followers

Archive for the ‘CP850’ Category

Which encoding failure did encode “vóór” into “v3/43/4r”? – Stack Overflow

Posted by jpluimers on 2020/02/24

From quite some time ago, but still very relevant as encoding issues keep occurring:

A while ago, I saw the text “v3/43/4r” in a document.I know it comes from “vóór” (the acute accent emphasises in Dutch), and wonder which encoding failure was applied to get this wrong.

Source: [WayBackWhich encoding failure did encode “vóór” into “v3/43/4r”? – Stack Overflow

From the [WayBack] answer by rodrigo:

  • ó: is U+00F3, and occupies the same codepoint (0xF3) in a lot of different encodings (most ISO-8859-* and most western Windows-*).
  • In CP850 the codepint 0xF3 is ¾ (U+00BE), that is the three-quarters character. It is the same in other, less used, codepages (CP775, CP856, CP857, CP858).
  • The ¾ is sometimes transliterated to 3/4 when the character is not directly available.

And there you are! “vóór” -> “v¾¾r” -> “v3/43/4r”.

The first part (ó -> ¾) is the usual corruption of ANSI vs. OEM codepages in the Western Windows versions (in my country ANSI=Windows-1252, OEM=CP850). You can see it easily creating a file with NOTEPAD, writing vóór and dumping it in a command prompt with type.

–jeroen

Posted in CP850, Development, Encoding, Software Development, UTF-8, UTF8, Windows-1252 | Leave a Comment »

 
%d bloggers like this: