The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests

  • My badges

  • Twitter Updates

  • My Flickr Stream

  • Pages

  • All categories

  • Enter your email address to subscribe to this blog and receive notifications of new posts by email.

    Join 4,262 other subscribers

Archive for May 8th, 2009

StUF – receiving data from a provider where UTF-8 is in fact ISO-8859

Posted by jpluimers on 2009/05/08

Recently when receiving information from a StUF webservice created by a large Dutch provider of government IT systems, we had an issue with characters having their high bit set.

Although the web-service pretended to send their information as UTF-8, in fact they were encoding using a form of ISO_8859.

The most likely character set they used is ISO-8859-1 (since that is the default encoding for the HTTP protocol), but it might also be ISO-8859-15 which is an adaption of ISO-8859-1 trading some typographic characters for the euro-sign and some characters from French and some characters used for transliteration of  Russian, Finnish and Estonian.
(note that the printable characters of both ISO-8859-1 and ISO-8859-15 can be displayed by the Windows-1252 code page)

Since it is not possible to reliably “guess” the right encoding (there are way to many possibilities, even IsTextUnicode that is used by Notepad fails, see below), the only way is to use a fixed reencoding that depends on the StUF data provider. Read the rest of this entry »

Posted in Development, Encoding, ISO-8859, ISO8859, Mojibake, Software Development, The Old New Thing, Unicode, UTF-8, UTF8, Windows Development, XML, XML/XSD | 5 Comments »