The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests


Web means Unicode

Posted by jpluimers on 2010/02/12

Google published an interesting graph of the encodings used on the web, generated from their internal data on indexed web pages.

A quick summary of popular encodings based on the graph:

  1. Unicode – almost 50% and rapidly rising
  2. ASCII – 20% and falling
  3. Western European* – 20% and falling
  4. Rest – 10% and falling

Conclusion: if you do something with the web, make sure you support Unicode.

When you are using Delphi, and need help with transitioning to Unicode: contact me.

–jeroen

* Western European encodings: Windows-1252, ISO-8859-1 and ISO-8859-15.

Reference: Official Google Blog: Unicode nearing 50% of the web.

Edit: 20100212T1500

Some people mentioned (in the comments or elsewhere) that some sites pretend to emit Unicode, but in fact they don’t.
This doesn’t relieve you of making sure you support Unicode: Don’t pretend you support Unicode, but do it properly!

Bad Unicode support is not limited to the visible web; it also shows up in applications that talk to the web and in web services. One of my own experiences is described in “StUF – receiving data from a provider where UTF-8 is in fact ISO-8859”, which shows a vendor getting Unicode support really wrong.
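To make the "labeled UTF-8 but really ISO-8859" situation concrete, here is a minimal Python sketch (Python rather than Delphi, purely for brevity). It assumes the mislabeled data is in a Western European encoding; the function name `decode_declared_utf8` and the fallback choice of `cp1252` are illustrative assumptions, not something taken from the StUF case itself.

```python
from typing import Tuple


def decode_declared_utf8(raw: bytes, fallback: str = "cp1252") -> Tuple[str, str]:
    """Decode bytes that claim to be UTF-8; fall back if they are not.

    Returns the decoded text plus the encoding that was actually used,
    so the caller can log or report mislabeled input.
    """
    try:
        # Strict decoding raises on byte sequences that are not valid UTF-8.
        return raw.decode("utf-8", errors="strict"), "utf-8"
    except UnicodeDecodeError:
        # Assumption: mislabeled data is Western European.
        return raw.decode(fallback, errors="replace"), fallback


honest = "Café".encode("utf-8")           # really UTF-8
mislabeled = "Café".encode("iso-8859-1")  # declared UTF-8, actually ISO-8859-1

print(decode_declared_utf8(honest))
print(decode_declared_utf8(mislabeled))
```

A fallback like this only hides the problem on the receiving side; the proper fix is still for the sender to emit what it declares.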

So: when you support Unicode, support it properly.

–jeroen

7 Responses to “Web means Unicode”

  1. “Don’t pretend you support Unicode, but do it properly!”

    That’s hilarious, because anyone that takes an existing pre-Delphi 2009 application and thinks that “doing Unicode properly” is simply a question of eliminating hints and warnings is only pretending to support Unicode.

  2. BarryOw said

    The example above contained a metatag example.

    It disappeared!!!

    • jpluimers said

      Mail me the example and I’ll try to edit your comment (almost anything at pluimers.com gets to me eventually, but using my first name speeds things up considerably).

      –jeroen

  3. BarryOw said

    Postscript:

    Example:

    But I wish Microsoft would finally fix Notepad/Edit to work with Unicode. Real Unicode pages cannot be copied and saved. :(

  4. BarryOw said

    This was discussed before.

    Apparently UTF-8 is just the tag in the header. UTF-8 is compatible with a code page, and as long as you don’t use non-Latin code page characters, no one can say that is false.

    What is happening is that people are beginning to use the tag in the header, or more probably web tools are including the tag for users, especially the growth of web-based homepage tools and community sites.

    But the contents are still code page based.

    • You should realize, though, that even then it’s quite a bit more than a “tag”. If any of those pages allow some sort of postback, then the postback parameters will be UTF-8 encoded. With customers from all around the world, you (your web site) will need to handle that properly, unless you want scrambled customer names, cities, etc. in your database.

      My point is that if you just see utf-8 as a “tag”, you will run into troubles very soon.
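
The postback point in the last reply can be illustrated with a small Python sketch. It assumes a browser posting a form from a page served as UTF-8; the field names and values are made up for the example, and only standard-library `urllib.parse` calls are used.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical form fields, as a browser would post them from a UTF-8 page.
form = {"name": "Jürgen", "city": "Kraków"}
body = urlencode(form, encoding="utf-8")  # percent-encoded UTF-8 bytes

# Server side: decoding with the encoding the page actually declared.
right = parse_qs(body, encoding="utf-8")

# Server side: treating the same bytes as a Western European code page.
wrong = parse_qs(body, encoding="iso-8859-1")

print(body)
print(right["name"][0], right["city"][0])
print(wrong["name"][0], wrong["city"][0])  # scrambled names

# Pure ASCII text is byte-identical in both encodings, which is why the
# mismatch goes unnoticed until the first non-ASCII character arrives.
print("Amsterdam".encode("utf-8") == "Amsterdam".encode("cp1252"))
print("é".encode("utf-8") == "é".encode("cp1252"))
```

The second decode is what ends up as scrambled customer names in a database when UTF-8 is treated as nothing more than a header tag.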
