The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests

  • My badges

  • Twitter Updates

  • My Flickr Stream

  • Pages

  • All categories

  • Enter your email address to subscribe to this blog and receive notifications of new posts by email.

    Join 2,731 other followers

UTF-8 web adoption is huge, closing 100%, but only soured up since around 2006.

Posted by jpluimers on 2022/02/08

As a precursor to a post tomorrow showing that serving UTF8 does not mean organisations go without unicode problems, first some statistics.

The first Unicode ideas got drafted some 30 years ago in 1987. In 1991, more than 30 years ago, the Unicode Consortium saw the light. Nowadays more than 95% percent of the web-pages (close to 100% when you include plain ASCII) is served using the UTF-8 encoding.

It means that nowadays there is a very small chance you

will see mangled characters (what Japanese call mojibake) when you’re surfing the web.

Some nice graphs of unicode growth are at these locations are at these locations:

I think especially important are 2008 (when UTF-8 had outgrown all other individual encodings) and slightly after 2010, when UTF-8 alone covered more than 50% of the pages served. These exclude ASCII-only pages. Adding those would make the figures even larger.

graph showing a steep rise in the use of UTF-8 and a steep decline in other major encodings

Historical yearly trends in the usage statistics of character encodings for websites, June 2021

Historical yearly trends in the usage statistics of character encodings for websites, June 2021

–jeroen

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

 
%d bloggers like this: