Archive for the ‘UTF-8’ Category

UTF-8, Explained Simply – YouTube

Posted by jpluimers on 2026/03/04

Cool and interesting video: [Wayback/Archive] UTF-8, Explained Simply – YouTube

It covers the history from the late-1800s Baudot code (also known as ITA1), via 1930s ITA2 and the 1950s EBCDIC / FIELDATA era, through 7-bit ASCII in the 1970s and the incompatible UCS-2 (now UTF-16) of the 1990s, to the current day and age of UTF-8 (which actually started out on a placemat in 1992).

Though mentioning 8-bit encoding, it skips details of extended ASCII encodings like ISO/IEC 8859 and Windows-1252.

It goes to quite some length on decoding UTF-8 and showing how forgiving the UTF-8 standard is. Yes, it is a self-synchronising code thanks to the venerable Ken Thompson.
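The self-synchronising property can be illustrated with a minimal sketch. It assumes well-formed UTF-8 input, and the helper names `is_continuation` and `resync` are made up for this illustration (they do not come from the video). Every continuation byte has the bit pattern 10xxxxxx, so a decoder dropped at an arbitrary offset only has to skip those bytes to find the start of the next code point.

```python
def is_continuation(byte: int) -> bool:
    """True for UTF-8 continuation bytes, which always look like 10xxxxxx."""
    return (byte & 0b1100_0000) == 0b1000_0000


def resync(data: bytes, offset: int) -> int:
    """Return the first index >= offset at which a code point starts.

    Returns len(data) when only continuation bytes remain.
    """
    while offset < len(data) and is_continuation(data[offset]):
        offset += 1
    return offset


# 'a' takes 1 byte, 'ë' takes 2 bytes, the ghost emoji takes 4 bytes.
data = "a\u00eb\U0001F47B".encode("utf-8")

# Offset 2 points into the middle of 'ë'; resync finds the next boundary.
start = resync(data, 2)
print(start, len(data[start:].decode("utf-8")))
```

At most three bytes ever need to be skipped, because a UTF-8 code point is at most four bytes long.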

Definitely worth watching, as it also covers the zero-width joiner. Many people nowadays know it from combining emoji, but it was in fact introduced to support scripts such as Arabic and the Indic scripts.
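A small illustration of the emoji use of the zero-width joiner, using only the Python standard library. The choice of the "woman technologist" sequence is an assumption made for this example, not something taken from the video, and how it renders depends on the font.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER

# Woman + ZWJ + personal computer: fonts that support the sequence
# render these three code points as a single glyph.
sequence = "\U0001F469" + ZWJ + "\U0001F4BB"


def describe(text: str) -> list:
    """List each code point of text with its Unicode name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


for line in describe(sequence):
    print(line)
```

What looks like one character on screen is still three code points in the string.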

Oh, the placemat story: Read the rest of this entry »

Posted in ASCII, Development, EBCDIC, Encoding, ISO-8859, Software Development, UCS-2, Unicode, UTF-16, UTF-8, Windows-1252 | Leave a Comment »

Sequoiaview alternatives

Posted by jpluimers on 2025/06/12

I wrote about Sequoiaview in depth in SequoiaView Homepage, made some research notes in “cushion treemap” delphi – Google Search and touched on it briefly in A choco install list.

I never heard back from my request for Sequoiaview source code, and given ever increasing local storage media sizes, the speed of it now has become an issue, so I started looking to see if more alternatives have appeared and what sets them apart.

TL;DR

There is the open source WinDirStat that runs as non-admin and is about as slow as Sequoiaview
There is the closed source but free for personal use WizTree that requires admin elevation and is much faster than Sequoiaview and WinDirStat

Neither of them allows for a view that is cushion treemap only.

The reason that WizTree is fast is that it directly uses the NTFS MFT (Master File Table) to read the information from. This requires elevated permissions.

This is the same mechanism used by the Everything search tool, but unlike Everything, WizTree:

Read the rest of this entry »

Posted in C++, Development, Encoding, Mojibake, Software Development, UTF-8, Windows Development | Tagged: include | Leave a Comment »

As a tribute to their @isotopp handle history, Kris has now changed their name to KÃ¶hntopp

Posted by jpluimers on 2024/12/17

[Wayback/Archive] Jeroen Wiert Pluimers: “LOL, just saw @isotopp changed…” – Mastodon

LOL, just saw @isotopp changed his name to KÃ¶hntopp

Well done, Kris. Well done.

https://ftfy.vercel.app/?s=Ã¶

(the history of the isotopp handle is so great that I was glad I captured it from Twitter before that content got deleted; it is now at https://wiert.me/2022/06/09/how-isotopp-became-the-online-handle-of-kristian-kohntopp/ )

This Vercel app cannot be archived in the Wayback Machine properly, as it then returns an HTTP 500 error. The Archive.is save succeeded though: [Wayback/Archive] https://ftfy.vercel.app/?s=Ã¶:

Read the rest of this entry »

Posted in Development, Encoding, ISO-8859, ISO8859, Mojibake, Software Development, Unicode, UTF-8 | Leave a Comment »

The mojibake “creÃ«er”

Posted by jpluimers on 2024/08/22

A while ago, I found the “creÃ«er” mojibake in a Dutch page on the IKEA site.

They are not alone in making this mistake, which is easily explained using [Wayback/Archive] ftfy:

>>> import ftfy
>>> ftfy.fix_and_explain("creÃ«er")
ExplainedText(text='creëer', explanation=[('encode', 'latin-1'), ('decode', 'utf-8')])

(you can run this on-line at [Wayback/Archive] Welcome to Python.org: interactive shell, see my post The things I didn’t notice during cancer survival: ftfy 6.0 and more versions got released during my recovery on how to do this)
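The two steps in that explanation can be replayed with nothing but the standard library. This is a minimal sketch: the function name `undo_latin1_mojibake` is only illustrative, and unlike ftfy it makes no attempt to detect whether the text is mojibake in the first place.

```python
def undo_latin1_mojibake(text: str) -> str:
    """Reverse one round of 'UTF-8 bytes wrongly decoded as Latin-1'."""
    # Step 1, ('encode', 'latin-1'): get back the bytes that were misread.
    raw = text.encode("latin-1")
    # Step 2, ('decode', 'utf-8'): read those bytes the way they were meant.
    return raw.decode("utf-8")


print(undo_latin1_mojibake("creÃ«er"))
```

Text that is not this kind of mojibake typically raises a `UnicodeEncodeError` or `UnicodeDecodeError` here, which is one reason to prefer ftfy for real data.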

So the text is easily fixed:

Read the rest of this entry »

Posted in Development, Encoding, ftfy, ISO-8859, ISO8859, Software Development, Unicode, UTF-8, UTF8, Web Development | Leave a Comment »

A while ago I bumped into some GPI Mojibake examples, but soon found out I should use the ftfy test cases

Posted by jpluimers on 2022/11/22

I have been bumping into more and more Mojibake example pages like [Wayback] Mojibake: Question Marks, Strange Characters and Other Issues | GPI

Have you ever found strange characters like these �� when viewing content in applications or websites in other languages?

They made me realise that all these (including the Mojibake examples on my blog) are just artifacts, but the real list of examples is the set of ftfy test cases at [Wayback/Archive.is] python-ftfy/test_cases.json at master · LuminosoInsight/python-ftfy

I got reminded when Waternet moved from paper mail using “Pyreneeën” to email using “PyreneeÃ«n”. Not as bad as Waterschap AGV did earlier: they took it one level further and made “PyreneeÃÂ«n” out of it, see Last year, a classic Mojibake was introduced when Waterschap Amstel, Gooi en Vecht redesigned their IT systems.
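How such levels pile up can be sketched by running the mistake forwards. This assumes every level is "UTF-8 bytes read as Latin-1"; which code page the real systems used is not known (with Windows-1252 the second level would show an extra ƒ). `garble` is a hypothetical helper name.

```python
def garble(text: str, rounds: int = 1) -> str:
    """Apply the 'UTF-8 bytes decoded as Latin-1' mistake a number of times."""
    for _ in range(rounds):
        text = text.encode("utf-8").decode("latin-1")
    return text


once = garble("Pyreneeën", 1)
twice = garble("Pyreneeën", 2)

# ascii() is used because the second level contains an invisible
# C1 control character (U+0083) between the two visible accented letters.
print(ascii(once))
print(ascii(twice))
```

Each level roughly doubles the number of characters that stand in for the original "ë", while plain ASCII letters pass through unchanged.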

This seems like a trend where newer systems perform worse than older systems. I wonder why that is.

BTW: the trick on the [Wayback/Archive] Python.org shell to run ftfy (which is not installed by default) is first dropping to the shell (see my post How do I drop a bash shell from within Python? – Stack Overflow), then starting python again:

Read the rest of this entry »

Posted in CP850, Development, Encoding, ftfy, ISO-8859, Mojibake, Python, Scripting, Software Development, Unicode, UTF-8, UTF8 | Leave a Comment »

Unicode symbols in a batch file – Stack Overflow

Posted by jpluimers on 2022/06/30

Even when a batch file is saved as UTF-8 (with or without BOM), the console by default does not show most non-ASCII Unicode characters correctly.

The reason is that the default console codepage usually is not UTF-8 but an OEM one like codepage 437.

Thanks [Wayback] niutech for answering [Wayback/Archive.is] Unicode symbols in a batch file – Stack Overflow:

You can manually set the codepage to UTF-8 by typing chcp 65001 at the top of your batch file.

Codepage 65001 is Windows speak for the UTF-8 code page. I have some more blog entries mentioning codepage 65001.
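What goes wrong without `chcp 65001` can be simulated in Python. This is a sketch that assumes the console is on code page 437 (other code pages give different garbage); `as_seen_in_codepage` is a made-up helper name.

```python
def as_seen_in_codepage(text: str, codepage: str = "cp437") -> str:
    """Show how UTF-8 encoded text looks when read in another code page."""
    return text.encode("utf-8").decode(codepage, errors="replace")


# The two UTF-8 bytes of 'ë' (C3 AB) turn into two unrelated characters.
print(ascii(as_seen_in_codepage("\u00eb")))
# Plain ASCII survives, which is why the commands themselves still work.
print(as_seen_in_codepage("ping"))
```

Passing `"utf-8"` as the `codepage` argument returns the text unchanged, which is the situation after switching to code page 65001.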

An example where I needed this was to show how to address the localghost from a batch file (see The spookback localghost address to resolve 👻). This was the resulting UTF-8 saved batch file:

chcp 65001
ping 👻
ping xn--9q8h
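The second `ping` line is the ASCII form of the first one: `xn--` is the IDNA prefix and the rest is the Punycode encoding of the ghost emoji. Below is a sketch using Python's built-in `punycode` codec. It deliberately skips the normalisation and validation steps of full IDNA processing, and the helper names are illustrative only.

```python
ACE_PREFIX = "xn--"  # marks a label as Punycode-encoded


def to_ace_label(label: str) -> str:
    """Convert one host name label to its ASCII-compatible form."""
    if label.isascii():
        return label  # nothing to encode
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def from_ace_label(label: str) -> str:
    """Convert an 'xn--' label back to Unicode; other labels pass through."""
    if label.startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label


print(to_ace_label("\U0001F47B"))  # U+1F47B is the ghost emoji
```

This only handles a single label; a full host name would have to be split on dots first.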

For single-byte non-ASCII characters, you can usually get away with setting the encoding of your batch file to your default code page as mentioned in [Wayback/Archive.is] cmd – Using box-drawing Unicode characters in batch files – Stack Overflow.

–jeroen

Posted in Batch-Files, Development, Encoding, Scripting, Software Development, Unicode, UTF-8, Windows Development | Leave a Comment »

Get it at a discount while it is hot: Delphi Thread Safety Patterns eBook by Dalija Prasnikar and Neven Prasnikar Jr.

Posted by jpluimers on 2022/06/01

Get the new [Wayback/Archive] Delphi Thread Safety Patterns eBook at a discount while it is hot:

Use Coupon Code: DTSPATT10 at checkout to get a $10 discount.
This promotional offer is valid through June 14.

Read the rest of this entry »

Posted in Delphi, Development, Encoding, ISO-8859, ISO8859, Mojibake, Multi-Threading / Concurrency, Software Development, UTF-8, Windows-1252 | Leave a Comment »

Last year, a classic Mojibake was introduced when Waterschap Amstel, Gooi en Vecht redesigned their IT systems

Posted by jpluimers on 2022/03/16

Last year, Waterschap Amstel, Gooi en Vecht sent me a paper letter notifying me that the yearly water bill was going to be late, as they were redesigning their IT systems.

Their letter introduced a classic Mojibake that had not been present in any of their older paper letters.

Street name on a letter via the old IT systems is "Pyreneeën":
Street name on a letter via the new IT systems is "PyreneeÃÂ«n":

Read the rest of this entry »

Posted in Development, Encoding, ftfy, Mojibake, Python, Software Development, Unicode, UTF-8, UTF8 | Leave a Comment »

In this day and age, web sites with delivery back-ends still have Unicode issues: at least @Woonveilig, @Medireva and @PostNL still have trouble

Posted by jpluimers on 2022/02/09

Nowadays, some 35 years after the first Unicode ideas got drafted and 30+ years after the Unicode Consortium saw the light, UTF-8 is served by more than 95% of the web, as shown in yesterday’s post UTF-8 web adoption is huge, closing 100%, but only soured up since around 2006.

I mentioned this:

It means that nowadays there is a very small chance you will see mangled characters (what Japanese call mojibake) when you’re surfing the web.

Serving UTF8 does not mean no unicode problems.

Below are some issues that happened not too long ago and still happen. I have reported them to all parties involved through web-care, but got no response whatsoever. This is bad: Unicode support beyond basic ASCII in the systems below is still broken, even for relatively simple non-ASCII characters that consist of a diacritic decorating a standard ASCII character.

Yes, I know the realm of encoding and code pages is a mess, especially when handling data in multiple layers of an application stack. That’s why I wrote this post in the first place, and have a whole encoding category of blog posts plus a Mojibake subset.

Read the rest of this entry »

Posted in Communications Development, CP850, Dark Pattern, Development, Encoding, ISO-8859, ISO8859, Mojibake, Software Development, Unicode, User Experience (ux), UTF-16, UTF-8, Windows-1252 | Leave a Comment »

C# Effective way to find any file’s Encoding – Stack Overflow

Posted by jpluimers on 2022/02/09

Note: Notepad cannot always guess the encoding correctly, see “The Old New Thing”: [Wayback] Some files come up strange in Notepad | The Old New Thing (talking about ANSI a.k.a. Windows-1252, UTF-16LE, UTF-16BE, UTF-8 and UTF-7, some with and some without BOM, as Notepad does not understand all permutations)

David Cumps discovered that certain text files come up strange in Notepad. The reason is that Notepad has to edit files in a variety of encodings, and when its back is against the wall, sometimes it’s forced to guess.

[Wayback] C# Effective way to find any file’s Encoding – Stack Overflow shows how to detect various byte order marks in C#.
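The linked answer is in C#; the same idea in Python, as a rough sketch rather than a port of that answer. It only recognises byte order marks, so a file without one yields `None` and still needs guessing. `sniff_bom` and the returned labels are names invented here, and UTF-7 is left out.

```python
import codecs

# Order matters: the UTF-32 LE mark (FF FE 00 00) starts with the
# UTF-16 LE mark (FF FE), so the UTF-32 marks must be checked first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return an encoding label based on a leading BOM, or None without one."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


def sniff_file(path: str):
    """Read just enough of a file to look for a BOM."""
    with open(path, "rb") as handle:
        return sniff_bom(handle.read(4))
```

A UTF-16 LE file whose first character is U+0000 is indistinguishable from UTF-32 LE this way, which is one small example of why detection stays a guess.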

–jeroen

Posted in ASCII, Development, Encoding, Software Development, Unicode, UTF-16, UTF-32, UTF-8, UTF16, UTF32, UTF8 | Leave a Comment »

The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests
