Archive for the ‘Unicode’ Category

MacOS and Windows: sorting – Simple to enter Unicode character that would sort after Z in most cases? – Stack Overflow

Posted by jpluimers on 2026/03/10

TL;DR: There is no simple character that works on both MacOS and Windows.

[Wayback/Archive] sorting – Simple to enter Unicode character that would sort after Z in most cases? – Stack Overflow (thanks [Wayback/Archive] sorin and [Wayback/Archive] degenerate):


On Windows, none of these options work because they all sort before A.

A solution I ended up using is an Arabic character:

ٴ This folder comes after z in windows

Source

According to [Wayback/Archive] What Unicode character is this ?, the above mentioned character is U+0674 : ARABIC LETTER HIGH HAMZA.

Note that on Windows the ٴ character displays at the start of the filename, but in the MacOS Finder it ends up after the extension (as Arabic script is right-to-left) and is very hard to remove. In the MacOS Terminal it shows up on the left and is easy to modify.
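A small Python sketch (the folder names are made up) showing two things. First, in raw code-point order, which is Python's default string order, U+0674 lands far after the Latin letters, though even `~` (U+007E) already sorts after `z` there. Windows Explorer does not sort by code point but, as far as I know, uses its own locale-aware comparison, so this only explains why the character ends up "far away" from Latin letters, not Explorer's exact order. Second, the character's bidirectional class is `AL` (Arabic letter), a strong right-to-left class, which is why Finder lays the name out right-to-left:

```python
import unicodedata

HIGH_HAMZA = "\u0674"  # ARABIC LETTER HIGH HAMZA, the character from the answer above


def describe(ch: str) -> str:
    """Code point, Unicode name and bidirectional class of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} (bidi class {unicodedata.bidirectional(ch)})"


# Hypothetical folder names, sorted by raw code point (not by Explorer's rules).
folders = ["zebra", "Alpha", "~tilde", "_underscore", HIGH_HAMZA + " This folder comes after z in windows"]
for name in sorted(folders):
    print(name)
print(describe(HIGH_HAMZA))
```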

Read the rest of this entry »

Posted in Apple, Encoding, Mac OS X / OS X / MacOS, Power User, Unicode, Windows | Leave a Comment »

UTF-8, Explained Simply – YouTube

Posted by jpluimers on 2026/03/04

A cool and interesting video: [Wayback/Archive] UTF-8, Explained Simply – YouTube

It covers the history from the late-1800s Baudot code (also known as ITA1), via 1930s ITA2 and the FIELDATA and EBCDIC era of the late 1950s and 1960s, through 7-bit ASCII (first standardised in the 1960s) and the incompatible UCS-2 of the 1990s (later extended into UTF-16), to the current age of UTF-8 (which actually started out on a placemat in 1992).

Though it mentions 8-bit encodings, it skips the details of extended ASCII encodings like ISO/IEC 8859 and Windows-1252.

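It goes to quite some length explaining how UTF-8 is decoded and how forgiving the encoding is: it is self-synchronising (after a corrupted or missing byte, a decoder can find the start of the next character on its own), thanks to the design by the venerable Ken Thompson.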
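To make the decoding and self-synchronisation part concrete, here is a minimal hand-rolled encoder in Python. It is a sketch for illustration only: real code should just use `str.encode("utf-8")`, and this sketch does not reject surrogate code points. The helper names are made up. Continuation bytes always have the form `10xxxxxx`, so a decoder dropped at an arbitrary offset only has to skip those to find the next character start:

```python
def utf8_encode(cp: int) -> bytes:
    """Encode a single code point following the UTF-8 bit layout.

    Sketch only: surrogates (U+D800..U+DFFF) are not rejected here.
    """
    if cp < 0x80:      # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:     # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:   # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
    if cp <= 0x10FFFF:  # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                      0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
    raise ValueError(f"not a Unicode code point: {cp:#x}")


def is_continuation(byte: int) -> bool:
    """Continuation bytes are the only ones of the form 10xxxxxx."""
    return (byte & 0xC0) == 0x80


def resync(data: bytes, pos: int) -> int:
    """Skip forward from pos to the first byte that starts a character."""
    while pos < len(data) and is_continuation(data[pos]):
        pos += 1
    return pos


# Köhntopp, the high hamza, a smiley and the euro sign: 1- to 4-byte sequences.
text = "K\u00f6hntopp \u0674 \U0001F642 \u20ac"
assert all(utf8_encode(ord(ch)) == ch.encode("utf-8") for ch in text)
```

For example, with `data = "é€".encode("utf-8")`, starting at the continuation byte at offset 1 makes `resync(data, 1)` return 2, and `data[2:].decode("utf-8")` yields just the euro sign.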

Definitely worth watching, as it also covers the zero-width joiner (ZWJ, U+200D). Nowadays it is best known for combining Emoji, but it was in fact introduced to support scripts like Arabic and the Indic scripts.
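To see the zero-width joiner at work, here is a small Python sketch that lists the code points of the rainbow flag emoji: one visible symbol, but four code points, with U+200D in the middle gluing the white flag (plus variation selector 16, which requests emoji presentation) to the rainbow:

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER
rainbow_flag = "\U0001F3F3\uFE0F" + ZWJ + "\U0001F308"  # white flag + VS16, joined to rainbow


def explain(seq: str) -> list[str]:
    """List each code point in a sequence with its Unicode name."""
    return [f"U+{ord(c):04X} {unicodedata.name(c, '<unnamed>')}" for c in seq]


for line in explain(rainbow_flag):
    print(line)

# Splitting on the joiner gives back the individual components.
print(len(rainbow_flag), rainbow_flag.split(ZWJ))
```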

Oh, the placemat story: Read the rest of this entry »

Posted in ASCII, Development, EBCDIC, Encoding, ISO-8859, Software Development, UCS-2, Unicode, UTF-16, UTF-8, Windows-1252 | Leave a Comment »

Reminder to self: Remeha Calenta links

Posted by jpluimers on 2025/09/29

Just a reminder to myself, because we have a Remeha Calenta Ace and it is not entirely clear whether the previous occupants really had it properly tuned to the house.

The downside of Remeha is that their manuals are utterly chaotic and unclear, making it hard to find essential settings in them: there are far too many settings, all with confusing numbering in which whole ranges of numbers are skipped.

The Remeha Calenta Ace series has by now been in production for about 7 years.

Read the rest of this entry »

Posted in Development, DIY, Encoding, LifeHacker, Power User, Software Development, Unicode | Leave a Comment »

Unicode subscripts and superscripts: Latin, Greek, Cyrillic, and IPA tables; Source: Small caps: Unicode – Wikipedia

Posted by jpluimers on 2025/03/05

I originally searched for the tables below to see if I could render the TeX and LaTeX logos correctly for the infinite loop in “LaTeX: A Document Preparation System” by Leslie Lamport, printed in 1994.

That didn’t work, and neither did plain HTML superscript and subscript. The only thing that worked was using CSS styles (I chose to embed them inline, as separate CSS files require a much more expensive WordPress plan), which also preserves the actual text for screen readers:
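For reference, the inline-style idea can be sketched as a small Python helper that emits the HTML. The style values are an assumption (modelled on commonly used TeX/LaTeX logo CSS), not necessarily the exact values in this post, and the helper names are made up. The trick is to keep plain letters in the underlying text and let CSS do the lowering, raising and uppercasing:

```python
import re

# Hypothetical inline-style values; tweak them for your font.
LOWERED_E = ("text-transform:uppercase;vertical-align:-0.5ex;"
             "margin-left:-0.1667em;margin-right:-0.125em")
RAISED_A = ("text-transform:uppercase;font-size:0.75em;vertical-align:0.25em;"
            "margin-left:-0.36em;margin-right:-0.15em")


def tex_logo() -> str:
    """TeX with a lowered E; the underlying text is still the letters T, e, X."""
    return f'T<span style="{LOWERED_E}">e</span>X'


def latex_logo() -> str:
    """LaTeX: a small raised A in front of the TeX logo."""
    return f'L<span style="{RAISED_A}">a</span>{tex_logo()}'


def text_content(html: str) -> str:
    """Strip tags: roughly the plain text left once the styling is ignored."""
    return re.sub(r"<[^>]*>", "", html)


print(latex_logo())
print(text_content(latex_logo()))
```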

Read the rest of this entry »

Posted in accessibility (a11y), CSS, Development, HTML, Power User, Software Development, Unicode, URL Encoding, User Experience (ux), Web Development | Leave a Comment »

As a tribute to their @isotopp handle history, Kris has now changed their name to KÃ¶hntopp

Posted by jpluimers on 2024/12/17

[Wayback/Archive] Jeroen Wiert Pluimers: “LOL, just saw @isotopp changed…” – Mastodon

LOL, just saw @isotopp changed his name to KÃ¶hntopp

Well done, Kris. Well done.

https://ftfy.vercel.app/?s=Ã¶

(the history of the isotopp handle is so great that I was glad I captured it from Twitter before that content got deleted; it is now at https://wiert.me/2022/06/09/how-isotopp-became-the-online-handle-of-kristian-kohntopp/)
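The mojibake in the new name is easy to reproduce: encode “Köhntopp” as UTF-8 and then (wrongly) decode those bytes as Latin-1, so the two bytes of ö each become a character of their own. Decoding as Windows-1252 gives the same result here, as both map the bytes C3 and B6 to Ã and ¶. A Python sketch:

```python
name = "K\u00f6hntopp"                   # Köhntopp
utf8_bytes = name.encode("utf-8")        # ö (U+00F6) becomes the two bytes C3 B6
mojibake = utf8_bytes.decode("latin-1")  # each byte wrongly read as one Latin-1 character

print(utf8_bytes.hex(" "))
print(mojibake)
```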

This Vercel app cannot be properly archived in the Wayback Machine, as it then returns an HTTP 500 error. The Archive.is save succeeded though: [Wayback/Archive] https://ftfy.vercel.app/?s=Ã¶:

Read the rest of this entry »

Posted in Development, Encoding, ISO-8859, ISO8859, Mojibake, Software Development, Unicode, UTF-8 | Leave a Comment »

Unicode: Keyboard Symbols ⌘ ↵ ⌫

Posted by jpluimers on 2024/12/11

I wish I had bumped into this page way sooner, as it contains most, if not all, of the keyboard symbols I have ever looked for: [Wayback/Archive] Unicode: Keyboard Symbols ⌘ ↵ ⌫

The page contains a lot more than just this diagram (which is already a great start):

⎋
 ` 1 2 3 4 5   6 7 8 9 0  - = ⌫    ⎀ ⤒ ⇞
 ⇥ Q W E R T   Y U I O P  [ ] \    ⌦ ⤓ ⇟
 🄰  A S D F G   H J K L ;  ' ↵
 ⇧   Z X C V B   N M , . /  ⇧        ↑
 ⎈ ❖ ⎇    ␣    ⎇ ❖ ▤ ⎈           ← ↓ →

🌐 ⌃ ⌥ ⌘
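If you need these symbols in code rather than on a web page, Python's `unicodedata` module can give you each symbol's code point and official Unicode name, and `unicodedata.lookup` goes the other way. A small sketch using a handful of the symbols from the diagram (the helper name is made up):

```python
import unicodedata

# A handful of symbols from the diagram above: ⎋ ⌫ ⌦ ↵ ⇥ ⇧ ⌃ ⌥ ⌘
KEY_SYMBOLS = "\u238b\u232b\u2326\u21b5\u21e5\u21e7\u2303\u2325\u2318"


def key_table(symbols: str) -> list[tuple[str, str, str]]:
    """(symbol, code point, Unicode name) for each character."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in symbols]


for symbol, code_point, name in key_table(KEY_SYMBOLS):
    print(f"{symbol}  {code_point:<7} {name}")
```

Going the other way, `unicodedata.lookup("PLACE OF INTEREST SIGN")` returns the ⌘ symbol.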

Some more symbols are at these pages:

Read the rest of this entry »

Posted in Development, Encoding, Hardware, Keyboards and Keyboard Shortcuts, KVM keyboard/video/mouse, Power User, Software Development, Unicode | Leave a Comment »

Unicode spaces (not just en and em, but also em fractions 1/2, 1/3, 1/4, 1/6, 1/5, 4/18 and remarks)

Posted by jpluimers on 2024/10/03

For my link archive: [Wayback/Archive] Unicode spaces (please check the page itself, as by now its table might have changed from what I quote below, and the WordPress classic editor might have mangled my copy).

I like the table as it embeds each space between foo and bar, so it is easy to copy and paste them into code or documentation.
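A small Python sketch that rebuilds the foo/bar idea for a selection of space characters (my own selection, not necessarily the page's full list). It also shows a pitfall: all of these have general category `Zs` and count as whitespace for `str.isspace()`, but U+200B ZERO WIDTH SPACE is a format character (`Cf`) and does not:

```python
import unicodedata

# My own selection of Unicode space characters (not necessarily the page's list).
SPACE_CODE_POINTS = [
    0x0020, 0x00A0, 0x2002, 0x2003, 0x2004, 0x2005, 0x2006,
    0x2007, 0x2008, 0x2009, 0x200A, 0x202F, 0x205F, 0x3000,
]


def space_table() -> list[tuple[str, str, str]]:
    """(code point, Unicode name, 'foo<space>bar') rows, ready for copy/paste."""
    return [
        (f"U+{cp:04X}", unicodedata.name(chr(cp)), f"foo{chr(cp)}bar")
        for cp in SPACE_CODE_POINTS
    ]


for code_point, name, sample in space_table():
    print(f"{code_point}  {name:<28} {sample}")

# The zero width space is not a Zs space and is not whitespace for str.isspace().
print("U+200B", unicodedata.category("\u200b"), "\u200b".isspace())
```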

Read the rest of this entry »

Posted in Development, Encoding, Software Development, Unicode | Leave a Comment »

The mojibake “creÃ«er”

Posted by jpluimers on 2024/08/22

A while ago, I found the “creÃ«er” mojibake on a Dutch page of the IKEA site.

They were not the only ones to make this mistake, which is easily explained using [Wayback/Archive] ftfy:

>>> import ftfy
>>> ftfy.fix_and_explain("creÃ«er")
ExplainedText(text='creëer', explanation=[('encode', 'latin-1'), ('decode', 'utf-8')])

(you can run this online at [Wayback/Archive] Welcome to Python.org: interactive shell; see my post The things I didn’t notice during cancer survival: ftfy 6.0 and more versions got released during my recovery for how to do this)
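The two steps in ftfy's explanation can also be replayed with just the standard library. A minimal sketch: the helper name is made up, and unlike ftfy it applies no heuristics at all. It only handles this one Latin-1/UTF-8 mix-up and leaves the text unchanged when the round trip fails:

```python
def undo_latin1_mojibake(text: str) -> str:
    """Hypothetical helper: replay ftfy's two steps (encode latin-1, decode utf-8)."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text  # not this particular mix-up: leave the text alone


print(undo_latin1_mojibake("cre\u00c3\u00aber"))  # the mojibake "creÃ«er"
```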

So the text is easily fixed:

Read the rest of this entry »

Posted in Development, Encoding, ftfy, ISO-8859, ISO8859, Software Development, Unicode, UTF-8, UTF8, Web Development | Leave a Comment »

The regexp for an emoticon ?

Posted by jpluimers on 2024/08/08

I responded to [Wayback/Archive] jilles.com on Twitter: “@0xD4ni @Twitter What is the regexp for an emoticon ?” with [Wayback/Archive] Jeroen Wiert Pluimers on Twitter: “@jilles_com @0xD4ni @Twitter \p{So}+ See …”.

I got the answer from [Wayback/Archive] java – What is the regex to extract all the emojis from a string? – Stack Overflow (thanks [Wayback/Archive] vishalaksh, and [Wayback/Archive] Desgard_Duan) which refers to the quoted section below.

Note that whether matching works correctly highly depends on the versions of the libraries you use: there have been many Unicode releases over the last years (since 2014 roughly one every 12 months), each usually adding more Emoji.

In addition, many Emoji are not single Unicode code points: often they are sequences of code points (with or without variation selectors) joined together with zero-width joiners, as I described in Kris on Twitter: “Company chat: »Right, we need more languages with Emoji as variable type indicators and pointer symbols.«….
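Python's built-in `re` module does not support `\p{So}`; the third-party `regex` module does (and also offers `\X` for whole grapheme clusters). To show why `\p{So}+` alone falls short, this sketch mimics it using only the standard library, grouping maximal runs of code points whose general category is `So` (Symbol, other). The helper name is made up. The zero-width joiner (category `Cf`) splits a family emoji into three matches, and a skin-tone modifier (category `Sk`) is silently dropped:

```python
import unicodedata
from itertools import groupby


def so_runs(text: str) -> list[str]:
    r"""Emulate the regex \p{So}+ : return maximal runs of 'Symbol, other' code points."""
    return [
        "".join(run)
        for is_so, run in groupby(text, key=lambda ch: unicodedata.category(ch) == "So")
        if is_so
    ]


family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"  # man ZWJ woman ZWJ girl
thumbs_up_medium = "\U0001F44D\U0001F3FD"               # thumbs up + skin-tone modifier

print(so_runs(family))            # three separate matches instead of one emoji
print(so_runs(thumbs_up_medium))  # the modifier (category Sk) is dropped
```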

I tried fiddling with [Wayback/Archive] regex101: build, test, and debug regex but could not always get it to work as I hoped, nor could I figure out how recent their libraries are.

Read the rest of this entry »

Posted in Conference Topics, Conferences, Development, Emoticons, Encoding, Event, Geeky, RegEx, Software Development, Unicode | Leave a Comment »

Some notes on codepoints.net and beta.codepoints.net

Posted by jpluimers on 2024/08/07

By the time you read this, a lot of this might have changed, but for quite some time codepoints.net had not been updated with code point information from newer Unicode releases.

Basically it was stuck at Unicode version 8.0 with some 120k characters. At the time of writing, Unicode version 15.0 is in beta, and the current 14.0 release already has some 24k more characters than 8.0.
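Stale Unicode data is not limited to web sites: every library ships a snapshot of the Unicode Character Database. In Python you can check which version via `unicodedata.unidata_version`, and roughly count how many code points it knows about. A sketch (the helper names are made up; the count excludes unassigned, private-use and surrogate code points, so it is close to, but not necessarily identical to, the character totals Unicode publishes per version):

```python
import sys
import unicodedata


def unicode_db_version() -> str:
    """Unicode version of the character database bundled with this Python."""
    return unicodedata.unidata_version


def assigned_count() -> int:
    """Count code points known to this database, excluding unassigned (Cn),
    private-use (Co) and surrogate (Cs) code points."""
    return sum(
        1
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) not in ("Cn", "Co", "Cs")
    )


print(unicode_db_version(), assigned_count())
```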

So I had a quick Twitter chat with the author and jotted down the links in this blog post so I won’t forget them.

There I learned it was open source (I think it is the only Unicode codepoint site that is).

Here it goes:

Read the rest of this entry »

Posted in *nix, *nix-tools, Apache2, codepoints.net, Conference Topics, Conferences, Database Development, Debian, Development, DVCS - Distributed Version Control, Encoding, Event, GitHub, Linux, MySQL, PHP, Power User, Scripting, Software Development, Source Code Management, Unicode, Web Development | Leave a Comment »


The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests
