The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests

Archive for the ‘Unicode’ Category

Unicode ligatures: not all software does normalised search forgetting ffi 

Posted by jpluimers on 2019/06/26

Via a private share, I found out that some software forgets to perform a Unicode normalisation when doing a search.

That means that ligatures do not match the non-ligatures in for instance these words:

  • “ﬀ” (U+FB00) and “ff”, as in “diﬀerence” versus “difference”
  • “ﬁ” (U+FB01) and “fi”, as in “notiﬁcation” versus “notification”.

For more information, read [WayBack] Unicode equivalence – Wikipedia, and make sure you know about these normal forms:

NFD
Normalization Form Canonical Decomposition
Characters are decomposed by canonical equivalence, and multiple combining characters are arranged in a specific order.
NFC
Normalization Form Canonical Composition
Characters are decomposed and then recomposed by canonical equivalence.
NFKD
Normalization Form Compatibility Decomposition
Characters are decomposed by compatibility, and multiple combining characters are arranged in a specific order.
NFKC
Normalization Form Compatibility Composition
Characters are decomposed by compatibility, then recomposed by canonical equivalence.
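
A minimal sketch of a normalised search in Python, using `unicodedata.normalize` from the standard library. The ligatures ﬀ (U+FB00) and ﬁ (U+FB01) only have compatibility decompositions, so NFC and NFD leave them untouched; a search that should match them needs NFKC or NFKD. The helper name `normalised_contains` is made up for this illustration.

```python
import unicodedata


def normalised_contains(haystack: str, needle: str) -> bool:
    """Return True when needle occurs in haystack after NFKC normalisation."""
    # NFKC applies the compatibility decomposition (which splits ligatures)
    # and then recomposes by canonical equivalence.
    normalised_haystack = unicodedata.normalize("NFKC", haystack)
    normalised_needle = unicodedata.normalize("NFKC", needle)
    return normalised_needle in normalised_haystack


# Escapes are used so the ligatures survive copy/paste: U+FB00 is ﬀ, U+FB01 is ﬁ.
with_ligature = "di\ufb00erence"

print("difference" in with_ligature)                      # naive search
print(normalised_contains(with_ligature, "difference"))   # normalised search
print(unicodedata.normalize("NFC", "\ufb01") == "\ufb01")  # NFC keeps the ligature
```

Whether to also fold case or strip accents is a separate decision; this sketch only covers the normalisation step.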

–jeroen

Posted in Development, Encoding, Software Development, Unicode

I’ve given up on entering non-ASCII characters when entering data on-line

Posted by jpluimers on 2019/06/17

I live in a street that has a non-ASCII character in it: Pyreneeën.

I’ve reverted to entering the street name as plain ASCII for a simple reason:

Too often the ë gets mangled into encoding gibberish, similar to the é example in [WayBack] When Good Characters Go Bad: A Guide to Diagnosing Character Display Problems, as these characters sit very close together both in UTF-8 and in the [WayBack] Unicode Characters in the Latin-1 Supplement Block.

I’ve seen these encodings, where only the top encoding is right and the degeneration gets worse with each round:

# encoded UTF-8 (hex.)
0 ë 0xC3 0xAB
1 Ã« 0xC3 0x83 0xC2 0xAB
2 ÃÂ« 0xC3 0x83 0xC2 0x83 0xC3 0x82 0xC2 0xAB
3 ÃÂÃÂ« 0xC3 0x83 0xC2 0x83 0xC3 0x82 0xC2 0x83 0xC3 0x83 0xC2 0x82 0xC3 0x82 0xC2 0xAB
4 ÃÂÃÂÃÂÃÂ« 0xC3 0x83 0xC2 0x83 0xC3 0x82 0xC2 0x83 0xC3 0x83 0xC2 0x82 0xC3 0x82 0xC2 0x83 0xC3 0x83 0xC2 0x83 0xC3 0x82 0xC2 0x82 0xC3 0x83 0xC2 0x82 0xC3 0x82 0xC2 0xAB
5 &euml; 0x26 0x65 0x75 0x6d 0x6c 0x3b

The last one seldom happens; the first mangled one (row 1) happens relatively often, just like [Archive.is] fd.nl did for a while on their financial pages.
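
The hex column of rows 1 to 4 is what you get when UTF-8 bytes are repeatedly misread as ISO-8859-1 (Latin-1) and then saved as UTF-8 again; row 5 is the HTML entity `&euml;` stored as literal text. The Python sketch below reproduces those rows. The function names `degrade` and `repair` are hypothetical, and `repair` assumes that no step in between dropped or replaced any bytes.

```python
import html


def degrade(text: str, rounds: int) -> bytes:
    """UTF-8 bytes after `rounds` of 'read as Latin-1, save as UTF-8'."""
    data = text.encode("utf-8")
    for _ in range(rounds):
        # Every byte becomes one Latin-1 character, which is then
        # re-encoded as one or two UTF-8 bytes.
        data = data.decode("latin-1").encode("utf-8")
    return data


def repair(data: bytes, rounds: int) -> str:
    """Undo `rounds` of the above mis-decoding."""
    for _ in range(rounds):
        data = data.decode("utf-8").encode("latin-1")
    return data.decode("utf-8")


E_DIAERESIS = "\u00eb"  # ë as a single precomposed code point

# Print the hex bytes for rows 0 to 4 of the table.
for row in range(5):
    print(row, " ".join(f"0x{byte:02X}" for byte in degrade(E_DIAERESIS, row)))

# Row 5: the bytes are plain ASCII spelling out an HTML entity.
entity = bytes.fromhex("2665756d6c3b").decode("ascii")
print(entity, html.unescape(entity) == E_DIAERESIS)
```

The byte count doubles on every round after the first, which is why a value that has passed through a few systems can overflow a fixed-width database column.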

These mistakes become sort of understandable (but not forgivable) when you look at the table fragment below (the full table is at [WayBack] Unicode/UTF-8-character table – starting from code position 0080).

Read the rest of this entry »

Posted in Development, Encoding, Power User, Software Development, Unicode, Web Browsers

Getting rid of trailing line-endings in the draw.io web interface

Posted by jpluimers on 2018/12/03

One of the things that has bugged me for a long time is that every now and then, when editing the text of some shapes, the draw.io web interface adds trailing line feeds after the text, messing up the layout.

The easiest way to work around it is by searching inside the diagram XML for
"
, then replacing that with a ".

(the above code got screwed by WordPress.com saving it, so the search is in this small gist below)

This behaviour is intermittent on the drawio MacOS desktop app.

–jeroen

 

Posted in Cloud Apps, Development, draw.io, Encoding, Internet, Power User, Software Development, Unicode

Unicode spaces

Posted by jpluimers on 2018/09/25

For my link archive:

Via: [WayBack] Are there blank characters in unicode that have the same widths as period, comma and digits? – Lars Fosdal – Google+

Answer: no, though better fonts give the period, comma, colon, semicolon and other punctuation marks the same width as the punctuation space.

The use-case:

I wanted right justified text without having to do custom positioning/drawing – where the decimal zero is white space.

F.x. here 12 instead of 12.0

9.5
11.6
12 <– #$2008 and #$2007
13.4

I.e. PunctuationSpace and FigureSpace

I don’t want to deal with positioning/rendering since it happens inside a third party component.
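
A small Python sketch of that use-case, assuming one decimal place: when the fraction is zero, the ".0" is replaced by a punctuation space (U+2008) plus a figure space (U+2007). The function name `format_tenths` is made up for this illustration, and the columns only line up in fonts where these two spaces really match the widths of the period and the digits.

```python
PUNCTUATION_SPACE = "\u2008"  # intended to be as wide as a period
FIGURE_SPACE = "\u2007"       # intended to be as wide as a digit


def format_tenths(value: float) -> str:
    """Format with one decimal, blanking a trailing '.0' with Unicode spaces."""
    text = f"{value:.1f}"
    whole, _, fraction = text.partition(".")
    if fraction == "0":
        # Same number of characters as 'whole.0', but the tail is invisible.
        return whole + PUNCTUATION_SPACE + FIGURE_SPACE
    return text


for number in (9.5, 11.6, 12.0, 13.4):
    # Right-justify on character count; the renderer does the rest.
    print(format_tenths(number).rjust(6, FIGURE_SPACE))
```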

–jeroen

Posted in Development, Encoding, Font, Power User, Software Development, Unicode

GitHub – keith-turner/ecoji: Encodes (and decodes) data as emojis

Posted by jpluimers on 2018/03/14

[WayBack] GitHub – keith-turner/ecoji: Encodes (and decodes) data as emojis:

Ecoji 🏣🔉🦐🔼

Ecoji encodes data as 1024 emojis, its base1024 with an emoji character set. As a bonus, includes code to decode emojis to original data.

Sick. Works splendidly when all your systems are fully nice to Unicode.

None are. So there’s a German word for it:

Nein

Via:

 

–jeroen

Read the rest of this entry »

Posted in Development, Encoding, Fun, Go (golang), Software Development, Unicode
