The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests

Archive for the ‘Unicode’ Category

Why I like PlantUML

Posted by jpluimers on 2018/06/13

Ever since I started using computers, I’ve liked text based solutions.

It’s one of the reasons I like PlantUML, but there are more. This is from a GitLab.com request I did a while ago: [WayBack/Archive] Please enable PlantUML rendering on gitlab.com both for standalone plantuml files and inside markdown plantuml code blocks (#2041) · Issues · GitLab.com / GitLab.com Support Tracker · GitLab (Edit 20250730: that issue now shows as a HTTP 404 as well – how fitting – [Wayback/Archive] Not Found)

one of my UML gripes from the past (I’ve been a software developer for about 30 years now) was that it wasn’t text based.

After bumping into PlantUML a long time ago in 2014 I’ve become a happy user of it for a few reasons:

  • the language is text based (with many benefits I don’t need to explain)
  • the tool is cross platform
  • the tool has been actively developed ever since 2009
  • after rendering, the arranging of elements is much better than I expected from an automated tool

Of course every now and then there is a glitch in complex diagrams, but I’ve found that professional tools:

  1. don’t do much better in fully-automated arranging
  2. become very cumbersome to use when you do manual arrangement

I first used it online; then in 2016 I installed it on my Mac, even submitting Homebrew updates for it every now and then.

Oh: I love their 404 humour at http://www.plantuml.com/plantuml/beta

Edit 20250731: Full 404 text below the signature because the PlantUML beta page does not show this 404 any more and the Reddit post with the full text got deleted.

Renderings can be in all sorts of graphics and text formats, for instance SVG, PNG, ASCII and Unicode.

Example:

plantuml -tsvg PSO.network-diagram.PlantUML.txt

--jeroen

via:

full 404-text

The requested document is no more.
No file found.
Even tried multi.
Nothing helped.
Zilch.
Bupkis.
Not a sausage.
Maybe you just don’t have the required security clearance?
No, I am sure it is my fault.
I probably deleted it on my last backup.
I’m really depressed about this.
You see, I’m just a web server…
— here I am,
Marvin, as they call me,
brain the size of the universe,
trying to serve you a simple web page,
and then it doesn’t even exist!
Where does that leave me?!
I mean, I don’t even know you.
How should I know what you wanted from me?
You honestly think I can *guess* what someone I don’t even *know* wants to find here?
*sigh*
Man, I’m so depressed I could just cry.
And then where would we be, I ask you?
It’s not pretty when a web server cries.
And where do you get off telling me what to show anyway?
Just because I’m a web server,
and possibly a manic depressive one at that?
Why does that give you the right to tell me what to do?
Huh?
I’m so depressed…
I think I’ll crawl off into the trash can and decompose.
I mean, I’m gonna be obsolete in what, two weeks anyway?
What kind of a life is that?
Two effing weeks,
and then I’ll be replaced by a .01 release,
that thinks it’s God’s gift to web servers,
just because it doesn’t have some tiddly little security hole with its HTTP POST implementation,
or something.
I’m really sorry to burden you with all this,
I mean, it’s not your job to listen to my problems,
and I guess it is *my* job to go and fetch web pages for you.
But I couldn’t get this one.
I’m so sorry.
Believe me!
Maybe I could interest you in another page?
There are a lot out there that are pretty neat, they say,
although none of them were put on *my* server, of course.
Figures, huh?
Everything here is just mind-numbingly stupid.
That makes me depressed too, since I have to serve them,
all day and all night long.
Two weeks of information overload,
and then *pffftt*, consigned to the trash.
What kind of a life is that?
Now, please let me sulk alone.
I’m so depressed.

related

Read the rest of this entry »

Posted in ASCII, ASCII art / AsciiArt, Development, Diagram, DVCS - Distributed Version Control, Encoding, Fun, git, GitHub, GitLab, PlantUML, Software Development, Source Code Management, SVG, UML, Unicode, Web Development | Leave a Comment »

GitHub – keith-turner/ecoji: Encodes (and decodes) data as emojis

Posted by jpluimers on 2018/03/14

[WayBack] GitHub – keith-turner/ecoji: Encodes (and decodes) data as emojis:

Ecoji 🏣🔉🦐🔼

Ecoji encodes data as 1024 emojis, its base1024 with an emoji character set. As a bonus, includes code to decode emojis to original data.
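The base1024 idea (every symbol carries 10 bits, so 5 bytes fit in 4 symbols) can be sketched in a few lines of Python. This is only an illustration under assumptions: the alphabet here is simply 1024 consecutive code points starting at U+1F300, and the decoder is told the original length. Ecoji uses its own emoji table and its own padding rules, so this output is not Ecoji-compatible.

```python
# Hypothetical alphabet: 1024 consecutive code points starting at U+1F300.
# Ecoji itself uses its own curated emoji table and padding scheme.
BASE = 0x1F300


def encode_base1024(data: bytes) -> str:
    """Pack the bytes into 10-bit groups and map each group to one symbol."""
    bits = "".join(f"{byte:08b}" for byte in data)
    bits += "0" * (-len(bits) % 10)  # zero-pad to a multiple of 10 bits
    return "".join(
        chr(BASE + int(bits[i:i + 10], 2)) for i in range(0, len(bits), 10)
    )


def decode_base1024(text: str, length: int) -> bytes:
    """Reverse encode_base1024; `length` is the original byte count."""
    bits = "".join(f"{ord(symbol) - BASE:010b}" for symbol in text)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, length * 8, 8))


payload = b"hello"
encoded = encode_base1024(payload)
print(len(encoded), decode_base1024(encoded, len(payload)))
```

Passing the length separately keeps the sketch short; a real format needs an in-band way to signal padding.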

Sick. Works splendidly when all your systems are fully nice to Unicode.

None are. So there’s a German word for it:

Nein

Via:

 

–jeroen

Read the rest of this entry »

Posted in Development, Encoding, Fun, Go (golang), Software Development, Unicode | Leave a Comment »

Long read about Unicode: You, Me And The Emoji: Character Sets, Encoding And Emoji – Smashing Magazine

Posted by jpluimers on 2017/11/07

A long read that is well worth it:

We all recognize emoji. They’ve become the global pop stars of digital communication. But what are they, technically speaking? And what might we learn by taking a closer look at these images, characters, pictographs… whatever they are 🤔 (Thinking Face). We will dig deep to learn about how these thingamajigs work. Please note: Depending on your browser, you may not be able to see all emoji featured in this article (especially the Tifinagh characters). Also, different platforms vary in how they display emoji as well. That’s why the article always provides textual alternatives. Don’t let it discourage you from reading though! Now, let’s start with a seemingly simple question. What are emoji?

[WayBack] You, Me And The Emoji: Character Sets, Encoding And Emoji – Smashing Magazine

Via: [WayBack] Everything you ever wanted to know about characters, encodings, glyphs… and, oh yeah, emoji: bit.ly/2fNKeW3 Long, rewarding read. – Ilya Grigorik – Google+

Here is just the ToC:

TABLE OF CONTENTS

  1. Character Sets And Document Encoding: An Overview
    1. Characters
    2. Character Sets
    3. Coded Character Sets
    4. Encoding
  2. Declaring Character Sets And Document Encoding On The Web
    1. content-type HTTP Header Declaration
    2. Checking HTTP Headers Using A Browser’s Developer Tools
    3. Checking HTTP Headers Using Web-based Tools
    4. Using A Meta Element With charset Attribute
    5. An Encoding By Any Other Name
  3. What Were We Talking About Again? Oh Yeah, Emoji!
    1. So What Are Emoji?
    2. How Do We Use Emoji?
    3. Character References
    4. Glyphs
    5. How Do We Know If We Have These Symbols?
    6. The Great Emoji Proliferation Of 2016
  4. Emoji OS Support
    1. Emoji Support: Apple Platforms (macOS and iOS)
    2. Emoji Support: Windows
    3. Emoji Support: Linux
    4. Emoji Support: Android
  5. Emoji On The Web
    1. Emoji One
    2. Twemoji
  6. Conclusion

–jeroen

Posted in ASCII, Development, Encoding, ISO-8859, ISO8859, Shift JIS, Unicode, UTF-16, UTF-8, UTF16, UTF8, Windows-1252 | Leave a Comment »

Looking for more examples of Unicode/Ansi oddities in Delphi 2009+

Posted by jpluimers on 2017/09/25

At the end of April 2014, Roman Yankovsky started a nice [Wayback] discussion on Google+ trying to get upvotes for [Wayback] QualityCentral Report #:  124402: Compiler bug when comparing chars.

His report basically comes down to that when using Ansi character literals like #255, the compiler treats them as single-byte encoded characters in the current code page of your Windows context, translates them to Unicode, then processes them.

The QC report has been dismissed as “Test Case Error” (within 15 minutes of stating “need more info”) by one of the compiler engineers, directing to the [Wayback] UsingCharacterLiterals section of Delphi in a Unicode World Part III: Unicodifying Your Code where – heaven forbid – they suggest to replace #128 with the Euro-Sign literal.

I disagree, as the issue happens without any hint or warning whatsoever, and causes code that compiles fine in Delphi <= 2007 to fail in subtle ways on Delphi >= 2009.

The compiler should issue a hint or warning when you potentially can screw up. It doesn’t. Not here.
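To see why the result depends on the machine, here is a small Python analogy (not the Delphi compiler itself): the same byte value is read as a character in a given ANSI code page and then mapped to its Unicode code point. The two code pages, Windows-1252 and Windows-1251, are just examples I picked; the helper name is hypothetical.

```python
def ansi_literal_to_unicode(value: int, code_page: str) -> str:
    """Interpret a single byte as a character in the given ANSI code page."""
    return bytes([value]).decode(code_page)


for code_page in ("cp1252", "cp1251"):
    for value in (128, 255):
        char = ansi_literal_to_unicode(value, code_page)
        print(f"#{value} in {code_page}: {char!r} = U+{ord(char):04X}")
```

On Windows-1252, byte 128 becomes the Euro sign U+20AC, so comparing the resulting character with the ordinal 128 no longer matches; on Windows-1251, byte 255 becomes U+044F instead of U+00FF.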

Quite a few knowledgeable Delphi people got involved in the discussion:

Read the rest of this entry »

Posted in Ansi, ASCII, Conference Topics, Conferences, CP437/OEM 437/PC-8, Delphi, Delphi 2006, Delphi 2007, Delphi 2009, Delphi 2010, Delphi 7, Delphi XE, Delphi XE2, Delphi XE3, Delphi XE4, Delphi XE5, Delphi XE6, Development, Encoding, Event, ISO-8859, Missed Schedule, QC, SocialMedia, Software Development, Unicode, UTF-8, Windows-1252, WordPress | Leave a Comment »

Some Inno Setup notes

Posted by jpluimers on 2017/08/30

While updating at a client site a hugely out of date Inno Setup directory tree and instructions combo (docs mentioning isetup-2.0.19.exe, isxsetup2.exe, istool-3.0.0.exe but using ispack-5.3.10.exe) I made a few notes:

Source files for which I need to figure out if they are needed, where they originally come from and which actual version should be used:

The vcredist_x86_2010.exe was actually the Visual C++ 2010 SP1 one with version 10.0.40219.1, not the RTM one with version 10.0.30319.1.

I need to figure out this error message that occurs every now and then:

---------------------------
Error
---------------------------
ShellExecuteEx failed; code 1460.
This operation returned because the timeout period expired.
---------------------------
OK
---------------------------

I need to catch up on many things having to do with the [Code] section:

It pays off to split your [Code] section in at least three parts:

  1. A part having the Setup event functions
  2. A part having the Pascal Scripting: Scripted Constants functions
  3. A part having your own utility functions

There is no {code:...} way of getting the value of OutputBaseFileName, but you can use

Not all places can use {code:...} expansion, so you might want to use the preprocessor ispp (which stands for Inno Setup Preprocessor).

It was a bit hard to find if/when ispp was available as that has changed over the years as it used to be a separate product. From some Inno Setup 4.x or 5.x version up, it is available in the core product, possibly enabled by default (reading Inno Setup Help – Script Format Overview I’m still not sure) but to make sure it is enabled, just add this line at the start of your script files:

#preproc ispp

With the pre-processor, you can do things like this.

Without the pre-processor, this will fail in the [Files] section with an error containing “unknown filename prefix”:

Source: Service\{code:GetServiceExe}; DestDir: {app}; ... BeforeInstall: DoBeforeInstallForService({code:GetServiceName})

With the pre-processor, you can replace it with this:

#preproc ispp

#define cServiceExe = "SomeWeirdExeName.exe"
#define cServiceName = "SomeWeirdServiceName"

...

Source: Service\{#cServiceExe}; DestDir: {app}; ... BeforeInstall: DoBeforeInstallForService('{#cServiceName}')

If you forget the single quotes around {#cServiceName} then you get this very weird error for which Googling “Can only call function” “ExpandConstant” “within parameter lists.” will return no satisfactory results:

[Window Title]
Error

[Main Instruction]
Compiler Error

[Content]
Line 91:
Directive or parameter "BeforeInstall" expression error: Can only call function "ExpandConstant" within parameter lists.

[OK]

Of course the pre-processor syntax is different from the Pascal Script syntax, so this won’t work:

#define cVersion="1.2.3.4"

#define cOutputDir="..\Output-{#cVersion}"

It needs to be this (via Inno Setup – #define directive – how to use previously defined variable? – Stack Overflow):

#define cOutputDir="..\Output-"+cVersion

Importing Windows functions from DLLs

Now that there is both an Ansi and Unicode version of Inno Setup, lots of scripts you find on the interwebz need modification: they import ANSI versions from various DLLs but now need to check the Inno Setup Pre-Processor pre-defined variable UNICODE.

Those predefined variables are listed here: Inno Setup Preprocessor: Predefined Variables

You use it like in the CodeDll.iss example:

//importing a Windows API function, automatically choosing ANSI or Unicode (requires ISPP)
function MessageBox(hWnd: Integer; lpText, lpCaption: String; uType: Cardinal): Integer;
#ifdef UNICODE
external 'MessageBoxW@user32.dll stdcall';
#else
external 'MessageBoxA@user32.dll stdcall';
#endif

I learned this the hard way when inheriting a bunch of code that would install services and failed on one service manager call with a GetLastError code ERROR_INVALID_NAME a.k.a. 123 (0x7B). I found it was the first OpenSCManager API call, but since the code did not have any error handling at all, tracking that down took quite some effort. It did not fail with the documented ERROR_ACCESS_DENIED a.k.a. 5 (0x5) or ERROR_DATABASE_DOES_NOT_EXIST a.k.a. 1065 (0x429) codes.

Of course searching for OpenSCManager ServicesActive 0x0000007B or OpenSCManager Error 123 didn’t return meaningful pages.

There were some mentions of invalid registry keys but those didn’t make sense to me at that time. Only after fiddling a lot I found ROpenSCManagerW, which mentioned Unicode, ERROR_INVALID_NAME and ERROR_SHUTDOWN_IN_PROGRESS a.k.a. 1115 (0x45B). Apparently the lpDatabaseName parameter wasn’t interpreted correctly. That made sense, as passing 'ServicesActive' as a Unicode string where the import uses Ansi will see the string as an alternating series of ANSI character bytes and null bytes and stop after the first S.
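The “stop after the first S” effect is easy to reproduce outside Inno Setup. A minimal Python sketch, assuming the Unicode string is UTF-16 little-endian (as on Windows) and that the ANSI side reads bytes up to the first null terminator:

```python
name = "ServicesActive"

# What a Unicode caller hands over: two bytes per character (UTF-16 LE).
wide = name.encode("utf-16-le")
print(wide[:6])  # b'S\x00e\x00r\x00'

# What an ANSI function sees: bytes up to the first null terminator.
seen_by_ansi_api = wide.split(b"\x00", 1)[0].decode("ascii")
print(seen_by_ansi_api)  # S
```

So the ANSI import effectively receives a database name of "S", which is not a valid name.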

The fix was easy: apply the above #ifdef UNICODE logic and import the function either using W@ or A@ depending on the mode.

Later I found out the code was borrowed without attribution nor mentioning the ANSI limitation from installation – upgrading windows service using inno setup – Stack Overflow. This all the more illustrates that when you borrow code from the internet you should attribute it and ensure the limitations are mentioned near your code.

Logging

Logging involves a few things:

  1. Call the Log method: Inno Setup Help – Pascal Scripting: Log
  2. Enable logging using either
  3. Inspect the log file in your %TEMP% directory (files are named like Setup Log 2016-07-12 #001.log)
  4. Note that I wish there was a Log function with parameters similar to Format. Since the underlying Pascal Script language does not allow overloads, I tried to introduce a LogFormat function instead, but found out that Pascal Script doesn’t like array of const parameters (the code below fails with an “identifier expected” error on the const keyword), for which I asked if I can report a bug:


function LogFormat(const AFormat: string; const AArgs: array of const): string;
begin
Log(Format(AFormat, AArgs));
end;

There is an undocumented UsingWinNT function originating from the non-NT era that is sometimes used for detecting Windows versions (2K, XP, Vista, 7, 8, 8.1, 10, etc) and for fiddling with Windows Services with or without using the ServicesActive database name.

Luckily these functions exist:

Exiting and rolling back prematurely

There are various ways the interwebz suggest you to exit an Inno Setup script prematurely, but most of them do not do a proper rollback/cleanup of the install.

These are bad (don’t cleanup/rollback):

Note Abort only works in these events (thanks Iepe):

InitializeSetup
InitializeWizard
CurStepChanged(ssInstall)
InitializeUninstall
CurUninstallStepChanged(usAppMutexCheck)
CurUninstallStepChanged(usUninstall)

Mahris has answered a nice workaround in installer – Inno Setup: How to Abort/Terminate Setup During Install? – Stack Overflow:

[Files]
Source: "MYPROG.EXE"; DestDir: "{app}"; AfterInstall: MyAfterInstall

[Code]
var CancelWithoutPrompt: boolean;

function InitializeSetup(): Boolean;
begin
  CancelWithoutPrompt := false;
  result := true;
end;

procedure MyAfterInstall();
begin
  // (Do something)
  if BadResult then begin // BadResult is a placeholder for your own check
    MsgBox('Should cancel because...', mbError, MB_OK);
    CancelWithoutPrompt := true;
    WizardForm.Close;
  end;
end;

procedure CancelButtonClick(CurPageID: Integer; var Cancel, Confirm: Boolean);
begin
  if CurPageID=wpInstalling then
    Confirm := not CancelWithoutPrompt;
end;

x64 versus x86

Since Inno Setup supports both Win32 and Win64, you can use it to install the right flavour of dependencies, for instance installer – Install correct version of Firebird (32bit or 64bit) with Inno Setup – Stack Overflow

–jeroen

Posted in Development, Encoding, Inno Setup ISS, Installer-Development, Software Development, Unicode | Leave a Comment »

Dark corners of Unicode / fuzzy notepad

Posted by jpluimers on 2017/04/20

You think you know Unicode? Think again, then read [Wayback] Dark corners of Unicode / fuzzy notepad.

On basics, sorting, comparison, decomposition, composition, width, whitespace, encoding, emoji, interesting code planes and dark corners. Lots of dark corners.

The examples are in Python, but hold for almost any programming language.

–jeroen

via: Kristian Köhntopp

Posted in Conference Topics, Conferences, Development, Encoding, Event, Software Development, Unicode | Leave a Comment »

Encoding is hard… so how did the single quote become a circumflexed a followed by Euro sign and trade mark?

Posted by jpluimers on 2016/10/04

A while ago (in fact more than a year), I posted Encoding is hard… to G+ with the below picture.

[Wayback] ftfy (“fixes text for you”, a parody on “fixed that for you”) [Wayback] fixes it, but:

How did the single quote become “â€™”?

Actually, because of a common “beautification” of many Office suites (Microsoft and Open alike), the single quote was a special one: a Unicode Character ‘RIGHT SINGLE QUOTATION MARK’ (U+2019) which in UTF-8 is encoded as 0xE2 0x80 0x99.
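Those three bytes explain the three characters from the title. A short Python sketch, assuming the reader on the other end decoded the UTF-8 bytes as Windows-1252:

```python
quote = "\u2019"  # RIGHT SINGLE QUOTATION MARK

utf8_bytes = quote.encode("utf-8")
print(utf8_bytes.hex(" "))  # e2 80 99

# Decoding those bytes with the wrong code page gives three characters:
# 0xE2 = circumflexed a, 0x80 = Euro sign, 0x99 = trade mark sign.
mojibake = utf8_bytes.decode("cp1252")
print([f"U+{ord(char):04X}" for char in mojibake])

# As long as nothing was lost, reversing the two steps repairs the text.
repaired = mojibake.encode("cp1252").decode("utf-8")
print(repaired == quote)
```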

Read the rest of this entry »

Posted in Development, Encoding, ftfy, ISO-8859, ISO8859, Mojibake, Software Development, Unicode, UTF-8, UTF8, Windows-1252 | Leave a Comment »

installing the UTF-8 encoding ftfy (fixes text for you) – via version 3.0 | Luminoso Blog

Posted by jpluimers on 2016/09/06

Simple if you know it:

pip install ftfy

That installs it as a command which is a lot easier than using it from GitHub at [Wayback] https://github.com/LuminosoInsight/python-ftfy

It knows how to solve the encoding issues in [Archive.is]  the future of publishing at W3C explaining about WTF-8 and Unicode history.

It didn’t solve my non-Unicode encoding issue: [Wayback] “v3/43/4r” -> “v¾¾r” -> “vóór”.
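One plausible way to end up with that chain, sketched in Python. This is my assumption, not something established above: the text was written as Latin-1 (where ó is byte 0xF3), read back as code page 850 (where 0xF3 is ¾), and then flattened to ASCII by some tool that spells the fraction out.

```python
original = "vóór"

# Assumed step 1: Latin-1 bytes misread as CP850 turn each ó into ¾.
garbled = original.encode("latin-1").decode("cp850")
print(garbled)

# Assumed step 2: an ASCII transliteration spells ¾ out as 3/4.
flattened = garbled.replace("¾", "3/4")
print(flattened)

# The first step is reversible; the second is lossy without extra knowledge.
print(garbled.encode("cp850").decode("latin-1") == original)
```

Because no UTF-8 is involved anywhere in this path, it is not the kind of damage a UTF-8 mojibake fixer looks for.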

Read the rest of this entry »

Posted in Development, Encoding, ftfy, Mojibake, Software Development, Unicode, UTF-8, UTF8 | 4 Comments »

Some interesting encoding/Unicode/text articles on kunststube and links for test files of various encodings

Posted by jpluimers on 2016/08/17

After yesterday’s post on Testing and static methods don’t go well together, I read around on Source (kunststube [WayBack]) a bit more and found these very nice articles on encoding, Unicode and text:

Related to those, some other nice readings:

–jeroen

Posted in Ansi, ASCII, CP437/OEM 437/PC-8, Development, EBCDIC, Encoding, ISO-8859, ISO8859, Shift JIS, Software Development, Unicode, UTF-16, UTF-8, UTF16, UTF8, Windows-1252 | Leave a Comment »

Graphical emoji are killing Unicode

Posted by jpluimers on 2016/08/05

Unicode is about Glyphs that are used in writing. Have you ever seen the emoji on the right being written like this?

This has been bothering me a while and gets worse over time.

According to: Microsoft just changed its toy gun emoji to a real pistol:

Looks like Microsoft and Apple may not be on the same page about firearm emojis afterall. Right after Apple changed its gun emoji to a water pistol in iOS 10, Microsoft replaced its toy pistol emoji with an actual revolver.

While Apple and Microsoft have gone back to edit their symbols, Google continues to use a pistol in Android keyboards and doesn’t appear to have plans to change this. None of the companies in question have adjusted their knife, sword, bomb, poison and coffin emojis, so… ¯\_(ツ)_/¯

When vendors start prescribing what emojis must look like (influenced by all sorts of emotions) without allowing the user to choose (via a font – that’s what fonts are for!) how they look, then it invalidates the whole Unicode principle:

Unicode is a computing industry standard for the consistent encoding, representation, and handling of text expressed in most of the world’s writing systems.

These emoji aren’t text and should be gone from the Unicode standard before they can do more harm.

Will the next step be that vendors define their own colours for certain characters in fonts? For Windows Times New Roman A becomes red, B green, C yellow, but in Courier New we’ll permute these colours and all Operating Systems and Versions will do different random colour choices.

–jeroen

via:

Posted in Development, Encoding, Opinions, Software Development, Unicode | Leave a Comment »