The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests

Archive for the ‘Unicode’ Category

Long read about Unicode: You, Me And The Emoji: Character Sets, Encoding And Emoji – Smashing Magazine

Posted by jpluimers on 2017/11/07

A long read that is well worth it:

We all recognize emoji. They’ve become the global pop stars of digital communication. But what are they, technically speaking? And what might we learn by taking a closer look at these images, characters, pictographs… whatever they are 🤔 (Thinking Face). We will dig deep to learn about how these thingamajigs work. Please note: Depending on your browser, you may not be able to see all emoji featured in this article (especially the Tifinagh characters). Also, different platforms vary in how they display emoji as well. That’s why the article always provides textual alternatives. Don’t let it discourage you from reading though! Now, let’s start with a seemingly simple question. What are emoji?

[WayBack] You, Me And The Emoji: Character Sets, Encoding And Emoji – Smashing Magazine

Via: [WayBack] Everything you ever wanted to know about characters, encodings, glyphs… and, oh yeah, emoji: bit.ly/2fNKeW3 Long, rewarding read. – Ilya Grigorik – Google+

Here is just the ToC:

  1. Character Sets And Document Encoding: An Overview
    1. Characters
    2. Character Sets
    3. Coded Character Sets
    4. Encoding
  2. Declaring Character Sets And Document Encoding On The Web
    1. content-type HTTP Header Declaration
    2. Checking HTTP Headers Using A Browser’s Developer Tools
    3. Checking HTTP Headers Using Web-based Tools
    4. Using A Meta Element With charset Attribute
    5. An Encoding By Any Other Name
  3. What Were We Talking About Again? Oh Yeah, Emoji!
    1. So What Are Emoji?
    2. How Do We Use Emoji?
    3. Character References
    4. Glyphs
    5. How Do We Know If We Have These Symbols?
    6. The Great Emoji Proliferation Of 2016
  4. Emoji OS Support
    1. Emoji Support: Apple Platforms (macOS and iOS)
    2. Emoji Support: Windows
    3. Emoji Support: Linux
    4. Emoji Support: Android
  5. Emoji On The Web
    1. Emoji One
    2. Twemoji
  6. Conclusion

–jeroen

Posted in ASCII, Development, Encoding, ISO-8859, ISO8859, Shift JIS, Unicode, UTF-16, UTF-8, UTF16, UTF8, Windows-1252

Looking for more examples of Unicode/Ansi oddities in Delphi 2009+

Posted by jpluimers on 2017/09/25

At the end of April 2014, Roman Yankovsky started a nice discussion on Google+ trying to get upvotes for QualityCentral Report #124402: Compiler bug when comparing chars.

His report basically comes down to this: when you use Ansi character literals like #255, the compiler treats them as single-byte encoded characters in the current code page of your Windows context, translates them to Unicode, and only then processes them.

The QC report was dismissed as “Test Case Error” (within 15 minutes of stating “need more info”) by one of the compiler engineers, who pointed to the UsingCharacterLiterals section of Delphi in a Unicode World Part III: Unicodifying Your Code where – heaven forbid – they suggest replacing #128 with the Euro-Sign literal.

I disagree, as the issue happens without any hint or warning whatsoever, and causes code that compiles fine in Delphi <= 2007 to fail in subtle ways on Delphi >= 2009.

The compiler should issue a hint or warning when you potentially can screw up. It doesn’t. Not here.
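The effect described in the report can be imitated outside Delphi. The following is only a sketch in Python, not what the Delphi compiler does internally: it decodes a single byte in a few code pages (the choice of cp1252, cp1251 and cp437 is an assumption for illustration) and shows that the resulting Unicode code point depends on the code page, which is why a comparison against #128 or #255 can silently change meaning.

```python
# Sketch: one "Ansi" byte value maps to different Unicode code points
# depending on the code page used to interpret it.
# ansi_to_unicode is a hypothetical helper name; the code pages are examples.

def ansi_to_unicode(byte_value: int, code_page: str) -> int:
    """Decode one byte in the given code page and return its Unicode code point."""
    return ord(bytes([byte_value]).decode(code_page))

for code_page in ("cp1252", "cp1251", "cp437"):
    for byte_value in (128, 255):
        code_point = ansi_to_unicode(byte_value, code_page)
        print(f"#{byte_value} in {code_page}: U+{code_point:04X}")
```

In Windows-1252, byte 128 becomes U+20AC (the Euro sign), so after translation its ordinal value is no longer 128; in another code page the same byte becomes a different character altogether.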

Quite a few knowledgeable Delphi people got involved in the discussion:

Read the rest of this entry »

Posted in Ansi, ASCII, CP437/OEM 437/PC-8, Delphi, Delphi 2006, Delphi 2007, Delphi 2009, Delphi 2010, Delphi 7, Delphi XE, Delphi XE2, Delphi XE3, Delphi XE4, Delphi XE5, Delphi XE6, Development, Encoding, ISO-8859, QC, Software Development, Unicode, UTF-8, Windows-1252

Some Inno Setup notes

Posted by jpluimers on 2017/08/30

While updating a hugely out of date combination of Inno Setup directory tree and instructions at a client site (docs mentioning isetup-2.0.19.exe, isxsetup2.exe, istool-3.0.0.exe but using ispack-5.3.10.exe) I made a few notes:

Source files for which I need to figure out if they are needed, where they originally come from and which actual version should be used:

The vcredist_x86_2010.exe was actually the Visual C++ 2010 SP1 one with version 10.0.40219.1, not the RTM one with version 10.0.30319.1.

I need to figure out this error message that occurs every now and then:

---------------------------
Error
---------------------------
ShellExecuteEx failed; code 1460.
This operation returned because the timeout period expired.
---------------------------
OK
---------------------------

I need to catch up on many things having to do with the [Code] section:

It pays off to split your [Code] section in at least three parts:

  1. A part having the Setup event functions
  2. A part having the Pascal Scripting: Scripted Constants functions
  3. A part having your own utility functions

There is no {code:...} way of getting the value of OutputBaseFileName, but you can use

Not all places can use {code:...} expansion, so you might want to use the preprocessor ispp (which stands for Inno Setup Preprocessor).

It was a bit hard to find out if and when ispp is available: it used to be a separate product, and that has changed over the years. From some Inno Setup 4.x or 5.x version on, it is part of the core product, possibly enabled by default (after reading Inno Setup Help – Script Format Overview I’m still not sure). To make sure it is enabled, just add this line at the start of your script files:

#preproc ispp

With the pre-processor, you can do things like this.

Without the pre-processor, this will fail in the [Files] section with an error containing “unknown filename prefix”:

Source: Service\{code:GetServiceExe}; DestDir: {app}; ... BeforeInstall: DoBeforeInstallForService({code:GetServiceName})

With the pre-processor, you can replace it with this:

#preproc ispp

#define cServiceExe = "SomeWeirdExeName.exe"
#define cServiceName = "SomeWeirdServiceName"

...

Source: Service\{#cServiceExe}; DestDir: {app}; ... BeforeInstall: DoBeforeInstallForService('{#cServiceName}')

If you forget the single quotes around {#cServiceName} then you get this very weird error for which Googling “Can only call function” “ExpandConstant” “within parameter lists.” will return no satisfactory results:

[Window Title]
Error

[Main Instruction]
Compiler Error

[Content]
Line 91:
Directive or parameter "BeforeInstall" expression error: Can only call function "ExpandConstant" within parameter lists.

[OK]

Of course the pre-processor syntax is different from the Pascal Script syntax, so this won’t work:

#define cVersion="1.2.3.4"

#define cOutputDir="..\Output-{#cVersion}"

It needs to be this (via Inno Setup – #define directive – how to use previously defined variable? – Stack Overflow):

#define cOutputDir="..\Output-"+cVersion

Importing Windows functions from DLLs

Now that there is both an Ansi and Unicode version of Inno Setup, lots of scripts you find on the interwebz need modification: they import ANSI versions from various DLLs but now need to check the Inno Setup Pre-Processor pre-defined variable UNICODE.

Those predefined variables are listed here: Inno Setup Preprocessor: Predefined Variables

You use it like in the CodeDll.iss example:

//importing a Windows API function, automatically choosing ANSI or Unicode (requires ISPP)
function MessageBox(hWnd: Integer; lpText, lpCaption: String; uType: Cardinal): Integer;
#ifdef UNICODE
external 'MessageBoxW@user32.dll stdcall';
#else
external 'MessageBoxA@user32.dll stdcall';
#endif

I learned this the hard way when inheriting a bunch of code that installed services and failed on one service manager call with the GetLastError code ERROR_INVALID_NAME a.k.a. 123 (0x7B). It turned out to be the first OpenSCManager API call, but since the code did not have any error handling at all, tracking that down took quite some effort. The call did not fail with one of the documented codes ERROR_ACCESS_DENIED a.k.a. 5 (0x5) or ERROR_DATABASE_DOES_NOT_EXIST a.k.a. 1065 (0x429).

Of course, searching for neither “OpenSCManager ServicesActive 0x0000007B” nor “OpenSCManager Error 123” returned meaningful pages.

There were some mentions of invalid registry keys, but those didn’t make sense to me at that time. Only after a lot of fiddling did I find the ROpenSCManagerW page, which mentions Unicode, ERROR_INVALID_NAME and ERROR_SHUTDOWN_IN_PROGRESS a.k.a. 1115 (0x45B). Apparently the lpDatabaseName parameter wasn’t interpreted correctly. That made sense: when you pass 'ServicesActive' as a Unicode string while the import uses the Ansi function, the function sees the string as an alternating series of ANSI character bytes and null bytes, and stops after the first S.

The fix was easy: apply the above #ifdef UNICODE logic and import the function either using W@ or A@ depending on the mode.
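The truncation after the first S can be reproduced outside Inno Setup. This is a minimal sketch in Python, assuming the ANSI function reads its argument as a null-terminated byte string; the helper name as_seen_by_ansi_api is made up for this illustration and is not part of any Windows or Inno Setup API.

```python
# Sketch: what an ANSI (A) function effectively sees when it is handed
# the bytes of a UTF-16 string. as_seen_by_ansi_api is a hypothetical helper.

def as_seen_by_ansi_api(text: str) -> bytes:
    """Encode text as UTF-16LE, then read it like a null-terminated ANSI string."""
    raw = text.encode("utf-16-le")    # the bytes a Unicode caller passes
    return raw.split(b"\x00", 1)[0]   # a C-style reader stops at the first null byte

raw = "ServicesActive".encode("utf-16-le")
print(raw[:8])                                 # letters alternate with null bytes
print(as_seen_by_ansi_api("ServicesActive"))   # only the first letter survives
```

So the service manager is asked for a database named S rather than ServicesActive, which fits the ERROR_INVALID_NAME result.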

Later I found out the code had been borrowed, without attribution and without mentioning the ANSI limitation, from installation – upgrading windows service using inno setup – Stack Overflow. This illustrates all the more that when you borrow code from the internet, you should attribute it and make sure its limitations are mentioned near your code.

Logging

Logging involves a few things:

  1. Call the Log method: Inno Setup Help – Pascal Scripting: Log
  2. Enable logging using either
  3. Inspect the log file in your %TEMP% directory (files are named like Setup Log 2016-07-12 #001.log)
  4. Note that I wish there was a Log function with parameters similar to Format. Since the underlying Pascal Script language does not allow overloads, I tried to introduce a LogFormat function instead, but found out that Pascal Script doesn’t like array of const parameters (the code below fails with an identifier expected error on the const keyword), for which I asked if I can report a bug:

There is an undocumented UsingWinNT function originating from the non-NT era that is sometimes used for detecting Windows versions (2K, XP, Vista, 7, 8, 8.1, 10, etc) and for fiddling with Windows Services with or without using the ServicesActive database name.

Luckily these functions exist:

Exiting and rolling back prematurely

The interwebz suggest various ways to exit an Inno Setup script prematurely, but most of them do not do a proper rollback/cleanup of the install.

These are bad (don’t cleanup/rollback):

Note Abort only works in these events (thanks Iepe):

InitializeSetup
InitializeWizard
CurStepChanged(ssInstall)
InitializeUninstall
CurUninstallStepChanged(usAppMutexCheck)
CurUninstallStepChanged(usUninstall)

Mahris posted a nice workaround in installer – Inno Setup: How to Abort/Terminate Setup During Install? – Stack Overflow:

[Files]
Source: "MYPROG.EXE"; DestDir: "{app}"; AfterInstall: MyAfterInstall

[Code]
var CancelWithoutPrompt: boolean;

function InitializeSetup(): Boolean;
begin
  CancelWithoutPrompt := false;
  result := true;
end;

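procedure MyAfterInstall();
begin
  // (Do something here that sets BadResult; BadResult is a placeholder, not an Inno Setup identifier)
  if BadResult then begin
    MsgBox('Should cancel because...', mbError, MB_OK);
    CancelWithoutPrompt := true;
    // Closing the wizard behaves like clicking Cancel, so CancelButtonClick below is called
    WizardForm.Close;
  end;
end;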

procedure CancelButtonClick(CurPageID: Integer; var Cancel, Confirm: Boolean);
begin
  if CurPageID=wpInstalling then
    Confirm := not CancelWithoutPrompt;
end;

x64 versus x86

Since Inno Setup supports both Win32 and Win64, you can use it to install the right flavour of dependencies, for instance installer – Install correct version of Firebird (32bit or 64bit) with Inno Setup – Stack Overflow

–jeroen

Posted in Development, Encoding, Inno Setup ISS, Installer-Development, Software Development, Unicode

Dark corners of Unicode / fuzzy notepad

Posted by jpluimers on 2017/04/20

You think you know Unicode? Think again, then read Dark corners of Unicode / fuzzy notepad.

On basics, sorting, comparison, decomposition, composition, width, whitespace, encoding, emoji, interesting code planes and dark corners. Lots of dark corners.

–jeroen

via: Kristian Köhntopp

Posted in Development, Encoding, Software Development, Unicode

Encoding is hard… so how did the single quote become a circumflexed a followed by Euro sign and trade mark?

Posted by jpluimers on 2016/10/04

A while ago (in fact more than a year), I posted Encoding is hard… to G+ with the picture below.

ftfy (fixes text for you) fixes it, but:

How did the single quote become “â€™”?

Actually, because of a common “beautification” in many Office suites (Microsoft and Open alike), the single quote was a special one: a Unicode Character ‘RIGHT SINGLE QUOTATION MARK’ (U+2019), which in UTF-8 is encoded as 0xE2 0x80 0x99.
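The rest of the puzzle can be checked in a few lines of Python. This sketch assumes the three UTF-8 bytes were read back as Windows-1252, which is the usual way this kind of garbling happens. It is only the simplest round trip and not how ftfy itself works.

```python
# Sketch: U+2019 encoded as UTF-8, then wrongly decoded as Windows-1252.

RIGHT_SINGLE_QUOTE = "\u2019"

utf8_bytes = RIGHT_SINGLE_QUOTE.encode("utf-8")
print(" ".join(f"0x{b:02X}" for b in utf8_bytes))   # the three UTF-8 bytes

# Each byte is now taken for a separate Windows-1252 character.
mojibake = utf8_bytes.decode("cp1252")
print(mojibake)

# Reversing the wrong step restores the original character.
repaired = mojibake.encode("cp1252").decode("utf-8")
print(repaired == RIGHT_SINGLE_QUOTE)
```

Byte 0xE2 is the circumflexed a, 0x80 is the Euro sign and 0x99 is the trade mark sign in Windows-1252, which gives exactly the three characters from the title.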

Read the rest of this entry »

Posted in Development, Encoding, ISO-8859, ISO8859, Software Development, Unicode, UTF-8, UTF8, Windows-1252

 