The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests

Delphi – HIGHCHARUNICODE directive (Delphi) – RAD Studio

Posted by jpluimers on 2010/01/18

I forgot about it, but this thread (which got wiped by Embarcadero) reminded me of the difference between the character literal #128 and Chr(128).

Quoting from the first post:

c1 := #128;
c2 := chr(128);
Assert(c1 = c2);

the assertion fails, meaning that c1 <> c2.

In fact c1 = #$20AC and c2 = #$80.

Since Chr is a pseudo-function that does a conversion from an integer to a Unicode character, c2 ends up as Unicode codepoint U+0080, whereas c1 gets converted from the AnsiChar value 0x80 (the [WayBack] Euro Sign in a lot of Ansi codepages) into Unicode codepoint U+20AC.

[Way Back] Allen Bauer correctly mentioned that in order to define a character constant as a true Unicode codepoint, you have to use 4 hexadecimal digits:

c1 := #$0080;
c2 := chr(128);
Assert(c1 = c2);

This syntax with 4 hexadecimal digits is backwards compatible: with the above code, pre-Delphi-2009 compilers will get Ansi codepoint 128.
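To see the three spellings side by side, here is a minimal console sketch. It assumes a Unicode Delphi (2009 or later) with the default compiler settings and a Windows-1252 Ansi codepage; the program and variable names are made up for illustration.

```delphi
program CharLiteralSketch; // hypothetical name, for illustration only

{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  FromDecimal, FromFourDigits, FromChr: Char;
begin
  FromDecimal := #128;       // decimal literal: read as AnsiChar, then converted to Unicode
  FromFourDigits := #$0080;  // 4 hex digits: read as a Unicode codepoint
  FromChr := Chr(128);       // Chr converts the integer 128 to a Unicode character

  // Print the resulting UTF-16 code units as 4 hex digits.
  Writeln('#128     -> ', IntToHex(Ord(FromDecimal), 4));
  Writeln('#$0080   -> ', IntToHex(Ord(FromFourDigits), 4));
  Writeln('Chr(128) -> ', IntToHex(Ord(FromChr), 4));
end.
```

Under those assumptions, the first value should be the Euro Sign codepoint described above and the other two should both be 0080. With a different Ansi codepage the first value may differ.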

If you cannot rely on the encoding of your Delphi source files (for instance because your version control system mangles them), that is the only way to go. Hence my SO answer on [WayBack] Wrong Unicode conversion, how to store accent characters in Delphi 2010 source code and handle character sets?:

Don’t rely on the encoding of your Delphi source code files.

It might be mangled when using any non-Unicode tool to work on your text files (or even buggy Unicode aware tools).

The best way is to specify your characters as a 4-digit Unicode code point.

const
   MyEuroSign = #$20AC;

A few more notes:

Here you can find a few of the Unicode codepoints (thanks [WayBack] Thomas Schild!):

[Way Back] Rudy Velthuis explains that you can automagically force the Delphi compiler to always use Unicode codepoints using the $HIGHCHARUNICODE directive (I didn’t know that <g>). That is not always what you want though. So it is better to expand your character constants into 4 hexadecimal digits.

See: [Archive.is] HIGHCHARUNICODE directive (Delphi) – RAD Studio (first fully documented in XE3; the 2009 documentation left out the #xxx case).
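A minimal sketch of what the directive changes, assuming Delphi 2009 or later and a Windows-1252 Ansi codepage. The program and variable names are hypothetical, and the comments describe the behaviour as the documentation states it rather than something verified on every compiler version.

```delphi
program HighCharUnicodeSketch; // hypothetical name, for illustration only

{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  AnsiParsed, UnicodeParsed: Char;
begin
  {$HIGHCHARUNICODE OFF}    // the default: #$80..#$FF are read as AnsiChar
  AnsiParsed := #$80;       // converted through the Ansi codepage
  {$HIGHCHARUNICODE ON}     // now the same literal is read as WideChar
  UnicodeParsed := #$80;    // stays Unicode codepoint U+0080
  {$HIGHCHARUNICODE OFF}    // switch back so later code keeps the default

  Writeln('OFF: ', IntToHex(Ord(AnsiParsed), 4));
  Writeln('ON:  ', IntToHex(Ord(UnicodeParsed), 4));
end.
```

Scoping the directive around a few literals, as here, keeps the rest of the unit on the default behaviour. Writing #$0080 or #$20AC explicitly avoids depending on the directive at all.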

Some more people that got bitten by this

–jeroen

2 Responses to “Delphi – HIGHCHARUNICODE directive (Delphi) – RAD Studio”

  1. Very important to note here is that you are actually working with VARIABLES and not with CONSTANTS. For constants you have to realize how types are inferred:

    const
    c1 = #$80;

    var
    c2 : char = #$80;

    assert(c1=c2);

    The above assertion will fail, as the inferred type for c1 will be AnsiChar! (c2 will be #$20AC, as you pointed out.) For me both character representations appear to be equal (the Euro sign €), but in other parts of the world c1 may be something different, i.e. the assert has to fail, as we don’t check the locale here.
