The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests

Looking for more examples of Unicode/Ansi oddities in Delphi 2009+

Posted by jpluimers on 2017/09/25

At the end of April 2014, Roman Yankovsky started a nice discussion on Google+ trying to get upvotes for QualityCentral report #124402: Compiler bug when comparing chars.

His report comes down to this: when you use Ansi character literals like #255, the compiler treats them as single-byte characters encoded in the current code page of your Windows context, translates them to Unicode, and only then processes them.
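To see the translation for yourself, here is a minimal console sketch (not taken from the QC report). It assumes Delphi 2009 or later with the default HIGHCHARUNICODE OFF setting; the program and variable names are made up for illustration. No expected output is listed, because the first two values depend on the code page the compiler uses.

```delphi
program CharLiteralOrdinals;

{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  // Char is WideChar in Delphi 2009 and later.
  DecLit, Hex2Lit, Hex4Lit, CastChar: Char;
begin
  DecLit := #255;       // decimal literal: read as an Ansi character, then converted
  Hex2Lit := #$FF;      // two-digit hex literal: same treatment as the decimal form
  Hex4Lit := #$00FF;    // four-digit hex literal: taken as a Unicode code point
  CastChar := Char(255); // ordinal typecast: no character literal involved

  // Print the UTF-16 code unit that actually ended up in each variable.
  Writeln('#255      -> $', IntToHex(Ord(DecLit), 4));
  Writeln('#$FF      -> $', IntToHex(Ord(Hex2Lit), 4));
  Writeln('#$00FF    -> $', IntToHex(Ord(Hex4Lit), 4));
  Writeln('Char(255) -> $', IntToHex(Ord(CastChar), 4));
end.
```

If the first two lines show something other than $00FF while the last two show $00FF, you are looking at the behaviour the report describes.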

The QC report has been dismissed as “Test Case Error” (within 15 minutes of stating “need more info”) by one of the compiler engineers, pointing to the UsingCharacterLiterals section of Delphi in a Unicode World Part III: Unicodifying Your Code where – heaven forbid – they suggest replacing #128 with the Euro-Sign literal.

I disagree, as the issue happens without any hint or warning whatsoever, and causes code that compiles fine in Delphi <= 2007 to fail in subtle ways on Delphi >= 2009.

The compiler should issue a hint or warning when you potentially can screw up. It doesn’t. Not here.

Quite a few knowledgeable Delphi people got involved in the discussion.

The consensus here is that this is at least confusing (especially as there are differences between the HIGHCHARUNICODE OFF and ON modes of the compiler) even though Embarcadero keeps insisting this is “as designed”.
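The difference between the two modes can be sketched like this. It is a hedged example that assumes the HIGHCHARUNICODE directive can be switched locally between statements (my reading of the documentation); the variable names are illustrative only.

```delphi
program HighCharUnicodeDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  AnsiParsed, WideParsed: Char;
begin
  {$HIGHCHARUNICODE OFF}
  // OFF (the default): #128 is parsed as an Ansi character and converted
  // to Unicode through a code page.
  AnsiParsed := #128;

  {$HIGHCHARUNICODE ON}
  // ON: #128 is parsed as the Unicode code point U+0080.
  WideParsed := #128;
  {$HIGHCHARUNICODE OFF}

  Writeln('OFF: #128 -> $', IntToHex(Ord(AnsiParsed), 4));
  Writeln('ON : #128 -> $', IntToHex(Ord(WideParsed), 4));

  if AnsiParsed = WideParsed then
    Writeln('Same character in both modes')
  else
    Writeln('The same literal produced two different characters');
end.
```

The same source text, #128, can therefore mean two different characters depending on a directive that may sit far away from the literal itself.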

To prove it either way, I’ve started to write some unit tests to see what succeeds and what fails, but need help, so:

Help needed: test cases

In my book, if existing Ansi based code fails in a Unicode Delphi compiler, and there was no hint or warning indicating potential failure, it is a compiler bug.

So I made some preliminary test cases, but need more.

There are two areas you can help with:

  1. Formulate simple tests (even a small console app or unit proving your point will do).
  2. Run the existing tests on various code pages (preferably outside USA and Western Europe) and/or Delphi versions so I can summarise results by codepage and Delphi version.

I’m specifically looking for code that works in Delphi <= 2007, and fails in Delphi >= 2009, where the compiler does not issue a hint or warning about failure.

But I also will consider code that generates a hint/warning and still succeeds, or code that gives a hint/warning then fails in a different way.

I know that Delphi <= 2007 had its own share of codepage problems as well, so I’m not looking for those.

Results of the help

I have written a unit test generator that – based on the above help – generates unit tests for all characters #0..#255 and #$00..#$FF, and maybe even #$0000..#$FFFF.

There is already my CodeGeneratorUnit.pas and demo CodeGenerator project that I demoed quite a while ago at the CodeRage session A Pragmatic & Powerful Code Generator with Generics and Anonymous Methods.
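To give an idea of what such a generator does, here is a deliberately tiny sketch. It is not the CodeGeneratorUnit.pas code, and the shape of the emitted lines is an assumption: it writes one DUnit-style CheckEquals line per literal form to the console, for you to paste into a test method.

```delphi
program GenerateCharTests;

{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  I: Integer;
begin
  // Emit one check per character value, in decimal and in two-digit hex form.
  // The emitted CheckEquals(...) lines are a hypothetical test shape; adapt
  // them to your own test framework.
  for I := 0 to 255 do
  begin
    Writeln(Format('  CheckEquals($%.4x, Ord(Char(#%d)), ''#%d'');', [I, I, I]));
    Writeln(Format('  CheckEquals($%.4x, Ord(Char(#$%.2x)), ''#$%.2x'');', [I, I, I]));
  end;
end.
```

Each generated line states the naive expectation that the literal keeps its ordinal value, so every line that fails on a given code page marks a character that got translated.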

First result:

  • Currently there is one failure on CodePage 437: Character #128 fails:
    Expected #128 to equal TTestChar($0080), but equals TTestChar($20AC)., expected: <€> but was: <€>

    This is odd as #128 in CodePage 437 is an uppercase C cedilla (Ç). The Euro Sign is #128 in most Windows 125x encodings.
    This probably means that the way CHCP obtains the CodePage is not the one that Delphi uses. I will look into that soon; most likely, GetACP will work.
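CHCP reports the console code page, which is a different setting from the Ansi code page that GetACP returns. A small sketch to print them side by side, assuming Delphi 2009 or later on Windows (in XE2 and up, Windows resolves to Winapi.Windows when that unit scope name is configured):

```delphi
program ShowCodePages;

{$APPTYPE CONSOLE}

uses
  Windows;

begin
  // Ansi code page of the system (for example 1252 on Western European setups).
  Writeln('GetACP                = ', GetACP);
  // OEM code page (for example 437 or 850).
  Writeln('GetOEMCP              = ', GetOEMCP);
  // Code pages of the attached console; the output one is what CHCP shows.
  Writeln('GetConsoleCP          = ', GetConsoleCP);
  Writeln('GetConsoleOutputCP    = ', GetConsoleOutputCP);
  // Code page the Delphi RTL uses by default for AnsiString conversions.
  Writeln('DefaultSystemCodePage = ', DefaultSystemCodePage);
end.
```

Whether the compiler's literal translation follows GetACP is still an assumption at this point; the test results should be summarised by whichever value turns out to match.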

–jeroen

Related:

via [WayBack] Roman Yankovsky – Google+ – I think I found a compiler issue, could you please vote for….:

For “if AChar <= #255 then” compiler (XE6) gently generates the following code:

005D734A 66817DFA4F04     cmp word ptr [ebp-$06],$044f

$044f does not equal 255!

But It works fine when Ord function is used:

if Ord(AChar) <= 255  then

005D7352 668B45FA         mov ax,[ebp-$06]
005D7356 663DFF00         cmp ax,$00ff

PS: Somehow this post missed its schedule in 2014 (WordPress.com has a habit of that every now and then), but [WayBack] var c: Char; begin case c of Char(#$C0)..Char(#$D6): begin end; end; Why I get error: [dcc32 Error] E2011 Low bound exceeds high bound on Delphi Tokyo? – Jacek Laskowski – Google+ made me find it back.
