The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests

Archive for the ‘Encoding’ Category

Michael Kaplan Obituary – Berkowitz-Kumin-Bookatz | Cleveland Heights OH (and a whole bunch of info in zero width Unicode stuff)

Posted by jpluimers on 2018/01/02

I totally missed the passing of Michael Scott Kaplan some 2 years ago, so a belated R.I.P. is in order.

Obituary for Michael Kaplan, Michael Scott Kaplan, 45, passed away Wednesday, October 21, 2015, in Redmond, WA, after a brave battle with MS for 25 years. He was a lead software developer for Microsoft.

Source: [WayBack] Michael Kaplan Obituary – Berkowitz-Kumin-Bookatz | Cleveland Heights OH

Michael was the leading source on i18n, L10N, Unicode, sorting, normalisation and other things having to do with languages, representations and writing.

Besides that, he was a really nice guy whose MSDN materials I enjoyed.

Other people enjoyed them too, so I’m glad his writings have been archived: [first archive.is, second archive.is, WayBack] Sorting it All Out: Archives

Here are some additional links:

More on miloush.net:

Read the rest of this entry »

Posted in Ansi, Development, Encoding, internatiolanization (i18n) and localization (l10), Software Development, The Old New Thing, UTF-8, UTF8, Windows Development | Leave a Comment »

Valid reasons for having Delphi AnsiString on Mobile platform…not only for Internet but for Shaders also. //…

Posted by jpluimers on 2017/12/27

It’s too bad that you need workarounds to get ByteStrings working on mobile devices as there are APIs there (like shaders) that work best with them.

There was a nice discussion on this last year at [WayBack] I miss AnsiString on Mobile…not only for Internet but for Shaders also. // FMX.Context.GLES.pas const GLESHeaderHigh: array [0..24] of byte = (Byte('p'), … – Paul TOTH – Google+, based on this code example in the undocumented FMX library unit FMX.Context.GLES:

// FMX.Context.GLES.pas

const
  GLESHeaderHigh: array [0..24] of byte =
    (Byte('p'), Byte('r'), Byte('e'), Byte('c'), Byte('i'), Byte('s'), Byte('i'), Byte('o'), Byte('n'), Byte(' '),
     Byte('h'), Byte('i'), Byte('g'), Byte('h'), Byte('p'), Byte(' '), Byte(' '), Byte(' '), Byte('f'), Byte('l'),
     Byte('o'), Byte('a'), Byte('t'), Byte(';'), Byte(#13));

There are more than 500 places in the Delphi library sources that use this construct and even more that do other fiddling (like [WayBack] TEncoding.GetBytes) to get from strings to bytes.
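For comparison, here is a small sketch in Python (not Delphi, and not part of the FMX sources) of what that byte array spells out: the ASCII bytes of a one-line GLSL precision statement. The names GLES_HEADER_HIGH and as_delphi_byte_list are made up for this illustration.

```python
# The same 25 bytes as GLESHeaderHigh, built from a text literal.
# Python is used for illustration only; this is not the FMX code.
GLES_HEADER_HIGH = "precision highp   float;\r".encode("ascii")


def as_delphi_byte_list(data: bytes) -> str:
    """Render bytes the way the FMX constant spells them out.

    Hypothetical helper: printable ASCII becomes Byte('x'), anything
    else becomes Byte(#n). Quote characters are not escaped here.
    """
    parts = []
    for b in data:
        if 32 <= b < 127:
            parts.append(f"Byte('{chr(b)}')")
        else:
            parts.append(f"Byte(#{b})")
    return ", ".join(parts)


if __name__ == "__main__":
    print(len(GLES_HEADER_HIGH))
    print(as_delphi_byte_list(GLES_HEADER_HIGH))
```

With a byte string type available, the whole constant is a single readable literal, which is the point of the discussion.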

I wonder if by now we still need the workarounds that Andreas Hausladen provides:

–jeroen

Posted in Conference Topics, Conferences, Delphi, Development, Encoding, Event, Software Development | 6 Comments »

Long read about Unicode: You, Me And The Emoji: Character Sets, Encoding And Emoji – Smashing Magazine

Posted by jpluimers on 2017/11/07

A long read that is well worth it:

We all recognize emoji. They’ve become the global pop stars of digital communication. But what are they, technically speaking? And what might we learn by taking a closer look at these images, characters, pictographs… whatever they are 🤔 (Thinking Face). We will dig deep to learn about how these thingamajigs work. Please note: Depending on your browser, you may not be able to see all emoji featured in this article (especially the Tifinagh characters). Also, different platforms vary in how they display emoji as well. That’s why the article always provides textual alternatives. Don’t let it discourage you from reading though! Now, let’s start with a seemingly simple question. What are emoji?

[WayBack] You, Me And The Emoji: Character Sets, Encoding And Emoji – Smashing Magazine

Via: [WayBack] Everything you ever wanted to know about characters, encodings, glyphs… and, oh yeah, emoji: bit.ly/2fNKeW3 Long, rewarding read. – Ilya Grigorik – Google+

Here is just the ToC:

TABLE OF CONTENTS

  1. Character Sets And Document Encoding: An Overview
    1. Characters
    2. Character Sets
    3. Coded Character Sets
    4. Encoding
  2. Declaring Character Sets And Document Encoding On The Web
    1. content-type HTTP Header Declaration
    2. Checking HTTP Headers Using A Browser’s Developer Tools
    3. Checking HTTP Headers Using Web-based Tools
    4. Using A Meta Element With charset Attribute
    5. An Encoding By Any Other Name
  3. What Were We Talking About Again? Oh Yeah, Emoji!
    1. So What Are Emoji?
    2. How Do We Use Emoji?
    3. Character References
    4. Glyphs
    5. How Do We Know If We Have These Symbols?
    6. The Great Emoji Proliferation Of 2016
  4. Emoji OS Support
    1. Emoji Support: Apple Platforms (macOS and iOS)
    2. Emoji Support: Windows
    3. Emoji Support: Linux
    4. Emoji Support: Android
  5. Emoji On The Web
    1. Emoji One
    2. Twemoji
  6. Conclusion

–jeroen

Posted in ASCII, Development, Encoding, ISO-8859, ISO8859, Shift JIS, Unicode, UTF-16, UTF-8, UTF16, UTF8, Windows-1252 | Leave a Comment »

Until someone writes proper string visualisers for the Delphi debugger…

Posted by jpluimers on 2017/10/31

A few tricks to write long strings to files when the Delphi debugger cuts them off (just because it likes using 4k buffers internally):

  • TStringStream.Create(lRequestMessage).SaveToFile('c:\temp\temp.txt')
  • TIniFile.Create('c:\a.txt').WriteString('a','a',BigStringVar)
  • TFileStream.Create('c:\a.txt', fmCreate or fmShareDenyNone).WriteBuffer(Pointer(TEncoding.UTF8.GetBytes(BigStringVar))^,Length(TEncoding.UTF8.GetBytes(BigStringVar)))

They all work from the debug inspector, but they do leak memory. See comments below.

Via:

–jeroen

Read the rest of this entry »

Posted in About, Conference Topics, Conferences, Delphi, Development, Encoding, Event, Software Development | 6 Comments »

cURL – POST an XML file as a stream

Posted by jpluimers on 2017/10/25

I hope I’m not alone on this but I find the cURL documentation hard to follow and short on examples.

My goal was to mimic some HTTP XML posting traffic a server gets from IoT devices. Reproducing it with Google Chrome Postman (or Postman REST Client) is very easy: it sends the same request as the devices do.

TL;DR

  1. ensure you pass an empty --header "Content-Type:" header: this keeps cURL from adding one of its own and from changing how the content is transferred.
  2. use the --data or --data-binary option with an @ to post a file as body.
  3. if you want --write-out then be sure you have a recent cURL version.

This is what the IoT devices or Postman send:

  • Post headers like these:

Host:127.0.0.1:8080
Content-Length: 245
Connection:Keep-Alive

  • Content like this:


<?xml version="1.0"?>
<Root Attribute="value">
<Branch>
<Leaf>content</Leaf>
</Branch>
<Branch Attribute="value">
<Bough Attribute="value">
<Twig Attribute="value">
<Leaf Attribute="value"/>
</Twig>
</Bough>
</Branch>
</Root>

The data is being streamed to the HTTP server even with the very limited set of headers.

At first I was unable to come up with a cURL statement that exactly matches the headers and the way the content is being transferred.

This is what I tried (in all examples, %1 is the IPv4 address of the HTTP 1.1 server):

  • POST with all the headers and the --data command:

curl --request POST --header "Host: %1:8080" --header "Content-Length: 245" --header "Connection: Keep-Alive" --data @httpPostSample.xml http://%1:8080/target

This will hang the connection: somehow cURL will never notify the upload is done and the HTTP server keeps waiting. When you put --verbose or --trace-ascii - on the command-line you will see something like this before hanging: * upload completely sent off: 245 out of 245 bytes.

Note the trick to emit the ASCII trace to stdout using --trace-ascii with the minus sign: thanks to [WayBack] Daniel Stenberg for answering [WayBack] How can I see the request headers made by curl when sending a request to the server? – Stack Overflow.

You can do the same with --trace, which dumps all characters (not only ASCII) including their HEX representation.

  • POST with all but the Content-Length header and the --data command:

curl --request POST --header "Host: %1:8080" --header "Connection: Keep-Alive" --data @httpPostSample.xml http://%1:8080/target

This will automatically add a Content-Length: 245 header and complete the transfer. But it will also add a Content-Type: application/x-www-form-urlencoded header, causing the content not to be posted as a body.

  • POST with a --form file= command:

curl --request POST --header "Host: %1:8080" --header "Connection: Keep-Alive" --form file=@httpPostSample.xml http://%1:8080/target

This will automatically add a Content-Length: xxx header (way longer than 245) because it converts the request into a Content-Type: multipart/form-data; boundary=------------------------e1c0d47bac806954 one (the hex at the end differs) which is totally unlike what Postman does.

It is also unlike what the HTTP server accepts.

curl --request POST --header "Host: %1:8080" --header "Connection: Keep-Alive" --data-binary @httpPostSample.xml http://%1:8080/target

It turns out that --data-ascii is exactly the same as --data and that --data-binary just skips some new-line conversion when compared to --data or --data-ascii. Contrary to the --data-raw documentation that suggests it is equivalent to --data-binary, it seems --data-raw behaves exactly like --data and --data-ascii. Odd.
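To make the new-line difference concrete, here is a rough Python emulation of what ends up in the request body when a file is read with @. It assumes the behaviour described in the cURL manual (--data strips carriage returns and newlines from the file, --data-binary sends the bytes unchanged); the function names and the sample document are made up for this sketch.

```python
def body_for_data(file_bytes: bytes) -> bytes:
    """Approximate --data @file: carriage returns and newlines are stripped."""
    return file_bytes.replace(b"\r", b"").replace(b"\n", b"")


def body_for_data_binary(file_bytes: bytes) -> bytes:
    """Approximate --data-binary @file: the bytes are sent unchanged."""
    return file_bytes


# Hypothetical four-line XML file with Windows line endings.
SAMPLE = (
    b'<?xml version="1.0"?>\r\n'
    b"<Root>\r\n"
    b"<Leaf>content</Leaf>\r\n"
    b"</Root>\r\n"
)

if __name__ == "__main__":
    print(len(body_for_data(SAMPLE)), len(body_for_data_binary(SAMPLE)))
```

So when the server expects the exact byte count of the file, the two options can produce different Content-Length values for the same input.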

So these are all stuck with the Content-Type: application/x-www-form-urlencoded and I thought I was running out of options.

Then I found [WayBack] soundmonster had posted an answer at [WayBack] http – What is the cURL command-line syntax to do a POST request? – Super User mentioning to add a Content-Type header.

So I changed the request to include the --header "Content-Type: text/xml; charset=UTF-8" header:

  • curl --request POST --header "Content-Type: text/xml; charset=UTF-8" --header "Host: %1:8080" --header "Connection: Keep-Alive" --data @httpPostSample.xml http://%1:8080/target

This works. But: the Content-Type header is not present in the original request.

Finally it occurred to me: what if cURL would not insert a Content-Type header if I add an empty Content-Type header?

That works!

  • curl --request POST --header "Content-Type:" --header "Host: %1:8080" --header "Connection: Keep-Alive" --data @httpPostSample.xml http://%1:8080/target

It posts exactly the same content as the IoT devices and Postman do.

Phew!
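As a cross-check of what "exactly the same" means on the wire, this Python sketch builds the raw request bytes with only the three headers listed earlier and no Content-Type at all. It is not what cURL does internally; build_post and send_post are names made up for this illustration, and the host, port and path are the ones from the examples above.

```python
import socket


def build_post(host: str, port: int, path: str, body: bytes) -> bytes:
    """Build a raw HTTP/1.1 POST with Host, Content-Length and Connection only."""
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}:{port}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: Keep-Alive\r\n"
        "\r\n"  # blank line ends the headers; note: no Content-Type
    )
    return head.encode("ascii") + body


def send_post(host: str, port: int, path: str, body: bytes,
              timeout: float = 10.0) -> bytes:
    """Send the request over a plain socket and return the first response chunk."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_post(host, port, path, body))
        return sock.recv(4096)


if __name__ == "__main__":
    with open("httpPostSample.xml", "rb") as f:
        print(send_post("127.0.0.1", 8080, "/target", f.read()))
```

Comparing the output of build_post with a --trace-ascii dump is a quick way to see which headers a given cURL invocation adds.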

 

I tried to combine this with the --write-out (a.k.a. -w) option, but for older versions of cURL (I could reproduce with 7.34) that forces cURL back into Content-Type: application/x-www-form-urlencoded mode, so watch your cURL version!

Later I will put more research into chunked transfer. Links that might help me:

–jeroen

Some of the references:

Posted in *nix, bash, cURL, Development, Encoding, Power User, Scripting, Software Development | Leave a Comment »

If Beyond Compare indicates “editing disabled” after starting from SourceTree, then your integration is wrong.

Posted by jpluimers on 2017/10/16

SourceTree 2.1 still doesn't recognise that Beyond Compare is installed.

I noticed that on my Mac, Beyond Compare wasn’t able to edit diffed files when it had been started from SourceTree. This struck me as odd since on Windows this worked fine. So I did a bit of digging and found out both SourceTree and I screwed up:

Editing Disabled

Luckily [WayBack] Zoë Peterson (lead developer on Beyond Compare and formerly Turbo Power Abbrevia project admin) had answered this before, and all these show “Editing disabled” in the user-interface:

Beyond Compare will disable editing of a file for any of the following reasons:

  • It’s one of the input files in a 3-way merge
  • The comparison was cancelled
  • The comparison encountered an error (corrupt file, invalid character encoding, out of memory, gamma rays, etc)
  • The file format’s conversion settings don’t support converting back to the original format (MS Word, PDF)
  • The file is on a read-only “filesystem” (7zip/RAR/CHM archives, CD/DVD-ROMs)
  • A file or parent folder had editing explicitly disabled by the user in the session settings or using the /ro command line switches
  • The viewer itself doesn’t support editing (eg, Hex Compare)

Source: [WayBack] version control – Beyond Compare 3 editing disabled – Stack Overflow

So the last instruction should be:

Set both Visual Diff Tool and Merge Tool to Other, then set both the Diff Command and Merge Command to the value you obtained above (in my case /usr/local/bin/bcomp) and these arguments:

  1. Diff Command Arguments
    "$LOCAL" "$REMOTE"
  2. Merge Command Arguments
    "$LOCAL" "$REMOTE" "$BASE" "$MERGED"

Note that somewhere during 2.2, SourceTree has added Beyond Compare integration and fixed some of the issues, but there are still issues left:

No Editing Disabled

There is only one occasion where the UI does not show “Editing Disabled”, but where you cannot edit the file itself (you can only edit the current line in the line diff view at the bottom of the UI). Zoë mentioned that too:

Also, the Full Edit (F2) toggle in the Text Compare View menu switches between inline editing and line-based mode. If it’s disabled you can copy/delete whole lines and type in the line details edits at the bottom of the window, but the main windows won’t have a cursor, typing is disabled, and it will always select whole lines. Unlike the above items, this doesn’t show “Editing Disabled” in the status bar.

Source: [WayBack] version control – Beyond Compare 3 editing disabled – Stack Overflow

By default this setting is bound to the F2 key on Windows, so if you accidentally press that when Beyond Compare is active, you might be in for a surprise.

Screen shots:

Read the rest of this entry »

Posted in Beyond Compare, Encoding, Power User | Leave a Comment »

Encoding horror: Wayback Machine “Sorry.This snapshot cannot be displayed due to an internal error.”

Posted by jpluimers on 2017/10/13

Sorry.This snapshot cannot be displayed due to an internal error.

That is what you get when the Wayback Machine tries to display the archived https://plus.google.com/+KristianKöhntopp/posts/2yw9QFgCdtx, which is about Unicode encoding horror.

The real horror? This used to work in the past.

Luckily it’s archived on https://archive.fo/b36gn

–jeroen

Later: credit where credit is due, as they fixed it:

[WayBack] WayBack didn’t respond to me, but instead fixed the archival of +Kristian Köhntopp’s G+ posts:… – Jeroen Wiert Pluimers – Google+

https://web.archive.org/web/*/https://plus.google.com/+KristianK%C3%B6hntopp/posts/*
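The two URL forms in this post differ only in how the ö is written: the archive URL carries it percent-encoded as its UTF-8 bytes. The post does not say what the internal error actually was, so the Python sketch below only shows the encoding step itself; the variable names are made up for this illustration.

```python
from urllib.parse import quote, unquote

NAME = "+KristianKöhntopp"

# Percent-encode the UTF-8 bytes of non-ASCII characters; keep the "+" as is.
UTF8_FORM = quote(NAME, safe="+")

# The same name percent-encoded from its Latin-1 byte, for comparison.
LATIN1_FORM = quote(NAME, safe="+", encoding="latin-1")

if __name__ == "__main__":
    print(UTF8_FORM)
    print(LATIN1_FORM)
    print(unquote(UTF8_FORM))
```

A server that decodes the path with a different encoding than the client used to encode it will look up the wrong name.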

Posted in Development, Encoding, Internet, InternetArchive, Power User, Software Development, WayBack machine | Leave a Comment »

Looking for more examples of Unicode/Ansi oddities in Delphi 2009+

Posted by jpluimers on 2017/09/25

At the end of April 2014, Roman Yankovsky started a nice [Wayback] discussion on Google+ trying to get upvotes for [Wayback] QualityCentral Report #: 124402: Compiler bug when comparing chars.

His report basically comes down to this: when using Ansi character literals like #255, the compiler treats them as single-byte encoded characters in the current code page of your Windows context, translates them to Unicode, then processes them.

The QC report has been dismissed as “Test Case Error” (within 15 minutes of stating “need more info”) by one of the compiler engineers, directing to the [Wayback] UsingCharacterLiterals section of Delphi in a Unicode World Part III: Unicodifying Your Code where – heaven forbid – they suggest to replace #128 with the Euro-Sign literal.

I disagree, as the issue happens without any hint or warning whatsoever, and causes code that compiles fine in Delphi <= 2007 to fail in subtle ways on Delphi >= 2009.

The compiler should issue a hint or warning when you potentially can screw up. It doesn’t. Not here.
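To show why that translation step matters, here is a Python sketch (not Delphi, and not a model of the compiler itself) that decodes a single byte value through two code pages. The function name literal_to_unicode is made up; cp1252 and cp437 are just two example code pages.

```python
def literal_to_unicode(byte_value: int, code_page: str) -> str:
    """Interpret one byte as a character in the given code page."""
    return bytes([byte_value]).decode(code_page)


if __name__ == "__main__":
    for value in (128, 255):
        for page in ("cp1252", "cp437"):
            ch = literal_to_unicode(value, page)
            # Show the byte value next to the resulting Unicode code point.
            print(f"#{value} in {page}: U+{ord(ch):04X}")
```

The byte value 128 becomes the Euro sign (U+20AC) under Windows-1252 but U+00C7 under code page 437, so code that compares the resulting character against the ordinal 128 no longer matches, and which character you get depends on the code page.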

Quite a few knowledgeable Delphi people got involved in the discussion:

Read the rest of this entry »

Posted in Ansi, ASCII, Conference Topics, Conferences, CP437/OEM 437/PC-8, Delphi, Delphi 2006, Delphi 2007, Delphi 2009, Delphi 2010, Delphi 7, Delphi XE, Delphi XE2, Delphi XE3, Delphi XE4, Delphi XE5, Delphi XE6, Development, Encoding, Event, ISO-8859, Missed Schedule, QC, SocialMedia, Software Development, Unicode, UTF-8, Windows-1252, WordPress | Leave a Comment »

Some Inno Setup notes

Posted by jpluimers on 2017/08/30

While updating at a client site a hugely out of date Inno Setup directory tree and instructions combo (docs mentioning isetup-2.0.19.exe, isxsetup2.exe, istool-3.0.0.exe but using ispack-5.3.10.exe) I made a few notes:

Source files for which I need to figure out if they are needed, where they originally come from and which actual version should be used:

The vcredist_x86_2010.exe was actually the Visual C++ 2010 SP1 one with version 10.0.40219.1, not the RTM one with version 10.0.30319.1.

I need to figure out this error message that occurs every now and then:

---------------------------
Error
---------------------------
ShellExecuteEx failed; code 1460.
This operation returned because the timeout period expired.
---------------------------
OK
---------------------------

I need to catch up on many things having to do with the [Code] section:

It pays off to split your [Code] section in at least three parts:

  1. A part having the Setup event functions
  2. A part having the Pascal Scripting: Scripted Constants functions
  3. A part having your own utility functions

There is no {code:...} way of getting the value of OutputBaseFileName, but you can use

Not all places can use {code:...} expansion, so you might want to use the preprocessor ispp (which stands for Inno Setup Preprocessor).

It was a bit hard to find if/when ispp was available as that has changed over the years as it used to be a separate product. From some Inno Setup 4.x or 5.x version up, it is available in the core product, possibly enabled by default (reading Inno Setup Help – Script Format Overview I’m still not sure) but to make sure it is enabled, just add this line at the start of your script files:

#preproc ispp

With the pre-processor, you can do things like this.

Without the pre-processor, this will fail in the [Files] section with an error containing “unknown filename prefix”:

Source: Service\{code:GetServiceExe}; DestDir: {app}; ... BeforeInstall: DoBeforeInstallForService({code:GetServiceName})

With the pre-processor, you can replace it with this:

#preproc ispp

#define cServiceExe = "SomeWeirdExeName.exe"
#define cServiceName = "SomeWeirdServiceName"

...

Source: Service\{#cServiceExe}; DestDir: {app}; ... BeforeInstall: DoBeforeInstallForService('{#cServiceName}')

If you forget the single quotes around {#cServiceName} then you get this very weird error for which Googling “Can only call function” “ExpandConstant” “within parameter lists.” will return no satisfactory results:

[Window Title]
Error

[Main Instruction]
Compiler Error

[Content]
Line 91:
Directive or parameter "BeforeInstall" expression error: Can only call function "ExpandConstant" within parameter lists.

[OK]

Of course the pre-processor syntax is different from the Pascal Script syntax, so this won’t work:

#define cVersion="1.2.3.4"

#define cOutputDir="..\Output-{#cVersion}"

It needs to be this (via Inno Setup – #define directive – how to use previously defined variable? – Stack Overflow):

#define cOutputDir="..\Output-"+cVersion

Importing Windows functions from DLLs

Now that there is both an Ansi and Unicode version of Inno Setup, lots of scripts you find on the interwebz need modification: they import ANSI versions from various DLLs but now need to check the Inno Setup Pre-Processor pre-defined variable UNICODE.

Those predefined variables are listed here: Inno Setup Preprocessor: Predefined Variables

You use it like in the CodeDll.iss example:

//importing a Windows API function, automatically choosing ANSI or Unicode (requires ISPP)
function MessageBox(hWnd: Integer; lpText, lpCaption: String; uType: Cardinal): Integer;
#ifdef UNICODE
external 'MessageBoxW@user32.dll stdcall';
#else
external 'MessageBoxA@user32.dll stdcall';
#endif

I learned this the hard way when inheriting a bunch of code that would install services and failed on one service manager call with a GetLastError code ERROR_INVALID_NAME a.k.a. 123 (0x7B). I found it was the first OpenSCManager API call, but since the code did not have any error handling at all, tracking down which call failed took quite some effort. It did not fail with the documented ERROR_ACCESS_DENIED a.k.a. 5 (0x5) or ERROR_DATABASE_DOES_NOT_EXIST a.k.a. 1065 (0x429) codes.

Of course, searching for OpenSCManager ServicesActive 0x0000007B or OpenSCManager Error 123 didn’t return meaningful pages.

There were some mentions of invalid registry keys but those didn’t make sense to me at that time. Only after fiddling a lot I found the ROpenSCManagerW that mentioned Unicode, the ERROR_INVALID_NAME and ERROR_SHUTDOWN_IN_PROGRESS a.k.a. 1115 (0x45B). Apparently the lpDatabaseName parameter wasn’t interpreted correctly. That made sense, as passing 'ServicesActive' as a Unicode string where the import uses Ansi will see the string as an alternating series of ANSI character bytes and null bytes and stop after the first S.

The fix was easy: apply the above #ifdef UNICODE logic and import the function either using W@ or A@ depending on the mode.
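The "stops after the first S" effect is easy to reproduce outside Inno Setup. This Python sketch encodes the database name as UTF-16LE (what a Unicode string looks like in memory on Windows) and then reads it the way an ANSI API reads a null-terminated string; read_as_ansi_cstring is a name made up for this illustration.

```python
def read_as_ansi_cstring(buffer: bytes) -> bytes:
    """Return the bytes up to (not including) the first null byte."""
    end = buffer.find(b"\x00")
    return buffer if end < 0 else buffer[:end]


# 'ServicesActive' as a null-terminated UTF-16LE string.
WIDE = "ServicesActive".encode("utf-16-le") + b"\x00\x00"

# The same name as a null-terminated single-byte string.
NARROW = "ServicesActive".encode("ascii") + b"\x00"

if __name__ == "__main__":
    print(WIDE[:6])
    print(read_as_ansi_cstring(WIDE))
    print(read_as_ansi_cstring(NARROW))
```

Every ASCII character in UTF-16LE is followed by a zero byte, so the ANSI reader sees a one-character name.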

Later I found out the code was borrowed without attribution nor mentioning the ANSI limitation from installation – upgrading windows service using inno setup – Stack Overflow. This all the more illustrates that when you borrow code from the internet you should attribute it and ensure the limitations are mentioned near your code.

Logging

Logging involves a few things:

  1. Call the Log method: Inno Setup Help – Pascal Scripting: Log
  2. Enable logging using either
  3. Inspect the log file in your %TEMP% directory (files are named like Setup Log 2016-07-12 #001.log)
  4. Note that I wish there was a Log function with parameters similar to Format, but since the underlying Pascal Script language does not allow overloads, I tried to introduce a LogFormat function instead but found out that Pascal Script doesn’t like array of const parameters (the code below fails with an identifier expected error on the const keyword) for which I asked if I can report a bug:


function LogFormat(const AFormat: string; const AArgs: array of const): string;
begin
  Log(Format(AFormat, AArgs));
end;

There is an undocumented UsingWinNT function originating from the non-NT era that is sometimes used for detecting Windows versions (2K, XP, Vista, 7, 8, 8.1, 10, etc) and for fiddling with Windows Services with or without using the ServicesActive database name.

Luckily these functions exist:

Exiting and rolling back prematurely

There are various ways the interwebz suggest to exit an Inno Setup script prematurely, but most of them do not do a proper rollback/cleanup of the install.

These are bad (they don’t clean up or roll back):

Note Abort only works in these events (thanks Iepe):

InitializeSetup
InitializeWizard
CurStepChanged(ssInstall)
InitializeUninstall
CurUninstallStepChanged(usAppMutexCheck)
CurUninstallStepChanged(usUninstall)

Mahris has answered a nice workaround in installer – Inno Setup: How to Abort/Terminate Setup During Install? – Stack Overflow:

[Files]
Source: "MYPROG.EXE"; DestDir: "{app}"; AfterInstall: MyAfterInstall

[Code]
var CancelWithoutPrompt: boolean;

function InitializeSetup(): Boolean;
begin
  CancelWithoutPrompt := false;
  result := true;
end;

procedure MyAfterInstall();
begin
  // (Do something that determines BadResult)
  if BadResult then begin
    MsgBox('Should cancel because...',mbError,MB_OK);
    CancelWithoutPrompt := true;
    WizardForm.Close;
  end;
end;

procedure CancelButtonClick(CurPageID: Integer; var Cancel, Confirm: Boolean);
begin
  if CurPageID=wpInstalling then
    Confirm := not CancelWithoutPrompt;
end;

x64 versus x86

Since Inno Setup supports both Win32 and Win64, you can use it to install the right flavour of dependencies, for instance installer – Install correct version of Firebird (32bit or 64bit) with Inno Setup – Stack Overflow

–jeroen

Posted in Development, Encoding, Inno Setup ISS, Installer-Development, Software Development, Unicode | Leave a Comment »

OpenSuSE Tumbleweed – testing the password of any user with getent and openssl

Posted by jpluimers on 2017/06/21

For one of my VMs I forgot to note which of the initial passwords I had changed, so I wanted to check them.

Since I didn’t have a keyboard attached to the console and ssh wasn’t allowing root, I needed an alternative to an actual login to test the passwords.

Luckily /etc/shadow, together with getent and openssl, came to the rescue.

Since getent varies per distribution, here is how it works on OpenSuSE:

Read the rest of this entry »

Posted in *nix, *nix-tools, ash/dash, bash, bash, Development, Encoding, Hashing, Linux, md5, openSuSE, Power User, Scripting, Security, SHA, SHA-256, SHA-512, Software Development, SuSE Linux | Leave a Comment »