The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests

  • My badges

  • Twitter Updates

  • My Flickr Stream

  • Pages

  • All categories

  • Enter your email address to subscribe to this blog and receive notifications of new posts by email.

    Join 1,861 other subscribers

Archive for the ‘UTF8’ Category

Delphi “type types”: similar types but not the same type identity, some examples.

Posted by jpluimers on 2013/03/12

Few people know about a Delphi language feature that has been present since Delphi 1: prepending the type definition with a type keyword to make the type getting a new identity.

Each time I use it, I have to do some browsing for the consequences, and this time I wrote down some notes and created a small example program (source is also below).

This time I needed it when writing class wrappers on top of the Delphi bindings for WebSphere MQ.

WebSphere MQ has Queues where you can put and get messages. It also has Queue Managers to which you connect, and that provide queuing services and manages queues.

Both Queues and Queue Managers have names that can be up to 48 (single byte) characters long.
Those names mean totally different things, so though the have similar data types, they have a different identity.

The same holds for 20 byte character arrays (they can be used as names for ChannelNameShortConnectionName and MCAName). The 264 byte character array is so far used for ConnectionName only.

Distinguishing those types: That’s what “type types” in Delphi are all about. Read the rest of this entry »

Posted in CP437/OEM 437/PC-8, Delphi, Delphi 1, Delphi 2005, Delphi 2006, Delphi 2007, Delphi 2009, Delphi 2010, Delphi 3, Delphi 4, Delphi 5, Delphi 6, Delphi 7, Delphi 8, Delphi x64, Delphi XE, Delphi XE2, Delphi XE3, Development, Encoding, Shift JIS, Software Development, Unicode, UTF-8, UTF8 | 1 Comment »

Searchable UTF8 Unicode Characters

Posted by jpluimers on 2012/05/31

Brilliant: site where you can Search UTF8 Unicode Characters.

If you know part of the name of a Unicode character, you can now try and find it, then copy/paste it from that site.

Edit: 20160403: that site disappeared, but this one works: Unicode Character Search and Shapecatcher.com: Unicode Character Recognition still works.

–jeroen

Posted in Development, Encoding, Power User, Software Development, Unicode, UTF-8, UTF8 | Leave a Comment »

*nix – Mastering the VI editor

Posted by jpluimers on 2010/04/13

Every once in a while I need to do some text editing in a *nix environment that has a minimum toolset installed.

Which means: use VI.

VI is a versatile text editor from the early *nix days, but it is not straight forward to use.
Since I don’t use VI often enough, I tend to forget some of the commands.

Time to share my favourite VI link: Mastering the VI editor edit: link rot, now it is at Mastering the VI editor.

The link points to the basic stuff, but the page contains most of what you ever want to know about VI.

–jeroen

PS:
If possible, I install the JOE text editor on systems where I am admin.
JOE uses WordStar like key bindings, and supports UTF-8. Talking about “something old, something new” :-)

Posted in *nix, CommandLine, Development, Encoding, Power User, Software Development, Unicode, UTF-8, UTF8, vi | 1 Comment »

Web means Unicode

Posted by jpluimers on 2010/02/12

Google published an interesting graph generated from their internal data based on their indexed web pages.Encodings on the web

A quick summary of popular encodings based on the graph:

  1. Unicode – almost 50% and rapidly rising
  2. ASCII20% and falling
  3. Western European* – 20% and falling
  4. Rest – 10% and falling

Conclusion: if you do something with the web, make sure you support Unicode.

When you are using Delphi, and need help with transitioning to Unicode: contact me.

–jeroen

* Western European encodings: Windows-1252, ISO-8859-1 and ISO-8859-15.

Reference: Official Google Blog: Unicode nearing 50% of the web.

Edit: 20100212T1500

Some people mentioned (either in the comments or otherwise) that a some sites pretend they emit Unicode, but in fact they don’t.
This doesn’t relieve you from making sure you support Unicode: Don’t pretend you support Unicode, but do it properly!

Examples of bad support for Unicode are not limited to the visible web, but also applications talking to the web, and to webservices (one of my own experiences is explained in StUF – receiving data from a provider where UTF-8 is in fact ISO-8859: it shows an example where a vendor does Unicode support really wrong).

So: when you support Unicode, support it properly.

–jeroen

Posted in .NET, ASP.NET, C#, Database Development, Delphi, Development, Encoding, Firebird, IIS, InterBase, ISO-8859, ISO8859, Prism, SOAP/WebServices, Software Development, SQL Server, Unicode, UTF-8, UTF8, Visual Studio and tools, Web Development | 7 Comments »

Delphi – MD5: the MessageDigest_5 unit has been there since Delphi 2007

Posted by jpluimers on 2009/12/11

I still see a lot of people crafting their own MD5 implementation.
A lot of the existing MD5 implementations do not work well in Delphi 2009 and later (because they need to be adapted to Unicode).
Many of those existing implementations behave differently if you pass the same ASCII characters as AnsiString or UnicodeString.

The MessageDigest_5 unit has been available in Delphi since Delphi 2007.
This is the location relative to your installation directory: source\Win32\soap\wsdlimporter\MessageDigest_5.pas

(Edit: 20091223:  Since Delphi 7.01, Indy has provided the unit IdHashMessageDigest which also does md5, see the comments below)

So this unit used by the WSDL, and more importantly: works with Unicode (if you pass it a string with Unicode characters, it will convert them to UTF-8 first).
The unit is not in your default search path, and has not been very well promoted (the only link at the Embarcadero site was an article by Pawel Glowacki), so few people know about it.

Now you know too :-)

Note that MD5 is normally used to hash binary data.
It is not wise to send a non ASCII string through both the AnsiString and UnicodeString versions: because of the different encoding (and therefore a different binary representation), you will get different results depending on the Delphi version used.

A sample of the usage showing the above AnsiString/UnicodeString issue is not present for ASCII strings, nor for ANSI strings: this is because both get encoded using UTF-8 before hashing.
Delphi 2007 did not do the UTF-8 encoding, so you will see different results here.
You will also see that Writeln uses the Console for encoding, and those are different than the code editor.

Edit: 20091216 – added RawByteString example to show that the conversion does not matter.

<br />program md5;<br /><br />{$APPTYPE CONSOLE}<br /><br />uses<br /><%%KEEPWHITESPACE%%>  SysUtils,<br /><%%KEEPWHITESPACE%%>  MessageDigest_5 in 'C:\Program Files\Embarcadero\RAD Studio\7.0\source\Win32\soap\wsdlimporter\MessageDigest_5.pas';<br /><%%KEEPWHITESPACE%%>  // Vista/Windows 7: MessageDigest_5 in 'C:\Program Files (x86)\Embarcadero\RAD Studio\7.0\source\Win32\soap\wsdlimporter\MessageDigest_5.pas';<br /><br />function GetMd5(const Value: AnsiString): string; overload;<br />var<br /><%%KEEPWHITESPACE%%>  hash: MessageDigest_5.IMD5;<br /><%%KEEPWHITESPACE%%>  fingerprint: string;<br />begin<br /><%%KEEPWHITESPACE%%>  hash := MessageDigest_5.GetMD5();<br /><%%KEEPWHITESPACE%%>  hash.Update(Value);<br /><%%KEEPWHITESPACE%%>  fingerprint := hash.AsString();<br /><%%KEEPWHITESPACE%%>  Result := LowerCase(fingerprint);<br />end;<br /><br />function GetMd5(const Value: UnicodeString): string; overload;<br />var<br /><%%KEEPWHITESPACE%%>  hash: MessageDigest_5.IMD5;<br /><%%KEEPWHITESPACE%%>  fingerprint: string;<br />begin<br /><%%KEEPWHITESPACE%%>  hash := MessageDigest_5.GetMD5();<br /><%%KEEPWHITESPACE%%>  hash.Update(Value);<br /><%%KEEPWHITESPACE%%>  fingerprint := hash.AsString();<br /><%%KEEPWHITESPACE%%>  Result := LowerCase(fingerprint);<br />end;<br /><br />var<br /><%%KEEPWHITESPACE%%>  SourceAnsiString: AnsiString;<br /><%%KEEPWHITESPACE%%>  SourceUnicodeString: UnicodeString;<br /><%%KEEPWHITESPACE%%>  SourceRawByteString: RawByteString;<br /><br />begin<br /><%%KEEPWHITESPACE%%>  try<br /><%%KEEPWHITESPACE%%>    SourceAnsiString := 'foobar';<br /><%%KEEPWHITESPACE%%>    SourceUnicodeString := 'foobar';<br /><%%KEEPWHITESPACE%%>    SourceRawByteString := 'foobar';<br /><br /><%%KEEPWHITESPACE%%>    Writeln(GetMd5(SourceAnsiString));<br /><%%KEEPWHITESPACE%%>    Writeln(GetMd5(SourceUnicodeString));<br /><%%KEEPWHITESPACE%%>    Writeln(GetMd5(SourceRawByteString));<br /><br /><%%KEEPWHITESPACE%%>    SourceAnsiString := 'föøbår';<br /><%%KEEPWHITESPACE%%>    SourceUnicodeString := 'föøbår';<br /><%%KEEPWHITESPACE%%>    SourceRawByteString := 'föøbår';<br /><%%KEEPWHITESPACE%%>    Writeln(SourceAnsiString, ' ', GetMd5(SourceAnsiString));<br /><%%KEEPWHITESPACE%%>    Writeln(SourceUnicodeString, ' ', GetMd5(SourceUnicodeString));<br /><%%KEEPWHITESPACE%%>    Writeln(SourceRawByteString, ' ', GetMd5(SourceRawByteString));<br /><%%KEEPWHITESPACE%%>  except<br /><%%KEEPWHITESPACE%%>    on E: Exception do<br /><%%KEEPWHITESPACE%%>      Writeln(E.ClassName, ': ', E.Message);<br /><%%KEEPWHITESPACE%%>  end;<br />end.<br />

–jeroen

Posted in Delphi, Development, Encoding, Hashing, md5, Power User, Security, Software Development, Unicode, UTF-8, UTF8 | 28 Comments »

CodeRage 4 session material download locations changed – CodeCentral messed up

Posted by jpluimers on 2009/09/18

Somehow, CodeCentral managed to not only delete my uploaded CodeRage 4 session materials (the videos are fine!), but also newer uploads with other submissions.

Since I’m in crush mode to get the BASTA! and DelphiLive 2009 Germany sessions done, and all Embarcadero sites having to do with their membership server perform like a dead horse, I have temporary moved the downloads, and corrected my earlier post on the downloads.

I have notified Embarcadero of the problems, so I’m pretty sure they are being addressed on their side as well.
Edit 20090919: Embarcadero indicated that between 20090913 and 20090914 there has been a database restore on CodeCentral. Some entries therefore have been permanently lost.

When I get back from the conferences, and CodeCentral is more responsive, I’ll retry uploading it there, and correct the posts.
In the mean time, be sure not to save shortcuts to the current locations, as they are bound to change.

The are the new download locations can all be found in my xs4all temporary CodeRage 4 2009 download folder.

This is the full list of my CodeRage 4 sessions and places where you can download everything: Read the rest of this entry »

Posted in .NET, ASCII, CodeRage, CommandLine, Conferences, CP437/OEM 437/PC-8, Database Development, Debugging, Delphi, Development, Encoding, Event, Firebird, InterBase, ISO-8859, ISO8859, Prism, Software Development, SQL Server, Unicode, UTF-8, UTF8, Windows-1252 | 1 Comment »

CodeRage 4: session replays are online too!

Posted by jpluimers on 2009/09/13

Embarcadero has made available the replays of the CodeRage 4 sessions.
You can find them in the CodeRage 4 sessions overview.

In order to download them from that overview, NOTE: To access this session replay, you must be logged into EDN. you can login or sign-up (which is free).

To make it easier to find all the relevant downloads, below is an overview of my sessions and their links.

Let me know what you use it for, I’m always interested!

Update 20090918: changed the download locations because CodeCentral messed up.
Read the rest of this entry »

Posted in .NET, ASCII, C#, C# 2.0, CodeRage, CommandLine, Conferences, CP437/OEM 437/PC-8, Database Development, Debugging, Delphi, Development, Encoding, Event, Firebird, InterBase, ISO-8859, ISO8859, Java, Prism, Software Development, Unicode, UTF-8, UTF8, Visual Studio and tools, XML, XML/XSD, XSD | 4 Comments »

CodeRage 4: session “Using Unicode and Other Encodings in your Programs” video, chat and Q&A transcripts

Posted by jpluimers on 2009/09/11

Not only can you watch the video and download the CodeRage 4 session on materials on Using Unicode and Other Encodings in your Programs, but below you can also find the chat transcripts below.

VIP Room Transcript with Q&A

(9/11/2009 9:09:19 AM) The topic is: Session Room 2 – “Using Unicode and Other Encodings in your Programs” by Jeroen Pluimers


Public Room Transcript

(5:52:14 PM) Christine_Ellis has set the topic to: Session Room 2 – “Using Unicode and Other Encodings in your Programs” by Jeroen Pluimers
(9/11/2009 9:12:47 AM) Jeroen_Pluimers: I got a bunch of 406 error messages in the jibber chat client, so I was afraid it lost the connection :-)
(9/11/2009 9:15:45 AM) Jim_Ferguson: Jeroen, Have you been getting a bunch of internal errors when you get fancy with generics?
(9/11/2009 9:15:53 AM) Borland: BTW, DavidI, excellent internet radio choice of KPIG. Very good Blues.
(9/11/2009 9:36:43 AM) Mandy_Walker: http://etncaweb04.embarcadero.com/resources/technical_papers/Delphi-and-Unicode_Marco-Cantu.pdf
(9/11/2009 9:39:14 AM) Jim_Ferguson: Strings are getting fatter on the back end.
(9/11/2009 9:40:39 AM) Mandy_Walker: Sorry, for incorrect link from slide. Better http://etnaweb04.embarcadero.com/resources/technical_papers/
(9/11/2009 9:49:20 AM) Jim_Ferguson: TBYtes and pChar arent equivalt. TBytes is a dynamic array. Shouln’t be pByte instead of TBytes?
(9/11/2009 9:53:05 AM) Jim_Ferguson: Sounds like Intel needs to build Unicode into the processor. There is a lot of out board thinking when it comes to characters now.
(9/11/2009 9:57:19 AM) Mandy_Walker: http://it-republik.de/konferenzen/delphi_live/material/DelpiLive09_Cantu_Unicode.pdf
(9/11/2009 9:58:26 AM) Borland: Thanks Joroen, this was packed with great information sources.
(9/11/2009 9:59:09 AM) Jeroen_Pluimers: http://en.wordpress.com/tag/coderage/
(9/11/2009 9:59:17 AM) Jeroen_Pluimers: https://wiert.wordpress.com/2009/09/09/coderage-4-session-materials-available-for-download/
(9/11/2009 9:59:24 AM) Mandy_Walker: Thx, Jeroen
(9/11/2009 10:00:47 AM) Erwin_Mouthaan: Bedankt Jeroen. Leuke presentatie!
(9/11/2009 10:01:04 AM) jthurman: Jeroen is a marching band guy?
(9/11/2009 10:01:05 AM) M_L: Thanks!
(9/11/2009 10:01:21 AM) Jeroen_Pluimers: http://www.wmc.nl
(9/11/2009 10:01:22 AM) Giel: Bedankt Jeroen!
(9/11/2009 10:01:32 AM) Robert_D_Smith: I played snare and bass drum, Jeroen
(9/11/2009 10:01:40 AM) jthurman: Jeroen: I teach high school marching band in the USA. We should talk sometime.
(9/11/2009 10:02:05 AM) Jeroen_Pluimers: write me an email: jeroen@pluimers.com
(9/11/2009 10:02:09 AM) jthurman: Will do
(9/11/2009 10:02:45 AM) Robert_Evans: Thanks Jeroen. Great stuff!
(9/11/2009 10:03:11 AM) Jeroen_Pluimers: you are welcome; let me know when you have questions or run into things that I might be able to help with
(9/11/2009 10:04:48 AM) Jeroen_Pluimers: talking about migration projects: we have done quite a few for clients; so if you need help with that as well, drop me an email
(9/11/2009 10:05:46 AM) Christine_Ellis has set the topic to: Session Room 2 – “New Features in the RAD Studio IDE” by Mark Duncan & Darren Kosinski

–jeroen

Posted in .NET, C#, CommandLine, Delphi, Development, Encoding, ISO-8859, ISO8859, Prism, Software Development, Unicode, UTF-8, UTF8, XML, XML/XSD, XSD | 1 Comment »

CodeRage 4: session “Practical XML in Delphi” chat and Q&A transcripts

Posted by jpluimers on 2009/09/09

Not only can you download CodeRage 4 session on materials on Practical XML in Delphi, but below you can also find the chat transcripts below.

Note the times are a bit odd: when the chat window refreshes, it sometimes uses the PST time zone, but new posts are using the local time zone.
Hence the sudden jump from 9 AM to  almost 6 PM.

VIP Room Transcript with Q&A

[5:46:28 PM] <davidi>

Q: thomasgrubb asked: “Is there an implementation for XMLDocument (for Delphi Win32) that is file-mapped, e.g., the whole doc is not loaded into memory?”
A: Not that Jeroen is aware of.
[5:46:54 PM] <davidi>

Q: thomasgrubb asked: “Is there an implementation for XMLDocument (for Delphi Win32) that is file-mapped, e.g., the whole doc is not loaded into memory?”
A: Not that Jeroen is aware of. Send Jeroen an email and he will blog about other solutions.
[5:47:20 PM] <davidi>

Q: thomasgrubb asked: “For Embarcadero Technologies: Are you going to develop a better option for validating XML on the Win32 side in the future?”
A: David I – replied – I will forward this to R&D and Product management
[5:53:14 PM] <davidi>

Q: devtux asked: “are you using any XML test generator? Please, suggest one if yes”
A: XMLSpy
[5:53:47 PM] <davidi>

Q: richz asked: “I’ve been trying for weeks to find out how to have the Win32 Delphi IDE generate code to serialize/de-serialize my class properties to an XML file. Is there anything in the IDE to do that?”
A: From Delphi 2010 on – you can use DBX support for JSON!

Public Room Transcript

[7:58:58 AM] * Christine_Ellis has set the topic to: Session Room 2 – Next Session”Practical XML in Delphi” at 8AM PDT
[8:02:15 AM] <Jeroen_Pluimers> Starting livemeeting
[8:03:59 AM] * Jeroen_Pluimers is wondering why LiveMeeting is always asking for email/company. Does it suffer from Korsakov’s disease?
[8:07:34 AM] <Christine_Ellis> It asks because we tell it to.
[8:08:22 AM] <Jeroen_Pluimers> but it never remembers, even if you start it with the same session parametes.
[8:08:41 AM] <Christine_Ellis> live meeting doesn’t use cookies and doesn’t know who you are
[8:08:47 AM] <Jeroen_Pluimers> ok.
[8:09:29 AM] <Jeroen_Pluimers> can we do a quick audio test?
[8:12:48 AM] <Jeroen_Pluimers> I mean: fro my current Microphone; it works with sound recorder, but wonder if Live Meeting will get it today as well.
[8:15:55 AM] * Christine_Ellis has set the topic to: Session Room 2 – “Practical XML in Delphi
[8:35:37 AM] <Peter_Wolf> a lot of memory = usually 10 timer more than the size of XML file bytes
[8:36:27 AM] <Peter_Wolf> … the size of XML file in bytes
[8:39:14 AM] <Jeroen_Pluimers> @Peter: that totally depends on what you use to read that XML. The MSXML and Internet Explorer are notorous memory hogs. But .NET is much more efficient on memory usage.
[8:40:15 AM] <Peter_Wolf> i ment MSXML which is default for most users
[8:41:17 AM] <Jeroen_Pluimers> @Peter: yup, that’s why I mentioned that as the first one. Most of the Win32 users will use MSXML, because that is the default for Win32.
[8:43:45 AM] * Jeroen_Pluimers warns: be carefull where you press ESC in IE: it can unload your chat window.
[8:47:29 AM] <Scott_Hollows> my brain hurts
[8:48:57 AM] <Jeroen_Pluimers> Scott: let me know later on if I can make it more clear to you.
[8:50:27 AM] <Ryan_Ford> Will this presentation be available for download?
[8:51:05 AM] <Jeroen_Pluimers> @Ryan: yes it will.
[8:52:59 AM] <Ryan_Ford> Its so nice to run 8GB for development
[8:52:59 AM] <Jeroen_Pluimers> @Ryan: the session materials are available for download here: https://wiert.wordpress.com/2009/09/09/coderage-4-session-materials-available-for-download/ The replays will be available for download after the conference.
[8:58:56 AM] <Jeroen_Pluimers> My VIP room died.
[9:00:08 AM] <AbsaLootly> … you have to hate it when that happens…
[9:01:46 AM] <Ryan_Ford> What alternatives for MSXML are there for WIN32
[9:02:22 AM] <Peter_Wolf> it also takes forever to open really big XML files wh MSXML
[5:45:31 PM] <AbsaLootly> I saw one developer try to put an entire database in one xml file… it took several hours to load it.
[5:51:59 PM] <Jeroen_Pluimers> MSXML
[5:52:03 PM] <Jeroen_Pluimers> ADOM XML
[5:52:05 PM] <Jeroen_Pluimers> Xerces
[5:52:56 PM] <Jeroen_Pluimers> That straight from the Delphi 2010 TXMLDocument.DOMVendor property
[5:53:25 PM] <Jeroen_Pluimers> XMLSpy can generate test ML
[5:54:16 PM] <Rich__> Thx
[5:55:17 PM] <Jim_Ferguson> Can you briefly describe JSON?
[5:56:02 PM] <Jim_Ferguson> what tool do you use transcribe your chat?
[5:56:23 PM] <Jon> it’s called a keyboard :)

–jeroen

Posted in .NET, CodeRage, CommandLine, Conferences, Database Development, Debugging, Delphi, Development, Encoding, Event, ISO-8859, ISO8859, Prism, Software Development, Source Code Management, TFS (Team Foundation System), UTF-8, UTF8, Visual Studio and tools, XML, XML/XSD, XSD | Leave a Comment »

CodeRage 4: session materials are available for download« The Wiert Corner – Jeroen Pluimers’ irregular stream of Wiert stuff

Posted by jpluimers on 2009/09/09

My CodeRage 4 session materials are available for download:

CodeRage 4 is a free, virtual conference on Embarcadero technologies with a lot of Delphi sessions.
It is held from September 8 till 11, 2009, i.e. while I write this :-)
If you want to watch sessions live, be sure to register through LiveMeeting (the technology they use for making this all happen).

Let me know if you download, and what you are using the sample code for.

–jeroen

Posted in .NET, CodeRage, CommandLine, Conferences, Database Development, Debugging, Delphi, Development, Encoding, Event, Firebird, InterBase, ISO-8859, ISO8859, Prism, Software Development, Source Code Management, SQL Server, TFS (Team Foundation System), Unicode, UTF-8, UTF8, Visual Studio and tools, XML, XML/XSD, XSD | 4 Comments »