The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests


Delphi – MD5: the MessageDigest_5 unit has been there since Delphi 2007

Posted by jpluimers on 2009/12/11

I still see a lot of people crafting their own MD5 implementation.
Many existing MD5 implementations do not work well in Delphi 2009 and later, because they need to be adapted to Unicode.
Many of them also return different results when you pass the same ASCII characters as an AnsiString or as a UnicodeString.

The MessageDigest_5 unit has been available since Delphi 2007.
This is its location relative to your installation directory: source\Win32\soap\wsdlimporter\MessageDigest_5.pas

(Edit: 20091223: Since Delphi 7.01, Indy has provided the unit IdHashMessageDigest, which also does MD5; see the comments below.)

This unit is used by the WSDL importer and, more importantly, it works with Unicode: if you pass it a string containing Unicode characters, it converts them to UTF-8 first.
The unit is not in your default search path and has not been promoted very well (the only link on the Embarcadero site was an article by Pawel Glowacki), so few people know about it.

Now you know too :-)

Note that MD5 is normally used to hash binary data.
It is not wise to send a non-ASCII string through both the AnsiString and the UnicodeString versions: because of the different encoding (and therefore a different binary representation), you will get different results depending on the Delphi version used.
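To make the "different binary representation" point concrete, here is a minimal sketch (Delphi 2009 or later, since it relies on TEncoding) that prints the bytes of the same text in UTF-8 and in an ANSI code page. MD5 only ever sees such bytes, so two encodings mean two different hashes. Windows-1252 is an assumption standing in for "your ANSI code page", and BytesToHex is a hypothetical helper, not part of any Delphi unit.

```pascal
program EncodingBytes;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: formats bytes as space-separated uppercase hex pairs.
function BytesToHex(const Bytes: TBytes): string;
var
  I: Integer;
begin
  Result := '';
  for I := 0 to Length(Bytes) - 1 do
  begin
    if I > 0 then
      Result := Result + ' ';
    Result := Result + IntToHex(Bytes[I], 2);
  end;
end;

var
  Text: string;
  Ansi1252: TEncoding;
  Utf8Bytes, AnsiBytes: TBytes;
begin
  // 'föøbår' built from code points, so the encoding of this source file does not matter.
  Text := 'f' + #$00F6 + #$00F8 + 'b' + #$00E5 + 'r';

  // TEncoding.UTF8 is a shared instance: do not free it.
  Utf8Bytes := TEncoding.UTF8.GetBytes(Text);

  // Assumption: Windows-1252 as the ANSI code page; ö, ø and å are one byte each there.
  // GetEncoding returns a new instance that we own, so free it.
  Ansi1252 := TEncoding.GetEncoding(1252);
  try
    AnsiBytes := Ansi1252.GetBytes(Text);
  finally
    Ansi1252.Free;
  end;

  Writeln('UTF-8:        ', BytesToHex(Utf8Bytes)); // 66 C3 B6 C3 B8 62 C3 A5 72
  Writeln('Windows-1252: ', BytesToHex(AnsiBytes)); // 66 F6 F8 62 E5 72
end.
```

The ASCII characters f, b and r are identical in both encodings; only the three accented characters differ (two bytes each in UTF-8, one byte each in Windows-1252), which is exactly why a pure-ASCII string hashes the same either way.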

The sample below shows that, in Delphi 2009 and 2010, the AnsiString/UnicodeString issue does not occur for ASCII strings, nor for ANSI strings: both get encoded as UTF-8 before hashing.
Delphi 2007 did not do the UTF-8 encoding, so there you will see different results for the non-ASCII strings.
You will also see that Writeln outputs the characters using the console code page, which differs from the encoding of the code editor, so the non-ASCII characters may show up differently.

Edit: 20091216 – added RawByteString example to show that the conversion does not matter.

```pascal
program md5;

{$APPTYPE CONSOLE}

uses
  SysUtils,
  MessageDigest_5 in 'C:\Program Files\Embarcadero\RAD Studio\7.0\source\Win32\soap\wsdlimporter\MessageDigest_5.pas';
  // 64-bit Vista/Windows 7: MessageDigest_5 in 'C:\Program Files (x86)\Embarcadero\RAD Studio\7.0\source\Win32\soap\wsdlimporter\MessageDigest_5.pas';

// Returns the lowercase hex MD5 of an AnsiString.
// In Delphi 2009 and later, the unit hashes the UTF-8 form of the text.
function GetMd5(const Value: AnsiString): string; overload;
var
  hash: MessageDigest_5.IMD5;
  fingerprint: string;
begin
  hash := MessageDigest_5.GetMD5();
  hash.Update(Value);
  fingerprint := hash.AsString();
  Result := LowerCase(fingerprint);
end;

// Same for a UnicodeString: also hashed as UTF-8, so ASCII and ANSI text
// give the same result as the AnsiString overload.
function GetMd5(const Value: UnicodeString): string; overload;
var
  hash: MessageDigest_5.IMD5;
  fingerprint: string;
begin
  hash := MessageDigest_5.GetMD5();
  hash.Update(Value);
  fingerprint := hash.AsString();
  Result := LowerCase(fingerprint);
end;

var
  SourceAnsiString: AnsiString;
  SourceUnicodeString: UnicodeString;
  SourceRawByteString: RawByteString;

begin
  try
    // ASCII text: all three lines print the same hash.
    SourceAnsiString := 'foobar';
    SourceUnicodeString := 'foobar';
    SourceRawByteString := 'foobar';

    Writeln(GetMd5(SourceAnsiString));
    Writeln(GetMd5(SourceUnicodeString));
    Writeln(GetMd5(SourceRawByteString));

    // Non-ASCII text: same hashes in Delphi 2009/2010, different ones in Delphi 2007.
    // Writeln uses the console code page, so the text itself may display differently.
    SourceAnsiString := 'föøbår';
    SourceUnicodeString := 'föøbår';
    SourceRawByteString := 'föøbår';
    Writeln(SourceAnsiString, ' ', GetMd5(SourceAnsiString));
    Writeln(SourceUnicodeString, ' ', GetMd5(SourceUnicodeString));
    Writeln(SourceRawByteString, ' ', GetMd5(SourceRawByteString));
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
end.
```

–jeroen

28 Responses to “Delphi – MD5: the MessageDigest_5 unit has been there since Delphi 2007”

  1. Tom Borysiak said

    I hope this can help anyone else who may run into this: Marcel’s changes to get Peter’s code to compile can give inconsistent results. I would randomly get extra chars returned in the DigestStr. I found that changing TDigestStr = String[0..32]; —> TDigestStr = Array[0..32] of Char; allows the code to compile and produces correct results every time.

  2. And this:


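    Procedure TMD5.Add (Const Value: String);
    Var
      Bytes: RawByteString;
    Begin
      // Convert once, then hash the raw bytes.
      // PAnsiChar matches the byte-oriented RawByteString (PChar is PWideChar in Delphi 2009+).
      Bytes := RawByteString(Value);
      Update(PAnsiChar(Bytes)^, Length(Bytes));
    End;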

  3. Guus Creuwels said

    Hi,

    Is the speed of the MessageDigest_5 functions the same as or better than that of the MD5 functions from http://www.sawatzki.de/Download/Delphi_MD5.zip?

    I used the Indy version but had some performance issues when calculating the hash of large files. The MD5 methods from http://www.sawatzki.de are much faster.

    Thanks.

    Regards,
    Guus

    • jpluimers said

      Peter Sawatzki writes very optimal code, so I think his code will be faster.

      Right now I don’t have time to measure, can you try to measure?

      –jeroen

      • GuusCreuwels said

        Unfortunately the md5 units from Peter do not compile in Delphi 2010. That’s how I landed on this page…

      • Only a few changes are needed to make it work ;-) See Marco Cantu’s Delphi 2009 Handbook to learn which changes came with D2009 (and D2010 too ;-)).


        Type
        TDigestStr = Array[0..32] of AnsiChar;

        and function GetDigestStr:


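        Function TCustomMD5.GetDigestStr: TDigestStr;
        Const
          hc: Array[0..$F] Of AnsiChar = '0123456789abcdef';
        Var
          aDigest: TDigest;
          i: 0..15;
        Begin
          aDigest:= Digest;
          For i:= 0 To 15 Do Begin
            Result[0+i Shl 1] := hc[aDigest[i] Shr 4];  // high nibble
            Result[1+i Shl 1] := hc[aDigest[i] And $F]; // low nibble
          End;
          // The loop fills Result[0..31] only; terminate Result[32] explicitly so
          // code treating the array as a null-terminated string gets no random extra chars.
          Result[32] := #0;
        End;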

        Does it work now? I think so ;-)

        Bye, Marcel – Czech republic :)

  4. Alan said

    Sorry, Value should be replaced with lineText:

    ReadLn(myFile, lineText);
    hash.Update(lineText);
    ReadLn(myFile, lineText);
    hash.Update(lineText);
    ReadLn(myFile, lineText);
    hash.Update(lineText);
    fingerprint := hash.AsString();
    Result := LowerCase(fingerprint);

  5. Marry said

    Hi, can you write me a good procedure to get the MD5 of a file? Right now I am using a File2String procedure and then your function. It consumes a lot of memory and is slow.
    Thx.

    • jpluimers said

      Just feed the MD5 engine one string from the file at a time and you should be fine!

      –jeroen

      • Alan said

        Are you saying something like this if my file has 3 lines of text:

        var
        hash: MessageDigest_5.IMD5;
        fingerprint: string;
        myFile: TextFile;
        lineText: String;
        ……
        begin
        hash := MessageDigest_5.GetMD5();
        …..
        ReadLn(myFile, lineText);
        hash.Update(Value);
        ReadLn(myFile, lineText);
        hash.Update(Value);
        ReadLn(myFile, lineText);
        hash.Update(Value);
        fingerprint := hash.AsString();
        Result := LowerCase(fingerprint);
        end;

    • Use Peter Sawatzki’s code – there are functions:

      Function FileMD5Digest (Const FileName: TFileName): TDigestStr;
      Function StringMD5Digest (S: String): TDigestStr;
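Feeding the hash one ReadLn line at a time has a catch: ReadLn drops the line breaks and non-ASCII text gets re-encoded, so the result will not match the MD5 of the file itself. Hashing the raw bytes through a stream avoids both and does not need the whole file in one string. This is a sketch under stated assumptions: it uses Indy 10's TIdHashMessageDigest5 and its HashStreamAsHex method (as assumed to ship with Delphi 2009/2010; Indy 9 in Delphi 7 has a different API), and GetFileMd5 is a hypothetical name.

```pascal
program FileMd5;

{$APPTYPE CONSOLE}

uses
  SysUtils,
  Classes,
  IdHashMessageDigest; // Assumption: Indy 10, as bundled with Delphi 2009/2010

// Hypothetical helper: lowercase hex MD5 of a file's raw bytes.
function GetFileMd5(const FileName: string): string;
var
  Stream: TFileStream;
  Hasher: TIdHashMessageDigest5;
begin
  // Open read-only; allow other readers but no writers while hashing.
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    Hasher := TIdHashMessageDigest5.Create;
    try
      // Hash the bytes exactly as stored: no line splitting, no string conversion.
      // LowerCase matches the style of GetMd5 in the post.
      Result := LowerCase(Hasher.HashStreamAsHex(Stream));
    finally
      Hasher.Free;
    end;
  finally
    Stream.Free;
  end;
end;

begin
  try
    if ParamCount < 1 then
      Writeln('Usage: FileMd5 <filename>')
    else
      Writeln(GetFileMd5(ParamStr(1)));
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
end.
```

Because the bytes are hashed as stored, the result should match what a command-line MD5 tool reports for the same file.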
      
  6. Morwath said

    Maybe in Delphi 2014 we’ll see a SHA hash…

  7. nader said

    Thanks for this tip. It helped me a lot.

  8. apz28 said

    Why doesn’t it use the RawString type and avoid the conversion altogether?

    • jpluimers said

      Actually, it is RawByteString, and in this case the conversion does not matter.
      I have added a RawByteString to the example: the results are the same as for UnicodeString and AnsiString.

      –jeroen

  9. Gad D Lord said

    Perfect. So far I have used the
    IdHashCrc.pas
    IdHashMessageDigest.pas
    IdHashSHA1.pas

    units from the Indy sources. They also cover SHA-1, CRC-16 and CRC-32.

    • jpluimers said

      Duh – I totally forgot that Indy has that as well.
      And even better: IdHashMessageDigest.pas has been there since Delphi 7:

      D7.01.Architect\Source\Indy\IdHashMessageDigest.pas

      –jeroen
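For comparison with the MessageDigest_5 sample in the post, here is a minimal sketch of hashing a string with Indy. It assumes the Indy 10 TIdHashMessageDigest5 class and its HashStringAsHex method (Indy 9 in Delphi 7 exposes a different API), and GetIndyMd5 is a hypothetical name. Only ASCII input is used, because for non-ASCII text the bytes Indy hashes depend on its default text encoding, which has varied between Indy versions.

```pascal
program IndyStringMd5;

{$APPTYPE CONSOLE}

uses
  SysUtils,
  IdHashMessageDigest; // Assumption: Indy 10, as bundled with Delphi 2009/2010

// Hypothetical helper: lowercase hex MD5 of an ASCII string via Indy.
function GetIndyMd5(const Value: string): string;
var
  Hasher: TIdHashMessageDigest5;
begin
  Hasher := TIdHashMessageDigest5.Create;
  try
    // LowerCase matches the output style of GetMd5 in the post.
    Result := LowerCase(Hasher.HashStringAsHex(Value));
  finally
    Hasher.Free;
  end;
end;

begin
  Writeln(GetIndyMd5('foobar')); // 3858f62230ac3c915f300c664312c63f
end.
```

3858f62230ac3c915f300c664312c63f is the standard MD5 of the ASCII text foobar, so it also serves as a quick check of the three ASCII lines printed by the MessageDigest_5 sample.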

  10. Yogi Yang said

    Thanks for this jewel.

  11. Peter Bartholdsson said

    And I see you write ASCII above, reading too fast as usual.
    Still, my gut reaction is: don’t use it, as it doesn’t actually produce the expected MD5 for a Unicode string. ;)

  12. Peter Bartholdsson said

    Actually it’ll only produce the same result if you’re using ASCII characters as it converts the unicode string to UTF-8.
    Using this class to produce an MD5 of string values seems risky to me. Do it properly: don’t expect a UnicodeString and an AnsiString to produce the same MD5, because they most certainly shouldn’t.

    The following example should produce different MD5 checksums (using latin-1 / ISO/IEC 8859-1 as your ansistring locale):
    SourceAnsiString := ‘fööbar’;
    SourceUnicodeString := ‘fööbar’;

    • jpluimers said

      Actually, they don’t. At least not when used in Delphi 2009 or 2010. The reason is that both strings get converted to UTF-8 before hashing.
      Delphi 2007 does not do that conversion, so you will see different results between Delphi 2007 and 2009/2010.

      But you should normally only use md5 for hashing binary data.

      –jeroen

  13. IL said

    Thank you for the info, Jeroen. But, oops! RAD Studio 7.0 is Delphi 2010, not 2009. Neither the Delphi 2007 trial version nor Delphi 2009 contains the MessageDigest_5 unit, either as source or compiled.

    • jpluimers said

      Actually, I keep the RTL and VCL sources for all Delphi versions in a central place, and the unit is there for each of them:

      RTL-VCL-Sources\D2007\source\Win32\soap\wsdlimporter\MessageDigest_5.pas
      RTL-VCL-Sources\D2009\source\Win32\soap\wsdlimporter\MessageDigest_5.pas
      RTL-VCL-Sources\D2010\source\Win32\soap\wsdlimporter\MessageDigest_5.pas

      Maybe I have them because I always have the Enterprise or Architect edition.

      –jeroen

  14. Dimitrij said

    Thanks. I had no idea about its existence…:)
