The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests

Archive for the ‘Encoding’ Category

Comparisons for EBCDIC CCSID 37, 500 and 1047

Posted by jpluimers on 2011/09/20

The referenced article explains the difference in code points between EBCDIC CCSID 37 and EBCDIC CCSID 500, and the difference in code points between EBCDIC CCSID 37 and EBCDIC CCSID 1047:
IBM CCSID Comparisons – United States.

Basically, these are the code points that are sensitive:

4A, 4F, 5A, 5F, AD, B0, BA, BB and BD.
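
To see the differences for yourself, here is a minimal C# sketch that decodes each of the sensitive bytes with the three code pages. It assumes that the .NET code pages 37, 500 and 1047 correspond to the IBM CCSIDs with the same numbers; the class and method names are made up for this illustration.

```csharp
using System;
using System.Text;

static class EbcdicCompare
{
    // The nine code points listed above as sensitive.
    public static readonly byte[] SensitiveBytes =
        { 0x4A, 0x4F, 0x5A, 0x5F, 0xAD, 0xB0, 0xBA, 0xBB, 0xBD };

    static EbcdicCompare()
    {
        // Needed on .NET Core / .NET 5+, where the EBCDIC code pages are not
        // registered by default. On the classic .NET Framework the code pages
        // are built in and this static constructor can be removed.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Decodes one byte using the given code page number (hypothetical helper).
    public static char Decode(int codePage, byte value)
    {
        return Encoding.GetEncoding(codePage).GetString(new[] { value })[0];
    }

    static void Main()
    {
        foreach (byte b in SensitiveBytes)
        {
            // Show the Unicode code point each code page assigns to the byte.
            Console.WriteLine("{0:X2}: 37=U+{1:X4} 500=U+{2:X4} 1047=U+{3:X4}",
                b, (int)Decode(37, b), (int)Decode(500, b), (int)Decode(1047, b));
        }
    }
}
```

Printing the code points as hex avoids depending on what the console font and console code page can display.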

–jeroen

Posted in Development, EBCDIC, Encoding, MQ Message Queueing/Queuing, Software Development, WebSphere MQ | Leave a Comment »

Long pathname support: Watch for MAX_PATH (was: Windows pathname max length problem « Dropbox Forums)

Posted by jpluimers on 2010/06/16

When you want to support long pathnames on Windows, you need to watch for the MAX_PATH limitation.
Some tools, like RoboCopy (developed in C++), Beyond Compare (developed in Delphi) and others get it right.
Getting it right does not depend on your development environment: it is all about calling the right APIs with the right parameters.
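
One of those "right parameters" is the extended-length path prefix `\\?\`, which the Unicode versions of the Win32 file functions accept for paths beyond MAX_PATH. The sketch below only builds such a path; `LongPath` and `ToExtendedLength` are hypothetical names, and it assumes the input is already an absolute, normalized path (no `..`, no forward slashes), because Windows does not normalize prefixed paths.

```csharp
using System;

static class LongPath
{
    // Value of MAX_PATH in the Windows headers.
    public const int MaxPath = 260;

    // Adds the extended-length prefix to an absolute path (hypothetical helper).
    public static string ToExtendedLength(string absolutePath)
    {
        if (absolutePath.StartsWith(@"\\?\", StringComparison.Ordinal))
            return absolutePath;                             // already prefixed
        if (absolutePath.StartsWith(@"\\", StringComparison.Ordinal))
            return @"\\?\UNC\" + absolutePath.Substring(2);  // \\server\share\...
        return @"\\?\" + absolutePath;                       // C:\...
    }

    static void Main()
    {
        Console.WriteLine(ToExtendedLength(@"C:\data\file.txt"));
        Console.WriteLine(ToExtendedLength(@"\\server\share\file.txt"));
    }
}
```

The resulting string is meant for Win32 functions such as CreateFileW; whether a given framework's own file classes accept it depends on the framework version.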

Let me take a tool – in this case DropBox, though other tools suffer from the same problem – and investigate how they should do it.

Even though DropBox is cross platform, the Windows version of DropBox limits itself to synchronizing files having fewer than 260 characters in their path.
This is a big drawback: it is so 20th century having a limitation like this. Read the rest of this entry »

Posted in .NET, Delphi, Development, Encoding, Opinions, Software Development, Unicode | 6 Comments »

Why SizeOf for character arrays is evil: stackoverflow question “Why does this code fail in D2010, but not D7?”

Posted by jpluimers on 2010/05/11

The stackoverflow question “Why does this code fail in D2010, but not D7?” once again shows that SizeOf on character arrays usually is evil.

My point in this posting is that you should always try to write code that is semantically correct.

By writing semantically correct code, you have a much better chance of surviving a major change like a Unicode port.

The code below is semantically wrong: it worked in Delphi 7 by accident, not by design:
Like many Windows API functions, GetTempPath expects its first parameter (called nBufferLength) to be the number of characters, not the number of bytes. Read the rest of this entry »
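
The difference between the two counts can be illustrated in C# as well. This is only an analogy for the Delphi situation, not the code from the question: with one-byte characters (as in Delphi 7) both counts are equal, with two-byte characters (Delphi 2009 and up, and .NET) the byte count is twice the character count, so passing it as nBufferLength claims a buffer that is twice as large as it really is. The class and method names are made up.

```csharp
using System;

static class BufferSizes
{
    // Number of characters the buffer can hold: what nBufferLength expects.
    public static int CharCount(char[] buffer)
    {
        return buffer.Length;
    }

    // Number of bytes the buffer occupies: the wrong value to pass.
    public static int ByteCount(char[] buffer)
    {
        return Buffer.ByteLength(buffer);
    }

    static void Main()
    {
        char[] buffer = new char[260];
        Console.WriteLine("characters: {0}", CharCount(buffer)); // characters: 260
        Console.WriteLine("bytes:      {0}", ByteCount(buffer)); // bytes:      520
    }
}
```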

Posted in Delphi, Delphi 2005, Delphi 2006, Delphi 2007, Delphi 2009, Delphi 2010, Delphi 3, Delphi 4, Delphi 5, Delphi 6, Delphi 7, Delphi XE, Delphi XE2, Delphi XE3, Development, Encoding, ISO-8859, Software Development, Unicode | Leave a Comment »

.NET/C# – TEE filter that also runs on Windows (XP) Embedded – update

Posted by jpluimers on 2010/04/15

Last week, I posted a C# implementation of the tee filter from Sterling W. “Chip” Camden.

Since then I have modified it slightly.
Not because the implementation is bad, but because some pieces of software play dirty when saving their redirected output.

One of those applications is SubInAcl, otherwise a great tool for showing and modifying ACL and Ownership information of Windows NT objects (files, registry entries, etc).

However, when redirecting or piping its output, it writes a zero byte after each byte of text.
I’m not sure why: it might be trying to produce some kind of Unicode output, or it might just be buggy.
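
One possible explanation (an assumption on my side, not something the tool documents) is that the output is UTF-16 little endian: for plain ASCII text, that encoding produces exactly this pattern of a zero byte after each text byte. A small sketch with made-up names shows the byte layout:

```csharp
using System;
using System.Text;

static class ZeroBytes
{
    // Formats bytes as hex pairs separated by dashes.
    public static string Hex(byte[] bytes)
    {
        return BitConverter.ToString(bytes);
    }

    static void Main()
    {
        Console.WriteLine(Hex(Encoding.ASCII.GetBytes("ACL")));   // 41-43-4C
        Console.WriteLine(Hex(Encoding.Unicode.GetBytes("ACL"))); // 41-00-43-00-4C-00
    }
}
```

`Encoding.Unicode` is the .NET name for UTF-16 little endian.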

The new sourcecode is below. You can also download the project and binary as tee.C#.7z (you need the freeware 7zip compression tool to decompress this).
Read the rest of this entry »

Posted in .NET, ASCII, C#, C# 2.0, CommandLine, Development, Encoding, Software Development, Unicode, Visual Studio and tools | Leave a Comment »

*nix – Mastering the VI editor

Posted by jpluimers on 2010/04/13

Every once in a while I need to do some text editing in a *nix environment that has a minimum toolset installed.

Which means: use VI.

VI is a versatile text editor from the early *nix days, but it is not straightforward to use.
Since I don’t use VI often enough, I tend to forget some of the commands.

Time to share my favourite VI link: Mastering the VI editor (edit: after link rot, it is now at Mastering the VI editor).

The link points to the basic stuff, but the page contains most of what you ever want to know about VI.

–jeroen

PS:
If possible, I install the JOE text editor on systems where I am admin.
JOE uses WordStar-like key bindings, and supports UTF-8. Talking about “something old, something new” :-)

Posted in *nix, CommandLine, Development, Encoding, Power User, Software Development, Unicode, UTF-8, UTF8, vi | 1 Comment »

Stopping percentage expansion in a Windows batch file

Posted by jpluimers on 2010/04/09

A while ago, I asked a question on percentage expansion in batch-files on superuser.com: a great site with great answers, similar to stackoverflow.com and serverfault.com, but now for asking “Power User” kind of questions.

The percentage sign (%) in URLs is used to escape (or URL-encode) characters that should not be in a URL itself.
(Note that The Old New Thing had a very interesting article on URL encoding: there are many different opinions on how to do this ‘right’, and a few of these ‘right’ opinions are not always compatible with each other).

In Windows batch-files and the command-line, the percentage sign is used to expand environment variables, arguments and for loop indexes.
To make life ‘easier’, inside a batch-file, the percentage sign has a slightly different meaning than on the command-line itself.

This can break your batch-files when you use URL encoded parameters.
It does not matter if these parameters are quoted or not: cmd.exe expands them, unless you escape them properly.

So, the command for downloading the URL with wget (similar to curl) differs depending on whether you run it on the plain command-line or in a batch-file.

Escaping percentage in batch-files

So the best way to escape percentages in batch files is to double them: each % becomes %%.
There is even a very old (MS-DOS era!) knowledge base article about this topic, that I just found when doing the research for this blog article :-)
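
If you generate batch files from code, the doubling can be automated. Below is a minimal C# sketch; `BatchEscape` and `EscapePercents` are hypothetical names and the URL is a made-up example. Without the doubling, cmd.exe would read the `%2` in `%20` inside a batch file as the second batch argument.

```csharp
using System;

static class BatchEscape
{
    // Doubles every percentage sign so cmd.exe leaves it alone inside a batch file.
    public static string EscapePercents(string text)
    {
        return text.Replace("%", "%%");
    }

    static void Main()
    {
        string url = "http://example.com/search?q=a%20b"; // hypothetical URL
        Console.WriteLine("wget \"" + EscapePercents(url) + "\"");
        // prints: wget "http://example.com/search?q=a%%20b"
    }
}
```

Only use this for text that ends up in a batch file: on the interactive command-line the doubled signs would be passed on literally.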

URL decode

As a sidenote: manually decoding these escaped URLs is always a pain.
There are many sites that can do URL decoding on-line.
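
When you would rather not paste a URL into a web site, a few lines of C# can do the decoding too. This sketch uses `Uri.UnescapeDataString` from the .NET base class library; the wrapper class and the sample text are made up for illustration.

```csharp
using System;

static class UrlDecode
{
    // Converts %XX escape sequences back into the characters they represent.
    public static string Decode(string escaped)
    {
        return Uri.UnescapeDataString(escaped);
    }

    static void Main()
    {
        Console.WriteLine(Decode("How%20can%20I%20stop%20%25%20expansion%3F"));
        // prints: How can I stop % expansion?
    }
}
```

Note that this method leaves a plus sign as it is: it does not apply the form-encoding convention of turning `+` into a space.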

PS: This was the original question: How can I stop percentage expansion in a batch file? – Super User.

Posted in *nix, Batch-Files, CommandLine, Development, Encoding, Power User, Scripting, Software Development, URL Encoding, wget | 3 Comments »

.NET/C# – TEE filter that also runs on Windows (XP) Embedded

Posted by jpluimers on 2010/04/07

The usage of tee - image courtesy of Wikipedia

The tee command stems from a *nix background.
It is a command-line filter that allows you to copy a stream from the regular stdin/stdout redirected pipeline into a file.

Recently, I needed this in a Windows Embedded Standard (a.k.a. WES) system for logging purposes.
This way, a post-install-script (similar to the Windows Post-Install Wizard, but command-line based) could log to both the console and a log-file at the same time.

WES is the successor to Windows XP Embedded (a.k.a. XPe), which is a modularized version of Windows XP.
Using WES usually means that you don’t have the luxury of everything that Windows XP has.
This in turn means that you need to be careful when selecting external tools: a lot of stuff that works on plain Windows XP won’t work.

There are various Win32 ports of tee available.
This time however, I needed a Unicode implementation, so I searched for a .NET based implementation.

Windows PowerShell 2.0 does contain a tee implementation, but:

  1. We don’t have the luxury of having PowerShell in our WES image
  2. PowerShell tee first writes the contents to a temporary file, which interferes with how we build this WES image.

Luckily Sterling W. “Chip” Camden started with such a .NET implementation of tee – in Visual C++ – back in 2005.
Though his TEE page indicates it is based on .NET 1.1, his current implementation is done in Visual Studio 2008 using C++.

Now that is a problem for the targeted WES image: that image is based on .NET 2.0.
But when using Visual C++ in .NET, you need additional run-time libraries (for instance the ones for Visual C++ 2005, or the ones for Visual C++ 2008).

If you don’t have these installed, tee.exe does not start, and you get error messages like this on the command-line:

K:\Post-Install-Scripts>tee
The system cannot execute the specified program.

and entries like this in the Eventlog:

Event Type: Error
Event Source: SideBySide
Event Category: None
Event ID: 59
Date: 01/04/2010
Time: 19:09:22
User: N/A
Computer: MYMACHINE
Description:
Generate Activation Context failed for K:\Post-Install-Scripts\tee.exe. Reference error message: The operation completed successfully.

The odd thing in this error message is “The operation completed successfully”: it didn’t :-)

Anyway: translating the underlying C++ code to C# is pretty straightforward, so:

The C# implementation

I did change a few things, none of them major:

  • replaced some for statements with foreach
  • renamed a few variables to make them more readable
  • added using statements for stdin and stdout
  • added try…finally for cleaning up the binary writers
  • moved the logic for duplicate filenames into a separate method, and moved the moment of checking to the point of adding the filename to the filenames
  • moved the help into a separate method
  • added support for the -h (same behaviour as --help or /?) command-line argument

The implementation is pretty straightforward:

  • Perform parameter parsing
  • Catch all input bytes from the stdin stream
  • Copy those bytes to both the stdout stream, and the files specified on the command-line
  • Send errors to the stderr stream
  • Do the proper initialization and cleanup

This is the C# code:

using System;
using System.IO;
using System.Collections.Generic;
// Sends standard input to standard output and to all files in command line.
// C# implementation april 4th, 2010 by Jeroen Wiert Pluimers (https://wiert.wordpress.com),
// based on tee by Chip Camden, Camden Software Consulting, November 2005
// 	... and Anonymous Cowards everywhere!
//
// TEE [-a | --append] [-i | --ignore] [--help | /?] [-f] [file1] [...]
//    Example:
// 	tee --append file0.txt -f --help file2.txt
//    will append to file0.txt, --help, and file2.txt
//
// -a | --append	Appends files instead of overwriting
// 			  (setting is per tee instance)
// -i | --ignore	Ignore cancel Ctrl+C keypress: see UnixUtils tee
// /? | --help		Displays this message and immediately quits
// -f			Stop recognizing flags, force all following filenames literally
//
// Duplicate filenames are quietly ignored.
// Press Ctrl+Z (End of File character) then Enter to abort.
namespace tee
{
    class Program
    {
        static void help()
        {
            Console.Error.WriteLine("Sends standard input to standard output and to all files in command line.");
            Console.Error.WriteLine("C# implementation april 4th, 2010 by Jeroen Wiert Pluimers (https://wiert.wordpress.com),");
            Console.Error.WriteLine("Chip Camden, Camden Software Consulting, November 2005");
            Console.Error.WriteLine("	... and Anonymous Cowards everywhere!");
            Console.Error.WriteLine("http://www.camdensoftware.com");
            Console.Error.WriteLine("http://chipstips.com/?tag=cpptee");
            Console.Error.WriteLine("");
            Console.Error.WriteLine("tee [-a | --append] [-i | --ignore] [--help | /?] [-f] [file1] [...]");
            Console.Error.WriteLine("   Example:");
            Console.Error.WriteLine(" tee --append file0.txt -f --help file2.txt");
            Console.Error.WriteLine("   will append to file0.txt, --help, and file2.txt");
            Console.Error.WriteLine("");
            Console.Error.WriteLine("-a | --append    Appends files instead of overwriting");
            Console.Error.WriteLine("                 (setting is per tee instance)");
            Console.Error.WriteLine("-i | --ignore    Ignore cancel Ctrl+C keypress: see UnixUtils tee");
            Console.Error.WriteLine("/? | --help      Displays this message and immediately quits");
            Console.Error.WriteLine("-f               Stop recognizing flags, force all following filenames literally");
            Console.Error.WriteLine("");
            Console.Error.WriteLine("Duplicate filenames are quietly ignored.");
            Console.Error.WriteLine("Press Ctrl+Z (End of File character) then Enter to abort.");
        }

        static void OnCancelKeyPressed(Object sender, ConsoleCancelEventArgs args)
        {
            // Set the Cancel property to true to prevent the process from
            // terminating.
            args.Cancel = true;
        }

        static List<String> filenames = new List<String>();

        static void addFilename(string value)
        {
            if (-1 == filenames.IndexOf(value))
                filenames.Add(value);
        }

        static int Main(string[] args)
        {
            try
            {
                bool appendToFiles = false;
                bool stopInterpretingFlags = false;
                bool ignoreCtrlC = false;

                foreach (string arg in args)
                {
                    //Since we're already parsing.... might as well check for flags:
                    if (stopInterpretingFlags)  //Stop interpreting flags, assume is filename
                    {
                        addFilename(arg);
                    }
                    else if (arg.Equals("/?") || arg.Equals("-h") || arg.Equals("--help"))
                    {
                        help();
                        return 1; //Quit immediately
                    }
                    else if (arg.Equals("-a") || arg.Equals("--append"))
                    {
                        appendToFiles = true;
                    }
                    else if (arg.Equals("-i") || arg.Equals("--ignore"))
                    {
                        ignoreCtrlC = true;
                    }
                    else if (arg.Equals("-f"))
                    {
                        stopInterpretingFlags = true;
                    }
                    else
                    {	//If it isn't any of the above, it's a filename
                        addFilename(arg);
                    }
                    //Add more flags as necessary, just remember to SKIP adding them to the file processing stream!
                }

                if (ignoreCtrlC) //Implement the Ctrl+C fix selectively (mirror UnixUtils tee behavior)
                    Console.CancelKeyPress += new ConsoleCancelEventHandler(OnCancelKeyPressed);

                List<BinaryWriter> binaryWriters = new List<BinaryWriter>(filenames.Count); //Add only as many streams as there are distinct files
                try
                {
                    foreach (String filename in filenames)
                    {
                        // Open the files specified as arguments; FileMode.Append creates a missing file
                        binaryWriters.Add(new BinaryWriter(appendToFiles ?
                            new FileStream(filename, FileMode.Append, FileAccess.Write) :
                            File.Create(filename)));
                    }
                    using (BinaryReader stdin = new BinaryReader(Console.OpenStandardInput()))
                    {
                        using (BinaryWriter stdout = new BinaryWriter(Console.OpenStandardOutput()))
                        {
                            Byte b;
                            while (true)
                            {
                                try
                                {
                                    b = stdin.ReadByte();  // Read standard in
                                }
                                catch (EndOfStreamException)
                                {
                                    break;
                                }
                                // The actual tee:
                                stdout.Write(b); // Write standard out
                                foreach (BinaryWriter binaryWriter in binaryWriters)
                                {
                                    binaryWriter.Write(b); // Write to each file
                                }
                            }
                        }
                    }
                }
                finally
                {
                    foreach (BinaryWriter binaryWriter in binaryWriters)
                    {
                        binaryWriter.Flush();  // Flush and close each file
                        binaryWriter.Close();
                    }
                }
            }
            catch (Exception ex)
            {
                Console.Error.WriteLine(String.Concat("tee: ", ex.Message));  // Send error messages to stderr
                return 1;  // Non-zero exit code so calling scripts can detect the failure
            }

            return 0;
        }
    }
}

Some alternatives that might (or might not) support Unicode:

http://www.commandline.co.uk/mtee/
http://unxutils.sourceforge.net/ (cannot be downloaded any more – pity, as they were pretty good)

–jeroen

Update: 201009041030 – Syntax highlighting didn’t work, so changed
sourcecode language=”C#
into
sourcecode language=”csharp

Posted in .NET, C#, C# 2.0, CommandLine, Development, Encoding, Power User, Software Development, Unicode, UTF-8, Visual Studio and tools, XP-embedded | 13 Comments »

Formatted sourcecode in WordPress now supports even more languages

Posted by jpluimers on 2010/02/15

I just found out that the sourcecode tag in WordPress now supports even more languages.

The list of languages is below; it contains links to Wikipedia for each language.
Starred ones (bold and hyperlinks in this theme look the same) are new since my post last year.

This is a follow up on the original article Including formatted sourcecode in WordPress « The Wiert Corner – Jeroen Pluimers’ irregular stream of Wiert stuff.

–jeroen

Posted in .NET, C#, CSS, Database Development, Delphi, Development, Encoding, Java, Software Development, SQL Server, Web Development, WordPress, XML, XML/XSD | Tagged: | 2 Comments »

Web means Unicode

Posted by jpluimers on 2010/02/12

Google published an interesting graph generated from their internal data based on their indexed web pages.

Graph: Encodings on the web

A quick summary of popular encodings based on the graph:

  1. Unicode – almost 50% and rapidly rising
  2. ASCII – 20% and falling
  3. Western European* – 20% and falling
  4. Rest – 10% and falling

Conclusion: if you do something with the web, make sure you support Unicode.

When you are using Delphi, and need help with transitioning to Unicode: contact me.

–jeroen

* Western European encodings: Windows-1252, ISO-8859-1 and ISO-8859-15.

Reference: Official Google Blog: Unicode nearing 50% of the web.

Edit: 20100212T1500

Some people mentioned (either in the comments or otherwise) that some sites pretend they emit Unicode, but in fact they don’t.
This doesn’t relieve you from making sure you support Unicode: Don’t pretend you support Unicode, but do it properly!

Examples of bad support for Unicode are not limited to the visible web, but also applications talking to the web, and to webservices (one of my own experiences is explained in StUF – receiving data from a provider where UTF-8 is in fact ISO-8859: it shows an example where a vendor does Unicode support really wrong).

So: when you support Unicode, support it properly.
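
One way to catch data that claims to be UTF-8 but is not: decode it with a strict decoder that throws on invalid byte sequences instead of silently substituting replacement characters. The C# sketch below uses `UTF8Encoding` with `throwOnInvalidBytes` set to true; the class name, method name and sample bytes are made up for illustration.

```csharp
using System;
using System.Text;

static class Utf8Check
{
    // Returns true when the bytes form valid UTF-8 (hypothetical helper).
    public static bool IsValidUtf8(byte[] data)
    {
        // Second argument (throwOnInvalidBytes): raise an exception on
        // invalid input instead of substituting U+FFFD.
        UTF8Encoding strict = new UTF8Encoding(false, true);
        try
        {
            strict.GetString(data);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }

    static void Main()
    {
        byte[] latin1 = { 0x63, 0x61, 0x66, 0xE9 };       // "café" in ISO-8859-1
        byte[] utf8   = { 0x63, 0x61, 0x66, 0xC3, 0xA9 }; // "café" in UTF-8
        Console.WriteLine(IsValidUtf8(latin1)); // False
        Console.WriteLine(IsValidUtf8(utf8));   // True
    }
}
```

This check has limits: pure ASCII data passes in both encodings, and some ISO-8859 byte sequences happen to be valid UTF-8, so a positive result is not proof.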

–jeroen

Posted in .NET, ASP.NET, C#, Database Development, Delphi, Development, Encoding, Firebird, IIS, InterBase, ISO-8859, ISO8859, Prism, SOAP/WebServices, Software Development, SQL Server, Unicode, UTF-8, UTF8, Visual Studio and tools, Web Development | 7 Comments »

Delphi – HIGHCHARUNICODE directive (Delphi) – RAD Studio

Posted by jpluimers on 2010/01/18

I forgot about it, but this thread (which got wiped by Embarcadero) reminded me about the differences between these two character values.

Quoting from the first post:

c1 := #128;
c2 := chr(128);
Assert(c1 = c2);

the assertion fails, meaning that c1 <> c2.

In fact c1 = #$20AC and c2 = #$80.
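
The two values can be reproduced outside Delphi. The C# sketch below is an analogy, not the Delphi mechanism itself, and it assumes that the ANSI code page in play is Windows-1252 (the value #$20AC, the euro sign, fits that assumption): interpreting byte 128 as an ANSI character gives one result, taking 128 as a Unicode code point gives another. All names are made up.

```csharp
using System;
using System.Text;

static class HighChar
{
    static HighChar()
    {
        // Needed on .NET Core / .NET 5+ to make code page 1252 available.
        // On the classic .NET Framework this static constructor can be removed.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Interprets the value as a byte in Windows-1252 (compare c1 := #128).
    public static char FromAnsi(byte value)
    {
        return Encoding.GetEncoding(1252).GetString(new[] { value })[0];
    }

    // Interprets the value as a Unicode code point (compare c2 := chr(128)).
    public static char FromOrdinal(int value)
    {
        return (char)value;
    }

    static void Main()
    {
        Console.WriteLine("{0:X4}", (int)FromAnsi(128));    // 20AC
        Console.WriteLine("{0:X4}", (int)FromOrdinal(128)); // 0080
    }
}
```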

Read the rest of this entry »

Posted in Delphi, Development, Encoding, Software Development, Unicode | 2 Comments »