The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests

  • My badges

  • Twitter Updates

  • My Flickr Stream

  • Pages

  • All categories

  • Enter your email address to subscribe to this blog and receive notifications of new posts by email.

    Join 1,860 other subscribers

Archive for the ‘ASCII’ Category

XML and HTML escapes

Posted by jpluimers on 2012/07/26

While reviewing some client’s code, I noticed they were generating and parsing XML and HTML by hand (do not ever do that yourself!).

Before refactoring this into something that uses libraries that properly understand XML and HTML, I needed to assess some of the risks.

A major risk is to get the escaping (and unescaping) of XML and HTML right.

Time to finally organize some of my links on escaping HTML and XML that I had in my favourites list.

The starting point is the List of XML and HTML character entity references on Wikipedia. It is readable, complete and lists both kinds of escapes.

XML escapes

The official W3C text that describes XML escaping is hard to read.

There are only 5 predefined XML entities for characters that can (some must) be escaped. This table is derived from the Wikipedia article.

Name Character Unicode code point
(decimal)
Standard When to escape (from the XML 1.0 standard) Description
quot U+0022 (34) XML 1.0 To allow attribute values to contain both single and double quotes double quotation mark
amp & U+0026 (38) XML 1.0 Outside  comment, a processing instruction, or a CDATA section ampersand
apos U+0027 (39) XML 1.0 To allow attribute values to contain both single and double quotes apostrophe (= apostrophe-quote)
lt < U+003C (60) XML 1.0 Outside  comment, a processing instruction, or a CDATA section less-than sign
gt > U+003E (62) XML 1.0 in content, when that string is not marking the end of a CDATA section greater-than sign

HTML escapes

Read the rest of this entry »

Posted in " quot, & amp, > gt, < lt, ' apos, ASCII, Development, Encoding, HTML, Power User, SocialMedia, Software Development, Unicode, Web Development, WordPress, XML, XML escapes, XML/XSD | 1 Comment »

.NET/C# – TEE filter that also runs on Windows (XP) Embedded – update

Posted by jpluimers on 2010/04/15

Last week, I posted a C# implementation of the tee filter from Sterling W. “Chip” Camden.

Since then I have modified it slightly.
Not because the implementation is bad, but because some pieces of software play dirty when saving their redirected output.

One of those applications is SubInAcl, otherwise a great tool for showing and modifying ACL and Ownership information of Windows NT objects (files, registry entries, etc).

However, when redirecing output or piping it, it writes a zero byte after each byte of text.
I’m not sure why: it might try to face some Unicode output, or just be buggy.

The new sourcecode is below. You can also download the project and binary as tee.C#.7z (you need the freeware 7zip compression tool to decompress this).
Read the rest of this entry »

Posted in .NET, ASCII, C#, C# 2.0, CommandLine, Development, Encoding, Software Development, Unicode, Visual Studio and tools | Leave a Comment »

CodeRage 4 session material download locations changed – CodeCentral messed up

Posted by jpluimers on 2009/09/18

Somehow, CodeCentral managed to not only delete my uploaded CodeRage 4 session materials (the videos are fine!), but also newer uploads with other submissions.

Since I’m in crush mode to get the BASTA! and DelphiLive 2009 Germany sessions done, and all Embarcadero sites having to do with their membership server perform like a dead horse, I have temporary moved the downloads, and corrected my earlier post on the downloads.

I have notified Embarcadero of the problems, so I’m pretty sure they are being addressed on their side as well.
Edit 20090919: Embarcadero indicated that between 20090913 and 20090914 there has been a database restore on CodeCentral. Some entries therefore have been permanently lost.

When I get back from the conferences, and CodeCentral is more responsive, I’ll retry uploading it there, and correct the posts.
In the mean time, be sure not to save shortcuts to the current locations, as they are bound to change.

The are the new download locations can all be found in my xs4all temporary CodeRage 4 2009 download folder.

This is the full list of my CodeRage 4 sessions and places where you can download everything: Read the rest of this entry »

Posted in .NET, ASCII, CodeRage, CommandLine, Conferences, CP437/OEM 437/PC-8, Database Development, Debugging, Delphi, Development, Encoding, Event, Firebird, InterBase, ISO-8859, ISO8859, Prism, Software Development, SQL Server, Unicode, UTF-8, UTF8, Windows-1252 | 1 Comment »

CodeRage 4: session replays are online too!

Posted by jpluimers on 2009/09/13

Embarcadero has made available the replays of the CodeRage 4 sessions.
You can find them in the CodeRage 4 sessions overview.

In order to download them from that overview, NOTE: To access this session replay, you must be logged into EDN. you can login or sign-up (which is free).

To make it easier to find all the relevant downloads, below is an overview of my sessions and their links.

Let me know what you use it for, I’m always interested!

Update 20090918: changed the download locations because CodeCentral messed up.
Read the rest of this entry »

Posted in .NET, ASCII, C#, C# 2.0, CodeRage, CommandLine, Conferences, CP437/OEM 437/PC-8, Database Development, Debugging, Delphi, Development, Encoding, Event, Firebird, InterBase, ISO-8859, ISO8859, Java, Prism, Software Development, Unicode, UTF-8, UTF8, Visual Studio and tools, XML, XML/XSD, XSD | 4 Comments »

.NET/C# – converting UTF8 to ASCII (yes, you *can* loose information with this) using System.Text.Encoding

Posted by jpluimers on 2009/04/23

Quite a while ago, we needed to exchange some text files with .NET and a DOS application (yes, they still exist!).

Since the .NET app already exchanged text files with other applications using UTF8, we had to reencode those into plain ASCII.
(yes, I am aware there are dozens of codepages we could encode to, we decided to stick with 7-bit ASCII, and warned the client about possible information loss).

A couple of months later, we neede to exchange information with an app doing Windows-1252, and then even later on to a web-app needing ISO 8859-1 (both are Western European encodings).
So I decided to refactor the UTF8 to ASCII conversion app into something more maintainable.

But first let me show you how you can dump all of the .NET supported encodings:
Read the rest of this entry »

Posted in .NET, ASCII, C#, Development, Encoding, Software Development, Unicode, UTF-8, UTF8 | 3 Comments »