March 2026
M	T	W	T	F	S	S
	1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28	29
30	31

Archive for the ‘Encoding’ Category

Foute foutmelding @heldenvannu (inschrijving: Pakketten | HELDEN VAN . NU)

Posted by jpluimers on 2013/03/29

Als je postcode “1060 NP” invult bij aansluiting en “1170 AB” bij facturatie, dan krijg je deze onterechte foutmeldingen:

Organisatie postcode is ongeldig
Facturatie gegevens postcode is ongeldig

Beetje vreemd, want al sinds de introductie van postcodes in Nederland in 1978 zit in een postcode 1 spatie, en tussen de postcode en de woonplaats 2 spaties.

Ook trema‘s gaan mis: bij postcode 1060 NP hoort de straat Pyreneeën in Amsterdam, maar bij Heldenvan.nu wordt het deze Mojibake:

Read the rest of this entry »

Posted in Development, Encoding, Mojibake, Opinions, Power User, Software Development, Unicode | Leave a Comment »

Delphi “Variant Records”, a few notes

Posted by jpluimers on 2013/03/14

Variant Records are a feature that has been in the Pascal language since Standard Pascal.

A cool page for historic perspective is R3R: Pascal Features in Popular Compilers, hopefully someone will update it to more modern versions of the mentioned compilers.

There is not much official documentation on the Delphi side on this, so below some parts of a case I used for a project that started in 1997 and is still in use to day. Read the rest of this entry »

Posted in APPC, AS/400 / iSeries / System i, ASCII, COBOL, Communications Development, Conference Topics, Conferences, CPI-C, Delphi, Delphi 1, Delphi 2005, Delphi 2006, Delphi 2007, Delphi 2009, Delphi 2010, Delphi 3, Delphi 4, Delphi 5, Delphi 6, Delphi 7, Delphi 8, Delphi XE, Delphi XE2, Delphi XE3, Development, Encoding, Event, HIS Host Integration Services, Internet protocol suite, MQ Message Queueing/Queuing, SNA, Software Development, TCP, Unicode, UTF-8, WebSphere MQ | 9 Comments »

Delphi “type types”: similar types but not the same type identity, some examples.

Posted by jpluimers on 2013/03/12

Few people know about a Delphi language feature that has been present since Delphi 1: prepending the type definition with a type keyword to make the type getting a new identity.

Each time I use it, I have to do some browsing for the consequences, and this time I wrote down some notes and created a small example program (source is also below).

This time I needed it when writing class wrappers on top of the Delphi bindings for WebSphere MQ.

WebSphere MQ has Queues where you can put and get messages. It also has Queue Managers to which you connect, and that provide queuing services and manages queues.

Both Queues and Queue Managers have names that can be up to 48 (single byte) characters long.
Those names mean totally different things, so though the have similar data types, they have a different identity.

The same holds for 20 byte character arrays (they can be used as names for ChannelName, ShortConnectionName and MCAName). The 264 byte character array is so far used for ConnectionName only.

Distinguishing those types: That’s what “type types” in Delphi are all about. Read the rest of this entry »

Posted in CP437/OEM 437/PC-8, Delphi, Delphi 1, Delphi 2005, Delphi 2006, Delphi 2007, Delphi 2009, Delphi 2010, Delphi 3, Delphi 4, Delphi 5, Delphi 6, Delphi 7, Delphi 8, Delphi x64, Delphi XE, Delphi XE2, Delphi XE3, Development, Encoding, Shift JIS, Software Development, Unicode, UTF-8, UTF8 | 1 Comment »

Link clearance: fonts, localization, languages, internationalization, PostScript, and more

Posted by jpluimers on 2013/03/01

A few links I came across recently:

internationalization – Country codes list – C# – Stack Overflow.
c# – Converting country codes in .NET – Stack Overflow.
CLDR – Unicode Common Locale Data Repository.
Common Locale Data Repository – Wikipedia, the free encyclopedia.
localization – C#: get letters of alphabet for scandinavian language? – Stack Overflow.
– The exempla characters in CLDR, type=”index” are what you are looking for. unicode.org/reports/tr35/#Character_Elements – Steven R. Loomis May 14 ’10 at 18:15
– There’s a good set of reference documents at the Evertype website
Evertype: The Alphabets of Europe.
TZ4Net library. (TimeZone 4.NET: uses Unicode CLDR v.2.1 based mapping between Win32 Id and Olson name)
Evertype.
Software I used in the 90s for digitizing fonts to produce PostScript and Type 1 fonts
– Fontographer – Wikipedia, the free encyclopedia
– Ikarus (typography software) – Wikipedia, the free encyclopedia.
Visualogik Technology & Design / Hans van Leeuwen.

–jeroen

Posted in About, Development, Encoding, EPS/PostScript, Font, internatiolanization (i18n) and localization (l10), Personal, Power User, Programmers Font, Software Development, Unicode | Leave a Comment »

START: Start a program, even if it is not on the PATH ideal to start various versions of apps from DOS

Posted by jpluimers on 2013/01/29

A while ago, I had to adapt a DOS app that used one specific version of Excel to do some batch processing so it would support multiple versions of Excel on multiple versions of Windows.

One of the big drawbacks of DOS applications is that the command lines you can use are even shorter than Windows applications, which depending you how you call an application are:

32767 characters when you call CreateProcess (the limit is UNICODE_STRING structure)
8192 characters when you use Cmd.exe
2048+32 characters when you use ShellExecute or ShellExecuteEx (the limit is INTERNET_MAX_URL_LENGTH)
260 characters for the Windows 95 family of products (the limit is MAX_PATH)
127 characters for DOS (the upper limit of a signed byte) often excluding the length of “cmd.exe” or “command.exe”

This is how the DOS app written in Clipper (those were the days, it was even linked with Blinker :) started Excel:

c:\progra~1\micros~2\office11\excel.exe parameters
01234567890123456789012345678901234567890
          1         2         3         4

The above depends on 8.3 short file names that in turn depend on the order in which similar named files and directories have been created.

The trick around this, and around different locations/versions of an application, is to use START to find the right version of Excel.

The reason it works is because in addition to PATH, it checks the App Paths portions in the registry in this order to find an executable: Read the rest of this entry »

Posted in Batch-Files, Development, Encoding, Power User, Scripting, Software Development, Unicode, Windows, Windows 7, Windows 8, Windows Server 2000, Windows Server 2003, Windows Server 2003 R2, Windows Server 2008, Windows Server 2008 R2, Windows Vista, Windows XP | Leave a Comment »

Delphi and C# compiler oddities

Posted by jpluimers on 2013/01/08

When developing in multiple languages, it sometimes is funny to see how they differ in compiler oddities.

Below are a few on const examples.

Basically, in C# you cannot go from a char const to a string const, and chars are a special kind of int.

In Delphi you cannot go from a string to a char. Read the rest of this entry »

Posted in .NET, ASCII, C#, C# 1.0, C# 2.0, C# 3.0, C# 4.0, C# 5.0, Delphi, Delphi 2009, Delphi 2010, Delphi XE, Delphi XE2, Development, Encoding, Software Development, Unicode | Leave a Comment »

If you think CSV is easy; think again!

Posted by jpluimers on 2012/12/05

Lots of people think CSV is easy: it’s just a bunch of values separated with commas. But in practice it is not. Various reasons can make CSV very hard, especially since “CSV” is not a single, well-defined format. As always importing is always harder than exporting. A few reasons that make it hard:

Comma is often not the separator
The separator can be inside a field as well, so you need some form of quoting the separator
If quotes are in the fields, you need to have some form of escaping these quotes (usually done by doubling the double quotes or doubling the single quotes or quoting commas and newlines)
What kind of quotes to you want (especially when you want to embed the CSV into an XML or HTML attribute)
Do you allow for newlines (and if yes: what kind of newline representation: CRLF, LF, CR, LFCR?) Some solve this by replacing newline representations with spaces, but that is not always a good idea.
Encoding? What encoding: everything is EBCDIC, right?

A few links that helped me a lot getting input and output of CSV right in C#:

c# – Creating a DataTable from CSV File – Stack Overflow.
JoshClose/CsvHelper · GitHub.
CSV TextfieldParser parsing through lines in CSV File with Single and Double Quotes, while skipping lines of unwanted data.
Though be careful with TextFieldParser problem with Double Quotes with Quotes.
c# – Creating a comma separated list from IList or IEnumerable – Stack Overflow.
(simple solution when you know no strange characters are involved).
c# – How to split csv whose columns may contain , – Stack Overflow.
A Fast CSV Reader – CodeProject.

Thanks to Jabulaza:

–jeroen

via: Comma-separated values – Wikipedia, the free encyclopedia.

Posted in Conference Topics, Conferences, CSV, Development, EBCDIC, Encoding, Event, Software Development | 4 Comments »

.NET/C# duh moment of the day: “A char can be implicitly converted to ushort, int, uint, long, ulong, float, double, or decimal (not the other way around; implicit != implicit)”

Posted by jpluimers on 2012/11/20

A while ago I had a “duh” moment while calling a method that had many overloads, and one of the overloads was using int, not the char I’d expect.

The result was that a default value for that char was used, and my parameter was interpreted as a (very small) buffer size. I only found out something went wrong when writing unit tests around my code.

The culprit is this C# char feature (other implicit type conversions nicely summarized by Muhammad Javed):

A char can be implicitly converted to ushort, int, uint, long, ulong, float, double, or decimal. However, there are no implicit conversions from other types to the char type.

Switching between various development environments, I totally forgot this is the case in languages based on C and Java ancestry. But not in VB and Delphi ancestry (C/C++ do numeric promotions of char to int and Java widens 2-byte char to 4-byte int; Delphi and VB.net don’t).

I’m not the only one who was confused, so Eric Lippert wrote a nice blog post on it in 2009: Why does char convert implicitly to ushort but not vice versa? – Fabulous Adventures In Coding – Site Home – MSDN Blogs.

Basically, it is the C ancestry: a char is an integral type always known to contain an integer value representing a Unicode character. The opposite is not true: an integer type is not always representing a Unicode character.

Lesson learned: if you have a large number of overloads (either writing them or using them) watch for mixing char and int parameters.

Note that overload resolution can be diffucult enough (C# 3 had breaking changes and C# 4 had breaking changes too, and those are only for C#), so don’t make it more difficult than it should be (:

Below a few examples in C# and VB and their IL disassemblies to illustrate their differnces based on asterisk (*) and space ( ) that also show that not all implicits are created equal: Decimal is done at run-time, the rest at compile time.

Note that the order of the methods is alphabetic, but the calls are in order of the type and size of the numeric types (integral types, then floating point types, then decimal).

A few interesting observations:

The C# compiler implicitly converts char with all calls except for decimal, where an implicit conversion at run time is used:
L_004c: call valuetype [mscorlib]System.Decimal [mscorlib]System.Decimal::op_Implicit(char)
L_0051: call void CharIntCompatibilityCSharp.Program::writeLineDecimal(valuetype [mscorlib]System.Decimal)
Same for implicit conversion of byte to the other types, though here the C# and VB.NET compilers generate slightly different code for run-time conversion.
C# uses an implicit conversion:
L_00af: ldloc.1
L_00b0: call valuetype [mscorlib]System.Decimal [mscorlib]System.Decimal::op_Implicit(uint8)
L_00b5: call void CharIntCompatibilityCSharp.Program::writeLineDecimal(valuetype [mscorlib]System.Decimal)
VB.NET calls a constructor:
L_006e: ldloc.1
L_006f: newobj instance void [mscorlib]System.Decimal::.ctor(int32)
L_0075: call void CharIntCompatibilityVB.Program::writeLineDecimal(valuetype [mscorlib]System.Decimal)

Here is the example code: Read the rest of this entry »

Posted in .NET, Agile, Algorithms, C#, C# 1.0, C# 2.0, C# 3.0, C# 4.0, C# 5.0, C++, Delphi, Development, Encoding, Floating point handling, Java, Software Development, Unicode, Unit Testing, VB.NET | 1 Comment »

XML and HTML escapes

Posted by jpluimers on 2012/07/26

While reviewing some client’s code, I noticed they were generating and parsing XML and HTML by hand (do not ever do that yourself!).

Before refactoring this into something that uses libraries that properly understand XML and HTML, I needed to assess some of the risks.

A major risk is to get the escaping (and unescaping) of XML and HTML right.

Time to finally organize some of my links on escaping HTML and XML that I had in my favourites list.

The starting point is the List of XML and HTML character entity references on Wikipedia. It is readable, complete and lists both kinds of escapes.

XML escapes

The official W3C text that describes XML escaping is hard to read.

There are only 5 predefined XML entities for characters that can (some must) be escaped. This table is derived from the Wikipedia article.

Name	Character	Unicode code point (decimal)	Standard	When to escape (from the XML 1.0 standard)	Description
quot	“	U+0022 (34)	XML 1.0	To allow attribute values to contain both single and double quotes	double quotation mark
amp	&	U+0026 (38)	XML 1.0	Outside comment, a processing instruction, or a CDATA section	ampersand
apos	‘	U+0027 (39)	XML 1.0	To allow attribute values to contain both single and double quotes	apostrophe (= apostrophe-quote)
lt	<	U+003C (60)	XML 1.0	Outside comment, a processing instruction, or a CDATA section	less-than sign
gt	>	U+003E (62)	XML 1.0	in content, when that string is not marking the end of a CDATA section	greater-than sign

HTML escapes

Read the rest of this entry »

Posted in " quot, & amp, > gt, < lt, ' apos, ASCII, Development, Encoding, HTML, Power User, SocialMedia, Software Development, Unicode, Web Development, WordPress, XML, XML escapes, XML/XSD | 1 Comment »

Searchable UTF8 Unicode Characters

Posted by jpluimers on 2012/05/31

Brilliant: site where you can ~~Search UTF8 Unicode Characters~~.

If you know part of the name of a Unicode character, you can now try and find it, then copy/paste it from that site.

Edit: 20160403: that site disappeared, but this one works: Unicode Character Search and Shapecatcher.com: Unicode Character Recognition still works.

–jeroen

Posted in Development, Encoding, Power User, Software Development, Unicode, UTF-8, UTF8 | Leave a Comment »

« Previous Entries

Next Entries »

	Attila Kovacs on Crowbarring Windows 95 into Wi…
	Jeroen Wiert Pluimer… on Does Odido (the old T-Mobile N…
	Lars Fosdal on Security alarm provider Woonve…
	Thomas Mueller on Question got closed in May 202…
	Thaddy de Koning on Formulier voor bewindvoerders…

The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests

Subscribe

Archives

Recent Comments

Recent Posts

Blog Stats

Meta title

Tag Cloud Title

Top Clicks

Top Posts

My badges

Twitter Updates

My Flickr Stream

Pages

All categories

Email Subscription

Archive for the ‘Encoding’ Category

Foute foutmelding @heldenvannu (inschrijving: Pakketten | HELDEN VAN . NU)

Delphi “Variant Records”, a few notes

Delphi “type types”: similar types but not the same type identity, some examples.

Link clearance: fonts, localization, languages, internationalization, PostScript, and more

START: Start a program, even if it is not on the PATH ideal to start various versions of apps from DOS

Delphi and C# compiler oddities

If you think CSV is easy; think again!

.NET/C# duh moment of the day: “A char can be implicitly converted to ushort, int, uint, long, ulong, float, double, or decimal (not the other way around; implicit != implicit)”

XML and HTML escapes

XML escapes

HTML escapes

Searchable UTF8 Unicode Characters

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests

Subscribe

Archives

Recent Comments

Recent Posts

Blog Stats

Meta title

Tag Cloud Title

Top Clicks

Top Posts

My badges

My Flickr Stream

Pages

All categories

Email Subscription

Archive for the ‘Encoding’ Category

Rate this:

Share this:

Rate this:

Share this:

Rate this:

Share this:

Rate this:

Share this:

Rate this:

Share this:

Rate this:

Share this:

Rate this:

Share this:

Rate this:

Share this:

XML escapes

HTML escapes

Rate this:

Share this:

Rate this:

Share this: