Archive for the ‘ASCII’ Category
Posted by jpluimers on 2017/09/25
At the end of April 2014, Roman Yankovsky started a nice [Wayback] discussion on Google+ trying to get upvotes for [Wayback] QualityCentral Report #: 124402: Compiler bug when comparing chars.
His report basically comes down to that when using Ansi character literals like #255, the compiler treats them as single-byte encoded characters in the current code page of your Windows context, translates them to Unicode, then processes them.
The QC report has been dismissed as “Test Case Error” (within 15 minutes of stating “need more info”) by one of the compiler engineers, directing to the [Wayback] UsingCharacterLiterals section of Delphi in a Unicode World Part III: Unicodifying Your Code where – heaven forbid – they suggest to replace #128 with the Euro-Sign literal.
I disagree, as the issue happens without any hint or warning whatsoever, and causes code that compiles fine in Delphi <= 2007 to fail in subtle ways on Delphi >= 2009.
The compiler should issue a hint or warning when you potentially can screw up. It doesn’t. Not here.
Quite a few knowledgeable Delphi people got involved in the discussion:
Read the rest of this entry »
Posted in Ansi, ASCII, Conference Topics, Conferences, CP437/OEM 437/PC-8, Delphi, Delphi 2006, Delphi 2007, Delphi 2009, Delphi 2010, Delphi 7, Delphi XE, Delphi XE2, Delphi XE3, Delphi XE4, Delphi XE5, Delphi XE6, Development, Encoding, Event, ISO-8859, Missed Schedule, QC, SocialMedia, Software Development, Unicode, UTF-8, Windows-1252, WordPress | Leave a Comment »
Posted by jpluimers on 2016/08/17
After yesterdays post on Testing and static methods don’t go well together, I read around on Source (kunststube [WayBack]) a bit more and found these very nice articles on encoding,Unicode and text:
Related on those, some other nice readings:
–jeroen
Posted in Ansi, ASCII, CP437/OEM 437/PC-8, Development, EBCDIC, Encoding, ISO-8859, ISO8859, Shift JIS, Software Development, Unicode, UTF-16, UTF-8, UTF16, UTF8, Windows-1252 | Leave a Comment »
Posted by jpluimers on 2015/10/13
So I won’t forget:
Even though this does not work on most USA T-Shirt sites, it works on this Dutch one: T-Shirt Ontwerpen – t-shirt zelf ontwerpen | Spreadshirt.
–jeroen
PS:
Read the rest of this entry »
Posted in ASCII, Development, Encoding, Software Development, Unicode | Leave a Comment »
Posted by jpluimers on 2015/07/08
Great post by Marjan Venema when you need to migrate your old Delphi programs to the modern Delphi world: [Wayback] 20 resources on migrating to Unicode with Delphi | Software on a String.
I’m glad that some of the links overlap with what I posted and presented in the past at:
Well done Marjan!
–jeroen
Posted in Ansi, ASCII, Delphi, Delphi 2, Delphi 2005, Delphi 2006, Delphi 2007, Delphi 2009, Delphi 2010, Delphi 3, Delphi 4, Delphi 5, Delphi 6, Delphi 7, Delphi XE, Delphi XE2, Delphi XE3, Delphi XE4, Delphi XE5, Delphi XE6, Delphi XE7, Development, Encoding, Software Development, Unicode | Leave a Comment »
Posted by jpluimers on 2015/01/29
When people tell you that ASCII is not an Internet Standard but an RFC. They are wrong. They used to be right though. Until 2015-01-12, when IETF declared the RFC 20 to be an Internet Standard: status-change-rfc20-ascii-format-to-standard-00.
So after more than 45 years (like many good things, the ASCII RFC is from 1969), it is not just an American Standard but an Internet Standard (:
Thanks Lauren Weinstein for sharing and Kristian Köhntopp for pointing to the reclassification.
–jeroen
via: ASCII – Wikipedia, the free encyclopedia.
Posted in ASCII, Development, Encoding, History, Software Development | Leave a Comment »
Posted by jpluimers on 2014/05/06
Apart from the mandatory Joel on Software article about Unicode and Character sets, these two articles are of great value too:
Fun to read from that blog is the Historical Technology section including this article:
–jeroen
PS: The mandatory one is The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) – Joel on Software.
Posted in .NET, Ansi, ASCII, CP437/OEM 437/PC-8, Delphi, Development, EBCDIC, Encoding, ISO-8859, ISO8859, Shift JIS, Software Development, Unicode, UTF-8, UTF8, Windows-1252 | Leave a Comment »
Posted by jpluimers on 2013/06/11
A while ago, I needed to export pure ASCII text from a .NET app.
An important step there is to convert the diacritics to “normal” ASCII characters. That turned out to be enough for this case.
This is the code I used which is based on Extension Methods and this trick from Blair Conrad:
The approach uses String.Normalize to split the input string into constituent glyphs (basically separating the “base” characters from the diacritics) and then scans the result and retains only the base characters. It’s just a little complicated, but really you’re looking at a complicated problem.
Example code:
using System;
using System.Text;
using System.Globalization;
namespace StringToAsciiConsoleApplication
{
class Program
{
static void Main(string[] args)
{
string unicode = "áìôüç";
string ascii = unicode.ToAscii();
Console.WriteLine("Unicode\t{0}", unicode);
Console.WriteLine("ASCII\t{0}", ascii);
}
}
public static class StringExtensions
{
public static string ToAscii(this string value)
{
return RemoveDiacritics(value);
}
// http://stackoverflow.com/questions/249087/how-do-i-remove-diacritics-accents-from-a-string-in-net
private static string RemoveDiacritics(this string value)
{
string valueFormD = value.Normalize(NormalizationForm.FormD);
StringBuilder stringBuilder = new StringBuilder();
foreach (System.Char item in valueFormD)
{
UnicodeCategory unicodeCategory = CharUnicodeInfo.GetUnicodeCategory(item);
if (unicodeCategory != UnicodeCategory.NonSpacingMark)
{
stringBuilder.Append(item);
}
}
return (stringBuilder.ToString().Normalize(NormalizationForm.FormC));
}
}
}
–jeroen
Posted in .NET, .NET 3.5, .NET 4.0, .NET 4.5, ASCII, C#, C# 3.0, C# 4.0, C# 5.0, Development, Encoding, Software Development, Unicode | Leave a Comment »
Posted by jpluimers on 2013/03/14
Variant Records are a feature that has been in the Pascal language since Standard Pascal.
A cool page for historic perspective is R3R: Pascal Features in Popular Compilers, hopefully someone will update it to more modern versions of the mentioned compilers.
There is not much official documentation on the Delphi side on this, so below some parts of a case I used for a project that started in 1997 and is still in use to day. Read the rest of this entry »
Posted in APPC, AS/400 / iSeries / System i, ASCII, COBOL, Communications Development, Conference Topics, Conferences, CPI-C, Delphi, Delphi 1, Delphi 2005, Delphi 2006, Delphi 2007, Delphi 2009, Delphi 2010, Delphi 3, Delphi 4, Delphi 5, Delphi 6, Delphi 7, Delphi 8, Delphi XE, Delphi XE2, Delphi XE3, Development, Encoding, Event, HIS Host Integration Services, Internet protocol suite, MQ Message Queueing/Queuing, SNA, Software Development, TCP, Unicode, UTF-8, WebSphere MQ | 9 Comments »
Posted by jpluimers on 2013/01/08
When developing in multiple languages, it sometimes is funny to see how they differ in compiler oddities.
Below are a few on const examples.
Basically, in C# you cannot go from a char const to a string const, and chars are a special kind of int.
In Delphi you cannot go from a string to a char. Read the rest of this entry »
Posted in .NET, ASCII, C#, C# 1.0, C# 2.0, C# 3.0, C# 4.0, C# 5.0, Delphi, Delphi 2009, Delphi 2010, Delphi XE, Delphi XE2, Development, Encoding, Software Development, Unicode | Leave a Comment »
Posted by jpluimers on 2012/07/26
While reviewing some client’s code, I noticed they were generating and parsing XML and HTML by hand (do not ever do that yourself!).
Before refactoring this into something that uses libraries that properly understand XML and HTML, I needed to assess some of the risks.
A major risk is to get the escaping (and unescaping) of XML and HTML right.
Time to finally organize some of my links on escaping HTML and XML that I had in my favourites list.
The starting point is the List of XML and HTML character entity references on Wikipedia. It is readable, complete and lists both kinds of escapes.
XML escapes
The official W3C text that describes XML escaping is hard to read.
There are only 5 predefined XML entities for characters that can (some must) be escaped. This table is derived from the Wikipedia article.
HTML escapes
Read the rest of this entry »
Posted in " quot, & amp, > gt, < lt, ' apos, ASCII, Development, Encoding, HTML, Power User, SocialMedia, Software Development, Unicode, Web Development, WordPress, XML, XML escapes, XML/XSD | 1 Comment »