For my link archive: [WayBack] Encode/Decode Quoted Printable – Webatic.
It did a splendid job decoding email files in the MIME Quoted-printable format.
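If you would rather decode offline, Python's standard library has a quopri module that handles the same encoding. What follows is a minimal sketch. The sample bodies and the `charset="utf-8"` default are assumptions; in a real email the charset comes from the part's Content-Type header.

```python
import quopri


def decode_qp(encoded: bytes, charset: str = "utf-8") -> str:
    """Decode a quoted-printable body, then turn the bytes into text.

    The charset default is an assumption: take it from the Content-Type header.
    """
    return quopri.decodestring(encoded).decode(charset)


# "=C3=B3" is the UTF-8 byte pair for "ó"; "=" at the end of a line is a soft line break.
print(decode_qp(b"v=C3=B3=C3=B3r"))
print(decode_qp(b"a soft li=\nne break"))
```

The reverse direction is `quopri.encodestring`, which turns the UTF-8 bytes of "vóór" back into `v=C3=B3=C3=B3r`.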
–jeroen
Posted by jpluimers on 2021/03/17
Most searches for “ASCII emoticons” get you Unicode ones:
Luckily, most of the ones in List of emoticons – Wikipedia are ASCII.
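When you collect emoticons from mixed sources, Python's `str.isascii()` (available since Python 3.7) is a quick way to keep only the pure ASCII ones. The function name and the sample list below are made up for illustration.

```python
def ascii_emoticons(candidates: list[str]) -> list[str]:
    """Keep only emoticons made entirely of ASCII characters (code points below 128)."""
    return [e for e in candidates if e.isascii()]


# Hypothetical sample: the shrug contains U+00AF and U+30C4, and U+263A is a Unicode smiley.
print(ascii_emoticons([":-)", ";-)", "¯\\_(ツ)_/¯", "\u263a", "^_^"]))
```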
There are also shortcodes: text codes such as :smile: that do not visually represent an emoji themselves, but usually get translated into the corresponding image or Unicode character.
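Translating shortcodes boils down to a lookup table plus a pattern match. The sketch below uses a tiny, hypothetical table. Real platforms each ship their own, much larger lists, and their names do not always agree. Unknown shortcodes are left untouched.

```python
import re

# Hypothetical, minimal shortcode table for illustration only.
SHORTCODES = {
    "smile": "\U0001F604",      # grinning face with smiling eyes
    "thumbsup": "\U0001F44D",   # thumbs up sign
    "heart": "\u2764\ufe0f",    # heavy black heart + emoji presentation selector
}

# A shortcode is a name between two colons, e.g. ":smile:" or ":+1:".
_SHORTCODE = re.compile(r":([a-z0-9_+-]+):")


def expand_shortcodes(text: str) -> str:
    """Replace known :name: shortcodes with their Unicode character; keep unknown ones as-is."""
    return _SHORTCODE.sub(lambda m: SHORTCODES.get(m.group(1), m.group(0)), text)


print(expand_shortcodes("Nice :thumbsup: and :no_such_code:"))
```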
A few lists of them:
–jeroen
Posted in ASCII, Development, Encoding, LifeHacker, Power User, Software Development, Unicode
Posted by jpluimers on 2021/01/21
I should have had the below answer when writing about StUF – receiving data from a provider where UTF-8 is in fact ISO-8859.
A while ago, a co-worker did not believe me when I told them that the default XML encoding really is UTF-8 (and tried to force it to utf-8), and that it was a problem if the content had byte sequences that did not match the (either specified or default) encoding.
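Both points are easy to demonstrate with Python's `xml.etree.ElementTree`, which is backed by the expat parser. The `parse_word` helper and the sample documents are assumptions for illustration. Without a declaration the bytes are read as UTF-8, so ISO-8859-1 bytes are rejected. Declaring the actual encoding makes the very same bytes parse.

```python
import xml.etree.ElementTree as ET


def parse_word(data: bytes) -> str:
    """Hypothetical helper: return the root element's text, or a description of the parse error."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError as err:
        return f"ParseError: {err}"


utf8_doc = "<word>vóór</word>".encode("utf-8")      # no declaration, UTF-8 bytes
latin1_doc = "<word>vóór</word>".encode("iso-8859-1")  # no declaration, ISO-8859-1 bytes
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + latin1_doc

print(parse_word(utf8_doc))    # default UTF-8 matches the bytes
print(parse_word(latin1_doc))  # byte 0xF3 is not valid UTF-8 here, so the parser refuses it
print(parse_word(declared))    # same bytes, now with the correct declaration
```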
I thought I had blogged about the default, and where to find it, but apparently I did not.
My blog had (and has <g>) a truckload of articles mentioning UTF-8, and fewer articles containing UTF-8, encoding and xml, but the ones containing UTF-8, default, encoding and xml did not actually point to a standard that defines UTF-8 as the default XML encoding when no other encoding information (like a BOM (byte order mark), or an HTTP or MIME encoding) is available.
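The order of checks is roughly: byte order mark, then the encoding in the XML declaration, then the UTF-8 default. The function below is a heavily simplified sketch of that order, loosely modelled on Appendix F of the XML 1.0 specification. It is not a complete implementation. It ignores UTF-32, EBCDIC and BOM-less UTF-16, and it ignores external information such as an HTTP or MIME charset, which would normally be checked first. The function name is hypothetical.

```python
import codecs
import re

# Encoding name inside the XML declaration, e.g. <?xml version="1.0" encoding="ISO-8859-1"?>
_DECL = re.compile(rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')


def sniff_xml_encoding(data: bytes) -> str:
    """Simplified guess of an XML document's encoding from its first bytes."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # No BOM and no encoding declaration: XML says UTF-8.
    return "utf-8"


print(sniff_xml_encoding(b'<?xml version="1.0"?><a/>'))
print(sniff_xml_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>'))
```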
W3C indeed specifies it. [WayBack] utf 8 – How default is the default encoding (UTF-8) in the XML Declaration? – Stack Overflow has a summary (thanks James Holderness!):
The Short Answer
Under the very specific circumstances of a UTF-8 encoded document with no external encoding information (which I understand from the comments is what you’re interested in), there is no difference between the two declarations.
The long answer is far more interesting though.
and an elaboration:
Posted in Development, Encoding, Software Development, UTF-8, UTF8, XML, XML/XSD
Posted by jpluimers on 2020/10/13
The Delphi compiler does not see a Unicode non-breaking space (U+00A0) as whitespace, and the Delphi IDE does not warn you about it: [WayBack] Delphi revelations #2 – Space characters are not just space characters.
Given that this character was introduced in 1993, I wonder what the compiler tests look like.
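Because the IDE stays silent, a small scanner can find these characters before the compiler trips over them. This is a sketch with a hypothetical function name. It relies on Python's `str.isspace()`, which treats U+00A0 as whitespace, and it reports every whitespace character outside the plain ASCII set. Lines are split on "\n" only, so line numbers match what most editors show.

```python
ASCII_WHITESPACE = set(" \t\r\n\f\v")


def find_unusual_whitespace(text: str) -> list[tuple[int, int, str]]:
    """Return (line, column, code point) for each non-ASCII whitespace character, 1-based."""
    hits = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch.isspace() and ch not in ASCII_WHITESPACE:
                hits.append((lineno, col, f"U+{ord(ch):04X}"))
    return hits


source = "begin\n  x :=\u00a01;\nend"
print(find_unusual_whitespace(source))
```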
These also will not be recognised as whitespace:
Related, as many other tools also do not properly support various whitespace characters:
Via: [WayBack] A Delphi “Aha” experience – Kim Madsen – Google+
–jeroen
Posted in Delphi, Development, Software Development, Unicode
Posted by jpluimers on 2020/02/24
From quite some time ago, but still very relevant as encoding issues keep occurring:
A while ago, I saw the text “v3/43/4r” in a document. I know it comes from “vóór” (in Dutch, the acute accent adds emphasis), and I wonder which encoding failure was applied to get this wrong.
Source: [WayBack] Which encoding failure did encode “vóór” into “v3/43/4r”? – Stack Overflow
From the [WayBack] answer by rodrigo:
- ó: is U+00F3, and occupies the same codepoint (0xF3) in a lot of different encodings (most ISO-8859-* and most western Windows-*).
- In CP850 the codepoint 0xF3 is ¾ (U+00BE), that is the three-quarters character. It is the same in other, less used, codepages (CP775, CP856, CP857, CP858).
- The ¾ is sometimes transliterated to 3/4 when the character is not directly available.
And there you are! “vóór” -> “v¾¾r” -> “v3/43/4r”.
The first part (ó -> ¾) is the usual corruption of ANSI vs. OEM codepages in the Western Windows versions (in my country ANSI=Windows-1252, OEM=CP850). You can see it easily by creating a file with NOTEPAD, writing
vóór
and dumping it in a command prompt with the type command.
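The same chain can be reproduced, and undone, with Python's built-in cp1252 and cp850 codecs. This is a minimal sketch and the function names are hypothetical. `garble` writes the text as Windows-1252 (ANSI) and reads it back as CP850 (OEM). `spell_out_fractions` mimics the ¾ to 3/4 transliteration. `repair` reverses the first step, which only works as long as the ¾ characters have not yet been spelled out.

```python
def garble(text: str) -> str:
    """ANSI/OEM mix-up: bytes written as Windows-1252, then read as CP850."""
    return text.encode("cp1252").decode("cp850")


def spell_out_fractions(text: str) -> str:
    """Mimic a tool that writes the three-quarters character (U+00BE) as 3/4."""
    return text.replace("\u00be", "3/4")


def repair(text: str) -> str:
    """Undo garble(): back to the CP850 bytes, then read them as Windows-1252."""
    return text.encode("cp850").decode("cp1252")


step1 = garble("vóór")             # 0xF3 is ó in Windows-1252 but ¾ in CP850
step2 = spell_out_fractions(step1)
print(step1, step2, repair(step1))
```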
–jeroen
Posted in CP850, Development, Encoding, Software Development, UTF-8, UTF8, Windows-1252