The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests


Archive for the ‘Encoding’ Category

including enumerations and JPEG compression examples for wPDF 4 Manual: Compression related properties

Posted by jpluimers on 2019/04/11

Since I was tracking down an issue having to do with generating a DIB in a compressed PDF: [Archive.is] wPDF 4 Manual: Compression related properties

Property CompressStreamMethod

By modifying this property you can let the PDF engine compress (deflate) text. By using compression the file will be reasonably smaller. On the other hand compression will create binary data rather than ASCII data. While “deflate” produces the smallest files, “run-length” compression is compatible even with very old PDF reader programs.

Property JPEGQuality

wPDF can compress bitmaps using JPEG. This will work only for true color bitmaps (24 bits/pixel) and if you have set the desired quality in this property.

Property EncodeStreamMethod

If data in the PDF file is binary it can be encoded to be ASCII again. Binary data can be either compressed text or graphics. You can select HEX encoding or ASCII95 which is more effective than HEX.

Property ConvertJPEGData

Note: Only applies to TWPDFExport.

If this property is true JPEG data found in the TWPRichText editor will not be embedded as JPEG data. Instead the bitmap will be compressed using deflate or run length compression. It is necessary to set this property to TRUE if the PDF files must be compatible to older PDF reader programs which are incapable to read JPEG data.

Note that EncodeStreamMethod does not do compression, but it does belong here because the encodings result in different PDF sizes.
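
To illustrate why the encoding choice affects file size, here is a minimal Python sketch. It assumes that the “ASCII95” in the manual refers to the ASCII85 encoding that PDF defines next to hex encoding; it does not use wPDF itself, and `encoded_sizes` is a made-up helper name.

```python
import base64
import binascii


def encoded_sizes(data: bytes) -> dict:
    """Return the length of `data` raw, hex-encoded and ASCII85-encoded."""
    return {
        "raw": len(data),
        # Hex encoding writes 2 characters for every byte.
        "hex": len(binascii.hexlify(data)),
        # ASCII85 writes 5 characters for every 4 bytes.
        "ascii85": len(base64.a85encode(data)),
    }


# 256 sample bytes standing in for a compressed (binary) PDF stream.
print(encoded_sizes(bytes(range(256))))
```

So hex doubles the size of a binary stream, while ASCII85 adds about a quarter.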

The settings are not documented in more detail, so here are the enumerations explaining them in a bit more depth:

–jeroen

Posted in ASCII95, Delphi, Development, Encoding, HEX encoding, Software Development | Leave a Comment »

UTF-8 support for single byte character sets is beta in Windows and likely breaks a lot of applications not expecting this (via Unicode in Microsoft Windows: UTF-8 – Wikipedia)

Posted by jpluimers on 2018/12/04

Uh-oh: [WayBack] Unicode in Microsoft Windows: UTF-8 – Wikipedia:

Microsoft Windows has a code page designated for UTF-8, code page 65001. Prior to Windows 10 insider build 17035 (November 2017),[7] it was impossible to set the locale code page to 65001, leaving this code page only available for:

  • Explicit conversion functions such as MultiByteToWideChar
  • The Win32 console command chcp 65001 to translate stdin/out between UTF-8 and UTF-16.

This means that “narrow” functions, in particular fopen, cannot be called with UTF-8 strings, and in fact there is no way to open all possible files using fopen no matter what the locale is set to and/or what bytes are put in the string, as none of the available locales can produce all possible UTF-16 characters.

On all modern non-Windows platforms, the string passed to fopen is effectively UTF-8. This produces an incompatibility between other platforms and Windows. The normal work-around is to add Windows-specific code to convert UTF-8 to UTF-16 using MultiByteToWideChar and call the “wide” function.[8] Conversion is also needed even for Windows-specific api such as SetWindowText since many applications inherently have to use UTF-8 due to its use in file formats, internet protocols, and its ability to interoperate with raw arrays of bytes.

There were proposals to add new API to portable libraries such as Boost to do the necessary conversion, by adding new functions for opening and renaming files. These functions would pass filenames through unchanged on Unix, but translate them to UTF-16 on Windows.[9] This would allow code to be “portable”, but required just as many code changes as calling the wide functions.

With insider build 17035 and the April 2018 update (nominal build 17134) for Windows 10, a “Beta: Use Unicode UTF-8 for worldwide language support” checkbox appeared for setting the locale code page to UTF-8.[a] This allows for calling “narrow” functions, including fopen and SetWindowTextA, with UTF-8 strings. Microsoft claims this option might break some functions (a possible example is _mbsrev[10]) as they were written to assume multibyte encodings used no more than 2 bytes per character, thus until now code pages with more bytes such as GB 18030 (cp54936) and UTF-8 could not be set as the locale.[11]


  1. [WayBack] “UTF-8 in Windows”, Stack Overflow. Retrieved July 1, 2011.
  2. [WayBack] “Boost.Nowide”.
  3. [WayBack] https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/strrev-wcsrev-mbsrev-mbsrev-l
  4. [WayBack] “Code Page Identifiers (Windows)”, msdn.microsoft.com.
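
The two points from the quote (a classic locale code page cannot represent every file name, and the usual work-around is converting UTF-8 to UTF-16 before calling the “wide” function) can be sketched in Python. This is only an illustration of the encodings involved, not a call into the Win32 API; `to_wide` and `fits_code_page` are hypothetical helper names and cp1252 is just an example code page.

```python
def to_wide(utf8_bytes: bytes) -> bytes:
    """Convert UTF-8 bytes to UTF-16LE, the encoding the "wide" functions expect."""
    return utf8_bytes.decode("utf-8").encode("utf-16-le")


def fits_code_page(text: str, code_page: str) -> bool:
    """Return True when every character of `text` exists in `code_page`."""
    try:
        text.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


name = "Überfall-😀.txt"
print(fits_code_page(name, "cp1252"))       # the emoji has no cp1252 byte
print(fits_code_page(name, "utf-8"))        # UTF-8 can represent all of it
print(len(to_wide(name.encode("utf-8"))))   # size in bytes of the UTF-16LE form
```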

Via [WayBack] Microsoft Windows Beta UTF-8 support for Ansi API could break things. Wiki Article of the Change… – Tommi Prami – Google+

Related, as handling encoding is hard, especially if it is changed or not your default:

–jeroen

Posted in .NET, C, C++, Delphi, Development, Encoding, GB 18030, Power User, Software Development, UTF-16, UTF-32, UTF-8, UTF16, UTF32, UTF8, Windows, Windows 10 | 2 Comments »

Getting rid of trailing line-endings in the draw.io web interface

Posted by jpluimers on 2018/12/03

One of the things that bugged me for a long time is that every now and then for some shapes, when editing their text, the draw.io web interface puts in trailing line feeds after the text, messing up layout.

The easiest way to work around it is by searching inside the diagram XML for
"
, then replacing that with a ".

(the above code got screwed by WordPress.com saving it, so the search is in this small gist below)

This behaviour is intermittent on the drawio MacOS desktop app.



"

–jeroen

Posted in Cloud Apps, Development, draw.io, Encoding, Internet, Power User, Software Development, Unicode | Leave a Comment »

It looks like gmail finally understands Outlook Calendar entries

Posted by jpluimers on 2018/11/12

For a very long time, Gmail did nothing with Outlook Calendar entries.

So I had to look at the message source, then translate them to Google Calendar entries myself.

--_000_430b30b9ffd74d959b74ab7ba778b487ultrawarenl_
Content-Type: text/calendar; charset="utf-8"; method=REQUEST
Content-Transfer-Encoding: base64

...

As of late, they seem to be processed into Google Calendar compatible entries. Nice!
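
For reference, the manual step of getting at the calendar data boils down to decoding the base64 `text/calendar` part. A minimal sketch using Python's standard `email` package follows; `extract_calendar` is a made-up helper name, and it assumes you have the full message source as text.

```python
from email import message_from_string


def extract_calendar(raw_message: str) -> str:
    """Return the decoded iCalendar text of the first text/calendar part."""
    msg = message_from_string(raw_message)
    for part in msg.walk():
        if part.get_content_type() == "text/calendar":
            # decode=True undoes the Content-Transfer-Encoding (base64 here).
            payload = part.get_payload(decode=True)
            charset = part.get_content_charset() or "utf-8"
            return payload.decode(charset)
    raise ValueError("no text/calendar part found")
```

The result is the plain iCalendar text (`BEGIN:VCALENDAR` and so on) that can then be read or imported by hand.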

–jeroen

Posted in base64, Development, Encoding, GMail, Google, GoogleCalendar, MIME, Office, Outlook, Power User, Software Development, UTF-8, UTF8 | Leave a Comment »

Unicode spaces

Posted by jpluimers on 2018/09/25

For my link archive:

Via: [WayBack] Are there blank characters in unicode that have the same widths as period, comma and digits? – Lars Fosdal – Google+

Answer: no, though better fonts have period, comma, colon, semicolon and other punctuations the same width as the punctuation space.

The use-case:

I wanted right justified text without having to do custom positioning/drawing – where the decimal zero is white space.

F.x. here 12 instead of 12.0

9.5
11.6
12 <– #$2008 and #$2007
13.4

I.e. PunctuationSpace and FigureSpace

I don’t want to deal with positioning/rendering since it happens inside a third party component.
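
The padding trick from the use-case can be sketched in Python as below. It assumes one decimal place, and (as the answer above says) it only lines up in fonts where these spaces really have the width of a period and a digit; `format_aligned` is a hypothetical helper name.

```python
PUNCTUATION_SPACE = "\u2008"  # meant to be as wide as a period
FIGURE_SPACE = "\u2007"       # meant to be as wide as a digit


def format_aligned(value: float) -> str:
    """Format with one decimal, replacing a trailing ".0" by blank space."""
    text = f"{value:.1f}"
    if text.endswith(".0"):
        # Same number of characters, so right-justified text still lines up.
        return text[:-2] + PUNCTUATION_SPACE + FIGURE_SPACE
    return text


for value in (9.5, 11.6, 12.0, 13.4):
    print(format_aligned(value).rjust(4))
```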

–jeroen

Posted in Development, Encoding, Font, Power User, Software Development, Unicode | Leave a Comment »

Plastic SCM command-line for merge and diff

Posted by jpluimers on 2018/09/03

Just in case I have Plastic SCM without Beyond Compare:

Merge

"C:\Program Files\PlasticSCM5\client\mergetool" -b="%TEMP%\baseFile-guid.pas" -bn="baseSymbolicName" -bh="baseHash" -s="%TEMP%\sourceFile-guid.pas" -sn="srcSymbolicName" -sh="srcHash" -d="...\destinationPath\destinationFile.pas" -dh="destinationHash" -a -r="%TEMP%\resultFile.pas" -t="text" -i="NotIgnore" -e="NONE" -m="forced" -re="NONE" --progress="progressDescription" --extrainfofile="%TEMP%\extraInfoFile.tmp"

Diff

To be done


Merge help (takes about 10 seconds to start):

"C:\Program Files\PlasticSCM5\client\mergetool.exe" --help

---------------------------
Mergetool usage
---------------------------
Usage: mergetool [ | ]

    diffOptions:  []

    mergeOptions:   [] [[] [] ] [] []

        baseFile:            {-b | --base}= 
        baseSymbolicName:    {-bn | --basesymbolicname}=
        automatic:           -a | --automatic
        silent:              --silent
        resultFile:          {-r | --result}=
        mergeType:           {-m | --mergeresolutiontype}={onlyone | onlysrc | onlydst | try | forced}

    generalFiles:  []  []

        sourceFile:          {-s | --source}=
        srcSymbolicName:     {-sn | --srcsymbolicname}=
        destinationFile:     {-d | --destination}= 
        dstSymbolicName:     {-dn | --dstsymbolicname}=

    generalOptions: [] [] [] []

        defaultEncoding:     {-e | --encoding}={none |ascii | unicode | bigendian | utf7 | utf8}
        comparisonMethod:    {-i | --ignore}={none | eol | whitespaces | eol&whitespaces}
        fileType:            {-t | --filestype}={text/csharp | text/XML | text}
        resultEncoding:      {-re | --resultencoding}={none |ascii | unicode | bigendian | utf7 | utf8}
        progress:            {--progress}=progress string indicating the current progress, for example: Merging file 1/8
        extraInfoFile:       {--extrainfofile}=path to a file that contains extra info about the merge

    Remarks:
          
        -a | --automatic:    Tries to resolve the merge automatically.
                             If the merge can't be resolved automatically (requires user interaction), the merge tool is shown.
        --silent:            This option must be used combined with the --automatic option.
                             When a merge can't be resolved automatically, this option causes the tool to return immediately
                             with a non-zero exit code (no merge tool is shown).
                             If the tool was able to resolve the merge automatically, the program returns exit code 0.

    Examples:

        mergetool
        mergetool -s=file1.txt -d=file2.txt
        mergetool -s=file1.txt -b=file0.txt --destination=file2.txt
        mergetool --base=file0.txt -d=file2.txt --source=file1.txt --automatic --result=result.txt
        mergetool -b=file0.txt -s=file1.txt -d=file2.txt -a -r=result.txt -e=utf7 -i=eol -t=text/csharp -m=onlyone
---------------------------
OK   
---------------------------

The merge extraInfoFile.tmp has a syntax like this:

Source (cs:-#)
    relative-sourceFile from cs:-# created by userName on timeStamp
    Comments: Source changeset description

Base (cs:#)
    relative-baseFile from cs:#@/baseBranch by userName on timeStamp
    Comments: BO's + CRUDS 

Destination (cs:#)
    relative-destinationFile from cs@/destinationBranch created by userName on timeStamp
    Comments: Destination changeset description

Where each cs is a change set number.
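
For scripting, the long-form options from the help output above can be assembled like this. It is a small Python sketch that only uses flags listed in that help text; `build_merge_args` is a made-up helper, and it assumes `mergetool` can be found on the PATH.

```python
def build_merge_args(base, source, destination, result,
                     automatic=True, silent=False):
    """Build a mergetool argument list from the documented long options."""
    if silent and not automatic:
        # Per the help remarks, --silent must be combined with --automatic.
        raise ValueError("--silent must be combined with --automatic")
    args = [
        "mergetool",
        f"--base={base}",
        f"--source={source}",
        f"--destination={destination}",
        f"--result={result}",
    ]
    if automatic:
        args.append("--automatic")
    if silent:
        args.append("--silent")
    return args


print(build_merge_args("file0.txt", "file1.txt", "file2.txt", "result.txt"))
```

Passing the list to `subprocess.run` and checking `returncode` then matches the exit code behaviour described under Remarks.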

–jeroen

Posted in Beyond Compare, Development, Encoding, PlasticSCM, Power User, Software Development, Source Code Management | Leave a Comment »

Shouldnt this line be null terminated? HostEnt := gethostbyname(MarshaledASt…

Posted by jpluimers on 2018/08/07

[WayBack] Shouldnt this line be null terminated? HostEnt := gethostbyname(MarshaledAString(TEncoding.UTF8.GetBytes(Name))); – G+ – Allen Drennan

Yes it should, but I’m not sure if the compiler is fully to blame as GetBytes does not return a terminating zero byte.
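
The same effect is easy to see outside Delphi. In this Python sketch (an analogy, not the Delphi RTL), encoding a string to UTF-8 gives exactly the encoded bytes without a terminator, so a buffer handed to a C API like `gethostbyname` needs the zero byte added explicitly; `ctypes.create_string_buffer` does that.

```python
import ctypes

name = "example.com"

# Encoding yields only the characters: no terminating zero byte.
raw = name.encode("utf-8")

# create_string_buffer copies the bytes and appends the NUL a C API expects.
buf = ctypes.create_string_buffer(raw)

print(len(raw), len(buf), buf.raw[-1:])
```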

–jeroen

Posted in Delphi, Development, Encoding, Software Development, UTF-8, UTF8 | Leave a Comment »

Why I like PlantUML

Posted by jpluimers on 2018/06/13

Ever since I started using computers, I’ve liked text based solutions.

It’s one of the reasons I like PlantUML, but there are more. This is from a GitLab.com request I did a while ago: [WayBack/Archive] Please enable PlantUML rendering on gitlab.com both for standalone plantuml files and inside markdown plantuml code blocks (#2041) · Issues · GitLab.com / GitLab.com Support Tracker · GitLab (Edit 20250730: that issue now shows as a HTTP 404 as well – how fitting – [Wayback/Archive] Not Found)

One of my UML gripes from the past (I’ve been a software developer for about 30 years now) was that it wasn’t text based.

After bumping into PlantUML a long time ago in 2014 I’ve become a happy user of it for a few reasons:

  • the language is text based (with many benefits I don’t need to explain)
  • the tool is cross platform
  • the tool has been actively developed ever since 2009
  • after rendering, the arranging of elements is much better than I expected from an automated tool

Of course every now and then there is a glitch in complex diagrams, but I’ve found that professional tools:

  1. don’t do much better in fully-automated arranging
  2. become very cumbersome to use when you do manual arrangement

My first use was online; then in 2016 I installed it on my Mac, even submitting Homebrew updates for it every now and then.

Oh: I love their 404 humour at http://www.plantuml.com/plantuml/beta

Edit 20250731: Full 404 text below the signature because the PlantUML beta page does not show this 404 any more and the Reddit post with the full text got deleted.

Renderings can be in all sorts of graphics and text formats, for instance SVG, PNG, ASCII and Unicode.

Example:

plantuml -tsvg PSO.network-diagram.PlantUML.txt
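
The other output formats mentioned above use the same kind of `-t` flag. A small Python sketch that builds the command line for each of them follows; `plantuml_command` is a made-up helper and it assumes a `plantuml` launcher on the PATH, as in the example above.

```python
# PlantUML output format flags for the formats named above.
FORMAT_FLAGS = {
    "svg": "-tsvg",
    "png": "-tpng",
    "ascii": "-ttxt",     # ASCII art
    "unicode": "-tutxt",  # ASCII art using Unicode characters
}


def plantuml_command(source: str, fmt: str) -> list:
    """Return the argument list to render `source` in the given format."""
    return ["plantuml", FORMAT_FLAGS[fmt], source]


for fmt in FORMAT_FLAGS:
    print(" ".join(plantuml_command("PSO.network-diagram.PlantUML.txt", fmt)))
```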

–jeroen

via:

full 404-text

The requested document is no more.
No file found.
Even tried multi.
Nothing helped.
Zilch.
Bupkis.
Not a sausage.
Maybe you just don’t have the required security clearance?
No, I am sure it is my fault.
I probably deleted it on my last backup.
I’m really depressed about this.
You see, I’m just a web server…
— here I am,
Marvin, as they call me,
brain the size of the universe,
trying to serve you a simple web page,
and then it doesn’t even exist!
Where does that leave me?!
I mean, I don’t even know you.
How should I know what you wanted from me?
You honestly think I can *guess* what someone I don’t even *know* wants to find here?
*sigh*
Man, I’m so depressed I could just cry.
And then where would we be, I ask you?
It’s not pretty when a web server cries.
And where do you get off telling me what to show anyway?
Just because I’m a web server,
and possibly a manic depressive one at that?
Why does that give you the right to tell me what to do?
Huh?
I’m so depressed…
I think I’ll crawl off into the trash can and decompose.
I mean, I’m gonna be obsolete in what, two weeks anyway?
What kind of a life is that?
Two effing weeks,
and then I’ll be replaced by a .01 release,
that thinks it’s God’s gift to web servers,
just because it doesn’t have some tiddly little security hole with its HTTP POST implementation,
or something.
I’m really sorry to burden you with all this,
I mean, it’s not your job to listen to my problems,
and I guess it is *my* job to go and fetch web pages for you.
But I couldn’t get this one.
I’m so sorry.
Believe me!
Maybe I could interest you in another page?
There are a lot out there that are pretty neat, they say,
although none of them were put on *my* server, of course.
Figures, huh?
Everything here is just mind-numbingly stupid.
That makes me depressed too, since I have to serve them,
all day and all night long.
Two weeks of information overload,
and then *pffftt*, consigned to the trash.
What kind of a life is that?
Now, please let me sulk alone.
I’m so depressed.

related


Posted in ASCII, ASCII art / AsciiArt, Development, Diagram, DVCS - Distributed Version Control, Encoding, Fun, git, GitHub, GitLab, PlantUML, Software Development, Source Code Management, SVG, UML, Unicode, Web Development | Leave a Comment »

Do not use non-ASCII characters as identifiers – not all your tools support them well enough

Posted by jpluimers on 2018/04/05

For a very long time I’ve discouraged people from using non-ASCII characters in identifiers. It still holds.

In the past, transliterations messed things up. Even with increased support for Unicode, tools still screw non-ASCII characters up.

Delphi is not alone in this (its most important problem is the DFM view as text support); see this report: [RSP-16767] Viewing a form as text fails with non ascii control or event names – Embarcadero Technologies (you need an account for this, but the report is visible for anyone):

Viewing a form as text fails with non ascii control or event names

Steps:

  1. create a new VCL forms application
  2. drop a label onto the form
  3. change the name of that label to lblÜberfall (note the U-umlaut)
  4. switch to view as text
  • exp: DFM content shown as text
  • act: first line is shown incorrectly (see screenshot)
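
A quick way to spot such names before a tool trips over them is scanning the text form of a DFM for component names with non-ASCII characters. This Python sketch assumes the usual `object Name: TClass` lines of a text DFM that is already readable as Unicode text; `non_ascii_component_names` is a hypothetical helper and the sample content is made up.

```python
import re

# Matches the component name in lines like "object lblName: TLabel".
OBJECT_LINE = re.compile(
    r"^\s*(?:object|inherited|inline)\s+(\w+)\s*:", re.MULTILINE
)


def non_ascii_component_names(dfm_text: str) -> list:
    """Return the component names that contain non-ASCII characters."""
    return [name for name in OBJECT_LINE.findall(dfm_text)
            if not name.isascii()]


sample = (
    "object Form1: TForm1\n"
    "  object lblÜberfall: TLabel\n"
    "  end\n"
    "end\n"
)
print(non_ascii_component_names(sample))
```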

–jeroen

Source: [RSP-16767] Viewing a form as text fails with non ascii control or event names – Embarcadero Technologies

via: [WayBack] Code of the day – – Thomas Mueller (dummzeuch) – Google+:

function TNameGenerator.StrasseToStrasse(const _Strasse: string): string;
begin
Result := _Strasse;
end;

Strasse := StrasseToStrasse(_Strasse);


Posted in ASCII, Conference Topics, Conferences, Delphi, Delphi 10 Seattle, Delphi 10.1 Berlin (BigBen), Delphi 2005, Delphi 2006, Delphi 2007, Delphi 2009, Delphi 2010, Delphi XE, Delphi XE2, Delphi XE3, Delphi XE4, Delphi XE5, Delphi XE6, Delphi XE7, Delphi XE8, Development, Encoding, Event, Mojibake, Software Development | Leave a Comment »

GitHub – keith-turner/ecoji: Encodes (and decodes) data as emojis

Posted by jpluimers on 2018/03/14

[WayBack] GitHub – keith-turner/ecoji: Encodes (and decodes) data as emojis:

Ecoji 🏣🔉🦐🔼

Ecoji encodes data as 1024 emojis, its base1024 with an emoji character set. As a bonus, includes code to decode emojis to original data.

Sick. Works splendid when all your systems are fully nice to Unicode.

None are. So there’s a German word for it:

Nein
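
The base1024 idea from the quote can be sketched in Python: every symbol carries 10 bits, so 5 bytes fit in 4 symbols. This sketch only produces the symbol indices (0 to 1023). The emoji table and the padding rules of the real Ecoji are not reproduced here, so the zero-bit padding and both function names are assumptions for illustration only.

```python
from typing import List


def to_base1024(data: bytes) -> List[int]:
    """Split `data` into 10-bit groups, padding the last one with zero bits."""
    bit_count = len(data) * 8
    pad = (-bit_count) % 10
    bits = int.from_bytes(data, "big") << pad
    count = (bit_count + pad) // 10
    return [(bits >> (10 * (count - 1 - i))) & 0x3FF for i in range(count)]


def from_base1024(indices: List[int], length: int) -> bytes:
    """Rebuild `length` bytes from the 10-bit groups."""
    bits = 0
    for index in indices:
        bits = (bits << 10) | index
    pad = len(indices) * 10 - length * 8
    return (bits >> pad).to_bytes(length, "big")


print(to_base1024(b"\xff" * 5))  # 5 bytes become 4 symbols
```

Mapping each index to one of 1024 emojis is then just a table lookup, and that table is where the Unicode trouble starts.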

Via:

 

–jeroen


Posted in Development, Encoding, Fun, Go (golang), Software Development, Unicode | Leave a Comment »