A while ago, I had to fix some stuff in an application that would write – using a binary mechanism – UTF-8 and UTF-16 strings (part of it XML in various flavours) to the same byte stream without converting between the two encodings.
Some links that helped me investigate what was wrong, choose what encoding to use for storage and fix it:
And even though you didn’t ask, you can use LOCALE_IDEFAULTCODEPAGE to get the OEM code page for a locale.
Bonus gotcha: There are a number of locales that are Unicode-only. If you ask the GetLocaleInfo function and ask for their ANSI and OEM code pages, the answer is “Um, I don’t have one.” (You get zero back.)
A few years back I put all my conferences material in a GitHub repository https://github.com/jpluimers/Conferences/. There were a lot directories and files so I didn’t pay much attention to the initial check-in list. The files had been part of copy.com syncing between Windows and Mac machines.
I choose a Mac because it is closer to a Linux machine than Widows so I expected no encoding trouble (as git has a Linux origin: it “was created by Linus Torvalds in 2005 for development of the Linux kernel“).
Boy I was wrong:
Recently I cloned the repository in a different place and found out a few strange things:
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters
The trick comes down to enabling the PreferExternalManifest registry setting and then create a manual manifest for the application that forces the application to use “bitmap scaling” by basically telling it does not support “XP style DPI scaling”.
You name manifest file named after the exe and stored it in the same directory as the exe.
After that, you also have to rename the exe to a temporary name and then back in order to refresh the cache.
A quote from the trick:
In Windows Vista, you had two possible ways of scaling applications: with the first one (the default) applications were instructed to scale their objects using the scaling factor imposed by the operating system. The results, depending on the quality of the application and the Windows version, could vary a lot. Some scaled correctly, some other look very similar to what we are seeing in SSMS, with some weird-looking GUIs. In Vista, this option was called “XP style DPI scaling”.
The second option, which you could activate by unchecking the “XP style” checkbox, involved drawing the graphical components of the GUI to an off-screen buffer and then drawing them back to the display, scaling the whole thing up to the screen resolution. This option is called “bitmap scaling” and the result is a perfectly laid out GUI.
In order to enable this option in Windows 10, you need to merge this key to your registry:
Windows Registry Editor Version 5.00
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\SideBySide]
"PreferExternalManifest"=dword:00000001
Then, the application has to be decorated with a manifest file that instructs Windows to disable DPI scaling and enable bitmap scaling, by declaring the application as DPI unaware. The manifest file has to be saved in the same folder as the executable (ssms.exe) and its name must be ssms.exe.manifest. In this case, for SSMS 2014, the file path is “C:\Program Files (x86)\Microsoft SQL Server\120\Tools\Binn\ManagementStudio\Ssms.exe.manifest”.
Paste this text inside the manifest file and save it in UTF8 encoding:
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters
This “Vista style” bitmap scaling is very similar to what Apple is doing on his Retina displays, except that Apple uses a different font rendering algorithm that looks better when scaled up. If you use this technique in Windows, ClearType rendering is performed on the off-screen buffer before upscaling, so the final result might look a bit blurry.The amount of blurriness you will see depends on the scale factor you set in the control panel or in the settings app in Windows 10. Needless to say that exact pixel scaling looks better, so prefer 200% over 225% or 250% scale factors, because there is no such thing as “half pixel”.
On basics, sorting, comparison, decomposition, composition, width, whitespace, encoding, emoji, interesting code planes and dark corners. Lots of dark corners.
The examples are in Python, but hold for almost any programming language
A while ago, I needed to get the various date, time and week values from WMIC to environment variables with pre-padded zeros. I thought: easy job, just write a batch file.
Tough luck: I couldn’t get the values to expand properly. Which in the end was caused by WMIC emitting UTF-16 and the command-interpreter not expecting double-byte character sets which messed up my original batch file.
Actually, because of a a common “beautification” of many Office suites (Microsoft and Open alike), the single quote was a special one: a Unicode Character ‘RIGHT SINGLE QUOTATION MARK’ (U+2019) which in UTF-8 is encoded as 0xE2 0x80 0x99.