The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests

  • My badges

  • Twitter Updates

  • My Flickr Stream

  • Pages

  • All categories

  • Enter your email address to subscribe to this blog and receive notifications of new posts by email.

    Join 2,482 other followers

Archive for the ‘Encoding’ Category

PowerShell error in a script but not on the console: The string is missing the terminator: “.

Posted by jpluimers on 2021/09/29

The below one will fail in a script, both both work from the PowerShell prompt:

Success

Get-NetFirewallRule -DisplayGroup "File and Printer Sharing" | ForEach-Object { Write-Host $_.DisplayName ; Get-NetFirewallAddressFilter -AssociatedNetFirewallRule $_ }

Failure

Get-NetFirewallRule –DisplayGroup "File and Printer Sharing" | ForEach-Object { Write-Host $_.DisplayName ; Get-NetFirewallAddressFilter -AssociatedNetFirewallRule $_ }

The error you get this this:

At C:\bin\Show-File-and-Printer-Sharing-firewall-rules.ps1:5 char:52
+ ... -TCP-NoScope" | ForEach-Object { Write-Host $_.DisplayName ; Get-NetF ...
+                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The string is missing the terminator: ".
    + CategoryInfo          : ParserError: (:) [], ParentContainsErrorRecordException
    + FullyQualifiedErrorId : TerminatorExpectedAtEndOfString

Via [WayBack] script file ‘The string is missing the terminator: “.’ – Google Search, I quickly found these that stood out:

Cause and solution

Before DisplayGroup, the first line has a minus sign and the second an en-dash. You can see this via [WayBack] What Unicode character is this ?.

Apparently, when using Unicode on the console, it does not matter if you have a minus sign (-), en-dash (–), em-dash (—) or horizontal bar (―) as dash character. You can see this in [WayBack] tokenizer.cs at function [WayBack] NextToken and [WayBack] CharTraits.cs at function [WayBack] IsChar).

When saving to a non-Unicode file, it does matter, even though it does not display as garbage in the error message.

Similarly, PowerShell has support for these special characters:

    internal static class SpecialChars
    {
        // Uncommon whitespace
        internal const char NoBreakSpace = (char)0x00a0;
        internal const char NextLine = (char)0x0085;

        // Special dashes
        internal const char EnDash = (char)0x2013;
        internal const char EmDash = (char)0x2014;
        internal const char HorizontalBar = (char)0x2015;

        // Special quotes
        internal const char QuoteSingleLeft = (char)0x2018; // left single quotation mark
        internal const char QuoteSingleRight = (char)0x2019; // right single quotation mark
        internal const char QuoteSingleBase = (char)0x201a; // single low-9 quotation mark
        internal const char QuoteReversed = (char)0x201b; // single high-reversed-9 quotation mark
        internal const char QuoteDoubleLeft = (char)0x201c; // left double quotation mark
        internal const char QuoteDoubleRight = (char)0x201d; // right double quotation mark
        internal const char QuoteLowDoubleLeft = (char)0x201E; // low double left quote used in german.
    }

The easiest solution is to use minus signs everywhere.

Another solution is to save files as Unicode UTF-8 encoding (preferred) or UTF-16 encoding (which I dislike).

–jeroen

Posted in .NET, CommandLine, Development, Encoding, PowerShell, PowerShell, Scripting, Software Development, Unicode, UTF-16, UTF-8, UTF16, UTF8 | Leave a Comment »

Unicode superscript and subscript alphabetic letters

Posted by jpluimers on 2021/08/04

Not all letters have superscript or subscript counterparts. The counterparts are from different ranges, so might not look nice when next to each other.

I think 20th using Unicode lowercase superscript looks ugly 20ᵗʰ. With uppercase superscript it is somewhat OK: 20ᵀᴴ.

The list is from [WayBack] javascript – How to find the unicode of the subscript alphabet? – Stack Overflow:

Take a look at the wikipedia article Unicode subscripts and superscripts. It looks like these are spread out across different ranges, and not all characters are available.

Consolidated for cut-and-pasting purposes, the Unicode standard defines complete sub- and super-scripts for numbers and common mathematical symbols ( ⁰ ¹ ² ³ ⁴ ⁵ ⁶ ⁷ ⁸ ⁹ ⁺ ⁻ ⁼ ⁽ ⁾ ₀ ₁ ₂ ₃ ₄ ₅ ₆ ₇ ₈ ₉ ₊ ₋ ₌ ₍ ₎ ), a full superscript Latin lowercase alphabet except q ( ᵃ ᵇ ᶜ ᵈ ᵉ ᶠ ᵍ ʰ ⁱ ʲ ᵏ ˡ ᵐ ⁿ ᵒ ᵖ ʳ ˢ ᵗ ᵘ ᵛ ʷ ˣ ʸ ᶻ ), a limited uppercase Latin alphabet ( ᴬ ᴮ ᴰ ᴱ ᴳ ᴴ ᴵ ᴶ ᴷ ᴸ ᴹ ᴺ ᴼ ᴾ ᴿ ᵀ ᵁ ⱽ ᵂ ), a few subscripted lowercase letters ( ₐ ₑ ₕ ᵢ ⱼ ₖ ₗ ₘ ₙ ₒ ₚ ᵣ ₛ ₜ ᵤ ᵥ ₓ ), and some Greek letters ( ᵅ ᵝ ᵞ ᵟ ᵋ ᶿ ᶥ ᶲ ᵠ ᵡ ᵦ ᵧ ᵨ ᵩ ᵪ ). Note that since these glyphs come from different ranges, they may not be of the same size and position, depending on the typeface.

After a nice chat with my nephew EWD, I did some research and found the above via

–jeroen

Posted in Development, Encoding, internatiolanization (i18n) and localization (l10), Power User, Software Development, Unicode | Leave a Comment »

Fixing hg.exe “ImportError: DLL load failed: %1 is not a valid Win32 application.”

Posted by jpluimers on 2021/07/21

If you get the below error when running hg.exe, then you are mixing a 64-bit Mercurial with 32-bit dependencies:

C:\>hg --version
Traceback (most recent call last):
  File "hg", line 43, in 
  File "hgdemandimport\demandimportpy2.pyc", line 150, in __getattr__
  File "hgdemandimport\demandimportpy2.pyc", line 94, in _load
  File "hgdemandimport\demandimportpy2.pyc", line 43, in _hgextimport
  File "mercurial\dispatch.pyc", line 22, in 
  File "hgdemandimport\demandimportpy2.pyc", line 248, in _demandimport
  File "hgdemandimport\demandimportpy2.pyc", line 43, in _hgextimport
  File "mercurial\i18n.pyc", line 28, in 
  File "hgdemandimport\demandimportpy2.pyc", line 150, in __getattr__
  File "hgdemandimport\demandimportpy2.pyc", line 94, in _load
  File "hgdemandimport\demandimportpy2.pyc", line 43, in _hgextimport
  File "mercurial\encoding.pyc", line 24, in 
  File "mercurial\policy.pyc", line 101, in importmod
  File "mercurial\policy.pyc", line 63, in _importfrom
  File "hgdemandimport\demandimportpy2.pyc", line 164, in __doc__
  File "hgdemandimport\demandimportpy2.pyc", line 94, in _load
  File "hgdemandimport\demandimportpy2.pyc", line 43, in _hgextimport
  File "mercurial\cext\parsers.pyc", line 12, in 
  File "mercurial\cext\parsers.pyc", line 10, in __load
ImportError: DLL load failed: %1 is geen geldige Win32-toepassing.

The equivalent English error is [WayBack] ImportError: DLL load failed: %1 is not a valid Win32 application.” – Google Search.

The problem is the bitness of hg.exe: [WayBack] python – Error while installing Mercurial on IIS7 64bit: “DLL Load Failed: %1 is not a valid Win32 application” – Stack Overflow

You can quickly figure out the bitness of hg.exe:

C:\>where hg
C:\Program Files\Mercurial\hg.exe

C:\>sigcheck "C:\Program Files\Mercurial\hg.exe"

Sigcheck v2.72 - File version and signature viewer
Copyright (C) 2004-2019 Mark Russinovich
Sysinternals - www.sysinternals.com

c:\program files\mercurial\hg.exe:
        Verified:       Unsigned
        Link date:      17:49 9-7-2019
        Publisher:      n/a
        Company:        n/a
        Description:    Fast scalable distributed SCM (revision control, version control) system
        Product:        mercurial
        Prod version:   5.0.2
        File version:   5.0.2
        MachineType:    64-bit

Forcing x86 of Mercurial

Since I use chocolatey for most my installs, I forced x86 the Chocolatey way:

So after these:

choco uninstall --yes hg
choco install --yes --force86 hg

I got this signature check:

C:\>sigcheck "C:\Program Files (x86)\Mercurial\hg.exe"

Sigcheck v2.72 - File version and signature viewer
Copyright (C) 2004-2019 Mark Russinovich
Sysinternals - www.sysinternals.com

c:\program files (x86)\mercurial\hg.exe:
        Verified:       Unsigned
        Link date:      17:50 9-7-2019
        Publisher:      n/a
        Company:        n/a
        Description:    Fast scalable distributed SCM (revision control, version control) system
        Product:        mercurial
        Prod version:   5.0.2
        File version:   5.0.2
        MachineType:    32-bit

–jeroen

Posted in Development, DVCS - Distributed Version Control, Encoding, Mercurial/Hg, Python, Scripting, Software Development, Source Code Management | Leave a Comment »

SQL server: getting database names and IDs

Posted by jpluimers on 2021/06/29

A few statements go get database names and IDs based on these functions or system tables:

Part of it has the assumption that a master database always exists.

-- gets current database name
select db_name() as name
;
name
--------------------------------------------------------------------------------------------------------------------------------
acc

(1 row affected)
-- gets current database ID
select db_id() as dbid
;
dbid
------
5

(1 row affected)
-- gets all database IDs and names
select dbid,name from sys.sysdatabases
;
dbid   name
------ --------------------------------------------------------------------------------------------------------------------------------
1      master
5      acc

(2 rows affected)
-- gets current database name by ID
select db_name(db_id()) as name
;
name
--------------------------------------------------------------------------------------------------------------------------------
acc

(1 row affected)
-- gets case corrected database name for sys.sysdatabases.name having a case insensitive collation sequence
select dbid,name from sys.sysdatabases 
where name='Master'
;
dbid   name
------ --------------------------------------------------------------------------------------------------------------------------------
1      master

(1 row affected)
-- gets case corrected database name for sys.sysdatabases.name having a case sensitive collation sequence
select dbid,name from sys.sysdatabases 
where name = 'Master' collate Latin1_General_100_CI_AI
;
dbid   name
------ --------------------------------------------------------------------------------------------------------------------------------
1      master

(1 row affected)

Note that:

  • even though by default the SQL server collation sequence is case insensitive, it can make sense to do a case insensitive search, for example by using the upper function, specifying a collation, or casting to binary. I like upper the most, because  – though less efficient – it is a more neutral SQL idiom.
  • the most neutral case insensitive collation seems to be Latin1_General_100_CI_AI

Related:

  • [WayBack] SQL server ignore case in a where expression – Stack Overflow answered by Solomon Rutzky, summarised as:
    • Do not use upper as upper with lower does not always round-trip.
    • Do not use varbinary as it is not case insensitive.
    • Neither the = or like operators are case sensitive by default: both need a collate clause.
    • Find the collation of the column(s) involved; if it contains _CI, then you are done (it is already case insensitive); if it contains _CS, then replace that with _CI (case insensitive) and add that in a collate clause.
    • Collations are per predicate, so not per query, per table, per column nor per database. This means you have to specify them if you want to use a different one than the default.
  • [WayBack] What is Collation in Databases? | Database.Guide
    Latin1_General_100_CI_AI Latin1-General-100, case-insensitive, accent-insensitive, kanatype-insensitive, width-insensitive
  • [WayBack] Collation Info: Information about Collations and Encodings for SQL Server
  • [WayBack] SQL Instance Collation – Language Neutral Required:

    I recommend using Latin1_General_100_CI_AI. I recommend this because:

    1. If Latin1_General_CI_AI is supported, then there’s almost no chance thatLatin1_General_100_CI_AI (which is a far better choice) isn’t also supported. The version 100 collation has about 15,400 more sort weight definitions, plus 438 more uppercase/lowercase mappings. Not having those sort weights means that 15,400 more characters in the non-100 version equate to space, an empty string, and to each other. Not having those case mappings means that 438 more characters in the non-100 version return the character passed in (i.e. no change) for the UPPER() and LOWER() functions. There is no reason at all to want Latin1_General_CI_AI instead of Latin1_General_100_CI_AI. There might be a need if code was put into place to work around these deficiencies, and that code would behave incorrectly under the newer, better version of that collation. However, it’s highly unlikely that code was put into place to account for this, and extremely unlikely that if such code did exist, that it would error or doing things incorrectly due to the newer collation.
  • [WayBack] Differences Between the Various Binary Collations (Cultures, Versions, and BIN vs BIN2) – Sql Quantum Leap
  • [WayBack] How to do a case sensitive search in WHERE clause (I’m using SQL Server)? – Stack Overflow answered by Jonas Lincoln:

    By using collation or casting to binary, like this:

    SELECT *
    FROM Users
    WHERE   
        Username = @Username COLLATE SQL_Latin1_General_CP1_CS_AS
        AND Password = @Password COLLATE SQL_Latin1_General_CP1_CS_AS
        AND Username = @Username 
        AND Password = @Password 

    The duplication of username/password exists to give the engine the possibility of using indexes. The collation above is a Case Sensitive collation, change to the one you need if necessary.

    The second, casting to binary, could be done like this:

    SELECT *
    FROM Users
    WHERE   
        CAST(Username as varbinary(100)) = CAST(@Username as varbinary))
        AND CAST(Password as varbinary(100)) = CAST(@Password as varbinary(100))
        AND Username = @Username 
        AND Password = @Password 
  • [WayBack] sql – How to get Database name of sqlserver – Stack Overflow

–jeroen

Posted in Database Development, Development, Encoding, internatiolanization (i18n) and localization (l10), SQL Server | Leave a Comment »

“No mapping for the Unicode character exists in the target multi-byte code page”

Posted by jpluimers on 2021/06/24

Usually when I see this error [Wayback] “No mapping for the Unicode character exists in the target multi-byte code page” – Google Search, it is in legacy code that uses string buffers where decoding or decompressing data into.

This is almost always wrong no matter what kind of data you use, as it will depend in your string encoding.

I have seen it happen especially in these cases:

  • base64 decoding from string to string (solution: decode from a string stream into a binary stream, then post-process from there)
  • zip or zlib decompress from binary stream to string stream, then reading the string stream (solution: decompress from binary stream to binary stream, then post-process from there)

Most cases I encountered were in Delphi and C code, but surprisingly I also bumped into C# exhibiting this behaviour.

I’m not alone, just see these examples from the above Google search:

–jeroen

Posted in .NET, base64, C, C#, C++, Delphi, Development, Encoding, Software Development, Unicode | Leave a Comment »

 
%d bloggers like this: