Michael Kaplan Obituary – Berkowitz-Kumin-Bookatz | Cleveland Heights OH (and a whole bunch of info in zero width Unicode stuff) « The Wiert Corner



Posted by jpluimers on 2018/01/02

I totally missed the passing of Michael Scott Kaplan some 2 years ago, so a belated R.I.P. is in order.

Obituary for Michael Kaplan: Michael Scott Kaplan, 45, passed away Wednesday, October 21, 2015, in Redmond, WA, after a brave battle with MS for 25 years. He was a lead software developer for Microsoft.

Source: [WayBack] Michael Kaplan Obituary – Berkowitz-Kumin-Bookatz | Cleveland Heights OH

Michael was the leading source on i18n, L10N, Unicode, sorting, normalisation and other things having to do with languages, representations and writing.

Besides that, he was a really nice guy whose MSDN materials I enjoyed.

Other people enjoy them too, so I’m glad his writings have been archived: [first archive.is, second archive.is, WayBack] Sorting it All Out: Archives

Here are some additional links:

https://web.archive.org/web/*/https://blogs.msdn.microsoft.com/michkap//*
[WayBack] Sorting out Internationalization with Michael Kaplan on the Hanselminutes Technology Podcast: Fresh Air for Developers: Michael Kaplan is a Developer in the Windows International group and the author of the popular ‘Sorting It Out’ blog that is dedicated to all things ‘-ization.’ That means Globalization, Internationalization, and Localization. This show is brought to you by the CYRILLIC CAPITAL LETTER A.
- Transcript: [WayBack] https://s3.amazonaws.com/hanselminutes/hanselminutes_0117.pdf
Facebook: Michael S. Kaplan
[WayBack] Michael S. Kaplan (@michkap). Blood type AB+ geek who loves i18n etc. I worked for MS & have MS & I had an iBot. Got game tho ain’t playin much anymore… Eto Akta Gamat!. Seattle, WA, USA
[WayBack] Sorting the rest all Out
[WayBack] RIP Michael J Kaplan (of Sorting It All Out blog) | The VSubhash.com Blog
[WayBack] Excellent blog about Windows and Unicode – The Old New Thing
[WayBack] Michael Kaplan leaves Microsoft

More on miloush.net:

[WayBack] miloush.net Feed Services
[Archive.is 1, Archive.is 2] http://miloush.net/
[WayBack] Keyboard Layout Info – Keyboard Layout Info
[Archive.is] Emoji List
[Archive.is] Skype Emoticons List: List of public and hidden emoticons in Skype.

I got there while researching U+200C (ZERO WIDTH NON-JOINER) and U+200D (ZERO WIDTH JOINER):

The relevant Unicode code points in that research:

[WayBack] javascript – Remove non-ascii character in string – Stack Overflow
[WayBack] Unicode and JavaScript
[WayBack] Fingerprinting with Zero-Width Characters… Kristian Köhntopp – Google+
- Fingerprinting with Zero-Width Characters (does not archive)
- Text Fingerprinting Update (does not archive)

From the G+ thread, a few nice comments:

Quork Q’Tar:
So copying and pasting into Notepad++ and viewing the text in several character encodings (or, if no special characters are needed (so almost always), converting straight to ASCII and only then copying and pasting into the target document) should uncover pretty much everything of this kind, apart from the word substitution (which is anything but new as a method)?
Jürgen Christoffel:
+Quork Q’Tar no, not copy/paste, but print it out and scan it back in with OCR. A bitmap scan is not enough; it could still contain recognizable glyphs (the Cyrillic “a” or similar).
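A minimal sketch of how lookalike glyphs such as the Cyrillic “a” could be flagged in software, sketched in Python. It assumes that the first word of a character's Unicode name is a good enough stand-in for its script, which is a rough heuristic and not a real script property lookup. The function names are made up for this illustration.

```python
import unicodedata


def scripts_used(word):
    """Approximate the scripts in a word by the first word of each letter's Unicode name."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in word
        if ch.isalpha()  # ignore digits and punctuation
    }


def mixed_script_words(text):
    """Return the whitespace-separated words that mix letters from more than one script."""
    return [word for word in text.split() if len(scripts_used(word)) > 1]


# U+0430 is CYRILLIC SMALL LETTER A, visually close to the Latin "a".
suspicious = mixed_script_words("We look the s\u0430me")
print(suspicious)
```

Text that is written entirely in one script passes through unflagged, so this only catches substitutions inside otherwise Latin words.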
Tobias Migge:
Copied the sample text into Notepad++, Plugins (Erweiterungen)->MIME Tools->Quoted Printable Encode:
- We’re=E2=80=8B not the=E2=80=8B same text, even though we look the same.
- We’re not the same=E2=80=8B text, even though we look the same.
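The byte sequence =E2=80=8B in the two lines above is the UTF-8 encoding of U+200B (ZERO WIDTH SPACE). A small Python sketch that does the reverse of the Notepad++ step: it decodes the quoted-printable text and reports where the invisible format characters sit. The sample strings use a plain ASCII apostrophe, which is an assumption, and the function name is made up for this illustration.

```python
import quopri
import unicodedata

# The two samples from the comment, with an ASCII apostrophe (assumption).
SAMPLE_A = b"We're=E2=80=8B not the=E2=80=8B same text, even though we look the same."
SAMPLE_B = b"We're not the same=E2=80=8B text, even though we look the same."


def invisible_marks(qp_bytes):
    """Decode quoted-printable UTF-8 and list (index, name) for each format character."""
    text = quopri.decodestring(qp_bytes).decode("utf-8")
    return [
        (index, unicodedata.name(ch, "U+%04X" % ord(ch)))
        for index, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"  # Cf = "Other, format", includes U+200B
    ]


marks_a = invisible_marks(SAMPLE_A)
marks_b = invisible_marks(SAMPLE_B)
print(marks_a)  # [(5, 'ZERO WIDTH SPACE'), (14, 'ZERO WIDTH SPACE')]
print(marks_b)  # [(18, 'ZERO WIDTH SPACE')]
```

The differing positions are what make the two visually identical texts distinguishable, which is the fingerprint.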
Steve S:
+Quork Q’Tar You can paste it into regular Notepad and save as ANSI instead of UTF-8. That strips it out: I tested it just now.
Jeroen Wiert Pluimers:
+Steve S though that also kills many other useful characters, depending on your particular ANSI encoding.
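A rough Python sketch of that trade-off. It models “save as ANSI” as encoding to a Windows code page and dropping whatever does not fit; that is an assumption, since Notepad itself may substitute a replacement character instead of dropping. The function name is made up for this illustration.

```python
def ansi_round_trip(text, codepage):
    """Encode to a legacy code page, dropping unmappable characters, and decode again."""
    return text.encode(codepage, errors="ignore").decode(codepage)


# Contains a Latin letter with diaeresis, a zero-width space and a Cyrillic word.
sample = "K\u00f6hntopp: same\u200b text, \u041f\u0440\u0438\u0432\u0435\u0442"

western = ansi_round_trip(sample, "cp1252")   # Western European code page
cyrillic = ansi_round_trip(sample, "cp1251")  # Cyrillic code page

print(repr(western))   # zero-width space gone, but so is the Cyrillic word
print(repr(cyrillic))  # zero-width space gone, but so is the "ö"
```

Both round trips remove U+200B, and each one loses a different set of legitimate characters.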
Jeroen Wiert Pluimers:
It should not be too hard to write a JavaScript web page that – without a round trip – strips a lot of this. It can even be run from localhost.
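The stripping logic itself is small. A minimal sketch, written in Python rather than JavaScript, with a list of code points that is an assumption and certainly not exhaustive:

```python
import re

# U+200B ZERO WIDTH SPACE, U+200C ZERO WIDTH NON-JOINER, U+200D ZERO WIDTH JOINER,
# U+2060 WORD JOINER, U+FEFF ZERO WIDTH NO-BREAK SPACE (assumed list, not exhaustive).
INVISIBLE = re.compile("[\u200b\u200c\u200d\u2060\ufeff]")


def strip_invisible(text):
    """Remove the zero-width code points listed above and leave everything else alone."""
    return INVISIBLE.sub("", text)


first = "We're\u200b not the\u200b same text"
second = "We're not the same\u200b text"
print(strip_invisible(first) == strip_invisible(second))  # True
```

Unlike the ANSI route, this keeps all other non-ASCII characters. The caveat is that U+200C and U+200D carry meaning in some scripts and in emoji sequences, so blanket removal can damage legitimate text.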
Steve S:
+Jeroen Wiert Pluimers Yes, that’s true. Really, the right answer is to feed it through a program to canonicalize the text. This includes fixing “typos”, making all of the words either American or British, and so on. Not a trivial task.
(A few years ago, I had to write a small subset of this as part of a program that de-duped email threads, so I’m a bit familiar with the issues.)
Jeroen Wiert Pluimers:
+Steve S that sounds like an interesting project to base such a thing on. Any chance of publishing that source? If so: what language?
Jürgen Christoffel:
+Jeroen Wiert Pluimers once upon a time, there was something called the “writer’s workbench” for BSD 4.x (or was it AT&T’s?) This might be / have been a good place to start. Don’t remember if it ever was available in source, though.
Quork Q’Tar:
In other words, “few years” doesn’t mean two or three here =D
Steve S:
+Jeroen Wiert Pluimers It was done for hire, so I don’t have any of the code, and wouldn’t own it if I somehow had it. But the basic idea is very simple: for my purposes, only alphanumerics mattered. For “weird” characters, what matters is filtering out the gratuitous punctuation and canonicalizing representations.
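A minimal sketch of that basic idea in Python. This is not Steve's code; it only illustrates keeping alphanumerics and canonicalizing representations, and it assumes NFKC normalization plus case folding is an acceptable canonical form. Spelling and typo normalization are left out, and the function name is made up for this illustration.

```python
import unicodedata


def canonical_key(text):
    """Reduce text to case-folded alphanumeric words separated by single spaces."""
    normalized = unicodedata.normalize("NFKC", text).casefold()
    kept = []
    for ch in normalized:
        if unicodedata.category(ch) == "Cf":
            continue  # drop invisible format characters such as U+200B
        kept.append(ch if ch.isalnum() else " ")  # punctuation becomes a separator
    return " ".join("".join(kept).split())


marked = canonical_key("We\u2019re\u200b NOT the sa\u200bme!")
plain = canonical_key("we're not the same")
print(marked == plain)  # True
```

Two texts with the same key are treated as the same for de-duplication purposes, regardless of punctuation, case, or hidden markers.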

–jeroen


This entry was posted on 2018/01/02 at 15:00 and is filed under Ansi, Development, Encoding, internatiolanization (i18n) and localization (l10), Software Development, The Old New Thing, UTF-8, UTF8, Windows Development.


The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests
