A while ago I bumped into [Wayback/Archive] Unicode weirdness – VCL – Delphi-PRAXiS [en].
It described a mojibake problem: files converted from PDF to text contained odd-looking character sequences.
The solution, replacing these sequences with correct-looking text, worked at first. It then failed because the underlying source code got “corrected”: the mojibake character sequences in it were turned into the correct Unicode text, so the replacements no longer matched.
A better solution is to figure out what series of encoding/decoding steps will give the correct text.
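
To make "a series of encoding/decoding steps" concrete, here is a minimal sketch using only the Python standard library. It assumes the text was garbled by a single wrong decode, and the codec list and the `candidate_repairs` helper are my own picks for illustration, not anything from the linked thread. It is also not how ftfy works internally, just a brute-force way to see which round trips are even possible.

```python
from itertools import permutations

# Assumption: a small, hand-picked list of codecs that often show up in mojibake.
CANDIDATE_CODECS = ["utf-8", "cp1252", "latin-1", "cp437"]


def candidate_repairs(garbled):
    """Return (encode_codec, decode_codec, text) for every single round trip
    that succeeds and actually changes the text."""
    results = []
    for wrong, right in permutations(CANDIDATE_CODECS, 2):
        try:
            # Undo the wrong decode by re-encoding, then decode as intended.
            repaired = garbled.encode(wrong).decode(right)
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue  # this pair cannot have produced the garbled text
        if repaired != garbled:
            results.append((wrong, right, repaired))
    return results


# "café" stored as UTF-8 but read as Windows-1252 shows up as "cafÃ©".
for wrong, right, text in candidate_repairs("cafÃ©"):
    print(f"encode as {wrong}, decode as {right}: {text!r}")
```

The list it prints contains the right answer (encode as `cp1252`, decode as `utf-8`) next to candidates that make things worse, so a human or a heuristic still has to pick. Doing that by hand does not scale, and it gets worse when the text went through more than one wrong step.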
This is where, again, [Wayback/Archive] Home – ftfy: fixes text for you comes up: it is still an indispensable tool.
–jeroen





