The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests


Tesseract (software): amazing command-line OCR tool

Posted by jpluimers on 2022/05/13

A Twitter post blew me away by showing Tesseract (software) – Wikipedia doing perfect OCR on an image from another tweet:

[Wayback] Harrie Baken on Twitter: “Fantastic!… “

curl -s 'https://pbs.twimg.com/media/E9T96Q9XIAcs8xJ?format=jpg&name=large …' -o - | tesseract stdin stdout | grep --color 609

It instantly solved this puzzle:

[Archive.is] Dave Royal 🎧 on Twitter: “Only people with great eyesight can find the intruder GO ON!!!! 🧐… “
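For readers who would rather script this than chain shell commands, here is a minimal Python sketch of the same pipeline: download the image, pipe it to `tesseract stdin stdout`, and keep only the lines containing the search text. The function names (`matching_lines`, `ocr_image_bytes`, `find_in_image`) are made up for illustration, and the sketch assumes the `tesseract` binary is on the PATH.

```python
import subprocess
import urllib.request


def matching_lines(text: str, needle: str) -> list[str]:
    """Return the lines of OCR output that contain needle (the grep step)."""
    return [line for line in text.splitlines() if needle in line]


def ocr_image_bytes(image: bytes) -> str:
    """Pipe raw image bytes to `tesseract stdin stdout` and return the text.

    Assumes a `tesseract` executable is installed and on the PATH.
    """
    result = subprocess.run(
        ["tesseract", "stdin", "stdout"],
        input=image,
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError if tesseract fails
    )
    return result.stdout.decode("utf-8", errors="replace")


def find_in_image(url: str, needle: str) -> list[str]:
    """Download an image (the curl step), OCR it, and filter the output."""
    with urllib.request.urlopen(url) as response:
        image = response.read()
    return matching_lines(ocr_image_bytes(image), needle)
```

Calling `find_in_image(image_url, "609")` with the image URL from the tweet would then return the OCR lines containing 609, much like the `grep` at the end of the one-liner.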

Earlier, I quoted a bit of the SikuliX documentation in RaiMan’s SikuliX: Automate what you see on a computer monitor that already mentioned Tesseract, but I overlooked it.

It is amazing, and it has been around for so long that I felt like I had been living under a rock!

Anyway, it is available on many platforms, and you can find the source at [Archive.is] tesseract-ocr/tesseract: Tesseract Open Source OCR Engine (main repository).

This package contains an OCR engine – libtesseract and a command line program – tesseract. Tesseract 4 adds a new neural net (LSTM) based OCR engine which is focused on line recognition, but also still supports the legacy Tesseract OCR engine of Tesseract 3 which works by recognizing character patterns. Compatibility with Tesseract 3 is enabled by using the Legacy OCR Engine mode (--oem 0). It also needs traineddata files which support the legacy engine, for example those from the tessdata repository.
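To make the legacy engine mode from the quote concrete, here is a small sketch that builds a `tesseract` command line with or without `--oem 0`. The helper name `tesseract_command` and its parameters are hypothetical. `--oem` and `--tessdata-dir` are existing Tesseract command-line options, but whether the legacy engine actually works depends on having traineddata files that support it, as the quote points out.

```python
from typing import Optional


def tesseract_command(
    image_path: str,
    output_base: str = "stdout",
    legacy: bool = False,
    tessdata_dir: Optional[str] = None,
) -> list[str]:
    """Build an argument list for the tesseract command-line program.

    legacy=True adds `--oem 0` (Legacy OCR Engine mode); tessdata_dir,
    if given, points tesseract at a directory with traineddata files.
    """
    cmd = ["tesseract", image_path, output_base]
    if tessdata_dir is not None:
        cmd += ["--tessdata-dir", tessdata_dir]
    if legacy:
        cmd += ["--oem", "0"]
    return cmd
```

The resulting list can be passed to `subprocess.run`. For example, `tesseract_command("scan.png", legacy=True)` builds the command for a hypothetical `scan.png` using the Tesseract 3 style engine.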

At the time of writing, there were various 5.x test releases [Archive.is].

There are wrappers and ports for it in many programming languages, some of which offer a richer user experience than the command line, for instance a GUI.

Examples of both:

–jeroen


Via:
