The first hit for pdf binary to text – Google Search was [WayBack] binaryfiles – How to convert PDF binary parts into ASCII/ANSI so I can look at it in a text editor? – Stack Overflow, which has many options including:
- ghostscript
- qpdf
- mutool (part of MuPDF)
- pdftk
Since I have qpdf installed on most systems, here is the qpdf part of that answer:
Another useful tool to transform a PDF into an internal format that enables text editor access is qpdf. It is a “command-line program that does structural, content-preserving transformations on PDF files”.
Example usage:
qpdf \
  --qdf \
  --object-streams=disable \
  input-with-compressed-objects.pdf \
  output-with-expanded-objects.pdf
- The output of the QDF-mode enforced by the --qdf switch organizes and re-orders the objects neatly. It adds comments to track the original object IDs and page content streams. All object dictionaries are written into a “normalized” standard format for easier parsing.
- The --object-streams=disable causes the extraction of (otherwise not recognizable) individual objects that are compressed into another object’s stream data.
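To avoid retyping those two switches, the example can be wrapped in a small shell function. This is only a minimal sketch and not part of the answer or the QPDF Manual: the function name expand_pdf, the QPDF override variable and the .qdf.pdf output suffix are hypothetical names made up for illustration. Only the --qdf and --object-streams=disable switches come from the example above.

```shell
# Hypothetical helper: expand a PDF into QDF form so a text editor can read it.
# Usage: expand_pdf input.pdf [output.pdf]
# Assumption: the QPDF variable is an illustrative override for the qpdf binary
# (for example QPDF=echo prints the arguments instead of running qpdf).
expand_pdf() {
    in="$1"
    # Default output name: input.pdf becomes input.qdf.pdf (an arbitrary choice).
    out="${2:-${in%.pdf}.qdf.pdf}"
    "${QPDF:-qpdf}" --qdf --object-streams=disable "$in" "$out"
}
```

For example, expand_pdf input-with-compressed-objects.pdf would write input-with-compressed-objects.qdf.pdf next to the original, assuming qpdf is on the PATH.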
Recompressing is easy, as per the [WayBack] QPDF Manual:
qpdf /tmp/uncompressed.pdf /tmp/compressed.pdf
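If the expanded file was edited by hand in between, the stream lengths and cross-reference table may no longer match. qpdf ships a companion tool, fix-qdf, that is meant to repair those in a QDF file. The sketch below chains it with the recompress step above. It rests on assumptions: the function name repack_pdf, the FIX_QDF and QPDF override variables and the .fixed.pdf suffix are hypothetical, and it assumes fix-qdf writes the repaired file to standard output.

```shell
# Hypothetical helper: repair a hand-edited QDF file, then recompress it.
# Usage: repack_pdf edited.pdf output.pdf
# Assumption: FIX_QDF and QPDF are illustrative overrides for the two binaries.
repack_pdf() {
    edited="$1"
    out="$2"
    # Intermediate file name is an arbitrary choice for this sketch.
    fixed="${edited%.pdf}.fixed.pdf"
    # fix-qdf is assumed to write the repaired QDF file to standard output.
    "${FIX_QDF:-fix-qdf}" "$edited" > "$fixed" &&
    # Plain qpdf with no switches rewrites the file, as in the command above.
    "${QPDF:-qpdf}" "$fixed" "$out"
}
```

For a file that was only viewed and never edited, the single qpdf command above is enough and this helper is not needed.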
The answer is by [WayBack] User Kurt Pfeifle – Stack Overflow, who has many other interesting PDF-related answers at:
Stackoverflow.com:
- My SO answers to [PDF]-tagged questions
- My SO answers to [Ghostscript]-tagged questions
- My SO answers to [ImageMagick]-tagged questions
Superuser.com:
Serverfault.com:
–jeroen