TransWikia.com

How can I compress this pdf file without losing readability?

Ebooks Asked on March 22, 2021

I have a large black-and-white PDF scan of a book. The book is 1700 pages and about 300 MB, which works out to roughly 150-175 KB per page. This strikes me as rather large. Two arbitrarily selected pages are attached to illustrate; they are each about 330 KB. Running them through ImageMagick's compressor gives pages of about 240 KB, which I think is still too high. The best I've been able to get is about 33% smaller. That is a decent relative saving, but still poor in absolute terms.
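For reference, the figures quoted above can be checked with a little shell arithmetic. This is only a sketch: the function names `per_page_kb` and `percent_smaller` are made up for illustration, and 300 MB is assumed to mean 300,000,000 bytes.

```shell
# Average size per page, in whole kilobytes (1 KB = 1000 bytes here).
per_page_kb () {
    local total_bytes="$1" pages="$2"
    echo $(( total_bytes / pages / 1000 ))
}

# Percentage reduction from one size to another (integer, rounded down).
percent_smaller () {
    local before="$1" after="$2"
    echo $(( (before - after) * 100 / before ))
}

per_page_kb 300000000 1700   # whole-book average
percent_smaller 330 240      # the two sample pages after ImageMagick
```

Under those assumptions the two calls print 176 and 27, so the book averages about 176 KB per page and the ImageMagick pass on the sample pages saves about 27%.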

I believe the book is simply a bunch of PNGs stitched together, with no more information than that. I have tried converting to a lower dpi, but readability drops noticeably below 300 dpi. I am confident there must be a way to convert these files to some kind of document (either .pdf or .djvu), but I don't know how to do this. I tried converting to .tif and that was very useful.

I have done some research, and it seems it might help to run these images through OCR software, or otherwise to construct "vectorized" text. Can anyone recommend an open-source tool for this, with some instructions on how to get started?


2 Answers

If you want to maintain the content integrity of your document, I advise against OCR. I think most OCR engines would have a lot of trouble vectorizing your content, due to the heavy use of symbols, mixed use of italics, bold and normal fonts, and complex formatting.

If you simply want to make the final PDF file smaller, you could always compress it using a simple file compression utility available on your computer. For example, within Windows, you could right-click on your file and select "Send to" --> "Compressed (zipped) folder". I've found that does a fairly good job of compressing PDFs. Of course, the downside is that you'll need to uncompress it when you want to read your PDF.
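How much a generic compressor helps depends on the file, since PDF image streams are often compressed already, so it may be worth measuring before relying on it. The sketch below uses `gzip` as a stand-in for the Windows zip feature (both are typically DEFLATE-based); the helper name `gzip_saving_percent` is hypothetical.

```shell
# Hypothetical helper: report how much a generic DEFLATE pass (gzip -9)
# shrinks a file, as a whole-number percentage of the original size.
gzip_saving_percent () {
    local file="$1"
    local before after
    before=$(( $(wc -c < "$file") ))
    if [ "$before" -eq 0 ]; then
        echo 0          # empty file: nothing to save
        return
    fi
    after=$(( $(gzip -9 -c "$file" | wc -c) ))
    echo $(( (before - after) * 100 / before ))
}

# Example (not run here): gzip_saving_percent book.pdf
```

A result close to zero suggests the PDF's contents are already compressed and that zipping will not help much for that particular file.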

Answered by Scott Bordelon on March 22, 2021

Try searching for compressing PDFs in general. I personally use these:

pdftk in.pdf output out.pdf compress

Or:

# zsh function: the ${input:r} modifier (strip the file extension) is zsh-specific.
pdf-compress-gray () {
    local input="${1}"
    local out="${2:-${input:r}_cg.pdf}"          # default output name: <input>_cg.pdf
    local dpi="${pdf_compress_gray_dpi:-150}"    # target resolution; override via this variable
    gs -q -dNOPAUSE -dBATCH -dSAFER -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
        -sProcessColorModel=DeviceGray -sColorConversionStrategy=Gray -dOverrideICC \
        -dDownsampleColorImages=true -dDownsampleGrayImages=true -dPDFSETTINGS=/screen \
        -dColorImageDownsampleType=/Bicubic -dColorImageResolution=$dpi \
        -dGrayImageDownsampleType=/Bicubic -dGrayImageResolution=$dpi \
        -dMonoImageDownsampleType=/Bicubic -dMonoImageResolution=$dpi -sOutputFile="$out" "$input"
}
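The function above relies on zsh for its default output name. For readers on bash, a minimal sketch of the same naming rule follows; `default_gray_out` is a hypothetical helper name, and it assumes the input file name ends in an extension such as `.pdf`.

```shell
# Bash-compatible stand-in for zsh's ${input:r}: strip the last extension
# and append the _cg.pdf suffix used by the function above.
# Assumes the file name itself contains a dot (e.g. book.pdf).
default_gray_out () {
    local input="$1"
    printf '%s_cg.pdf\n' "${input%.*}"
}

# Example call of the zsh function at a higher resolution (requires Ghostscript):
#   pdf_compress_gray_dpi=300 pdf-compress-gray book.pdf
```

Since the question reports that readability suffers below 300 dpi, raising the resolution from the default of 150, as in the commented example, may be a sensible starting point for this particular scan.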

Answered by HappyFace on March 22, 2021
