Compress

PDFs are compressed internally, but they might use old technology or have been generated by inefficient PDF-writing software, and as a result be much larger than necessary. This tool recompresses PDFs to further reduce their file size. Compression ratios vary widely, from 0% to 99%.

The compression algorithm removes useless objects (duplicated or unused), strips regenerable objects (ASCII filters, thumbnails), and replaces LZW compression with the superior Flate (even on inline images), among other optimizations. By default it retains all non-regenerable information. Further space savings are possible with command-line options that also remove various non-regenerable objects or apply lossy image compression such as JPEG.
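Removing duplicated objects can be pictured as grouping objects by their serialized bytes and pointing every copy at one survivor. The sketch below is hypothetical and is not Multivalent's code: it assumes each object has already been serialized to bytes and keyed by object number (the names `DedupSketch` and `canonicalize` are made up), and it only detects byte-identical objects.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch, not Multivalent's implementation: find PDF objects whose
// serialized bodies are byte-identical and map each copy to a single survivor.
final class DedupSketch {

    /**
     * Returns a map from every object number to the object number that should replace it.
     * Objects are visited in ascending object-number order, so the lowest-numbered copy survives.
     */
    static Map<Integer, Integer> canonicalize(SortedMap<Integer, byte[]> objects) {
        Map<String, Integer> survivorByBody = new HashMap<>();
        Map<Integer, Integer> remap = new LinkedHashMap<>();
        for (Map.Entry<Integer, byte[]> e : objects.entrySet()) {
            // ISO-8859-1 maps each byte to exactly one char, so this String is a lossless key.
            String body = new String(e.getValue(), StandardCharsets.ISO_8859_1);
            Integer survivor = survivorByBody.putIfAbsent(body, e.getKey());
            remap.put(e.getKey(), survivor == null ? e.getKey() : survivor);
        }
        return remap;
    }

    public static void main(String[] args) {
        SortedMap<Integer, byte[]> objects = new TreeMap<>();
        objects.put(5, bytes("<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>"));
        objects.put(9, bytes("<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>"));
        objects.put(12, bytes("<< /Type /Font /Subtype /Type1 /BaseFont /Courier >>"));
        System.out.println(canonicalize(objects)); // prints {5=5, 9=5, 12=12}
    }

    private static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }
}
```

A fuller implementation would then rewrite references through the map and repeat the pass, because two objects that differ only in which duplicate they point to become identical once their references are remapped.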

Unless the -compatible option is given, PDFs are updated to the current version of PDF, which at this time is PDF 1.5 (corresponding to Acrobat 6). PDF 1.5 allows additional compression through cross-reference streams and object streams. With PDF 1.5, one can usually obtain about the same compression as gzip applied to the whole PDF file, with the advantage that the PDF remains directly readable by Acrobat. Note that if the PDF is already PDF 1.5, the -compatible option does not convert it back to an earlier version of PDF.
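Why object streams help can be seen with `java.util.zip.Deflater`, which by default produces the zlib-wrapped DEFLATE data that PDF's FlateDecode filter reads (gzip uses the same DEFLATE algorithm, which makes whole-file gzip a natural yardstick). Before PDF 1.5, ordinary non-stream objects such as link annotations could not be compressed at all; an object stream packs many of them into one Flate stream. In this sketch the annotation dictionaries are made-up sample data, and plain concatenation stands in for a real object stream, which also carries a table of object numbers and offsets.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.Deflater;

// Sketch: compare compressing small PDF objects one at a time versus all together.
final class ObjectStreamSketch {

    /** Returns the length of the zlib (Flate) encoding of data. */
    static int flateSize(byte[] data) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(data);
        deflater.finish();
        byte[] buf = new byte[4096];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buf);
        }
        deflater.end();
        return total;
    }

    /** Total size if every object were Flate-compressed on its own. */
    static int individually(List<byte[]> objects) {
        int sum = 0;
        for (byte[] o : objects) {
            sum += flateSize(o);
        }
        return sum;
    }

    /** Size if all objects are concatenated and compressed once, as in an object stream. */
    static int together(List<byte[]> objects) {
        ByteArrayOutputStream all = new ByteArrayOutputStream();
        for (byte[] o : objects) {
            all.write(o, 0, o.length);
        }
        return flateSize(all.toByteArray());
    }

    public static void main(String[] args) {
        // Made-up link annotations, similar in shape to those in hyperlinked documents.
        List<byte[]> links = new ArrayList<>();
        int raw = 0;
        for (int i = 0; i < 200; i++) {
            String annot = "<< /Type /Annot /Subtype /Link /Rect [72 " + (100 + i) + " 300 " + (112 + i)
                    + "] /Border [0 0 0] /Dest [" + (3 + i % 50) + " 0 R /XYZ 0 792 null] >>";
            byte[] b = annot.getBytes(StandardCharsets.US_ASCII);
            links.add(b);
            raw += b.length;
        }
        System.out.println("uncompressed (PDF 1.4 style): " + raw);
        System.out.println("each object compressed alone: " + individually(links));
        System.out.println("all objects in one stream:    " + together(links));
    }
}
```

The single-stream figure typically lands far below the other two: compressing each tiny object separately gains little, because every zlib stream pays its own header and checksum and starts with no shared history to match against.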

Compact PDF is a new format that, on many classes of PDF, gives an additional 30 to 60% compression beyond what is possible in PDF 1.5. For instance, the PDF Reference 1.5 shrinks from 12.2MB as distributed by Adobe to 4.4MB in Compact format. No information is lost in obtaining the additional compression, in contrast to some methods that throw away structural or other information or use lossy image compression. This format is not presently part of Adobe's PDF specification and cannot be read directly by Acrobat. However, it is fully supported by Multivalent: the PDF viewer in the browser and all tools, including full-text search, are "Compact-aware", meaning that they transparently view and manipulate Compact PDFs just as easily as standard ones. You can archive PDFs in this format, and if you need to read them in non-Compact-aware PDF viewers, you can always convert them back to standard format by rerunning this tool without the -compact option. Technical details for developers can be found in the Compact PDF Specification.

Representative Compression Results

Compression results are highly variable, but certain classes of PDFs seem to compress to approximately the same degree.
All sizes are in bytes. The percentage after each Compact size is the saving relative to the original size.

Category | Representative PDF | Original size | gzip | -compat | PDF 1.5 | Compact | -max
pure text | Aida.pdf | 85,437 | 19,187 | 33,184 | 29,661 | 15,477 (81%) | 15,461
TeX document | pdftex-s.pdf | 329,601 | 197,483 | 240,793 | 188,142 | 93,529 (71%) | 93,530
TeX document | gentlesgml.pdf | 486,807 | 207,922 | 270,227 | 180,744 | 75,048 (84%) | 75,040
FrameMaker | Java Language Specification 2.0 | 4,419,906 | 1,622,296 | 3,938,673 | 1,534,857 | 829,671 (81%) | 829,667
reference manual | PDF Reference 1.5 (draft) | 12,765,416 | 7,399,695 | 10,973,652 | 7,247,136 | 4,577,057 (64%) | 4,438,420
reference manual | PDF Reference 1.5 (final; in PDF 1.4) | 14,171,448 | 8,386,007 | 13,184,358 | 8,231,991 | 5,356,277 (62%) | 5,243,832
reference manual | PDF Reference 1.5 (final; in PDF 1.5) | 9,190,216 | 8,377,179 | 8,205,628 | 8,205,628 | 5,350,532 (41%) | 5,236,604
reference manual | PDFlib-manual.pdf | 1,298,364 | 936,166 | 1,166,849 | 896,269 | 619,029 (52%) | 588,801
converted from HTML | htmldoc | 379,568 | 245,630 | 349,335 | 256,767 | 169,704 (55%) | 169,708
converted from HTML | W3C HTML 4.0 specification | 3,006,205 | 958,150 | 1,727,038 | 936,942 | 468,013 (84%) | 467,999
converted from HTML | tcl8.4.2 | 8,135,892 | 3,784,950 | 6,503,767 | 3,648,370 | 1,423,353 (82%) | 1,423,344
book | Manning JDK 1.4 | 10,168,352 | 8,871,949 | 9,777,838 | 8,633,355 | 2,999,054 (70%) | 2,857,728
book | UNIX Haters Handbook | 3,639,172 | 2,803,546 | 3,125,244 | 2,516,284 | 2,068,612 (43%) | 2,068,607
book | Real World Go Live | 18,530,903 | 15,692,402 | 18,015,832 | 17,424,962 | 12,800,412 (30%) | 12,800,405
book | pdfxguide.pdf | 10,243,070 | 8,325,458 | 5,310,326 | 4,897,694 | 4,385,093 (57%) | 3,371,767
image dominated | p40-marshall.pdf | 1,762,945 | 1,488,077 | 1,594,236 | 1,584,487 | 1,272,028 (27%) | 1,272,028
image dominated | UnixTextProcessing.pdf | 27,969,206 | 26,212,375 | 27,278,926 | 26,733,530 | 23,229,702 (16%) | 23,229,695
image dominated | beos_osx.pdf | 1,247,457 | 1,225,179 | 1,155,413 | 1,145,163 | 1,103,550 (11%) | 499,557
high quality PDF generator | JDJ 1-01.pdf | 3,236,507 | 3,072,982 | 3,213,335 | 3,155,262 | 2,924,454 (9%) | 2,729,700
high quality PDF generator | isaacs_installconfig.pdf | 3,518,415 | 3,306,667 | 3,169,724 | 3,100,957 | 2,445,007 (30%) | 2,368,293

Notes by category:

pure text: Compact format compresses all the pages together, rather than as individual pages.

TeX document: TeX documents that use Computer Modern fonts typically embed them as encrypted Type 1 fonts. Compact format decrypts them, making them available to compression for the first time.

FrameMaker: FrameMaker generates many named destinations (anchors), which compress well in PDF 1.5 and Compact. FrameMaker also sometimes writes out the page template in each page rather than sharing it, and this redundancy compresses out in Compact.

reference manual: Reference manuals are typically dominated by text, which compresses better in Compact because Compact compresses all pages together.

converted from HTML: Hyperlinks compress well with PDF 1.5 object streams and Compact.

book: Large books often compress well in both percentage and absolute terms, saving much bandwidth for online distribution.

image dominated: By default images are not recompressed. With the -jpeg option, JPEG compression is applied to raw image samples, which explains the drastic compression for beos_osx.pdf in the -max column. If you have many scanned images, Adobe Acrobat 6 can compress them considerably with JBIG2 or JPEG2000; for instance, UnixTextProcessing.pdf compresses to about 9MB with JBIG2.

high quality PDF generator: Sometimes only a little compression is possible, but isaacs_installconfig.pdf, although produced with the latest libraries (Adobe InDesign 2.0.2, Adobe PDF Library 5.0), still compresses well.
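The TeX note above says Compact format decrypts embedded Type 1 fonts. The reason is that Type 1 fonts hide their private portion under eexec encryption, whose output looks nearly random to Flate. The sketch below follows the eexec cipher published in Adobe's Type 1 Font Format specification (initial key 55665, constants 52845 and 22719); the class name and sample plaintext are made up, and it illustrates only why decryption helps compression, not how Multivalent stores the decrypted font.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.Deflater;

// Sketch of Type 1 eexec encryption/decryption, constants from Adobe's Type 1 spec.
final class EexecSketch {
    private static final int EEXEC_KEY = 55665; // initial key for the eexec section
    private static final int C1 = 52845;
    private static final int C2 = 22719;

    static byte[] decrypt(byte[] cipher) {
        byte[] plain = new byte[cipher.length];
        int r = EEXEC_KEY;
        for (int i = 0; i < cipher.length; i++) {
            int c = cipher[i] & 0xFF;
            plain[i] = (byte) (c ^ (r >> 8));
            // int overflow here is harmless: only the low 16 bits of the key are kept.
            r = ((c + r) * C1 + C2) & 0xFFFF;
        }
        return plain; // a real decoder would also drop the first 4 (random) plaintext bytes
    }

    static byte[] encrypt(byte[] plain) {
        byte[] cipher = new byte[plain.length];
        int r = EEXEC_KEY;
        for (int i = 0; i < plain.length; i++) {
            int c = (plain[i] & 0xFF) ^ (r >> 8);
            cipher[i] = (byte) c;
            r = ((c + r) * C1 + C2) & 0xFFFF; // key update uses the cipher byte
        }
        return cipher;
    }

    /** Length of the zlib (Flate) encoding of data. */
    static int flateSize(byte[] data) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(data);
        deflater.finish();
        byte[] buf = new byte[4096];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buf);
        }
        deflater.end();
        return total;
    }

    public static void main(String[] args) {
        // Made-up, highly repetitive stand-in for a font's private section.
        byte[] plain = "dup 32 /space put dup 65 /A put ".repeat(60).getBytes(StandardCharsets.US_ASCII);
        byte[] cipher = encrypt(plain);
        System.out.println("round trip ok: " + Arrays.equals(plain, decrypt(cipher))); // prints "round trip ok: true"
        System.out.println("Flate size of encrypted bytes: " + flateSize(cipher));
        System.out.println("Flate size of decrypted bytes: " + flateSize(plain));
    }
}
```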

Options

java tool.pdf.Compress [options] PDF-file(s)
Some large PDFs need more than the 64MB of memory that Java limits itself to by default. Additional memory can also sometimes speed up compression. Most PDFs compress in a few seconds, but a small percentage need up to a few minutes in the most advanced Compact mode. If Compress stops with an OutOfMemoryError or takes more than two minutes on a given PDF, try increasing Java's maximum heap size with -Xmx, as in:
java -Xmx128m tool.pdf.Compress ...
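To confirm which heap ceiling a JVM actually received, a tiny program can print `Runtime.maxMemory()`. This is a generic Java sketch, not part of Multivalent; the class name `HeapCheck` is made up.

```java
// Generic sketch: report the maximum heap the running JVM will attempt to use.
final class HeapCheck {
    static long maxHeapMiB() {
        // Runtime.maxMemory() returns Long.MAX_VALUE if the JVM has no inherent limit.
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Running `java -Xmx128m HeapCheck` should report roughly 128 MiB; some JVMs report a little less than the -Xmx value.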
The new PDF file has the same name as the original, with -o inserted before the .pdf suffix.
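The naming rule can be written as a small helper, for example to locate the outputs from a script. This is a hypothetical sketch of the documented rule, not code from the tool; it assumes the input name ends in lowercase .pdf and rejects anything else, since the behaviour for other names is not described here.

```java
// Hypothetical helper mirroring the documented rule: insert "-o" before ".pdf".
final class OutputName {
    private static final String SUFFIX = ".pdf";

    static String outputName(String input) {
        if (!input.endsWith(SUFFIX)) {
            // Assumption: behaviour for other names is undocumented, so refuse them.
            throw new IllegalArgumentException("expected a name ending in .pdf: " + input);
        }
        String stem = input.substring(0, input.length() - SUFFIX.length());
        return stem + "-o" + SUFFIX;
    }

    public static void main(String[] args) {
        System.out.println(outputName("paper.pdf"));        // prints paper-o.pdf
        System.out.println(outputName("docs/spec.v2.pdf")); // prints docs/spec.v2-o.pdf
    }
}
```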

Note: Compressed PDFs lose their "linearization" ("Fast Web View") organization. Use another tool to recompute it if desired.