Uncompress for Hand Editing or Examination

The written file leaves content streams uncompressed so that they can be inspected or edited by hand. With Adobe's PDF Reference (available online) as a guide, you can change almost anything in the PDF: correcting bad OCR, fixing typos on pages without the application that generated them (text is not reflowed), adding a title and keywords, authoring annotations, or diagnosing problems.
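When you fix a typo inside a literal string such as (a) Tj, the new text must follow the string syntax described in the PDF Reference: a backslash and any unbalanced parenthesis must be escaped with a backslash. The hypothetical helper below is not part of the tool.pdf package. It escapes every backslash and parenthesis, which is always valid, and writes CR and LF as \r and \n so that an editor cannot alter them. It assumes the characters you type exist in the page's font under the same single-byte encoding. Embedded subset fonts often contain only the glyphs the original text used, so a new letter may not render.

```java
public class PdfLiteral {
    /** Escapes text for use between the parentheses of a PDF literal string. */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 8);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break;  // backslash must always be escaped
                case '(':  sb.append("\\(");  break;  // escaping parentheses is always safe
                case ')':  sb.append("\\)");  break;
                case '\r': sb.append("\\r");  break;  // keep raw line endings out of the string
                case '\n': sb.append("\\n");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints (Price \(US\) \\ 10) Tj
        System.out.println("(" + escape("Price (US) \\ 10") + ") Tj");
    }
}
```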

Uncompressed content streams are pretty-printed to show their structure more clearly, and objects are labelled with the page numbers they are used on. For Western languages, the character strings are easy to identify and edit. Be careful to use a text editor, such as Emacs, that handles binary data and does not translate unfamiliar characters or line endings.
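The same care applies if you script an edit instead of using an editor. The Java sketch below uses a hypothetical BinarySafeEdit class, not part of the tool.pdf package. It reads the uncompressed file as ISO-8859-1, which maps each byte to exactly one character and back, so every byte outside the edit is written out untouched. It does not update the enclosing stream's /Length entry. A replacement of the same length, as in the usage line, sidesteps that.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class BinarySafeEdit {
    /**
     * Replaces the first occurrence of target with replacement.
     * ISO-8859-1 maps every byte 0x00-0xFF to one char and back, so bytes
     * outside the edited span (binary data, CR/LF line endings) are preserved.
     * Characters above U+00FF in the replacement would be written as '?'.
     */
    static byte[] replaceOnce(byte[] pdf, String target, String replacement) {
        String s = new String(pdf, StandardCharsets.ISO_8859_1);
        int at = s.indexOf(target);
        if (at < 0) {
            throw new IllegalArgumentException("not found: " + target);
        }
        String edited = s.substring(0, at) + replacement + s.substring(at + target.length());
        return edited.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Usage (hypothetical file names): java BinarySafeEdit in-u.pdf out-u.pdf "(teh)" "(the)"
    public static void main(String[] args) throws IOException {
        byte[] in = Files.readAllBytes(Paths.get(args[0]));
        Files.write(Paths.get(args[1]), replaceOnce(in, args[2], args[3]));
    }
}
```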

After editing, pass the file through the Compress tool to recompress the streams and rebuild the cross-reference table.

Options

java tool.pdf.Uncompress [options] PDF-file

Very rarely, a PDF has a content stream that is so highly compressed that uncompressing it produces a very large result and a java.lang.OutOfMemoryError. In this case, give Java more than the default 64MB of memory; the following command gives it 192MB. If the error persists, try the -exact option, which is more memory efficient.

java -Xmx192m tool.pdf.Uncompress [options] PDF-file
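The default maximum heap differs between Java versions, so it can be worth confirming what -Xmx actually gave you. This small class is not part of the tool. It prints the limit the JVM reports when run with the same flag, e.g. java -Xmx192m MaxHeap. The figure may be somewhat below the flag's value, because some garbage collectors leave part of the heap out of it.

```java
public class MaxHeap {
    /** Maximum heap the JVM will attempt to use, in megabytes. */
    static long maxHeapMb() {
        // Runtime.maxMemory() returns Long.MAX_VALUE if there is no limit.
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        // Run as: java -Xmx192m MaxHeap
        System.out.println("max heap: " + maxHeapMb() + " MB");
    }
}
```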
The uncompressed version of the PDF is named with -u appended to the original file name.
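If you script the Uncompress step, you may need to predict the output name. The following is a minimal sketch that assumes the -u is inserted before the .pdf extension, so Tekton.pdf becomes Tekton-u.pdf. The helper is hypothetical and not part of tool.pdf.

```java
public class UncompressedName {
    /** Hypothetical helper: inserts "-u" before the extension, e.g. "Tekton.pdf" -> "Tekton-u.pdf". */
    static String uncompressedName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot <= 0) {
            return fileName + "-u";   // no extension: just append the suffix
        }
        return fileName.substring(0, dot) + "-u" + fileName.substring(dot);
    }

    public static void main(String[] args) {
        System.out.println(uncompressedName("Tekton.pdf"));   // prints Tekton-u.pdf
    }
}
```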

Example

java tool.pdf.Uncompress Tekton.pdf
produces a file named Tekton-u.pdf with the following content:
%PDF-1.3
%
% page 2
1 0 obj<< /Type /Page /Thumb 204 0 R /MediaBox [0 0 612 792] /Rotate 0 /Contents 3 0 R /CropBox [0 0 612 792] /Resources 2 0 R /Parent 267 0 R>>
endobj
% used by page 2
2 0 obj<< /ExtGState << /GS1 285 0 R>> /Font << /F4 95 0 R /F3 283 0 R>> /ColorSpace << /Cs9 96 0 R /Cs6 276 0 R>>>>
endobj
% used by page 2
3 0 obj<< /Length 235>>stream

BT
   /F3 1 Tf  100 0 0 100 277.45 399.192 Tm  /Cs9 cs  0.05 scn  /GS1 gs  0 Tc  0 Tw  (a) Tj
   /F4 1 Tf  9 0 0 9 304.007 57.88 Tm  /Cs6 cs  0 0 0 scn  <01> Tj
   /F3 1 Tf  100 0 0 100 277.45 399.192 Tm  /Cs9 cs  0.05 scn  (a) Tj
ET

endstream
endobj
% page 3
4 0 obj<< /Type /Page /Thumb 206 0 R /MediaBox [0 0 612 792] /Rotate 0 /Contents 6 0 R /CropBox [0 0 612 792] /Resources 5 0 R /Parent 267 0 R>>
endobj
% used by page 3
5 0 obj<< /ExtGState << /GS1 285 0 R>> /Font << /F4 95 0 R /F2 277 0 R>> /ColorSpace << /Cs8 284 0 R /Cs6 276 0 R>>>>
endobj
% used by page 3
6 0 obj<< /Length 948>>stream

BT
   /F2 1 Tf  16 0 0 16 144 651.36 Tm  /Cs6 cs  0 0 0 scn  /GS1 gs  -0.0002 Tc  0.0002 Tw  [(An Adob)10.7(e)]TJ
   6 0 0 6 203.247 658.36 Tm  0 Tc  0 Tw  <a8> Tj
   16 0 0 16 207.969 651.36 Tm  -0.0001 Tc  -0.0999 Tw  [( Original)]TJ
   30 0 0 30 144 554.36 Tm  -0.01 Tc  0 Tw  [(T)74(ekton)]TJ
   7 0 0 7 220.679 567.36 Tm  0 Tc  <a8> Tj
   30 0 0 30 226.048 554.36 Tm  -0.0487 Tc  -0.0013 Tw  [( Pr)-15.7(o)]TJ
   13 0 0 13 144 535.36 Tm  /Cs8 cs  1 scn  -0.0001 Tc  0.0133 Tw  [(an inf)19.9(o)-0.1(rmal multi-purp)10.9(ose t)6.9(y)-0.1(p)10.9(eface family)]TJ
   18 0 0 18 144 430.36 Tm  /Cs6 cs  0 0 0 scn  0 Tc  0 Tw  (         ) Tj
   9 0 0 9 144 144.36 Tm  <a9> Tj
   0.787 0.1111 TD  ( ) Tj
   /F4 1 Tf  0.188 0 TD  0.0001 Tc  <01020202> Tj
   /F2 1 Tf  1.9579 0 TD  -0.0002 Tc  0.0002 Tw  [( Adob)11.1(e Syst)7.7(ems Incorp)10.8(or)21.7(at)8(ed. All rights r)22.7(e)0(serv)19.8(ed.)]TJ
   48 0 0 48 144 87.76 Tm  0 Tc  0 Tw  (  ) Tj
ET

endstream
endobj
...
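To find the text in a pretty-printed content stream, for example to locate the string that holds a typo, you can scan for literal strings. The sketch below uses a hypothetical LiteralStrings class. It decodes each (...) string with the escape rules in the PDF Reference: \n, \r, \t, \b, \f, \(, \), \\, \ddd octal codes, balanced nested parentheses, and backslash-newline continuation. It skips % comments but does not tell Tj from TJ or other operators. It also ignores hex strings such as <01> or <a8>, whose bytes must be mapped through the font's encoding. In a TJ array, the numbers between strings are position adjustments in thousandths of a unit of text space. Small ones like 11.1 in the page 3 copyright line are kerning, so the pieces can be joined directly.

```java
import java.util.ArrayList;
import java.util.List;

public class LiteralStrings {
    /** Returns the decoded literal strings in a content stream, in order of appearance. */
    static List<String> literalStrings(String content) {
        List<String> out = new ArrayList<>();
        int n = content.length();
        int i = 0;
        while (i < n) {
            char c = content.charAt(i);
            if (c == '%') {
                // Comment outside a string: skip to the end of the line.
                while (i < n && content.charAt(i) != '\n' && content.charAt(i) != '\r') {
                    i++;
                }
            } else if (c == '(') {
                StringBuilder sb = new StringBuilder();
                int depth = 1;   // balanced parentheses may nest without escapes
                i++;
                while (i < n && depth > 0) {
                    c = content.charAt(i++);
                    if (c == '\\' && i < n) {
                        i = unescape(content, i, sb);
                    } else if (c == '(') {
                        depth++;
                        sb.append(c);
                    } else if (c == ')') {
                        depth--;
                        if (depth > 0) sb.append(c);   // the final ')' closes the string
                    } else {
                        sb.append(c);
                    }
                }
                out.add(sb.toString());
            } else {
                i++;
            }
        }
        return out;
    }

    /** Decodes the escape whose first char (after the backslash) is at index i; returns the next index. */
    private static int unescape(String s, int i, StringBuilder sb) {
        char e = s.charAt(i++);
        switch (e) {
            case 'n': sb.append('\n'); break;
            case 'r': sb.append('\r'); break;
            case 't': sb.append('\t'); break;
            case 'b': sb.append('\b'); break;
            case 'f': sb.append('\f'); break;
            case '\r':   // backslash + end of line: continuation, produces nothing
                if (i < s.length() && s.charAt(i) == '\n') i++;
                break;
            case '\n':
                break;
            default:
                if (e >= '0' && e <= '7') {   // \ddd: one to three octal digits
                    int v = e - '0';
                    for (int k = 0; k < 2 && i < s.length(); k++) {
                        char d = s.charAt(i);
                        if (d < '0' || d > '7') break;
                        v = v * 8 + (d - '0');
                        i++;
                    }
                    sb.append((char) (v & 0xFF));
                } else {
                    sb.append(e);   // \( \) \\ and unknown escapes: drop the backslash
                }
        }
        return i;
    }

    public static void main(String[] args) {
        String tj = "[( Adob)11.1(e Syst)7.7(ems Incorp)10.8(or)21.7(at)8"
                + "(ed. All rights r)22.7(e)0(serv)19.8(ed.)]TJ";
        // prints " Adobe Systems Incorporated. All rights reserved." (with a leading space)
        System.out.println(String.join("", literalStrings(tj)));
    }
}
```

Joining the pieces of page 3's tagline the same way yields "an informal multi-purpose typeface family".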