multivalent.std.adaptor.pdf

Class PDFWriter

public class PDFWriter extends COSSource

Write new PDF file from low-level data structures.

How to use this class

Either start with an existing PDF or start from scratch, using the appropriate constructor. Then add or modify objects, as informed by Adobe's PDF Reference.

When done, simply call writePDF. Any objects not set in this class will be faulted in from the backing PDFReader, if any. Some useful PDF manipulations, such as replacing CCITT FAX images with JBIG2, need to modify only a few objects; for these, writePDF(Observer) invokes the caller's observer before each object is written, at which time the object can be modified.

To encrypt, just set up a PDF encryption dictionary according to the PDF Reference (don't forget to point to it in the trailer). See tool.pdf.Encrypt for an example. If there is a backing PDF, then by default the PDF to be written inherits its encryption settings, if any. Encryption must be set up before writing (writePDF or writeHeader).

To write in Compact format, just set up the dictionary according to the Compact PDF Specification.

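For greater control, as for PDF concatenation, excerpting, and other applications, write the parts separately: writeHeader, writeObject for each object, and writeXref.

	PDFWriter pdfw = new PDFWriter(...);
	pdfw.writeHeader();
	pdfw.writeObject(obj, number, generation);	// repeat for each object, recording the returned file offset
	// ...
	pdfw.writeXref(trailer, size, prev, offset, start, length);
	pdfw.close();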
	

Version: $Revision: 1.57 $ $Date: 2003/08/29 03:47:03 $

Field Summary
static int PDFOBJ_OVERHEAD
static int PDFOBJREF_OVERHEAD
Constructor Summary
PDFWriter(File file, PDFReader base)
Modify an existing PDF.
PDFWriter(File file, PDFReader base, boolean incremental)
PDFWriter(File file)
Create a PDF from scratch.
Method Summary
void addFilter(Dict stream, String filter, Object parms)
Prepends filter to stream.
int addObject(Object obj)
Adds object by reusing number of deleted object if possible, or else appending to end.
void close()
byte[] deflateStream(Dict stream, int number, int generation)
If data would be smaller with Flate compression applied, apply it, set /Filter /FlateDecode and /Length, and return compressed data.
Object getCache(int objnum)
Low-level retrieval from table of instantiated objects: if object is not present, null is returned.
Dict getCatalog()
Returns document /Catalog.
int getMajorVersion()
int getMinorVersion()
int getObjCnt()
Returns number of objects.
Object getObject(int objnum, boolean fcache)
Objects are read on demand from the backing PDFReader, if any.
Object getObject(Object ref)
Object[] getObjects()
Returns an array of all objects currently read from the base PDFReader or explicitly set by client code.
int getObjGen(int objnum)
byte getObjType(int objnum)
RandomAccessFile getRAF()
Dict getTrailer()
Document trailer.
void liftPageTree()
Removes unnecessarily duplicated inherited attributes in page tree.
boolean makeObjectStreams(int start, int end)
Collect non-stream objects into compressed object streams (introduced in PDF 1.5), in groups of 200 or so.
static byte[] maybeDeflateData(byte[] data)
Deflates data, if compressed size is smaller than original.
boolean objEquals(Object o1, Object o2)
Deep equality testing, recursing through arrays and dictionaries and one level of indirect references.
void readAllObjects()
Read all remaining objects from backing PDFReader that have not already been read or set by PDFWriter.
int[] refcnt()
Reference count PDF objects to see how many times (and if) an object is used.
int refcntRemove()
Reference count and remove unused objects.
void removeFilter(Dict stream, String filter)
Removes filter and associated DecodeParms from stream.
void renumber(int[] newnum)
Renumbers IRef's according to newnum[] by descending through object tree (rooted at the Trailer), where new numbers can map to any number and no objects are removed from the object table.
int[] renumberRemove(int[] newnum)
Descends through object tree (rooted at Trailer), renumbering IRef's according to newnum[] and removing unused objects.
void resetPageTree(List<IRef> leaves)
Rebalances page tree so that each internal node tries to have 20 children and none has more than 20 children.
void setCompress(boolean b)
Compress objects, or not (for debugging or pedagogical purposes).
void setExact(boolean b)
If false (the default), unpacks objects from object streams and reports object streams themselves as OBJECT_NULL.
void setMinVersion(int major, int minor)
If using a PDF feature introduced since PDF 1.0, use this method to ensure that the document header is correct.
void setMonitor(boolean b)
Shows status information.
void setObjCnt(int newcnt)
Truncate object list or allocate space for more (addObject automatically allocates space as needed too).
void setObject(int num, Object obj)
Set an object to null to take from base PDFReader.
void setObjGen(int objnum, int newgen)
void setPassword(String password)
Provide either owner or user password for encryption, if any.
byte[] writeCommandArray(Cmd[] cmds, boolean prettyprint)
Writes command array back into a byte stream, skipping commands marked invalid.
void writeHeader()
Writes document header: "%PDF-m.n\n%byte/byte/byte/byte\n".
StringBuffer writeInlineImage(Dict params, byte[] data, StringBuffer sb)
Writes inline image with image data into content stream sb.
long writeObject(Object obj, int number, int generation)
Writes a top-level object: n g obj contents endobj, with applicable encryption, respecting CryptFilter, if any.
long writeObject(Object obj, int number, int generation, boolean fplain, Encrypt encrypt)
Low-level write of a top-level object: n g obj contents endobj, encrypting according to encrypt, which can be null for no encryption.
StringBuffer writeObject(Object o, StringBuffer sb, boolean fcrunch)
Writes contents of passed PDF object to StringBuffer that represents a content stream.
long writePDF(Observer observer)
Writes data in memory or base PDF to new PDF file, complete with header, xref table, and trailer.
long writePDF()
Convenience method for writePDF(null).
void writeXref(Dict trailer, int size, long prev, long[] offset, int start, int length)
Writes cross reference table and trailer.
void writeXref(Dict trailer, int size, long prev, long[] offset, int[] start, int[] length)
Writes cross reference of multiple sections and trailer.

Field Detail

PDFOBJ_OVERHEAD

public static final int PDFOBJ_OVERHEAD

PDFOBJREF_OVERHEAD

public static final int PDFOBJREF_OVERHEAD

Constructor Detail

PDFWriter

public PDFWriter(File file, PDFReader base)
Modify an existing PDF. If the source PDF is encrypted, it should have its password set before passing to PDFWriter. base is considered owned by new PDFWriter, and clients should only operate through PDFWriter; closing the PDFWriter closes the PDFReader and should be the only way a backing PDFReader is closed.

PDFWriter

public PDFWriter(File file, PDFReader base, boolean incremental)

PDFWriter

public PDFWriter(File file)
Create a PDF from scratch. Automatically creates ID in trailer and Catalog and Info dictionaries.

Method Detail

addFilter

public void addFilter(Dict stream, String filter, Object parms)
Prepends filter to stream.

addObject

public int addObject(Object obj)
Adds object by reusing number of deleted object if possible, or else appending to end.

Returns: object number assigned

close

public void close()

deflateStream

public byte[] deflateStream(Dict stream, int number, int generation)
If data would be smaller with Flate compression applied, apply it, set /Filter /FlateDecode and /Length, and return compressed data. If data would be larger, set /Length, so that deflation is not attempted (and does not fail) a second time, and return original data.

Returns: null if dictionary is not a stream

getCache

public Object getCache(int objnum)
Low-level retrieval from table of instantiated objects: if object is not present, null is returned.

getCatalog

public Dict getCatalog()
Returns document /Catalog.

getMajorVersion

public int getMajorVersion()

getMinorVersion

public int getMinorVersion()

getObjCnt

public int getObjCnt()
Returns number of objects.

getObject

public Object getObject(int objnum, boolean fcache)
Objects are read on demand from the backing PDFReader, if any. If the object is a stream, the stream content is read in, uncompressed, and stored under the dictionary key STREAM_DATA. Objects typically require about 10 times as many bytes in memory as on disk. Valid object numbers are 0 .. getObjCnt-1.

Parameters: fcache - true if the object should be cached

getObject

public Object getObject(Object ref)

getObjects

public Object[] getObjects()
Returns an array of all objects currently read from the base PDFReader or explicitly set by client code. Usually callers want to first instantiate all objects by invoking readAllObjects. Invoking this method takes control of the objects from PDFWriter, and results in losing any deleted object chain.

getObjGen

public int getObjGen(int objnum)

getObjType

public byte getObjType(int objnum)

getRAF

public RandomAccessFile getRAF()

getTrailer

public Dict getTrailer()
Document trailer. To change the trailer, such as the /ID array, mutate this object.

liftPageTree

public void liftPageTree()
Removes unnecessarily duplicated inherited attributes in page tree. Removes attributes set to default, such as CropBox same as MediaBox. Sometimes makes a big difference, sometimes no difference.

makeObjectStreams

public boolean makeObjectStreams(int start, int end)
Collect non-stream objects into compressed object streams (introduced in PDF 1.5), in groups of 200 or so. Should be last method invoked before starting to write PDF objects to a file.

Other algorithms that make object streams should represent them by adding the object streams as dictionaries, storing their component objects under the STREAM_DATA dictionary key, replacing the old copy of each component object with a number of class CLASS_OBJSTMC that holds the object number of the object stream, and setting the component's generation to the index of the object in the object stream.

maybeDeflateData

public static byte[] maybeDeflateData(byte[] data)
Deflates data, if compressed size is smaller than original.
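
The "deflate only if smaller" rule can be illustrated with the standard java.util.zip.Deflater. This is a minimal sketch of the idea, not the Multivalent implementation; the class name DeflateIfSmaller, the method name maybeDeflate, and the choice of BEST_COMPRESSION are assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

/** Illustrative sketch only; names are hypothetical, not part of PDFWriter. */
final class DeflateIfSmaller {

    /** Returns Flate-compressed bytes if strictly smaller than the input, else the input array itself. */
    static byte[] maybeDeflate(byte[] data) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(data);
        deflater.finish();                       // no more input will follow

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);       // fills buf with the next compressed chunk
            out.write(buf, 0, n);
        }
        deflater.end();                          // release native resources

        byte[] packed = out.toByteArray();
        return packed.length < data.length ? packed : data;
    }

    public static void main(String[] args) {
        byte[] repetitive = new byte[10_000];    // all zeros: compresses very well
        byte[] tiny = {1, 2, 3};                 // zlib framing alone exceeds 3 bytes
        System.out.println(maybeDeflate(repetitive).length < repetitive.length);
        System.out.println(maybeDeflate(tiny) == tiny);
    }
}
```

Returning the original array when compression does not help lets a caller detect the outcome with a simple identity check.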

objEquals

public boolean objEquals(Object o1, Object o2)
Deep equality testing, recursing through arrays and dictionaries and one level of indirect references.

readAllObjects

public void readAllObjects()
Read all remaining objects from backing PDFReader that have not already been read or set by PDFWriter. Same effect as invoking getObject on all objects in backing PDF.

refcnt

public int[] refcnt()
Reference count PDF objects to see how many times (and if) an object is used.

Returns: array such that array[objnum] = reference count.
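
As a rough illustration of what such a reference count involves, the sketch below walks an object graph from the trailer and counts how often each object number is referenced. It does not use the Multivalent classes: Ref is a hypothetical stand-in for IRef, and java.util.Map and java.util.List stand in for PDF dictionaries and arrays.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; Ref, Map and List are stand-ins for IRef, Dict and PDF arrays. */
final class RefCountSketch {

    /** Hypothetical indirect reference to object number num. */
    static final class Ref {
        final int num;
        Ref(int num) { this.num = num; }
    }

    /** Returns counts such that counts[objnum] is the number of references reachable from trailer. */
    static int[] refcnt(Object[] objs, Object trailer) {
        int[] counts = new int[objs.length];
        Deque<Object> todo = new ArrayDeque<>();
        todo.push(trailer);
        while (!todo.isEmpty()) {
            Object o = todo.pop();
            if (o instanceof Ref) {
                int n = ((Ref) o).num;
                if (n <= 0 || n >= objs.length) continue;   // object 0 is reserved; ignore dangling refs
                // Descend into the target only on first sight, so cycles (e.g. /Parent) terminate.
                if (counts[n]++ == 0 && objs[n] != null) todo.push(objs[n]);
            } else if (o instanceof Map) {
                for (Object v : ((Map<?, ?>) o).values()) if (v != null) todo.push(v);
            } else if (o instanceof List) {
                for (Object v : (List<?>) o) if (v != null) todo.push(v);
            }
        }
        return counts;
    }
}
```

An object whose count stays at 0 is unreachable from the trailer, which is the kind of object a removal pass would drop.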

refcntRemove

public int refcntRemove()
Reference count and remove unused objects.

Returns: number of unused objects

removeFilter

public void removeFilter(Dict stream, String filter)
Removes filter and associated DecodeParms from stream.

renumber

public void renumber(int[] newnum)
Renumbers IRef's according to newnum[] by descending through object tree (rooted at the Trailer), where new numbers can map to any number and no objects are removed from the object table. Caller retains responsibility to correctly number the objects (n g obj ... endobj) on writing a new PDF.
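
The core of such a renumbering is a substitution of each reference's object number through the newnum[] table. The sketch below shows that substitution on a single object's direct contents; it does not follow references, so a caller would apply it to the trailer and to each object in the table. Ref, Map and List are hypothetical stand-ins for IRef, Dict and PDF arrays, and the containers are assumed to be mutable.

```java
import java.util.List;
import java.util.ListIterator;
import java.util.Map;

/** Illustrative sketch only; not the Multivalent implementation. */
final class RenumberSketch {

    /** Hypothetical immutable indirect reference. */
    static final class Ref {
        final int num;
        Ref(int num) { this.num = num; }
    }

    /**
     * Returns o with every Ref(n) replaced by Ref(newnum[n]).
     * Lists and maps are rewritten in place; other values are returned unchanged.
     */
    @SuppressWarnings("unchecked")
    static Object renumber(Object o, int[] newnum) {
        if (o instanceof Ref) {
            return new Ref(newnum[((Ref) o).num]);
        } else if (o instanceof List) {
            ListIterator<Object> it = ((List<Object>) o).listIterator();
            while (it.hasNext()) it.set(renumber(it.next(), newnum));
        } else if (o instanceof Map) {
            // Replacing a value through the entry is not a structural modification.
            for (Map.Entry<String, Object> e : ((Map<String, Object>) o).entrySet())
                e.setValue(renumber(e.getValue(), newnum));
        }
        return o;
    }
}
```

Note that only the references change; as the description above says, numbering the objects themselves correctly on output remains the caller's job.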

renumberRemove

public int[] renumberRemove(int[] newnum)
Descends through object tree (rooted at Trailer), renumbering IRef's according to newnum[] and removing unused objects. Assumes that object numbers lie within 0..getObjCnt(), and that a renumbered object is obsolete and deleted in favor of the referenced object; thus some objects will always be removed or there was no use invoking this method. Shrinks the object tables by the number of unused objects, and renumbering is adjusted by moved object positions.

Parameters: newnum is mutated so that object id's point to positions in collapsed array

Returns: offsets array updated to match new object numbers so, in addition to objs_, callers can update parallel arrays for (int i=0; i

See Also: PDFWriter

resetPageTree

public void resetPageTree(List<IRef> leaves)
Rebalances page tree so that each internal node tries to have 20 children and none has more than 20 children. Sometime afterward, before writing, the caller probably wants to invoke liftPageTree.

Parameters: leaves holds all the pages, in sequence, with all of their attributes explicit (not relying on inheritance from a parent). A convenient way to accumulate this list is to read IRefs from PDFReader and make attributes explicit with PDFReader.
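
One simple way to obtain a tree with at most 20 children per node is to group the leaves bottom-up in runs of 20 and repeat on the resulting parents until a single root remains. The sketch below shows that approach under stated assumptions: it is not necessarily the algorithm PDFWriter uses, and Node is a hypothetical stand-in for a /Pages dictionary with /Kids and /Count.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; Node is a hypothetical stand-in for a /Pages node. */
final class PageTreeSketch {
    static final int MAX_KIDS = 20;

    static final class Node {
        final List<Object> kids = new ArrayList<>();
        int count;                                  // number of leaf pages below, like /Count
    }

    /** Builds a balanced tree over leaves in which no node has more than MAX_KIDS children. */
    static Node build(List<?> leaves) {
        List<Object> level = new ArrayList<>(leaves);
        do {
            List<Object> parents = new ArrayList<>();
            for (int i = 0; i < level.size(); i += MAX_KIDS) {
                Node n = new Node();
                for (Object kid : level.subList(i, Math.min(i + MAX_KIDS, level.size()))) {
                    n.kids.add(kid);
                    n.count += (kid instanceof Node) ? ((Node) kid).count : 1;
                }
                parents.add(n);
            }
            level = parents;                        // next pass groups the parents themselves
        } while (level.size() > 1);
        return level.isEmpty() ? new Node() : (Node) level.get(0);
    }
}
```

With 45 leaves this yields a root with three children holding 20, 20 and 5 pages; every node except the last on each level is full.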

setCompress

public void setCompress(boolean b)
Compress objects, or not (for debugging or pedagogical purposes). When compressing, COS objects are also written with a minimal number of spaces; otherwise they are written to be more readable by humans. Images are always compressed in image-specific formats regardless of this setting, and compression of other objects, if requested, is always with Flate compression (not LZW).

setExact

public void setExact(boolean b)
If false (the default), unpacks objects from object streams and reports object streams themselves as OBJECT_NULL. If true, reports objects as seen from backing PDFReader, with objects inside object streams given as a java.lang.Long giving their index within the stream.

See Also: PDFReader

setMinVersion

public void setMinVersion(int major, int minor)
If using a PDF feature introduced since PDF 1.0, use this method to ensure that the document header is correct. (There are no setMajorVersion/setMinorVersion methods to set an arbitrary value since this might be less than the version necessary to reflect the features of PDF in use.) If you don't care about compatibility with older PDF readers, you can simply set this to the current level of PDF (1.5 as of this writing) at the start. The minimum version settable is PDF 1.2 (Acrobat 3.0). This number only goes up, so if you want something else, write your own header.
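
The "only goes up" behavior described above amounts to keeping a running maximum with a floor of 1.2. A minimal sketch of that rule follows; the class MinVersionSketch and its fields are hypothetical and only model the documented behavior.

```java
/** Illustrative sketch only; models the documented rule, not the actual PDFWriter fields. */
final class MinVersionSketch {
    private int major = 1;
    private int minor = 2;                          // documented floor: PDF 1.2

    /** Raises the recorded version if (newMajor, newMinor) is higher; never lowers it. */
    void setMinVersion(int newMajor, int newMinor) {
        if (newMajor > major || (newMajor == major && newMinor > minor)) {
            major = newMajor;
            minor = newMinor;
        }
    }

    /** First line of the header, without the trailing newline. */
    String versionLine() {
        return "%PDF-" + major + "." + minor;
    }
}
```

Because requests are merged by maximum, independent pieces of code can each declare the version their feature needs without coordinating with one another.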

setMonitor

public void setMonitor(boolean b)
Shows status information.

setObjCnt

public void setObjCnt(int newcnt)
Truncate object list or allocate space for more (addObject automatically allocates space as needed too).

setObject

public void setObject(int num, Object obj)
Set an object to null to take from base PDFReader. Do not use IRefs as top-level objects. Do not set an object at or beyond getObjCnt; first extend the set of objects with setObjCnt. To delete an object, set it to OBJECT_DELETED.

setObjGen

public void setObjGen(int objnum, int newgen)

setPassword

public void setPassword(String password)
Provide either owner or user password for encryption, if any.

writeCommandArray

public byte[] writeCommandArray(Cmd[] cmds, boolean prettyprint)
Writes command array back into a byte stream, skipping commands marked invalid. (Doesn't minimize whitespace as done for top-level objects, since compression in streams wipes out advantages.)

writeHeader

public void writeHeader()
Writes document header: "%PDF-m.n\n%byte/byte/byte/byte\n". After this point, no changes can be made to the encryption settings.
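
For illustration, the header layout can be produced as below. The second line is a comment containing four bytes with the high bit set, which the PDF Reference recommends so that transfer programs treat the file as binary. The particular byte values used here (E2 E3 CF D3) are an assumption; the documentation above does not say which bytes PDFWriter emits.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative sketch only; the four marker byte values are an assumption. */
final class HeaderSketch {

    /** Returns "%PDF-m.n\n" followed by "%" + four high-bit bytes + "\n". */
    static byte[] header(int major, int minor) {
        byte[] version = ("%PDF-" + major + "." + minor + "\n").getBytes(StandardCharsets.US_ASCII);
        byte[] marker = {'%', (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3, '\n'};
        byte[] out = new byte[version.length + marker.length];
        System.arraycopy(version, 0, out, 0, version.length);
        System.arraycopy(marker, 0, out, version.length, marker.length);
        return out;
    }
}
```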

writeInlineImage

public StringBuffer writeInlineImage(Dict params, byte[] data, StringBuffer sb)
Writes inline image with image data into content stream sb.

See Also: readInlineImage

writeObject

public long writeObject(Object obj, int number, int generation)
Writes a top-level object: n g obj contents endobj, with applicable encryption, respecting CryptFilter, if any.

writeObject

public long writeObject(Object obj, int number, int generation, boolean fplain, Encrypt encrypt)
Low-level write of a top-level object: n g obj contents endobj, encrypting according to encrypt, which can be null for no encryption. The write follows the encrypt setting, ignoring any CryptFilter. Content streams should pass their data streams as a byte[] under the STREAM_DATA key. If object is a stream and no filter has been applied, applies Flate compression if that results in a smaller object.

Returns: file offset of start of object, or 0 if the object has been deleted or object number is 0 (which is a special number reserved by PDF).

writeObject

public StringBuffer writeObject(Object o, StringBuffer sb, boolean fcrunch)
Writes contents of passed PDF object to StringBuffer that represents a content stream.

Returns: same StringBuffer passed in

writePDF

public long writePDF(Observer observer)
Writes data in memory or base PDF to new PDF file, complete with header, xref table, and trailer. If a non-null observer is passed in, it is invoked after fully reading but before writing each PDF object, with the PDF object passed as the argument to the observer. If asked to write onto existing file, first writes to temporary file, then deletes and renames. If an object has not been instantiated, it is instantiated, written, and then cleared in order to free up memory and thus allow PDFs of any size to be processed in a limited amount of memory.

Completely control writing by writing your own version of this method with these steps:

  1. writeHeader
  2. writeObject for each object, keeping track of file offsets for the xref table
  3. writeXref to write the xref table

If PDF version is 1.5 or later, writes trailer and xref as a stream, adding an object for it with the highest object number. If object streams are desired, they must be created beforehand, as by invoking makeObjectStreams.

If Compact PDF writing mode is set, writes the entire PDF in a format that puts almost every object in a single BZip2 or Flate stream. It is 30 to 60% more compact on large classes of PDF, but does not conform to the PDF 1.5 specification. It is readable with the PDFReader class, the Multivalent Browser, and the Multivalent tools; and it is valid PDF so you can do incremental writes and so on. See Compact PDF format.

After writing, objects may have been mutated or deleted, and therefore should not be accessed.

Returns: length of new PDF file

writePDF

public long writePDF()
Convenience method for writePDF(null).

writeXref

public void writeXref(Dict trailer, int size, long prev, long[] offset, int start, int length)
Writes cross reference table and trailer. If PDF version is 1.5 or later and trailer can be found among existing objects, then the cross reference table is written as a stream, which is compressible. This is a separate method so that the table can be written in chunks for Linearized PDF. If an object is a component of an object stream, it should have been set as described in makeObjectStreams.

Parameters: size total number of objects in the file (not just in current xref segment); usually same as getObjCnt

Returns: file position of start of Xref table.
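
For readers unfamiliar with the classic (non-stream) cross reference format, the sketch below formats a single section in the layout defined by the PDF Reference: a subsection header "start count", then one 20-byte entry per object made of a 10-digit offset, a 5-digit generation, and n (in use) or f (free). It is a simplification, not the PDFWriter code: it assumes one section starting at object 0, that object 0 is the only free entry, and that all other objects have generation 0.

```java
import java.util.Locale;

/** Illustrative sketch only; assumes a single section, generation 0, and no free objects besides 0. */
final class XrefSketch {

    /** offsets[i] is the file offset of object i; offsets[0] is ignored (object 0 heads the free list). */
    static String xrefSection(long[] offsets) {
        if (offsets.length == 0) throw new IllegalArgumentException("object 0 must be present");
        StringBuilder sb = new StringBuilder();
        sb.append("xref\n0 ").append(offsets.length).append('\n');
        sb.append("0000000000 65535 f \n");                 // object 0: free, generation 65535
        for (int i = 1; i < offsets.length; i++) {
            // Trailing space + newline make every entry exactly 20 bytes.
            sb.append(String.format(Locale.ROOT, "%010d %05d n \n", offsets[i], 0));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(xrefSection(new long[] {0, 15, 74}));
    }
}
```

The fixed entry width is what lets a reader seek directly to the entry for a given object number without parsing the whole table.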

writeXref

public void writeXref(Dict trailer, int size, long prev, long[] offset, int[] start, int[] length)
Writes cross reference of multiple sections and trailer.