Multivalent API

multivalent.std.adaptor.pdf
Class PDFWriter

java.lang.Object
  extended by multivalent.std.adaptor.pdf.COSSource
      extended by multivalent.std.adaptor.pdf.PDFWriter

public class PDFWriter
extends COSSource

Write new PDF file from low-level data structures.

How to use this class

Either start with an existing PDF or start from scratch with the appropriate constructor. Then add or modify objects as described in Adobe's PDF Reference.

When done, simply call writePDF(). Any objects not set in this class will be faulted in from the backing PDFReader, if any. Some useful PDF manipulations, such as replacing CCITT FAX images with JBIG2, need to modify only a few objects; for these, the convenience method writePDF(Observer) invokes the caller before each object is written, at which point the object can be modified.

To encrypt, just set up a PDF encryption dictionary according to the PDF Reference (don't forget to point to it from the trailer). See tool.pdf.Encrypt for an example. If there is a backing PDF, then by default the PDF to be written inherits its encryption settings, if any. Encryption must be set up before writing (writePDF() or writeHeader()).

To write in Compact format, just set up the dictionary according to the Compact PDF Specification.

For greater control, as for PDF concatenation, excerpting, and other applications, write the parts separately: writeHeader(), then writeObject(Object, int, int) for each object, then writeXref(Dict, int, long, long[], int, int).

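        PDFWriter pdfw = new PDFWriter(out, pdfr);  // out: OutputUni; pdfr: backing PDFReader (or new PDFWriter(out) from scratch)
        pdfw.writeHeader();
        int cnt = pdfw.getObjCnt();
        long[] offset = new long[cnt];              // file offset of each object, for the xref table
        for (int i = 1; i < cnt; i++)               // object 0 is reserved by PDF
            offset[i] = pdfw.writeObject(pdfw.getObject(i, false), i, pdfw.getObjGen(i));
        pdfw.writeXref(pdfw.getTrailer(), cnt, 0L, offset, 0, cnt);  // prev = 0L assumed to mean "no earlier xref"
        pdfw.close();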

Notes

Version:
$Revision: 1.72 $ $Date: 2005/07/21 16:03:02 $

Field Summary
static java.text.NumberFormat NF
          Floating point formatter that matches PDF limits.
static int PDFOBJ_OVERHEAD
           
static int PDFOBJREF_OVERHEAD
           
 
Constructor Summary
PDFWriter(java.io.File file, PDFReader base, boolean incremental)
           
PDFWriter(com.pt.io.OutputUni ou)
          Creates a PDF from scratch.
PDFWriter(com.pt.io.OutputUni ou, PDFReader base)
          Convenience for new PDFWriter(out, base, false).
PDFWriter(com.pt.io.OutputUni ou, PDFReader base, boolean incremental)
          Creates a new PDF based on an existing PDF.
 
Method Summary
 void addFilter(Dict stream, java.lang.String filter, java.lang.Object parms)
          Prepends filter to stream.
 IRef addObject(java.lang.Object obj)
          Adds object by reusing number of deleted object if possible, or else appending to end.
 void close()
          Closes PDFWriter and associated File or OutputStreamTee.
 void convertType1(java.lang.String subformat)
          Convert embedded Type 1 fonts, if any, to a different format.
 byte[] deflateStream(Dict stream, int objnum)
          If data would be smaller with Flate compression applied, apply it, set /Filter /FlateDecode and /Length, and return compressed data.
 java.lang.Object getCache(int objnum)
          Low-level retrieval from table of instantiated objects: if object is not present, null is returned.
 Dict getCatalog()
          Returns document /Catalog.
 Dict getInfo()
           
 int getObjCnt()
          Returns number of objects.
 java.lang.Object getObject(int objnum, boolean fcache)
          Objects are read on demand from the backing PDFReader, if any.
 java.lang.Object getObject(java.lang.Object ref)
           
 java.lang.Object[] getObjects()
          Return array of all objects currently read in base PDFReader or explicitly set by client code.
 int getObjGen(int objnum)
           
 byte getObjType(int objnum)
           
 com.pt.io.OutputStreamTee getOutputStream()
          For expert use in special cases.
 PDFReader getReader()
           
 byte[] getStreamData(java.lang.Object obj)
           
 Dict getTrailer()
          Document trailer.
 java.net.URI getURI()
           
 phelps.util.Version getVersion()
          Returns the major version of PDF used; for example, for PDF 1.4.
 void liftPageTree()
          Removes unnecessarily duplicated inherited attributes in page tree.
 boolean makeObjectStreams(int start, int end)
          Collect non-stream objects into compressed object streams (introduced in PDF 1.5), in groups of 200 or so.
static byte[] maybeDeflateData(byte[] data)
          Deflates data, if compressed size is smaller than original.
 boolean objEquals(java.lang.Object o1, java.lang.Object o2)
          Deep equality testing, recursing through arrays and dictionaries and one level of indirect references.
 void readAllObjects()
          Read all remaining objects from backing PDFReader that have not already been read or set by setObject(int, Object).
 int[] refcnt()
          Reference count PDF objects to see how many times (and if) an object is used.
 int refcntRemove()
          Reference count and remove unused objects.
 void removeFilter(Dict stream, java.lang.String filter)
          Removes filter and associated DecodeParms from stream.
 void renumber(int[] newnum)
          Renumbers IRef's according to newnum[] by descending through the object tree (rooted at the Trailer).
 int[] renumberRemove(int[] newnum)
          Descends through object tree (rooted at Trailer), renumbering IRef's according to newnum[] and removing unused objects.
 void resetPageTree(java.util.List<IRef> leaves)
          Rebalances page tree so that each internal node tries to have 20 children and none has more than 20 children.
 void setCompress(boolean b)
          Compress objects, or not (for debugging or pedagogical purposes).
 void setExact(boolean b)
          If false (the default), unpacks objects from object streams and reports object streams themselves as COS.OBJECT_NULL.
 void setMonitor(boolean b)
          Shows status information.
 void setObjCnt(int newcnt)
          Truncates the object list or allocates space for more (addObject(Object) automatically allocates space as needed too).
 void setObject(int num, java.lang.Object obj)
          Set an object to null to take it from the base PDFReader.
 void setObjGen(int objnum, int newgen)
           
 void setPassword(java.lang.String password)
          Provide either owner or user password for encryption, if any.
 byte[] writeCommandArray(Cmd[] cmds, boolean prettyprint)
          Writes command array back into a byte stream, skipping commands marked invalid.
 void writeFDF()
          Writes contents in Forms Data Format (FDF).
 void writeHeader()
          Writes document header: "%PDF-m.n\n%byte/byte/byte/byte\n".
 java.lang.StringBuffer writeInlineImage(Dict params, byte[] data, java.lang.StringBuffer sb)
          Writes inline image with image data into content stream sb.
 long writeObject(java.lang.Object obj, int objnum, int objgen)
          Writes a top-level object: n g obj contents endobj, with applicable encryption, respecting CryptFilter, if any.
 long writeObject(java.lang.Object obj, int objnum, int objgen, boolean fplain, Encrypt encrypt)
          Low-level write of a top-level object: n g obj contents endobj, encrypting according to encrypt, which can be null for no encryption.
 java.lang.StringBuffer writeObject(java.lang.Object o, java.lang.StringBuffer sb, boolean fcrunch)
          Writes contents of passed PDF object to StringBuffer that represents a content stream.
 java.lang.Object writePDF()
          Convenience method for writePDF(null).
 java.lang.Object writePDF(java.util.Observer observer)
          Writes data in memory or base PDF to new PDF file, complete with header, xref table, and trailer.
 void writeXref(Dict trailer, int size, long prev, long[] offset, int[] start, int[] length)
          Writes cross reference of multiple sections and trailer.
 void writeXref(Dict trailer, int size, long prev, long[] offset, int start, int length)
          Writes cross reference table and trailer.
 
Methods inherited from class multivalent.std.adaptor.pdf.COSSource
connected, getDecodeParms, getObjInt
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

PDFOBJ_OVERHEAD

public static final int PDFOBJ_OVERHEAD

PDFOBJREF_OVERHEAD

public static final int PDFOBJREF_OVERHEAD

NF

public static final java.text.NumberFormat NF
Floating point formatter that matches PDF limits.
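The exact pattern behind NF is not documented here. As a minimal sketch, assuming (from the writeCommandArray examples "0.00" => "0" and "5.248549334" => "5.24854") that numbers are truncated to five fractional digits with trailing zeros dropped, a JDK-only formatter could look like this. The class name PdfNumbers is hypothetical.

```java
import java.math.RoundingMode;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

/** Hypothetical stand-in for NF: compact formatting of PDF real numbers. */
final class PdfNumbers {
    // Up to five fractional digits; '#' drops trailing zeros, so 0.00 prints as "0".
    // Locale.US symbols guarantee '.' as the decimal separator, as PDF syntax requires.
    private static final DecimalFormat NF =
            new DecimalFormat("0.#####", DecimalFormatSymbols.getInstance(Locale.US));

    static {
        NF.setRoundingMode(RoundingMode.DOWN); // truncate: 5.248549334 -> 5.24854
        NF.setGroupingUsed(false);             // never emit "1,000"
    }

    static String format(double d) {
        String s = NF.format(d);
        return s.equals("-0") ? "0" : s;       // tiny negatives can truncate to "-0"
    }
}
```

DecimalFormat is not thread-safe, so a shared static instance like this one should not be used from several threads at once.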

Constructor Detail

PDFWriter

public PDFWriter(com.pt.io.OutputUni ou,
                 PDFReader base,
                 boolean incremental)
          throws java.io.IOException
Creates a new PDF based on an existing PDF. If the source PDF is encrypted, its password should be set before it is passed to PDFWriter. Objects are shared between this object and its backing PDFReader, so objects mutated here are mutated in the PDFReader too. Writes the PDF to ou, with unmutated objects taken from base, writing incrementally iff incremental is true, or else to a new file. To write to a pipe (that is, a PDFReader), pass null for ou.

The output is allowed to be the same as the input base if writing to a File.

Throws:
java.io.IOException

PDFWriter

public PDFWriter(com.pt.io.OutputUni ou,
                 PDFReader base)
          throws java.io.IOException
Convenience for new PDFWriter(out, base, false).

Throws:
java.io.IOException

PDFWriter

public PDFWriter(java.io.File file,
                 PDFReader base,
                 boolean incremental)
          throws java.io.IOException
Throws:
java.io.IOException

PDFWriter

public PDFWriter(com.pt.io.OutputUni ou)
          throws java.io.IOException
Creates a PDF from scratch. Automatically creates ID in trailer and Catalog and Info dictionaries. Existing output file, if any, is overwritten.

Throws:
java.io.IOException
Method Detail

setExact

public void setExact(boolean b)
If false (the default), unpacks objects from object streams and reports object streams themselves as COS.OBJECT_NULL. If true, reports objects as seen from the backing PDFReader, with objects inside object streams given as a Long giving their index within the stream.

See Also:
PDFReader.setExact(boolean)

setPassword

public void setPassword(java.lang.String password)
Provide either owner or user password for encryption, if any.


getReader

public PDFReader getReader()

getURI

public java.net.URI getURI()

getOutputStream

public com.pt.io.OutputStreamTee getOutputStream()
For expert use in special cases.


getVersion

public phelps.util.Version getVersion()
Description copied from class: COSSource
Returns the major version of PDF used; for example, for PDF 1.4.

Specified by:
getVersion in class COSSource

getTrailer

public Dict getTrailer()
Document trailer. To change the trailer, such as the /ID array, mutate this object.

Specified by:
getTrailer in class COSSource

getCatalog

public Dict getCatalog()
                throws java.io.IOException
Returns document /Catalog.

Specified by:
getCatalog in class COSSource
Throws:
java.io.IOException

getInfo

public Dict getInfo()
             throws java.io.IOException
Throws:
java.io.IOException

getObjCnt

public int getObjCnt()
Returns number of objects.

Specified by:
getObjCnt in class COSSource

setObjCnt

public void setObjCnt(int newcnt)
Truncates the object list or allocates space for more (addObject(Object) automatically allocates space as needed too).


addObject

public IRef addObject(java.lang.Object obj)
Adds object by reusing number of deleted object if possible, or else appending to end.

Returns:
IRef for the object number assigned
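A minimal sketch of the reuse-then-append policy, assuming a simple free list of deleted object numbers. ObjectTable and its DELETED marker are hypothetical stand-ins for PDFWriter's internal table and COS.OBJECT_DELETED. The PDF Reference also expects the generation number of a reused object number to be incremented; that step is omitted here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical object table illustrating addObject's reuse-then-append policy. */
final class ObjectTable {
    static final Object DELETED = new Object();              // stands in for COS.OBJECT_DELETED
    private final List<Object> objs = new ArrayList<>();
    private final Deque<Integer> free = new ArrayDeque<>();  // numbers of deleted objects

    ObjectTable() { objs.add(null); }                         // object number 0 is reserved by PDF

    /** Returns the object number assigned to obj. */
    int add(Object obj) {
        Integer num = free.pollFirst();
        if (num != null) {                                    // reuse a deleted slot first
            objs.set(num, obj);
            return num;
        }
        objs.add(obj);                                        // otherwise append to the end
        return objs.size() - 1;
    }

    void delete(int num) {
        objs.set(num, DELETED);
        free.addLast(num);
    }

    Object get(int num) { return objs.get(num); }

    int size() { return objs.size(); }
}
```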

setObject

public void setObject(int num,
                      java.lang.Object obj)
Set an object to null to take it from the base PDFReader. Do not use IRefs as top-level objects. Do not set an object whose number is getObjCnt() or greater; first extend the set of objects with setObjCnt(int). To delete an object, set it to COS.OBJECT_DELETED.


getObject

public java.lang.Object getObject(int objnum,
                                  boolean fcache)
                           throws java.io.IOException
Objects are read on demand from the backing PDFReader, if any. If the object is a stream, the stream content is read in, uncompressed, and stored under the dictionary key COS.STREAM_DATA. Objects typically require about 10 times as many bytes in memory as on disk. Valid object numbers are 0 .. getObjCnt()-1.

Parameters:
fcache - true if the object should be cached
Throws:
java.io.IOException

getCache

public java.lang.Object getCache(int objnum)
Low-level retrieval from table of instantiated objects: if object is not present, null is returned.


getObject

public java.lang.Object getObject(java.lang.Object ref)
                           throws java.io.IOException
Specified by:
getObject in class COSSource
Throws:
java.io.IOException

getStreamData

public byte[] getStreamData(java.lang.Object obj)
                     throws java.io.IOException
Throws:
java.io.IOException

readAllObjects

public void readAllObjects()
                    throws java.io.IOException
Read all remaining objects from backing PDFReader that have not already been read or set by setObject(int, Object). Same effect as invoking getObject(int, boolean) on all objects in backing PDF.

Throws:
java.io.IOException

getObjects

public java.lang.Object[] getObjects()
Return array of all objects currently read in base PDFReader or explicitly set by client code. Usually callers want to first instantiate all objects by invoking readAllObjects(). Invoking this method takes control of the objects from PDFWriter, and results in losing any deleted object chain.


setObjGen

public void setObjGen(int objnum,
                      int newgen)

getObjGen

public int getObjGen(int objnum)

getObjType

public byte getObjType(int objnum)
                throws java.io.IOException
Throws:
java.io.IOException

addFilter

public void addFilter(Dict stream,
                      java.lang.String filter,
                      java.lang.Object parms)
               throws java.io.IOException
Prepends filter to stream.

Throws:
java.io.IOException

removeFilter

public void removeFilter(Dict stream,
                         java.lang.String filter)
                  throws java.io.IOException
Removes filter and associated DecodeParms from stream.

Throws:
java.io.IOException

deflateStream

public byte[] deflateStream(Dict stream,
                            int objnum)
                     throws java.io.IOException
If data would be smaller with Flate compression applied, apply it, set /Filter /FlateDecode and /Length, and return compressed data. If data would be larger, set /Length (so that a later call does not try, and fail, to deflate it again) and return the original data.

Returns:
null if dictionary is not a stream
Throws:
java.io.IOException

maybeDeflateData

public static byte[] maybeDeflateData(byte[] data)
                               throws java.io.IOException
Deflates data, if compressed size is smaller than original.

Throws:
java.io.IOException
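To illustrate the size check, here is a sketch built only on java.util.zip.Deflater, which by default emits the zlib format that /FlateDecode expects. It is not Multivalent's implementation; the class name MaybeDeflate is hypothetical and the compression level is an arbitrary choice.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

/** Hypothetical re-creation of the maybeDeflateData(byte[]) contract. */
final class MaybeDeflate {
    static byte[] maybeDeflateData(byte[] data) {
        Deflater def = new Deflater(Deflater.BEST_COMPRESSION); // zlib wrapper, as /FlateDecode expects
        try {
            def.setInput(data);
            def.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            while (!def.finished()) {
                int n = def.deflate(buf);
                out.write(buf, 0, n);
            }
            byte[] packed = out.toByteArray();
            return packed.length < data.length ? packed : data;  // keep whichever is smaller
        } finally {
            def.end();                                          // release native zlib memory
        }
    }
}
```

Returning the very same array when compression does not help lets a caller test `result == data` to decide whether to add /Filter /FlateDecode.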

objEquals

public boolean objEquals(java.lang.Object o1,
                         java.lang.Object o2)
Deep equality testing, recursing through arrays and dictionaries and one level of indirect references.


refcnt

public int[] refcnt()
Reference count PDF objects to see how many times (and if) an object is used.

Returns:
array such that array[objnum] = reference count.
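A sketch of reference counting over a simplified object graph, assuming dictionaries are java.util.Map, arrays are java.util.List, and indirect references are a hypothetical Ref class in place of IRef. It counts references held by the trailer and by every entry in the object table. Whether the real refcnt() counts from all objects or only from those reachable from the trailer is not stated above.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical reference counter over a simplified COS object graph. */
final class RefCount {
    /** Stand-in for IRef: an indirect reference "num 0 R". */
    static final class Ref {
        final int num;
        Ref(int num) { this.num = num; }
    }

    /** objs[i] is top-level object i; returns cnt such that cnt[objnum] = reference count. */
    static int[] refcnt(Object[] objs, Object trailer) {
        int[] cnt = new int[objs.length];
        count(trailer, cnt);
        for (Object o : objs) count(o, cnt);   // references held by every top-level object
        return cnt;
    }

    private static void count(Object o, int[] cnt) {
        if (o instanceof Ref) {
            int n = ((Ref) o).num;
            if (n >= 0 && n < cnt.length) cnt[n]++;  // do not follow: the target is visited as a top-level object
        } else if (o instanceof Map) {
            for (Object v : ((Map<?, ?>) o).values()) count(v, cnt);
        } else if (o instanceof List) {
            for (Object v : (List<?>) o) count(v, cnt);
        }
    }
}
```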

refcntRemove

public int refcntRemove()
Reference count and remove unused objects.

Returns:
number of unused objects

renumber

public void renumber(int[] newnum)
Renumbers IRef's according to newnum[] by descending through object tree (rooted at the Trailer), where new numbers can map to any number and no objects are removed from the object table. Caller retains responsibility to correctly number the objects (n g obj ... endobj) on writing a new PDF.


renumberRemove

public int[] renumberRemove(int[] newnum)
Descends through object tree (rooted at Trailer), renumbering IRef's according to newnum[] and removing unused objects. Assumes that object numbers lie within 0..getObjCnt(), and that a renumbered object is obsolete and deleted in favor of the referenced object; thus some objects will always be removed or there was no use invoking this method. Shrinks the object tables by the number of unused objects, and renumbering is adjusted by moved object positions.

Parameters:
newnum - is mutated so that object IDs point to positions in the collapsed array
Returns:
offsets array updated to match the new object numbers so that, in addition to objs_, callers can update parallel arrays: for (int i=0; i…
See Also:
renumber(int[])
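The collapse step of renumberRemove(int[]) can be sketched with plain arrays: given reference counts such as those from refcnt(), kept objects receive consecutive new numbers and unused ones are marked -1, and any parallel array (for example, file offsets) is compacted the same way. Compact, collapse, and remap are hypothetical names, and this sketch leaves out the IRef rewriting that the real method also performs.

```java
/** Hypothetical sketch of the "collapse" step of renumberRemove(int[]). */
final class Compact {
    /**
     * Maps each kept object (index = old object number) to its position in the
     * collapsed table and each unused object to -1. Object 0 is always kept,
     * because PDF reserves it.
     */
    static int[] collapse(int[] refcnt) {
        int[] newnum = new int[refcnt.length];
        int next = 0;
        for (int i = 0; i < refcnt.length; i++) {
            newnum[i] = (i == 0 || refcnt[i] > 0) ? next++ : -1;
        }
        return newnum;
    }

    /** Applies newnum[] to a parallel array of the same length, dropping removed entries. */
    static long[] remap(long[] parallel, int[] newnum) {
        int kept = 0;
        for (int n : newnum) if (n >= 0) kept++;
        long[] out = new long[kept];
        for (int i = 0; i < parallel.length; i++) {
            if (newnum[i] >= 0) out[newnum[i]] = parallel[i];
        }
        return out;
    }
}
```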

resetPageTree

public void resetPageTree(java.util.List<IRef> leaves)
                   throws java.io.IOException
Rebalances page tree so that each internal node tries to have 20 children and none has more than 20 children. Sometime before writing, the caller probably wants to invoke liftPageTree().

Parameters:
leaves - holds all the pages, in sequence, with all of their attributes explicit (not relying on inheritance from a parent). A convenient way to accumulate this list is to read IRefs from PDFReader.getPageRef(int) and make attributes explicit with PDFReader.getPage(int).
Throws:
java.io.IOException
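A minimal sketch of the rebalancing, assuming a greedy strategy that fills each intermediate node with up to 20 kids, level by level, until one root remains. Node is a hypothetical stand-in for a /Pages or /Page dictionary, and count mirrors /Count; the actual algorithm in PDFWriter may distribute kids differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of balancing a page tree with at most 20 kids per node. */
final class PageTreeSketch {
    static final int MAX_KIDS = 20;

    /** A page-tree node: a leaf page (no kids) or an intermediate /Pages node. */
    static final class Node {
        final List<Node> kids = new ArrayList<>();
        int count; // number of leaf pages below, as in the /Count entry
    }

    static Node leaf() {
        Node n = new Node();
        n.count = 1;
        return n;
    }

    /** Groups nodes MAX_KIDS at a time, level by level, until a single root remains. */
    static Node build(List<Node> leaves) {
        if (leaves.isEmpty()) return new Node(); // empty document: a root with no kids
        List<Node> level = new ArrayList<>(leaves);
        do {
            List<Node> parents = new ArrayList<>();
            for (int i = 0; i < level.size(); i += MAX_KIDS) {
                Node p = new Node();
                for (Node kid : level.subList(i, Math.min(i + MAX_KIDS, level.size()))) {
                    p.kids.add(kid);
                    p.count += kid.count;
                }
                parents.add(p);
            }
            level = parents;
        } while (level.size() > 1);
        return level.get(0);
    }
}
```

Filling nodes greedily keeps every leaf at the same depth; a 401-page document, for instance, becomes a root with two kids, one holding 400 pages spread over 20 nodes and one holding the last page.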

liftPageTree

public void liftPageTree()
                  throws java.io.IOException
Removes unnecessarily duplicated inherited attributes in page tree. Removes attributes set to default, such as CropBox same as MediaBox. Sometimes makes a big difference, sometimes no difference.

Throws:
java.io.IOException
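Two of the optimizations described above can be sketched on pages modeled as java.util.Map: lifting an inheritable attribute (the PDF Reference lets /Resources, /MediaBox, /CropBox, and /Rotate be inherited) to the parent when every kid has the same value, and dropping a /CropBox that equals the /MediaBox in the same dictionary. LiftSketch and its methods are hypothetical, and the sketch handles a single parent rather than the whole tree.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical sketch of two liftPageTree() optimizations on a simplified page tree. */
final class LiftSketch {
    /** If every kid has the same value for an inheritable key, moves it to the parent. */
    static boolean lift(Map<String, Object> parent, List<Map<String, Object>> kids, String key) {
        if (kids.isEmpty()) return false;
        Object v = kids.get(0).get(key);
        if (v == null) return false;
        for (Map<String, Object> kid : kids) {
            if (!v.equals(kid.get(key))) return false; // a differing or inherited value blocks lifting
        }
        parent.put(key, v);
        for (Map<String, Object> kid : kids) kid.remove(key);
        return true;
    }

    /** A CropBox equal to the MediaBox is the default, so it can be dropped. */
    static void dropDefaultCropBox(Map<String, Object> page) {
        if (page.containsKey("CropBox") && Objects.equals(page.get("CropBox"), page.get("MediaBox"))) {
            page.remove("CropBox");
        }
    }
}
```

The CropBox check runs before lifting, because once /MediaBox has moved to the parent the page's own dictionary no longer holds the value to compare against.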

convertType1

public void convertType1(java.lang.String subformat)
                  throws java.io.IOException
Convert embedded Type 1 fonts, if any, to a different format. For historical reasons Type 1 fonts are encrypted and not compressed. They can be written out decrypted (NFontType1.SUBFORMAT_DECRYPTED) for low-level inspection. This does not affect other font formats (Type 1C, TrueType, ...).

Throws:
java.io.IOException

setCompress

public void setCompress(boolean b)
Compress objects, or not (for debugging or pedagogical purposes). Also determines whether COS objects are written with the minimal number of spaces or formatted to be more readable by humans. Images are always compressed in image-specific formats regardless of this setting, and compression of other objects, if requested, is always with Flate compression (not LZW).


setMonitor

public void setMonitor(boolean b)
Shows status information.


makeObjectStreams

public boolean makeObjectStreams(int start,
                                 int end)
                          throws java.io.IOException
Collect non-stream objects into compressed object streams (introduced in PDF 1.5), in groups of 200 or so. Should be last method invoked before starting to write PDF objects to a file.

Other algorithms that make object streams represent this by adding the object streams as dictionaries, storing their component objects under the COS.STREAM_DATA dictionary key, and replacing the old copies of the component objects with a number of class COS.CLASS_OBJSTMC that holds the object number of the object stream, setting its generation to the index of the object within the object stream.

Throws:
java.io.IOException
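Per the PDF 1.5 specification, the decoded data of an object stream begins with /N pairs of integers (object number, then byte offset relative to /First), followed by the objects themselves. A minimal sketch of assembling that data, assuming the objects are already serialized and use one byte per character; ObjStmSketch is hypothetical, and Flate compression and the grouping into batches of about 200 are left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of the decoded contents of one object stream (/Type /ObjStm). */
final class ObjStmSketch {
    final String data; // decoded stream data, before Flate compression
    final int n;       // value for /N
    final int first;   // value for /First

    /** objects: object number -> serialized non-stream object, in the order to store them. */
    ObjStmSketch(Map<Integer, String> objects) {
        StringBuilder header = new StringBuilder();
        StringBuilder body = new StringBuilder();
        for (Map.Entry<Integer, String> e : objects.entrySet()) {
            // offset is relative to the first object; one char per byte is assumed
            header.append(e.getKey()).append(' ').append(body.length()).append(' ');
            body.append(e.getValue()).append('\n');
        }
        this.n = objects.size();
        this.first = header.length(); // objects start right after the integer pairs
        this.data = header.toString() + body;
    }
}
```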

writeHeader

public void writeHeader()
                 throws java.io.IOException
Writes document header: "%PDF-m.n\n%byte/byte/byte/byte\n". After this point, no changes can be made to the encryption settings.

Throws:
java.io.IOException
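A sketch of the bytes described above, using the four binary bytes 0xE2 0xE3 0xCF 0xD3 that many PDF producers choose; the PDF Reference only recommends that this comment line contain at least four bytes with values of 128 or greater, so that file-transfer tools treat the file as binary. HeaderSketch is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of "%PDF-m.n\n%byte/byte/byte/byte\n". */
final class HeaderSketch {
    static byte[] header(int major, int minor) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] version = ("%PDF-" + major + "." + minor + "\n").getBytes(StandardCharsets.US_ASCII);
        out.write(version, 0, version.length);
        out.write('%');                              // comment line holding binary bytes
        for (int b : new int[] {0xE2, 0xE3, 0xCF, 0xD3}) out.write(b);
        out.write('\n');
        return out.toByteArray();
    }
}
```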

writeXref

public void writeXref(Dict trailer,
                      int size,
                      long prev,
                      long[] offset,
                      int start,
                      int length)
               throws java.io.IOException
Writes cross reference table and trailer. If the PDF version is 1.5 or later and the trailer can be found among existing objects, then the cross reference table is written as a stream, which is compressible. This is a separate method so that the table can be written in chunks for Linearized PDF. If an object is a component of an object stream, it should have been set as described in makeObjectStreams(int,int).

Parameters:
size - total number of objects in the file (not just in current xref segment); usually same as getObjCnt()
Throws:
java.io.IOException
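For the classic (non-stream) case, each cross-reference entry is exactly 20 bytes: a 10-digit offset, a space, a 5-digit generation number, a space, the keyword n (in use) or f (free), and a two-byte end of line. A sketch of one table plus trailer, assuming every object other than object 0 is in use; XrefSketch is hypothetical and does not cover the PDF 1.5 stream form.

```java
import java.util.Locale;

/** Hypothetical sketch of a classic cross-reference table plus trailer. */
final class XrefSketch {
    /** One subsection for objects start..start+length-1; offset[i] is where "i g obj" begins. */
    static String subsection(long[] offset, int[] gen, int start, int length) {
        StringBuilder sb = new StringBuilder();
        sb.append(start).append(' ').append(length).append('\n');
        for (int i = start; i < start + length; i++) {
            if (i == 0) {
                sb.append("0000000000 65535 f \n"); // object 0: head of the free list
            } else {
                // 10-digit offset, 5-digit generation, "n" = in use, " \n" EOL: 20 bytes in all
                sb.append(String.format(Locale.ROOT, "%010d %05d n \n", offset[i], gen[i]));
            }
        }
        return sb.toString();
    }

    /** Table for objects 0..size-1, then the trailer; xrefOffset is where "xref" starts. */
    static String classic(long[] offset, int[] gen, int size, String trailerDict, long xrefOffset) {
        return "xref\n" + subsection(offset, gen, 0, size)
                + "trailer\n" + trailerDict + "\nstartxref\n" + xrefOffset + "\n%%EOF\n";
    }
}
```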

writeXref

public void writeXref(Dict trailer,
                      int size,
                      long prev,
                      long[] offset,
                      int[] start,
                      int[] length)
               throws java.io.IOException
Writes cross reference of multiple sections and trailer.

Throws:
java.io.IOException
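The start[] and length[] arrays describe runs of consecutive object numbers, as when an incremental update writes only the objects that changed. A sketch of deriving them from a strictly increasing list of object numbers; XrefSections is a hypothetical helper, not part of PDFWriter.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits sorted object numbers into xref subsections. */
final class XrefSections {
    final int[] start;
    final int[] length;

    /** sortedObjnums must be strictly increasing. */
    XrefSections(int[] sortedObjnums) {
        List<int[]> runs = new ArrayList<>();          // each run is {start, length}
        for (int num : sortedObjnums) {
            int[] last = runs.isEmpty() ? null : runs.get(runs.size() - 1);
            if (last != null && num == last[0] + last[1]) {
                last[1]++;                             // extends the current subsection
            } else {
                runs.add(new int[] {num, 1});          // starts a new subsection
            }
        }
        start = new int[runs.size()];
        length = new int[runs.size()];
        for (int i = 0; i < runs.size(); i++) {
            start[i] = runs.get(i)[0];
            length[i] = runs.get(i)[1];
        }
    }
}
```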

writeObject

public long writeObject(java.lang.Object obj,
                        int objnum,
                        int objgen)
                 throws java.io.IOException
Writes a top-level object: n g obj contents endobj, with applicable encryption, respecting CryptFilter, if any.

Throws:
java.io.IOException

writeObject

public long writeObject(java.lang.Object obj,
                        int objnum,
                        int objgen,
                        boolean fplain,
                        Encrypt encrypt)
                 throws java.io.IOException
Low-level write of a top-level object: n g obj contents endobj, encrypting according to encrypt, which can be null for no encryption. Encryption follows the encrypt setting, ignoring any CryptFilter. Content streams should pass their data as a byte[] under the COS.STREAM_DATA key. If the object is a stream and no filter has been applied, applies Flate compression if that results in a smaller object.

Returns:
file offset of start of object, or 0 if the object has been deleted or object number is 0 (which is a special number reserved by PDF).
Throws:
java.io.IOException

writeObject

public java.lang.StringBuffer writeObject(java.lang.Object o,
                                          java.lang.StringBuffer sb,
                                          boolean fcrunch)
Writes contents of passed PDF object to StringBuffer that represents a content stream.

Returns:
same StringBuffer passed in

writeCommandArray

public byte[] writeCommandArray(Cmd[] cmds,
                                boolean prettyprint)
Writes command array back into a byte stream, skipping commands marked invalid. (Does not minimize whitespace as is done for top-level objects, since compression of the stream wipes out the advantage.)
  • normalize content stream line ends
  • optionally pretty print content stream: indent under BT..ET / q..Q
  • write floating point numbers compactly and limit resolution to useful range: "0.00" => "0", "5.248549334" => "5.24854"
  • write strings compactly: characters rather than hex, escaped '(' ')' only when unpaired
  • remove separate LZW or Flate compression on inline images as it's more efficient to compress as part of the overall content stream
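The string rule in the list above relies on PDF literal strings being allowed to contain balanced parentheses unescaped. A sketch of that rule, matching parentheses with a stack and escaping only the unmatched ones; PdfLiteral is hypothetical, and escaping backslashes and carriage returns follows the PDF Reference's literal-string rules rather than anything stated above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch: a PDF literal string with '(' ')' escaped only when unpaired. */
final class PdfLiteral {
    static String write(String s) {
        boolean[] paired = new boolean[s.length()];
        Deque<Integer> open = new ArrayDeque<>();     // indices of unmatched '('
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(') {
                open.push(i);
            } else if (c == ')' && !open.isEmpty()) {
                paired[open.pop()] = true;
                paired[i] = true;
            }
        }
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\') sb.append("\\\\");                            // backslash is always escaped
            else if (c == '\r') sb.append("\\r");                        // a raw CR would be read back as \n
            else if ((c == '(' || c == ')') && !paired[i]) sb.append('\\').append(c);
            else sb.append(c);                                           // balanced parens stay as-is
        }
        return sb.append(')').toString();
    }
}
```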


writeInlineImage

public java.lang.StringBuffer writeInlineImage(Dict params,
                                               byte[] data,
                                               java.lang.StringBuffer sb)
Writes inline image with image data into content stream sb.

See Also:
PDFReader.readInlineImage(InputStreamComposite)
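Inline images use the BI ... ID ... EI operators with abbreviated keys such as /W, /H, /BPC, and /CS. A sketch of the output for an uncompressed 8-bit gray image, assuming the StringBuffer holds one char per byte (ISO-8859-1); InlineImageSketch and writeGray are hypothetical and, unlike writeInlineImage, take no parameter dictionary.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of inline-image syntax appended to a content stream. */
final class InlineImageSketch {
    static StringBuffer writeGray(int width, int height, byte[] data, StringBuffer sb) {
        sb.append("BI /W ").append(width).append(" /H ").append(height)
          .append(" /BPC 8 /CS /G ID ");                          // a single space ends "ID"
        sb.append(new String(data, StandardCharsets.ISO_8859_1)); // raw samples, one char per byte
        return sb.append("\nEI\n");                               // whitespace before "EI"
    }
}
```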

writePDF

public java.lang.Object writePDF(java.util.Observer observer)
                          throws java.io.IOException
Writes data in memory or base PDF to new PDF file, complete with header, xref table, and trailer. If a non-null observer is passed in, it is invoked after each PDF object is fully read but before it is written, with a two-element array consisting of the PDF object and the object number as an Integer. If asked to write onto an existing file, first writes to a temporary file, then deletes the original and renames the temporary file. If an object has not been instantiated, it is instantiated, written, and then cleared in order to free up memory, thus allowing PDFs of any size to be processed in a limited amount of memory.

Completely control writing by writing your own version of this method with these steps:

  1. writeHeader()
  2. writeObject(Object, int, int)s, keeping track of file offset for xref table
  3. writeXref(Dict,int,long, long[], int, int) table
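Step 2 needs the byte offset at which each object starts. PDFWriter's writeObject(Object, int, int) already returns that offset; the sketch below only shows the underlying bookkeeping with a byte-counting java.io.FilterOutputStream, assuming objects are pre-serialized strings written with generation 0. OffsetTracker and its nested class are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of recording each object's file offset for the xref table. */
final class OffsetTracker {
    /** Counts bytes written so far, i.e. the current file offset. */
    static final class CountingOutputStream extends FilterOutputStream {
        long count;

        CountingOutputStream(OutputStream out) { super(out); }

        @Override public void write(int b) throws IOException {
            out.write(b);
            count++;
        }

        @Override public void write(byte[] b, int off, int len) throws IOException {
            out.write(b, off, len);   // write through directly so bytes are counted once
            count += len;
        }
    }

    /** Writes bodies[i] as "i 0 obj ... endobj" and returns offset[]; offset[0] stays 0. */
    static long[] writeObjects(CountingOutputStream out, String[] bodies) throws IOException {
        long[] offset = new long[bodies.length];
        for (int i = 1; i < bodies.length; i++) {
            offset[i] = out.count;    // position of "i 0 obj"
            String s = i + " 0 obj\n" + bodies[i] + "\nendobj\n";
            out.write(s.getBytes(StandardCharsets.ISO_8859_1));
        }
        return offset;
    }
}
```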

If PDF version is 1.5 or later, writes trailer and xref as a stream, adding an object for it with the highest object number. If object streams are desired, they must be created beforehand, as by invoking makeObjectStreams(int,int).

If Compact PDF writing mode is set, writes the entire PDF in a format that puts almost every object in a single BZip2 or Flate stream. It is 30 to 60% more compact on large classes of PDF, but does not conform to the PDF 1.5 specification. It is readable with the PDFReader class, the Multivalent Browser, and the Multivalent tools; and it is valid PDF, so you can do incremental writes and so on. See Compact PDF format.

After writing, objects may have been mutated or deleted, and therefore should not be accessed.

Returns:
length of new PDF file (as a Long) or other object for special cases
Throws:
java.io.IOException

writePDF

public java.lang.Object writePDF()
                          throws java.io.IOException
Convenience method for writePDF(null).

Throws:
java.io.IOException

writeFDF

public void writeFDF()
              throws java.io.IOException
Writes contents in Forms Data Format (FDF).

Throws:
java.io.IOException

close

public void close()
           throws java.io.IOException
Closes PDFWriter and associated File or OutputStreamTee. If there was a backing PDFReader, any mutated objects are mutated in PDFReader as well, and therefore in most cases that PDFReader instance should be closed as well. After closing, the PDFWriter is invalid and should not be used or queried.

Throws:
java.io.IOException
