multivalent.std.adaptor.pdf
public class PDF extends MediaAdaptorRandom
The PDF content stream is translated into a Multivalent document tree as follows. The tree is live: reformat. Objects are drawn in the order they appear in the content stream, which usually but not necessarily follows reading order. To see the document tree for any particular PDF page, turn on the Debug switch in the Help menu, then select Debug/View Document Tree.
Text blocks (BT..ET) have subtrees rooted at a FixedI with name "text".
Under that can be any number of lines, which collect text that has been determined to share the same baseline, in FixedIHBoxs named "line".
(Some PDF generators produce an inordinate number of BT..ET blocks; for instance, one version of pdfTeX generated a block for each dot in a table of contents between header and page number. Most generators, however, use them for meaningful blocks of text.)
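The grouping of text by shared baseline can be pictured with a minimal sketch. It is not the adaptor's actual code: the class `BaselineGrouping`, the type `Fragment`, and the `tolerance` parameter are hypothetical names introduced here, and the sketch assumes fragments arrive in content stream order.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: collect positioned text fragments into lines by baseline. */
final class BaselineGrouping {
    /** A piece of text and the y coordinate of its baseline (not a Multivalent class). */
    static final class Fragment {
        final String text;
        final double baseline;

        Fragment(String text, double baseline) {
            this.text = text;
            this.baseline = baseline;
        }
    }

    /** Consecutive fragments whose baselines differ by at most tolerance share a line. */
    static List<List<String>> groupIntoLines(List<Fragment> fragments, double tolerance) {
        List<List<String>> lines = new ArrayList<>();
        List<String> current = null;
        double currentBaseline = 0;
        for (Fragment f : fragments) {
            if (current == null || Math.abs(f.baseline - currentBaseline) > tolerance) {
                current = new ArrayList<>();   // baseline moved: start a new line
                lines.add(current);
                currentBaseline = f.baseline;
            }
            current.add(f.text);
        }
        return lines;
    }

    public static void main(String[] args) {
        List<Fragment> fragments = new ArrayList<>();
        fragments.add(new Fragment("Hello", 700.0));
        fragments.add(new Fragment("world", 700.2));
        fragments.add(new Fragment("Next", 688.0));
        System.out.println(groupIntoLines(fragments, 0.5));
    }
}
```

Each inner list plays the role of one "line" node in this sketch; a real implementation would also need to handle rotated text and fragments that return to an earlier baseline.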
PDF text streams are normalized to word chunks in FixedLeafAsciiKerns, with special kerning between letters, whether from TJ or Tz or small TD/Tm/..., stored in the leaf.
Text is translated into Unicode, from whatever original encoding (Macintosh, Macintosh Expert, Windows, PDF, Adobe Standard). However, if the encoding is nonstandard and found only in font tables, it is not translated.
Text content is available from the node via getName.
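As an illustration of normalizing text to word chunks with kerning kept per chunk, the following sketch splits the operand of a TJ operator into words. It is an assumption-laden outline, not the adaptor's algorithm: `WordChunker`, `Word`, `kernAfter`, and the `wordGap` threshold are hypothetical, and the TJ array is modeled as a list of strings and numbers in thousandths of a text space unit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: split a TJ array into words, keeping small kerns inside each word. */
final class WordChunker {
    /** A word plus the adjustment recorded after each of its characters. */
    static final class Word {
        final String text;
        final List<Double> kernAfter;

        Word(String text, List<Double> kernAfter) {
            this.text = text;
            this.kernAfter = kernAfter;
        }
    }

    /**
     * In a TJ array a number is subtracted from the position, so a large negative
     * number opens a gap. A gap of at least wordGap, or a space, ends the word.
     */
    static List<Word> chunk(List<Object> tjArray, double wordGap) {
        List<Word> words = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        List<Double> kerns = new ArrayList<>();
        for (Object item : tjArray) {
            if (item instanceof Number) {
                double adjust = ((Number) item).doubleValue();
                if (-adjust >= wordGap) {
                    flush(words, text, kerns);              // wide gap: word boundary
                } else if (!kerns.isEmpty()) {
                    int last = kerns.size() - 1;            // small kern: keep in the word
                    kerns.set(last, kerns.get(last) + adjust);
                }
            } else {
                for (char c : item.toString().toCharArray()) {
                    if (c == ' ') {
                        flush(words, text, kerns);
                    } else {
                        text.append(c);
                        kerns.add(0.0);
                    }
                }
            }
        }
        flush(words, text, kerns);
        return words;
    }

    private static void flush(List<Word> words, StringBuilder text, List<Double> kerns) {
        if (text.length() > 0) {
            words.add(new Word(text.toString(), new ArrayList<>(kerns)));
        }
        text.setLength(0);
        kerns.clear();
    }

    public static void main(String[] args) {
        List<Object> tj = Arrays.asList("A", 120, "W", -300, "to", "day is");
        for (Word w : chunk(tj, 250)) {
            System.out.println(w.text + " " + w.kernAfter);
        }
    }
}
```

The threshold of 250 in the usage example is arbitrary; a real implementation would presumably derive it from the font's space width.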
Images have their /Name as the GI, and inline images (BI..ID..EI) have the GI "[inline]".
... (cm, Td, ...) is not maintained.
Transformation matrices (cm, Tm) are reflected in final sizes and not maintained as separate objects.
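To make "reflected in final sizes" concrete, here is a small sketch using java.awt.geom.AffineTransform, the type returned by getTransform(). The class `FinalSize` and its methods are hypothetical; the sketch only shows how successive cm operators fold into one matrix from which a final position and size can be read, so that the individual operators need not be kept.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Rectangle2D;

/** Hypothetical sketch: fold cm operators into one matrix and read off the final size. */
final class FinalSize {
    /** Applies a PDF "a b c d e f cm" operator to the current transformation matrix. */
    static void cm(AffineTransform ctm, double a, double b, double c,
                   double d, double e, double f) {
        // concatenate() applies the new matrix before the existing one, as cm does.
        ctm.concatenate(new AffineTransform(a, b, c, d, e, f));
    }

    /** Bounds of the unit square, which is where PDF draws images, under the matrix. */
    static Rectangle2D imageBounds(AffineTransform ctm) {
        return ctm.createTransformedShape(new Rectangle2D.Double(0, 0, 1, 1)).getBounds2D();
    }

    public static void main(String[] args) {
        AffineTransform ctm = new AffineTransform();
        cm(ctm, 2, 0, 0, 2, 0, 0);        // scale everything by 2
        cm(ctm, 100, 0, 0, 50, 10, 20);   // place a 100 x 50 image at (10, 20)
        Rectangle2D r = imageBounds(ctm);
        System.out.println(r.getX() + " " + r.getY() + " " + r.getWidth() + " " + r.getHeight());
    }
}
```

In the usage example the image ends up at (20, 40) with size 200 x 100; only that result would need to be stored on the node.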
..., text rise (Ts), and text rendering mode (Tr) are all maintained as SpanPDFs.
Other attributes (line width, line cap style, line join style, miter limit, dash array, ...) are all maintained as SpanPDFs such that if several change at once they are batched in the same span, and if any of the group changes a new span is started, which means that only one span for these attributes is active at any point.
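The batching rule for these attributes might look roughly like the sketch below. It does not use the real SpanPDF class: `AttributeSpans`, `Span`, and the string encoding of operations ("draw" or "name=value") are hypothetical simplifications chosen to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: one active span at a time for a group of graphics attributes. */
final class AttributeSpans {
    /** Attribute values in effect over drawing operations [start, end). */
    static final class Span {
        final Map<String, String> attrs;
        final int start;
        int end;

        Span(Map<String, String> attrs, int start) {
            this.attrs = attrs;
            this.start = start;
        }
    }

    /** Each op is either "draw" or an attribute change written as "name=value". */
    static List<Span> build(List<String> ops) {
        List<Span> spans = new ArrayList<>();
        Map<String, String> state = new LinkedHashMap<>();
        boolean changed = false;   // any attribute change since the last draw?
        Span open = null;
        int drawn = 0;
        for (String op : ops) {
            if (op.equals("draw")) {
                if (changed) {
                    // Close the active span and open one holding the whole batch.
                    if (open != null) {
                        open.end = drawn;
                    }
                    open = new Span(new LinkedHashMap<>(state), drawn);
                    spans.add(open);
                    changed = false;
                }
                drawn++;
            } else {
                int eq = op.indexOf('=');
                state.put(op.substring(0, eq), op.substring(eq + 1));
                changed = true;
            }
        }
        if (open != null) {
            open.end = drawn;
        }
        return spans;
    }

    public static void main(String[] args) {
        List<String> ops = new ArrayList<>();
        ops.add("lineWidth=2");
        ops.add("lineCap=1");
        ops.add("draw");
        ops.add("draw");
        ops.add("lineWidth=3");
        ops.add("draw");
        for (Span s : build(ops)) {
            System.out.println(s.attrs + " [" + s.start + "," + s.end + ")");
        }
    }
}
```

Because every new span copies the full attribute state, spans never need to nest or overlap, which is the property described above.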
Sometimes a PDF generator produces redundant color/font/attribute changes (pdfTeX sets the color to 1 1 1 1 K and again immediately to 1 1 1 1 K) or useless changes (e.g., setting the color and then setting it to something else without drawing anything). All redundant and useless changes are optimized away.
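One way such an optimization could work is sketched below, under stated assumptions: operations are modeled as plain strings ("draw" or "name=value"), and `ChangeOptimizer` is a hypothetical class, not part of Multivalent. Changes are held back until something is actually drawn, so a value that is overwritten first, or that equals the value already in effect, is never emitted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: drop redundant and useless attribute changes. */
final class ChangeOptimizer {
    /** Each op is either "draw" or an attribute change written as "name=value". */
    static List<String> optimize(List<String> ops) {
        Map<String, String> effective = new HashMap<>();      // values emitted so far
        Map<String, String> pending = new LinkedHashMap<>();  // changes since the last draw
        List<String> out = new ArrayList<>();
        for (String op : ops) {
            if (op.equals("draw")) {
                for (Map.Entry<String, String> e : pending.entrySet()) {
                    // Emit only values that differ from what is already in effect.
                    if (!e.getValue().equals(effective.get(e.getKey()))) {
                        out.add(e.getKey() + "=" + e.getValue());
                        effective.put(e.getKey(), e.getValue());
                    }
                }
                pending.clear();
                out.add("draw");
            } else {
                int eq = op.indexOf('=');
                // A later change to the same attribute overwrites an undrawn earlier one.
                pending.put(op.substring(0, eq), op.substring(eq + 1));
            }
        }
        return out;   // changes after the final draw affect nothing and are dropped
    }

    public static void main(String[] args) {
        List<String> ops = new ArrayList<>();
        ops.add("color=1 1 1 1");
        ops.add("color=1 1 1 1");   // redundant: same value again
        ops.add("draw");
        ops.add("color=0 0 0 1");   // useless: replaced before anything is drawn
        ops.add("color=1 0 0 0");
        ops.add("draw");
        System.out.println(optimize(ops));
    }
}
```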
Marked points (MP/DP) are Marks, with the point name as the Mark name.
Marked regions (BMC/BDC..EMC) are simple Spans, with the region name as the Span name and with any region attributes in span attributes.
Clipping regions (W/W*) are FixedIClip.
Clipping regions cannot be enlarged (push the clip onto the graphics stack with q..Q to temporarily reduce it), but some PDF generators don't know this: useless clipping changes are optimized away.
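The rule that a clip can only shrink, and the detection of useless clip changes, can be sketched as follows. This is a simplification with hypothetical names (`ClipStack`, `clipTo`): real clipping paths are arbitrary shapes, whereas the sketch assumes axis-aligned rectangles so that the intersection is easy to compute and compare.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch: a rectangular clip that can only shrink, saved and restored by q..Q. */
final class ClipStack {
    private final Deque<Rectangle2D> saved = new ArrayDeque<>();
    private Rectangle2D clip;

    ClipStack(Rectangle2D page) {
        clip = page;
    }

    /** q: remember the current clip. */
    void save() {
        saved.push(clip);
    }

    /** Q: return to the clip in effect at the matching q. */
    void restore() {
        clip = saved.pop();
    }

    /**
     * W: the new clip is the intersection of the current clip and the region.
     * Returns false when the clip is unchanged, that is, when the request was useless.
     */
    boolean clipTo(Rectangle2D region) {
        Rectangle2D next = clip.createIntersection(region);
        if (next.equals(clip)) {
            return false;
        }
        clip = next;
        return true;
    }

    Rectangle2D current() {
        return clip;
    }

    public static void main(String[] args) {
        ClipStack stack = new ClipStack(new Rectangle2D.Double(0, 0, 612, 792));
        System.out.println(stack.clipTo(new Rectangle2D.Double(100, 100, 200, 200)));
        // An attempt to "enlarge" the clip back to the page leaves it unchanged.
        System.out.println(stack.clipTo(new Rectangle2D.Double(0, 0, 612, 792)));
    }
}
```

A caller could skip creating a clip node whenever clipTo returns false, which is one plausible reading of "optimized away".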
Invisible text (Tr 3 or overdrawn with image) is associated with the corresponding image fragment and transformed into FixedLeafOCR, and the independent image is removed.
(This allows hybrid image-OCR PDFs to work as expected with other behaviors, such as select and paste and the Show OCR lens.)
Version: $Revision: 1.149 $ $Date: 2004/02/05 06:12:41 $
Field Summary | |
---|---|
static boolean | GoFast Go fast or be exactly correct. |
static String | MSG_DUMP Message of semantic event to control dumping of the uncompressed and decrypted content stream to a temporary file. |
static String | MSG_GO_FAST Message "pdfSetGoFast": faster rendering if sometimes less accurate: arg=boolean or null to toggle. |
static String | MSG_OWNER_PASSWORD Message of semantic event to set the owner password so encrypted files can be read, with the password String passed in arg. |
static String | MSG_USER_PASSWORD Message of semantic event to set the user password so encrypted files can be read, with the password String passed in arg. |
static String | OCG_OFF |
static String | OCG_ON |
static String | VAR_OCG Optional content groups stored in Document under this key. |
Method Summary | |
---|---|
boolean | formatAfter(Node node) Enlarge content root to MediaBox. |
Rectangle | getCropBox() |
PDFReader | getReader() |
AffineTransform | getTransform() |
boolean | isAuthorized() |
Object | parse(INode parent) |
boolean | semanticEventAfter(SemanticEvent se, String msg) |
boolean | semanticEventBefore(SemanticEvent se, String msg) "Dump PDF to temp dir" in Debug menu. |
void | setPassword(String pw) |
void | setZoom(float percent) Set zoom/magnification percentage, from 25% to 1600%. |
Returns: root of PDF subtree under parent