Multivalent API
java.lang.Object
  multivalent.VObject
    multivalent.Behavior
      multivalent.MediaAdaptor
        multivalent.std.adaptor.pdf.PDF
Parse a page of PDF and display with Java 2's 2D API.
The PDF content stream is translated into a Multivalent document tree as follows. The tree is live: reformat. Objects are drawn in the order in which they appear in the content stream, which usually but not necessarily follows reading order. To see the document tree for any particular PDF page, turn on the Debug switch in the Help menu, then select Debug/View Document Tree.
Text blocks (BT..ET) have subtrees rooted at a FixedI with name "text". Under that can be any number of lines, each a FixedIHBox named "line" that collects text determined to share the same baseline. (Some PDF generators produce an inordinate number of BT..ET blocks, as for instance one version of pdfTeX generated a block for each dot in a table of contents between header and page number, but most generators use them for meaningful blocks of text.) PDF text streams are normalized to word chunks in FixedLeafUnicodeKerns, with special kerning between letters, whether from TJ or Tz or small TD/Tm/..., stored in the leaf. Text is translated into Unicode from whatever original encoding (Macintosh, Macintosh Expert, Windows, PDF, Adobe Standard). However, if the encoding is nonstandard and found only in font tables, it is not translated. Text content is available from the node via Node.getName().
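The baseline grouping described above can be illustrated with a small sketch. It is not the adaptor's implementation: the `Fragment` class, the `group` method, and the tolerance value are all hypothetical names introduced here, and the sketch only assumes that fragments arrive in content-stream order.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: collect text fragments that share a baseline into lines. */
final class BaselineGrouper {
    /** A positioned run of text (illustrative only, not a Multivalent class). */
    static final class Fragment {
        final String text;
        final double x;
        final double baseline;

        Fragment(String text, double x, double baseline) {
            this.text = text;
            this.x = x;
            this.baseline = baseline;
        }
    }

    /**
     * Walks fragments in stream order and starts a new line whenever the
     * baseline differs from the current line's first fragment by more than tolerance.
     */
    static List<List<Fragment>> group(List<Fragment> fragments, double tolerance) {
        List<List<Fragment>> lines = new ArrayList<>();
        for (Fragment f : fragments) {
            List<Fragment> current = lines.isEmpty() ? null : lines.get(lines.size() - 1);
            if (current != null && Math.abs(current.get(0).baseline - f.baseline) <= tolerance) {
                current.add(f);
            } else {
                List<Fragment> line = new ArrayList<>();
                line.add(f);
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        List<Fragment> frags = new ArrayList<>();
        frags.add(new Fragment("Hello", 72, 700));
        frags.add(new Fragment("world", 110, 700));
        frags.add(new Fragment("Next", 72, 686));
        for (List<Fragment> line : group(frags, 0.5)) {
            StringBuilder sb = new StringBuilder();
            for (Fragment f : line) sb.append(f.text).append(' ');
            System.out.println(sb.toString().trim());
        }
    }
}
```

Comparing against the first fragment of the current line, rather than the previous fragment, keeps small drifts from accumulating across a long line.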
Images are stored as FixedLeafImages. The BufferedImage is available via LeafImage.getImage(), and the image's colorspace via BufferedImage.getColorModel(). Images from XObjects have the reference /Name as the GI, and inline images (BI..ID..EI) have the GI "[inline]".
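As an illustration of reading the colorspace from a BufferedImage, the following sketch uses only standard JDK calls (`BufferedImage.getColorModel()`, `ColorModel.getColorSpace()`, `ColorSpace.getType()`). The class `ImageColorInfo` and the method `describe` are hypothetical names; in the viewer, the image would presumably come from LeafImage.getImage() rather than being constructed directly.

```java
import java.awt.color.ColorSpace;
import java.awt.image.BufferedImage;
import java.awt.image.ColorModel;

/** Hypothetical helper: report the kind of color space behind a BufferedImage. */
final class ImageColorInfo {
    static String describe(BufferedImage img) {
        ColorModel cm = img.getColorModel();
        int type = cm.getColorSpace().getType();
        if (type == ColorSpace.TYPE_GRAY) return "gray";
        if (type == ColorSpace.TYPE_RGB) return "rgb";
        if (type == ColorSpace.TYPE_CMYK) return "cmyk";
        return "other";
    }

    public static void main(String[] args) {
        // Stand-in images; a real caller would obtain the image from the document tree.
        BufferedImage rgb = new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB);
        BufferedImage gray = new BufferedImage(4, 4, BufferedImage.TYPE_BYTE_GRAY);
        System.out.println(describe(rgb));
        System.out.println(describe(gray));
    }
}
```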
Paths are stored as FixedLeafShapes, with fill and stroke flags. Paths are kept as simple Line2D with GI "line" or Rectangle with GI "rect" if possible, else GeneralPath with GI "path".
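One way to decide between "line", "rect", and "path" is to walk the path's segments with a `PathIterator`. The sketch below is an assumption about how such a classification could be done, not the adaptor's actual code; `PathClassifier` and `classify` are hypothetical names, and coordinates are compared exactly for brevity.

```java
import java.awt.geom.Path2D;
import java.awt.geom.PathIterator;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: pick the simplest GI ("line", "rect", "path") for a path. */
final class PathClassifier {
    static String classify(Path2D path) {
        List<double[]> pts = new ArrayList<>();
        boolean closed = false;
        double[] c = new double[6];
        for (PathIterator it = path.getPathIterator(null); !it.isDone(); it.next()) {
            int seg = it.currentSegment(c);
            if (seg == PathIterator.SEG_MOVETO) {
                if (!pts.isEmpty()) return "path";   // more than one subpath
                pts.add(new double[] {c[0], c[1]});
            } else if (seg == PathIterator.SEG_LINETO) {
                if (closed) return "path";           // drawing continues after close
                pts.add(new double[] {c[0], c[1]});
            } else if (seg == PathIterator.SEG_CLOSE) {
                closed = true;
            } else {
                return "path";                       // quadratic or cubic curve
            }
        }
        if (pts.size() == 2 && !closed) return "line";
        if (closed && isAxisAlignedRect(pts)) return "rect";
        return "path";
    }

    private static boolean isAxisAlignedRect(List<double[]> pts) {
        // Tolerate an explicit final lineTo back to the starting point.
        if (pts.size() == 5 && pts.get(0)[0] == pts.get(4)[0] && pts.get(0)[1] == pts.get(4)[1]) {
            pts = pts.subList(0, 4);
        }
        if (pts.size() != 4) return false;
        Boolean firstHorizontal = null;
        for (int i = 0; i < 4; i++) {
            double[] a = pts.get(i);
            double[] b = pts.get((i + 1) % 4);
            boolean horizontal = a[1] == b[1] && a[0] != b[0];
            boolean vertical = a[0] == b[0] && a[1] != b[1];
            if (!horizontal && !vertical) return false;
            if (firstHorizontal == null) firstHorizontal = horizontal;
            // Edges must alternate between horizontal and vertical.
            boolean expectHorizontal = (i % 2 == 0) == firstHorizontal;
            if (horizontal != expectHorizontal) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Path2D.Double p = new Path2D.Double();
        p.moveTo(0, 0);
        p.lineTo(10, 0);
        p.lineTo(10, 5);
        p.lineTo(0, 5);
        p.closePath();
        System.out.println(classify(p));
    }
}
```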
Every object's position is recorded in its Node.bbox, but the command positioning it there (cm, Td, ...) is not maintained. Transformation matrices (cm, Tm) are reflected in final sizes and not maintained as separate objects.
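To make "reflected in final sizes" concrete, the sketch below bakes a transformation matrix into a bounding box and then discards the matrix. It uses the standard `java.awt.geom.AffineTransform`; the class `BakedBounds` and method `bake` are hypothetical, and the example matrix values are made up. It relies on the fact that the six numbers of a PDF `cm` operand map in order onto the six-argument AffineTransform constructor.

```java
import java.awt.Rectangle;
import java.awt.geom.AffineTransform;
import java.awt.geom.Rectangle2D;

/** Hypothetical sketch: apply a matrix once and keep only the resulting bounds. */
final class BakedBounds {
    /** Returns the integer bounding box of {@code shape} after applying {@code ctm}. */
    static Rectangle bake(AffineTransform ctm, Rectangle2D shape) {
        return ctm.createTransformedShape(shape).getBounds();
    }

    public static void main(String[] args) {
        // "200 0 0 100 72 500 cm": scale the unit square to 200x100 and move it to (72, 500).
        AffineTransform ctm = new AffineTransform(200, 0, 0, 100, 72, 500);
        Rectangle bbox = bake(ctm, new Rectangle2D.Double(0, 0, 1, 1));
        System.out.println(bbox);
    }
}
```

After this step only the rectangle survives, which mirrors the statement that the positioning commands themselves are not kept in the tree.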
Colors are maintained as SpanPDFs, and all colors are translated into RGB. Fonts (family, size, style), text rise (Ts), and text rendering mode (Tr) are all maintained as SpanPDFs. Other attributes (line width, line cap style, line join style, miter limit, dash array, ...) are all maintained as SpanPDFs such that if several change at once they are batched in the same span, and if any of the group changes a new span is started, which means that only one span for these attributes is active at any point. Sometimes a PDF generator produces redundant color/font/attribute changes (pdfTeX sets the color to 1 1 1 1 K and again immediately to 1 1 1 1 K) or useless changes (e.g., setting the color and then setting it to something else without drawing anything); all redundant and useless changes are optimized away.
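The elimination of redundant and useless changes can be sketched as a single pass over a simplified operator list. This is only an illustration under stated assumptions: operators are modeled as plain strings, anything starting with `color ` is a state change, everything else is a drawing operator, and `ChangeOptimizer` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: drop color changes that repeat the active color or are never used. */
final class ChangeOptimizer {
    static List<String> optimize(List<String> ops) {
        List<String> out = new ArrayList<>();
        String active = null;   // last color change actually emitted
        String pending = null;  // latest color change not yet needed by a drawing operator
        for (String op : ops) {
            if (op.startsWith("color ")) {
                pending = op;   // a later change overrides an unused earlier one
            } else {
                if (pending != null && !pending.equals(active)) {
                    out.add(pending);
                    active = pending;
                }
                pending = null;
                out.add(op);
            }
        }
        return out;             // a trailing change with nothing drawn after it is dropped
    }

    public static void main(String[] args) {
        List<String> ops = Arrays.asList(
            "color 1 1 1 1 K", "color 1 1 1 1 K", "draw a",
            "color 0 0 0 1 K", "color 1 1 1 1 K", "draw b",
            "color 0 1 0 0 K");
        System.out.println(optimize(ops));
    }
}
```

Deferring each change until a drawing operator needs it handles both cases at once: a repeat compares equal to the active value, and an unused change is overwritten before it is ever emitted.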
Marked points (MP/DP) are Marks, with the point name as the Mark name. Marked regions (BMC/BDC..EMC) are simple Spans, with the region name as the Span name and with any region attributes in span attributes.
Clipping regions (W/W*) are FixedIClip. Clipping regions cannot be enlarged (push the clip onto the graphics stack with q..Q to temporarily reduce it), but some PDF generators don't know this: useless clipping changes are optimized away.
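The rule that a clip can only shrink, and that q..Q restores a larger clip, can be modeled with a stack of rectangles. The sketch below is a simplification (real clipping paths are arbitrary shapes), and `ClipStack` with its methods is a hypothetical class, not part of the Multivalent API.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch: rectangular clip that can only be reduced, with q/Q save and restore. */
final class ClipStack {
    private Rectangle2D clip;
    private final Deque<Rectangle2D> saved = new ArrayDeque<>();

    ClipStack(Rectangle2D page) {
        this.clip = page;
    }

    /** q: remember the current clip. */
    void save() {
        saved.push(clip);
    }

    /** Q: return to the clip in effect at the matching save. */
    void restore() {
        if (!saved.isEmpty()) clip = saved.pop();
    }

    /**
     * W: intersect the clip with a region. Returns false when the region does not
     * reduce the clip, i.e. the change is useless and could be optimized away.
     */
    boolean clipTo(Rectangle2D region) {
        Rectangle2D next = clip.createIntersection(region);
        if (next.equals(clip)) return false;
        clip = next;
        return true;
    }

    Rectangle2D current() {
        return clip;
    }

    public static void main(String[] args) {
        ClipStack s = new ClipStack(new Rectangle2D.Double(0, 0, 612, 792));
        System.out.println(s.clipTo(new Rectangle2D.Double(-10, -10, 1000, 1000)));
        s.save();
        System.out.println(s.clipTo(new Rectangle2D.Double(100, 100, 50, 50)));
        s.restore();
        System.out.println(s.current());
    }
}
```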
Shadings are FixedLeafShade.
Document StyleSheet.
Hidden text (Tr 3 or overdrawn with image) is associated with the corresponding image fragment and transformed into FixedLeafOCR, and the independent image is removed. (This allows hybrid image-OCR PDFs to work as expected with other behaviors, such as select and paste and the Show OCR lens.)
Annotations are sent as semantic events with message Anno.MSG_CREATE. Other behaviors can translate them into entities on the document tree, often spans.
Field Summary | | |
---|---|---|
static boolean | GoFast | Go fast or be exactly correct. |
static java.lang.String | MSG_DUMP | Message of semantic event to control dumping of uncompressed and decrypted content stream to temporary file. |
static java.lang.String | MSG_GO_FAST | Message "pdfSetGoFast": faster rendering if sometimes less accurate: arg=boolean or null to toggle. |
static java.lang.String | MSG_OWNER_PASSWORD | Message of semantic event to set the owner password so encrypted files can be read, with the password String passed in arg. |
static java.lang.String | MSG_USER_PASSWORD | Message of semantic event to set the user password so encrypted files can be read, with the password String passed in arg. |
static java.lang.String | OCG_OFF | |
static java.lang.String | OCG_ON | |
static java.lang.String | VAR_OCG | Optional content groups stored in Document under this key. |
Fields inherited from class multivalent.MediaAdaptor |
---|
HINT_DEFAULTS, HINT_EXACT, HINT_FAST, HINT_METADATA_ONLY, HINT_NO_IMAGE, HINT_NO_INTERACTIVE, HINT_NO_LAYOUT, HINT_NO_SHAPE, HINT_NO_SHOW, HINT_NO_STYLE, HINT_NO_TEXT, HINT_NO_TRANSCLUSION, HINT_NONE, HINT_NORMALIZE |
Fields inherited from class multivalent.Behavior |
---|
ATTR_BEHAVIOR, name_ |
Fields inherited from class multivalent.VObject |
---|
attr_ |
Constructor Summary | |
---|---|
PDF() | |
Method Summary | | |
---|---|---|
void | close() | Close media adaptor, freeing any resources. |
boolean | formatAfter(Node node) | Enlarge content root to MediaBox. |
java.awt.Rectangle | getCropBox() | |
java.util.Map<java.lang.String,java.lang.Object> | getForm() | Returns interactive form as Map with keys the fully qualified field names. |
java.awt.Rectangle | getMediaBox() | |
PDFReader | getReader() | |
java.awt.geom.AffineTransform | getTransform() | |
boolean | isAuthorized() | |
java.lang.Object | parse(INode parent) | Parses individual page indicated in Document.ATTR_PAGE of parent's containing Document and returns formatted document tree rooted at parent as described above. |
boolean | semanticEventAfter(SemanticEvent se, java.lang.String msg) | Implements MSG_DUMP, MSG_USER_PASSWORD, MSG_OWNER_PASSWORD. |
boolean | semanticEventBefore(SemanticEvent se, java.lang.String msg) | "Dump PDF to temp dir" in Debug menu. |
void | setPassword(java.lang.String pw) | |
void | setReader(PDFReader pdfr) | |
Methods inherited from class multivalent.MediaAdaptor |
---|
buildBefore, destroy, getHints, getInputUni, getURI, getZoom, isStopped, parseHelper, setHints, setInput, setInput, setZoom, stop |
Methods inherited from class multivalent.Behavior |
---|
buildAfter, checkRep, clipboardAfter, clipboardBefore, createUI, eventAfter, eventBefore, formatBefore, getBrowser, getDocument, getInstance, getInstance, getLayer, getLogger, getName, getPreference, getRoot, isEditable, paintAfter, paintBefore, putPreference, redo, restore, restoreChildren, save, setName, toString, undo |
Methods inherited from class multivalent.VObject |
---|
attrEntrySetIterator, attrKeysIterator, clearAttributes, getAttr, getAttr, getAttributes, getGlobal, getValue, hasAttributes, putAttr, removeAttr, setAttributes |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Field Detail |
---|
public static final java.lang.String MSG_GO_FAST
public static final java.lang.String MSG_OWNER_PASSWORD
public static final java.lang.String MSG_USER_PASSWORD
public static final java.lang.String MSG_DUMP
public static final java.lang.String VAR_OCG
Optional content groups stored in Document under this key. The value there is a Map with names of optional content groups as keys and OCG_ON and OCG_OFF as values.
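The shape of the Map described for VAR_OCG can be sketched as follows. The string values assigned to `OCG_ON` and `OCG_OFF` here are placeholders, since the actual constant values are not given on this page, and `OcgStateSketch` and `visibleGroups` are hypothetical names; real code would use the PDF class constants and fetch the Map from the Document.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a group-name to on/off Map like the one stored under VAR_OCG. */
final class OcgStateSketch {
    // Placeholder values: the real strings behind PDF.OCG_ON and PDF.OCG_OFF are an assumption.
    static final String OCG_ON = "on";
    static final String OCG_OFF = "off";

    /** Returns the names of the groups whose value is OCG_ON, in Map iteration order. */
    static List<String> visibleGroups(Map<String, String> groups) {
        List<String> on = new ArrayList<>();
        for (Map.Entry<String, String> e : groups.entrySet()) {
            if (OCG_ON.equals(e.getValue())) on.add(e.getKey());
        }
        return on;
    }

    public static void main(String[] args) {
        Map<String, String> groups = new LinkedHashMap<>();
        groups.put("Watermark", OCG_OFF);   // made-up group names
        groups.put("Dimensions", OCG_ON);
        System.out.println(visibleGroups(groups));
    }
}
```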
public static final java.lang.String OCG_ON
public static final java.lang.String OCG_OFF
public static boolean GoFast
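The MSG_GO_FAST argument convention ("arg=boolean or null to toggle") amounts to the small decision below. This is a sketch of that convention only; `GoFastToggle` and `apply` are hypothetical, and leaving the flag unchanged for an unrecognized argument is an assumption rather than documented behavior.

```java
/** Hypothetical sketch of the "boolean or null to toggle" argument convention. */
final class GoFastToggle {
    static boolean apply(boolean current, Object arg) {
        if (arg == null) return !current;                 // null toggles
        if (arg instanceof Boolean) return (Boolean) arg; // explicit value wins
        return current;                                   // assumption: ignore anything else
    }

    public static void main(String[] args) {
        boolean goFast = true;
        goFast = apply(goFast, null);
        System.out.println(goFast);
        goFast = apply(goFast, Boolean.TRUE);
        System.out.println(goFast);
    }
}
```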
Constructor Detail |
---|
public PDF()
Method Detail |
---|
public void setReader(PDFReader pdfr) throws java.io.IOException
Throws: java.io.IOException
public PDFReader getReader()
public boolean isAuthorized()
Overrides: isAuthorized in class MediaAdaptor
public void setPassword(java.lang.String pw)
Overrides: setPassword in class MediaAdaptor
public java.awt.Rectangle getMediaBox()
public java.awt.Rectangle getCropBox()
public java.awt.geom.AffineTransform getTransform()
public java.util.Map<java.lang.String,java.lang.Object> getForm() throws java.io.IOException
Returns interactive form as Map with keys the fully qualified field names. This Map represents the current settings of the form, as modified by the user or by PDF actions (reset, import) or programmatically.
Throws: java.io.IOException
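In PDF, a fully qualified field name is formed by joining the partial names from the root of the field hierarchy down to the field, separated by periods. The sketch below flattens a nested structure into a Map keyed that way; it is an illustration only, not how getForm() is implemented, and `FieldNames`, `flatten`, and the sample field names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: flatten a field hierarchy into a Map keyed by dotted names. */
final class FieldNames {
    /** Nested Maps are treated as parent fields; any other value is a terminal field value. */
    static Map<String, Object> flatten(Map<String, Object> tree) {
        Map<String, Object> flat = new LinkedHashMap<>();
        walk("", tree, flat);
        return flat;
    }

    @SuppressWarnings("unchecked")
    private static void walk(String prefix, Map<String, Object> node, Map<String, Object> flat) {
        for (Map.Entry<String, Object> e : node.entrySet()) {
            String name = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            if (e.getValue() instanceof Map) {
                walk(name, (Map<String, Object>) e.getValue(), flat);
            } else {
                flat.put(name, e.getValue());
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("city", "Berkeley");
        address.put("zip", "94720");
        Map<String, Object> form = new LinkedHashMap<>();
        form.put("name", "Ada");
        form.put("address", address);
        System.out.println(flatten(form).keySet());
    }
}
```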
public java.lang.Object parse(INode parent) throws java.io.IOException, ParseException
Parses individual page indicated in Document.ATTR_PAGE of parent's containing Document and returns formatted document tree rooted at parent as described above.
Overrides: parse in class MediaAdaptor
Throws: java.io.IOException, ParseException
for a convenient way to attach spans
public boolean formatAfter(Node node)
Overrides: formatAfter in class Behavior
public boolean semanticEventBefore(SemanticEvent se, java.lang.String msg)
Overrides: semanticEventBefore in class Behavior
public boolean semanticEventAfter(SemanticEvent se, java.lang.String msg)
Implements MSG_DUMP, MSG_USER_PASSWORD, MSG_OWNER_PASSWORD.
Overrides: semanticEventAfter in class MediaAdaptor
public void close() throws java.io.IOException
Description copied from class: MediaAdaptor
Close media adaptor, freeing any resources.
Overrides: close in class MediaAdaptor
Throws: java.io.IOException