Multivalent API

multivalent
Class MediaAdaptor

java.lang.Object
  extended by multivalent.VObject
      extended by multivalent.Behavior
          extended by multivalent.MediaAdaptor
Direct Known Subclasses:
AppleWorksWP, ASCII, DirectoryFTP, DirectoryLocal, FontSampler, HexDump, JavaClass, ManualPage, ML, MultivalentAdaptor, PDF, PerlPOD, RawImage, RPM, Tar, Texinfo, Unsupported, Zip

public abstract class MediaAdaptor
extends Behavior

Superclass for media adaptors: behaviors that parse some concrete document format and build a document tree. As much as possible, errors in the input document should be corrected and old constructs modernized so that all the many behaviors that operate on the document tree benefit. Media adaptors should include complete data in the tree, so that a document in the original format can be reconstructed without loss of information. The reconstructed file will not necessarily be identical to the original, because errors are corrected on read and output is pretty printed on write.

New media adaptors can be linked in with the core system by associating a MIME Content-Type header and/or file suffix with the class name in Preferences.

Media adaptors by default display a document accurately, with perfect fidelity sacrificed only if very expensive to achieve. Applications that do not require this fidelity can speed execution by declaring hints. For example, HINT_NO_IMAGE suggests to a media adaptor that images will not be needed, so it can operate faster by not creating them. Media adaptors are free to ignore hints.

External Use

Media adaptors can be used externally to extract text, determine layout coordinates, convert to another document format, convert the document to an image by painting onto the image's Graphics2D, and for other purposes.

Parsing

  1. Create instance: MediaAdaptor ma = (MediaAdaptor)Behavior.getInstance(String, String, Map, Layer)
  2. setInput(InputUni).
  3. optionally setHints(int)
  4. parse(INode) to obtain a document tree, which can be inspected (e.g., to extract text), formatted (e.g., to obtain the layout geometry of HTML), painted (e.g., to convert a PDF page into an image and save that to disk), ...
  5. optionally, obtain metadata
  6. close()
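
The steps above can be sketched as follows. This is a minimal, self-contained illustration of the call order only: SketchAdaptor, UpperCaseAdaptor, and LifecycleSketch are hypothetical stand-ins, not Multivalent classes. A String stands in for InputUni, a List<String> stands in for the INode tree, and step 1 (Behavior.getInstance) is replaced by a plain constructor call. The close() call sits in a finally block because the client must close the adaptor even after an exception.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical stand-in mirroring the documented call order; not the Multivalent API. */
interface SketchAdaptor {
    void setInput(String input);                        // stands in for setInput(InputUni)
    void setHints(int hints);                           // stands in for setHints(int)
    Object parse(List<String> parent) throws Exception; // stands in for parse(INode)
    void close();                                       // stands in for close()
}

/** Toy adaptor: each input line becomes one upper-cased "leaf" under parent. */
final class UpperCaseAdaptor implements SketchAdaptor {
    private String input;
    private int hints;      // stored but ignored; adaptors are free to ignore hints
    boolean closed = false;

    public void setInput(String input) { this.input = input; }
    public void setHints(int hints) { this.hints = hints; }

    public Object parse(List<String> parent) throws Exception {
        if (input == null) throw new IllegalStateException("call setInput before parse");
        for (String line : input.split("\n")) parent.add(line.toUpperCase(Locale.ROOT));
        return parent;      // no single root in this toy format, so return parent
    }

    public void close() { closed = true; }
}

final class LifecycleSketch {
    /** Runs steps 2 through 6 against the stand-in adaptor and returns the leaves. */
    static List<String> run(UpperCaseAdaptor ma, String text) {
        List<String> tree = new ArrayList<>();
        try {
            ma.setInput(text);  // step 2
            ma.setHints(0);     // step 3 (optional); 0 means "no hints" in this sketch
            ma.parse(tree);     // step 4
            // step 5 (optional): metadata would be read here
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        } finally {
            ma.close();         // step 6: always close, even after an exception
        }
        return tree;
    }

    public static void main(String[] args) {
        System.out.println(run(new UpperCaseAdaptor(), "hello\nworld"));
    }
}
```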

Formatting ...

Painting on screen or image (Graphics2D) ...

Examples: tool.doc.ExtractText. Often a String toHTML() method is defined.

Version:
$Revision: 1.11 $ $Date: 2005/05/01 03:36:53 $
See Also:
ParseException

Field Summary
static int HINT_DEFAULTS
          By default all hints are off: display complete document with perfect fidelity.
static int HINT_EXACT
          Require exact display no matter the computation cost.
static int HINT_FAST
          Favor fast display at possible expense of some accuracy.
static int HINT_METADATA_ONLY
          Read enough to extract metadata, but save time by ignoring content.
static int HINT_NO_IMAGE
          Results need not include images, so there may be no need to create them.
static int HINT_NO_INTERACTIVE
          No interaction by user: clicking, typing.
static int HINT_NO_LAYOUT
          Document tree need not be formatted.
static int HINT_NO_SHAPE
          Results need not include drawn shapes (e.g., rectangles, ellipses, splines).
static int HINT_NO_SHOW
          Document tree will not be shown (on screen or painted), but may be queried.
static int HINT_NO_STYLE
          Results need not record or apply styling (e.g., fonts, colors, line widths).
static int HINT_NO_TEXT
          Results need not include text.
static int HINT_NO_TRANSCLUSION
          Do not incorporate transclusions, such as HTML IFRAME and man page .so.
static int HINT_NONE
          No hints: instantiate full, high quality document content.
static int HINT_NORMALIZE
          Normalize metadata to Dublin Core where applicable.
 
Fields inherited from class multivalent.Behavior
ATTR_BEHAVIOR, name_
 
Fields inherited from class multivalent.VObject
attr_
 
Constructor Summary
MediaAdaptor()
           
 
Method Summary
 void buildBefore(Document doc)
          Parses the concrete document format with parse(INode) and puts the result into the tree.
 void close()
          Close media adaptor, freeing any resources.
 void destroy()
          Protocol.
 int getHints()
           
protected  com.pt.io.InputUni getInputUni()
           
 java.net.URI getURI()
          Returns the logical URI of the document (the data may come from a cache or elsewhere).
 float getZoom()
           
 boolean isAuthorized()
           
 boolean isStopped()
           
abstract  java.lang.Object parse(INode parent)
          Parses a document's data format and constructs a document tree.
static java.lang.Object parseHelper(java.lang.String txt, java.lang.String adaptor, Layer layer, INode parent)
          It is recommended that media adaptors construct document trees that directly and fully represent the document format.
 boolean semanticEventAfter(SemanticEvent se, java.lang.String msg)
          On Document.MSG_STOP, sets the stop flag, which subclasses have to check periodically.
 void setHints(int hints)
          Set document tree construction hints for media adaptor to bit-wise OR of hint flags.
 void setInput(java.io.File f)
           
 void setInput(com.pt.io.InputUni iu)
           
 void setPassword(java.lang.String pw)
           
 void setZoom(float zoom)
          Sets the zoom factor for the associated document, where 1.0 is the natural size and 1.25 is 25% larger.
 void stop()
           
 
Methods inherited from class multivalent.Behavior
buildAfter, checkRep, clipboardAfter, clipboardBefore, createUI, eventAfter, eventBefore, formatAfter, formatBefore, getBrowser, getDocument, getInstance, getInstance, getLayer, getLogger, getName, getPreference, getRoot, isEditable, paintAfter, paintBefore, putPreference, redo, restore, restoreChildren, save, semanticEventBefore, setName, toString, undo
 
Methods inherited from class multivalent.VObject
attrEntrySetIterator, attrKeysIterator, clearAttributes, getAttr, getAttr, getAttributes, getGlobal, getValue, hasAttributes, putAttr, removeAttr, setAttributes
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

HINT_NONE

public static final int HINT_NONE
No hints: instantiate full, high quality document content.

See Also:
Constant Field Values

HINT_METADATA_ONLY

public static final int HINT_METADATA_ONLY
Read enough to extract metadata, but save time by ignoring content.

See Also:
Constant Field Values

HINT_NO_TEXT

public static final int HINT_NO_TEXT
Results need not include text.

See Also:
Constant Field Values

HINT_NO_IMAGE

public static final int HINT_NO_IMAGE
Results need not include images, so there may be no need to create them.

See Also:
Constant Field Values

HINT_NO_SHAPE

public static final int HINT_NO_SHAPE
Results need not include drawn shapes (e.g., rectangles, ellipses, splines).

See Also:
Constant Field Values

HINT_NO_STYLE

public static final int HINT_NO_STYLE
Results need not record or apply styling (e.g., fonts, colors, line widths).

See Also:
Constant Field Values

HINT_NO_TRANSCLUSION

public static final int HINT_NO_TRANSCLUSION
Do not incorporate transclusions, such as HTML IFRAME and man page .so. Useful for full-text indexing that scans each file.

See Also:
Constant Field Values

HINT_NO_LAYOUT

public static final int HINT_NO_LAYOUT
Document tree need not be formatted. This implies HINT_NO_SHOW.

See Also:
Constant Field Values

HINT_NO_SHOW

public static final int HINT_NO_SHOW
Document tree will not be shown (on screen or painted), but may be queried. This allows formatting with stubs: empty box for images and cheap no-display fonts with same metrics.

See Also:
Constant Field Values

HINT_NO_INTERACTIVE

public static final int HINT_NO_INTERACTIVE
No interaction by user: clicking, typing. HINT_NO_SHOW implies this hint.

See Also:
Constant Field Values
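
The documented implications (HINT_NO_LAYOUT implies HINT_NO_SHOW, which in turn implies HINT_NO_INTERACTIVE) can be made explicit by a client before it inspects a hint mask. The sketch below is an assumption-laden illustration: the class HintImplicationSketch and the bit values are hypothetical, and nothing here claims that MediaAdaptor performs this expansion itself.

```java
/** Expands implied hints; the bit values are made up for illustration. */
final class HintImplicationSketch {
    // Hypothetical values, chosen only so that each flag is a distinct bit.
    static final int HINT_NO_LAYOUT = 1 << 0;
    static final int HINT_NO_SHOW = 1 << 1;
    static final int HINT_NO_INTERACTIVE = 1 << 2;

    /** Returns hints with every documented implication added. */
    static int expand(int hints) {
        if ((hints & HINT_NO_LAYOUT) != 0) hints |= HINT_NO_SHOW;      // no layout, so no show
        if ((hints & HINT_NO_SHOW) != 0) hints |= HINT_NO_INTERACTIVE; // no show, so no interaction
        return hints;
    }

    public static void main(String[] args) {
        int expanded = expand(HINT_NO_LAYOUT);
        System.out.println((expanded & HINT_NO_INTERACTIVE) != 0);
    }
}
```

The order of the two checks matters: testing HINT_NO_LAYOUT first lets the second check pick up the HINT_NO_SHOW bit that was just added.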

HINT_NORMALIZE

public static final int HINT_NORMALIZE
Normalize metadata to Dublin Core where applicable.

See Also:
Constant Field Values

HINT_EXACT

public static final int HINT_EXACT
Require exact display no matter the computation cost. Content is put into the document tree, even if it is not visible. Use this flag if the tree is the basis for translation to another format.

See Also:
Constant Field Values

HINT_FAST

public static final int HINT_FAST
Favor fast display at possible expense of some accuracy.

See Also:
Constant Field Values

HINT_DEFAULTS

public static final int HINT_DEFAULTS
By default all hints are off: display complete document with perfect fidelity.

See Also:
Constant Field Values
Constructor Detail

MediaAdaptor

public MediaAdaptor()
Method Detail

setInput

public void setInput(com.pt.io.InputUni iu)
              throws java.io.IOException
Throws:
java.io.IOException

setInput

public void setInput(java.io.File f)
              throws java.io.IOException
Throws:
java.io.IOException

getInputUni

protected com.pt.io.InputUni getInputUni()

getURI

public java.net.URI getURI()
Returns the logical URI of the document (the data may come from a cache or elsewhere).


setZoom

public void setZoom(float zoom)
Sets the zoom factor for the associated document, where 1.0 is the natural size and 1.25 is 25% larger. The media adaptor is free to uniformly scale all objects or just fonts or another interpretation.
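
As a worked example of the zoom factor, the sketch below applies the uniform-scaling interpretation: a size is multiplied by the zoom. ZoomSketch is a hypothetical helper, not part of Multivalent, and a media adaptor may interpret zoom differently (for instance, scaling only fonts).

```java
/** Uniform-scaling interpretation of a zoom factor; a hypothetical helper. */
final class ZoomSketch {
    /** Returns naturalSize scaled by zoom, where zoom 1.0 leaves the size unchanged. */
    static float scale(float naturalSize, float zoom) {
        return naturalSize * zoom;
    }

    public static void main(String[] args) {
        // A 12-point font at zoom 1.25 is 25% larger.
        System.out.println(scale(12f, 1.25f)); // prints 15.0
    }
}
```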


getZoom

public float getZoom()

getHints

public int getHints()

setHints

public void setHints(int hints)
Set document tree construction hints for media adaptor to bit-wise OR of hint flags.
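
A minimal sketch of building and testing a hint mask with bit-wise OR and AND follows. The class HintFlagsSketch and its constant values are hypothetical; the real values are the ones listed under Constant Field Values. With the real class, the equivalent call would pass something like HINT_NO_IMAGE | HINT_NO_STYLE to setHints(int).

```java
/** Bit-flag sketch; the constant values here are made up for illustration. */
final class HintFlagsSketch {
    // Hypothetical values: each flag occupies its own bit so flags can be OR'ed together.
    static final int HINT_NONE = 0;
    static final int HINT_NO_IMAGE = 1 << 0;
    static final int HINT_NO_SHAPE = 1 << 1;
    static final int HINT_NO_STYLE = 1 << 2;

    /** Returns true if flag is present in the hints mask. */
    static boolean isSet(int hints, int flag) {
        return (hints & flag) != 0;
    }

    public static void main(String[] args) {
        int hints = HINT_NO_IMAGE | HINT_NO_STYLE;       // combine two hints
        System.out.println(isSet(hints, HINT_NO_IMAGE)); // prints true
        System.out.println(isSet(hints, HINT_NO_SHAPE)); // prints false
    }
}
```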


parse

public abstract java.lang.Object parse(INode parent)
                                throws java.lang.Exception
Parses a document's data format and constructs a document tree. Structure is represented in internal nodes and content (text, images, video, ...) at the leaves.

Before using, invoke setInput(InputUni). The newly constructed document tree should attach to parent. The parent is usually but not necessarily a Document. Paginated documents should build the current page only, as indicated by the attribute Document.ATTR_PAGE, and report their page count to Document.ATTR_PAGECOUNT. Metadata, such as author and dates, should be stored in the closest containing Document.

If an unfixable or unrecoverable parsing error is encountered, usually due to an invalid data format, a ParseException is thrown. (This does not supersede IOException.) When the media adaptor is done or has thrown an exception, the client must close() it.

Subclasses should not rely on being able to obtain a Root, Browser, or Multivalent; in such cases it is acceptable to reduce functionality.

Returns:
whatever Object is appropriate to the media adaptor. For HTML it is the root of the HTML tree (which has name "html"), for documents with no single root it can be parent, for an image constructor it could be an Image. However, the primary job of a media adaptor is to add content to the document tree.
Throws:
java.lang.Exception
See Also:
for a convenient way to attach spans

parseHelper

public static java.lang.Object parseHelper(java.lang.String txt,
                                           java.lang.String adaptor,
                                           Layer layer,
                                           INode parent)
It is recommended that media adaptors construct document trees that directly and fully represent the document format. However, it can be expedient to write a quick-and-dirty converter into another document format, such as Perl POD to HTML. In that case, the converter can generate the target format and pass it to this method, which converts it to a document tree.
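
To illustrate the quick-and-dirty approach, here is a minimal sketch of a converter that turns a small subset of Perl POD into an HTML string. PodToHtmlSketch is hypothetical and handles only =head1, =head2, =cut, and plain paragraphs; it is not the PerlPOD adaptor's implementation. The resulting string is the kind of text that would then be handed to parseHelper as txt, along with adaptor, layer, and parent arguments that this sketch does not try to supply.

```java
/** Toy POD-to-HTML converter covering a tiny subset of POD; for illustration only. */
final class PodToHtmlSketch {
    /** Converts =head1/=head2 lines to headings and other non-empty lines to paragraphs. */
    static String toHtml(String pod) {
        StringBuilder sb = new StringBuilder("<html><body>\n");
        for (String line : pod.split("\n")) {
            String t = line.trim();
            if (t.isEmpty() || t.equals("=cut")) continue;  // skip blanks and =cut
            if (t.startsWith("=head1 ")) {
                sb.append("<h1>").append(escape(t.substring(7))).append("</h1>\n");
            } else if (t.startsWith("=head2 ")) {
                sb.append("<h2>").append(escape(t.substring(7))).append("</h2>\n");
            } else {
                sb.append("<p>").append(escape(t)).append("</p>\n");
            }
        }
        return sb.append("</body></html>").toString();
    }

    /** Escapes the characters that are significant in HTML text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        String html = toHtml("=head1 NAME\n\nfoo - a <tool>\n");
        System.out.println(html);
        // With Multivalent available, html would be passed as the txt argument of parseHelper.
    }
}
```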


isAuthorized

public boolean isAuthorized()

setPassword

public void setPassword(java.lang.String pw)

isStopped

public boolean isStopped()

stop

public void stop()

close

public void close()
           throws java.io.IOException
Close media adaptor, freeing any resources.

Throws:
java.io.IOException

buildBefore

public void buildBefore(Document doc)
Parses the concrete document format with parse(INode) and puts the result into the tree. Subclasses should set their style sheets first and then parse the document body, so that the page can be rendered progressively, if applicable.

Overrides:
buildBefore in class Behavior
See Also:
Mark

semanticEventAfter

public boolean semanticEventAfter(SemanticEvent se,
                                  java.lang.String msg)
On Document.MSG_STOP, sets the stop flag, which subclasses have to check periodically.

Overrides:
semanticEventAfter in class Behavior
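
The periodic check of the stop flag amounts to cooperative cancellation. The sketch below shows one way a subclass's parse loop might poll the flag. StoppableParseSketch is a hypothetical stand-in, not a Multivalent class, and the stopAfter parameter merely simulates a stop request arriving mid-parse.

```java
/** Cooperative-stop sketch: the parse loop polls a flag that stop() sets. */
final class StoppableParseSketch {
    // volatile so that a stop() call from another thread is seen by the parse loop
    private volatile boolean stopped = false;

    void stop() { stopped = true; }

    boolean isStopped() { return stopped; }

    /**
     * Processes up to totalItems items, checking the stop flag before each one.
     * stopAfter simulates a stop request after that many items; pass -1 for none.
     * Returns the number of items actually processed.
     */
    int parse(int totalItems, int stopAfter) {
        int done = 0;
        for (int i = 0; i < totalItems; i++) {
            if (isStopped()) break;         // periodic check
            done++;                         // "parse" one item
            if (done == stopAfter) stop();  // simulated stop request
        }
        return done;
    }

    public static void main(String[] args) {
        System.out.println(new StoppableParseSketch().parse(10, 3)); // prints 3
    }
}
```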

destroy

public void destroy()
Description copied from class: Behavior
Protocol. Cleans up state before being decommissioned: remove from Layer, observed nodes, .... Clients shouldn't hold a pointer/handle to object after destroy() as it is in an invalid state. This protocol cannot be short-circuited.

Overrides:
destroy in class Behavior
