multivalent

Class MediaAdaptor

public abstract class MediaAdaptor extends Behavior implements Runnable

Superclass for behaviors that parse some concrete document format and build a document tree. As much as possible, errors in the input document should be corrected and old constructs modernized, so that all the many behaviors that operate on the document tree benefit. Media adaptors should include complete data in the tree, so that a document in the original format can be reconstructed without loss of information. The reconstructed file will not necessarily be identical to the original, because of error correction during read and pretty printing on write.

New media adaptors can be linked in with the core system by associating a MIME Content-Type header and/or file suffix with the class name in Preferences.
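The association described above can be pictured as a lookup table from content type or suffix to class name. The following is a minimal sketch only: the class `AdaptorRegistrySketch`, the adaptor class name, and the rule that Content-Type takes precedence over suffix are all assumptions made for illustration, and the actual Preferences syntax is not shown here.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Illustrative stand-in for the Preferences lookup; not the Multivalent API. */
class AdaptorRegistrySketch {
    private final Map<String, String> byContentType = new HashMap<>();
    private final Map<String, String> bySuffix = new HashMap<>();

    /** Either key may be null, matching the "and/or" in the text above. */
    void register(String contentType, String suffix, String className) {
        if (contentType != null) byContentType.put(contentType.toLowerCase(Locale.ROOT), className);
        if (suffix != null) bySuffix.put(suffix.toLowerCase(Locale.ROOT), className);
    }

    /** Assumed precedence: Content-Type first, then file suffix. Returns null if nothing matches. */
    String lookup(String contentType, String fileName) {
        if (contentType != null) {
            // Drop parameters such as "; charset=utf-8" before matching.
            int semi = contentType.indexOf(';');
            String bare = (semi >= 0 ? contentType.substring(0, semi) : contentType)
                    .trim().toLowerCase(Locale.ROOT);
            String cls = byContentType.get(bare);
            if (cls != null) return cls;
        }
        if (fileName != null) {
            int dot = fileName.lastIndexOf('.');
            if (dot >= 0 && dot < fileName.length() - 1) {
                return bySuffix.get(fileName.substring(dot + 1).toLowerCase(Locale.ROOT));
            }
        }
        return null;
    }

    public static void main(String[] args) {
        AdaptorRegistrySketch reg = new AdaptorRegistrySketch();
        // "example.adaptor.PodAdaptor" is a hypothetical class name.
        reg.register("text/x-pod", "pod", "example.adaptor.PodAdaptor");
        System.out.println(reg.lookup("text/x-pod; charset=utf-8", null));
        System.out.println(reg.lookup(null, "manual.POD"));
    }
}
```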

Media adaptors by default display a document accurately, with perfect fidelity sacrificed only if very expensive to do so. Applications not requiring this can speed execution by declaring hints. For example, HINT_NO_IMAGE suggests to a media adaptor that images won't be needed, so it can operate faster by not creating them. Media adaptors are free to ignore hints.

External Use

Media adaptors can be used externally to extract text, determine layout coordinates, convert to another document format, convert the document to an image by painting onto the image's Graphics2D, and for other purposes.

Parsing

  1. Create instance: MediaAdaptor ma = (MediaAdaptor)Behavior
  2. optionally set docURI field (required if document contains relative references)
  3. setInputStream. Subclasses can also take MediaAdaptorChar or setFile.
  4. parse to obtain a document tree, which can be inspected (e.g., to extract text), formatted (e.g., to obtain the layout geometry of HTML), painted (e.g., to convert a PDF page into an image and save that to disk), ...
  5. closeInputStream
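The five steps above can be sketched as follows. This is a self-contained illustration rather than real Multivalent code: `LineAdaptorSketch` is a hypothetical stand-in that only mirrors the member names docURI, setInputStream, parse, and closeInputStream. It does not extend MediaAdaptor, and a plain list stands in for the parent node.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in mirroring the method names above; not the real MediaAdaptor. */
class LineAdaptorSketch {
    URI docURI;
    private InputStream in;

    void setInputStream(InputStream is) { in = is; }

    /** Toy parse: appends one leaf per non-empty line to the caller-supplied parent. */
    Object parse(List<String> parent) throws IOException {
        if (in == null) throw new IllegalStateException("call setInputStream first");
        String all = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        for (String line : all.split("\\R")) {
            if (!line.isEmpty()) parent.add(line);
        }
        return parent;
    }

    void closeInputStream() throws IOException {
        if (in != null) { in.close(); in = null; }
    }

    /** Runs the five steps in order and returns the extracted lines. */
    static List<String> extract(String text, URI base) {
        LineAdaptorSketch ma = new LineAdaptorSketch();   // 1. create instance
        ma.docURI = base;                                 // 2. optional base for relative references
        ma.setInputStream(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8))); // 3.
        List<String> parent = new ArrayList<>();
        try {
            try {
                ma.parse(parent);                         // 4. build the "tree"
            } finally {
                ma.closeInputStream();                    // 5. always release the stream
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return parent;
    }

    public static void main(String[] args) {
        // prints [first line, second line]
        System.out.println(extract("first line\n\nsecond line\n", URI.create("http://example.org/doc.txt")));
    }
}
```

Closing the stream in a finally block is a design choice of this sketch, so that step 5 still runs when parsing fails.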

Formatting ...

Painting on screen or image (java.awt.Graphics2D) ...

Examples: tool.doc.ExtractText. Often a String toHTML() method is defined.

Version: $Revision: 1.8 $ $Date: 2003/06/02 05:08:46 $

Field Summary
URI docURI
static int HINT_DEFAULTS
By default all hints are off: display complete document with perfect fidelity.
static int HINT_EXACT
Require exact display no matter the computation cost.
static int HINT_FAST
Favor fast display at possible expense of some accuracy.
static int HINT_NO_DISPLAY
Document tree will not be displayed (formatted or painted).
static int HINT_NO_IMAGE
Results should not include images, so there is no need to create them.
static int HINT_NO_SHAPE
Results should not include drawn shapes (rectangles, ellipses, splines).
static int HINT_NO_TEXT
Results should not include text.
static int HINT_NO_TRANSCLUSIONS
Do not incorporate transclusions.
Method Summary
void buildAfter(Document doc)
void buildBefore(Document doc)
Parse the concrete document format and put it into the tree.
void closeInputStream()
int getHints()
protected InputStream getInputStream()
boolean isAuthorized()
boolean isStopped()
abstract Object parse(INode parent)
Translate from a document's data format into a document tree, with structure represented in internal nodes and content (text, images, video, ...) at the leaves.
static Object parseHelper(String txt, String adaptor, Layer layer, INode parent)
It is recommended that media adaptors construct document trees that directly and fully represent the document format.
void run()
boolean semanticEventAfter(SemanticEvent se, String msg)
On MSG_STOP, sets the stop flag, which subclasses have to check periodically.
void setHints(int hints)
Set document tree construction hints for media adaptor to bit-wise OR of hint flags.
void setInputStream(InputStream is)
Close media adaptor, freeing any resources. => Behavior.destroy public void close() { }
void setInputStream(String txt)
void setPassword(String pw)
void stop()

Field Detail

docURI

public URI docURI

HINT_DEFAULTS

public static final int HINT_DEFAULTS
By default all hints are off: display complete document with perfect fidelity.

HINT_EXACT

public static final int HINT_EXACT
Require exact display no matter the computation cost. Content is put into the document tree, even if it is not visible. Use this flag if the tree is the basis for translation to another format.

HINT_FAST

public static final int HINT_FAST
Favor fast display at possible expense of some accuracy.

HINT_NO_DISPLAY

public static final int HINT_NO_DISPLAY
Document tree will not be displayed (formatted or painted).

HINT_NO_IMAGE

public static final int HINT_NO_IMAGE
Results should not include images, so there is no need to create them.

HINT_NO_SHAPE

public static final int HINT_NO_SHAPE
Results should not include drawn shapes (rectangles, ellipses, splines).

HINT_NO_TEXT

public static final int HINT_NO_TEXT
Results should not include text.

HINT_NO_TRANSCLUSIONS

public static final int HINT_NO_TRANSCLUSIONS
Do not incorporate transclusions. Useful for full-text indexing that scans each file.

Method Detail

buildAfter

public void buildAfter(Document doc)

buildBefore

public void buildBefore(Document doc)
Parse the concrete document format and put it into the tree. Subclasses should set their style sheets first, then parse the document body, so that the page can be rendered progressively, if applicable.

closeInputStream

public void closeInputStream()

getHints

public int getHints()

getInputStream

protected InputStream getInputStream()

isAuthorized

public boolean isAuthorized()

isStopped

public boolean isStopped()

parse

public abstract Object parse(INode parent)
Translate from a document's data format into a document tree, with structure represented in internal nodes and content (text, images, video, ...) at the leaves.

Before using, invoke setInputStream. The newly constructed document tree should attach to parent. The parent is usually but not necessarily a Document. Paginated documents should build the current page only, as indicated by the attribute ATTR_PAGE, and report their page count to ATTR_PAGECOUNT. Metadata, such as author and dates, should be stored in the closest containing Document. On encountering an unfixable/unrecoverable parsing error, usually due to an invalid data format, throw a ParseException. (This does not supersede java.io.IOException.)

Subclasses should not rely on being able to obtain a Root, Browser, or Multivalent; in such cases it is acceptable to reduce functionality.

Returns: whatever Object is appropriate to the media adaptor. For HTML it is the root of the HTML tree (which has name "html"); for documents with no single root it can be parent; for an image constructor it could be a java.awt.Image. However, the primary job of a media adaptor is to add content to the document tree.

See Also: for a convenient way to attach spans
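The tree shape that parse is described as building, with structure in internal nodes and content at the leaves, can be sketched as below. The classes `Internal` and `TextLeaf` are hypothetical stand-ins invented for this example; they are not the Multivalent INode and leaf classes, and the toy parse handles only pre-split paragraphs. The sketch shows the new subtree attaching to the supplied parent and the "html" root being returned, as described for HTML above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical tree types for illustration; not the Multivalent node classes. */
class TreeSketch {
    static abstract class Node {
        final String name;
        Node(String name) { this.name = name; }
        abstract void collectText(StringBuilder out);
    }

    /** Internal node: carries structure only. Attaches itself to its parent, if any. */
    static final class Internal extends Node {
        final List<Node> children = new ArrayList<>();
        Internal(String name, Internal parent) {
            super(name);
            if (parent != null) parent.children.add(this);
        }
        void collectText(StringBuilder out) {
            for (Node c : children) c.collectText(out);
        }
    }

    /** Leaf: carries content, here a single word of text. */
    static final class TextLeaf extends Node {
        TextLeaf(String text, Internal parent) {
            super(text);
            parent.children.add(this);
        }
        void collectText(StringBuilder out) {
            if (out.length() > 0) out.append(' ');
            out.append(name);
        }
    }

    /** Toy parse: attaches an "html" subtree under parent and returns its root. */
    static Internal parse(Internal parent, String... paragraphs) {
        Internal html = new Internal("html", parent);
        Internal body = new Internal("body", html);
        for (String p : paragraphs) {
            Internal para = new Internal("p", body);
            for (String word : p.trim().split("\\s+")) {
                if (!word.isEmpty()) new TextLeaf(word, para);
            }
        }
        return html;
    }

    /** Text extraction is a depth-first walk that concatenates leaf content. */
    static String text(Node n) {
        StringBuilder sb = new StringBuilder();
        n.collectText(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Internal doc = new Internal("document", null);
        Internal root = parse(doc, "Hello  world", "again");
        System.out.println(root.name + ": " + text(doc)); // prints html: Hello world again
    }
}
```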

parseHelper

public static Object parseHelper(String txt, String adaptor, Layer layer, INode parent)
It is recommended that media adaptors construct document trees that directly and fully represent the document format. However, it can be expedient to write a quick-and-dirty converter into another document format, such as Perl POD to HTML. In that case, the converter can generate the target format and pass it to this method to convert that to a document tree.
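A quick-and-dirty converter of the kind mentioned above might look like the following sketch. It is an illustration only: `PodToHtmlSketch` is a hypothetical class, and it handles just `=head1`, `=head2`, and ordinary paragraphs. Other POD commands are skipped, verbatim paragraphs are treated like ordinary ones, and inline codes such as `B<...>` are escaped as plain text. The resulting string is what would then be handed to parseHelper as txt; that call is not shown because the values expected for the adaptor and layer arguments are not specified here.

```java
/** Illustrative converter for a tiny subset of Perl POD; not part of Multivalent. */
class PodToHtmlSketch {
    /** Escape the characters that are significant in HTML text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static String toHtml(String pod) {
        StringBuilder sb = new StringBuilder("<html><body>\n");
        // POD paragraphs are separated by one or more blank lines.
        for (String block : pod.split("\\R\\s*\\R")) {
            String b = block.trim();
            if (b.isEmpty()) continue;
            if (b.startsWith("=head1 ")) {
                sb.append("<h1>").append(escape(b.substring(7).trim())).append("</h1>\n");
            } else if (b.startsWith("=head2 ")) {
                sb.append("<h2>").append(escape(b.substring(7).trim())).append("</h2>\n");
            } else if (b.startsWith("=")) {
                continue; // =pod, =cut, =over, =item, ... are ignored in this sketch
            } else {
                // Collapse internal line breaks so each paragraph is one line of HTML.
                sb.append("<p>").append(escape(b.replaceAll("\\s+", " "))).append("</p>\n");
            }
        }
        return sb.append("</body></html>\n").toString();
    }

    public static void main(String[] args) {
        String pod = "=head1 NAME\n\nfoo - a <tool>\nfor things\n\n=cut\n";
        System.out.print(toHtml(pod));
    }
}
```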

run

public void run()

semanticEventAfter

public boolean semanticEventAfter(SemanticEvent se, String msg)
On MSG_STOP, sets the stop flag, which subclasses have to check periodically.
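The periodic check can be sketched as a cooperative cancellation loop. This is an assumption-laden illustration: `StoppableParseSketch` is a hypothetical class, the flag is declared volatile here because a stop request may come from another thread (whether the real class does the same is not stated), and the stopAt parameter exists only to simulate a stop request arriving in the middle of a parse.

```java
/** Hypothetical stand-in showing a parse loop that polls a stop flag. */
class StoppableParseSketch {
    // volatile so that a stop() call from another thread becomes visible to the parsing loop
    private volatile boolean stopped = false;

    void stop() { stopped = true; }

    boolean isStopped() { return stopped; }

    /**
     * Processes up to total items, polling the flag every checkEvery items.
     * stopAt simulates a stop request arriving once that many items are done
     * (use -1 for "never"). Returns the number of items actually processed.
     */
    int parse(int total, int checkEvery, int stopAt) {
        int done = 0;
        for (int i = 0; i < total; i++) {
            if (i % checkEvery == 0 && isStopped()) break; // the periodic check
            done++;                                        // stand-in for real parsing work
            if (done == stopAt) stop();                    // simulated stop request
        }
        return done;
    }

    public static void main(String[] args) {
        // The stop arrives after 25 items but is only noticed at the next check, so 30 are processed.
        System.out.println(new StoppableParseSketch().parse(100, 10, 25)); // prints 30
    }
}
```

The check interval trades responsiveness for overhead: a smaller interval reacts to a stop sooner but polls more often.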

setHints

public void setHints(int hints)
Set document tree construction hints for media adaptor to bit-wise OR of hint flags.
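Combining and testing hint flags can be sketched as below. The numeric values are hypothetical, since the actual constant values are not listed on this page; the only properties relied on are that each hint occupies its own bit and that HINT_DEFAULTS means no hint is set. `HintFlagsSketch` is a stand-in, not the real class.

```java
/** Hypothetical flag values for illustration; the real constants may differ. */
class HintFlagsSketch {
    static final int HINT_DEFAULTS = 0;              // assumed: no bits set
    static final int HINT_NO_IMAGE = 1 << 0;         // assumed bit position
    static final int HINT_NO_SHAPE = 1 << 1;         // assumed bit position
    static final int HINT_NO_TRANSCLUSIONS = 1 << 2; // assumed bit position

    private int hints = HINT_DEFAULTS;

    void setHints(int hints) { this.hints = hints; }

    int getHints() { return hints; }

    /** An adaptor that honors the hint would skip image creation when this is false. */
    boolean wantsImages() { return (hints & HINT_NO_IMAGE) == 0; }

    boolean wantsShapes() { return (hints & HINT_NO_SHAPE) == 0; }

    public static void main(String[] args) {
        HintFlagsSketch ma = new HintFlagsSketch();
        // A full-text indexer needs neither images nor transclusions.
        ma.setHints(HINT_NO_IMAGE | HINT_NO_TRANSCLUSIONS);
        System.out.println(ma.wantsImages()); // prints false
        System.out.println(ma.wantsShapes()); // prints true
    }
}
```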

setInputStream

public void setInputStream(InputStream is)
Close media adaptor, freeing any resources. => Behavior.destroy public void close() { }

setInputStream

public void setInputStream(String txt)

setPassword

public void setPassword(String pw)

stop

public void stop()