Multivalent API

multivalent.std.adaptor.pdf
Class OCR

java.lang.Object
  extended by multivalent.std.adaptor.pdf.OCR

public class OCR
extends java.lang.Object

Normalize OCR + text, which can be implemented in various ways, into a document tree with hybrid image-text leaves (FixedLeafOCRs). Various ways to implement OCR + text in PDF:

  1. image: full page, over content only (not margins), patches, or strips the width of the screen but only 100 pixels high, so that a single page needs 30 separate FAX images
  2. text: invisible text (text rendering mode 3, Tr 3), or white text on a white background
  3. image + text: image over text, image under text, or image under recognized text that has been removed from the background image (as in maps)
  4. explicit background, which would erase background images drawn by the Xdoc strategy
  5. small bounding boxes from Capture, resulting in clipped endpoints
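Several of the variants above rely on PDF text rendering mode 3, set with the Tr operator, which paints glyphs with neither fill nor stroke. As a rough illustration, the following Java sketch walks a content-stream fragment and reports the rendering mode in effect at each text-showing operator (Tj, TJ, ' and "), honoring q/Q save and restore of the graphics state. The class TextRenderModes and its methods are hypothetical and not part of Multivalent; the sketch assumes an already decoded content stream and uses a simplified tokenizer that only understands literal strings and array brackets.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical helper, not part of Multivalent: finds which text runs are
// drawn invisibly (rendering mode 3), as in many OCR'd PDFs.
public class TextRenderModes {

    // Splits a content stream into tokens; literal strings "(...)" stay whole.
    static List<String> tokenize(String s) {
        List<String> out = new ArrayList<>();
        int i = 0, n = s.length();
        while (i < n) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            if (c == '(') {                       // literal string: balanced parens, backslash escapes
                int depth = 0, start = i;
                do {
                    char d = s.charAt(i);
                    if (d == '\\') i++;           // skip the escaped character
                    else if (d == '(') depth++;
                    else if (d == ')') depth--;
                    i++;
                } while (i < n && depth > 0);
                out.add(s.substring(start, Math.min(i, n)));
            } else if (c == '[' || c == ']') {    // array delimiters are their own tokens
                out.add(String.valueOf(c));
                i++;
            } else {                              // operator, number or name: run until a delimiter
                int start = i++;
                while (i < n && !Character.isWhitespace(s.charAt(i))
                        && "()[]".indexOf(s.charAt(i)) < 0) i++;
                out.add(s.substring(start, i));
            }
        }
        return out;
    }

    // Returns the rendering mode in effect at each text-showing operator.
    static List<Integer> modesForShownText(String contentStream) {
        List<Integer> modes = new ArrayList<>();
        Deque<Integer> saved = new ArrayDeque<>(); // q pushes, Q pops the graphics state
        int mode = 0;                              // initial rendering mode is 0 (fill)
        String prev = null;
        for (String tok : tokenize(contentStream)) {
            switch (tok) {
                case "q": saved.push(mode); break;
                case "Q": if (!saved.isEmpty()) mode = saved.pop(); break;
                case "Tr": mode = Integer.parseInt(prev); break; // operand precedes operator
                case "Tj": case "TJ": case "'": case "\"": modes.add(mode); break;
                default: break;
            }
            prev = tok;
        }
        return modes;
    }

    public static void main(String[] args) {
        // A visible title, an invisible OCR text layer inside q ... Q, then visible text again.
        String stream = "BT /F1 12 Tf (Title) Tj ET q BT 3 Tr (recognized words) Tj ET Q BT (after) Tj ET";
        System.out.println(modesForShownText(stream));
    }
}
```

Running main prints [0, 3, 0]: only the middle run is invisible, the pattern typical of a recognized-text layer laid over a page image. Tracking q/Q matters because the rendering mode is part of the graphics state, so a Q can silently switch text back to visible.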

If the content is determined to be a scanned-paper chunk, convert it to the method used in Xdoc.
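How a scanned-paper chunk is recognized is not specified here. One plausible heuristic, sketched below in Java purely as an assumption, treats a page as scanned when images cover most of its area and the page also carries text runs. The names ScanHeuristic, Box, imageCoverage, looksScanned and the 0.9 cut-off MIN_COVERAGE are all hypothetical. Images are assumed not to overlap, so their areas, clipped to the page, are simply summed; this also handles pages assembled from many thin strips.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical heuristic, not Multivalent's actual test: a page whose area is
// mostly covered by images and which also has text looks like image + OCR text.
public class ScanHeuristic {

    // Axis-aligned rectangle in page coordinates (points).
    record Box(double x0, double y0, double x1, double y1) {
        double area() { return Math.max(0, x1 - x0) * Math.max(0, y1 - y0); }

        // Intersection; disjoint boxes yield a box with zero area.
        Box intersect(Box o) {
            return new Box(Math.max(x0, o.x0), Math.max(y0, o.y0),
                           Math.min(x1, o.x1), Math.min(y1, o.y1));
        }
    }

    // Hypothetical cut-off: fraction of the page that images must cover.
    static final double MIN_COVERAGE = 0.9;

    // Sums image areas clipped to the page; assumes images do not overlap.
    static double imageCoverage(Box page, List<Box> images) {
        double covered = 0;
        for (Box img : images) covered += img.intersect(page).area();
        return covered / page.area();
    }

    static boolean looksScanned(Box page, List<Box> images, int textRuns) {
        return textRuns > 0 && imageCoverage(page, images) >= MIN_COVERAGE;
    }

    public static void main(String[] args) {
        Box page = new Box(0, 0, 612, 792);        // US Letter in points
        // 30 full-width strips stacked to cover the page, as in FAX-strip scans
        List<Box> strips = new ArrayList<>();
        for (int i = 0; i < 30; i++) strips.add(new Box(0, i * 26.4, 612, (i + 1) * 26.4));
        System.out.println(looksScanned(page, strips, 120));
        System.out.println(looksScanned(page, List.of(), 120));
    }
}
```

Running main prints true and then false: the stacked strips cover the whole page, while a page with text but no images is treated as ordinary text. A real detector would also need to handle overlapping images, for example by computing the area of their union.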

Version:
$Revision: 1.9 $ $Date: 2003/08/29 04:10:18 $

Field Summary
static java.lang.String VAR_LAYER
          Key to Layer for OCR-specific behaviors.
Constructor Summary
OCR()
Method Summary
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail

VAR_LAYER

public static final java.lang.String VAR_LAYER
Key to Layer for OCR-specific behaviors.

See Also:
Constant Field Values

Constructor Detail

OCR

public OCR()
