multivalent.std.adaptor.pdf

Class OCR

public class OCR extends Object

Normalizes OCR + text, which PDF producers implement in various ways, into a document tree with hybrid image-text leaves (FixedLeafOCRs). Ways in which OCR + text is implemented in PDF:
  1. image: full page; over content only (not margins); patches; or strips the width of the screen but only 100 pixels high, so that a page needs 30 separate FAX images
  2. text: invisible text (text rendering mode 3, set with "3 Tr"), or white text on a white background
  3. image + text: image over text; image under text; or image under recognized text, with that text removed from the background image (as in maps)
  4. explicit background, which would erase background images drawn in the Xdoc strategy
  5. small bounding boxes from Capture, resulting in clipped endpoints

If a chunk is determined to be scanned paper, it is converted to the method used in Xdoc.
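
As an illustration of one signal such a normalizer could look for, the sketch below scans an already-decoded PDF content stream for the Tr operator and reports whether text rendering mode 3 (invisible text) is ever set. It is not taken from this class: the class name InvisibleTextScan and the methods renderModes and usesInvisibleText are hypothetical, and the scanner is deliberately simplified (it skips literal strings and comments but does not handle inline image data).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Multivalent: finds "n Tr" operators
// in a decoded PDF content stream.
final class InvisibleTextScan {

    /** Returns the integer operand of every Tr operator, in stream order. */
    static List<Integer> renderModes(String content) {
        List<Integer> modes = new ArrayList<>();
        String prev = null;                 // token immediately before the current one
        int i = 0, n = content.length();
        while (i < n) {
            char c = content.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            if (c == '(') {                 // literal string: skip, honoring nesting and escapes
                int depth = 1;
                i++;
                while (i < n && depth > 0) {
                    char s = content.charAt(i++);
                    if (s == '\\') i++;     // skip the escaped character
                    else if (s == '(') depth++;
                    else if (s == ')') depth--;
                }
                prev = null;                // a string is never a Tr operand
                continue;
            }
            if (c == '%') {                 // comment: skip to end of line
                while (i < n && content.charAt(i) != '\n' && content.charAt(i) != '\r') i++;
                continue;
            }
            int start = i++;                // ordinary token: read up to whitespace or a delimiter
            while (i < n && !Character.isWhitespace(content.charAt(i))
                    && "()<>[]{}/%".indexOf(content.charAt(i)) < 0) i++;
            String tok = content.substring(start, i);
            if (tok.equals("Tr") && prev != null) {
                try {
                    modes.add(Integer.parseInt(prev));
                } catch (NumberFormatException e) {
                    // malformed operand; ignored in this sketch
                }
            }
            prev = tok;
        }
        return modes;
    }

    /** True if the stream ever switches to rendering mode 3 (neither fill nor stroke). */
    static boolean usesInvisibleText(String content) {
        return renderModes(content).contains(3);
    }

    public static void main(String[] args) {
        String stream = "BT /F1 12 Tf 3 Tr (hidden OCR text) Tj ET";
        System.out.println(usesInvisibleText(stream));
    }
}
```

Invisible text alone does not establish that a page is scanned paper; a real classifier would presumably combine it with other evidence, such as a page-sized image drawn on the same page.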

Version: $Revision: 1.9 $ $Date: 2003/08/29 04:10:18 $

Field Summary
static String VAR_LAYER
Key to Layer for OCR-specific behaviors.

Field Detail

VAR_LAYER

public static final String VAR_LAYER
Key to Layer for OCR-specific behaviors.