multivalent.std.adaptor.pdf
Class OCR
public class OCR
extends Object
Normalize OCR + text, which can be implemented in various ways, into a document tree with hybrid image-text leaves (FixedLeafOCRs).
Various ways to implement OCR + text in PDF:
- image: full page; over content only (not margins); patches; strips the width of the screen but only 100 pixels high, so that a page needs 30 separate FAX images
- text: invisible text (Tr 3), white text on white background
- image + text: image over text, image under text, image under recognized text that's removed from background image (as in maps)
- explicit background, which would erase background images drawn under the Xdoc strategy
- small bounding boxes from Capture, resulting in clipped endpoints
If a chunk is determined to be scanned paper, convert it to the method used in Xdoc.
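The "invisible text (Tr 3)" case in the list above refers to PDF text rendering mode 3, in which glyphs are neither filled nor stroked. The following is a minimal sketch of how such text might be detected in an already-decoded content stream. It is not this class's implementation: the class name InvisibleTextSketch and its methods are hypothetical, and the tokenizer is deliberately naive (it handles parenthesized string literals and q/Q state saving, but not comments, inline images, or text hidden by other means such as white fill).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical helper for illustration; not part of the multivalent API. */
public class InvisibleTextSketch {

    /** Splits a content stream into tokens, keeping each (...) string literal whole. */
    static List<String> tokenize(String stream) {
        List<String> tokens = new ArrayList<>();
        int i = 0, n = stream.length();
        while (i < n) {
            char c = stream.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            int start = i;
            if (c == '(') {
                // Consume a literal string, honoring nested parentheses and backslash escapes.
                int depth = 0;
                while (i < n) {
                    char d = stream.charAt(i);
                    if (d == '\\') { i += 2; continue; }
                    if (d == '(') depth++;
                    else if (d == ')') depth--;
                    i++;
                    if (depth == 0) break;
                }
                if (i > n) i = n; // unterminated escape at end of input
            } else {
                // Consume up to the next whitespace or the start of a string literal.
                while (i < n && !Character.isWhitespace(stream.charAt(i))
                        && stream.charAt(i) != '(') i++;
            }
            tokens.add(stream.substring(start, i));
        }
        return tokens;
    }

    /** Returns true if any text-showing operator runs while rendering mode 3 is active. */
    static boolean hasInvisibleText(String stream) {
        Deque<Integer> saved = new ArrayDeque<>(); // modes saved by q, restored by Q
        int mode = 0;                              // default rendering mode: fill
        String prev = null;                        // operand preceding the current token
        for (String tok : tokenize(stream)) {
            switch (tok) {
                case "Tr":
                    try {
                        mode = Integer.parseInt(prev);
                    } catch (NumberFormatException e) {
                        // Malformed or missing operand: keep the current mode.
                    }
                    break;
                case "q":
                    saved.push(mode);
                    break;
                case "Q":
                    if (!saved.isEmpty()) mode = saved.pop();
                    break;
                case "Tj": case "TJ": case "'": case "\"":
                    if (mode == 3) return true;
                    break;
                default:
                    break;
            }
            prev = tok;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasInvisibleText("BT /F1 12 Tf 3 Tr (scanned words) Tj ET"));
        System.out.println(hasInvisibleText("BT /F1 12 Tf (plain words) Tj ET"));
    }
}
```

A page whose content stream both draws an image and shows text only in mode 3 is a plausible candidate for the "image + text" cases listed above, though deciding that reliably would need more context than this sketch inspects.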
Version: $Revision: 1.9 $ $Date: 2003/08/29 04:10:18 $
public static final String VAR_LAYER
Key to Layer for OCR-specific behaviors.