tool
public class ExtractText extends Object
Run as a command-line tool, or call the methods of ExtractText from another program.
Version: $Revision: 1.11 $ $Date: 2003/08/29 06:07:31 $
See Also: java.text.BreakIterator
Field Summary | |
---|---|
static String | USAGE |
static String | VERSION |
Constructor Summary | |
---|---|
ExtractText() |
Method Summary | |
---|---|
void | defaults() |
String | extract(URI uri, String mimeType) Return java.lang.StringBuffer with text of document at uri. |
void | extractFlow(Node top, StringBuffer sb) Traverse document tree and extract text. |
static void | extractFlowFixed(Node top, StringBuffer sb) Extract text in same flow as document tree but track coordinates, which is apt for PDF. |
static void | extractFlowStruct(Node n, StringBuffer sb) Extract text by following structure in document tree, which is apt for HTML. |
static void | extractLayout(Node top, StringBuffer sb) Preserve layout as much as possible in straight ASCII. |
static void | main(String[] argv) |
void | setLayout(boolean b) |
void | setQuiet(boolean b) |
void | setRange(String range) |
void | setVerbose(boolean b) |
Parameters:
mimeType - MIME type of document, or null if not known
uri - URI of document, local file or network
Returns: null if document of unknown type
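
Because extract is documented to return null for a document of unknown type, callers will probably want to check the result before using it. The following is a minimal sketch of that pattern, not the library's own code. TextExtractor is a hypothetical stand-in interface that only mirrors the documented signature extract(URI uri, String mimeType), and extractOrFallback and the toy lambda are likewise invented for illustration. The real class's package and any checked exceptions are not shown on this page, so they are left out.

```java
import java.net.URI;

public class ExtractTextUsage {
    /**
     * Hypothetical stand-in that mirrors the documented signature
     * String extract(URI uri, String mimeType). It is NOT the real class.
     */
    interface TextExtractor {
        String extract(URI uri, String mimeType);
    }

    /**
     * Calls extract and substitutes a fallback when the result is null,
     * which the documentation says signals a document of unknown type.
     */
    static String extractOrFallback(TextExtractor extractor, URI uri,
                                    String mimeType, String fallback) {
        String text = extractor.extract(uri, mimeType);
        return text != null ? text : fallback;
    }

    public static void main(String[] args) {
        // Toy extractor for illustration only: it recognizes text/plain
        // and reports every other type as unknown by returning null.
        TextExtractor toy = (uri, mimeType) ->
                "text/plain".equals(mimeType) ? "contents of " + uri : null;

        URI doc = URI.create("file:///tmp/example.txt");

        // MIME type known: the extractor returns text.
        System.out.println(extractOrFallback(toy, doc, "text/plain", "(unsupported)"));

        // MIME type passed as null ("not known"): this toy cannot guess
        // the type, so the fallback is used.
        System.out.println(extractOrFallback(toy, doc, null, "(unsupported)"));
    }
}
```

With the real class, the same null check would wrap a call such as extract(uri, null) on an ExtractText instance, assuming the instance is configured beforehand with the setters listed in the method summary.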