Multivalent API
java.lang.Object
  phelps.net.RobustHyperlink
Augments a URL with information that can be used to find the URL's content in case the link breaks. See the Robust Home Page.
Strategy: inverse word frequency. Find the top n words that are most common in the document but uncommon on the web overall.
See Also: addSignature(URL, String), stripSignature(String), getSignature(String), getSignatureWords(String), computeSignature (document tree, String of words, List of words), tool.LexSig, tool.html.Robust
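The strategy described above can be illustrated with a small, self-contained sketch. It is not the class's actual implementation: the class name `TfIdfSketch`, the method `topWords`, the `webFreq` map, the `totalPages` count, and the `tfCap` parameter are all hypothetical, and the scoring formula (term frequency times the log of inverse web frequency) is a common textbook form assumed here for illustration. The cap on in-document frequency reflects one possible reading of ALGORITHM_TFIDF2.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; not part of the Multivalent API.
final class TfIdfSketch {
    /**
     * Returns up to n words ranked by min(tf, tfCap) * log(totalPages / webFreq).
     * webFreq maps a word to the number of web pages containing it (assumed input).
     */
    static List<String> topWords(List<String> words, Map<String, Integer> webFreq,
                                 long totalPages, int n, int tfCap) {
        // Term frequency: how often each word occurs in this document (case folded).
        Map<String, Integer> tf = new HashMap<>();
        for (String w : words) {
            tf.merge(w.toLowerCase(Locale.ROOT), 1, Integer::sum);
        }
        // Score every distinct word.
        Map<String, Double> score = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            // Assumption: words missing from webFreq are treated as appearing on one page.
            int df = webFreq.getOrDefault(e.getKey(), 1);
            int cappedTf = Math.min(e.getValue(), tfCap);
            score.put(e.getKey(), cappedTf * Math.log((double) totalPages / df));
        }
        // Highest score first; ties are broken alphabetically so the result is deterministic.
        List<String> ranked = new ArrayList<>(score.keySet());
        ranked.sort((a, b) -> {
            int c = Double.compare(score.get(b), score.get(a));
            return c != 0 ? c : a.compareTo(b);
        });
        return new ArrayList<>(ranked.subList(0, Math.min(n, ranked.size())));
    }
}
```

Passing `Integer.MAX_VALUE` as `tfCap` disables the cap. A small cap reduces the advantage of words that are merely repeated often in the document, which shifts the ranking toward words that are rare on the web.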
Field Summary | |
---|---|
static int | ALGORITHM_RANDOM - Picks words randomly. |
static int | ALGORITHM_RANDOM100K - Picks words randomly from those that appear in fewer than 100,000 web pages. |
static int | ALGORITHM_RAREST - Rarest picks the words rarest in the web. |
static int | ALGORITHM_TFIDF - Term frequency-inverse document frequency picks the most frequent words in the document that are the rarest in the web. |
static int | ALGORITHM_TFIDF2 - Refines tfidf by capping page frequency at 3 to bias toward rarity. |
static boolean | DEBUG |
static boolean | FoldCase - Ignore case in collecting words? |
static int | MinWordLength |
static java.lang.String | PARAMETER - Canonical definition of parameter used for lexical signatures. |
static int | SignatureLength - Signature length (in words). |
static boolean | Verbose |
static java.lang.String | VERSION |
Method Summary | |
---|---|
static java.lang.String | addSignature(java.net.URL url, java.lang.String words) - Add signature words to url. |
static java.lang.String | computeSignature(java.util.List<java.lang.String> words) - Compute signature from list of words. |
static java.lang.String | computeSignature(Node root) - Compute signature from document tree. |
static java.lang.String | computeSignature(java.lang.String txt) - Compute signature from parsed txt. |
static int | getFreq(java.lang.String word) - Determine web page frequency of word. |
static java.lang.String | getSignature(java.lang.String surl) - Return signature as found in string. |
static java.lang.String | getSignatureWords(java.lang.String surl) - Return signature as plain words: no "? |
static void | setAlgorithm(int alg) - Set algorithm to use (N.B.: static). |
static void | setEngine(java.lang.String prefix, java.lang.String hook) - Sets the search engine and key text fragment that signals the start of the web frequency information. |
static void | setWordCache(java.io.File cache) - Client can set the file to use as the user's supplemental word frequency cache. |
static java.lang.String | stripSignature(java.lang.String surl) - Given a URL in String form, return URL with signature, if any, stripped off. |
static void | writeCache() - Writes user word frequency cache. |
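The relationship between addSignature, stripSignature, and getSignatureWords can be sketched as a round trip on a URL string. The code below is a stand-alone illustration, not the library's code: the class `SignatureUrlSketch` is hypothetical, the parameter name `lexical-signature=` is an assumption (the real value is whatever PARAMETER defines), and joining words with `+` is likewise assumed. The sketch also assumes the signature is the last query parameter and that the URL has no fragment.

```java
// Hypothetical sketch; not part of the Multivalent API.
final class SignatureUrlSketch {
    // Assumed parameter name; the actual one is defined by RobustHyperlink.PARAMETER.
    static final String PARAM = "lexical-signature=";

    /** Appends whitespace-separated words to the URL as a query parameter, joined by '+'. */
    static String addSignature(String url, String words) {
        String joined = String.join("+", words.trim().split("\\s+"));
        // Use '&' if the URL already has a query string, otherwise start one with '?'.
        char sep = url.indexOf('?') >= 0 ? '&' : '?';
        return url + sep + PARAM + joined;
    }

    /** Index of the '?' or '&' that introduces the signature parameter, or -1 if none. */
    private static int signatureStart(String surl) {
        int i = surl.indexOf(PARAM);
        if (i <= 0) {
            return -1;
        }
        char before = surl.charAt(i - 1);
        return (before == '?' || before == '&') ? i - 1 : -1;
    }

    /** Returns the URL without the signature; unchanged if no signature is present. */
    static String stripSignature(String surl) {
        int i = signatureStart(surl);
        return i < 0 ? surl : surl.substring(0, i);
    }

    /** Returns the signature as space-separated words, or an empty string if absent. */
    static String getSignatureWords(String surl) {
        int i = signatureStart(surl);
        if (i < 0) {
            return "";
        }
        // Skip the separator character and the parameter name, then undo the '+' joining.
        return surl.substring(i + 1 + PARAM.length()).replace('+', ' ');
    }

    public static void main(String[] args) {
        String signed = addSignature("http://example.com/page.html", "robust hyperlink signature");
        System.out.println(signed);
        System.out.println(stripSignature(signed));
        System.out.println(getSignatureWords(signed));
    }
}
```

In this sketch, stripping a signed URL returns the original string, so a client could compare the stripped form against a bookmark before falling back to a search on the signature words.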
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static boolean DEBUG
public static final java.lang.String VERSION
public static final java.lang.String PARAMETER
public static final int ALGORITHM_TFIDF
public static final int ALGORITHM_TFIDF2
public static final int ALGORITHM_RAREST
public static final int ALGORITHM_RANDOM
public static final int ALGORITHM_RANDOM100K
public static boolean Verbose
public static boolean FoldCase
public static int MinWordLength
public static int SignatureLength
Method Detail |
---|
public static void setWordCache(java.io.File cache)
public static void setEngine(java.lang.String prefix, java.lang.String hook)
Parameters:
prefix - URL of search submissions with the query term at the end and left blank
hook - constant words in the HTML page results near the word frequency number
public static void setAlgorithm(int alg)
Set algorithm to use (N.B.: static).
public static java.lang.String addSignature(java.net.URL url, java.lang.String words)
public static java.lang.String stripSignature(java.lang.String surl)
public static java.lang.String getSignature(java.lang.String surl)
public static java.lang.String getSignatureWords(java.lang.String surl)
public static void writeCache()
public static int getFreq(java.lang.String word)
public static java.lang.String computeSignature(Node root)
public static java.lang.String computeSignature(java.lang.String txt)
public static java.lang.String computeSignature(java.util.List<java.lang.String> words)
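The prefix and hook parameters of setEngine suggest a simple scraping scheme for getFreq: append the word to the prefix to form a query URL, then look for a number next to the hook text in the returned page. The sketch below illustrates that idea without any network access. It is an assumption about how such a lookup could work, not the library's code: the class `FreqScrapeSketch`, both method names, the example search URL, and the choice to read the first number after the hook are all hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not part of the Multivalent API.
final class FreqScrapeSketch {
    // A run of digits that may contain thousands separators, e.g. "1,230,000".
    private static final Pattern NUMBER = Pattern.compile("\\d[\\d,]*");

    /** Builds the query URL; assumes the word is already safe to place in a URL. */
    static String queryUrl(String prefix, String word) {
        return prefix + word;
    }

    /**
     * Returns the first number that follows the hook text in the page,
     * or -1 if the hook or a number after it is not found.
     */
    static long parseFreq(String html, String hook) {
        int at = html.indexOf(hook);
        if (at < 0) {
            return -1;
        }
        Matcher m = NUMBER.matcher(html);
        // Start searching just past the hook so earlier numbers are ignored.
        if (!m.find(at + hook.length())) {
            return -1;
        }
        return Long.parseLong(m.group().replace(",", ""));
    }

    public static void main(String[] args) {
        String page = "<b>Results 1 - 10 of about 1,230,000 for robust</b>";
        System.out.println(queryUrl("http://search.example/find?q=", "robust"));
        System.out.println(parseFreq(page, "of about"));
    }
}
```

A scheme like this depends on the result page layout staying stable, which may be why the engine and hook are configurable rather than fixed.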