phelps.net
public class RobustHyperlink extends Object
Strategy: inverse word frequency. Find the top n most common words in the document that are uncommon on the web overall.
A signature can be computed from a document tree, a String of words, or a List of words.
Version: $Revision: 1.9 $ $Date: 2003/07/04 08:04:35 $
See Also: tool.LexSig tool.html.Robust
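The inverse word frequency strategy described above can be sketched roughly as follows. This is a minimal illustration, not the class's actual implementation: the class name `SignatureSketch`, the method `pick`, the `WEB_PAGES` constant, the logarithmic score, and the choice to treat unknown words as appearing on one page are all assumptions made for the example. Web frequencies are passed in as a plain map rather than looked up through a search engine.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SignatureSketch {
    // Hypothetical size of the web index; not a value taken from RobustHyperlink.
    static final double WEB_PAGES = 1e9;

    /** Returns up to n words that are frequent in the document but rare on the web. */
    static List<String> pick(List<String> words, Map<String, Integer> webFreq, int n, int minLen) {
        // Term frequency: count occurrences in the document, folding case
        // and skipping words shorter than minLen.
        Map<String, Integer> tf = new HashMap<>();
        for (String w : words) {
            String key = w.toLowerCase();
            if (key.length() >= minLen) {
                tf.merge(key, 1, Integer::sum);
            }
        }

        // Score each word: document count times log of inverse web frequency.
        // Words missing from webFreq are assumed to appear on a single page.
        Map<String, Double> score = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            int df = Math.max(1, webFreq.getOrDefault(e.getKey(), 1));
            score.put(e.getKey(), e.getValue() * Math.log(WEB_PAGES / df));
        }

        // Rank by descending score, breaking ties alphabetically for determinism.
        List<String> ranked = new ArrayList<>(score.keySet());
        ranked.sort((a, b) -> {
            int c = Double.compare(score.get(b), score.get(a));
            return c != 0 ? c : a.compareTo(b);
        });
        return new ArrayList<>(ranked.subList(0, Math.min(n, ranked.size())));
    }
}
```

A very common word such as "the" scores low even when it occurs many times in the document, because its inverse web frequency is close to zero.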
Field Summary | |
---|---|
static int | ALGORITHM_RANDOM Picks words randomly. |
static int | ALGORITHM_RANDOM100K Picks words randomly from those that appear in fewer than 100,000 web pages. |
static int | ALGORITHM_RAREST Picks the words that are rarest on the web. |
static int | ALGORITHM_TFIDF Term frequency-inverse document frequency picks the most frequent words in the document that are the rarest in the web. |
static int | ALGORITHM_TFIDF2 Refines tfidf by capping page frequency at 3 to bias toward rarity. |
static boolean | DEBUG |
static boolean | FoldCase Ignore case in collecting words? |
static int | MinWordLength |
static String | PARAMETER Canonical definition of parameter used for lexical signatures. |
static int | SignatureLength Signature length (in words). |
static boolean | Verbose |
static String | VERSION |
Method Summary | |
---|---|
static String | addSignature(URL url, String words) Add signature words to url. |
static String | computeSignature(Node root) Compute signature from document tree. |
static String | computeSignature(String txt) Compute signature from parsed txt. |
static String | computeSignature(List<String> words) Compute signature from list of words. |
static int | getFreq(String word) Determine web page frequency of word. |
static String | getSignature(String surl) Return signature as found in string. |
static String | getSignatureWords(String surl) Return signature as plain words: no "? |
static void | setAlgorithm(int alg) Set algorithm to use (N.B.: static). |
static void | setEngine(String prefix, String hook) Sets the search engine and key text fragment that signals the start of the web frequency information. |
static void | setWordCache(File cache) Client can set the file to use as the user's supplemental word frequency cache. |
static String | stripSignature(String surl) Given a URL in String form, return URL with signature, if any, stripped off. |
static void | writeCache() Writes user word frequency cache. |
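To illustrate what the signature-handling methods in the table do to a URL, here is a rough standalone sketch. It is not the class's code: the class `SignatureUrlSketch` and its methods `append`, `strip`, and `words` are hypothetical, and the parameter name `lexical-signature` is an assumption, since this page does not show the actual value of `PARAMETER`. The sketch also assumes the signature is the last query parameter and does no URL encoding of the words.

```java
class SignatureUrlSketch {
    // Assumed parameter name; the real value of RobustHyperlink.PARAMETER is not shown here.
    static final String PARAMETER = "lexical-signature";

    /** Appends space-separated signature words to a URL, joined with '+'. */
    static String append(String url, String words) {
        String joined = String.join("+", words.trim().split("\\s+"));
        String sep = url.contains("?") ? "&" : "?";
        return url + sep + PARAMETER + "=" + joined;
    }

    /** Returns the URL with the signature parameter, if any, removed. */
    static String strip(String surl) {
        int i = indexOfParam(surl);
        return i < 0 ? surl : surl.substring(0, i);
    }

    /** Returns the signature as plain space-separated words, or "" if there is none. */
    static String words(String surl) {
        int i = indexOfParam(surl);
        if (i < 0) {
            return "";
        }
        // Skip the separator ('?' or '&'), the parameter name, and the '='.
        int start = i + 1 + PARAMETER.length() + 1;
        return surl.substring(start).replace('+', ' ');
    }

    // Index of the '?' or '&' that introduces the signature parameter, or -1.
    private static int indexOfParam(String surl) {
        int i = surl.indexOf("?" + PARAMETER + "=");
        if (i < 0) {
            i = surl.indexOf("&" + PARAMETER + "=");
        }
        return i;
    }
}
```

Under these assumptions, stripping an appended signature returns the original URL, so a client can fall back to a search on the signature words only when the plain address fails.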
Parameters of setEngine:
prefix - URL of search submissions, with the query term at the end and left blank
hook - constant words in the HTML results page near the word frequency number
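How a prefix and hook might be used to obtain a word's web frequency can be sketched as below. This is an illustration under stated assumptions, not the class's implementation: the class `FrequencyScrapeSketch` and its methods are hypothetical, the sketch assumes the frequency number appears after the hook text, and it works on an HTML string that has already been fetched rather than performing any network request.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FrequencyScrapeSketch {
    // A run of digits, optionally grouped with commas (for example "12,300").
    private static final Pattern NUMBER = Pattern.compile("\\d[\\d,]*");

    /** Builds the query URL by placing the word at the blank end of the prefix. */
    static String queryUrl(String prefix, String word) {
        return prefix + word;
    }

    /**
     * Returns the first number that follows the hook text in the results page,
     * or -1 if the hook or a number after it cannot be found.
     */
    static long parseFrequency(String html, String hook) {
        int at = html.indexOf(hook);
        if (at < 0) {
            return -1;
        }
        Matcher m = NUMBER.matcher(html);
        // Start searching just past the hook (assumption: the number comes after it).
        if (!m.find(at + hook.length())) {
            return -1;
        }
        return Long.parseLong(m.group().replace(",", ""));
    }
}
```

Because the hook is plain text from the search engine's results page, it has to be updated whenever the engine changes its page layout, which is presumably why it is configurable.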