phelps.net
Class Crawler
java.lang.Object
java.util.Observable
phelps.net.Crawler
- All Implemented Interfaces:
- java.lang.Runnable
public class Crawler
extends java.util.Observable
implements java.lang.Runnable
Under development.
Crawls a network or file system from a given start page, reporting to clients.
Specify the scope of the crawl: this page only, subtree only, or site only (see PAGE, SUBTREE, SITE).
Also specify how many levels deep to crawl.
Multithreaded, no cycles, respects robots.txt, ...
Register as an observer to take action after each page is parsed.
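The observer hook described above can be sketched as follows. Since this page does not document what the crawler passes to its observers, the example uses a hypothetical stand-in class, StubCrawler, that only mirrors the documented shape (extends java.util.Observable, implements java.lang.Runnable). The notification argument (a page name as a String) and the helper crawlAndCollect are assumptions made for illustration, not part of phelps.net.Crawler.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Observable;

@SuppressWarnings("deprecation") // java.util.Observable is deprecated since Java 9
final class CrawlerObserverSketch {

    /** Hypothetical stand-in for phelps.net.Crawler: same supertype and interface, invented body. */
    static final class StubCrawler extends Observable implements Runnable {
        private final List<String> pages;

        StubCrawler(List<String> pages) {
            this.pages = pages;
        }

        @Override
        public void run() {
            for (String page : pages) {
                setChanged();          // without this, notifyObservers does nothing
                notifyObservers(page); // assumption: the argument identifies the parsed page
            }
        }
    }

    /** Runs the stub on its own thread and returns the pages the observer was told about. */
    static List<String> crawlAndCollect(List<String> pages) {
        List<String> seen = new ArrayList<>();
        StubCrawler crawler = new StubCrawler(pages);
        crawler.addObserver((source, arg) -> seen.add((String) arg));

        Thread worker = new Thread(crawler); // possible because the crawler is a Runnable
        worker.start();
        try {
            worker.join(); // join makes the worker's writes to 'seen' visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the crawl", e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(crawlAndCollect(Arrays.asList("a.html", "b.html")));
    }
}
```

With the real class, the same pattern would presumably be new Crawler(url), addObserver(...), then new Thread(crawler).start(); what arrives in the update callback would need to be checked against the source.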
- Version:
- $Revision: 1.2 $ $Date: 2003/06/01 08:13:48 $
- See Also:
RobustHyperlink
Constructor Summary
Crawler(java.net.URL start)
Methods inherited from class java.util.Observable
addObserver, clearChanged, countObservers, deleteObserver, deleteObservers, hasChanged, notifyObservers, notifyObservers, setChanged
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
fTrace
public boolean fTrace
PAGE
public static final int PAGE
- See Also:
- Constant Field Values
SUBTREE
public static final int SUBTREE
- See Also:
- Constant Field Values
SITE
public static final int SITE
- See Also:
- Constant Field Values
ANY
public static final int ANY
- See Also:
- Constant Field Values
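One plausible reading of the four scope constants is sketched below. The int values of PAGE, SUBTREE, SITE and ANY are not shown on this page, so the sketch uses its own hypothetical enum Scope, and the rules (same URI, same host and directory prefix, same host, anything) are assumptions about the intent rather than the class's actual logic. The names ScopeSketch, inScope, sameHost and parentDir are all invented for the example.

```java
import java.net.URI;

final class ScopeSketch {

    /** Hypothetical mirror of the PAGE, SUBTREE, SITE and ANY constants. */
    enum Scope { PAGE, SUBTREE, SITE, ANY }

    /** Returns true if candidate may be crawled when the crawl began at start. */
    static boolean inScope(URI start, URI candidate, Scope scope) {
        switch (scope) {
            case PAGE:
                return start.equals(candidate);
            case SUBTREE:
                // Same host, and the candidate lives under the start page's directory.
                return sameHost(start, candidate)
                        && pathOf(candidate).startsWith(parentDir(pathOf(start)));
            case SITE:
                return sameHost(start, candidate);
            default: // ANY
                return true;
        }
    }

    /** Hosts match case-insensitively; two host-less URIs (e.g. file:) count as the same site. */
    static boolean sameHost(URI a, URI b) {
        String hostA = a.getHost();
        String hostB = b.getHost();
        return hostA == null ? hostB == null : hostA.equalsIgnoreCase(hostB);
    }

    /** getPath() is null for opaque URIs; treat that as an empty path. */
    static String pathOf(URI uri) {
        return uri.getPath() == null ? "" : uri.getPath();
    }

    /** "/docs/index.html" becomes "/docs/"; a path without a slash becomes "/". */
    static String parentDir(String path) {
        int slash = path.lastIndexOf('/');
        return slash < 0 ? "/" : path.substring(0, slash + 1);
    }

    public static void main(String[] args) {
        URI start = URI.create("http://example.org/docs/index.html");
        URI other = URI.create("http://example.org/other/y.html");
        for (Scope scope : Scope.values()) {
            System.out.println(scope + ": " + inScope(start, other, scope));
        }
    }
}
```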
Crawler
public Crawler(java.net.URL start)
setMaxThreads
public void setMaxThreads(int max)
getMaxThreads
public int getMaxThreads()
setScope
public void setScope(int scope)
getScope
public int getScope()
setMaxDepth
public void setMaxDepth(int maxDepth)
getMaxDepth
public int getMaxDepth()
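To illustrate how a depth limit and the "no cycles" guarantee from the class description can work together, here is a minimal single-threaded sketch of a breadth-first traversal over an in-memory link graph. It is not the class's implementation: the real crawler is described as multithreaded and fetches pages, and how it counts depth is not documented here. The sketch assumes the start page is depth 0 and that pages at maxDepth are visited but their links are not followed. The names DepthLimitedCrawlSketch and crawl are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class DepthLimitedCrawlSketch {

    /**
     * Visits pages breadth-first from start, following links at most maxDepth levels.
     * links maps a page to the pages it links to; pages absent from the map have no links.
     */
    static List<String> crawl(Map<String, List<String>> links, String start, int maxDepth) {
        List<String> order = new ArrayList<>();
        Set<String> visited = new HashSet<>();       // prevents revisiting, so cycles terminate
        Map<String, Integer> depth = new HashMap<>();
        Deque<String> frontier = new ArrayDeque<>();

        visited.add(start);
        depth.put(start, 0);
        frontier.add(start);

        while (!frontier.isEmpty()) {
            String page = frontier.remove();
            order.add(page);
            int level = depth.get(page);
            if (level >= maxDepth) {
                continue; // at the limit: visit the page but do not follow its links
            }
            for (String next : links.getOrDefault(page, Collections.<String>emptyList())) {
                if (visited.add(next)) { // add returns false if the page was already seen
                    depth.put(next, level + 1);
                    frontier.add(next);
                }
            }
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = new HashMap<>();
        links.put("a", List.of("b", "c"));
        links.put("b", List.of("a", "d")); // b links back to a: a cycle
        links.put("d", List.of("e"));
        System.out.println(crawl(links, "a", 2));
    }
}
```

A multithreaded variant would need a concurrent visited set and work queue; that is beyond what this sketch shows.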
run
public void run()
- Specified by:
- run in interface java.lang.Runnable
main
public static void main(java.lang.String[] argv)