SMILA (incubation) API documentation
java.lang.Object
  org.eclipse.smila.connectivity.framework.crawler.web.http.RobotRulesParser
public class RobotRulesParser
This class handles the parsing of robots.txt files. It emits RobotRules objects, which describe the
download permissions declared in the parsed robots.txt file.
| Nested Class Summary | |
|---|---|
| static class | RobotRulesParser.RobotRuleSet: This class holds the rules which were parsed from a robots.txt file, and can test paths against those rules. |
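
The idea of a rule set that tests paths against parsed rules can be sketched as follows. This is a minimal illustration, not the SMILA implementation: the class `RuleSetSketch`, its methods, and the agent name `examplebot` are hypothetical. It also assumes simplified semantics: rules are plain path prefixes, groups for `*` and for a matching agent name are merged, and the first matching rule in file order wins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch of a robots.txt rule set; not the SMILA RobotRuleSet. */
public class RuleSetSketch {

    /** One Allow/Disallow entry: a path prefix and whether it permits fetching. */
    private static final class Rule {
        final String prefix;
        final boolean allowed;

        Rule(String prefix, boolean allowed) {
            this.prefix = prefix;
            this.allowed = allowed;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    /** Collects the rules of every group addressed to "*" or to agentName. */
    public static RuleSetSketch parse(String robotsTxt, String agentName) {
        RuleSetSketch set = new RuleSetSketch();
        String agent = agentName.toLowerCase(Locale.ROOT);
        boolean groupApplies = false; // does the current group address us?
        boolean lastWasAgent = false; // consecutive User-agent lines share a group
        for (String raw : robotsTxt.split("\\r?\\n")) {
            int hash = raw.indexOf('#'); // strip trailing comments
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank or malformed line
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                if (!lastWasAgent) {
                    groupApplies = false; // a new group starts here
                }
                String token = value.toLowerCase(Locale.ROOT);
                if (token.equals("*") || (!token.isEmpty() && agent.contains(token))) {
                    groupApplies = true;
                }
                lastWasAgent = true;
            } else {
                lastWasAgent = false;
                // An empty Disallow value means "nothing is disallowed", so it adds no rule.
                if (groupApplies && !value.isEmpty()) {
                    if (field.equals("disallow")) {
                        set.rules.add(new Rule(value, false));
                    } else if (field.equals("allow")) {
                        set.rules.add(new Rule(value, true));
                    }
                }
            }
        }
        return set;
    }

    /** Returns the verdict of the first rule whose prefix matches; allows by default. */
    public boolean isAllowed(String path) {
        for (Rule rule : rules) {
            if (path.startsWith(rule.prefix)) {
                return rule.allowed;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String robots = "User-agent: *\n"
                + "Allow: /private/public-note.html\n"
                + "Disallow: /private/\n";
        RuleSetSketch set = parse(robots, "examplebot");
        System.out.println(set.isAllowed("/index.html"));               // true
        System.out.println(set.isAllowed("/private/data.html"));        // false
        System.out.println(set.isAllowed("/private/public-note.html")); // true
    }
}
```

Real parsers typically add wildcard handling and a precedence order between agent-specific and `*` groups; both are left out here to keep the sketch short.
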
| Constructor Summary | |
|---|---|
| RobotRulesParser(Configuration conf) | Creates a new RobotRulesParser with the given configuration. |
| Method Summary | |
|---|---|
| Configuration | getConf(): Returns the configuration used by this object. |
| long | getCrawlDelay(HttpBase http, java.net.URL url): Returns the Crawl-Delay value extracted from the robots.txt file. |
| boolean | isAllowed(HttpBase http, java.net.URL url): Returns true if the URL is allowed for fetching and false otherwise. |
| void | setConf(Configuration conf): Sets the configuration to be used by this object. |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public RobotRulesParser(Configuration conf)
Parameters:
conf - Configuration

| Method Detail |
|---|
public void setConf(Configuration conf)
Specified by: setConf in interface Configurable
Parameters:
conf - Configuration

public Configuration getConf()
Specified by: getConf in interface Configurable

public boolean isAllowed(HttpBase http,
java.net.URL url)
Returns true if the URL is allowed for fetching and false otherwise.
Parameters:
http - HttpBase object that is used to get the robots.txt contents.
url - URL to be checked.
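
Before a URL can be checked, the matching robots.txt file has to be located; by convention it sits at the root of the same scheme, host, and port as the page. The sketch below shows that derivation with standard `java.net` classes only. The class `RobotsLocationSketch` and its methods are hypothetical, and whether SMILA derives the location this way internally is an assumption.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

/** Hypothetical helper; shows where a robots.txt file conventionally lives. */
public class RobotsLocationSketch {

    /** Parses a URL string, wrapping the checked exception for brevity. */
    public static URL parse(String url) {
        try {
            return URI.create(url).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a valid URL: " + url, e);
        }
    }

    /** Keeps scheme, host, and port of the page and replaces the rest with /robots.txt. */
    public static URL robotsUrlFor(URL page) {
        try {
            // A port of -1 means "default port" and is omitted from the result.
            return new URI(page.getProtocol(), null, page.getHost(), page.getPort(),
                    "/robots.txt", null, null).toURL();
        } catch (URISyntaxException | MalformedURLException e) {
            throw new IllegalArgumentException("Cannot derive robots.txt URL for " + page, e);
        }
    }

    public static void main(String[] args) {
        URL page = parse("http://example.org:8080/docs/page.html?x=1");
        System.out.println(robotsUrlFor(page)); // http://example.org:8080/robots.txt
    }
}
```
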

public long getCrawlDelay(HttpBase http,
java.net.URL url)
Returns the Crawl-Delay value extracted from the robots.txt file.
Parameters:
http - HttpBase object that is used to get the robots.txt contents.
url - URL
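
Extracting a Crawl-Delay value from robots.txt text might look like the sketch below. The class `CrawlDelaySketch`, the `NO_DELAY` sentinel of -1, and the conversion from seconds to milliseconds are assumptions for illustration; this page does not state the unit or the fallback value that getCrawlDelay uses. The sketch also ignores User-agent groups and simply takes the first valid Crawl-delay line.

```java
import java.util.Locale;

/** Hypothetical sketch; not the SMILA getCrawlDelay implementation. */
public class CrawlDelaySketch {

    /** Assumed sentinel for "no Crawl-delay directive found". */
    public static final long NO_DELAY = -1L;

    /** Returns the first valid Crawl-delay, read as seconds, in milliseconds. */
    public static long crawlDelayMillis(String robotsTxt) {
        for (String raw : robotsTxt.split("\\r?\\n")) {
            int hash = raw.indexOf('#'); // strip trailing comments
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank or malformed line
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            if (!field.equals("crawl-delay")) {
                continue;
            }
            try {
                double seconds = Double.parseDouble(line.substring(colon + 1).trim());
                if (seconds >= 0) {
                    return Math.round(seconds * 1000.0);
                }
            } catch (NumberFormatException e) {
                // Malformed value: keep scanning for a later, valid directive.
            }
        }
        return NO_DELAY;
    }

    public static void main(String[] args) {
        String robots = "User-agent: *\nCrawl-delay: 5\nDisallow: /tmp/\n";
        System.out.println(crawlDelayMillis(robots));          // 5000
        System.out.println(crawlDelayMillis("User-agent: *")); // -1
    }
}
```
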