SMILA (incubation) API documentation
java.lang.Object
  org.eclipse.smila.connectivity.framework.crawler.web.http.RobotRulesParser
public class RobotRulesParser
This class handles the parsing of robots.txt files. It emits RobotRulesParser.RobotRuleSet objects, which describe the download permissions parsed from those files.
Nested Class Summary | |
---|---|
static class | RobotRulesParser.RobotRuleSet - This class holds the rules which were parsed from a robots.txt file, and can test paths against those rules. |
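To illustrate what "testing paths against rules" can look like, here is a minimal sketch in plain Java. It is not the SMILA implementation: `SimpleRuleSet`, its `parse` method and the first-match prefix policy are hypothetical simplifications, and User-agent grouping is ignored for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical stand-in for a robots.txt rule set; not the SMILA class. */
final class SimpleRuleSet {

    /** One Allow or Disallow line: a path prefix plus the permission it grants. */
    private static final class Rule {
        final String prefix;
        final boolean allowed;

        Rule(String prefix, boolean allowed) {
            this.prefix = prefix;
            this.allowed = allowed;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    /** Collects Allow/Disallow lines in file order (assumption: one agent group). */
    static SimpleRuleSet parse(String robotsTxt) {
        SimpleRuleSet set = new SimpleRuleSet();
        for (String rawLine : robotsTxt.split("\\r?\\n")) {
            // Strip trailing comments and surrounding whitespace.
            int hash = rawLine.indexOf('#');
            String line = (hash >= 0 ? rawLine.substring(0, hash) : rawLine).trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (value.isEmpty()) {
                continue; // an empty value adds no restriction in this sketch
            }
            if (field.equals("disallow")) {
                set.rules.add(new Rule(value, false));
            } else if (field.equals("allow")) {
                set.rules.add(new Rule(value, true));
            }
        }
        return set;
    }

    /** Returns the permission of the first matching prefix; paths are allowed by default. */
    boolean isAllowed(String path) {
        for (Rule rule : rules) {
            if (path.startsWith(rule.prefix)) {
                return rule.allowed;
            }
        }
        return true;
    }
}
```

With the rules `Allow: /private/public` followed by `Disallow: /private`, this sketch allows `/private/public/a.html` and `/index.html` but rejects `/private/data.html`. A real parser may resolve overlapping rules differently, for example by longest match.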
Constructor Summary | |
---|---|
RobotRulesParser(Configuration conf) | Creates a new RobotRulesParser with the given configuration. |
Method Summary | |
---|---|
Configuration | getConf() - Returns the configuration used by this object. |
long | getCrawlDelay(HttpBase http, java.net.URL url) - Returns a Crawl-Delay value extracted from the robots.txt file. |
boolean | isAllowed(HttpBase http, java.net.URL url) - Returns true if the URL is allowed for fetching and false otherwise. |
void | setConf(Configuration conf) - Sets the configuration to be used by this object. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public RobotRulesParser(Configuration conf)
Parameters:
conf - Configuration
Method Detail |
---|
public void setConf(Configuration conf)
Specified by:
setConf in interface Configurable
Parameters:
conf - Configuration
public Configuration getConf()
Specified by:
getConf in interface Configurable
public boolean isAllowed(HttpBase http, java.net.URL url)
Returns true if the URL is allowed for fetching and false otherwise.
Parameters:
http - HttpBase object that is used to get the robots.txt contents.
url - URL to be checked.
public long getCrawlDelay(HttpBase http, java.net.URL url)
Parameters:
http - HttpBase object that is used to get the robots.txt contents.
url - URL
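This page does not state the unit of the value returned by getCrawlDelay or what is returned when robots.txt has no Crawl-Delay line. The sketch below only illustrates how such a value might be extracted from robots.txt text. `CrawlDelaySketch` and `parseCrawlDelaySeconds` are hypothetical names, and returning seconds (as written in the file) or -1 when no value is found are assumptions, not SMILA behaviour.

```java
import java.util.Locale;

/** Hypothetical helper for illustration; not part of SMILA. */
final class CrawlDelaySketch {

    /**
     * Returns the first parsable Crawl-delay value as written in the file
     * (assumed to be whole seconds), or -1 if none is found.
     */
    static long parseCrawlDelaySeconds(String robotsTxt) {
        for (String rawLine : robotsTxt.split("\\r?\\n")) {
            // Strip trailing comments and surrounding whitespace.
            int hash = rawLine.indexOf('#');
            String line = (hash >= 0 ? rawLine.substring(0, hash) : rawLine).trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;
            }
            // Field names are matched case-insensitively.
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            if (!field.equals("crawl-delay")) {
                continue;
            }
            try {
                return Long.parseLong(line.substring(colon + 1).trim());
            } catch (NumberFormatException e) {
                // Malformed value: keep scanning the remaining lines.
            }
        }
        return -1;
    }
}
```

A caller would typically sleep for the returned delay between two requests to the same host, after converting it to whatever unit its scheduler expects.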