SMILA (incubation) API documentation

org.eclipse.smila.connectivity.framework.crawler.web.http
Class SitemapParser

java.lang.Object
  extended by org.eclipse.smila.connectivity.framework.crawler.web.http.SitemapParser
All Implemented Interfaces:
Configurable

public class SitemapParser
extends java.lang.Object
implements Configurable

This class handles the parsing of sitemap.xml files.


Constructor Summary
SitemapParser(Configuration conf)
          Creates a new SitemapParser with the given configuration.
 
Method Summary
 Configuration getConf()
          Returns the configuration used by this object.
 Outlink[] getSitemapLinks(HttpBase http, java.net.URL url)
          Returns the Outlinks extracted from the loc tags of a sitemap.xml file.
 void setConf(Configuration conf)
          Sets the configuration to be used by this object.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

SitemapParser

public SitemapParser(Configuration conf)
Creates a new SitemapParser with the given configuration.

Parameters:
conf - the configuration to use

Method Detail

setConf

public void setConf(Configuration conf)
Sets the configuration to be used by this object.

Specified by:
setConf in interface Configurable
Parameters:
conf - the configuration to use

getConf

public Configuration getConf()
Returns the configuration used by this object.

Specified by:
getConf in interface Configurable
Returns:
the configuration used by this object

getSitemapLinks

public Outlink[] getSitemapLinks(HttpBase http,
                                 java.net.URL url)
Returns the Outlinks extracted from the loc tags of a sitemap.xml file.

Parameters:
http - HttpBase object used to fetch the sitemap.xml file.
url - URL to fetch the sitemap for.
Returns:
An array of the extracted outlinks, or an empty array if no outlinks were found.
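
Example:
The extraction step described above can be illustrated with a minimal, standalone sketch. It is not the SMILA implementation: the class name SitemapLocSketch and the method extractLocs are hypothetical, the sketch uses only the JDK's DOM parser, and it returns plain strings rather than Outlink objects. It also assumes that malformed input should yield an empty result, mirroring the empty-array contract documented for getSitemapLinks.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of SMILA: collects the text of all loc elements.
class SitemapLocSketch {

  /** Returns the trimmed text of every non-empty loc element, or an empty list. */
  static List<String> extractLocs(String sitemapXml) {
    List<String> locs = new ArrayList<>();
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      // Sitemaps need no DTD; refusing DOCTYPE declarations guards against XXE.
      factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
      DocumentBuilder builder = factory.newDocumentBuilder();
      Document doc = builder.parse(new InputSource(new StringReader(sitemapXml)));

      // The factory is not namespace-aware, so the unprefixed tag name matches
      // loc elements even when the sitemap declares a default namespace.
      NodeList nodes = doc.getElementsByTagName("loc");
      for (int i = 0; i < nodes.getLength(); i++) {
        String text = nodes.item(i).getTextContent().trim();
        if (!text.isEmpty()) {
          locs.add(text);
        }
      }
    } catch (Exception e) {
      // Assumption of this sketch: unparsable input yields no links.
      return new ArrayList<>();
    }
    return locs;
  }

  public static void main(String[] args) {
    String xml =
        "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">"
            + "<url><loc>http://example.com/</loc></url>"
            + "<url><loc>http://example.com/about</loc></url>"
            + "</urlset>";
    for (String loc : extractLocs(xml)) {
      System.out.println(loc);
    }
  }
}
```

In the real class, the sitemap content would be fetched through the HttpBase argument and each location would be wrapped in an Outlink; both steps are omitted here because their signatures are not shown on this page.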
