SMILA 1.0 API documentation

org.eclipse.smila.importing.crawler.web
Interface Fetcher

All Known Implementing Classes:
SimpleFetcher

public interface Fetcher

Interface for the Fetcher service of the WebCrawlerWorker and WebFetcherWorker. The fetcher is responsible for getting the metadata and content of web resources.

Author:
scum36

Method Summary
 void crawl(Record linkRecord, AnyMap parameters, TaskLog taskLog)
          Invoked by WebCrawlerWorker to resolve the URL in an input record.
 void fetch(Record crawledRecord, AnyMap parameters, TaskLog taskLog)
          Invoked by WebFetcherWorker to get the content of a resource for which the crawler did not already attach the content.
 

Method Detail

crawl

void crawl(Record linkRecord,
           AnyMap parameters,
           TaskLog taskLog)
           throws WebCrawlerException
Invoked by WebCrawlerWorker to resolve the URL in an input record. Implementations must write metadata from the HTTP header to record attributes and attach the content of resources that can be used for link extraction.

Parameters:
linkRecord - record containing the URL and possibly additional information necessary to access the web resource.
parameters - configuration parameters, may be null.
taskLog - log facility provided by worker frame.
Throws:
WebCrawlerException - if the resource cannot be crawled. If the error is recoverable, the request should be retried later; otherwise the record should be skipped by the crawler worker.
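
The retry-or-skip rule in the Throws clause can be illustrated with a minimal sketch of the calling side. Everything in it is a simplified stand-in rather than the SMILA API: SketchFetcher reduces crawl to a single URL argument, SketchCrawlerException models the exception, and its isRecoverable() flag is an assumption about how recoverability is signalled.

```java
import java.util.ArrayList;
import java.util.List;

/** Caller-side error handling sketch; all types are stand-ins, not SMILA classes. */
class CrawlErrorHandlingSketch {

  /** Stand-in for WebCrawlerException; the recoverable flag is an assumption. */
  static class SketchCrawlerException extends Exception {
    private final boolean recoverable;

    SketchCrawlerException(String message, boolean recoverable) {
      super(message);
      this.recoverable = recoverable;
    }

    boolean isRecoverable() {
      return recoverable;
    }
  }

  /** Stand-in for Fetcher.crawl, reduced to a URL argument. */
  interface SketchFetcher {
    void crawl(String url) throws SketchCrawlerException;
  }

  enum Outcome { CRAWLED, RETRY_LATER, SKIPPED }

  /** Calls the fetcher once and maps success or failure to an outcome. */
  static Outcome crawlOnce(SketchFetcher fetcher, String url, List<String> log) {
    try {
      fetcher.crawl(url);
      return Outcome.CRAWLED;
    } catch (SketchCrawlerException e) {
      if (e.isRecoverable()) {
        // Recoverable: keep the record so the request can be repeated later.
        log.add("will retry " + url + ": " + e.getMessage());
        return Outcome.RETRY_LATER;
      }
      // Not recoverable: drop this record and continue with the next one.
      log.add("skipping " + url + ": " + e.getMessage());
      return Outcome.SKIPPED;
    }
  }

  public static void main(String[] args) {
    List<String> log = new ArrayList<>();
    SketchFetcher failing = url -> {
      throw new SketchCrawlerException("HTTP 404", false);
    };
    System.out.println(crawlOnce(failing, "http://example.org/missing", log));
    System.out.println(log);
  }
}
```

How a retry is actually scheduled is left open here; the sketch only shows the decision between the two branches.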

fetch

void fetch(Record crawledRecord,
           AnyMap parameters,
           TaskLog taskLog)
           throws WebCrawlerException
Invoked by WebFetcherWorker to get the content of a resource for which the crawler did not already attach the content.

Parameters:
crawledRecord - record of a resource for which the crawler did not already attach the content.
parameters - configuration parameters, may be null.
taskLog - log facility provided by worker frame.
Throws:
WebCrawlerException - if the resource cannot be fetched. If the error is recoverable, the request should be retried later; otherwise the record should be skipped by the crawler worker.
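
The division of labour between crawl and fetch, where content is loaded only for records that do not carry it yet, might look roughly like the following sketch. It does not use the SMILA Record or AnyMap types: SketchRecord, ContentLoader and the attachment name "content" are hypothetical placeholders, and whether the check for existing content belongs in the fetcher or in the worker is an assumption made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Fetch-only-if-missing sketch; all types are stand-ins, not SMILA classes. */
class FetchIfMissingSketch {

  /** Hypothetical attachment name used for the resource content. */
  static final String CONTENT = "content";

  /** Stand-in for a record: a URL plus named binary attachments. */
  static class SketchRecord {
    final String url;
    final Map<String, byte[]> attachments = new HashMap<>();

    SketchRecord(String url) {
      this.url = url;
    }
  }

  /** Hypothetical abstraction over the actual HTTP download. */
  interface ContentLoader {
    byte[] load(String url);
  }

  /**
   * Attaches content to the record unless it is already present.
   * Returns true if the loader was called, false if nothing had to be done.
   */
  static boolean fetch(SketchRecord record, ContentLoader loader) {
    if (record.attachments.containsKey(CONTENT)) {
      return false; // content was attached earlier, e.g. during crawling
    }
    record.attachments.put(CONTENT, loader.load(record.url));
    return true;
  }

  public static void main(String[] args) {
    // A fake loader that returns the URL bytes instead of doing HTTP.
    ContentLoader fake = url -> url.getBytes(StandardCharsets.UTF_8);
    SketchRecord record = new SketchRecord("http://example.org/doc.pdf");
    System.out.println(fetch(record, fake));
    System.out.println(fetch(record, fake));
  }
}
```

Passing the loader in as a parameter keeps the sketch free of network access, so the attach-once behaviour can be checked in isolation.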
