SMILA 1.0 API documentation
public interface Fetcher
Interface for the Fetcher service of the WebCrawlerWorker and WebFetcherWorker. The fetcher is responsible for getting metadata and content.
Method Summary | |
---|---|
void | crawl(java.lang.String url, Record linkRecord, WebCrawlingContext context) Invoked by WebCrawlerWorker to resolve the URL in an input record. |
void | fetch(java.lang.String url, Record crawledRecord, WebCrawlingContext context) Invoked by WebFetcherWorker to get the content of a resource for which the crawler did not already attach the content. |
Methods inherited from interface org.eclipse.smila.importing.ContentFetcher |
---|
getContent |
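The split between crawl and fetch described in the summary can be illustrated with a minimal sketch. It does not use the SMILA classes: SketchRecord, SketchCrawlerException, SketchFetcher, InMemoryFetcher, the attribute names "size" and "content", and the inlineLimit threshold are all hypothetical stand-ins, and the WebCrawlingContext parameter is omitted. The sketch only assumes that crawl may attach content itself and that fetch fills in content when the crawler did not.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for SMILA's Record: a plain attribute map. */
class SketchRecord {
  final Map<String, Object> attributes = new HashMap<>();
}

/** Hypothetical stand-in for WebCrawlerException. */
class SketchCrawlerException extends Exception {
  SketchCrawlerException(String message) {
    super(message);
  }
}

/** Simplified mirror of the two Fetcher methods (context parameter omitted). */
interface SketchFetcher {
  void crawl(String url, SketchRecord linkRecord) throws SketchCrawlerException;

  void fetch(String url, SketchRecord crawledRecord) throws SketchCrawlerException;
}

/** Serves pages from a map instead of the network, so the sketch runs offline. */
class InMemoryFetcher implements SketchFetcher {
  private final Map<String, String> pages;
  private final int inlineLimit; // assumed threshold: larger bodies are left for fetch()

  InMemoryFetcher(Map<String, String> pages, int inlineLimit) {
    this.pages = pages;
    this.inlineLimit = inlineLimit;
  }

  @Override
  public void crawl(String url, SketchRecord linkRecord) throws SketchCrawlerException {
    String body = lookup(url);
    // Always attach metadata; attach content only for small resources.
    linkRecord.attributes.put("size", body.length());
    if (body.length() <= inlineLimit) {
      linkRecord.attributes.put("content", body);
    }
  }

  @Override
  public void fetch(String url, SketchRecord crawledRecord) throws SketchCrawlerException {
    // Only fetch when the crawl step did not already attach the content.
    if (!crawledRecord.attributes.containsKey("content")) {
      crawledRecord.attributes.put("content", lookup(url));
    }
  }

  private String lookup(String url) throws SketchCrawlerException {
    String body = pages.get(url);
    if (body == null) {
      throw new SketchCrawlerException("unknown URL: " + url);
    }
    return body;
  }
}
```

In this sketch a record that already carries content passes through fetch unchanged, which is one way to read "for which the crawler did not already attach the content". A real implementation would work with Record and WebCrawlingContext instead of the stand-ins.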
Method Detail |
---|
void crawl(java.lang.String url, Record linkRecord, WebCrawlingContext context) throws WebCrawlerException
Parameters:
url - the URL to crawl
linkRecord - record containing the URL and maybe additional information necessary to access the web resource.
Throws:
WebCrawlerException - if the resource cannot be crawled. If the error is recoverable, the request should be retried later; otherwise the record should be skipped by the crawler worker.
void fetch(java.lang.String url, Record crawledRecord, WebCrawlingContext context) throws WebCrawlerException
Please note: the crawledRecord will already have been mapped.
Parameters:
url - the URL to fetch into the record
crawledRecord -
Throws:
WebCrawlerException - if the resource cannot be fetched. If the error is recoverable, the request should be retried later; otherwise the record should be skipped by the crawler worker.
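The retry-or-skip rule stated for WebCrawlerException can be sketched from the caller's side. The types below are hypothetical stand-ins, not the SMILA classes: DemoCrawlerException, its isRecoverable() accessor, DemoFetchCall, Outcome, and ErrorPolicy are assumptions made only so the example compiles on its own. How the real exception reports recoverability should be checked against the WebCrawlerException documentation.

```java
/** Hypothetical stand-in for WebCrawlerException with an assumed recoverable flag. */
class DemoCrawlerException extends Exception {
  private final boolean recoverable;

  DemoCrawlerException(String message, boolean recoverable) {
    super(message);
    this.recoverable = recoverable;
  }

  boolean isRecoverable() {
    return recoverable;
  }
}

/** Hypothetical wrapper around a single crawl(...) or fetch(...) invocation. */
interface DemoFetchCall {
  void run() throws DemoCrawlerException;
}

/** What the calling worker decides to do with the record. */
enum Outcome {
  DONE, RETRY_LATER, SKIPPED
}

class ErrorPolicy {
  /** Runs the call and maps a failure to the retry-or-skip decision. */
  static Outcome handle(DemoFetchCall call) {
    try {
      call.run();
      return Outcome.DONE;
    } catch (DemoCrawlerException e) {
      // Recoverable: retry the request later. Otherwise: skip the record.
      return e.isRecoverable() ? Outcome.RETRY_LATER : Outcome.SKIPPED;
    }
  }
}
```

A worker would pass the actual call as the lambda, for example `ErrorPolicy.handle(() -> fetcher.fetch(url, record, context))` with the stand-in types swapped for the real ones, and then act on the returned Outcome.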