SMILA 1.0 API documentation
java.lang.Object
  org.eclipse.smila.importing.crawler.web.fetcher.DefaultFetcher
public class DefaultFetcher
Example implementation of a Fetcher service. It uses the HTTP GET method to access the resource.
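The following is a minimal sketch of what "GET with redirects disabled" (see the constructor summary) looks like in plain Java. It is not the DefaultFetcher implementation: it swaps in the JDK's `java.net.HttpURLConnection` for the HttpClient that DefaultFetcher initializes, and the class `GetWithoutRedirects` and its methods are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Hypothetical helper, not part of SMILA: prepares a GET request that does not follow redirects. */
final class GetWithoutRedirects {

  private GetWithoutRedirects() {
  }

  /** Configures a connection for the given URL. No network traffic happens until the connection is used. */
  static HttpURLConnection open(String url) {
    try {
      HttpURLConnection connection = (HttpURLConnection) URI.create(url).toURL().openConnection();
      connection.setRequestMethod("GET");
      // With this flag off, a redirect response is returned to the caller instead of being followed.
      connection.setInstanceFollowRedirects(false);
      return connection;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** True for the status codes that announce a redirect target in the Location header. */
  static boolean isRedirect(int statusCode) {
    return statusCode == 301 || statusCode == 302 || statusCode == 303
      || statusCode == 307 || statusCode == 308;
  }
}
```

With redirects disabled, a caller would typically check `isRedirect(connection.getResponseCode())` and read the `Location` header itself, so the redirect target can be treated like any other link.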
| Constructor Summary | |
|---|---|
| DefaultFetcher() | Initializes the HttpClient with redirects disabled. |
| Method Summary | |
|---|---|
| void | crawl(java.lang.String url, Record linkRecord, WebCrawlingContext context): Invoked by WebCrawlerWorker to resolve the URL in an input record. |
| void | fetch(java.lang.String url, Record crawledRecord, WebCrawlingContext context): Invoked by WebFetcherWorker to get the content of a resource for which the crawler did not already attach the content. |
| java.io.InputStream | getContent(Record crawledRecord, TaskContext taskContext): Get a stream on a content object. |
| void | setLinkFilter(LinkFilter linkFilter): DS service reference injection method. |
| void | setVisitedLinks(VisitedLinksService visitedLinks): DS service reference injection method. |
| void | unsetLinkFilter(LinkFilter linkFilter): DS service reference removal method. |
| void | unsetVisitedLinks(VisitedLinksService visitedLinks): DS service reference removal method. |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public DefaultFetcher()
| Method Detail |
|---|
public void crawl(java.lang.String url,
Record linkRecord,
WebCrawlingContext context)
throws WebCrawlerException
Specified by:
    crawl in interface Fetcher
Parameters:
    url - the URL to crawl
    linkRecord - record containing the URL and possibly additional information necessary to access the web resource
Throws:
    WebCrawlerException - if the resource cannot be crawled. If the error is recoverable, the request should be retried later; otherwise the crawler worker should skip the record.
public void fetch(java.lang.String url,
Record crawledRecord,
WebCrawlingContext context)
throws WebCrawlerException
Please note: the crawledRecord will already have been mapped.
Specified by:
    fetch in interface Fetcher
Parameters:
    url - the URL to fetch into the record
Throws:
    WebCrawlerException - if the resource cannot be fetched. If the error is recoverable, the request should be retried later; otherwise the crawler worker should skip the record.
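The retry-or-skip rule described for WebCrawlerException can be sketched as below. This is an illustration under stated assumptions, not SMILA code: `SketchCrawlerException` is a hypothetical stand-in for WebCrawlerException, its `isRecoverable()` flag is an assumption about how recoverability is signalled, and `FetchCall` stands in for an invocation such as `fetch(url, crawledRecord, context)`.

```java
/** Hypothetical sketch, not part of SMILA: how a worker might react to a crawl or fetch failure. */
final class RetryOrSkipSketch {

  /** Stand-in for WebCrawlerException. The recoverable flag is an assumption of this sketch. */
  static final class SketchCrawlerException extends Exception {
    private static final long serialVersionUID = 1L;
    private final boolean recoverable;

    SketchCrawlerException(String message, boolean recoverable) {
      super(message);
      this.recoverable = recoverable;
    }

    boolean isRecoverable() {
      return recoverable;
    }
  }

  /** Stand-in for a single crawl(...) or fetch(...) invocation. */
  interface FetchCall {
    void run() throws SketchCrawlerException;
  }

  enum Outcome { DONE, RETRY_LATER, SKIP_RECORD }

  private RetryOrSkipSketch() {
  }

  /** Runs the call and maps a failure to "retry later" (recoverable) or "skip the record" (not recoverable). */
  static Outcome handle(FetchCall call) {
    try {
      call.run();
      return Outcome.DONE;
    } catch (SketchCrawlerException e) {
      return e.isRecoverable() ? Outcome.RETRY_LATER : Outcome.SKIP_RECORD;
    }
  }
}
```

Returning an outcome value instead of rethrowing keeps the decision in one place; how an actual worker schedules the retry is outside the scope of this sketch.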
public java.io.InputStream getContent(Record crawledRecord,
TaskContext taskContext)
throws ImportingException
Please note: a mapped record (at least the URL must be mapped) is expected here.
Specified by:
    getContent in interface ContentFetcher
Parameters:
    crawledRecord - a crawled record describing the content object
    taskContext - the TaskContext containing job parameters and more
Throws:
    ImportingException - if an error occurs while accessing the content object
public void setVisitedLinks(VisitedLinksService visitedLinks)
public void unsetVisitedLinks(VisitedLinksService visitedLinks)
public void setLinkFilter(LinkFilter linkFilter)
public void unsetLinkFilter(LinkFilter linkFilter)
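The set/unset pairs above follow the usual bind/unbind shape for injected service references, assuming "DS" refers to OSGi Declarative Services. The sketch below shows that shape with hypothetical names (`ServiceReferenceSketch`, `SketchService`). The identity check in the unset method is a common idiom and an assumption here; the source does not say whether DefaultFetcher does the same.

```java
/** Hypothetical sketch, not part of SMILA: a component holding one injected service reference. */
final class ServiceReferenceSketch {

  /** Stand-in for a service interface such as LinkFilter or VisitedLinksService. */
  interface SketchService {
  }

  // volatile because a DS runtime may call bind/unbind from a different thread than the one using the service
  private volatile SketchService service;

  /** Injection (bind) method: called when the referenced service becomes available. */
  void setService(SketchService newService) {
    this.service = newService;
  }

  /** Removal (unbind) method: clears the field only if the departing service is the one currently held. */
  void unsetService(SketchService oldService) {
    if (this.service == oldService) {
      this.service = null;
    }
  }

  /** Returns the currently bound service, or null if none is bound. */
  SketchService current() {
    return service;
  }
}
```

The identity check guards against the case where a replacement service has been bound before the old one is unbound, so the newer reference is not cleared by mistake.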