public class DefaultFetcher extends java.lang.Object implements Fetcher
| Constructor and Description |
|---|
| DefaultFetcher(): initialize HttpClient with disabled redirects. |

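The constructor note says only that the HttpClient is created with redirects disabled; it does not name the HTTP library in use. As a minimal sketch of that idea, the block below uses the JDK's own `java.net.http.HttpClient` (Java 11+) as a stand-in. The class name `NoRedirectClientSketch` and the helper `newClient()` are hypothetical and not part of DefaultFetcher.

```java
import java.net.http.HttpClient;

public class NoRedirectClientSketch {

    /**
     * Builds a JDK HTTP client that never follows 3xx responses by itself,
     * so the caller gets to see the redirect response and decide what to do.
     * Stand-in only: the real DefaultFetcher may use a different client library.
     */
    public static HttpClient newClient() {
        return HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NEVER)
            .build();
    }

    public static void main(String[] args) {
        HttpClient client = newClient();
        // The configured redirect policy can be read back from the client.
        System.out.println(client.followRedirects()); // prints NEVER
    }
}
```

With automatic redirects switched off, a 301 or 302 response reaches the calling code unchanged, which lets a crawler treat the redirect target like any other discovered link.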
| Modifier and Type | Method and Description |
|---|---|
| void | crawl(java.lang.String url, Record linkRecord, WebCrawlingContext context): invoked by WebCrawlerWorker to resolve the URL in an input record. |
| void | fetch(java.lang.String url, Record crawledRecord, WebCrawlingContext context): invoked by WebFetcherWorker to get the content of a resource for which the crawler did not already attach the content. |
| java.io.InputStream | getContent(Record crawledRecord, TaskContext taskContext): get a stream on a content object. |
| void | setJobRunDataProvider(JobRunDataProvider jobRunDataProvider): DS service reference injection method. |
| void | setLinkFilter(LinkFilter linkFilter): DS service reference injection method. |
| void | setVisitedLinks(VisitedLinksService visitedLinks): DS service reference injection method. |
| void | unsetJobRunDataProvider(JobRunDataProvider jobRunDataProvider): DS service reference removal method. |
| void | unsetLinkFilter(LinkFilter linkFilter): DS service reference removal method. |
| void | unsetVisitedLinks(VisitedLinksService visitedLinks): DS service reference removal method. |

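The set/unset pairs listed above are described as DS (Declarative Services) reference injection and removal methods. How DefaultFetcher implements them is not shown here, so the following is only a sketch of a common bind/unbind idiom: the setter stores the reference, and the unsetter clears it only if the instance being removed is the one currently held. `ServiceBindingSketch`, `LinkFilterStub`, and `hasLinkFilter()` are hypothetical names introduced for illustration.

```java
public class ServiceBindingSketch {

    /** Hypothetical stand-in for a service interface such as LinkFilter. */
    public interface LinkFilterStub { }

    // volatile: the DS runtime may bind and unbind from a different thread
    // than the one using the service.
    private volatile LinkFilterStub linkFilter;

    /** Bind method: would be called when the referenced service becomes available. */
    public void setLinkFilter(LinkFilterStub linkFilter) {
        this.linkFilter = linkFilter;
    }

    /** Unbind method: clears the field only if it still holds the removed service. */
    public void unsetLinkFilter(LinkFilterStub linkFilter) {
        if (this.linkFilter == linkFilter) {
            this.linkFilter = null;
        }
    }

    /** Helper for the demo: reports whether a service is currently bound. */
    public boolean hasLinkFilter() {
        return linkFilter != null;
    }

    public static void main(String[] args) {
        ServiceBindingSketch component = new ServiceBindingSketch();
        LinkFilterStub first = new LinkFilterStub() { };
        LinkFilterStub second = new LinkFilterStub() { };

        component.setLinkFilter(first);
        component.setLinkFilter(second);  // a replacement service is bound
        component.unsetLinkFilter(first); // late unbind of the old one is ignored
        System.out.println(component.hasLinkFilter()); // prints true

        component.unsetLinkFilter(second);
        System.out.println(component.hasLinkFilter()); // prints false
    }
}
```

The identity check in the unsetter guards against the case where a replacement service is bound before the old one is unbound, which would otherwise drop the new reference.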
public DefaultFetcher()

public void crawl(java.lang.String url,
                  Record linkRecord,
                  WebCrawlingContext context)
           throws WebCrawlerException
Description copied from interface: Fetcher
Invoked by WebCrawlerWorker to resolve the URL in an input record.
Specified by: crawl in interface Fetcher
Parameters:
url - the URL to crawl
linkRecord - record containing the URL and possibly additional information necessary to access the web resource
Throws:
WebCrawlerException - if the resource cannot be crawled. If the error is recoverable, the request should be retried later; otherwise the record should be skipped by the crawler worker.

public void fetch(java.lang.String url,
                  Record crawledRecord,
                  WebCrawlingContext context)
           throws WebCrawlerException
Description copied from interface: Fetcher
Invoked by WebFetcherWorker to get the content of a resource for which the crawler did not already attach the content. Please note: the crawledRecord will already have been mapped.
Specified by: fetch in interface Fetcher
Parameters:
url - the URL to fetch into the record
Throws:
WebCrawlerException - if the resource cannot be fetched. If the error is recoverable, the request should be retried later; otherwise the record should be skipped by the crawler worker.

public java.io.InputStream getContent(Record crawledRecord,
                                      TaskContext taskContext)
                               throws ImportingException
Get a stream on a content object. Please note: a mapped record (at least the URL must be mapped) is expected here!
Specified by: getContent in interface ContentFetcher
Parameters:
crawledRecord - a crawled record describing the content object
taskContext - the TaskContext containing job parameters and more
Throws:
ImportingException - if an error occurs while accessing the content object

public void setVisitedLinks(VisitedLinksService visitedLinks)
public void unsetVisitedLinks(VisitedLinksService visitedLinks)
public void setLinkFilter(LinkFilter linkFilter)
public void unsetLinkFilter(LinkFilter linkFilter)
public void setJobRunDataProvider(JobRunDataProvider jobRunDataProvider)
public void unsetJobRunDataProvider(JobRunDataProvider jobRunDataProvider)
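
The Throws clauses of crawl and fetch distinguish recoverable failures (retry later) from non-recoverable ones (skip the record). The sketch below shows one way a calling worker might act on that distinction. It is self-contained and uses stand-ins only: `CrawlFailure` with an `isRecoverable()` flag, the reduced `UrlCrawler` interface, and the `Outcome` enum are all hypothetical and are not the actual WebCrawlerException or Fetcher API.

```java
public class RecoverableErrorSketch {

    /** Hypothetical stand-in for WebCrawlerException; the recoverable flag is an assumption. */
    public static class CrawlFailure extends Exception {
        private final boolean recoverable;

        public CrawlFailure(String message, boolean recoverable) {
            super(message);
            this.recoverable = recoverable;
        }

        public boolean isRecoverable() {
            return recoverable;
        }
    }

    /** Hypothetical, reduced form of the crawl call that takes only the URL. */
    public interface UrlCrawler {
        void crawl(String url) throws CrawlFailure;
    }

    /** What the calling worker decides to do with the record. */
    public enum Outcome { DONE, RETRY_LATER, SKIPPED }

    /** Maps the result of one crawl attempt to a worker decision. */
    public static Outcome process(String url, UrlCrawler crawler) {
        try {
            crawler.crawl(url);
            return Outcome.DONE;
        } catch (CrawlFailure e) {
            // Recoverable: try again later. Otherwise: skip this record.
            return e.isRecoverable() ? Outcome.RETRY_LATER : Outcome.SKIPPED;
        }
    }

    public static void main(String[] args) {
        UrlCrawler timingOut = url -> { throw new CrawlFailure("timeout", true); };
        UrlCrawler missing = url -> { throw new CrawlFailure("not found", false); };
        UrlCrawler healthy = url -> { };

        System.out.println(process("http://example.org/a", timingOut)); // prints RETRY_LATER
        System.out.println(process("http://example.org/b", missing));   // prints SKIPPED
        System.out.println(process("http://example.org/c", healthy));   // prints DONE
    }
}
```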