Interface | Description |
---|---|
Fetcher | Interface for the Fetcher service of the WebCrawlerWorker and WebFetcherWorker. |
LinkExtractor | Extracts links from the content contained in the input record. |
LinkFilter | Interface for LinkFilter services. |
RecordProducer | Produces the resulting records from the fetched input record. |
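
The four interfaces above suggest a per-URL pipeline: fetch the content, produce a record from it, extract the outgoing links, and filter them. The following is a minimal sketch of that flow, not the package's actual API. The nested interfaces, their method names, and their `String`/`Map` parameters are hypothetical simplifications chosen for illustration; the real interfaces in this package are likely to work on records and a `WebCrawlingContext` and will differ in their signatures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CrawlSketch {
  // Hypothetical, simplified stand-ins for the interfaces listed above.
  // These are NOT the real signatures of this package.
  interface Fetcher { String fetch(String url); }
  interface LinkExtractor { List<String> extractLinks(String content); }
  interface LinkFilter { boolean accept(String link); }
  interface RecordProducer { Map<String, String> produce(String url, String content); }

  /** Processes one URL: fetch, produce a record, then extract and filter outgoing links. */
  static List<String> crawlOne(String url, Fetcher fetcher, LinkExtractor extractor,
      LinkFilter filter, RecordProducer producer, List<Map<String, String>> records) {
    String content = fetcher.fetch(url);
    records.add(producer.produce(url, content));
    List<String> accepted = new ArrayList<>();
    for (String link : extractor.extractLinks(content)) {
      if (filter.accept(link)) {
        accepted.add(link);
      }
    }
    return accepted;
  }

  /** Wires in-memory implementations together so the sketch runs without network access. */
  static List<String> demo() {
    Map<String, String> pages = Map.of("http://example.org/",
        "<a href=\"http://example.org/a\">a</a> <a href=\"http://other.test/b\">b</a>");
    Fetcher fetcher = url -> pages.getOrDefault(url, "");
    Pattern href = Pattern.compile("href=\"([^\"]+)\"");
    LinkExtractor extractor = content -> {
      List<String> links = new ArrayList<>();
      Matcher m = href.matcher(content);
      while (m.find()) {
        links.add(m.group(1));
      }
      return links;
    };
    // Keep only links on the start host.
    LinkFilter filter = link -> link.startsWith("http://example.org/");
    RecordProducer producer = (url, content) -> Map.of("url", url, "content", content);
    List<Map<String, String>> records = new ArrayList<>();
    return crawlOne("http://example.org/", fetcher, extractor, filter, producer, records);
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

In this sketch the filter runs after extraction, so rejected links never reach the list of follow-up URLs. Whether the real workers apply filters at that point is not stated in the summary above.
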
Class | Description |
---|---|
WebCrawlerConstants | Constants used by the web crawler and its subcomponents: attribute and attachment names, task parameters. |
WebCrawlerWorker | Worker for web crawling. |
WebCrawlingContext | Context holding information needed throughout most of the web crawling process, such as the mapper, the filter configuration, etc. |
WebExtractorWorker | Compound extractor worker for use in web crawling workflows. |
WebFetcherWorker | Fetches binary content from a URL and stores it as a record attachment. |
Enum | Description |
---|---|
WebCrawlerConstants.ErrorHandling | What to do on IO errors when fetching links. |
Exception | Description |
---|---|
WebCrawlerException | Exceptions thrown by WebCrawler components. |