| Interface | Description |
|---|---|
| Fetcher | Interface for the Fetcher service used by the WebCrawlerWorker and the WebFetcherWorker. |
| LinkExtractor | Extracts links from the content contained in an input record. |
| LinkFilter | Interface for LinkFilter services. |
| RecordProducer | Produces the resulting records from a fetched input record. |

| Class | Description |
|---|---|
| WebCrawlerConstants | Constants used by the web crawler and its subcomponents: attribute and attachment names, and task parameters. |
| WebCrawlerWorker | Worker for web crawling. |
| WebCrawlingContext | Context holding information needed throughout most of the web crawling process, such as the mapper and the filter configuration. |
| WebExtractorWorker | Compound extractor worker for use in web crawling workflows. |
| WebFetcherWorker | Fetches binary content from a URL and stores it as a record attachment. |

| Enum | Description |
|---|---|
| WebCrawlerConstants.ErrorHandling | Specifies what to do on I/O errors when fetching links. |

| Exception | Description |
|---|---|
| WebCrawlerException | Exception thrown by web crawler components. |