org.eclipse.smila.importing.crawler.web
Interface LinkFilter
- All Known Implementing Classes:
- SimpleLinkFilter
public interface LinkFilter
Interface for LinkFilter services. The LinkFilter is called on the result of the LinkExtractor to select only
those links that should actually be followed in follow-up tasks.
filterLinks
java.util.Collection<Record> filterLinks(java.util.Collection<Record> extractedLinks,
Record sourceLink,
AnyMap parameters,
TaskLog taskLog)
throws WebCrawlerException
- Filters the extracted links.
- Parameters:
extractedLinks - result from the LinkExtractor service.
sourceLink - record from which the links were extracted.
parameters - task parameters, can configure the operation.
taskLog - log facility provided by the WorkerManager.
- Returns:
- links to follow in follow-up tasks
- Throws:
WebCrawlerException - error in processing the links.
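The contract described above can be illustrated with a minimal sketch of a filter that keeps only links on the same host as the source link. It is not SMILA code: SketchLinkFilter, SameHostLinkFilter and the "url" attribute name are hypothetical, plain Map objects stand in for Record and AnyMap, and the taskLog parameter and the WebCrawlerException declaration are left out so that the example compiles without the SMILA libraries.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the LinkFilter contract. The real interface uses
// Record, AnyMap and TaskLog and declares WebCrawlerException.
interface SketchLinkFilter {
  Collection<Map<String, Object>> filterLinks(Collection<Map<String, Object>> extractedLinks,
      Map<String, Object> sourceLink, Map<String, Object> parameters);
}

// Keeps only those extracted links whose host matches the host of the source link.
class SameHostLinkFilter implements SketchLinkFilter {
  // Assumed attribute name holding the link URL; not taken from the SMILA API.
  static final String URL_KEY = "url";

  @Override
  public Collection<Map<String, Object>> filterLinks(Collection<Map<String, Object>> extractedLinks,
      Map<String, Object> sourceLink, Map<String, Object> parameters) {
    final String sourceHost = hostOf(sourceLink);
    final List<Map<String, Object>> linksToFollow = new ArrayList<>();
    for (final Map<String, Object> link : extractedLinks) {
      final String host = hostOf(link);
      // Links without a parseable host are dropped.
      if (sourceHost != null && sourceHost.equalsIgnoreCase(host)) {
        linksToFollow.add(link);
      }
    }
    return linksToFollow;
  }

  // Returns the host part of the record's URL, or null if it is missing or malformed.
  private static String hostOf(Map<String, Object> record) {
    final Object url = record.get(URL_KEY);
    if (url == null) {
      return null;
    }
    try {
      return URI.create(url.toString()).getHost();
    } catch (IllegalArgumentException e) {
      return null;
    }
  }
}
```

In this sketch the parameters argument is accepted but unused. An actual implementation could read filter settings from the task parameters and report dropped links through the taskLog.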