org.eclipse.smila.importing.crawler.web
Interface LinkExtractor
- All Known Implementing Classes:
- SimpleLinkExtractor
public interface LinkExtractor
Extracts links from the content contained in an input record.
extractLinks
java.util.Collection<Record> extractLinks(Record inputRecord,
AnyMap parameters,
TaskLog taskLog)
throws WebCrawlerException
- Parameters:
inputRecord - input record with content.
parameters - configuration parameters, may be null.
taskLog - log facility provided by worker frame.
- Returns:
- a collection containing one new record for each extracted link
- Throws:
WebCrawlerException
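The contract above (one new record per extracted link, with a parameters argument that may be null) can be illustrated with a minimal, self-contained sketch. It does not use the SMILA classes: Record and AnyMap are replaced by plain `Map<String, Object>`, TaskLog by a `List<String>`, and WebCrawlerException by `Exception`. The class names, the record keys `content` and `url`, and the `urlPrefix` parameter are all hypothetical and chosen only for illustration. The regular expression handles double-quoted `href` attributes only and is not a substitute for a real HTML parser.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-ins: the SMILA types Record, AnyMap, TaskLog and
// WebCrawlerException are replaced by JDK types so this sketch compiles alone.
class LinkExtractorSketch {

  /** Simplified mirror of the extractLinks signature (stand-in types, not the SMILA API). */
  interface SketchLinkExtractor {
    Collection<Map<String, Object>> extractLinks(Map<String, Object> inputRecord,
        Map<String, Object> parameters, List<String> taskLog) throws Exception;
  }

  // Captures the value of double-quoted href attributes, ignoring case.
  private static final Pattern HREF =
      Pattern.compile("href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

  /** Creates one new record (a map with a "url" entry) per link found in "content". */
  static class RegexLinkExtractor implements SketchLinkExtractor {
    @Override
    public Collection<Map<String, Object>> extractLinks(Map<String, Object> inputRecord,
        Map<String, Object> parameters, List<String> taskLog) {
      List<Map<String, Object>> linkRecords = new ArrayList<>();

      // "content" is an assumed key name for the fetched page text.
      Object content = inputRecord.get("content");
      if (!(content instanceof String)) {
        taskLog.add("input record has no string content, no links extracted");
        return linkRecords;
      }

      // parameters may be null, so guard before reading the assumed "urlPrefix" filter.
      String prefix = null;
      if (parameters != null) {
        Object p = parameters.get("urlPrefix");
        if (p instanceof String) {
          prefix = (String) p;
        }
      }

      Matcher matcher = HREF.matcher((String) content);
      while (matcher.find()) {
        String url = matcher.group(1);
        if (prefix != null && !url.startsWith(prefix)) {
          continue; // skip links outside the configured prefix
        }
        Map<String, Object> linkRecord = new HashMap<>();
        linkRecord.put("url", url);
        linkRecords.add(linkRecord);
      }
      return linkRecords;
    }
  }
}
```

Calling `new LinkExtractorSketch.RegexLinkExtractor().extractLinks(record, null, log)` with a record whose `content` holds HTML returns one map per matched `href`. Passing a parameters map with `urlPrefix` restricts the result to links starting with that prefix.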