SMILA 1.0 API documentation

org.eclipse.smila.importing.crawler.web
Interface LinkExtractor

All Known Implementing Classes:
SimpleLinkExtractor

public interface LinkExtractor

Extracts links from the content contained in the input record.


Method Summary
java.util.Collection<Record> extractLinks(Record inputRecord, AnyMap parameters, TaskLog taskLog)

Method Detail

extractLinks

java.util.Collection<Record> extractLinks(Record inputRecord,
                                          AnyMap parameters,
                                          TaskLog taskLog)
                                          throws WebCrawlerException
Parameters:
inputRecord - the input record containing the content to extract links from
parameters - configuration parameters; may be null
taskLog - log facility provided by worker frame
Returns:
a collection containing one newly created record for each extracted link
Throws:
WebCrawlerException
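
Example (illustrative sketch, not SMILA code):
The contract described above (one new record per extracted link, with a parameters argument that may be null) can be sketched roughly as follows. To stay self-contained, this sketch uses plain java.util.Map instances as stand-ins for Record and AnyMap, and it omits the TaskLog argument and WebCrawlerException. The class name LinkExtractionSketch, the "content" and "url" keys, and the "requiredPrefix" parameter are all hypothetical and are not part of the SMILA API. The regex-based matching is a simplification; it is not necessarily how SimpleLinkExtractor works.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch only. Maps stand in for SMILA's Record and AnyMap types. */
public class LinkExtractionSketch {

  // Simplistic pattern for href="..." or href='...' attributes.
  // A production extractor would more likely use a real HTML parser.
  private static final Pattern HREF =
      Pattern.compile("href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

  /**
   * Returns one new "record" (here: a Map) per link found in the input content.
   * The key names and the "requiredPrefix" parameter are assumptions for this sketch.
   */
  static Collection<Map<String, String>> extractLinks(
      Map<String, String> inputRecord, Map<String, String> parameters) {
    List<Map<String, String>> linkRecords = new ArrayList<>();
    String content = inputRecord.get("content");
    if (content == null) {
      return linkRecords; // no content, so no links
    }
    // The parameters argument may be null, so guard before reading from it.
    String requiredPrefix = (parameters == null) ? null : parameters.get("requiredPrefix");
    Matcher matcher = HREF.matcher(content);
    while (matcher.find()) {
      String url = matcher.group(1);
      if (requiredPrefix != null && !url.startsWith(requiredPrefix)) {
        continue; // skip links filtered out by the hypothetical parameter
      }
      Map<String, String> linkRecord = new LinkedHashMap<>();
      linkRecord.put("url", url);
      linkRecords.add(linkRecord);
    }
    return linkRecords;
  }

  public static void main(String[] args) {
    Map<String, String> input = new LinkedHashMap<>();
    input.put("content", "<a href=\"http://example.org/a\">A</a> <a href='/b'>B</a>");
    // Passing null parameters is allowed; no filtering is applied in that case.
    System.out.println(extractLinks(input, null));
  }
}
```

An actual implementation would instead implement LinkExtractor, create Record instances for the links, report problems through the supplied TaskLog, and signal failures with WebCrawlerException.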
