org.eclipse.smila.importing.crawler.web.extractor
Class SimpleLinkExtractor
java.lang.Object
org.eclipse.smila.importing.crawler.web.extractor.SimpleLinkExtractor
- All Implemented Interfaces:
- LinkExtractor
public class SimpleLinkExtractor
- extends java.lang.Object
- implements LinkExtractor
Simple LinkExtractor implementation that delegates HTML parsing to an HTML extractor (a LinkExtractorHtml implementation).
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
SimpleLinkExtractor
public SimpleLinkExtractor()
extractLinks
public java.util.Collection<Record> extractLinks(Record inputRecord,
AnyMap parameters,
TaskLog taskLog)
throws WebCrawlerException
- Specified by:
extractLinks in interface LinkExtractor
- Parameters:
inputRecord - input record with content.
parameters - configuration parameters, may be null.
taskLog - log facility provided by the worker frame.
- Returns:
- a collection containing one new record for each extracted link.
- Throws:
WebCrawlerException
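The contract of extractLinks (content in, one new record per link out) can be illustrated with a minimal, self-contained sketch. It is not the SMILA implementation: Record is modeled as a plain Map, the parameters and taskLog arguments are omitted, the attribute names "url" and "content" are hypothetical, and a naive regular expression stands in for the real HTML extractor.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: one new "record" (a Map) is created per extracted link. */
public class ExtractLinksSketch {

  // Naive matcher for quoted href attributes; a stand-in for a real HTML parser.
  private static final Pattern HREF =
      Pattern.compile("href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

  public static Collection<Map<String, Object>> extractLinks(Map<String, Object> inputRecord) {
    // Hypothetical attribute names: "url" holds the page URL, "content" its HTML.
    String baseUrl = (String) inputRecord.get("url");
    String html = (String) inputRecord.get("content");
    List<Map<String, Object>> result = new ArrayList<>();
    if (html == null) {
      return result; // no content, so no links
    }
    Matcher m = HREF.matcher(html);
    while (m.find()) {
      Map<String, Object> linkRecord = new LinkedHashMap<>();
      // Resolve the (possibly relative) link against the URL of the page it came from.
      linkRecord.put("url", URI.create(baseUrl).resolve(m.group(1)).toString());
      result.add(linkRecord);
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, Object> page = new LinkedHashMap<>();
    page.put("url", "http://example.com/a/index.html");
    page.put("content", "<a href=\"b.html\">B</a> <a HREF='/c.html'>C</a>");
    for (Map<String, Object> link : extractLinks(page)) {
      System.out.println(link.get("url"));
    }
  }
}
```

Running main prints http://example.com/a/b.html and http://example.com/c.html. Resolving each link against the page URL is what makes the new records usable as crawl targets.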
getAbsoluteUri
public java.lang.String getAbsoluteUri(java.lang.String baseUri,
java.lang.String uri)
throws URIException
- Returns:
- the absolute URI obtained by resolving the given URI against the base URI.
- Throws:
URIException
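Base-URI resolution as described for getAbsoluteUri can be sketched with the JDK's java.net.URI, which applies the RFC 3986 resolution rules. This is a stand-in, not the documented method: the real method throws URIException, whereas URI.create and URI.resolve signal malformed input with an unchecked IllegalArgumentException.

```java
import java.net.URI;

/** Sketch of resolving a URI against a base URI using java.net.URI. */
public class AbsoluteUriSketch {

  /**
   * Returns uri resolved against baseUri. An already absolute uri is returned unchanged.
   * Throws IllegalArgumentException if either string is not a valid URI.
   */
  public static String getAbsoluteUri(String baseUri, String uri) {
    return URI.create(baseUri).resolve(uri).toString();
  }

  public static void main(String[] args) {
    String base = "http://example.com/docs/index.html";
    System.out.println(getAbsoluteUri(base, "page2.html"));          // http://example.com/docs/page2.html
    System.out.println(getAbsoluteUri(base, "../img/logo.png"));     // http://example.com/img/logo.png
    System.out.println(getAbsoluteUri(base, "https://other.org/x")); // https://other.org/x
  }
}
```

The relative path is merged with the base path minus its last segment and then normalized, which is why "../img/logo.png" climbs out of /docs/.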
setLinkExtractorHtml
public void setLinkExtractorHtml(LinkExtractorHtml linkExtractorHtml)
- Sets the HTML extractor implementation to use.
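The setter makes the HTML extractor pluggable. A minimal sketch of that delegation pattern follows, assuming a hypothetical single-method interface HtmlLinkSource in place of LinkExtractorHtml (whose methods are not documented here). Failing fast when no extractor has been set is a choice made for this sketch, not documented behavior.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Sketch of setter injection of a pluggable HTML link extractor. */
public class SetterInjectionSketch {

  /** Hypothetical stand-in for LinkExtractorHtml: returns raw link targets from HTML. */
  public interface HtmlLinkSource {
    List<String> findLinks(String html);
  }

  public static class DelegatingExtractor {
    private HtmlLinkSource linkExtractorHtml;

    /** Sets the HTML extractor implementation to use (mirrors setLinkExtractorHtml). */
    public void setLinkExtractorHtml(HtmlLinkSource linkExtractorHtml) {
      this.linkExtractorHtml = linkExtractorHtml;
    }

    public Collection<String> extract(String html) {
      if (linkExtractorHtml == null) {
        // Sketch choice: fail fast if no implementation has been injected.
        throw new IllegalStateException("no HTML extractor set");
      }
      return new ArrayList<>(linkExtractorHtml.findLinks(html));
    }
  }

  public static void main(String[] args) {
    DelegatingExtractor extractor = new DelegatingExtractor();
    // Inject a trivial fake implementation, as a unit test might.
    extractor.setLinkExtractorHtml(html -> List.of("a.html", "b.html"));
    System.out.println(extractor.extract("<html></html>")); // prints [a.html, b.html]
  }
}
```

Keeping the parser behind a setter lets a test or a different deployment swap in another HTML extractor without changing the link-to-record logic.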