SMILA 1.0 API documentation

org.eclipse.smila.importing.crawler.web.extractor
Class SimpleLinkExtractor

java.lang.Object
  extended by org.eclipse.smila.importing.crawler.web.extractor.SimpleLinkExtractor
All Implemented Interfaces:
LinkExtractor

public class SimpleLinkExtractor
extends java.lang.Object
implements LinkExtractor

Simple LinkExtractor implementation that delegates HTML parsing to a LinkExtractorHtml instance.


Constructor Summary
SimpleLinkExtractor()
           
 
Method Summary
 java.util.Collection<Record> extractLinks(Record inputRecord, AnyMap parameters, TaskLog taskLog)
           
 java.lang.String getAbsoluteUri(java.lang.String baseUri, java.lang.String uri)
           
 void setLinkExtractorHtml(LinkExtractorHtml linkExtractorHtml)
          Sets the HTML extractor implementation to use.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

SimpleLinkExtractor

public SimpleLinkExtractor()
Method Detail

extractLinks

public java.util.Collection<Record> extractLinks(Record inputRecord,
                                                 AnyMap parameters,
                                                 TaskLog taskLog)
                                          throws WebCrawlerException
Specified by:
extractLinks in interface LinkExtractor
Parameters:
inputRecord - input record with content
parameters - configuration parameters, may be null.
taskLog - log facility provided by worker frame.
Returns:
a collection containing one new record for each extracted link.
Throws:
WebCrawlerException
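
The "one record per extracted link" contract can be illustrated with a minimal, self-contained sketch. It does not use the SMILA types: a plain Map stands in for Record, the attribute name "url" is hypothetical, and a regular expression stands in for the configured LinkExtractorHtml. It only shows the shape of the result, not how SimpleLinkExtractor is implemented.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkRecordsSketch {
  // Assumption: only double-quoted href attributes are considered.
  private static final Pattern HREF =
      Pattern.compile("href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

  /** Returns one map (a stand-in for a SMILA Record) per link found in the HTML. */
  static Collection<Map<String, String>> extractLinkRecords(String html) {
    List<Map<String, String>> records = new ArrayList<>();
    Matcher matcher = HREF.matcher(html);
    while (matcher.find()) {
      Map<String, String> record = new LinkedHashMap<>();
      record.put("url", matcher.group(1)); // "url" is a hypothetical attribute name
      records.add(record);
    }
    return records;
  }

  public static void main(String[] args) {
    String html = "<a href=\"a.html\">A</a> <A HREF=\"http://example.org/b\">B</A>";
    for (Map<String, String> record : extractLinkRecords(html)) {
      System.out.println(record.get("url"));
    }
  }
}
```

A real HTML parser is more robust than a regular expression; the pattern is used here only to keep the sketch short.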

getAbsoluteUri

public java.lang.String getAbsoluteUri(java.lang.String baseUri,
                                       java.lang.String uri)
                                throws URIException
Returns:
the absolute URI obtained by resolving the given URI against the base URI.
Throws:
URIException
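
Resolving a link against a base URI, as described above, can be sketched with the JDK alone. This is an illustration under stated assumptions, not the code behind getAbsoluteUri: it uses java.net.URI rather than the URI class of the actual method, so it throws URISyntaxException instead of URIException, and the class and method names are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class AbsoluteUriSketch {
  /** Hypothetical stand-in for getAbsoluteUri: resolves uri against baseUri. */
  static String toAbsoluteUri(String baseUri, String uri) throws URISyntaxException {
    URI base = new URI(baseUri);
    // resolve() returns the argument unchanged if it is already absolute.
    URI resolved = base.resolve(new URI(uri));
    return resolved.normalize().toString();
  }

  public static void main(String[] args) throws URISyntaxException {
    String base = "http://example.org/docs/index.html";
    System.out.println(toAbsoluteUri(base, "page2.html"));             // http://example.org/docs/page2.html
    System.out.println(toAbsoluteUri(base, "../img/logo.png"));        // http://example.org/img/logo.png
    System.out.println(toAbsoluteUri(base, "/top.html"));              // http://example.org/top.html
    System.out.println(toAbsoluteUri(base, "http://other.example/x")); // http://other.example/x
  }
}
```

Links found in crawled pages are often not valid URIs (unescaped spaces, for example), so a production implementation would need to escape or reject such input before resolving it.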

setLinkExtractorHtml

public void setLinkExtractorHtml(LinkExtractorHtml linkExtractorHtml)
Sets the HTML extractor implementation to use.

