SMILA 1.0 API documentation

org.eclipse.smila.importing.crawler.web.extractor
Class DefaultLinkExtractor

java.lang.Object
  extended by org.eclipse.smila.importing.crawler.web.extractor.DefaultLinkExtractor
All Implemented Interfaces:
LinkExtractor

public class DefaultLinkExtractor
extends java.lang.Object
implements LinkExtractor

Simple LinkExtractor implementation using an HTML extractor.


Constructor Summary
DefaultLinkExtractor()
           
 
Method Summary
 java.util.Collection<Record> extractLinks(Record inputRecord, WebCrawlingContext context)
           
 void setLinkExtractorHtml(LinkExtractorHtml linkExtractorHtml)
          Sets the HTML extractor implementation to use.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DefaultLinkExtractor

public DefaultLinkExtractor()

Method Detail

extractLinks

public java.util.Collection<Record> extractLinks(Record inputRecord,
                                                 WebCrawlingContext context)
                                          throws WebCrawlerException
Specified by:
extractLinks in interface LinkExtractor
Parameters:
inputRecord - input record with content
context - the web crawling context
Returns:
a collection containing one new record for each extracted link
Throws:
WebCrawlerException

setLinkExtractorHtml

public void setLinkExtractorHtml(LinkExtractorHtml linkExtractorHtml)
Sets the HTML extractor implementation to use.
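
The pattern documented on this page (a no-argument constructor, a setter that supplies the HTML extractor, and an extraction method that returns one new record per link) can be sketched as follows. This is a minimal, self-contained illustration and not SMILA code: SketchRecord, SketchHtmlLinkFinder, RegexHtmlLinkFinder, SketchLinkExtractor and the attribute names "content" and "url" are all hypothetical stand-ins, the WebCrawlingContext parameter is left out, and the regex-based link finder is an assumption chosen only to keep the example short.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for a record: a simple bag of named string attributes.
final class SketchRecord {
  private final Map<String, String> attributes = new LinkedHashMap<>();

  SketchRecord set(String name, String value) {
    attributes.put(name, value);
    return this;
  }

  String get(String name) {
    return attributes.get(name);
  }
}

// Hypothetical stand-in for the role of the pluggable HTML extractor.
interface SketchHtmlLinkFinder {
  List<String> findLinks(String html);
}

// Naive finder for double-quoted href values; real HTML needs a real parser.
final class RegexHtmlLinkFinder implements SketchHtmlLinkFinder {
  private static final Pattern HREF =
      Pattern.compile("href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

  @Override
  public List<String> findLinks(String html) {
    List<String> links = new ArrayList<>();
    Matcher matcher = HREF.matcher(html);
    while (matcher.find()) {
      links.add(matcher.group(1));
    }
    return links;
  }
}

// Mirrors the documented shape: the extractor is injected through a setter,
// and extraction creates one new record per link found in the input content.
final class SketchLinkExtractor {
  private SketchHtmlLinkFinder finder;

  void setHtmlLinkFinder(SketchHtmlLinkFinder finder) {
    this.finder = finder;
  }

  Collection<SketchRecord> extractLinks(SketchRecord inputRecord) {
    if (finder == null) {
      throw new IllegalStateException("no HTML link finder has been set");
    }
    List<SketchRecord> linkRecords = new ArrayList<>();
    String content = inputRecord.get("content"); // hypothetical attribute name
    if (content == null) {
      return linkRecords; // nothing to parse, so no links
    }
    for (String link : finder.findLinks(content)) {
      linkRecords.add(new SketchRecord().set("url", link)); // hypothetical attribute name
    }
    return linkRecords;
  }
}

final class LinkExtractionSketch {
  public static void main(String[] args) {
    SketchLinkExtractor extractor = new SketchLinkExtractor();
    extractor.setHtmlLinkFinder(new RegexHtmlLinkFinder());

    SketchRecord page = new SketchRecord()
        .set("content", "<a href=\"http://example.org/a\">A</a> <a href=\"/b\">B</a>");

    // Print the URL stored in each link record.
    for (SketchRecord link : extractor.extractLinks(page)) {
      System.out.println(link.get("url"));
    }
  }
}
```

Supplying the extractor through a setter, as setLinkExtractorHtml does, keeps the HTML parsing strategy replaceable without changing the class that turns links into records.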

