SMILA 1.0 API documentation

org.eclipse.smila.importing.crawler.web.filter
Class SimpleLinkFilter

java.lang.Object
  extended by org.eclipse.smila.importing.crawler.web.filter.SimpleLinkFilter
All Implemented Interfaces:
LinkFilter

public class SimpleLinkFilter
extends java.lang.Object
implements LinkFilter

Simple example implementation:

Also removes duplicates with exactly the same URL.


Constructor Summary
SimpleLinkFilter()
           
 
Method Summary
 java.util.Collection<Record> filterLinks(java.util.Collection<Record> extractedLinks, Record sourceLink, AnyMap parameters, TaskLog taskLog)
          Filters extracted links.
protected static java.lang.String getHost(java.lang.String urlString, TaskLog log)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

SimpleLinkFilter

public SimpleLinkFilter()
Method Detail

filterLinks

public java.util.Collection<Record> filterLinks(java.util.Collection<Record> extractedLinks,
                                                Record sourceLink,
                                                AnyMap parameters,
                                                TaskLog taskLog)
                                         throws WebCrawlerException
Description copied from interface: LinkFilter
Filters extracted links.

Specified by:
filterLinks in interface LinkFilter
Parameters:
extractedLinks - result from LinkExtractor service.
sourceLink - record from which links were extracted.
parameters - task parameters, can configure the operation.
taskLog - log facility provided by WorkerManager.
Returns:
links to follow in follow-up tasks
Throws:
WebCrawlerException - error in processing the links.
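
The class description notes that duplicates with exactly the same URL are removed. The following is a minimal sketch of that idea only, not the SMILA implementation: it assumes plain URL strings in place of Record objects, and the names DuplicateUrlSketch and removeExactDuplicates are hypothetical. Comparison is by exact string, so URLs that differ only in case or in a trailing slash are treated as distinct.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in: plain URL strings replace SMILA Record objects. */
class DuplicateUrlSketch {

  /** Keeps the first occurrence of each URL, compared as exact strings. */
  static List<String> removeExactDuplicates(Collection<String> urls) {
    Set<String> seen = new HashSet<>();
    List<String> result = new ArrayList<>();
    for (String url : urls) {
      // Set.add returns false if the URL was already seen, so repeats are skipped.
      if (url != null && seen.add(url)) {
        result.add(url);
      }
    }
    return result;
  }
}
```

In a real LinkFilter the URL would be read from each link Record, and the surviving records would be returned as the links to follow.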

getHost

protected static java.lang.String getHost(java.lang.String urlString,
                                          TaskLog log)
Returns:
host part of URL in link record.
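
As an illustration of what extracting the host part of a URL can look like, here is a minimal sketch based on java.net.URI. It is an assumption, not the SMILA source: the class name HostSketch is hypothetical, the TaskLog parameter is omitted, and returning null for an unparseable or host-less URL is a choice made for this example (the real method might log a warning through its log argument instead).

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical sketch; not the SMILA implementation of getHost. */
class HostSketch {

  /** Returns the host part of the URL, or null if it cannot be determined. */
  static String getHost(String urlString) {
    if (urlString == null) {
      return null;
    }
    try {
      // URI.getHost() returns null for relative or opaque URIs.
      return new URI(urlString).getHost();
    } catch (URISyntaxException e) {
      // Assumption: an unparseable URL simply yields no host.
      return null;
    }
  }
}
```

For example, HostSketch.getHost("http://www.eclipse.org/smila/index.html") yields "www.eclipse.org", while a relative path such as "/docs/index.html" yields null.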
