SMILA 1.0 API documentation

org.eclipse.smila.importing.crawler.web
Interface LinkFilter

All Known Implementing Classes:
DefaultLinkFilter

public interface LinkFilter

Interface for LinkFilter services. A LinkFilter is applied to the result of the LinkExtractor and selects only those links that should be followed in follow-up tasks.


Method Summary
 boolean allowLink(java.lang.String link, WebCrawlingContext context)
          Check if it is allowed to follow a given link.
 java.util.Collection<Record> filterLinks(java.util.Collection<Record> extractedLinks, WebCrawlingContext context)
          Filter extracted links.
 

Method Detail

filterLinks

java.util.Collection<Record> filterLinks(java.util.Collection<Record> extractedLinks,
                                         WebCrawlingContext context)
                                         throws WebCrawlerException
Filter extracted links.

Parameters:
extractedLinks - result from LinkExtractor service.
context - the WebCrawlingContext.
Returns:
links to follow in follow-up tasks
Throws:
WebCrawlerException - error in processing the links.
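
The following is a minimal sketch of one way filterLinks could be implemented, not the behaviour of DefaultLinkFilter. To keep it compilable without SMILA on the classpath, Record, WebCrawlingContext and WebCrawlerException are replaced by simplified stand-ins nested in the sketch class; the getUrl() accessor and the "http/https only" policy are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

class FilterLinksSketch {

  // Simplified stand-ins (assumptions) so the sketch compiles without SMILA.
  static class WebCrawlerException extends Exception {
    WebCrawlerException(String message) { super(message); }
  }

  static class WebCrawlingContext { }

  static class Record {
    private final String url;
    Record(String url) { this.url = url; }
    String getUrl() { return url; } // hypothetical accessor, not the SMILA Record API
  }

  // Same two methods as the documented interface, using the stand-in types.
  interface LinkFilter {
    Collection<Record> filterLinks(Collection<Record> extractedLinks, WebCrawlingContext context)
        throws WebCrawlerException;

    boolean allowLink(String link, WebCrawlingContext context) throws WebCrawlerException;
  }

  // Example policy (an assumption): keep only http and https links.
  static class HttpOnlyLinkFilter implements LinkFilter {
    @Override
    public Collection<Record> filterLinks(Collection<Record> extractedLinks, WebCrawlingContext context)
        throws WebCrawlerException {
      List<Record> allowed = new ArrayList<>();
      for (Record link : extractedLinks) {
        // Delegate the per-link decision to allowLink.
        if (allowLink(link.getUrl(), context)) {
          allowed.add(link);
        }
      }
      return allowed;
    }

    @Override
    public boolean allowLink(String link, WebCrawlingContext context) {
      return link != null && (link.startsWith("http://") || link.startsWith("https://"));
    }
  }

  // Runs the filter on three sample links and returns the URLs that were kept.
  static List<String> demo() {
    List<Record> extracted = List.of(
        new Record("http://example.org/a"),
        new Record("mailto:someone@example.org"),
        new Record("https://example.org/b"));
    List<String> kept = new ArrayList<>();
    try {
      LinkFilter filter = new HttpOnlyLinkFilter();
      for (Record r : filter.filterLinks(extracted, new WebCrawlingContext())) {
        kept.add(r.getUrl());
      }
    } catch (WebCrawlerException e) {
      throw new IllegalStateException(e);
    }
    return kept;
  }
}
```

Routing every record through allowLink keeps the two methods consistent, so a link accepted individually is also accepted in bulk. Whether the shipped implementation does this is not stated on this page.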

allowLink

boolean allowLink(java.lang.String link,
                  WebCrawlingContext context)
                  throws WebCrawlerException
Check if it is allowed to follow a given link.

Parameters:
link - a String containing the link to be checked
context - the WebCrawlingContext.
Returns:
true if the link is allowed to be followed, false otherwise
Throws:
WebCrawlerException - error in processing the link.
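
As a hedged illustration of the true/false/exception contract described above, the sketch below accepts a link only if its host matches a host stored in the context. The stand-in WebCrawlingContext with its getAllowedHost() property, the same-host policy, and the choice to throw WebCrawlerException for an unparsable link are all assumptions; this page does not specify how malformed links are treated.

```java
import java.net.URI;
import java.net.URISyntaxException;

class AllowLinkSketch {

  // Simplified stand-ins (assumptions) so the sketch compiles without SMILA.
  static class WebCrawlerException extends Exception {
    WebCrawlerException(String message, Throwable cause) { super(message, cause); }
  }

  static class WebCrawlingContext {
    private final String allowedHost; // hypothetical property, not documented here
    WebCrawlingContext(String allowedHost) { this.allowedHost = allowedHost; }
    String getAllowedHost() { return allowedHost; }
  }

  // Returns true only when the link's host equals the host held by the context.
  static boolean allowLink(String link, WebCrawlingContext context) throws WebCrawlerException {
    try {
      String host = new URI(link).getHost(); // null for opaque URIs such as mailto:
      return host != null && host.equalsIgnoreCase(context.getAllowedHost());
    } catch (URISyntaxException e) {
      throw new WebCrawlerException("Cannot parse link: " + link, e);
    }
  }

  // Convenience wrapper that maps the three possible outcomes to a label.
  static String describe(String link, WebCrawlingContext context) {
    try {
      return allowLink(link, context) ? "allowed" : "rejected";
    } catch (WebCrawlerException e) {
      return "error";
    }
  }
}
```

A caller would typically treat false as "skip this link" and the exception as a processing failure to report.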
