SMILA 1.0 API documentation

org.eclipse.smila.importing.crawler.web
Class WebCrawlerWorker

java.lang.Object
  extended by org.eclipse.smila.importing.crawler.web.WebCrawlerWorker
All Implemented Interfaces:
Worker

public class WebCrawlerWorker
extends java.lang.Object
implements Worker

Worker for Web crawling.


Field Summary
static java.lang.String INPUT_SLOT_LINKS_TO_CRAWL
          Name of the input slot containing the links to crawl.
static java.lang.String NAME
          Name of the worker, used in worker description and workflows.
static java.lang.String OUTPUT_SLOT_CRAWLED_RECORDS
          Name of the output slot containing the crawled records.
static java.lang.String OUTPUT_SLOT_LINKS_TO_CRAWL
          Name of the output slot containing the links to crawl.
 
Constructor Summary
WebCrawlerWorker()
           
 
Method Summary
static java.lang.String getMimeType(Record record)
          Gets the MIME type from the record.
 java.lang.String getName()
          Returns the name of the worker.
static java.lang.String getUrl(Record record)
          Gets the URL from the record.
 void perform(TaskContext taskContext)
          Performs a computation on the data available in the TaskContext, such as a task for this worker, input and (if configured) output slots.
 void setCompoundExtractor(CompoundExtractor compoundExtractor)
          DS service reference injection method.
 void setFetcher(Fetcher fetcher)
          DS service reference injection method.
 void setLinkExtractor(LinkExtractor linkExtractor)
          DS service reference injection method.
 void setLinkFilter(LinkFilter linkFilter)
          DS service reference injection method.
 void setRecordProducer(RecordProducer recordProducer)
          DS service reference injection method.
 void setVisitedLinks(VisitedLinksService visitedLinks)
          DS service reference injection method.
 void unsetCompoundExtractor(CompoundExtractor compoundExtractor)
          DS service reference removal method.
 void unsetFetcher(Fetcher fetcher)
          DS service reference removal method.
 void unsetLinkExtractor(LinkExtractor linkExtractor)
          DS service reference removal method.
 void unsetLinkFilter(LinkFilter linkFilter)
          DS service reference removal method.
 void unsetRecordProducer(RecordProducer recordProducer)
          DS service reference removal method.
 void unsetVisitedLinks(VisitedLinksService visitedLinks)
          DS service reference removal method.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

NAME

public static final java.lang.String NAME
Name of the worker, used in worker description and workflows.

See Also:
Constant Field Values

INPUT_SLOT_LINKS_TO_CRAWL

public static final java.lang.String INPUT_SLOT_LINKS_TO_CRAWL
Name of the input slot containing the links to crawl.

See Also:
Constant Field Values

OUTPUT_SLOT_LINKS_TO_CRAWL

public static final java.lang.String OUTPUT_SLOT_LINKS_TO_CRAWL
Name of the output slot containing the links to crawl.

See Also:
Constant Field Values

OUTPUT_SLOT_CRAWLED_RECORDS

public static final java.lang.String OUTPUT_SLOT_CRAWLED_RECORDS
Name of the output slot containing the crawled records.

See Also:
Constant Field Values
Constructor Detail

WebCrawlerWorker

public WebCrawlerWorker()
Method Detail

getUrl

public static java.lang.String getUrl(Record record)
Gets the URL from the record.


getMimeType

public static java.lang.String getMimeType(Record record)
Gets the MIME type from the record.
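
As a rough illustration of how static accessors such as getUrl and getMimeType might behave, the sketch below models a record as a plain map. The RecordAccessSketch class, the Map-based record, and the attribute keys "url" and "mimeType" are all assumptions made for this example; the real Record type and the attribute names used by WebCrawlerWorker are not shown on this page.

```java
import java.util.HashMap;
import java.util.Map;

class RecordAccessSketch {
  // Hypothetical attribute keys; the real names are not documented on this page.
  static final String ATTR_URL = "url";
  static final String ATTR_MIME_TYPE = "mimeType";

  /** Returns the URL stored in the stand-in record, or null if it is absent. */
  public static String getUrl(Map<String, Object> record) {
    return asString(record.get(ATTR_URL));
  }

  /** Returns the MIME type stored in the stand-in record, or null if it is absent. */
  public static String getMimeType(Map<String, Object> record) {
    return asString(record.get(ATTR_MIME_TYPE));
  }

  // Converts an attribute value to a string without failing on missing attributes.
  private static String asString(Object value) {
    return value == null ? null : value.toString();
  }

  public static void main(String[] args) {
    Map<String, Object> record = new HashMap<>();
    record.put(ATTR_URL, "http://example.org/index.html");
    record.put(ATTR_MIME_TYPE, "text/html");
    System.out.println(getUrl(record) + " has type " + getMimeType(record));
  }
}
```

Returning null for a missing attribute is a design choice of this sketch only; whether the real methods return null or throw in that case is not stated here.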


getName

public java.lang.String getName()
Specified by:
getName in interface Worker
Returns:
the name of the worker. The worker function will be executed for tasks tied to this worker name.

perform

public void perform(TaskContext taskContext)
             throws java.lang.Exception
Description copied from interface: Worker
Performs a computation on the data available in the TaskContext, such as a task for this worker, input and (if configured) output slots. Implementations must ensure that calls to this method are thread-safe.

Specified by:
perform in interface Worker
Parameters:
taskContext - the TaskContext information with which this operation can be performed.
Throws:
java.lang.Exception
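
The thread-safety requirement on perform can be illustrated with a minimal sketch. SketchTaskContext and SketchWorker are hypothetical stand-ins declared only for this example (the real TaskContext and Worker interfaces offer more than is shown on this page), and the worker name "sketchWorker" is made up. The idea is that per-call data stays in local variables, while any state shared between calls uses a thread-safe type.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the real TaskContext; it only carries one input link. */
interface SketchTaskContext {
  String inputLink();
}

/** Hypothetical stand-in mirroring the two Worker methods listed on this page. */
interface SketchWorker {
  String getName();

  void perform(SketchTaskContext taskContext) throws Exception;
}

class ThreadSafeWorkerSketch implements SketchWorker {
  // The only shared state is a thread-safe counter; everything else is local to perform().
  private final AtomicInteger processed = new AtomicInteger();

  @Override
  public String getName() {
    return "sketchWorker"; // made-up name, not the value of WebCrawlerWorker.NAME
  }

  @Override
  public void perform(SketchTaskContext taskContext) throws Exception {
    String link = taskContext.inputLink(); // per-call data lives on the stack
    if (link == null) {
      throw new IllegalArgumentException("no input link");
    }
    processed.incrementAndGet();
  }

  public int processedCount() {
    return processed.get();
  }

  public static void main(String[] args) throws Exception {
    ThreadSafeWorkerSketch worker = new ThreadSafeWorkerSketch();
    ExecutorService pool = Executors.newFixedThreadPool(4);
    for (int i = 0; i < 100; i++) {
      final String link = "http://example.org/page" + i;
      // Submitted as a Callable so the checked exception from perform() is allowed.
      pool.submit(() -> {
        worker.perform(() -> link);
        return null;
      });
    }
    pool.shutdown();
    pool.awaitTermination(10, TimeUnit.SECONDS);
    System.out.println("tasks processed: " + worker.processedCount());
  }
}
```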

setVisitedLinks

public void setVisitedLinks(VisitedLinksService visitedLinks)
DS service reference injection method.


unsetVisitedLinks

public void unsetVisitedLinks(VisitedLinksService visitedLinks)
DS service reference removal method.
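
The paired set/unset methods follow the usual OSGi Declarative Services (DS) bind/unbind pattern. The sketch below shows one common way to write such a pair; it is not taken from the WebCrawlerWorker source, which is not shown on this page. SketchService, NoOpService, and DsInjectionSketch are hypothetical names used only for illustration.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for an injected service such as VisitedLinksService. */
interface SketchService {
  boolean isVisited(String url);
}

/** Trivial implementation used only to create distinct service instances. */
class NoOpService implements SketchService {
  @Override
  public boolean isVisited(String url) {
    return false;
  }
}

class DsInjectionSketch {
  // AtomicReference makes bind/unbind calls from the DS runtime visible to worker threads.
  private final AtomicReference<SketchService> service = new AtomicReference<>();

  /** Bind method: stores the service when the DS runtime makes it available. */
  public void setService(SketchService newService) {
    service.set(newService);
  }

  /** Unbind method: clears the reference only if it still points to the departing instance. */
  public void unsetService(SketchService oldService) {
    service.compareAndSet(oldService, null);
  }

  public boolean hasService() {
    return service.get() != null;
  }

  public static void main(String[] args) {
    DsInjectionSketch component = new DsInjectionSketch();
    SketchService first = new NoOpService();
    SketchService second = new NoOpService();
    component.setService(first);
    component.setService(second); // a replacement service is bound
    component.unsetService(first); // late unbind of the old instance is ignored
    System.out.println("service still bound: " + component.hasService());
  }
}
```

Comparing against the departing instance in the unset method guards against a late unbind call wiping out a replacement service that has already been bound.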


setFetcher

public void setFetcher(Fetcher fetcher)
DS service reference injection method.


unsetFetcher

public void unsetFetcher(Fetcher fetcher)
DS service reference removal method.


setLinkExtractor

public void setLinkExtractor(LinkExtractor linkExtractor)
DS service reference injection method.


unsetLinkExtractor

public void unsetLinkExtractor(LinkExtractor linkExtractor)
DS service reference removal method.


setLinkFilter

public void setLinkFilter(LinkFilter linkFilter)
DS service reference injection method.


unsetLinkFilter

public void unsetLinkFilter(LinkFilter linkFilter)
DS service reference removal method.


setRecordProducer

public void setRecordProducer(RecordProducer recordProducer)
DS service reference injection method.


unsetRecordProducer

public void unsetRecordProducer(RecordProducer recordProducer)
DS service reference removal method.


setCompoundExtractor

public void setCompoundExtractor(CompoundExtractor compoundExtractor)
DS service reference injection method.


unsetCompoundExtractor

public void unsetCompoundExtractor(CompoundExtractor compoundExtractor)
DS service reference removal method.

