Modifier and Type | Field and Description |
---|---|
static java.lang.String | INPUT_SLOT_LINKS_TO_CRAWL: Name of the input slot containing the links to crawl. |
static java.lang.String | NAME: Name of the worker, used in worker description and workflows. |
static java.lang.String | OUTPUT_SLOT_CRAWLED_RECORDS: Name of the output slot containing the crawled records. |
static java.lang.String | OUTPUT_SLOT_LINKS_TO_CRAWL: Name of the output slot containing the links to crawl. |
Constructor and Description |
---|
WebCrawlerWorker() |
Modifier and Type | Method and Description |
---|---|
static java.lang.String | getMimeType(Record record): Gets the MIME type from the record. |
java.lang.String | getName() |
void | perform(TaskContext taskContext): Performs a computation on the data available in the TaskContext, such as a task for this worker, input and (if configured) output slots. |
void | setCompoundExtractor(CompoundExtractor compoundExtractor): DS service reference injection method. |
void | setFetcher(Fetcher fetcher): DS service reference injection method. |
void | setLinkExtractor(LinkExtractor linkExtractor): DS service reference injection method. |
void | setLinkFilter(LinkFilter linkFilter): DS service reference injection method. |
void | setRecordProducer(RecordProducer recordProducer): DS service reference injection method. |
void | setVisitedLinks(VisitedLinksService visitedLinks): DS service reference injection method. |
void | unsetCompoundExtractor(CompoundExtractor compoundExtractor): DS service reference removal method. |
void | unsetFetcher(Fetcher fetcher): DS service reference removal method. |
void | unsetLinkExtractor(LinkExtractor linkExtractor): DS service reference removal method. |
void | unsetLinkFilter(LinkFilter linkFilter): DS service reference removal method. |
void | unsetRecordProducer(RecordProducer recordProducer): DS service reference removal method. |
void | unsetVisitedLinks(VisitedLinksService visitedLinks): DS service reference removal method. |
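
The set/unset pairs listed above follow the usual Declarative Services (DS) bind/unbind pattern. The following is a minimal sketch of that pattern, not the actual WebCrawlerWorker code: the nested `Fetcher` interface is a hypothetical stand-in with an invented `fetch` method, `CrawlerComponent` is an illustrative name, and the identity check in `unsetFetcher` is an assumption about how a removal method might guard against a stale unbind.

```java
public class ServiceBindingSketch {

    /** Hypothetical stand-in for a service interface such as Fetcher. */
    public interface Fetcher {
        String fetch(String url);
    }

    /** Illustrative component that receives its service through set/unset methods. */
    public static class CrawlerComponent {
        // volatile: the DS runtime may bind/unbind on a different thread
        // than the one that uses the service.
        private volatile Fetcher fetcher;

        /** Injection method: called when a matching service becomes available. */
        public void setFetcher(Fetcher fetcher) {
            this.fetcher = fetcher;
        }

        /** Removal method: clears the reference only if it is the one being removed (assumption). */
        public void unsetFetcher(Fetcher fetcher) {
            if (this.fetcher == fetcher) {
                this.fetcher = null;
            }
        }

        public boolean isReady() {
            return fetcher != null;
        }

        public String fetch(String url) {
            // Copy to a local so a concurrent unset cannot null it between check and use.
            Fetcher current = fetcher;
            if (current == null) {
                throw new IllegalStateException("no Fetcher bound");
            }
            return current.fetch(url);
        }
    }

    public static void main(String[] args) {
        CrawlerComponent component = new CrawlerComponent();
        Fetcher fetcher = url -> "content of " + url;
        component.setFetcher(fetcher);
        System.out.println(component.fetch("http://example.org/"));
        component.unsetFetcher(fetcher);
        System.out.println(component.isReady()); // prints false
    }
}
```

In a real OSGi deployment the DS runtime, not application code, would call the set and unset methods based on the component description.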
public static final java.lang.String NAME
public static final java.lang.String INPUT_SLOT_LINKS_TO_CRAWL
public static final java.lang.String OUTPUT_SLOT_LINKS_TO_CRAWL
public static final java.lang.String OUTPUT_SLOT_CRAWLED_RECORDS
public static java.lang.String getMimeType(Record record)
public java.lang.String getName()
public void perform(TaskContext taskContext) throws java.lang.Exception
Description copied from interface: Worker
Performs a computation on the data available in the TaskContext, such as a task for this worker, input and (if configured) output slots. Implementors must ensure that calls to this method are thread-safe.
public void setVisitedLinks(VisitedLinksService visitedLinks)
public void unsetVisitedLinks(VisitedLinksService visitedLinks)
public void setFetcher(Fetcher fetcher)
public void unsetFetcher(Fetcher fetcher)
public void setLinkExtractor(LinkExtractor linkExtractor)
public void unsetLinkExtractor(LinkExtractor linkExtractor)
public void setLinkFilter(LinkFilter linkFilter)
public void unsetLinkFilter(LinkFilter linkFilter)
public void setRecordProducer(RecordProducer recordProducer)
public void unsetRecordProducer(RecordProducer recordProducer)
public void setCompoundExtractor(CompoundExtractor compoundExtractor)
public void unsetCompoundExtractor(CompoundExtractor compoundExtractor)
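
The thread-safety requirement on perform(TaskContext) means that one worker instance may be invoked for several tasks at the same time. The sketch below illustrates one way to satisfy that requirement; it does not reproduce the real implementation. `SimpleTaskContext` is a hypothetical, heavily reduced stand-in for TaskContext, `SketchWorker` is an illustrative name, and the in-memory visited set only stands in for whatever a VisitedLinksService provides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

public class ThreadSafePerformSketch {

    /** Hypothetical stand-in for TaskContext: one input list, one output queue. */
    public static final class SimpleTaskContext {
        final List<String> linksToCrawl;
        final Queue<String> crawledRecords = new ConcurrentLinkedQueue<>();

        SimpleTaskContext(List<String> linksToCrawl) {
            this.linksToCrawl = linksToCrawl;
        }
    }

    /** Illustrative worker: all shared state lives in a concurrent collection. */
    public static final class SketchWorker {
        private final Set<String> visited = ConcurrentHashMap.newKeySet();

        public void perform(SimpleTaskContext context) {
            for (String link : context.linksToCrawl) {
                // add() is atomic, so exactly one concurrent caller sees true per link.
                if (visited.add(link)) {
                    context.crawledRecords.add("record:" + link);
                }
            }
        }
    }

    /** Runs perform() from several threads on one worker and counts the produced records. */
    public static int crawlConcurrently(List<String> links, int threads) {
        SketchWorker worker = new SketchWorker();
        List<SimpleTaskContext> contexts = new ArrayList<>();
        List<Thread> pool = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            SimpleTaskContext context = new SimpleTaskContext(links);
            contexts.add(context);
            Thread thread = new Thread(() -> worker.perform(context));
            pool.add(thread);
            thread.start();
        }
        try {
            for (Thread thread : pool) {
                thread.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        int total = 0;
        for (SimpleTaskContext context : contexts) {
            total += context.crawledRecords.size();
        }
        return total;
    }

    public static void main(String[] args) {
        // Eight threads receive the same three links; each link is recorded once.
        System.out.println(crawlConcurrently(Arrays.asList("a", "b", "c"), 8)); // prints 3
    }
}
```

Per-call data stays in local variables and in the context object passed in, so the only state shared between concurrent calls is the visited set.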