public abstract class ExtractorWorkerBase extends java.lang.Object implements Worker
Abstract base class for extractor workers. Subclasses must provide a ContentFetcher implementation and a method that converts the records produced by the CompoundExtractor to records that are compatible with the associated crawler worker.

Constructor and Description |
---|
ExtractorWorkerBase() |

Modifier and Type | Method and Description |
---|---|
protected void | concatAttributeValues(Record sourceRecord, java.lang.String sourceAttribute, Record targetRecord, java.lang.String targetAttribute, java.lang.String separator) - Utility method for subclasses: concat a source attribute value to a target attribute string value. |
protected abstract Record | convertRecord(Record compoundRecord, Record extractedRecord, TaskContext taskContext) - Create a record from the extracted record that conforms to the records produced by the matching crawler. |
protected void | copyAttachment(Record sourceRecord, Record targetRecord, java.lang.String attachmentName) - Utility method for subclasses: copy attachment from sourceRecord to targetRecord, if it exists. |
protected void | copyAttribute(Record sourceRecord, java.lang.String sourceAttribute, Record targetRecord, java.lang.String targetAttribute) - Utility method for subclasses: copy an attribute if it exists. |
protected void | copyCompoundAttributes(Record compoundRecord, Record extractedRecord, Record convertedRecord) - Add compound related system attributes to the converted record. |
protected void | copySetToStringAttribute(Record sourceRecord, java.lang.String sourceAttribute, Record targetRecord, java.lang.String targetAttribute, java.lang.String separator) - Utility method for subclasses: copy a set attribute to a plain string attribute. |
protected boolean | filterRecord(Record record, TaskContext taskContext) - Filter extracted records. |
protected abstract ContentFetcher | getContentFetcher() - Get a content fetcher for the data source type. |
protected abstract java.util.Iterator<Record> | invokeExtractor(CompoundExtractor extractor, Record compoundRecord, java.io.InputStream compoundContent, TaskContext taskContext) - Invoke extractor with data from the crawled record. |
protected void | mapRecord(Record record, TaskContext taskContext) - Hook for subclasses to support mapping of the converted record according to mapping rules. |
void | perform(TaskContext taskContext) - Performs a computation on the data available in the TaskContext, such as a task for this worker, input and (if configured) output slots. |
void | setCompoundExtractor(CompoundExtractor extractor) - DS service reference bind method. |
void | unsetCompoundExtractor(CompoundExtractor extractor) - DS service reference unbind method. |

public void perform(TaskContext taskContext) throws java.lang.Exception
Defined by the Worker interface. Performs a computation on the data available in the TaskContext, such as a task for this worker, input and (if configured) output slots. Implementors must ensure that calls to this method are thread-safe.
protected void mapRecord(Record record, TaskContext taskContext)
Parameters:
record - the Record
taskContext - the TaskContext
protected boolean filterRecord(Record record, TaskContext taskContext)
Parameters:
record - the record to check
taskContext - the task context containing the task parameters
Returns:
true if the record passes the filter(s), false if not.
protected abstract java.util.Iterator<Record> invokeExtractor(CompoundExtractor extractor, Record compoundRecord, java.io.InputStream compoundContent, TaskContext taskContext) throws CompoundExtractorException
Throws:
CompoundExtractorException
protected abstract Record convertRecord(Record compoundRecord, Record extractedRecord, TaskContext taskContext)
protected abstract ContentFetcher getContentFetcher()
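
The abstract methods and hooks listed here suggest a template-method design, in which the base class drives the processing and subclasses fill in the data-source-specific steps. The following is only a sketch of how such a flow could fit together; it is not the actual implementation of perform. Records are modelled as plain maps instead of the Record type, the TaskContext and ContentFetcher parameters are left out, and the names ExtractorFlowSketch, ArchiveWorkerSketch, process, fileName, id and parent are hypothetical. The order of the calls (extract, filter, convert, map) is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the base class: a record is a plain attribute map here.
abstract class ExtractorFlowSketch {
  // Simplified counterparts of the abstract methods documented above.
  protected abstract Iterator<Map<String, Object>> invokeExtractor(Map<String, Object> compoundRecord);

  protected abstract Map<String, Object> convertRecord(Map<String, Object> compoundRecord,
      Map<String, Object> extractedRecord);

  // Default filter accepts every record; subclasses may override it.
  protected boolean filterRecord(Map<String, Object> record) {
    return true;
  }

  // Default mapping hook does nothing.
  protected void mapRecord(Map<String, Object> record) {
  }

  // Assumed call order: extract, filter, convert, map, collect.
  List<Map<String, Object>> process(Map<String, Object> compoundRecord) {
    List<Map<String, Object>> output = new ArrayList<>();
    Iterator<Map<String, Object>> extracted = invokeExtractor(compoundRecord);
    while (extracted.hasNext()) {
      Map<String, Object> extractedRecord = extracted.next();
      if (!filterRecord(extractedRecord)) {
        continue; // record rejected by the filter
      }
      Map<String, Object> converted = convertRecord(compoundRecord, extractedRecord);
      mapRecord(converted);
      output.add(converted);
    }
    return output;
  }
}

// Hypothetical subclass: pretends the compound contains two entries and drops *.tmp files.
class ArchiveWorkerSketch extends ExtractorFlowSketch {
  @Override
  protected Iterator<Map<String, Object>> invokeExtractor(Map<String, Object> compoundRecord) {
    List<Map<String, Object>> entries = new ArrayList<>();
    for (String name : Arrays.asList("a.txt", "b.tmp")) {
      Map<String, Object> entry = new HashMap<>();
      entry.put("fileName", name);
      entries.add(entry);
    }
    return entries.iterator();
  }

  @Override
  protected Map<String, Object> convertRecord(Map<String, Object> compoundRecord,
      Map<String, Object> extractedRecord) {
    Map<String, Object> converted = new HashMap<>(extractedRecord);
    converted.put("parent", compoundRecord.get("id")); // link the entry back to its compound
    return converted;
  }

  @Override
  protected boolean filterRecord(Map<String, Object> record) {
    return !String.valueOf(record.get("fileName")).endsWith(".tmp");
  }
}
```

In this sketch a subclass only decides how entries are extracted, converted and filtered, while the loop in the base class stays the same for every data source type.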
protected void copyAttachment(Record sourceRecord, Record targetRecord, java.lang.String attachmentName)
protected void copyAttribute(Record sourceRecord, java.lang.String sourceAttribute, Record targetRecord, java.lang.String targetAttribute)
protected void copySetToStringAttribute(Record sourceRecord, java.lang.String sourceAttribute, Record targetRecord, java.lang.String targetAttribute, java.lang.String separator)
protected void concatAttributeValues(Record sourceRecord, java.lang.String sourceAttribute, Record targetRecord, java.lang.String targetAttribute, java.lang.String separator)
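
To illustrate what the two separator-based utility methods describe, here is a minimal sketch that uses plain maps as a stand-in for Record. The class name AttributeJoinSketch is hypothetical, and the handling of missing attributes (the target is left unchanged) is an assumption, not documented behaviour of the real methods.

```java
import java.util.Collection;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical stand-in helpers: records are plain attribute maps, not SMILA Record objects.
final class AttributeJoinSketch {
  private AttributeJoinSketch() {
  }

  // Joins the elements of a collection-valued source attribute into one string attribute.
  static void copySetToStringAttribute(Map<String, Object> source, String sourceAttribute,
      Map<String, Object> target, String targetAttribute, String separator) {
    Object value = source.get(sourceAttribute);
    if (!(value instanceof Collection)) {
      return; // assumption: nothing is copied if the source is missing or not a collection
    }
    StringJoiner joiner = new StringJoiner(separator);
    for (Object element : (Collection<?>) value) {
      joiner.add(String.valueOf(element));
    }
    target.put(targetAttribute, joiner.toString());
  }

  // Appends the source attribute value to the target string, inserting the separator between them.
  static void concatAttributeValues(Map<String, Object> source, String sourceAttribute,
      Map<String, Object> target, String targetAttribute, String separator) {
    Object value = source.get(sourceAttribute);
    if (value == null) {
      return; // assumption: a missing source attribute leaves the target unchanged
    }
    Object existing = target.get(targetAttribute);
    String joined = existing == null ? String.valueOf(value) : existing + separator + value;
    target.put(targetAttribute, joined);
  }
}
```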
protected void copyCompoundAttributes(Record compoundRecord, Record extractedRecord, Record convertedRecord)
public void setCompoundExtractor(CompoundExtractor extractor)
public void unsetCompoundExtractor(CompoundExtractor extractor)
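
The bind and unbind methods follow the usual pattern for Declarative Services references: the service is stored when it is bound and released when it is unbound. The sketch below shows one common way to write such a pair; it is not taken from the ExtractorWorkerBase source. ExtractorService stands in for CompoundExtractor, and BindingSketch and currentExtractor are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the CompoundExtractor service interface.
interface ExtractorService {
}

class BindingSketch {
  // Holds the currently bound service; null while no service is bound.
  private final AtomicReference<ExtractorService> extractor = new AtomicReference<>();

  // Bind method: remember the service instance handed in by the framework.
  public void setCompoundExtractor(ExtractorService service) {
    extractor.set(service);
  }

  // Unbind method: clear the reference only if it still points to the unbound instance,
  // so that a late unbind of an old service cannot remove a newer one.
  public void unsetCompoundExtractor(ExtractorService service) {
    extractor.compareAndSet(service, null);
  }

  ExtractorService currentExtractor() {
    return extractor.get();
  }
}
```

Comparing instances in the unbind method is a defensive choice for the case where a replacement service is bound before the old one is unbound.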