SMILA 1.0 API documentation

org.eclipse.smila.importing.compounds
Class ExtractorWorkerBase

java.lang.Object
  extended by org.eclipse.smila.importing.compounds.ExtractorWorkerBase
All Implemented Interfaces:
Worker
Direct Known Subclasses:
FileExtractorWorker, WebExtractorWorker

public abstract class ExtractorWorkerBase
extends java.lang.Object
implements Worker

Base implementation for workers that perform compound extraction. Subclasses must provide a ContentFetcher implementation, an implementation of invokeExtractor, and a method that converts the records produced by the CompoundExtractor into records compatible with those produced by the associated crawler worker.
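The split of responsibilities described above follows the template-method pattern: the base class drives the work and calls abstract or overridable hooks. The following sketch shows that structure in isolation. SimpleRecord and ExtractorSketch are hypothetical stand-ins, not SMILA types, and the order in which the hooks run (filter extracted record, convert, map) is an assumption, not a description of what perform() actually does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for SMILA's Record: just an attribute map.
final class SimpleRecord {
  final Map<String, Object> attributes = new HashMap<>();
}

// Sketch of a template-method base class shaped like ExtractorWorkerBase.
// The hook order below is an assumption, not SMILA's perform() logic.
abstract class ExtractorSketch {

  /** Subclass hook: split one compound record into extracted records. */
  protected abstract Iterator<SimpleRecord> invokeExtractor(SimpleRecord compound);

  /** Subclass hook: make an extracted record look like a crawler record. */
  protected abstract SimpleRecord convertRecord(SimpleRecord compound, SimpleRecord extracted);

  /** Optional hook: by default every extracted record passes. */
  protected boolean filterRecord(SimpleRecord extracted) {
    return true;
  }

  /** Optional hook: by default converted records are left unchanged. */
  protected void mapRecord(SimpleRecord converted) {
  }

  /** Runs the hooks for one compound record and returns the output records. */
  public final List<SimpleRecord> process(SimpleRecord compound) {
    List<SimpleRecord> output = new ArrayList<>();
    Iterator<SimpleRecord> extracted = invokeExtractor(compound);
    while (extracted.hasNext()) {
      SimpleRecord record = extracted.next();
      if (!filterRecord(record)) {
        continue; // filtered records produce no output
      }
      SimpleRecord converted = convertRecord(compound, record);
      mapRecord(converted);
      output.add(converted);
    }
    return output;
  }
}
```

A concrete subclass in this sketch only overrides invokeExtractor and convertRecord, plus filterRecord or mapRecord when it needs them, which mirrors how FileExtractorWorker and WebExtractorWorker specialize the real base class.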


Constructor Summary
ExtractorWorkerBase()
           
 
Method Summary
protected  void concatAttributeValues(Record sourceRecord, java.lang.String sourceAttribute, Record targetRecord, java.lang.String targetAttribute, java.lang.String separator)
          Utility method for subclasses: appends the value of a source attribute to the string value of a target attribute.
protected abstract  Record convertRecord(Record compoundRecord, Record extractedRecord, TaskContext taskContext)
          Creates a record from the extracted record that conforms to the records produced by the matching crawler.
protected  void copyAttachment(Record sourceRecord, Record targetRecord, java.lang.String attachmentName)
          Utility method for subclasses: copies an attachment from sourceRecord to targetRecord, if it exists.
protected  void copyAttribute(Record sourceRecord, java.lang.String sourceAttribute, Record targetRecord, java.lang.String targetAttribute)
          Utility method for subclasses: copies an attribute, if it exists.
protected  void copyCompoundAttributes(Record compoundRecord, Record extractedRecord, Record convertedRecord)
          Adds compound-related system attributes to the converted record.
protected  void copySetToStringAttribute(Record sourceRecord, java.lang.String sourceAttribute, Record targetRecord, java.lang.String targetAttribute, java.lang.String separator)
          Utility method for subclasses: copies a set attribute to a plain string attribute.
protected  boolean filterRecord(Record record, TaskContext taskContext)
          Filter extracted records.
protected abstract  ContentFetcher getContentFetcher()
          Gets a content fetcher for the data source type.
protected abstract  java.util.Iterator<Record> invokeExtractor(CompoundExtractor extractor, Record compoundRecord, java.io.InputStream compoundContent, TaskContext taskContext)
          Invokes the extractor with data from the crawled record.
protected  void mapRecord(Record record, TaskContext taskContext)
          Hook for subclasses to support mapping of the converted record according to mapping rules.
 void perform(TaskContext taskContext)
          Performs a computation on the data available in the TaskContext, such as the task for this worker, its input slots and (if configured) its output slots.
 void setCompoundExtractor(CompoundExtractor extractor)
          OSGi Declarative Services (DS) bind method for the CompoundExtractor service reference.
 void unsetCompoundExtractor(CompoundExtractor extractor)
          OSGi Declarative Services (DS) unbind method for the CompoundExtractor service reference.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.eclipse.smila.taskworker.Worker
getName
 

Constructor Detail

ExtractorWorkerBase

public ExtractorWorkerBase()
Method Detail

perform

public void perform(TaskContext taskContext)
             throws java.lang.Exception
Description copied from interface: Worker
Performs a computation on the data available in the TaskContext, such as the task for this worker, its input slots and (if configured) its output slots. Implementors must make sure that calls to this method are thread-safe.

Specified by:
perform in interface Worker
Parameters:
taskContext - the TaskContext information with which this operation can be performed.
Throws:
java.lang.Exception

mapRecord

protected void mapRecord(Record record,
                         TaskContext taskContext)
Hook for subclasses to support mapping of the converted record according to mapping rules.

Parameters:
record - the Record
taskContext - the TaskContext
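The page does not say what form the mapping rules take. As a minimal sketch, assuming the rules are a simple table from original attribute names to mapped names, a subclass hook could rename attributes as below. AttributeMapper and the rule table are hypothetical, and plain maps stand in for SMILA's Record and TaskContext.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical mapping hook: renames attributes of a converted record using a
// rule table (original name -> mapped name). Plain maps stand in for SMILA's
// Record and for however the mapping rules are actually configured.
final class AttributeMapper {
  private AttributeMapper() {
  }

  static void mapRecord(Map<String, Object> record, Map<String, String> rules) {
    Map<String, Object> renamed = new HashMap<>();
    for (Map.Entry<String, String> rule : rules.entrySet()) {
      // Remove every mapped attribute first so that chained rules
      // (a -> b, b -> c) do not depend on map iteration order.
      if (record.containsKey(rule.getKey())) {
        renamed.put(rule.getValue(), record.remove(rule.getKey()));
      }
    }
    // Attributes without a rule keep their original names.
    record.putAll(renamed);
  }
}
```

Collecting the renamed values before writing them back keeps the result independent of the order in which the rules are visited.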

filterRecord

protected boolean filterRecord(Record record,
                               TaskContext taskContext)
Filter extracted records.

Parameters:
record - the record to check
taskContext - the task context containing the task parameters
Returns:
true if the record passes the filter(s), false if not.
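A subclass could override filterRecord to drop extracted entries based on a task parameter. The sketch below is hypothetical: plain maps stand in for SMILA's Record and for the task parameters in the TaskContext, and the parameter name "includePattern" and attribute name "fileName" are made up for illustration.

```java
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical filter in the spirit of filterRecord(record, taskContext).
// "includePattern" and "fileName" are illustrative names, not SMILA's.
final class IncludePatternFilter {
  private IncludePatternFilter() {
  }

  /** Returns true if the record passes the filter, false if it should be dropped. */
  static boolean filterRecord(Map<String, Object> record, Map<String, String> taskParameters) {
    String include = taskParameters.get("includePattern");
    if (include == null) {
      return true; // no filter configured: every record passes
    }
    Object fileName = record.get("fileName");
    // Records without a string file name cannot match and are rejected.
    return fileName instanceof String && Pattern.matches(include, (String) fileName);
  }
}
```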

invokeExtractor

protected abstract java.util.Iterator<Record> invokeExtractor(CompoundExtractor extractor,
                                                              Record compoundRecord,
                                                              java.io.InputStream compoundContent,
                                                              TaskContext taskContext)
                                                       throws CompoundExtractorException
Invokes the extractor with data from the crawled record.

Throws:
CompoundExtractorException

convertRecord

protected abstract Record convertRecord(Record compoundRecord,
                                        Record extractedRecord,
                                        TaskContext taskContext)
Creates a record from the extracted record that conforms to the records produced by the matching crawler.


getContentFetcher

protected abstract ContentFetcher getContentFetcher()
Gets a content fetcher for the data source type.


copyAttachment

protected void copyAttachment(Record sourceRecord,
                              Record targetRecord,
                              java.lang.String attachmentName)
Utility method for subclasses: copies an attachment from sourceRecord to targetRecord, if it exists.
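The "if it exists" behaviour can be pictured with a minimal sketch, assuming attachments are modelled as a map from attachment name to bytes. AttachmentCopy is hypothetical and does not use SMILA's Record API.

```java
import java.util.Map;

// Hypothetical sketch of copyAttachment semantics; attachments are modelled
// as name -> bytes maps instead of SMILA Records.
final class AttachmentCopy {
  private AttachmentCopy() {
  }

  /** Copies the named attachment if the source has it; otherwise does nothing. */
  static void copyAttachment(Map<String, byte[]> sourceAttachments,
      Map<String, byte[]> targetAttachments, String attachmentName) {
    byte[] content = sourceAttachments.get(attachmentName);
    if (content != null) {
      // This sketch shares the byte array instead of cloning it.
      targetAttachments.put(attachmentName, content);
    }
  }
}
```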


copyAttribute

protected void copyAttribute(Record sourceRecord,
                             java.lang.String sourceAttribute,
                             Record targetRecord,
                             java.lang.String targetAttribute)
Utility method for subclasses: copies an attribute, if it exists.
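In convertRecord, this helper is typically what renames an extractor attribute to the name the crawler uses. A minimal sketch of the described semantics, with plain maps standing in for SMILA's Record; AttributeCopy is hypothetical, not the SMILA implementation.

```java
import java.util.Map;

// Hypothetical sketch of copyAttribute semantics; not SMILA code.
final class AttributeCopy {
  private AttributeCopy() {
  }

  /** Copies sourceAttribute to targetAttribute only if the source defines it. */
  static void copyAttribute(Map<String, Object> sourceRecord, String sourceAttribute,
      Map<String, Object> targetRecord, String targetAttribute) {
    if (sourceRecord.containsKey(sourceAttribute)) {
      targetRecord.put(targetAttribute, sourceRecord.get(sourceAttribute));
    }
    // A missing source attribute leaves the target record untouched.
  }
}
```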


copySetToStringAttribute

protected void copySetToStringAttribute(Record sourceRecord,
                                        java.lang.String sourceAttribute,
                                        Record targetRecord,
                                        java.lang.String targetAttribute,
                                        java.lang.String separator)
Utility method for subclasses: copies a set attribute to a plain string attribute.
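Flattening a multi-valued attribute into one string amounts to joining its values with the separator. The sketch below assumes the multi-valued attribute is modelled as a java.util.Collection and that a single value is copied as its string form; SetToStringCopy is hypothetical, and plain maps stand in for SMILA's Record.

```java
import java.util.Collection;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: a multi-valued attribute is joined into one string.
final class SetToStringCopy {
  private SetToStringCopy() {
  }

  static void copySetToStringAttribute(Map<String, Object> sourceRecord, String sourceAttribute,
      Map<String, Object> targetRecord, String targetAttribute, String separator) {
    Object value = sourceRecord.get(sourceAttribute);
    if (value instanceof Collection) {
      // Join all values in iteration order, separated by the given separator.
      String joined = ((Collection<?>) value).stream()
          .map(String::valueOf)
          .collect(Collectors.joining(separator));
      targetRecord.put(targetAttribute, joined);
    } else if (value != null) {
      // A single value is copied as its string form (assumption of this sketch).
      targetRecord.put(targetAttribute, value.toString());
    }
  }
}
```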


concatAttributeValues

protected void concatAttributeValues(Record sourceRecord,
                                     java.lang.String sourceAttribute,
                                     Record targetRecord,
                                     java.lang.String targetAttribute,
                                     java.lang.String separator)
Utility method for subclasses: appends the value of a source attribute to the string value of a target attribute.
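One plausible reading of "concat a source attribute value to a target attribute string value" is shown below: the separator is inserted only when the target already has a value. AttributeConcat is hypothetical, plain maps stand in for SMILA's Record, and the separator handling is an assumption.

```java
import java.util.Map;

// Hypothetical sketch of concatAttributeValues semantics; not SMILA code.
final class AttributeConcat {
  private AttributeConcat() {
  }

  static void concatAttributeValues(Map<String, Object> sourceRecord, String sourceAttribute,
      Map<String, Object> targetRecord, String targetAttribute, String separator) {
    Object value = sourceRecord.get(sourceAttribute);
    if (value == null) {
      return; // nothing to append
    }
    Object existing = targetRecord.get(targetAttribute);
    // Without an existing target value, no leading separator is added.
    String result = existing == null ? value.toString() : existing + separator + value;
    targetRecord.put(targetAttribute, result);
  }
}
```

For example, appending an extracted entry's path to the compound's path with "/" would yield a path that points into the archive.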


copyCompoundAttributes

protected void copyCompoundAttributes(Record compoundRecord,
                                      Record extractedRecord,
                                      Record convertedRecord)
Adds compound-related system attributes to the converted record.


setCompoundExtractor

public void setCompoundExtractor(CompoundExtractor extractor)
OSGi Declarative Services (DS) bind method for the CompoundExtractor service reference.


unsetCompoundExtractor

public void unsetCompoundExtractor(CompoundExtractor extractor)
OSGi Declarative Services (DS) unbind method for the CompoundExtractor service reference.
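DS calls the bind method when a matching service becomes available and the unbind method when it goes away. The sketch below shows a common idiom for such a pair; ExtractorService and ExtractorHolder are hypothetical placeholders, not SMILA's CompoundExtractor or the actual ExtractorWorkerBase code.

```java
import java.util.concurrent.atomic.AtomicReference;

// Placeholder for the CompoundExtractor service interface (hypothetical).
interface ExtractorService {
  String name();
}

// Sketch of a common bind/unbind idiom for one dynamic service reference.
class ExtractorHolder {
  private final AtomicReference<ExtractorService> extractor = new AtomicReference<>();

  /** Bind method: remember the newly bound service. */
  public void setCompoundExtractor(ExtractorService service) {
    extractor.set(service);
  }

  /** Unbind method: forget the service, but only if it is the one currently held. */
  public void unsetCompoundExtractor(ExtractorService service) {
    // compareAndSet compares by identity, so a late unbind of a replaced
    // instance cannot clear a newer binding.
    extractor.compareAndSet(service, null);
  }

  ExtractorService current() {
    return extractor.get();
  }
}
```

The identity check matters when a replacement service is bound before the old one is unbound; a plain assignment of null in the unbind method would discard the new service.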

