SMILA 1.0 API documentation

org.eclipse.smila.importing.crawler.file
Class FileExtractorWorker

java.lang.Object
  extended by org.eclipse.smila.importing.compounds.ExtractorWorkerBase
      extended by org.eclipse.smila.importing.crawler.file.FileExtractorWorker
All Implemented Interfaces:
Worker

public class FileExtractorWorker
extends ExtractorWorkerBase

Compound extractor worker to use in file crawling workflows.


Field Summary
static java.lang.String NAME
          Name of the worker.
 
Constructor Summary
FileExtractorWorker()
           
 
Method Summary
protected  Record convertRecord(Record compoundRecord, Record extractedRecord, TaskContext taskContext)
          Creates a record from the extracted record that conforms to the records produced by the matching crawler.
protected  boolean filterRecord(Record record, TaskContext taskContext)
          Filter extracted records.
protected  ContentFetcher getContentFetcher()
          Gets a content fetcher for the data source type.
 java.lang.String getName()
           
protected  java.util.Iterator<Record> invokeExtractor(CompoundExtractor extractor, Record compoundRecord, java.io.InputStream compoundContent, TaskContext taskContext)
          Invokes the extractor with data from the crawled record.
 void setFileCrawlerService(FileCrawlerService fileCrawler)
          DS service reference bind method.
 void unsetFileCrawlerService(FileCrawlerService fileCrawler)
          DS service reference unbind method.
 
Methods inherited from class org.eclipse.smila.importing.compounds.ExtractorWorkerBase
concatAttributeValues, copyAttachment, copyAttribute, copyCompoundAttributes, copySetToStringAttribute, mapRecord, perform, setCompoundExtractor, unsetCompoundExtractor
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

NAME

public static final java.lang.String NAME
Name of the worker.

See Also:
Constant Field Values
Constructor Detail

FileExtractorWorker

public FileExtractorWorker()
Method Detail

getName

public java.lang.String getName()
Returns:
the name of the worker. The worker function will be executed for tasks tied to this worker name.
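
The link between a worker name and the tasks routed to it can be pictured with a small lookup table. The following is a minimal sketch only, not SMILA code: NamedWorker, WorkerRegistrySketch, SketchFileExtractorWorker and the value of NAME are hypothetical stand-ins invented for this illustration. The real constant value is listed under Constant Field Values.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the Worker interface; only getName() is taken from this page.
interface NamedWorker {
  String getName();
}

// Hypothetical registry (not part of SMILA): finds the worker for a task by worker name.
class WorkerRegistrySketch {
  private final Map<String, NamedWorker> workers = new HashMap<>();

  // Registers the worker under the name it reports itself.
  void register(NamedWorker worker) {
    workers.put(worker.getName(), worker);
  }

  // Returns the worker registered under the given name, or null if there is none.
  NamedWorker findWorkerFor(String workerName) {
    return workers.get(workerName);
  }
}

// Hypothetical worker that exposes its name through a constant, as FileExtractorWorker does.
class SketchFileExtractorWorker implements NamedWorker {
  // Assumed value, chosen for illustration only.
  static final String NAME = "sketchFileExtractor";

  @Override
  public String getName() {
    return NAME;
  }
}
```

Exposing the name as a constant lets other code refer to the worker without hard-coding the string.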

invokeExtractor

protected java.util.Iterator<Record> invokeExtractor(CompoundExtractor extractor,
                                                     Record compoundRecord,
                                                     java.io.InputStream compoundContent,
                                                     TaskContext taskContext)
                                              throws CompoundExtractorException
Description copied from class: ExtractorWorkerBase
Invokes the extractor with data from the crawled record.

Specified by:
invokeExtractor in class ExtractorWorkerBase
Throws:
CompoundExtractorException

convertRecord

protected Record convertRecord(Record compoundRecord,
                               Record extractedRecord,
                               TaskContext taskContext)
Description copied from class: ExtractorWorkerBase
Creates a record from the extracted record that conforms to the records produced by the matching crawler.

Specified by:
convertRecord in class ExtractorWorkerBase
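
To illustrate what "conforms to the records produced by the matching crawler" could mean in practice, here is a minimal sketch that models records as plain maps. It is not the SMILA implementation: the class ConvertRecordSketch and the attribute names filePath, dataSource and entryName are hypothetical, and the real worker operates on Record and TaskContext objects.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration (not SMILA code): records are modelled as plain maps.
class ConvertRecordSketch {
  // Hypothetical attribute names, chosen only for this example.
  static final String PATH = "filePath";
  static final String SOURCE = "dataSource";
  static final String ENTRY_NAME = "entryName";

  // Builds a new record from the extracted one; neither input map is modified.
  static Map<String, Object> convertRecord(Map<String, Object> compoundRecord,
      Map<String, Object> extractedRecord) {
    Map<String, Object> result = new HashMap<>(extractedRecord);
    // Carry over the data source so the result looks like a record from the same crawl.
    if (compoundRecord.containsKey(SOURCE)) {
      result.put(SOURCE, compoundRecord.get(SOURCE));
    }
    // Place the extracted entry beneath the path of the compound file.
    result.put(PATH, compoundRecord.get(PATH) + "/" + extractedRecord.get(ENTRY_NAME));
    return result;
  }
}
```

Returning a fresh map rather than mutating the extracted record keeps the inputs reusable. Whether the real worker does the same is not stated here.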

filterRecord

protected boolean filterRecord(Record record,
                               TaskContext taskContext)
Filter extracted records. Filters applied to extracted records:

Overrides:
filterRecord in class ExtractorWorkerBase
Parameters:
record - the record to check
taskContext - the task context containing the task parameters
Returns:
true if the record passes the filter(s), false if not.
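
The true/false contract above can be sketched as follows. This is an illustration under stated assumptions, not the actual filter logic: the filters that FileExtractorWorker applies are not listed on this page, so the size limit, the maxSize parameter name, the size attribute and the class FilterRecordSketch are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration (not SMILA code): records and task parameters are plain maps.
class FilterRecordSketch {
  // Hypothetical task parameter and record attribute names.
  static final String MAX_SIZE_PARAM = "maxSize";
  static final String SIZE_ATTRIBUTE = "size";

  // Returns true if the record passes the filter, false if not.
  static boolean filterRecord(Map<String, Object> record, Map<String, Object> taskParameters) {
    Object limit = taskParameters.get(MAX_SIZE_PARAM);
    Object size = record.get(SIZE_ATTRIBUTE);
    if (!(limit instanceof Long) || !(size instanceof Long)) {
      return true; // no limit configured or no size known, so the record passes
    }
    return ((Long) size).longValue() <= ((Long) limit).longValue();
  }

  // Keeps only the records for which filterRecord returns true.
  static List<Map<String, Object>> keepPassing(List<Map<String, Object>> records,
      Map<String, Object> taskParameters) {
    List<Map<String, Object>> passing = new ArrayList<>();
    for (Map<String, Object> record : records) {
      if (filterRecord(record, taskParameters)) {
        passing.add(record);
      }
    }
    return passing;
  }
}
```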

getContentFetcher

protected ContentFetcher getContentFetcher()
Description copied from class: ExtractorWorkerBase
Gets a content fetcher for the data source type.

Specified by:
getContentFetcher in class ExtractorWorkerBase

setFileCrawlerService

public void setFileCrawlerService(FileCrawlerService fileCrawler)
OSGi Declarative Services (DS) bind method for the FileCrawlerService reference.


unsetFileCrawlerService

public void unsetFileCrawlerService(FileCrawlerService fileCrawler)
OSGi Declarative Services (DS) unbind method for the FileCrawlerService reference.
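
A bind/unbind method pair of this kind is called by the Declarative Services runtime when the referenced service appears and disappears. The sketch below shows one common way to write such a pair. It is an assumption about the pattern, not the actual FileExtractorWorker code: CrawlerServiceSketch and BindUnbindSketch are hypothetical stand-ins, and the identity check in the unbind method may or may not match what the real class does.

```java
// Hypothetical stand-in for FileCrawlerService; the real interface is not shown on this page.
interface CrawlerServiceSketch {
}

// Hypothetical component holding a single service reference.
class BindUnbindSketch {
  private CrawlerServiceSketch fileCrawler;

  // Bind method: invoked when the referenced service becomes available.
  public void setFileCrawlerService(CrawlerServiceSketch service) {
    this.fileCrawler = service;
  }

  // Unbind method: clears the reference only if it is the service being removed.
  public void unsetFileCrawlerService(CrawlerServiceSketch service) {
    if (this.fileCrawler == service) {
      this.fileCrawler = null;
    }
  }

  // Helper for this sketch only: reports whether a service is currently bound.
  boolean hasFileCrawler() {
    return fileCrawler != null;
  }
}
```

The identity check guards against a late unbind call for an old service instance clearing a newer one that has already been bound.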

