public class WebExtractorWorker extends ExtractorWorkerBase
| Modifier and Type | Field and Description |
|---|---|
| `static java.lang.String` | `NAME`: Name of the worker. |
| Constructor and Description |
|---|
| `WebExtractorWorker()` |
| Modifier and Type | Method and Description |
|---|---|
| `protected Record` | `convertRecord(Record compoundRecord, Record extractedRecord, TaskContext taskContext)`: Creates a record from the extracted record that conforms to the records produced by the matching crawler. |
| `protected boolean` | `filterRecord(Record record, TaskContext taskContext)`: Filters applied to extracted records: `urlPatterns` (applied to the name of the extracted file). |
| `protected ContentFetcher` | `getContentFetcher()`: Gets a content fetcher for the data source type. |
| `java.lang.String` | `getName()` |
| `protected java.util.Iterator<Record>` | `invokeExtractor(CompoundExtractor extractor, Record compoundRecord, java.io.InputStream compoundContent, TaskContext taskContext)`: Invokes the extractor with data from the crawled record. |
| `protected void` | `mapRecord(Record record, TaskContext taskContext)`: Hook for subclasses to support mapping of the converted record according to mapping rules. |
| `void` | `setFetcher(Fetcher fetcher)`: Declarative Services (DS) service reference injection method. |
| `void` | `unsetFetcher(Fetcher fetcher)`: Declarative Services (DS) service reference removal method. |
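The summary says only that `filterRecord` applies `urlPatterns` to the name of the extracted file. The following is a minimal sketch of what such a check might look like and is not the SMILA implementation. It rests on several assumptions. `urlPatterns` is taken to hold lists of include and exclude regular expressions. A name passes when it fully matches at least one include pattern, or when no include patterns are configured, and when it matches no exclude pattern. The class `UrlPatternFilter` and the example file names are hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical illustration of a urlPatterns-style include/exclude check.
// Assumed semantics: full-match regexes; an empty include list accepts everything.
final class UrlPatternFilter {
    private final List<Pattern> includes;
    private final List<Pattern> excludes;

    UrlPatternFilter(List<String> includeRegexes, List<String> excludeRegexes) {
        // Compile once so each accept() call only runs the matchers.
        this.includes = compile(includeRegexes);
        this.excludes = compile(excludeRegexes);
    }

    private static List<Pattern> compile(List<String> regexes) {
        return regexes.stream().map(Pattern::compile).collect(Collectors.toList());
    }

    // matches() requires the whole name to match, not just a substring.
    private static boolean anyFullMatch(List<Pattern> patterns, String name) {
        return patterns.stream().anyMatch(p -> p.matcher(name).matches());
    }

    // Returns true if the extracted file name passes the filter, false if not.
    boolean accept(String extractedFileName) {
        boolean included = includes.isEmpty() || anyFullMatch(includes, extractedFileName);
        return included && !anyFullMatch(excludes, extractedFileName);
    }

    public static void main(String[] args) {
        UrlPatternFilter filter =
            new UrlPatternFilter(List.of(".*\\.html?"), List.of(".*/private/.*"));
        System.out.println(filter.accept("site.zip/docs/index.html"));     // true
        System.out.println(filter.accept("site.zip/private/secret.html")); // false
        System.out.println(filter.accept("site.zip/docs/logo.png"));       // false
    }
}
```

Exclude patterns are checked after includes, so an exclude always wins. This is a common convention for crawler filters, but the sketch assumes it rather than taking it from this class.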
Methods inherited from class `ExtractorWorkerBase`: `concatAttributeValues`, `copyAttachment`, `copyAttribute`, `copyCompoundAttributes`, `copySetToStringAttribute`, `perform`, `setCompoundExtractor`, `unsetCompoundExtractor`

`public static final java.lang.String NAME`

`public java.lang.String getName()`
`protected java.util.Iterator<Record> invokeExtractor(CompoundExtractor extractor, Record compoundRecord, java.io.InputStream compoundContent, TaskContext taskContext) throws CompoundExtractorException`
Overrides or implements `invokeExtractor` in class `ExtractorWorkerBase`.
Throws: `CompoundExtractorException`

`protected Record convertRecord(Record compoundRecord, Record extractedRecord, TaskContext taskContext)`
Overrides or implements `convertRecord` in class `ExtractorWorkerBase`.

`protected boolean filterRecord(Record record, TaskContext taskContext)`
Overrides or implements `filterRecord` in class `ExtractorWorkerBase`.
Parameters: `record` - the record to check; `taskContext` - the task context containing the task parameters.
Returns: `true` if the record passes the filter(s), `false` if not.

`protected void mapRecord(Record record, TaskContext taskContext)`
Overrides or implements `mapRecord` in class `ExtractorWorkerBase`.
Parameters: `record` - the converted record to map; `taskContext` - the task context.

`protected ContentFetcher getContentFetcher()`
Overrides or implements `getContentFetcher` in class `ExtractorWorkerBase`.

`public void setFetcher(Fetcher fetcher)`

`public void unsetFetcher(Fetcher fetcher)`
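`setFetcher` and `unsetFetcher` act as the bind and unbind methods of an OSGi Declarative Services (DS) reference. The DS runtime calls the first when a `Fetcher` service is bound to the component and the second when that service is removed. These methods are typically named in the component description's `bind`/`unbind` attributes. The following is a minimal sketch of the pattern and not the actual `WebExtractorWorker` code. The `Fetcher` interface here is a hypothetical stand-in, and its `fetch` method is invented for illustration. The `FetcherHolder` class is also hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the Fetcher service interface; the real interface differs.
interface Fetcher {
    byte[] fetch(String url);
}

// Hypothetical component illustrating the DS bind/unbind pattern.
class FetcherHolder {
    // AtomicReference: the DS runtime may call bind/unbind on a different thread
    // than the worker threads that read the current reference.
    private final AtomicReference<Fetcher> fetcher = new AtomicReference<>();

    // Bind method: the DS runtime injects the service reference.
    public void setFetcher(Fetcher fetcher) {
        this.fetcher.set(fetcher);
    }

    // Unbind method: clear the reference only if the departing service is the one
    // currently bound, so a late unbind of an old service cannot clear a newer one.
    public void unsetFetcher(Fetcher fetcher) {
        this.fetcher.compareAndSet(fetcher, null);
    }

    // Returns the currently bound fetcher, or null if none is bound.
    Fetcher getFetcher() {
        return fetcher.get();
    }
}
```

The identity check in `unsetFetcher` matters mainly for dynamic references. In that case the DS runtime can bind a replacement service before unbinding the old one.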