SMILA 1.0 API documentation
java.lang.Object
  extended by org.eclipse.smila.importing.compounds.ExtractorWorkerBase
      extended by org.eclipse.smila.importing.crawler.web.WebExtractorWorker
public class WebExtractorWorker extends ExtractorWorkerBase
Compound extractor worker for use in web crawling workflows.
| Field Summary | |
|---|---|
| static java.lang.String | NAME: name of the worker. |
| Constructor Summary | |
|---|---|
| WebExtractorWorker() | |
| Method Summary | |
|---|---|
| protected Record | convertRecord(Record compoundRecord, Record extractedRecord, TaskContext taskContext): creates a record from the extracted record that conforms to the records produced by the matching crawler. |
| protected boolean | filterRecord(Record record, TaskContext taskContext): filters applied to extracted records: urlPatterns (matched against the name of the extracted file). |
| protected ContentFetcher | getContentFetcher(): gets a content fetcher for the data source type. |
| java.lang.String | getName() |
| protected java.util.Iterator<Record> | invokeExtractor(CompoundExtractor extractor, Record compoundRecord, java.io.InputStream compoundContent, TaskContext taskContext): invokes the extractor with data from the crawled record. |
| protected void | mapRecord(Record record, TaskContext taskContext): hook for subclasses to map the converted record according to mapping rules. |
| void | setFetcher(Fetcher fetcher): Declarative Services (DS) service reference injection method. |
| void | unsetFetcher(Fetcher fetcher): Declarative Services (DS) service reference removal method. |
| Methods inherited from class org.eclipse.smila.importing.compounds.ExtractorWorkerBase |
|---|
concatAttributeValues, copyAttachment, copyAttribute, copyCompoundAttributes, copySetToStringAttribute, perform, setCompoundExtractor, unsetCompoundExtractor |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
public static final java.lang.String NAME
| Constructor Detail |
|---|
public WebExtractorWorker()
| Method Detail |
|---|
public java.lang.String getName()
protected java.util.Iterator<Record> invokeExtractor(CompoundExtractor extractor,
Record compoundRecord,
java.io.InputStream compoundContent,
TaskContext taskContext)
throws CompoundExtractorException
invokeExtractor in class ExtractorWorkerBase
Throws:
CompoundExtractorException
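invokeExtractor returns an Iterator<Record> rather than a list, which suggests that extracted entries can be handed out one at a time instead of being materialized up front. The sketch below illustrates that lazy-iteration idea using only the JDK's java.util.zip.ZipInputStream; it does not use SMILA's CompoundExtractor, it yields entry names instead of Record objects, and the class name ZipEntryNameIterator and helper zipOf are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/**
 * Hypothetical sketch (not SMILA's CompoundExtractor): lazily iterates over the
 * names of the file entries in a ZIP stream, skipping directory entries.
 */
public class ZipEntryNameIterator implements Iterator<String> {
  private final ZipInputStream zip;
  private ZipEntry pending; // next entry to hand out, or null when exhausted

  public ZipEntryNameIterator(InputStream compoundContent) {
    this.zip = new ZipInputStream(compoundContent);
    advance();
  }

  // Moves to the next non-directory entry. getNextEntry() skips any unread
  // data of the current entry, so a real extractor would read content first.
  private void advance() {
    try {
      do {
        pending = zip.getNextEntry();
      } while (pending != null && pending.isDirectory());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  @Override
  public boolean hasNext() {
    return pending != null;
  }

  @Override
  public String next() {
    if (pending == null) {
      throw new NoSuchElementException();
    }
    String name = pending.getName();
    advance();
    return name;
  }

  /** Builds an in-memory ZIP with empty entries, standing in for a crawled compound. */
  static byte[] zipOf(String... entryNames) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ZipOutputStream out = new ZipOutputStream(bytes)) {
      for (String name : entryNames) {
        out.putNextEntry(new ZipEntry(name));
        out.closeEntry();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  public static void main(String[] args) {
    byte[] compound = zipOf("docs/", "docs/index.html", "report.pdf");
    Iterator<String> names = new ZipEntryNameIterator(new ByteArrayInputStream(compound));
    while (names.hasNext()) {
      System.out.println(names.next()); // docs/index.html, then report.pdf
    }
  }
}
```

Reading ahead by one entry keeps hasNext() cheap and side-effect free, at the cost of discarding each entry's content once the iterator advances past it.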
protected Record convertRecord(Record compoundRecord,
Record extractedRecord,
TaskContext taskContext)
convertRecord in class ExtractorWorkerBase
protected boolean filterRecord(Record record,
TaskContext taskContext)
filterRecord in class ExtractorWorkerBase
Parameters:
record - the record to check
taskContext - the task context containing the task parameters
Returns:
true if the record passes the filter(s), false if not.
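filterRecord applies urlPatterns to the name of the extracted file and returns true if the record passes. The sketch below shows one plausible include/exclude reading of such patterns. The semantics are an assumption, not taken from this page: a name passes if it fully matches at least one include regex (or no include patterns are configured) and matches no exclude regex. The class UrlPatternFilter is hypothetical, and how the patterns are read from the TaskContext is not shown.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/**
 * Hypothetical include/exclude filter on the name of an extracted file.
 * Assumption: a name passes if it fully matches at least one include pattern
 * (or no include patterns are given) and fully matches no exclude pattern.
 */
public final class UrlPatternFilter {
  private final List<Pattern> includes;
  private final List<Pattern> excludes;

  public UrlPatternFilter(List<String> includeRegexes, List<String> excludeRegexes) {
    this.includes = compile(includeRegexes);
    this.excludes = compile(excludeRegexes);
  }

  private static List<Pattern> compile(List<String> regexes) {
    return regexes.stream().map(Pattern::compile).collect(Collectors.toList());
  }

  /** Returns true if the name passes the filter, mirroring filterRecord's boolean result. */
  public boolean accept(String fileName) {
    boolean included = includes.isEmpty()
        || includes.stream().anyMatch(p -> p.matcher(fileName).matches());
    boolean excluded = excludes.stream().anyMatch(p -> p.matcher(fileName).matches());
    return included && !excluded;
  }

  public static void main(String[] args) {
    UrlPatternFilter filter = new UrlPatternFilter(
        List.of(".*\\.html", ".*\\.pdf"), List.of(".*/private/.*"));
    System.out.println(filter.accept("site.zip/docs/index.html"));    // true
    System.out.println(filter.accept("site.zip/private/secret.pdf")); // false
    System.out.println(filter.accept("site.zip/images/logo.png"));    // false
  }
}
```

Using Matcher.matches() means each pattern must describe the whole name; a find()-based variant would accept partial matches instead.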
protected void mapRecord(Record record,
TaskContext taskContext)
mapRecord in class ExtractorWorkerBase
Parameters:
record - the Record
taskContext - the TaskContext
protected ContentFetcher getContentFetcher()
getContentFetcher in class ExtractorWorkerBase
public void setFetcher(Fetcher fetcher)
public void unsetFetcher(Fetcher fetcher)
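setFetcher and unsetFetcher are the bind and unbind methods through which the OSGi Declarative Services runtime injects and removes the Fetcher service. The sketch below shows the common shape of such a pair; it is not the SMILA implementation. FetcherClient and the nested Fetcher interface are hypothetical stand-ins, and the identity check in unsetFetcher is an assumed defensive choice for the case where a replacement service is bound before the old one is unbound.

```java
/**
 * Hypothetical sketch of a DS bind/unbind pair: the runtime calls setFetcher
 * when a matching service becomes available and unsetFetcher when it goes away.
 */
public class FetcherClient {

  /** Hypothetical stand-in for the real Fetcher service interface. */
  public interface Fetcher {
    String fetch(String url);
  }

  // volatile because the DS runtime may call bind/unbind from another thread.
  private volatile Fetcher fetcher;

  public void setFetcher(Fetcher fetcher) {
    this.fetcher = fetcher;
  }

  public void unsetFetcher(Fetcher fetcher) {
    // Clear the reference only if it is the instance being removed, so a
    // replacement that was bound first is not dropped by mistake.
    if (this.fetcher == fetcher) {
      this.fetcher = null;
    }
  }

  public boolean hasFetcher() {
    return fetcher != null;
  }

  public static void main(String[] args) {
    FetcherClient client = new FetcherClient();
    Fetcher first = url -> "first:" + url;
    Fetcher second = url -> "second:" + url;
    client.setFetcher(first);
    client.setFetcher(second);   // replacement bound before the old one is removed
    client.unsetFetcher(first);  // old service goes away; the replacement stays
    System.out.println(client.hasFetcher()); // true
    client.unsetFetcher(second);
    System.out.println(client.hasFetcher()); // false
  }
}
```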