| Modifier and Type | Field and Description |
|---|---|
| static java.lang.Long | DIRS_PER_BULK_DEFAULT Default: at least 10 directories per follow-up task. |
| static java.lang.String | INPUT_SLOT_DIRS_TO_CRAWL Name of input slot containing records with directories to crawl. |
| static java.lang.Long | MAX_FILES_PER_BULK_DEFAULT Default: write up to 1000 files to one file bulk. |
| static java.lang.Long | MIN_FILES_PER_BULK_DEFAULT Default: try to add at least 100 files from subdirectories, if current folder has too few files. |
| static java.lang.String | NAME Name of the worker, used in worker description and workflows. |
| static java.lang.String | OUTPUT_SLOT_CRAWLED_RECORDS Name of output slot taking the file records to process in ETL. |
| static java.lang.String | OUTPUT_SLOT_DIRS_TO_CRAWL Name of output slot taking the directories to crawl in follow-up tasks. |
| static java.lang.String | TASK_PARAM_DIRS_PER_BULK Number of directories to write to one bulk object. |
| static java.lang.String | TASK_PARAM_MAX_FILES_PER_BULK Maximum number of files in one bulk object. |
| static java.lang.String | TASK_PARAM_MIN_FILES_PER_BULK Minimum number of files in one bulk object. |
| static java.lang.String | TASK_PARAM_ROOT_FOLDER Name of the task parameter that contains the root folder for crawling. |
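
The bulk-size defaults listed above only apply when a task does not set the corresponding parameter. A minimal sketch of that fallback, assuming task parameters arrive as a plain string map: the class `BulkLimits`, the method `resolve`, and the three key strings are hypothetical stand-ins, since the actual values of the `TASK_PARAM_*` constants are not shown on this page. Only the default numbers (1000, 100, 10) come from the field descriptions.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not part of FileCrawlerWorker. */
final class BulkLimits {
  // Defaults as stated in the field descriptions.
  static final Long MAX_FILES_PER_BULK_DEFAULT = 1000L;
  static final Long MIN_FILES_PER_BULK_DEFAULT = 100L;
  static final Long DIRS_PER_BULK_DEFAULT = 10L;

  // Assumed key strings; the real constant values are not documented here.
  static final String TASK_PARAM_MAX_FILES_PER_BULK = "maxFilesPerBulk";
  static final String TASK_PARAM_MIN_FILES_PER_BULK = "minFilesPerBulk";
  static final String TASK_PARAM_DIRS_PER_BULK = "directoriesPerBulk";

  /** Returns the parsed parameter value, or the default if the key is absent or blank. */
  static long resolve(Map<String, String> params, String key, Long defaultValue) {
    String raw = params.get(key);
    if (raw == null || raw.trim().isEmpty()) {
      return defaultValue;
    }
    return Long.parseLong(raw.trim());
  }

  public static void main(String[] args) {
    Map<String, String> params = new HashMap<>();
    params.put(TASK_PARAM_MAX_FILES_PER_BULK, "500");

    // Explicit value wins over the default: prints 500
    System.out.println(resolve(params, TASK_PARAM_MAX_FILES_PER_BULK, MAX_FILES_PER_BULK_DEFAULT));
    // Missing parameters fall back to the defaults: prints 100, then 10
    System.out.println(resolve(params, TASK_PARAM_MIN_FILES_PER_BULK, MIN_FILES_PER_BULK_DEFAULT));
    System.out.println(resolve(params, TASK_PARAM_DIRS_PER_BULK, DIRS_PER_BULK_DEFAULT));
  }
}
```
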
| Constructor and Description |
|---|
| FileCrawlerWorker() |
| Modifier and Type | Method and Description |
|---|---|
| java.lang.String | getName() |
| void | perform(TaskContext taskContext) Performs a computation on the data available in the TaskContext, such as a task for this worker, input and (if configured) output slots. |
| void | setCompoundExtractor(CompoundExtractor compoundExtractor) DS service reference bind method. |
| void | setFileCrawlerService(FileCrawlerService fileCrawler) DS service reference bind method. |
| void | setVisitedLinks(VisitedLinksService visitedLinks) DS service reference bind method. |
| void | unsetCompoundExtractor(CompoundExtractor compoundExtractor) DS service reference unbind method. |
| void | unsetFileCrawlerService(FileCrawlerService fileCrawler) DS service reference unbind method. |
| void | unsetVisitedLinks(VisitedLinksService visitedLinks) DS service reference unbind method. |
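
The paired set/unset methods above follow the usual bind/unbind shape for Declarative Services references. The sketch below illustrates that general pattern only; it is not the worker's actual implementation. `CrawlerService` and `BindSketch` are hypothetical stand-ins, and the identity check in the unbind method is a common convention assumed here, not something this page confirms.

```java
/** Hypothetical stand-in for a referenced service such as FileCrawlerService. */
interface CrawlerService {
  String describe();
}

/** Hypothetical component showing a bind/unbind pair for one service reference. */
final class BindSketch {
  // volatile so that a thread running a task sees the latest binding
  private volatile CrawlerService service;

  /** Bind method: the container hands over the service instance. */
  public void setCrawlerService(CrawlerService service) {
    this.service = service;
  }

  /** Unbind method: clear the reference only if it is the instance being removed. */
  public void unsetCrawlerService(CrawlerService service) {
    if (this.service == service) {
      this.service = null;
    }
  }

  boolean isBound() {
    return service != null;
  }

  public static void main(String[] args) {
    BindSketch component = new BindSketch();
    CrawlerService first = () -> "first";
    CrawlerService other = () -> "other";

    component.setCrawlerService(first);
    component.unsetCrawlerService(other);  // different instance, binding is kept
    System.out.println(component.isBound());  // prints true
    component.unsetCrawlerService(first);
    System.out.println(component.isBound());  // prints false
  }
}
```

Comparing instances in the unbind method guards against a late unbind call clearing a replacement service that was bound in the meantime.
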
public static final java.lang.String NAME
public static final java.lang.String INPUT_SLOT_DIRS_TO_CRAWL
public static final java.lang.String OUTPUT_SLOT_DIRS_TO_CRAWL
public static final java.lang.String OUTPUT_SLOT_CRAWLED_RECORDS
public static final java.lang.String TASK_PARAM_ROOT_FOLDER
public static final java.lang.String TASK_PARAM_MAX_FILES_PER_BULK
public static final java.lang.String TASK_PARAM_MIN_FILES_PER_BULK
public static final java.lang.String TASK_PARAM_DIRS_PER_BULK
public static final java.lang.Long MAX_FILES_PER_BULK_DEFAULT
public static final java.lang.Long MIN_FILES_PER_BULK_DEFAULT
public static final java.lang.Long DIRS_PER_BULK_DEFAULT
public java.lang.String getName()
public void perform(TaskContext taskContext) throws java.lang.Exception
Performs a computation on the data available in the TaskContext, such as a task for this worker, input and (if configured) output slots. Implementors must ensure that calls to this method are thread-safe.
public void setFileCrawlerService(FileCrawlerService fileCrawler)
public void unsetFileCrawlerService(FileCrawlerService fileCrawler)
public void setCompoundExtractor(CompoundExtractor compoundExtractor)
public void unsetCompoundExtractor(CompoundExtractor compoundExtractor)
public void setVisitedLinks(VisitedLinksService visitedLinks)
public void unsetVisitedLinks(VisitedLinksService visitedLinks)
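
Because calls to perform must be thread-safe, one simple way to meet that requirement is to keep all per-task state in the context object and in local variables, so that a single worker instance can serve several tasks at once. The sketch below illustrates this idea under stated assumptions: `Context` is a hypothetical stand-in for TaskContext, `PerformSketch` is not the real worker, and the "crawled:" output is invented purely for the demonstration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical worker with no mutable instance fields. */
final class PerformSketch {

  /** Hypothetical stand-in for TaskContext: one instance per task, never shared. */
  static final class Context {
    final String rootFolder;
    final List<String> output = new ArrayList<>();

    Context(String rootFolder) {
      this.rootFolder = rootFolder;
    }
  }

  /** Touches only the given context, so concurrent calls do not interfere. */
  void perform(Context ctx) {
    ctx.output.add("crawled:" + ctx.rootFolder);
  }

  /** Runs the given number of tasks against one shared worker instance. */
  static List<String> runConcurrently(int tasks) {
    PerformSketch worker = new PerformSketch();
    ExecutorService pool = Executors.newFixedThreadPool(4);
    try {
      List<Future<String>> futures = new ArrayList<>();
      for (int i = 0; i < tasks; i++) {
        final Context ctx = new Context("/data/dir" + i);
        futures.add(pool.submit(() -> {
          worker.perform(ctx);
          return ctx.output.get(0);
        }));
      }
      List<String> results = new ArrayList<>();
      for (Future<String> future : futures) {
        results.add(future.get());  // results are collected in submission order
      }
      return results;
    } catch (Exception e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println(runConcurrently(3));
  }
}
```
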