SMILA 1.0 API documentation

org.eclipse.smila.importing.crawler.file
Class FileCrawlerWorker

java.lang.Object
  extended by org.eclipse.smila.importing.crawler.file.FileCrawlerWorker
All Implemented Interfaces:
Worker

public class FileCrawlerWorker
extends java.lang.Object
implements Worker

Worker implementation that performs file crawling.

Author:
stuc07

Field Summary
static java.lang.Long DIRS_PER_BULK_DEFAULT
          default: one directory per follow-up task.
static java.lang.String INPUT_SLOT_DIRS_TO_CRAWL
          name of input slot containing records with directories to crawl.
static java.lang.Long MAX_FILES_PER_BULK_DEFAULT
          default: write up to 1000 files to one file bulk.
static java.lang.Long MIN_FILES_PER_BULK_DEFAULT
          default: don't add files from subdirectories if the current folder has too few files.
static java.lang.String NAME
          Name of the worker, used in worker description and workflows.
static java.lang.String OUTPUT_SLOT_DIRS_TO_CRAWL
          name of output slot taking the directories to crawl in follow-up tasks.
static java.lang.String OUTPUT_SLOT_FILES_TO_CRAWL
          name of output slot taking the file records to process in ETL.
static java.lang.String TASK_PARAM_DIRS_PER_BULK
          number of directories to write to one bulk object.
static java.lang.String TASK_PARAM_MAX_FILES_PER_BULK
          Maximum number of files in one bulk object.
static java.lang.String TASK_PARAM_MIN_FILES_PER_BULK
          Minimum number of files in one bulk object.
static java.lang.String TASK_PARAM_ROOT_FOLDER
          Name of the task parameter that contains the root folder for crawling.
 
Constructor Summary
FileCrawlerWorker()
           
 
Method Summary
 java.lang.String getName()
           
 void perform(TaskContext taskContext)
          Performs a computation on the data available in the TaskContext, such as a task for this worker, input and (if configured) output slots.
 void setCompoundExtractor(CompoundExtractor compoundExtractor)
          DS service reference bind method.
 void setFileCrawlerService(FileCrawlerService fileCrawler)
          DS service reference bind method.
 void setVisitedLinks(VisitedLinksService visitedLinks)
          DS service reference bind method.
 void unsetCompoundExtractor(CompoundExtractor compoundExtractor)
          DS service reference unbind method.
 void unsetFileCrawlerService(FileCrawlerService fileCrawler)
          DS service reference unbind method.
 void unsetVisitedLinks(VisitedLinksService visitedLinks)
          DS service reference unbind method.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

NAME

public static final java.lang.String NAME
Name of the worker, used in worker description and workflows.

See Also:
Constant Field Values

INPUT_SLOT_DIRS_TO_CRAWL

public static final java.lang.String INPUT_SLOT_DIRS_TO_CRAWL
name of input slot containing records with directories to crawl.

See Also:
Constant Field Values

OUTPUT_SLOT_DIRS_TO_CRAWL

public static final java.lang.String OUTPUT_SLOT_DIRS_TO_CRAWL
name of output slot taking the directories to crawl in follow-up tasks.

See Also:
Constant Field Values

OUTPUT_SLOT_FILES_TO_CRAWL

public static final java.lang.String OUTPUT_SLOT_FILES_TO_CRAWL
name of output slot taking the file records to process in ETL.

See Also:
Constant Field Values

TASK_PARAM_ROOT_FOLDER

public static final java.lang.String TASK_PARAM_ROOT_FOLDER
Name of the task parameter that contains the root folder for crawling.

See Also:
Constant Field Values

TASK_PARAM_MAX_FILES_PER_BULK

public static final java.lang.String TASK_PARAM_MAX_FILES_PER_BULK
Maximum number of files in one bulk object.

See Also:
Constant Field Values

TASK_PARAM_MIN_FILES_PER_BULK

public static final java.lang.String TASK_PARAM_MIN_FILES_PER_BULK
Minimum number of files in one bulk object.

See Also:
Constant Field Values

TASK_PARAM_DIRS_PER_BULK

public static final java.lang.String TASK_PARAM_DIRS_PER_BULK
number of directories to write to one bulk object.

See Also:
Constant Field Values

MAX_FILES_PER_BULK_DEFAULT

public static final java.lang.Long MAX_FILES_PER_BULK_DEFAULT
default: write up to 1000 files to one file bulk.
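
The effect of a per-bulk maximum can be illustrated with a minimal sketch. It is not taken from the SMILA sources: the class BulkPartitionSketch and its partition method are hypothetical, and the only value borrowed from this page is the documented default of 1000 files per bulk.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of the SMILA API.
final class BulkPartitionSketch {

  /** Splits items into consecutive bulks holding at most maxPerBulk elements each. */
  static <T> List<List<T>> partition(List<T> items, long maxPerBulk) {
    if (maxPerBulk < 1) {
      throw new IllegalArgumentException("maxPerBulk must be >= 1");
    }
    List<List<T>> bulks = new ArrayList<>();
    List<T> current = new ArrayList<>();
    for (T item : items) {
      current.add(item);
      // Close the current bulk as soon as it reaches the limit.
      if (current.size() >= maxPerBulk) {
        bulks.add(current);
        current = new ArrayList<>();
      }
    }
    // Keep a trailing, partially filled bulk.
    if (!current.isEmpty()) {
      bulks.add(current);
    }
    return bulks;
  }

  public static void main(String[] args) {
    List<String> files = new ArrayList<>();
    for (int i = 0; i < 2500; i++) {
      files.add("file-" + i);
    }
    // 1000 mirrors the default documented for MAX_FILES_PER_BULK_DEFAULT.
    List<List<String>> bulks = partition(files, 1000L);
    System.out.println(bulks.size() + " bulks");
  }
}
```

With 2500 input entries and a limit of 1000, the sketch yields two full bulks and one bulk holding the remaining 500 entries.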


MIN_FILES_PER_BULK_DEFAULT

public static final java.lang.Long MIN_FILES_PER_BULK_DEFAULT
default: don't add files from subdirectories if the current folder has too few files.


DIRS_PER_BULK_DEFAULT

public static final java.lang.Long DIRS_PER_BULK_DEFAULT
default: one directory per follow-up task.
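
How a task parameter might fall back to one of the defaults above can be sketched as follows. This is an assumption-laden illustration, not the worker's actual code: the parameter key strings ("maxFilesPerBulk", "directoriesPerBulk") and the helper longParam are hypothetical, because this page does not list the constant values. The defaults of 1000 files and one directory are the ones stated in the field descriptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of the SMILA API.
final class TaskParamSketch {

  // Assumed key names; the real values are listed under "Constant Field Values".
  static final String PARAM_MAX_FILES_PER_BULK = "maxFilesPerBulk";
  static final String PARAM_DIRS_PER_BULK = "directoriesPerBulk";

  // Defaults as described on this page.
  static final Long MAX_FILES_PER_BULK_DEFAULT = 1000L;
  static final Long DIRS_PER_BULK_DEFAULT = 1L;

  /** Returns the parameter as a long, or defaultValue if the key is absent. */
  static long longParam(Map<String, Object> params, String key, Long defaultValue) {
    Object value = params.get(key);
    if (value == null) {
      return defaultValue;
    }
    if (value instanceof Number) {
      return ((Number) value).longValue();
    }
    // Accept string values such as "250" from textual job definitions.
    return Long.parseLong(value.toString().trim());
  }

  public static void main(String[] args) {
    Map<String, Object> params = new HashMap<>();
    params.put(PARAM_MAX_FILES_PER_BULK, "250");
    long maxFiles = longParam(params, PARAM_MAX_FILES_PER_BULK, MAX_FILES_PER_BULK_DEFAULT);
    long dirs = longParam(params, PARAM_DIRS_PER_BULK, DIRS_PER_BULK_DEFAULT);
    System.out.println("maxFiles=" + maxFiles + ", dirsPerBulk=" + dirs);
  }
}
```

In this sketch an explicitly configured value overrides the default, while a missing key silently yields the default.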

Constructor Detail

FileCrawlerWorker

public FileCrawlerWorker()
Method Detail

getName

public java.lang.String getName()
Specified by:
getName in interface Worker
Returns:
the name of the worker. The worker function will be executed for tasks tied to this worker name.

perform

public void perform(TaskContext taskContext)
             throws java.lang.Exception
Description copied from interface: Worker
Performs a computation on the data available in the TaskContext, such as a task for this worker, input and (if configured) output slots. Implementors must ensure that calls to this method are thread-safe.

Specified by:
perform in interface Worker
Parameters:
taskContext - the TaskContext information with which this operation can be performed.
Throws:
java.lang.Exception
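
One common way to meet the thread-safety requirement is to keep all per-task state in local variables so that concurrent calls share nothing mutable. The sketch below illustrates that idea only. CountingWorker is a hypothetical stand-in and does not implement the SMILA Worker interface, and a list of path strings replaces the TaskContext.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; not part of the SMILA API.
final class ThreadSafePerformSketch {

  /** Stand-in worker with no mutable instance fields. */
  static final class CountingWorker {
    /** Counts entries that do not end with '/' (treated here as files). */
    int perform(List<String> paths) {
      int files = 0; // local state: concurrent calls cannot interfere
      for (String path : paths) {
        if (!path.endsWith("/")) {
          files++;
        }
      }
      return files;
    }
  }

  /** Runs every task against the same worker instance on a small thread pool. */
  static List<Integer> runConcurrently(CountingWorker worker, List<List<String>> tasks) {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    try {
      List<Future<Integer>> futures = new ArrayList<>();
      for (List<String> task : tasks) {
        Callable<Integer> call = () -> worker.perform(task);
        futures.add(pool.submit(call));
      }
      List<Integer> results = new ArrayList<>();
      for (Future<Integer> future : futures) {
        results.add(future.get()); // results keep submission order
      }
      return results;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    } finally {
      pool.shutdown();
    }
  }
}
```

Because the stand-in worker holds no shared mutable state, the same instance can serve many tasks in parallel without locking.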

setFileCrawlerService

public void setFileCrawlerService(FileCrawlerService fileCrawler)
DS service reference bind method.


unsetFileCrawlerService

public void unsetFileCrawlerService(FileCrawlerService fileCrawler)
DS service reference unbind method.


setCompoundExtractor

public void setCompoundExtractor(CompoundExtractor compoundExtractor)
DS service reference bind method.


unsetCompoundExtractor

public void unsetCompoundExtractor(CompoundExtractor compoundExtractor)
DS service reference unbind method.


setVisitedLinks

public void setVisitedLinks(VisitedLinksService visitedLinks)
DS service reference bind method.


unsetVisitedLinks

public void unsetVisitedLinks(VisitedLinksService visitedLinks)
DS service reference unbind method.
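
The bind and unbind methods above follow the usual OSGi Declarative Services pattern of paired set and unset methods per referenced service. A minimal sketch of that pattern is shown here. CrawlerService, NamedService and Component are hypothetical names, not SMILA types, and the identity check in the unbind method is a common defensive convention rather than something this page documents.

```java
// Hypothetical illustration only; not part of the SMILA API.
final class DsBindingSketch {

  /** Stand-in for a referenced service interface. */
  interface CrawlerService {
    String id();
  }

  /** Trivial service implementation used for the demonstration. */
  static final class NamedService implements CrawlerService {
    private final String id;

    NamedService(String id) {
      this.id = id;
    }

    @Override
    public String id() {
      return id;
    }
  }

  /** Component holding one service reference managed by bind/unbind calls. */
  static final class Component {
    private CrawlerService crawler;

    /** Bind method: remember the service handed in by the runtime. */
    synchronized void setCrawlerService(CrawlerService service) {
      this.crawler = service;
    }

    /** Unbind method: clear the reference only if it is still the same instance. */
    synchronized void unsetCrawlerService(CrawlerService service) {
      if (this.crawler == service) {
        this.crawler = null;
      }
    }

    /** Returns the id of the bound service, or null if none is bound. */
    synchronized String boundId() {
      return crawler == null ? null : crawler.id();
    }
  }
}
```

The identity check means that a late unbind call for a replaced service does not discard its replacement.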

