SMILA 1.0 API documentation

org.eclipse.smila.importing.crawler.web
Class WebCrawlingContext

java.lang.Object
  extended by org.eclipse.smila.importing.crawler.web.WebCrawlingContext

public class WebCrawlingContext
extends java.lang.Object

Context holding the information needed throughout most of the web crawling process for one task, such as the mapper and the filter configuration.


Constructor Summary
WebCrawlingContext(TaskContext taskContext)
          Creates a crawling context from the given TaskContext.
 
Method Summary
 java.lang.String getCurrentInputBulkId()
           
 java.util.Set<java.lang.String> getExtractedUrls()
           
 FilterConfiguration getFilterConfiguration()
           
 java.lang.String getJobRunId()
           
 int getLinksPerBulk()
           
 PropertyNameMapper getMapper()
           
 java.lang.String getSource()
           
 TaskContext getTaskContext()
           
 TaskLog getTaskLog()
           
 AnyMap getTaskParameters()
           
 java.util.Set<java.lang.String> getVisitedUrls()
           
 void setCurrentInputBulkId(java.lang.String inputBulkId)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

WebCrawlingContext

public WebCrawlingContext(TaskContext taskContext)
Creates a crawling context from the given TaskContext.

Method Detail

getMapper

public PropertyNameMapper getMapper()
Returns:
the PropertyNameMapper held by this context

getFilterConfiguration

public FilterConfiguration getFilterConfiguration()
Returns:
the FilterConfiguration held by this context

getTaskLog

public TaskLog getTaskLog()
Returns:
the TaskLog held by this context

getTaskParameters

public AnyMap getTaskParameters()
Returns:
the task parameters

getTaskContext

public TaskContext getTaskContext()
Returns:
the TaskContext this crawling context was created from

getSource

public java.lang.String getSource()
Returns:
the _source

getJobRunId

public java.lang.String getJobRunId()
Returns:
the job run ID

getCurrentInputBulkId

public java.lang.String getCurrentInputBulkId()
Returns:
the ID of the current input bulk, as set via setCurrentInputBulkId

setCurrentInputBulkId

public void setCurrentInputBulkId(java.lang.String inputBulkId)
Parameters:
inputBulkId - the ID of the input bulk to record as the current one

getVisitedUrls

public java.util.Set<java.lang.String> getVisitedUrls()
Returns:
the URLs visited by this task

getLinksPerBulk

public int getLinksPerBulk()
Returns:
the number of links per output bulk.
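
The links-per-bulk value suggests that outgoing links are grouped into output bulks of bounded size. The following is a minimal sketch of such grouping using only the Java standard library. The class LinksPerBulkSketch and the method splitIntoBulks are hypothetical names introduced for illustration; they are not part of SMILA, and the actual crawler may distribute links differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the SMILA API.
class LinksPerBulkSketch {

  /** Splits links into consecutive bulks containing at most linksPerBulk entries each. */
  static List<List<String>> splitIntoBulks(List<String> links, int linksPerBulk) {
    if (linksPerBulk <= 0) {
      throw new IllegalArgumentException("linksPerBulk must be positive");
    }
    List<List<String>> bulks = new ArrayList<>();
    for (int start = 0; start < links.size(); start += linksPerBulk) {
      int end = Math.min(start + linksPerBulk, links.size());
      // Copy the sublist so each bulk is independent of the input list.
      bulks.add(new ArrayList<>(links.subList(start, end)));
    }
    return bulks;
  }

  public static void main(String[] args) {
    // In real code the limit would come from WebCrawlingContext.getLinksPerBulk().
    int linksPerBulk = 2;
    List<String> links = Arrays.asList(
        "http://example.org/a", "http://example.org/b", "http://example.org/c");
    for (List<String> bulk : splitIntoBulks(links, linksPerBulk)) {
      System.out.println(bulk);
    }
  }
}
```

With a limit of 2 and three links, the sketch yields two bulks, the last one holding the single remaining link.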

getExtractedUrls

public java.util.Set<java.lang.String> getExtractedUrls()
Returns:
the set collecting the URLs that have been written to linksToCrawl in this task; used for duplicate prevention.
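
Duplicate prevention with such a set can be sketched as follows. This is an illustrative example only, assuming the caller adds each URL to the set at the moment it is written out. The class ExtractedUrlsSketch and the method filterNewLinks are hypothetical and not part of SMILA.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the SMILA API.
class ExtractedUrlsSketch {

  /**
   * Returns the candidates that are not yet contained in extractedUrls
   * and records them in the set, so later calls skip them.
   */
  static List<String> filterNewLinks(Set<String> extractedUrls, List<String> candidates) {
    List<String> newLinks = new ArrayList<>();
    for (String url : candidates) {
      // Set.add returns false if the URL was already present.
      if (extractedUrls.add(url)) {
        newLinks.add(url);
      }
    }
    return newLinks;
  }

  public static void main(String[] args) {
    // In real code the set would come from WebCrawlingContext.getExtractedUrls().
    Set<String> extractedUrls = new HashSet<>();
    System.out.println(filterNewLinks(extractedUrls,
        Arrays.asList("http://example.org/a", "http://example.org/b", "http://example.org/a")));
    System.out.println(filterNewLinks(extractedUrls,
        Arrays.asList("http://example.org/b", "http://example.org/c")));
  }
}
```

The sketch relies on Set.add reporting whether the element was newly inserted, which avoids a separate contains check.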
