org.eclipse.smila.importing.crawler.web
Class WebCrawlingContext
java.lang.Object
org.eclipse.smila.importing.crawler.web.WebCrawlingContext
public class WebCrawlingContext
- extends java.lang.Object
Context holding information needed throughout most of the web crawling process for one task, such as the mapper and
the filter configuration.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
WebCrawlingContext
public WebCrawlingContext(TaskContext taskContext)
- Creates a crawling context from the given taskContext.
getMapper
public PropertyNameMapper getMapper()
- Returns:
- the _mapper
getFilterConfiguration
public FilterConfiguration getFilterConfiguration()
- Returns:
- the _filterConfiguration
getTaskLog
public TaskLog getTaskLog()
- Returns:
- the _taskLog
getTaskParameters
public AnyMap getTaskParameters()
- Returns:
- the _parameters
getTaskContext
public TaskContext getTaskContext()
- Returns:
- the _taskContext
getSource
public java.lang.String getSource()
- Returns:
- the _source
getJobRunId
public java.lang.String getJobRunId()
- Returns:
- the _jobRunId
getCurrentInputBulkId
public java.lang.String getCurrentInputBulkId()
- Returns:
- the ID of the current input bulk
setCurrentInputBulkId
public void setCurrentInputBulkId(java.lang.String inputBulkId)
- Parameters:
inputBulkId - the ID of the input bulk currently being processed
getVisitedUrls
public java.util.Set<java.lang.String> getVisitedUrls()
- Returns:
- the URLs visited in this task
getLinksPerBulk
public int getLinksPerBulk()
- Returns:
- the number of links per output bulk.
getExtractedUrls
public java.util.Set<java.lang.String> getExtractedUrls()
- Returns:
- the set collecting the URLs that have been written to linksToCrawl in this task, used for duplicate
prevention.
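
The following is a minimal sketch of how the values returned by getVisitedUrls(), getExtractedUrls() and getLinksPerBulk() might be combined when writing extracted links. It is not taken from the SMILA sources: the class LinkBulkSketch and the method toBulks are hypothetical, and plain java.util collections stand in for the context so that the example runs without SMILA on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical helper, not part of SMILA. */
public class LinkBulkSketch {

  /**
   * Drops links that were already visited or already written in this task,
   * then groups the remaining links into bulks of at most linksPerBulk entries.
   *
   * @param links         links extracted from the current page
   * @param visitedUrls   stand-in for the set returned by getVisitedUrls()
   * @param extractedUrls stand-in for the set returned by getExtractedUrls(); updated in place
   * @param linksPerBulk  stand-in for the value returned by getLinksPerBulk(); must be positive
   */
  public static List<List<String>> toBulks(List<String> links, Set<String> visitedUrls,
      Set<String> extractedUrls, int linksPerBulk) {
    if (linksPerBulk <= 0) {
      throw new IllegalArgumentException("linksPerBulk must be positive");
    }
    List<List<String>> bulks = new ArrayList<>();
    List<String> current = new ArrayList<>();
    for (String link : links) {
      // Set.add returns false if the link is already in the set, i.e. a duplicate.
      if (visitedUrls.contains(link) || !extractedUrls.add(link)) {
        continue;
      }
      current.add(link);
      if (current.size() == linksPerBulk) {
        bulks.add(current);
        current = new ArrayList<>();
      }
    }
    // Keep the last, partially filled bulk.
    if (!current.isEmpty()) {
      bulks.add(current);
    }
    return bulks;
  }
}
```

With a real WebCrawlingContext, the three stand-in arguments would presumably be obtained from the context itself, for example toBulks(links, context.getVisitedUrls(), context.getExtractedUrls(), context.getLinksPerBulk()). Because the extracted-URL set is updated in place, a link skipped once as a duplicate stays skipped for the rest of the task.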