SMILA 1.0 API documentation

org.eclipse.smila.importing.state.objectstore
Class ObjectStoreVisitedLinksService

java.lang.Object
  extended by org.eclipse.smila.importing.state.objectstore.ObjectStoreVisitedLinksService
All Implemented Interfaces:
VisitedLinksService

public class ObjectStoreVisitedLinksService
extends java.lang.Object
implements VisitedLinksService

ObjectStore-based implementation of the VisitedLinksService for the jobmanager-based importing framework.

Author:
stuc07

Field Summary
static java.lang.String BUNDLE_ID
          bundle ID for configuration area access.
static java.lang.String STORENAME
          objectstore store name.
 
Constructor Summary
ObjectStoreVisitedLinksService()
           
 
Method Summary
protected  void activate(ComponentContext context)
          service activation.
 boolean checkAndMarkVisited(java.lang.String sourceId, java.lang.String url, java.lang.String jobRunId, java.lang.String inputBulkId)
          Determines if the link was already visited for this sourceId.
 void clearAll()
          delete all state information in the service about all data sources.
 void clearSource(java.lang.String sourceId)
          delete all state information in the service about the given data source.
 long countEntries(java.lang.String sourceId, boolean countExact)
           
protected  void deactivate(ComponentContext context)
          service deactivation.
 java.util.Collection<java.lang.String> getSourceIds()
          get Ids of all sources that currently have entries in the VisitedLinksService.
 boolean isVisited(java.lang.String sourceId, java.lang.String url, java.lang.String jobRunId)
          Determines if the link was already visited for this sourceId in the same job run or not.
 void markAsVisited(java.lang.String sourceId, java.lang.String url, java.lang.String jobRunId, java.lang.String inputBulkId)
          Mark the link as visited in the current crawl job run.
 void setObjectStore(ObjectStoreService objectStore)
          used by DS to set service reference.
 void unsetObjectStore(ObjectStoreService objectStore)
          used by DS to remove service reference.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

BUNDLE_ID

public static final java.lang.String BUNDLE_ID
bundle ID for configuration area access.

See Also:
Constant Field Values

STORENAME

public static final java.lang.String STORENAME
objectstore store name.

See Also:
Constant Field Values

Constructor Detail

ObjectStoreVisitedLinksService

public ObjectStoreVisitedLinksService()

Method Detail

activate

protected void activate(ComponentContext context)
service activation.


deactivate

protected void deactivate(ComponentContext context)
service deactivation.


checkAndMarkVisited

public boolean checkAndMarkVisited(java.lang.String sourceId,
                                   java.lang.String url,
                                   java.lang.String jobRunId,
                                   java.lang.String inputBulkId)
                            throws VisitedLinksException
Description copied from interface: VisitedLinksService
Determines if the link was already visited for this sourceId. If not, the link is marked as visited.

Specified by:
checkAndMarkVisited in interface VisitedLinksService
Parameters:
sourceId - the name of the data source that contains the link.
url - the link to check, e.g. a URL.
jobRunId - the current job run id in which the crawler is running.
inputBulkId - the id of the inputBulk where the URL to check originates from.
Returns:
true if the URL was already visited for this sourceId, false otherwise
Throws:
VisitedLinksException
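
The check-and-mark contract described above can be illustrated with a minimal sketch. InMemoryVisitedLinks and LinkFilter are hypothetical names introduced only for this example; the stand-in keeps its state in memory, ignores jobRunId and inputBulkId, and does not throw VisitedLinksException, so it says nothing about how the ObjectStore-based service stores or scopes its entries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory stand-in, NOT the ObjectStore-based service documented here. */
class InMemoryVisitedLinks {
  // One set of visited links per data source; job run scoping is left out of this sketch.
  private final ConcurrentHashMap<String, Set<String>> visited = new ConcurrentHashMap<>();

  /** Returns true if the link was already visited for sourceId; otherwise marks it and returns false. */
  boolean checkAndMarkVisited(String sourceId, String url, String jobRunId, String inputBulkId) {
    Set<String> links = visited.computeIfAbsent(sourceId, k -> ConcurrentHashMap.newKeySet());
    // Set.add returns false when the element was already present.
    return !links.add(url);
  }
}

class LinkFilter {
  /** Keeps only links not seen before, marking each one as visited on the way. */
  static List<String> unvisited(InMemoryVisitedLinks service, String sourceId,
      List<String> candidates, String jobRunId, String inputBulkId) {
    List<String> result = new ArrayList<>();
    for (String url : candidates) {
      if (!service.checkAndMarkVisited(sourceId, url, jobRunId, inputBulkId)) {
        result.add(url);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    InMemoryVisitedLinks service = new InMemoryVisitedLinks();
    List<String> links = Arrays.asList("http://a", "http://b", "http://a");
    // The duplicate "http://a" is dropped because the first occurrence marked it.
    System.out.println(unvisited(service, "web", links, "run-1", "bulk-1"));
  }
}
```

Combining the check and the mark in one call means the caller never has to coordinate two separate calls for the same link.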

isVisited

public boolean isVisited(java.lang.String sourceId,
                         java.lang.String url,
                         java.lang.String jobRunId)
                  throws VisitedLinksException
Description copied from interface: VisitedLinksService
Determines if the link was already visited for this sourceId in the same job run or not.

Specified by:
isVisited in interface VisitedLinksService
Parameters:
sourceId - the name of the data source that contains the link.
url - the link to check, e.g. a URL.
jobRunId - the current job run id in which the crawler is running.
Returns:
true if the URL was already visited for this sourceId in the same job run, false otherwise
Throws:
VisitedLinksException

markAsVisited

public void markAsVisited(java.lang.String sourceId,
                          java.lang.String url,
                          java.lang.String jobRunId,
                          java.lang.String inputBulkId)
                   throws VisitedLinksException
Description copied from interface: VisitedLinksService
Mark the link as visited in the current crawl job run.

Specified by:
markAsVisited in interface VisitedLinksService
Parameters:
sourceId - the name of the data source that contains the link.
url - the link to mark, e.g. a URL.
jobRunId - the current job run id in which the crawler is running.
inputBulkId - the id of the inputBulk where the URL to mark originates from.
Throws:
VisitedLinksException
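
As a rough illustration of how isVisited and markAsVisited relate, the sketch below records for each link the job run that marked it. JobRunScopedVisitedLinks is a hypothetical in-memory stand-in; reading "in the same job run" as "the stored job run id equals the given one" is an assumption of this example, and the real service additionally declares VisitedLinksException.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory stand-in, NOT the ObjectStore-based service documented here. */
class JobRunScopedVisitedLinks {
  // sourceId -> (url -> id of the job run that last marked the url)
  private final Map<String, Map<String, String>> state = new ConcurrentHashMap<>();

  /** True only if the url was marked for this source by the same job run (assumed semantics). */
  boolean isVisited(String sourceId, String url, String jobRunId) {
    Map<String, String> urls = state.get(sourceId);
    return urls != null && jobRunId.equals(urls.get(url));
  }

  /** Records the url for the given job run; inputBulkId is accepted but unused in this sketch. */
  void markAsVisited(String sourceId, String url, String jobRunId, String inputBulkId) {
    state.computeIfAbsent(sourceId, k -> new ConcurrentHashMap<>()).put(url, jobRunId);
  }

  public static void main(String[] args) {
    JobRunScopedVisitedLinks links = new JobRunScopedVisitedLinks();
    links.markAsVisited("web", "http://example.org/", "run-1", "bulk-1");
    System.out.println(links.isVisited("web", "http://example.org/", "run-1")); // same job run
    System.out.println(links.isVisited("web", "http://example.org/", "run-2")); // later job run
  }
}
```

Calling isVisited and markAsVisited separately leaves a gap between the check and the mark, which is presumably why the interface also offers checkAndMarkVisited.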

clearSource

public void clearSource(java.lang.String sourceId)
                 throws VisitedLinksException
Description copied from interface: VisitedLinksService
delete all state information in the service about the given data source.

Specified by:
clearSource in interface VisitedLinksService
Parameters:
sourceId - data source name.
Throws:
VisitedLinksException

clearAll

public void clearAll()
              throws VisitedLinksException
Description copied from interface: VisitedLinksService
delete all state information in the service about all data sources.

Specified by:
clearAll in interface VisitedLinksService
Throws:
VisitedLinksException

getSourceIds

public java.util.Collection<java.lang.String> getSourceIds()
                                                    throws VisitedLinksException
Description copied from interface: VisitedLinksService
get Ids of all sources that currently have entries in the VisitedLinksService.

Specified by:
getSourceIds in interface VisitedLinksService
Throws:
VisitedLinksException
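
The relationship between getSourceIds, clearSource and clearAll can be sketched as follows. InMemorySourceState is a hypothetical in-memory stand-in with a simplified two-argument markAsVisited helper; it only mirrors the documented effects and is not the ObjectStore-based implementation.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory stand-in, NOT the ObjectStore-based service documented here. */
class InMemorySourceState {
  // sourceId -> links visited for that source
  private final Map<String, Set<String>> visited = new ConcurrentHashMap<>();

  /** Simplified helper for this sketch (the documented method takes four arguments). */
  void markAsVisited(String sourceId, String url) {
    visited.computeIfAbsent(sourceId, k -> ConcurrentHashMap.newKeySet()).add(url);
  }

  /** Ids of all sources that currently have entries, returned as a snapshot copy. */
  Collection<String> getSourceIds() {
    return new ArrayList<>(visited.keySet());
  }

  /** Deletes all state for one data source. */
  void clearSource(String sourceId) {
    visited.remove(sourceId);
  }

  /** Deletes all state for all data sources. */
  void clearAll() {
    visited.clear();
  }

  public static void main(String[] args) {
    InMemorySourceState state = new InMemorySourceState();
    state.markAsVisited("web", "http://example.org/");
    state.markAsVisited("files", "file:///tmp/a.txt");
    state.clearSource("web");
    System.out.println(state.getSourceIds()); // only "files" still has entries
    state.clearAll();
    System.out.println(state.getSourceIds().isEmpty());
  }
}
```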

countEntries

public long countEntries(java.lang.String sourceId,
                         boolean countExact)
                  throws DeltaException
Specified by:
countEntries in interface VisitedLinksService
Parameters:
countExact - set to true to get an exact result, which may take some time; otherwise the service may return only an estimated value.
Returns:
number of entries for given source id.
Throws:
DeltaException
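
One possible way for a caller to use the countExact flag is to ask for the estimate first and request the exact count only when the estimate is small. The sketch below assumes a hypothetical EntryCounter interface that mirrors only the documented method signature (without DeltaException), a fake counter with made-up values, and an arbitrary threshold chosen by the caller.

```java
/** Hypothetical stand-in for the single documented method; the real one declares DeltaException. */
interface EntryCounter {
  long countEntries(String sourceId, boolean countExact);
}

class CountEntriesSketch {
  /** Asks for the estimate first and pays for the exact count only when the estimate is small. */
  static long countCheaply(EntryCounter counter, String sourceId, long exactThreshold) {
    long estimate = counter.countEntries(sourceId, false);
    return estimate <= exactThreshold ? counter.countEntries(sourceId, true) : estimate;
  }

  public static void main(String[] args) {
    // Fake counter for illustration: exact count 42, estimate 40.
    EntryCounter fake = (sourceId, exact) -> exact ? 42L : 40L;
    System.out.println(countCheaply(fake, "web", 100)); // estimate is below the threshold: exact count
    System.out.println(countCheaply(fake, "web", 10));  // estimate is above the threshold: estimate kept
  }
}
```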

setObjectStore

public void setObjectStore(ObjectStoreService objectStore)
used by DS to set service reference.


unsetObjectStore

public void unsetObjectStore(ObjectStoreService objectStore)
used by DS to remove service reference.

