SMILA 1.0 API documentation
public interface DeltaService
Service interface for checking if a crawled record must be sent to the processing job.
Nested Class Summary | |
---|---|
static class |
DeltaService.EntryId
Entry identifier as returned by getUnvisitedEntries(String, String). |
Method Summary | |
---|---|
State |
checkState(java.lang.String sourceId,
java.lang.String recordId,
java.lang.String jobRunId,
java.lang.String hashCode)
Determine delta state of record identified by sourceId and recordId. |
State |
checkState(java.lang.String sourceId,
java.lang.String recordId,
java.lang.String compoundRecordId,
java.lang.String jobRunId,
java.lang.String hashCode)
Determine delta state of record identified by sourceId and recordId. |
void |
clearAll()
Delete all state information in the service about all data sources. |
void |
clearSource(java.lang.String sourceId)
Delete all state information in the service about the given data source. |
long |
countEntries(java.lang.String sourceId,
boolean countExact)
Count the entries stored for the given data source, either exactly or as an estimate. |
void |
deleteEntry(java.lang.String sourceId,
DeltaService.EntryId entryId)
Remove an entry, e.g. after it has been deleted. |
java.util.Collection<java.lang.String> |
getShardPrefixes(java.lang.String sourceId)
Get possible input values for getUnvisitedEntries(String, String). |
java.util.Collection<java.lang.String> |
getSourceIds()
Get Ids of all sources that currently have entries in the DeltaService. |
java.util.Collection<DeltaService.EntryId> |
getUnvisitedEntries(java.lang.String sourceAndShardPrefix,
java.lang.String jobRunId)
Get the record IDs in the given data source and shard that have not been visited in the given job run and therefore must be sent as deleted records to the target job. |
void |
markAsUpdated(java.lang.String sourceId,
java.lang.String recordId,
java.lang.String jobRunId,
java.lang.String hashCode)
Mark the record as visited in the current crawl job run. |
void |
markAsUpdated(java.lang.String sourceId,
java.lang.String recordId,
java.lang.String compoundRecordId,
java.lang.String jobRunId,
java.lang.String hashCode)
Mark the record that was extracted from a compound as visited in the current crawl job run. |
void |
markCompoundElementsVisited(java.lang.String sourceId,
java.lang.String compoundRecordId,
java.lang.String jobRunId)
Set jobRunId of all elements of the given compound record, because the compound itself has not changed. |
Method Detail |
---|
State checkState(java.lang.String sourceId, java.lang.String recordId, java.lang.String jobRunId, java.lang.String hashCode) throws DeltaException
Determine delta state of record identified by sourceId and recordId. If the result is State.UPTODATE, the service also marks the record as visited in the current crawl job run, so there is no need to call markAsUpdated(String, String, String, String) afterwards. In the other cases the crawler should call markAsUpdated(String, String, String, String) only if the record is actually submitted to a processing job.
Parameters:
sourceId - the name of the data source that contains the record.
recordId - the record id.
jobRunId - the id of the job run in which the crawler is currently running.
hashCode - a string that reflects changes in the record content. This can be as simple as a version identifier, if one is available in the record metadata, or a hash calculated on the actual content of the record.
Returns:
State value.
Throws:
DeltaException
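The contract described above can be illustrated with a small sketch. It is not SMILA code: `MiniDeltaService`, `InMemoryDelta` and `crawl` are hypothetical stand-ins that exist only to make the example runnable, `DeltaException` is left out, and the `State` values `NEW` and `CHANGED` are assumptions, because this page only names `State.UPTODATE`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of the crawler-side delta check; every type here is a hypothetical stand-in. */
class DeltaCheckSketch {

  /** Assumed values: this page only names State.UPTODATE. */
  enum State { NEW, CHANGED, UPTODATE }

  /** Reduced stand-in for the two DeltaService methods used below (DeltaException omitted). */
  interface MiniDeltaService {
    State checkState(String sourceId, String recordId, String jobRunId, String hashCode);
    void markAsUpdated(String sourceId, String recordId, String jobRunId, String hashCode);
  }

  /** Hypothetical in-memory implementation, present only to make the sketch runnable. */
  static class InMemoryDelta implements MiniDeltaService {
    // "sourceId/recordId" -> { hashCode, jobRunId of the last visit }
    private final Map<String, String[]> entries = new HashMap<>();

    @Override
    public State checkState(String sourceId, String recordId, String jobRunId, String hashCode) {
      String[] entry = entries.get(sourceId + "/" + recordId);
      if (entry == null) {
        return State.NEW;
      }
      if (!entry[0].equals(hashCode)) {
        return State.CHANGED;
      }
      entry[1] = jobRunId; // up to date: the visit is recorded as a side effect
      return State.UPTODATE;
    }

    @Override
    public void markAsUpdated(String sourceId, String recordId, String jobRunId, String hashCode) {
      entries.put(sourceId + "/" + recordId, new String[] {hashCode, jobRunId});
    }
  }

  /** Crawls records (recordId -> hashCode) and returns the ids that were submitted. */
  static List<String> crawl(MiniDeltaService delta, String sourceId, String jobRunId,
      Map<String, String> records) {
    List<String> submitted = new ArrayList<>();
    for (Map.Entry<String, String> record : records.entrySet()) {
      State state = delta.checkState(sourceId, record.getKey(), jobRunId, record.getValue());
      if (state == State.UPTODATE) {
        continue; // already marked as visited by checkState, nothing else to do
      }
      submitted.add(record.getKey()); // stand-in for sending the record to the processing job
      delta.markAsUpdated(sourceId, record.getKey(), jobRunId, record.getValue());
    }
    return submitted;
  }

  public static void main(String[] args) {
    MiniDeltaService delta = new InMemoryDelta();
    Map<String, String> records = new LinkedHashMap<>();
    records.put("doc-1", "v1");
    records.put("doc-2", "v1");
    System.out.println(crawl(delta, "files", "run-1", records)); // both records are new
    records.put("doc-2", "v2");
    System.out.println(crawl(delta, "files", "run-2", records)); // only doc-2 changed
  }
}
```

In this sketch `markAsUpdated` is called only after the record has been handed on, which mirrors the rule that records not submitted to a processing job should not be marked.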
State checkState(java.lang.String sourceId, java.lang.String recordId, java.lang.String compoundRecordId, java.lang.String jobRunId, java.lang.String hashCode) throws DeltaException
Determine delta state of record identified by sourceId and recordId. If the result is State.UPTODATE, the service also marks the record as visited in the current crawl job run, so there is no need to call markAsUpdated(String, String, String, String) afterwards. In the other cases the crawler should call markAsUpdated(String, String, String, String) only if the record is actually submitted to a processing job.
Parameters:
sourceId - the name of the data source that contains the record.
recordId - the record id.
compoundRecordId - the record id of the compound this record was extracted from. May be null.
jobRunId - the id of the job run in which the crawler is currently running.
hashCode - a string that reflects changes in the record content. This can be as simple as a version identifier, if one is available in the record metadata, or a hash calculated on the actual content of the record.
Returns:
State value.
Throws:
DeltaException
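void markCompoundElementsVisited(java.lang.String sourceId, java.lang.String compoundRecordId, java.lang.String jobRunId) throws DeltaException
Set jobRunId of all elements of the given compound record, because the compound itself has not changed.
Parameters:
sourceId - the name of the data source that contains the compound record.
compoundRecordId - the record id of the compound.
jobRunId - the id of the job run in which the crawler is currently running.
Throws:
DeltaException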
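One plausible way to combine the compound variants is sketched below: when the compound itself is up to date, its elements are marked as visited in one call instead of being extracted and checked again. This is an illustration under stated assumptions, not the SMILA implementation. `InMemoryDelta` and `processCompound` are hypothetical, `DeltaException` is omitted, the `State` values `NEW` and `CHANGED` are assumed, and calling `markAsUpdated` for the compound itself is a choice of this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of delta handling for compound records; all types are hypothetical stand-ins. */
class CompoundDeltaSketch {

  /** Assumed values: this page only names State.UPTODATE. */
  enum State { NEW, CHANGED, UPTODATE }

  /** Hypothetical in-memory store mirroring the compound-aware method signatures. */
  static class InMemoryDelta {
    // "sourceId/recordId" -> { hashCode, jobRunId of last visit, compoundRecordId or null }
    private final Map<String, String[]> entries = new HashMap<>();

    State checkState(String sourceId, String recordId, String compoundRecordId,
        String jobRunId, String hashCode) {
      String[] entry = entries.get(sourceId + "/" + recordId);
      if (entry == null) {
        return State.NEW;
      }
      if (!entry[0].equals(hashCode)) {
        return State.CHANGED;
      }
      entry[1] = jobRunId; // up to date: the visit is recorded as a side effect
      return State.UPTODATE;
    }

    void markAsUpdated(String sourceId, String recordId, String compoundRecordId,
        String jobRunId, String hashCode) {
      entries.put(sourceId + "/" + recordId, new String[] {hashCode, jobRunId, compoundRecordId});
    }

    void markCompoundElementsVisited(String sourceId, String compoundRecordId, String jobRunId) {
      for (Map.Entry<String, String[]> entry : entries.entrySet()) {
        boolean sameSource = entry.getKey().startsWith(sourceId + "/");
        if (sameSource && compoundRecordId.equals(entry.getValue()[2])) {
          entry.getValue()[1] = jobRunId;
        }
      }
    }

    /** Helper for the example only: job run id of the last visit, or null if unknown. */
    String lastVisit(String sourceId, String recordId) {
      String[] entry = entries.get(sourceId + "/" + recordId);
      return entry == null ? null : entry[1];
    }
  }

  /** Returns the ids of the compound elements (elementId -> hashCode) that were submitted. */
  static List<String> processCompound(InMemoryDelta delta, String sourceId, String jobRunId,
      String compoundId, String compoundHash, Map<String, String> elements) {
    List<String> submitted = new ArrayList<>();
    State state = delta.checkState(sourceId, compoundId, null, jobRunId, compoundHash);
    if (state == State.UPTODATE) {
      // Compound unchanged: skip extraction and carry its elements over to this job run.
      delta.markCompoundElementsVisited(sourceId, compoundId, jobRunId);
      return submitted;
    }
    delta.markAsUpdated(sourceId, compoundId, null, jobRunId, compoundHash);
    for (Map.Entry<String, String> element : elements.entrySet()) {
      State elementState =
          delta.checkState(sourceId, element.getKey(), compoundId, jobRunId, element.getValue());
      if (elementState != State.UPTODATE) {
        submitted.add(element.getKey()); // stand-in for sending the element to the job
        delta.markAsUpdated(sourceId, element.getKey(), compoundId, jobRunId, element.getValue());
      }
    }
    return submitted;
  }

  public static void main(String[] args) {
    InMemoryDelta delta = new InMemoryDelta();
    Map<String, String> elements = new LinkedHashMap<>();
    elements.put("a.txt", "1");
    elements.put("b.txt", "1");
    System.out.println(processCompound(delta, "files", "run-1", "docs.zip", "h1", elements));
    System.out.println(processCompound(delta, "files", "run-2", "docs.zip", "h1", elements));
    System.out.println(delta.lastVisit("files", "a.txt")); // carried over to run-2
  }
}
```

Without the call to `markCompoundElementsVisited`, the elements of an unchanged compound would keep the job run id of an earlier run and would later be reported as unvisited.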
void markAsUpdated(java.lang.String sourceId, java.lang.String recordId, java.lang.String jobRunId, java.lang.String hashCode) throws DeltaException
Mark the record as visited in the current crawl job run.
Parameters:
sourceId - the name of the data source that contains the record.
recordId - the record id.
jobRunId - the id of the job run in which the crawler is currently running.
hashCode - a string that reflects changes in the record content. This can be as simple as a version identifier, if one is available in the record metadata, or a hash calculated on the actual content of the record.
Throws:
DeltaException
void markAsUpdated(java.lang.String sourceId, java.lang.String recordId, java.lang.String compoundRecordId, java.lang.String jobRunId, java.lang.String hashCode) throws DeltaException
Mark the record that was extracted from a compound as visited in the current crawl job run.
Parameters:
sourceId - the name of the data source that contains the record.
recordId - the record id.
compoundRecordId - the record id of the compound this record was extracted from. May be null.
jobRunId - the id of the job run in which the crawler is currently running.
hashCode - a string that reflects changes in the record content. This can be as simple as a version identifier, if one is available in the record metadata, or a hash calculated on the actual content of the record.
Throws:
DeltaException
void clearSource(java.lang.String sourceId) throws DeltaException
Delete all state information in the service about the given data source.
Parameters:
sourceId - data source name.
Throws:
DeltaException
void clearAll() throws DeltaException
Delete all state information in the service about all data sources.
Throws:
DeltaException
java.util.Collection<java.lang.String> getSourceIds() throws DeltaException
Get Ids of all sources that currently have entries in the DeltaService.
Throws:
DeltaException
long countEntries(java.lang.String sourceId, boolean countExact) throws DeltaException
Count the entries stored for the given data source.
Parameters:
sourceId - the name of the data source to examine.
countExact - set to true to get an exact result, which may take some time. Otherwise the service may return only an estimated value.
Throws:
DeltaException
java.util.Collection<java.lang.String> getShardPrefixes(java.lang.String sourceId) throws DeltaException
Get possible input values for getUnvisitedEntries(String, String). This makes it possible to parallelize and distribute the check for records to delete.
Parameters:
sourceId - the name of the data source to examine.
Throws:
DeltaException
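java.util.Collection<DeltaService.EntryId> getUnvisitedEntries(java.lang.String sourceAndShardPrefix, java.lang.String jobRunId) throws DeltaException
Get the record IDs in the given data source and shard that have not been visited in the given job run and therefore must be sent as deleted records to the target job. To cover a complete data source, call getShardPrefixes(String) and call this method with each of the shard-prefix values.
Parameters:
sourceAndShardPrefix - one of the values returned by getShardPrefixes(String).
Throws:
DeltaException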
void deleteEntry(java.lang.String sourceId, DeltaService.EntryId entryId) throws DeltaException
Remove an entry, e.g. after it has been deleted.
Parameters:
sourceId - data source Id.
entryId - ID of the entry, e.g. as returned by getUnvisitedEntries(String, String).
Throws:
DeltaException
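The delete pass described by getShardPrefixes(String), getUnvisitedEntries(String, String) and deleteEntry can be sketched as follows. All types here are hypothetical stand-ins. In particular, the `EntryId` class only carries a record id because the members of the real `DeltaService.EntryId` are not shown on this page. The shard-prefix format (`sourceId` plus the first character of a non-empty record id) is invented for the example, and `DeltaException` is omitted.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Sketch of the delete pass after a crawl; every type here is a hypothetical stand-in. */
class DeltaDeleteSketch {

  /** Stand-in for DeltaService.EntryId, whose members are not shown on this page. */
  static final class EntryId {
    final String recordId;

    EntryId(String recordId) {
      this.recordId = recordId;
    }
  }

  /** Reduced stand-in for the three DeltaService methods used below. */
  interface MiniDeltaService {
    Collection<String> getShardPrefixes(String sourceId);
    Collection<EntryId> getUnvisitedEntries(String sourceAndShardPrefix, String jobRunId);
    void deleteEntry(String sourceId, EntryId entryId);
  }

  /** Hypothetical store; shards are formed from the first character of the record id. */
  static class InMemoryDelta implements MiniDeltaService {
    // sourceId -> (recordId -> jobRunId of the last visit)
    final Map<String, Map<String, String>> visits = new TreeMap<>();

    void visit(String sourceId, String recordId, String jobRunId) {
      visits.computeIfAbsent(sourceId, key -> new TreeMap<>()).put(recordId, jobRunId);
    }

    private Map<String, String> recordsOf(String sourceId) {
      return visits.getOrDefault(sourceId, Collections.<String, String>emptyMap());
    }

    @Override
    public Collection<String> getShardPrefixes(String sourceId) {
      Collection<String> prefixes = new TreeSet<>();
      for (String recordId : recordsOf(sourceId).keySet()) {
        prefixes.add(sourceId + "/" + recordId.charAt(0)); // invented prefix format
      }
      return prefixes;
    }

    @Override
    public Collection<EntryId> getUnvisitedEntries(String sourceAndShardPrefix, String jobRunId) {
      int cut = sourceAndShardPrefix.lastIndexOf('/');
      String sourceId = sourceAndShardPrefix.substring(0, cut);
      String shard = sourceAndShardPrefix.substring(cut + 1);
      List<EntryId> unvisited = new ArrayList<>();
      for (Map.Entry<String, String> entry : recordsOf(sourceId).entrySet()) {
        if (entry.getKey().startsWith(shard) && !jobRunId.equals(entry.getValue())) {
          unvisited.add(new EntryId(entry.getKey()));
        }
      }
      return unvisited;
    }

    @Override
    public void deleteEntry(String sourceId, EntryId entryId) {
      Map<String, String> records = visits.get(sourceId);
      if (records != null) {
        records.remove(entryId.recordId);
      }
    }
  }

  /** Walks all shards of a source and returns the record ids reported as deleted. */
  static List<String> deleteUnvisited(MiniDeltaService delta, String sourceId, String jobRunId) {
    List<String> deleted = new ArrayList<>();
    for (String prefix : delta.getShardPrefixes(sourceId)) {
      for (EntryId entryId : delta.getUnvisitedEntries(prefix, jobRunId)) {
        deleted.add(entryId.recordId); // stand-in for sending a delete record to the target job
        delta.deleteEntry(sourceId, entryId);
      }
    }
    return deleted;
  }

  public static void main(String[] args) {
    InMemoryDelta delta = new InMemoryDelta();
    delta.visit("files", "a1", "run-1");
    delta.visit("files", "a2", "run-1");
    delta.visit("files", "b1", "run-1");
    delta.visit("files", "a1", "run-2"); // a2 is not seen again in run-2
    delta.visit("files", "b1", "run-2");
    System.out.println(deleteUnvisited(delta, "files", "run-2"));
  }
}
```

Because each shard prefix is handled independently, the outer loop is the part that could be distributed across workers, which is the parallelization that getShardPrefixes(String) is described as enabling.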