SMILA (incubation) API documentation

org.eclipse.smila.connectivity.framework.crawler.web
Class WebCrawler

java.lang.Object
  extended by org.eclipse.smila.connectivity.framework.AbstractCrawler
      extended by org.eclipse.smila.connectivity.framework.crawler.web.WebCrawler
All Implemented Interfaces:
Crawler, CrawlerCallback

public class WebCrawler
extends AbstractCrawler

A Crawler implementation for web data sources, built on AbstractCrawler.


Field Summary
static java.lang.String POC_AVEREGE_TIME_TO_FETCH
          The Constant POC_AVEREGE_TIME_TO_FETCH.
static java.lang.String POC_BYTES
          The Constant POC_BYTES.
static java.lang.String POC_PAGES
          The Constant POC_PAGES.
static java.lang.String POC_PRODUCER_EXCEPTIONS
          The Constant POC_PRODUCER_EXCEPTIONS.
 
Constructor Summary
WebCrawler()
          Instantiates a new web crawler.
 
Method Summary
 void close()
          Ends crawl, allowing the Crawler implementation to close any open resources.
 void dispose(Id id)
          Disposes the record with the given Id.
 byte[] getAttachment(Id id, java.lang.String name)
          Returns the attachment for the given Id and name pair.
 java.lang.String[] getAttachmentNames(Id id)
          Returns a String[] containing the names of the available attachments for the given id.
 MObject getMObject(Id id)
          Returns the MObject for the given id.
 DataReference[] getNext()
          Returns an array of DataReference objects.
 void initialize(DataSourceConnectionConfig config)
          Initializes the crawler with the given DataSourceConnectionConfig.
 void setParserManager(ParserManager parserManager)
          To be used by Declarative Services.
 void unsetParserManager(ParserManager parserManager)
          To be used by Declarative Services.
 
Methods inherited from class org.eclipse.smila.connectivity.framework.AbstractCrawler
activate, getCrawlerId
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

POC_BYTES

public static final java.lang.String POC_BYTES
The Constant POC_BYTES.

See Also:
Constant Field Values

POC_PAGES

public static final java.lang.String POC_PAGES
The Constant POC_PAGES.

See Also:
Constant Field Values

POC_PRODUCER_EXCEPTIONS

public static final java.lang.String POC_PRODUCER_EXCEPTIONS
The Constant POC_PRODUCER_EXCEPTIONS.

See Also:
Constant Field Values

POC_AVEREGE_TIME_TO_FETCH

public static final java.lang.String POC_AVEREGE_TIME_TO_FETCH
The Constant POC_AVEREGE_TIME_TO_FETCH.

See Also:
Constant Field Values

Constructor Detail

WebCrawler

public WebCrawler()
Instantiates a new web crawler.

Method Detail

initialize

public void initialize(DataSourceConnectionConfig config)
                throws CrawlerException,
                       CrawlerCriticalException
Initializes the crawler with the given DataSourceConnectionConfig.

Parameters:
config - the DataSourceConnectionConfig
Throws:
CrawlerException - if any non-critical error occurs
CrawlerCriticalException - if any critical error occurs

getNext

public DataReference[] getNext()
                        throws CrawlerException,
                               CrawlerCriticalException
Returns an array of DataReference objects. The size of the returned array may vary from call to call. The maximum size of the array is determined by configuration or by the implementation class.

Returns:
an array of DataReference objects, or null if no more DataReference objects exist
Throws:
CrawlerException - if any error occurs
CrawlerCriticalException - if any critical error occurs
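
The contract above (batches of varying size, then null once nothing is left) suggests a simple polling loop. The following is a minimal sketch only: BatchSource and demoSource are hypothetical stand-ins, not SMILA types, and they omit the checked CrawlerException and CrawlerCriticalException that the real getNext() and close() declare.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class CrawlLoopSketch {

  /**
   * Hypothetical stand-in for the getNext()/close() part of the crawler contract.
   * The real methods declare checked exceptions, which a caller would have to handle.
   */
  interface BatchSource<T> {
    T[] getNext(); // returns null when no more items exist

    void close();
  }

  /** Polls the source batch by batch until getNext() returns null, then closes it. */
  static <T> List<T> drain(BatchSource<T> source) {
    List<T> all = new ArrayList<>();
    try {
      T[] batch;
      while ((batch = source.getNext()) != null) {
        all.addAll(Arrays.asList(batch));
      }
    } finally {
      // Release resources even if a batch fails part-way through.
      source.close();
    }
    return all;
  }

  /** In-memory source used only to exercise the loop; batch sizes differ on purpose. */
  static BatchSource<String> demoSource() {
    final Iterator<String[]> batches = Arrays.asList(
        new String[] {"http://example.org/a", "http://example.org/b"},
        new String[] {"http://example.org/c"}).iterator();
    return new BatchSource<String>() {
      public String[] getNext() {
        return batches.hasNext() ? batches.next() : null;
      }

      public void close() {
        // Nothing to release in this in-memory stand-in.
      }
    };
  }

  public static void main(String[] args) {
    System.out.println(drain(demoSource()));
  }
}
```

Placing close() in a finally block is a design choice of this sketch, made so that resources are released even when a call to getNext() fails.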

close

public void close()
           throws CrawlerException
Ends crawl, allowing the Crawler implementation to close any open resources.

Throws:
CrawlerException - if any error occurs

getMObject

public MObject getMObject(Id id)
                   throws CrawlerException,
                          CrawlerCriticalException
Returns the MObject for the given id.

Parameters:
id - the record id
Returns:
the MObject
Throws:
CrawlerException - if any non-critical error occurs
CrawlerCriticalException - if any critical error occurs

getAttachment

public byte[] getAttachment(Id id,
                            java.lang.String name)
                     throws CrawlerException,
                            CrawlerCriticalException
Returns the attachment for the given Id and name pair.

Parameters:
id - the record id
name - the name of the attachment
Returns:
a byte[] containing the attachment
Throws:
CrawlerException - if any non-critical error occurs
CrawlerCriticalException - if any critical error occurs

getAttachmentNames

public java.lang.String[] getAttachmentNames(Id id)
                                      throws CrawlerException,
                                             CrawlerCriticalException
Returns a String[] containing the names of the available attachments for the given id.

Parameters:
id - the record id
Returns:
a String[] containing the names of the available attachments
Throws:
CrawlerException - if any non-critical error occurs
CrawlerCriticalException - if any critical error occurs
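
getAttachmentNames and getAttachment are naturally used together: list the names for a record, then fetch each attachment by name. The sketch below illustrates that pairing under stated assumptions. AttachmentSource is a hypothetical stand-in that uses a String in place of Id and omits the checked exceptions, and the attachment name "Content" is invented for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class AttachmentSketch {

  /** Hypothetical stand-in for the two attachment accessors; a String replaces Id here. */
  interface AttachmentSource {
    String[] getAttachmentNames(String id);

    byte[] getAttachment(String id, String name);
  }

  /** Fetches every attachment of one record and reports its size in bytes, keyed by name. */
  static Map<String, Integer> attachmentSizes(AttachmentSource source, String id) {
    Map<String, Integer> sizes = new LinkedHashMap<>();
    for (String name : source.getAttachmentNames(id)) {
      sizes.put(name, source.getAttachment(id, name).length);
    }
    return sizes;
  }

  /** In-memory source with a single, made-up attachment named "Content". */
  static AttachmentSource demoSource() {
    return new AttachmentSource() {
      public String[] getAttachmentNames(String id) {
        return new String[] {"Content"};
      }

      public byte[] getAttachment(String id, String name) {
        return "<html></html>".getBytes(StandardCharsets.UTF_8);
      }
    };
  }

  public static void main(String[] args) {
    System.out.println(attachmentSizes(demoSource(), "record-1"));
  }
}
```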

dispose

public void dispose(Id id)
Disposes the record with the given Id.

Parameters:
id - the record id

setParserManager

public void setParserManager(ParserManager parserManager)
To be used by Declarative Services. Sets the ParserManager service.

Parameters:
parserManager - the ParserManager service

unsetParserManager

public void unsetParserManager(ParserManager parserManager)
To be used by Declarative Services. Removes the ParserManager service.

Parameters:
parserManager - the ParserManager service
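
A set/unset pair like this is the usual shape of a service reference that Declarative Services binds and unbinds at runtime. The following sketch shows one common way to write such a pair; it is not taken from the WebCrawler source. ParserService, setParser, unsetParser, and isBound are hypothetical names, and the identity check in the unset method is an assumption about good practice rather than documented behavior.

```java
public class ServiceBindingSketch {

  /** Hypothetical stand-in for the injected service type. */
  interface ParserService {
  }

  // volatile because the service runtime may bind and unbind from a different thread.
  private volatile ParserService parser;

  /** Bind method: called when the service becomes available. */
  void setParser(ParserService service) {
    this.parser = service;
  }

  /** Unbind method: clears the reference only if it is the instance being removed. */
  void unsetParser(ParserService service) {
    if (this.parser == service) {
      this.parser = null;
    }
  }

  boolean isBound() {
    return parser != null;
  }

  public static void main(String[] args) {
    ServiceBindingSketch component = new ServiceBindingSketch();
    ParserService service = new ParserService() { };
    component.setParser(service);
    System.out.println(component.isBound());
    component.unsetParser(service);
    System.out.println(component.isBound());
  }
}
```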
