SMILA (incubation) API documentation
java.lang.Object
  org.eclipse.smila.connectivity.framework.crawler.web.IndexDocument
public class IndexDocument
This class contains all data from a web page that is relevant for indexing.
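
As a rough illustration of how the documented constructor and getters fit together, the following sketch builds one document and reads it back. The `IndexDocument` class in this block is a hypothetical stand-in that mirrors only the documented signatures, so the example compiles without SMILA on the classpath. The `"Name: value"` string layout of the list entries, the sample values, and the simple concatenation used for the merged list are all assumptions; this page does not specify them.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in mirroring the documented constructor and some getters.
// It is NOT the SMILA implementation.
class IndexDocument {
  private final String url;
  private final String title;
  private final byte[] content;
  private final List<String> responseHeaders;
  private final List<String> htmlMetaData;
  private final List<String> metaDataWithResponseHeaderFallBack;

  IndexDocument(String url, String title, byte[] content,
      List<String> responseHeaders, List<String> htmlMetaData,
      List<String> metaDataWithResponseHeaderFallBack) {
    this.url = url;
    this.title = title;
    this.content = content;
    this.responseHeaders = responseHeaders;
    this.htmlMetaData = htmlMetaData;
    this.metaDataWithResponseHeaderFallBack = metaDataWithResponseHeaderFallBack;
  }

  String getUrl() { return url; }
  String getTitle() { return title; }
  byte[] getContent() { return content; }
  List<String> getResponseHeaders() { return responseHeaders; }
  List<String> getHtmlMetaData() { return htmlMetaData; }
  List<String> getMetaDataWithResponseHeaderFallBack() {
    return metaDataWithResponseHeaderFallBack;
  }
}

class IndexDocumentUsage {
  // Builds a sample document; all values are made up for illustration.
  static IndexDocument sample() {
    // Assumption: each list entry is a "Name: value" string.
    List<String> headers = Arrays.asList("Content-Type: text/html; charset=UTF-8");
    List<String> meta = Arrays.asList("author: Jane Doe");
    // Assumption: the merged list is the meta data followed by the headers.
    List<String> merged = new ArrayList<>(meta);
    merged.addAll(headers);
    byte[] content = "<html><body>Hello</body></html>".getBytes(StandardCharsets.UTF_8);
    return new IndexDocument("http://example.org/index.html", "Example page",
        content, headers, meta, merged);
  }

  public static void main(String[] args) {
    IndexDocument doc = sample();
    System.out.println(doc.getUrl() + " (" + doc.getContent().length + " bytes)");
  }
}
```

Note that the content is passed as a `byte[]` rather than a `String`, so a caller has to pick a character encoding when starting from text.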
| Constructor Summary | |
|---|---|
IndexDocument(java.lang.String url,
java.lang.String title,
byte[] content,
java.util.List<java.lang.String> responseHeaders,
java.util.List<java.lang.String> htmlMetaData,
java.util.List<java.lang.String> metaDataWithResponseHeaderFallBack)
Constructor. |
|
| Method Summary | |
|---|---|
java.lang.String |
extractFromResponseHeaders(java.util.regex.Pattern pattern,
int group)
Extracts a value from the response headers: the given group of the regular expression match. |
byte[] |
getContent()
Returns content of the downloaded document. |
java.util.List<java.lang.String> |
getHtmlMetaData()
Returns the list of HTML meta data extracted from HTML meta tags. |
java.util.List<java.lang.String> |
getMetaDataWithResponseHeaderFallBack()
Returns combination of response headers and HTML meta data. |
java.util.List<java.lang.String> |
getResponseHeaders()
Returns response headers. |
java.lang.String |
getTitle()
Returns title of the web page. |
java.lang.String |
getUrl()
Returns URL of the page. |
void |
setContent(byte[] content)
Assigns text content of the web page to the index document. |
void |
setHtmlMetaData(java.util.List<java.lang.String> metaData)
Assigns HTML meta data to the index document. |
void |
setMetaDataWithResponseHeaderFallBack(java.util.List<java.lang.String> metaDataWithResponseHeaderFallBack)
Assigns combination of response headers and HTML meta data to the index document. |
void |
setResponseHeaders(java.util.List<java.lang.String> headers)
Assigns response headers to the index document. |
void |
setTitle(java.lang.String title)
Assigns title of the page. |
void |
setUrl(java.lang.String url)
Assigns URL of the page. |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public IndexDocument(java.lang.String url,
java.lang.String title,
byte[] content,
java.util.List<java.lang.String> responseHeaders,
java.util.List<java.lang.String> htmlMetaData,
java.util.List<java.lang.String> metaDataWithResponseHeaderFallBack)
Parameters:
url - URL of the web page
title - title of the web page
content - extracted content
responseHeaders - list of response headers
htmlMetaData - list of extracted HTML meta data
metaDataWithResponseHeaderFallBack - responseHeaders and htmlMetaData merged together

| Method Detail |
|---|
public byte[] getContent()

public void setContent(byte[] content)
Parameters:
content - content of the web page as a byte array

public java.lang.String getTitle()

public void setTitle(java.lang.String title)
Parameters:
title - String

public java.lang.String getUrl()

public void setUrl(java.lang.String url)
Parameters:
url - String

public java.util.List<java.lang.String> getHtmlMetaData()

public void setHtmlMetaData(java.util.List<java.lang.String> metaData)
Parameters:
metaData - List

public java.util.List<java.lang.String> getResponseHeaders()

public void setResponseHeaders(java.util.List<java.lang.String> headers)
Parameters:
headers - List

public java.util.List<java.lang.String> getMetaDataWithResponseHeaderFallBack()

public void setMetaDataWithResponseHeaderFallBack(java.util.List<java.lang.String> metaDataWithResponseHeaderFallBack)
Parameters:
metaDataWithResponseHeaderFallBack - List

public java.lang.String extractFromResponseHeaders(java.util.regex.Pattern pattern,
int group)
Parameters:
pattern - a regular expression
group - index of group in regular expression to return
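
This page does not describe how `extractFromResponseHeaders` searches the headers, so the following is only a sketch of one plausible reading of the `pattern` and `group` parameters. It assumes that each header is a `"Name: value"` string, that the first header containing a match wins, and that `null` is returned when nothing matches. The class `HeaderExtractionSketch` and the method `extractFromHeaders` are hypothetical names, not part of SMILA.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HeaderExtractionSketch {
  // Assumed behavior: scan the headers in order and return the requested
  // capture group of the first header the pattern matches, or null if none match.
  static String extractFromHeaders(List<String> headers, Pattern pattern, int group) {
    for (String header : headers) {
      Matcher matcher = pattern.matcher(header);
      if (matcher.find()) {
        // Throws IndexOutOfBoundsException if the pattern has no such group.
        return matcher.group(group);
      }
    }
    return null;
  }

  public static void main(String[] args) {
    // Assumption: headers are stored as "Name: value" strings.
    List<String> headers = Arrays.asList(
        "Content-Type: text/html; charset=UTF-8",
        "Content-Length: 1024");
    Pattern charset = Pattern.compile("charset=([\\w-]+)", Pattern.CASE_INSENSITIVE);
    System.out.println(extractFromHeaders(headers, charset, 1)); // prints UTF-8
  }
}
```

Group `0` would return the whole match (`charset=UTF-8` in this example), while group `1` returns only the first parenthesized part.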