SMILA (incubation) API documentation
java.lang.Object
  org.eclipse.smila.connectivity.framework.crawler.web.IndexDocument
public class IndexDocument
Holds all data from the web page that is relevant for indexing.
Constructor Summary | |
---|---|
IndexDocument(java.lang.String url, java.lang.String title, byte[] content, java.util.List<java.lang.String> responseHeaders, java.util.List<java.lang.String> htmlMetaData, java.util.List<java.lang.String> metaDataWithResponseHeaderFallBack) | Constructor. |
Method Summary | | |
---|---|---|
java.lang.String | extractFromResponseHeaders(java.util.regex.Pattern pattern, int group) | Extracts the given capture group of a regular expression from the response headers. |
byte[] | getContent() | Returns content of the downloaded document. |
java.util.List<java.lang.String> | getHtmlMetaData() | Returns the list of HTML meta data extracted from HTML meta tags. |
java.util.List<java.lang.String> | getMetaDataWithResponseHeaderFallBack() | Returns combination of response headers and HTML meta data. |
java.util.List<java.lang.String> | getResponseHeaders() | Returns response headers. |
java.lang.String | getTitle() | Returns title of the web page. |
java.lang.String | getUrl() | Returns URL of the page. |
void | setContent(byte[] content) | Assigns text content of the web page to the index document. |
void | setHtmlMetaData(java.util.List<java.lang.String> metaData) | Assigns HTML meta data to the index document. |
void | setMetaDataWithResponseHeaderFallBack(java.util.List<java.lang.String> metaDataWithResponseHeaderFallBack) | Assigns combination of response headers and HTML meta data to the index document. |
void | setResponseHeaders(java.util.List<java.lang.String> headers) | Assigns response headers to the index document. |
void | setTitle(java.lang.String title) | Assigns title of the page. |
void | setUrl(java.lang.String url) | Assigns URL of the page. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
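
The summary above describes the metaDataWithResponseHeaderFallBack list only as a combination of response headers and HTML meta data. The following sketch shows one way such a merge with fallback could work. It is not SMILA code: the class `MetaDataFallbackSketch`, its methods, the "name: value" entry format, and the rule that HTML meta data takes precedence over response headers are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper, not part of SMILA. */
final class MetaDataFallbackSketch {

    /** Lower-cased text before the first ':' (assumed "name: value" format), or the whole entry. */
    static String nameOf(String entry) {
        int colon = entry.indexOf(':');
        String name = colon >= 0 ? entry.substring(0, colon) : entry;
        return name.trim().toLowerCase(Locale.ROOT);
    }

    /**
     * Keeps every HTML meta entry and adds a response header only when no
     * HTML meta entry with the same name exists (assumed precedence rule).
     */
    static List<String> mergeWithFallback(List<String> htmlMetaData, List<String> responseHeaders) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (String entry : htmlMetaData) {
            merged.putIfAbsent(nameOf(entry), entry);
        }
        for (String entry : responseHeaders) {
            merged.putIfAbsent(nameOf(entry), entry);
        }
        return new ArrayList<>(merged.values());
    }

    public static void main(String[] args) {
        List<String> htmlMetaData = List.of("content-type: text/html; charset=utf-8", "author: Jane");
        List<String> responseHeaders = List.of("Content-Type: text/html", "Server: example");
        System.out.println(mergeWithFallback(htmlMetaData, responseHeaders));
    }
}
```

A list produced this way could then be passed as the last constructor argument or to setMetaDataWithResponseHeaderFallBack; whether the crawler builds the list in this manner is not stated in this page.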
Constructor Detail |
---|
public IndexDocument(java.lang.String url, java.lang.String title, byte[] content, java.util.List<java.lang.String> responseHeaders, java.util.List<java.lang.String> htmlMetaData, java.util.List<java.lang.String> metaDataWithResponseHeaderFallBack)
Parameters:
url - URL of the web page
title - title of the web page
content - extracted content
responseHeaders - list of response headers
htmlMetaData - list of extracted HTML meta data
metaDataWithResponseHeaderFallBack - responseHeaders and htmlMetaData merged together

Method Detail |
---|
public byte[] getContent()

public void setContent(byte[] content)
Parameters:
content - byte[]

public java.lang.String getTitle()

public void setTitle(java.lang.String title)
Parameters:
title - String

public java.lang.String getUrl()

public void setUrl(java.lang.String url)
Parameters:
url - String

public java.util.List<java.lang.String> getHtmlMetaData()

public void setHtmlMetaData(java.util.List<java.lang.String> metaData)
Parameters:
metaData - List

public java.util.List<java.lang.String> getResponseHeaders()

public void setResponseHeaders(java.util.List<java.lang.String> headers)
Parameters:
headers - List

public java.util.List<java.lang.String> getMetaDataWithResponseHeaderFallBack()

public void setMetaDataWithResponseHeaderFallBack(java.util.List<java.lang.String> metaDataWithResponseHeaderFallBack)
Parameters:
metaDataWithResponseHeaderFallBack - List

public java.lang.String extractFromResponseHeaders(java.util.regex.Pattern pattern, int group)
Parameters:
pattern - a regular expression
group - index of group in regular expression to return
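
This page does not say how extractFromResponseHeaders applies the pattern to the header list. The sketch below illustrates one plausible reading using only java.util.regex: each header line is searched in order and the requested capture group of the first hit is returned. The class `HeaderExtractSketch`, its methods, the "first match wins" rule, the null result when nothing matches, and the UTF-8 default are assumptions for illustration, not SMILA's implementation.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in, not part of SMILA. */
final class HeaderExtractSketch {

    /**
     * Returns the requested capture group from the first header line in which
     * the pattern is found, or null if no line matches (assumed behavior).
     */
    static String extractFromHeaders(List<String> headers, Pattern pattern, int group) {
        for (String header : headers) {
            Matcher matcher = pattern.matcher(header);
            if (matcher.find()) {
                return matcher.group(group);
            }
        }
        return null;
    }

    /** Decodes byte content with the charset named in the headers, defaulting to UTF-8. */
    static String decode(byte[] content, List<String> headers) {
        Pattern charsetPattern =
            Pattern.compile("charset=([A-Za-z0-9][\\w.:+-]*)", Pattern.CASE_INSENSITIVE);
        String name = extractFromHeaders(headers, charsetPattern, 1);
        Charset charset = (name != null && Charset.isSupported(name))
            ? Charset.forName(name)
            : StandardCharsets.UTF_8;
        return new String(content, charset);
    }

    public static void main(String[] args) {
        List<String> headers =
            List.of("Content-Type: text/html; charset=ISO-8859-1", "Server: example");
        byte[] content = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(decode(content, headers));
    }
}
```

With a real IndexDocument, the same pattern and group index would be passed to extractFromResponseHeaders, and the bytes from getContent() decoded with the resulting charset name.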