SMILA (incubation) API documentation

org.eclipse.smila.connectivity.framework.crawler.web.parse
Class ParseData

java.lang.Object
  extended by org.eclipse.smila.connectivity.framework.crawler.web.configuration.Configured
      extended by org.eclipse.smila.connectivity.framework.crawler.web.parse.ParseData
All Implemented Interfaces:
Configurable

public final class ParseData
extends Configured

Data extracted from a page's content.


Field Summary
 
Fields inherited from class org.eclipse.smila.connectivity.framework.crawler.web.configuration.Configured
_configuration
 
Constructor Summary
ParseData()
          Empty constructor.
ParseData(ParseStatus status, java.lang.String title, Outlink[] outlinks, Metadata contentMeta)
          Creates a new object with empty HTML meta tags.
ParseData(ParseStatus status, java.lang.String title, Outlink[] outlinks, Metadata contentMeta, HTMLMetaTags htmlMetaTags)
          Creates a new object with empty parse meta data.
ParseData(ParseStatus status, java.lang.String title, Outlink[] outlinks, Metadata contentMeta, Metadata parseMeta, HTMLMetaTags htmlMetaTags)
          Creates new ParseData object with given configuration.
 
Method Summary
 boolean equals(java.lang.Object o)
          
 Metadata getContentMeta()
          The original Meta data retrieved from content.
 HTMLMetaTags getHtmlMetaTags()
          Returns HTML meta tags information.
 java.lang.String getMeta(java.lang.String name)
          Gets a single meta data value.
 Outlink[] getOutlinks()
          The outlinks of the page.
 Metadata getParseMeta()
          Other content properties.
 ParseStatus getStatus()
          The status of parsing the page.
 java.lang.String getTitle()
          The title of the page.
 int hashCode()
          
 void setHtmlMetaTags(HTMLMetaTags htmlMetaTags)
          Assigns HTML meta tags information.
 void setParseMeta(Metadata parseMeta)
          Assigns parse meta data.
 java.lang.String toString()
          
 
Methods inherited from class org.eclipse.smila.connectivity.framework.crawler.web.configuration.Configured
getConf, setConf
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

ParseData

public ParseData()
Empty constructor.


ParseData

public ParseData(ParseStatus status,
                 java.lang.String title,
                 Outlink[] outlinks,
                 Metadata contentMeta)
Creates a new object with empty HTML meta tags.

Parameters:
status - ParseStatus
title - String title of the page
outlinks - Array of Outlink objects
contentMeta - Meta data extracted from content

ParseData

public ParseData(ParseStatus status,
                 java.lang.String title,
                 Outlink[] outlinks,
                 Metadata contentMeta,
                 HTMLMetaTags htmlMetaTags)
Creates a new object with empty parse meta data.

Parameters:
status - ParseStatus
title - String title of the page
outlinks - Array of Outlink objects
contentMeta - Meta data extracted from content
htmlMetaTags - Meta data extracted from HTML tags

ParseData

public ParseData(ParseStatus status,
                 java.lang.String title,
                 Outlink[] outlinks,
                 Metadata contentMeta,
                 Metadata parseMeta,
                 HTMLMetaTags htmlMetaTags)
Creates new ParseData object with given configuration.

Parameters:
status - ParseStatus
title - String title of the page
outlinks - Array of Outlink objects
contentMeta - Meta data extracted from content
parseMeta - Parser-specific meta data
htmlMetaTags - Meta data extracted from HTML tags
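
The three non-empty constructors differ only in which optional parts they leave empty. One common way to arrange such overloads is constructor chaining, sketched below. This is an illustration only, not the SMILA source: ParseDataSketch is a hypothetical class, plain String, Map and List types stand in for ParseStatus, Outlink, Metadata and HTMLMetaTags, and the delegation order is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for ParseData; the real class uses ParseStatus,
// Outlink[], Metadata and HTMLMetaTags instead of the simple types below.
final class ParseDataSketch {
    private final String status;                   // stand-in for ParseStatus
    private final String title;
    private final String[] outlinks;               // stand-in for Outlink[]
    private final Map<String, String> contentMeta; // stand-in for Metadata
    private Map<String, String> parseMeta;         // stand-in for Metadata
    private List<String> htmlMetaTags;             // stand-in for HTMLMetaTags

    // Four arguments: HTML meta tags start out empty (assumed delegation).
    ParseDataSketch(String status, String title, String[] outlinks,
                    Map<String, String> contentMeta) {
        this(status, title, outlinks, contentMeta, new ArrayList<String>());
    }

    // Five arguments: parse meta data starts out empty (assumed delegation).
    ParseDataSketch(String status, String title, String[] outlinks,
                    Map<String, String> contentMeta, List<String> htmlMetaTags) {
        this(status, title, outlinks, contentMeta,
             new HashMap<String, String>(), htmlMetaTags);
    }

    // Six arguments: every part is supplied by the caller.
    ParseDataSketch(String status, String title, String[] outlinks,
                    Map<String, String> contentMeta,
                    Map<String, String> parseMeta, List<String> htmlMetaTags) {
        this.status = status;
        this.title = title;
        this.outlinks = outlinks;
        this.contentMeta = contentMeta;
        this.parseMeta = parseMeta;
        this.htmlMetaTags = htmlMetaTags;
    }

    String getStatus() { return status; }
    String getTitle() { return title; }
    String[] getOutlinks() { return outlinks; }
    Map<String, String> getContentMeta() { return contentMeta; }
    Map<String, String> getParseMeta() { return parseMeta; }
    List<String> getHtmlMetaTags() { return htmlMetaTags; }

    // The setters mirror setParseMeta and setHtmlMetaTags, which let a caller
    // fill in the parts that a shorter constructor left empty.
    void setParseMeta(Map<String, String> parseMeta) { this.parseMeta = parseMeta; }
    void setHtmlMetaTags(List<String> htmlMetaTags) { this.htmlMetaTags = htmlMetaTags; }
}
```

In this sketch, an object built with the four-argument constructor has empty parse meta data and empty HTML meta tags until the corresponding setters are called.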
Method Detail

getStatus

public ParseStatus getStatus()
The status of parsing the page.

Returns:
ParseStatus

getTitle

public java.lang.String getTitle()
The title of the page.

Returns:
String

getOutlinks

public Outlink[] getOutlinks()
The outlinks of the page.

Returns:
Array of Outlink objects

getContentMeta

public Metadata getContentMeta()
The original Meta data retrieved from content.

Returns:
Meta data

getParseMeta

public Metadata getParseMeta()
Other content properties.

Returns:
Meta data

setParseMeta

public void setParseMeta(Metadata parseMeta)
Assigns parse meta data.

Parameters:
parseMeta - parser specific content properties.

getMeta

public java.lang.String getMeta(java.lang.String name)
Gets a single meta data value. This method first looks for the value in the parse meta data. If no value is found, it then looks in the content meta data.

Parameters:
name - Name of meta data element
Returns:
String Meta data value
See Also:
getContentMeta(), getParseMeta()
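
The lookup order described for getMeta can be sketched as a two-step fallback. The class below is a hypothetical illustration, not the SMILA implementation: MetaLookupSketch and its Map fields are stand-ins for ParseData and its Metadata objects, and returning null when neither source holds the name is an assumption that this page does not confirm.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in that mimics only the documented lookup order.
final class MetaLookupSketch {
    private final Map<String, String> parseMeta;   // stand-in for the parse Metadata
    private final Map<String, String> contentMeta; // stand-in for the content Metadata

    MetaLookupSketch(Map<String, String> parseMeta, Map<String, String> contentMeta) {
        // Copy the maps so later changes by the caller do not affect lookups.
        this.parseMeta = new HashMap<String, String>(parseMeta);
        this.contentMeta = new HashMap<String, String>(contentMeta);
    }

    // Looks in the parse meta data first, then falls back to the content meta data.
    String getMeta(String name) {
        String value = parseMeta.get(name);
        if (value == null) {
            value = contentMeta.get(name);
        }
        // Assumption: null signals that neither source contains the name.
        return value;
    }
}
```

With this ordering, a value set by the parser shadows a value of the same name that came from the content itself.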

getHtmlMetaTags

public HTMLMetaTags getHtmlMetaTags()
Returns HTML meta tags information.

Returns:
meta tags extracted from HTML tags

setHtmlMetaTags

public void setHtmlMetaTags(HTMLMetaTags htmlMetaTags)
Assigns HTML meta tags information.

Parameters:
htmlMetaTags - meta tags extracted from HTML tags

equals

public boolean equals(java.lang.Object o)

Overrides:
equals in class java.lang.Object

hashCode

public int hashCode()

Overrides:
hashCode in class java.lang.Object

toString

public java.lang.String toString()

Overrides:
toString in class java.lang.Object
