SMILA (incubation) API documentation

org.eclipse.smila.connectivity.framework.crawler.web.parse.html
Class HtmlParser

java.lang.Object
  extended by org.eclipse.smila.connectivity.framework.crawler.web.parse.html.HtmlParser
All Implemented Interfaces:
Configurable, Parser

public class HtmlParser
extends java.lang.Object
implements Parser, Configurable

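Parser implementation for HTML content. It returns a Parse result for a given Content and can hold a reference to a JavaScript parser, which is needed for extracting JavaScript links.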


Constructor Summary
HtmlParser()
           
 
Method Summary
 Configuration getConf()
          Return the configuration used by this object.
 java.lang.String[] getContentTypes()
          Returns the array of content types supported by this parser.
 Parse getParse(Content content)
          Returns the Parse result for the given Content.
 void setConf(Configuration configuration)
          Set the configuration to be used by this object.
 void setJavascriptParser(Parser parser)
          Sets the JavaScript parser reference needed for extracting JavaScript links.
 void unsetJavascriptParser(Parser parser)
          Removes the JavaScript parser reference.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HtmlParser

public HtmlParser()

Method Detail

getParse

public Parse getParse(Content content)
Returns the Parse result for the given Content.

Specified by:
getParse in interface Parser
Parameters:
content - Content to be parsed.
Returns:
The Parse result for the given Content.

setConf

public void setConf(Configuration configuration)
Set the configuration to be used by this object.

Specified by:
setConf in interface Configurable
Parameters:
configuration - The Configuration to be used by this object.

getConf

public Configuration getConf()
Return the configuration used by this object.

Specified by:
getConf in interface Configurable
Returns:
The Configuration used by this object.

getContentTypes

public java.lang.String[] getContentTypes()
Returns the array of content types supported by this parser.

Specified by:
getContentTypes in interface Parser
Returns:
The array of supported content types.
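
One way a caller might use getContentTypes() is to pick a parser for a given content type. The sketch below is illustrative only and does not use the SMILA classes: TypedParser is a hypothetical stand-in for the Parser interface, ParserRegistry is a hypothetical helper, and the content type strings are example values, not the values HtmlParser actually returns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the SMILA Parser interface; only the
// method used in this sketch is declared.
interface TypedParser {
    String[] getContentTypes();
}

// Hypothetical helper (not part of SMILA) that looks up a parser by content type.
class ParserRegistry {
    private final List<TypedParser> parsers = new ArrayList<>();

    void register(TypedParser parser) {
        parsers.add(parser);
    }

    // Returns the first registered parser that lists the given content type,
    // or null if no registered parser supports it.
    TypedParser find(String contentType) {
        for (TypedParser parser : parsers) {
            if (Arrays.asList(parser.getContentTypes()).contains(contentType)) {
                return parser;
            }
        }
        return null;
    }
}
```

A caller would register each parser once and then call find with the content type of the fetched document before invoking getParse on the result.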

setJavascriptParser

public void setJavascriptParser(Parser parser)
Sets the JavaScript parser reference needed for extracting JavaScript links.

Parameters:
parser - The JavaScript parser reference.

unsetJavascriptParser

public void unsetJavascriptParser(Parser parser)
Removes the JavaScript parser reference.

Parameters:
parser - The JavaScript parser reference to remove.
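
The setJavascriptParser/unsetJavascriptParser pair resembles the bind/unbind method pairs commonly used for service references, although this page does not say how HtmlParser is wired. The sketch below shows one plausible shape for such a pair. It is not the SMILA implementation: LinkParser and JavascriptParserHolder are hypothetical stand-ins, and clearing the reference only when the argument matches the stored instance is an assumption.

```java
// Hypothetical stand-in for the SMILA Parser interface; no methods are needed here.
interface LinkParser {
}

// Hypothetical stand-in that mirrors only the set/unset method pair of HtmlParser.
class JavascriptParserHolder {
    private LinkParser javascriptParser;

    // Stores the reference so that it can later be used for link extraction.
    synchronized void setJavascriptParser(LinkParser parser) {
        this.javascriptParser = parser;
    }

    // Assumption: the reference is cleared only if the argument is the
    // instance currently held, so a stale unset call does not drop a newer parser.
    synchronized void unsetJavascriptParser(LinkParser parser) {
        if (this.javascriptParser == parser) {
            this.javascriptParser = null;
        }
    }

    // Helper for this sketch only; HtmlParser documents no such method.
    synchronized boolean hasJavascriptParser() {
        return javascriptParser != null;
    }
}
```

Comparing by identity before clearing is a defensive choice for the case where a replacement parser is set before the old one is unset.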
