SMILA (incubation) API documentation

org.eclipse.smila.processing.pipelets
Class HtmlToTextPipelet

java.lang.Object
  extended by org.eclipse.smila.processing.pipelets.ATransformationPipelet
      extended by org.eclipse.smila.processing.pipelets.HtmlToTextPipelet
All Implemented Interfaces:
Pipelet

public class HtmlToTextPipelet
extends ATransformationPipelet

Simple HTML-to-text extractor pipelet that uses the NekoHTML parser.

Author:
jschumacher

Nested Class Summary
 class HtmlToTextPipelet.CommentRemover
          Removes comments from HTML files.
 class HtmlToTextPipelet.MetadataExtractor
          Extracts metadata from META tags.
 class HtmlToTextPipelet.PlainTextWriter
          Appends plain text from the document to a string builder.
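
The division of labour among the three nested classes can be illustrated with a minimal, self-contained sketch. It does not use NekoHTML or any SMILA class: it swaps in the HTML parser bundled with the JDK (javax.swing.text.html), and the names HtmlToTextSketch, Result and extract are hypothetical, chosen only for illustration. The real nested classes may work quite differently.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** Hypothetical sketch; not the SMILA implementation and not based on NekoHTML. */
public class HtmlToTextSketch {

  /** Plain text plus the META name/content pairs found in the document. */
  public static final class Result {
    public final String text;
    public final Map<String, String> metadata;

    Result(String text, Map<String, String> metadata) {
      this.text = text;
      this.metadata = metadata;
    }
  }

  public static Result extract(String html) {
    final StringBuilder text = new StringBuilder();
    final Map<String, String> metadata = new LinkedHashMap<>();

    HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
      @Override
      public void handleText(char[] data, int pos) {
        // Role of PlainTextWriter: append text content to a string builder.
        text.append(data).append(' ');
      }

      @Override
      public void handleComment(char[] data, int pos) {
        // Role of CommentRemover: comments are dropped, so nothing is appended.
      }

      @Override
      public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        // Role of MetadataExtractor: collect name/content pairs from META tags.
        if (tag == HTML.Tag.META) {
          Object name = attrs.getAttribute(HTML.Attribute.NAME);
          Object content = attrs.getAttribute(HTML.Attribute.CONTENT);
          if (name != null && content != null) {
            metadata.put(name.toString(), content.toString());
          }
        }
      }
    };

    try {
      // ignoreCharSet = true: the input is already a decoded String.
      new ParserDelegator().parse(new StringReader(html), callback, true);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    // Collapse runs of whitespace left over from markup boundaries.
    return new Result(text.toString().replaceAll("\\s+", " ").trim(), metadata);
  }

  public static void main(String[] args) {
    Result r = extract("<html><head><meta name=\"author\" content=\"jane\"></head>"
        + "<body><!-- hidden --><p>Hello <b>world</b></p></body></html>");
    System.out.println(r.text);
    System.out.println(r.metadata);
  }
}
```

A single callback object plays all three roles here to keep the sketch short; the pipelet itself separates them into the three nested classes listed above.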
 
Field Summary
 
Fields inherited from class org.eclipse.smila.processing.pipelets.ATransformationPipelet
_dataFactory, _inputName, _inputType, _outputName, _outputType, ENCODING_ATTACHMENT, PROP_INPUT_NAME, PROP_INPUT_TYPE, PROP_OUTPUT_NAME, PROP_OUTPUT_TYPE
 
Constructor Summary
HtmlToTextPipelet()
           
 
Method Summary
 void configure(AnyMap configuration)
          Sets the configuration of the pipelet.
 java.lang.String[] process(Blackboard blackboard, java.lang.String[] recordIds)
          Processes records on the Blackboard service.
 
Methods inherited from class org.eclipse.smila.processing.pipelets.ATransformationPipelet
getInputName, getInputType, getOutputName, getOutputType, isReadFromAttribute, isStoreInAttribute, readInput, readStringInput, storeResult, storeResult, storeResults
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HtmlToTextPipelet

public HtmlToTextPipelet()
Method Detail

configure

public void configure(AnyMap configuration)
               throws ProcessingException
Sets the configuration of the pipelet. Called once after instantiation, before the pipelet is actually used in a workflow.

Specified by:
configure in interface Pipelet
Overrides:
configure in class ATransformationPipelet
Parameters:
configuration - the configuration of the pipelet.
Throws:
ProcessingException - if the configuration is not applicable to the pipelet (missing properties, wrong data types).
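
The validation behaviour described above (reject missing properties and wrong data types once, before first use) might look roughly like the following sketch. It is not SMILA code: a plain java.util.Map stands in for AnyMap, ConfigException stands in for ProcessingException, and the key strings "inputName" and "outputName" are assumed values for illustration, since the actual values of PROP_INPUT_NAME and PROP_OUTPUT_NAME are not shown on this page.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a pipelet's configure step; not SMILA code. */
public class ConfigureSketch {

  /** Stand-in for ProcessingException. */
  public static class ConfigException extends Exception {
    public ConfigException(String message) {
      super(message);
    }
  }

  // Assumed key names; the real constant values are not given in this documentation.
  static final String PROP_INPUT_NAME = "inputName";
  static final String PROP_OUTPUT_NAME = "outputName";

  private String inputName;
  private String outputName;

  /** Reads the required properties and fails fast if any is unusable. */
  public void configure(Map<String, Object> configuration) throws ConfigException {
    inputName = requireString(configuration, PROP_INPUT_NAME);
    outputName = requireString(configuration, PROP_OUTPUT_NAME);
  }

  private static String requireString(Map<String, Object> config, String key)
      throws ConfigException {
    Object value = config.get(key);
    if (value == null) {
      throw new ConfigException("missing property: " + key);
    }
    if (!(value instanceof String)) {
      throw new ConfigException("wrong data type for property: " + key);
    }
    return (String) value;
  }

  /** Convenience helper: true if configure accepts the given map. */
  public static boolean accepts(Map<String, Object> configuration) {
    try {
      new ConfigureSketch().configure(configuration);
      return true;
    } catch (ConfigException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    Map<String, Object> config = new HashMap<>();
    config.put(PROP_INPUT_NAME, "html");
    System.out.println(accepts(config)); // outputName is still missing
    config.put(PROP_OUTPUT_NAME, "text");
    System.out.println(accepts(config));
  }
}
```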

process

public java.lang.String[] process(Blackboard blackboard,
                                  java.lang.String[] recordIds)
                           throws ProcessingException
Processes records on the Blackboard service.

Parameters:
blackboard - Blackboard service managing the records.
recordIds - IDs of the records to process.
Returns:
IDs of the result records. By default these should be the same as the recordIds passed in, unless there is a specific reason to return something else. This is especially true for SearchPipelets.
Throws:
ProcessingException - if an error occurs during processing.
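
The return contract described above (transform each record in place, then hand back the same IDs) can be sketched as follows. This is an illustration under stated assumptions, not the pipelet's code: a nested java.util.Map stands in for the Blackboard service, the attribute names "html" and "text" are hypothetical, and the transformation is a placeholder rather than real HTML parsing.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the process contract; a Map stands in for the Blackboard. */
public class ProcessSketch {

  // Assumed attribute names, for illustration only.
  static final String INPUT = "html";
  static final String OUTPUT = "text";

  /**
   * @param blackboard record ID to (attribute name to value); stand-in for Blackboard records
   * @param recordIds  IDs of the records to process
   * @return the same IDs that were passed in
   */
  public static String[] process(Map<String, Map<String, String>> blackboard,
      String[] recordIds) {
    if (recordIds != null) {
      for (String id : recordIds) {
        Map<String, String> record = blackboard.get(id);
        if (record == null) {
          continue; // unknown ID: nothing to transform
        }
        String input = record.get(INPUT);
        if (input != null) {
          // Placeholder transformation; the real pipelet extracts text from the HTML.
          record.put(OUTPUT, input.trim());
        }
      }
    }
    // Results are written back to the records, so the IDs are returned unchanged.
    return recordIds;
  }

  public static void main(String[] args) {
    Map<String, Map<String, String>> blackboard = new HashMap<>();
    Map<String, String> record = new HashMap<>();
    record.put(INPUT, "  <p>Hi</p>  ");
    blackboard.put("r1", record);

    String[] result = process(blackboard, new String[] {"r1"});
    System.out.println(result.length + " record(s), output: " + record.get(OUTPUT));
  }
}
```

Returning the input IDs unchanged lets the next pipelet in a workflow continue with the same records, which is why the documentation treats it as the default.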
