SMILA (incubation) API documentation

org.eclipse.smila.processing.pipelets
Class HtmlToTextPipelet

java.lang.Object
  extended by org.eclipse.smila.processing.pipelets.ATransformationPipelet
      extended by org.eclipse.smila.processing.pipelets.HtmlToTextPipelet
All Implemented Interfaces:
IPipelet, SimplePipelet

public class HtmlToTextPipelet
extends ATransformationPipelet

Simple HTML-to-text extractor pipelet that uses the NekoHTML parser.
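
The extraction this pipelet performs (dropping comments, reading META tags, collecting plain text) can be illustrated with a rough, self-contained sketch. This is not the SMILA implementation and it does not use NekoHTML: it substitutes plain `java.util.regex` patterns for a real HTML parser. The class name `HtmlToTextSketch` and its methods are hypothetical, and the META pattern assumes a `name` attribute followed directly by a double-quoted `content` attribute.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical regex-based stand-in for parser-driven extraction; not SMILA code. */
final class HtmlToTextSketch {

  private static final Pattern COMMENT = Pattern.compile("<!--.*?-->", Pattern.DOTALL);
  private static final Pattern SCRIPT_OR_STYLE = Pattern.compile(
      "<(script|style)\\b.*?</\\1\\s*>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
  private static final Pattern META = Pattern.compile(
      "<meta\\s+name=\"([^\"]*)\"\\s+content=\"([^\"]*)\"[^>]*>", Pattern.CASE_INSENSITIVE);
  private static final Pattern TAG = Pattern.compile("<[^>]+>");

  /** Drops HTML comments (the role the CommentRemover description suggests). */
  static String removeComments(String html) {
    return COMMENT.matcher(html).replaceAll("");
  }

  /** Collects name/content pairs from META tags, in document order. */
  static Map<String, String> extractMetadata(String html) {
    Map<String, String> metadata = new LinkedHashMap<>();
    Matcher m = META.matcher(removeComments(html));
    while (m.find()) {
      metadata.put(m.group(1), m.group(2));
    }
    return metadata;
  }

  /** Strips comments, scripts, styles and tags, then collapses whitespace. */
  static String toPlainText(String html) {
    String text = removeComments(html);
    text = SCRIPT_OR_STYLE.matcher(text).replaceAll(" ");
    text = TAG.matcher(text).replaceAll(" ");
    // Decode a few common entities; "&amp;" goes last to avoid double decoding.
    text = text.replace("&nbsp;", " ").replace("&lt;", "<")
        .replace("&gt;", ">").replace("&amp;", "&");
    return text.replaceAll("\\s+", " ").trim();
  }
}
```

A real parser handles malformed markup, unquoted attributes and character encodings that these patterns do not, which is presumably why the pipelet relies on NekoHTML rather than on pattern matching.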

Author:
jschumacher

Nested Class Summary
 class HtmlToTextPipelet.CommentRemover
          Removes comments from HTML files.
 class HtmlToTextPipelet.MetadataExtractor
          Extracts metadata from META tags.
 class HtmlToTextPipelet.PlainTextWriter
          Appends plain text from the document to a string builder.
 
Field Summary
 
Fields inherited from class org.eclipse.smila.processing.pipelets.ATransformationPipelet
_inputName, _inputPath, _inputType, _outputName, _outputPath, _outputType, ENCODING_ATTACHMENT, PROP_INPUT_NAME, PROP_INPUT_TYPE, PROP_OUTPUT_NAME, PROP_OUTPUT_TYPE
 
Constructor Summary
HtmlToTextPipelet()
           
 
Method Summary
 void configure(PipeletConfiguration configuration)
          Sets the configuration of the pipelet. Called once after instantiation, before the pipelet is actually used in a workflow.
 Id[] process(Blackboard blackboard, Id[] recordIds)
          Processes records on the Blackboard service.
 
Methods inherited from class org.eclipse.smila.processing.pipelets.ATransformationPipelet
getInputName, getInputPath, getInputType, getOutputName, getOutputPath, getOutputType, isReadFromAttribute, isStoreInAttribute, readInput, readStringInput, storeResult, storeResult, storeResults
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HtmlToTextPipelet

public HtmlToTextPipelet()
Method Detail

configure

public void configure(PipeletConfiguration configuration)
               throws ProcessingException
Sets the configuration of the pipelet. Called once after instantiation, before the pipelet is actually used in a workflow.

Specified by:
configure in interface IPipelet
Overrides:
configure in class ATransformationPipelet
Parameters:
configuration - configuration of pipelet.
Throws:
ProcessingException - if the configuration is not applicable for the pipelet (missing properties, wrong data types)
See Also:
#configure(org.eclipse.smila.processing.configuration.PipeletConfiguration)
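
The failure conditions listed above (missing properties, wrong data types) can be sketched as a small validation helper. This is an illustration under stated assumptions, not the SMILA code: a `Map<String, Object>` stands in for `PipeletConfiguration`, `IllegalArgumentException` stands in for `ProcessingException`, and the property key used in the usage note is a made-up value, since this page does not show the values of `PROP_INPUT_NAME` and the other constants.

```java
import java.util.Map;

/** Hypothetical configuration check; Map and IllegalArgumentException are stand-ins. */
final class ConfigureSketch {

  /**
   * Returns the String value stored under key, or throws if the property
   * is missing or has the wrong data type.
   */
  static String requireString(Map<String, Object> properties, String key) {
    Object value = properties.get(key);
    if (value == null) {
      throw new IllegalArgumentException("missing property: " + key);
    }
    if (!(value instanceof String)) {
      throw new IllegalArgumentException("wrong data type for property: " + key);
    }
    return (String) value;
  }
}
```

Under these assumptions, a call such as `ConfigureSketch.requireString(properties, "inputName")` fails fast at configuration time, so a misconfigured pipelet never reaches the point of processing records.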

process

public Id[] process(Blackboard blackboard,
                    Id[] recordIds)
             throws ProcessingException
Processes records on the Blackboard service.

Parameters:
blackboard - Blackboard service managing the records.
recordIds - Ids of records to process.
Returns:
Ids of result records.
Throws:
ProcessingException - if an error occurs during processing.
See Also:
SimplePipelet.process(org.eclipse.smila.blackboard.Blackboard, org.eclipse.smila.datamodel.id.Id[])
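
The general shape of such a `process` call (read an input value per record, transform it, store the result, return the IDs) can be sketched without the SMILA classes. Everything here is a stand-in chosen for illustration: a nested `Map` replaces the `Blackboard`, `String` replaces `Id`, and the `transform` parameter replaces the HTML-to-text conversion. Skipping unknown IDs and leaving records without input unchanged are assumptions, not documented behavior of this pipelet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical record-processing loop; Map and String replace Blackboard and Id. */
final class ProcessSketch {

  static String[] process(Map<String, Map<String, String>> records, String[] recordIds,
      String inputName, String outputName, UnaryOperator<String> transform) {
    List<String> resultIds = new ArrayList<>();
    for (String id : recordIds) {
      Map<String, String> record = records.get(id);
      if (record == null) {
        continue; // assumption: unknown IDs are skipped rather than treated as errors
      }
      String input = record.get(inputName);
      if (input != null) {
        // Store the transformed value next to the original input.
        record.put(outputName, transform.apply(input));
      }
      resultIds.add(id);
    }
    return resultIds.toArray(new String[0]);
  }
}
```

In this sketch a transformation pipelet modifies records in place, so the returned IDs are the IDs of the records that were found.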
