org.eclipse.smila.processing.pipelets
Class HtmlToTextPipelet
java.lang.Object
org.eclipse.smila.processing.pipelets.ATransformationPipelet
org.eclipse.smila.processing.pipelets.HtmlToTextPipelet
- All Implemented Interfaces:
- Pipelet
public class HtmlToTextPipelet
- extends ATransformationPipelet
Simple HTML-to-text extractor pipelet using the NekoHTML parser.
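As a rough illustration of what HTML-to-text extraction involves, the following dependency-free Java sketch strips tags, decodes a handful of common entities and collapses whitespace. It does not use NekoHTML and is not this pipelet's implementation; the class `SimpleHtmlToText` and its method `toText` are hypothetical. A real parser such as NekoHTML also repairs malformed markup and handles comments, CDATA and the full entity set, which regular expressions cannot do reliably.

```java
import java.util.regex.Pattern;

// Hypothetical, dependency-free approximation; not NekoHTML and not HtmlToTextPipelet.
final class SimpleHtmlToText {

  // Matches any tag such as <p>, </div> or <br/>.
  private static final Pattern TAG = Pattern.compile("<[^>]*>");

  // Matches runs of whitespace left behind by removed markup.
  private static final Pattern WHITESPACE = Pattern.compile("\\s+");

  private SimpleHtmlToText() {
  }

  static String toText(final String html) {
    // Replace tags by a space so that "a<br>b" does not become "ab".
    String text = TAG.matcher(html).replaceAll(" ");
    text = text.replace("&lt;", "<")
        .replace("&gt;", ">")
        .replace("&quot;", "\"")
        .replace("&nbsp;", " ")
        .replace("&amp;", "&"); // decoded last so "&amp;lt;" yields "&lt;", not "<"
    return WHITESPACE.matcher(text).replaceAll(" ").trim();
  }
}
```

For example, `SimpleHtmlToText.toText("<p>Hello&nbsp;<b>world</b></p>")` yields `Hello world`.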
- Author:
- jschumacher
Methods inherited from class org.eclipse.smila.processing.pipelets.ATransformationPipelet:
getInputName, getInputType, getOutputName, getOutputType, isReadFromAttribute, isStoreInAttribute, readInput, readStringInput, storeResult, storeResult, storeResults
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
HtmlToTextPipelet
public HtmlToTextPipelet()
getDefaultEncoding
protected java.lang.String getDefaultEncoding(ParameterAccessor paramAccessor)
throws MissingParameterException
- Returns:
- default encoding parameter.
- Throws:
MissingParameterException
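The Javadoc does not say when the default encoding applies. A plausible reading, which is an assumption here, is that it is the fallback charset used when the HTML arrives as raw bytes (for example as an attachment) and declares no charset of its own. The hypothetical helper `EncodingSketch.decode` below sketches that fallback with the standard `java.nio.charset` API: it scans the first 1024 bytes for a `charset=` declaration and otherwise decodes with the configured default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: decodes HTML bytes, falling back to a configured default encoding.
final class EncodingSketch {

  // Matches e.g. <meta charset="UTF-8"> or content="text/html; charset=ISO-8859-1".
  private static final Pattern CHARSET = Pattern.compile(
      "charset\\s*=\\s*[\"']?([A-Za-z0-9][A-Za-z0-9._-]*)", Pattern.CASE_INSENSITIVE);

  // Number of leading bytes searched for a charset declaration.
  private static final int SNIFF_LENGTH = 1024;

  private EncodingSketch() {
  }

  static String decode(final byte[] bytes, final String defaultEncoding) {
    // ISO-8859-1 maps every byte to exactly one char, so an ASCII declaration
    // can be found regardless of the document's real encoding.
    final String head = new String(bytes, 0, Math.min(bytes.length, SNIFF_LENGTH),
        StandardCharsets.ISO_8859_1);
    final Matcher matcher = CHARSET.matcher(head);
    Charset charset = Charset.forName(defaultEncoding); // throws for unknown names
    if (matcher.find() && Charset.isSupported(matcher.group(1))) {
      charset = Charset.forName(matcher.group(1));
    }
    return new String(bytes, charset);
  }
}
```

`Charset.forName` throws an exception for an unknown default encoding, which a real pipelet would presumably report as a configuration error.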
getRemoveContentTags
protected final java.lang.String[] getRemoveContentTags(ParameterAccessor paramAccessor)
throws MissingParameterException
- Returns:
- the tag names for which the complete content is removed from result.
- Throws:
MissingParameterException
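To make "the complete content is removed" concrete: for a tag such as `script`, both the tags and everything between them disappear, whereas ordinary tags only lose their markup. The sketch below shows that behaviour on a raw HTML string with regular expressions. The class `RemoveContentTagsSketch`, its method and the example tag list are hypothetical, and the pipelet itself may well apply the rule while walking the parser's output rather than on raw text. Nested elements of the same name are not handled.

```java
import java.util.regex.Pattern;

// Hypothetical helper: removes elements together with their complete content.
final class RemoveContentTagsSketch {

  private RemoveContentTagsSketch() {
  }

  static String removeContent(final String html, final String[] tagNames) {
    String result = html;
    for (final String tag : tagNames) {
      // <tag ...> ... </tag>, case-insensitive; DOTALL lets ".*?" span line breaks,
      // and the reluctant quantifier stops at the first matching closing tag.
      final Pattern element = Pattern.compile(
          "<" + Pattern.quote(tag) + "\\b[^>]*>.*?</" + Pattern.quote(tag) + "\\s*>",
          Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
      result = element.matcher(result).replaceAll(" ");
    }
    return result;
  }
}
```

For instance, with the tag names `script` and `style`, `<p>a</p><script>var x;</script>b` becomes `<p>a</p> b`; the remaining markup would then go through ordinary tag stripping.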
configure
public void configure(AnyMap configuration)
throws ProcessingException
- Sets the configuration of the pipelet. Called once after instantiation, before the pipelet is actually used in a workflow.
Note: additionally configures the meta attribute mapping (which cannot be configured via the parameter accessor).
- Specified by:
configure
in interface Pipelet
- Overrides:
configure
in class ATransformationPipelet
- Parameters:
configuration
- configuration of the pipelet.
- Throws:
ProcessingException
- the configuration is not applicable to the pipelet (missing properties, wrong data types).
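The note on `configure` says the meta attribute mapping cannot be set through the parameter accessor, so it has to be read from the configuration map directly. The sketch below uses a plain `Map<String, String>` in place of SMILA's `AnyMap` and assumes a hypothetical key convention `meta:<meta tag name>` whose value names the record attribute that receives the meta tag's content. Neither the prefix nor the class `MetaMappingSketch` is taken from SMILA.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper; SMILA's AnyMap is replaced by a plain Map for this sketch.
final class MetaMappingSketch {

  // Hypothetical key prefix: "meta:<meta tag name>" maps to the target attribute name.
  static final String META_PREFIX = "meta:";

  private MetaMappingSketch() {
  }

  // Collects all "meta:..." entries into a map from meta tag name to attribute name.
  static Map<String, String> readMetaMapping(final Map<String, String> configuration) {
    final Map<String, String> mapping = new LinkedHashMap<>();
    for (final Map.Entry<String, String> entry : configuration.entrySet()) {
      final String key = entry.getKey();
      if (key.startsWith(META_PREFIX) && key.length() > META_PREFIX.length()) {
        // Meta tag names are treated case-insensitively in this sketch, so normalize them.
        final String metaName = key.substring(META_PREFIX.length()).toLowerCase(Locale.ROOT);
        mapping.put(metaName, entry.getValue());
      }
    }
    // Other keys are ordinary pipelet parameters and are ignored here.
    return mapping;
  }
}
```

Keeping this mapping separate from the regular parameters mirrors the note above: the set of meta tag names is open-ended, so it does not fit a fixed list of named parameters.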
process
public java.lang.String[] process(Blackboard blackboard,
java.lang.String[] recordIds)
throws ProcessingException
- Processes the given records.
- Parameters:
blackboard
- Blackboard holding and managing the records.
recordIds
- Ids of records to process.
- Returns:
- Ids of the records to be passed into the next pipelet. By default this should be the same as the passed-in
recordIds unless there is a specific (business logic) reason not to do so.
- Throws:
ProcessingException
- error during processing.
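The contract of `process` (read the input of each record, store the result, return the ids that should continue) can be sketched independently of SMILA. `SimpleBlackboard`, `InMemoryBlackboard` and `ProcessSketch` below are hypothetical stand-ins, not SMILA's `Blackboard` API, and the transformation is passed in as a function so the loop stays generic. Records without an input value are skipped rather than dropped, so all ids are returned, matching the documented default.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for SMILA's Blackboard: record id -> attribute name -> value.
interface SimpleBlackboard {
  String getValue(String recordId, String attributeName);

  void setValue(String recordId, String attributeName, String value);
}

// Hypothetical in-memory implementation, only for trying out the sketch.
final class InMemoryBlackboard implements SimpleBlackboard {
  private final Map<String, Map<String, String>> records = new HashMap<>();

  @Override
  public String getValue(final String recordId, final String attributeName) {
    final Map<String, String> record = records.get(recordId);
    return record == null ? null : record.get(attributeName);
  }

  @Override
  public void setValue(final String recordId, final String attributeName, final String value) {
    records.computeIfAbsent(recordId, id -> new HashMap<>()).put(attributeName, value);
  }
}

final class ProcessSketch {

  private ProcessSketch() {
  }

  static String[] process(final SimpleBlackboard blackboard, final String[] recordIds,
      final String inputName, final String outputName, final UnaryOperator<String> transformation) {
    for (final String id : recordIds) {
      final String input = blackboard.getValue(id, inputName);
      if (input != null) { // nothing to convert for this record; leave it untouched
        blackboard.setValue(id, outputName, transformation.apply(input));
      }
    }
    // No business reason to drop records here, so the same ids are passed on.
    return recordIds;
  }
}
```

Returning `recordIds` unchanged follows the default described above; a pipelet that filters records would return a shorter array instead.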