public class HtmlToTextPipelet extends ATransformationPipelet
Modifier and Type | Class and Description |
---|---|
static class | HtmlToTextPipelet.CommentRemover - Removes comments from HTML files. |
class | HtmlToTextPipelet.MetadataExtractor - Extracts metadata from META tags. |
class | HtmlToTextPipelet.PlainTextWriter - Appends plain text from the document to a string builder. |
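To make the roles of the comment remover and the metadata extractor concrete, here is a minimal sketch in plain Java. It is not the SMILA implementation: the class `HtmlHelpersSketch`, its method names, and the regular expressions are assumptions made for illustration, and the META pattern only handles the simple case where `name` precedes `content` and both values are double-quoted.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for illustration only; not part of SMILA.
final class HtmlHelpersSketch {
  // Matches one HTML comment, non-greedily, including comments that span lines.
  private static final Pattern COMMENT = Pattern.compile("<!--.*?-->", Pattern.DOTALL);

  // Simplification: expects name="..." followed by content="...", both double-quoted.
  private static final Pattern META = Pattern.compile(
      "<meta\\s+name\\s*=\\s*\"([^\"]*)\"\\s+content\\s*=\\s*\"([^\"]*)\"[^>]*>",
      Pattern.CASE_INSENSITIVE);

  // Returns the input with all HTML comments removed.
  static String removeComments(String html) {
    return COMMENT.matcher(html).replaceAll("");
  }

  // Collects name/content pairs from META tags in document order.
  static Map<String, String> extractMetadata(String html) {
    Map<String, String> result = new LinkedHashMap<>();
    Matcher matcher = META.matcher(html);
    while (matcher.find()) {
      result.put(matcher.group(1), matcher.group(2));
    }
    return result;
  }

  public static void main(String[] args) {
    String html = "<html><head><meta name=\"author\" content=\"Jane\"></head>"
        + "<body><!-- hidden -->Hello</body></html>";
    System.out.println(removeComments(html));
    System.out.println(extractMetadata(html));
  }
}
```

A regular expression is enough for a sketch, but real-world HTML (unquoted attributes, comments inside scripts) generally calls for a proper HTML parser.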
_config, ENCODING_ATTACHMENT, ENCODING_CHARSET, PROP_INPUT_NAME, PROP_INPUT_TYPE, PROP_OUTPUT_NAME, PROP_OUTPUT_TYPE, PROP_OUTPUT_VALUE_TYPE
Constructor and Description |
---|
HtmlToTextPipelet() |
Modifier and Type | Method and Description |
---|---|
void | configure(AnyMap configuration) - Sets the configuration of the pipelet. Called once after instantiation, before the pipelet is actually used in a workflow. |
protected java.lang.String | getDefaultEncoding(ParameterAccessor paramAccessor) |
java.lang.String[] | process(Blackboard blackboard, java.lang.String[] recordIds) - Processes the given records. |
getInputName, getInputStream, getInputType, getOutputName, getOutputType, getOutputValueType, isReadFromAttribute, isStoreInAttribute, readInput, readStringInput, storeResult, storeResult, storeResult, storeResults
protected java.lang.String getDefaultEncoding(ParameterAccessor paramAccessor) throws MissingParameterException
Throws:
MissingParameterException
public void configure(AnyMap configuration) throws ProcessingException
Specified by:
configure in interface Pipelet
Overrides:
configure in class ATransformationPipelet
Parameters:
configuration - configuration of the pipelet.
Throws:
ProcessingException - if the configuration is not applicable for the pipelet (missing properties, wrong data types).
public java.lang.String[] process(Blackboard blackboard, java.lang.String[] recordIds) throws ProcessingException
Parameters:
blackboard - Blackboard holding and managing the records.
recordIds - IDs of the records to process.
Throws:
ProcessingException - if an error occurs during processing.
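The contract described above (configure once, then process batches of record IDs) can be sketched without the SMILA types. Everything in this example is an assumption made for illustration: the class `PipeletLifecycleSketch`, the use of plain maps in place of `AnyMap` and `Blackboard`, the configuration keys `inputName` and `outputName`, the naive tag stripping, and the choice to return the incoming IDs unchanged.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for the configure/process contract; not SMILA code.
final class PipeletLifecycleSketch {
  private String inputName;   // attribute to read HTML from (assumed config key "inputName")
  private String outputName;  // attribute to write plain text to (assumed config key "outputName")
  private boolean configured;

  // Called once after instantiation; rejects configurations with missing properties.
  void configure(Map<String, String> configuration) {
    inputName = configuration.get("inputName");
    outputName = configuration.get("outputName");
    if (inputName == null || outputName == null) {
      throw new IllegalArgumentException("missing property: inputName or outputName");
    }
    configured = true;
  }

  // Processes the given records in place and returns the IDs that were handed in.
  String[] process(Map<String, Map<String, String>> blackboard, String[] recordIds) {
    if (!configured) {
      throw new IllegalStateException("configure must be called before process");
    }
    for (String id : recordIds) {
      Map<String, String> record = blackboard.get(id);
      if (record == null) {
        continue; // unknown ID: nothing to transform
      }
      String html = record.get(inputName);
      if (html != null) {
        // Naive conversion: replace tags with spaces, then collapse whitespace.
        String text = html.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
        record.put(outputName, text);
      }
    }
    return recordIds;
  }

  public static void main(String[] args) {
    PipeletLifecycleSketch pipelet = new PipeletLifecycleSketch();
    Map<String, String> config = new HashMap<>();
    config.put("inputName", "html");
    config.put("outputName", "text");
    pipelet.configure(config);

    Map<String, String> record = new HashMap<>();
    record.put("html", "<p>Hello <b>world</b></p>");
    Map<String, Map<String, String>> blackboard = new HashMap<>();
    blackboard.put("r1", record);

    pipelet.process(blackboard, new String[] {"r1"});
    System.out.println(record.get("text"));
  }
}
```

Validating the configuration in `configure` rather than in `process` mirrors the documented behavior that configuration problems (missing properties, wrong data types) are reported before the pipelet is used in a workflow.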