public class TikaPipelet extends ATransformationPipelet

| Modifier and Type | Class and Description |
|---|---|
| static class | TikaPipelet.StoreMode: Possible values for parameter 'storeMode'. |

| Modifier and Type | Field and Description |
|---|---|
| static int | DEFAULT_MAX_LENGTH: Default length. -1 is "unlimited". |
| static java.lang.String | PROP_ATTACHMENT_CONTENT_TYPE_ATTRIBUTE: (Optional) parameter referencing the attribute that contains the content type (e.g. …). |
| static java.lang.String | PROP_EXPORT_AS_HTML: (Optional) parameter that defines whether the content should be transformed to (X)HTML (true) or plain text (false). |
| static java.lang.String | PROP_EXTRACT_PROPERTIES: (Optional) parameter that defines what to extract from the input and copy into record attributes named after the extracted properties. |
| static java.lang.String | PROP_FILE_NAME_ATTRIBUTE: (Optional) parameter referencing the attribute that contains the file name, which can give the Tika parser a hint about how to parse the file. |
| static java.lang.String | PROP_KEEP_HYPHENS: (Optional) parameter that defines whether hyphens should be kept in the output as they appear in the input (true) or whether the software should try to remove them heuristically (false). |
| static java.lang.String | PROP_MAPPING_METADATA_NAME: Name of the metadata field (matched case-insensitively). |
| static java.lang.String | PROP_MAPPING_SINGLE_RESULT: Whether only one result (true) or multiple results, if available (false), are considered. |
| static java.lang.String | PROP_MAPPING_STORE_MODE: (Optional) parameter that defines how the extracted properties are stored in the target attribute. |
| static java.lang.String | PROP_MAPPING_TARGET_ATTRIBUTE: Name of the target attribute for the metadata entry (optional, default: the value of PROP_MAPPING_METADATA_NAME with its original case). |
| static java.lang.String | PROP_MAX_LENGTH: (Optional) parameter that defines how many characters of the content should be extracted, to prevent out-of-memory errors. |
| static java.lang.String | PROP_PAGE_BREAK: (Optional) parameter that defines whether page breaks should be marked with a … |
| static java.lang.String | PROP_PAGE_NUMBER_ATTRIBUTE: (Optional) parameter referencing the attribute that contains the page number. |
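
The length limit described for PROP_MAX_LENGTH and DEFAULT_MAX_LENGTH can be pictured with a small, self-contained sketch. It is not taken from the pipelet's source: the class name MaxLengthSketch and the method limit are hypothetical, and treating every negative value (not only -1) as "unlimited" is an assumption made here for simplicity.

```java
/** Illustrative only: not the TikaPipelet implementation. */
final class MaxLengthSketch {

  /** Mirrors the documented default: -1 means "unlimited". */
  static final int DEFAULT_MAX_LENGTH = -1;

  /**
   * Returns at most maxLength characters of content.
   * Assumption: any negative limit leaves the text unchanged.
   */
  static String limit(String content, int maxLength) {
    if (maxLength < 0 || content.length() <= maxLength) {
      return content;
    }
    return content.substring(0, maxLength);
  }

  public static void main(String[] args) {
    System.out.println(limit("extracted text", DEFAULT_MAX_LENGTH)); // extracted text
    System.out.println(limit("extracted text", 9));                  // extracted
  }
}
```

A real extraction would more plausibly stop reading once the limit is reached rather than truncate afterwards, since the stated purpose of the parameter is to bound memory use.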

Fields inherited from class ATransformationPipelet: _config, ENCODING_ATTACHMENT, ENCODING_CHARSET, PROP_INPUT_NAME, PROP_INPUT_TYPE, PROP_OUTPUT_NAME, PROP_OUTPUT_TYPE, PROP_OUTPUT_VALUE_TYPE

| Constructor and Description |
|---|
| TikaPipelet() |

| Modifier and Type | Method and Description |
|---|---|
| void | configure(AnyMap configuration): Sets the configuration of the pipelet. Called once after instantiation, before the pipelet is actually used in a workflow. |
| java.lang.String[] | process(Blackboard blackboard, java.lang.String[] recordIds): Processes the given records. |

Methods inherited from class ATransformationPipelet: getInputName, getInputStream, getInputType, getOutputName, getOutputType, getOutputValueType, isReadFromAttribute, isStoreInAttribute, readInput, readStringInput, storeResult, storeResult, storeResult, storeResults

public static final java.lang.String PROP_ATTACHMENT_CONTENT_TYPE_ATTRIBUTE
public static final java.lang.String PROP_FILE_NAME_ATTRIBUTE
public static final java.lang.String PROP_EXTRACT_PROPERTIES
public static final java.lang.String PROP_EXPORT_AS_HTML
public static final java.lang.String PROP_PAGE_BREAK
public static final java.lang.String PROP_PAGE_NUMBER_ATTRIBUTE
public static final java.lang.String PROP_KEEP_HYPHENS
public static final java.lang.String PROP_MAX_LENGTH
public static final int DEFAULT_MAX_LENGTH
public static final java.lang.String PROP_MAPPING_METADATA_NAME
public static final java.lang.String PROP_MAPPING_TARGET_ATTRIBUTE
name of the target attribute for the metadata entry (optional, default: the value of PROP_MAPPING_METADATA_NAME with its original case).
public static final java.lang.String PROP_MAPPING_SINGLE_RESULT
public static final java.lang.String PROP_MAPPING_STORE_MODE
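
How the PROP_MAPPING_* parameters might interact can be sketched without any SMILA or Tika types. Everything below is hypothetical (the class MetadataMappingSketch, the method map, and the use of plain maps in place of Tika metadata and SMILA records). It reads "original case" as the metadata name exactly as written in the configuration, which is an interpretation rather than confirmed behaviour, and it leaves the store mode out entirely because its possible values are not listed here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: plain maps stand in for Tika metadata and SMILA records. */
final class MetadataMappingSketch {

  /**
   * Copies the values of one metadata field into the record.
   * metadataName is matched case-insensitively; targetAttribute may be null.
   */
  static void map(Map<String, List<String>> metadata, String metadataName,
      String targetAttribute, boolean singleResult, Map<String, Object> record) {
    // Assumption: the default target is the configured name, case preserved.
    String target = targetAttribute != null ? targetAttribute : metadataName;
    for (Map.Entry<String, List<String>> entry : metadata.entrySet()) {
      List<String> values = entry.getValue();
      if (!entry.getKey().equalsIgnoreCase(metadataName) || values.isEmpty()) {
        continue;
      }
      if (singleResult) {
        record.put(target, values.get(0));             // only the first value
      } else {
        record.put(target, new ArrayList<>(values));   // all available values
      }
      return;
    }
  }

  public static void main(String[] args) {
    Map<String, List<String>> metadata = new LinkedHashMap<>();
    metadata.put("dc:creator", Arrays.asList("Alice", "Bob"));

    Map<String, Object> record = new LinkedHashMap<>();
    map(metadata, "DC:Creator", null, true, record);       // default target name
    map(metadata, "dc:creator", "authors", false, record); // explicit target name

    System.out.println(record); // {DC:Creator=Alice, authors=[Alice, Bob]}
  }
}
```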
public void configure(AnyMap configuration) throws ProcessingException
Specified by: configure in interface Pipelet
Overrides: configure in class ATransformationPipelet
Parameters: configuration - configuration of pipelet.
Throws: ProcessingException - configuration is not applicable for pipelet (missing properties, wrong datatypes)

public java.lang.String[] process(Blackboard blackboard, java.lang.String[] recordIds) throws ProcessingException
Parameters: blackboard - Blackboard holding and managing the records.
recordIds - Ids of records to process.
Throws: ProcessingException - error during processing.
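
The contract of the two methods (configure once after instantiation and reject unusable configurations, then process records by ID) can be illustrated with stand-in types. None of the names below belong to SMILA: PipeletLifecycleSketch, SketchProcessingException, UpperCasePipelet and the "attribute" property are hypothetical, a plain Map replaces both AnyMap and the Blackboard, and returning the incoming IDs from process is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Illustrative stand-ins only; none of these types belong to SMILA. */
final class PipeletLifecycleSketch {

  /** Hypothetical substitute for ProcessingException. */
  static final class SketchProcessingException extends Exception {
    SketchProcessingException(String message) {
      super(message);
    }
  }

  /** Hypothetical pipelet that upper-cases one attribute of each record. */
  static final class UpperCasePipelet {
    private String attribute;

    /** Validates the configuration once, before any record is processed. */
    void configure(Map<String, Object> configuration) throws SketchProcessingException {
      Object value = configuration.get("attribute");
      if (value == null) {
        throw new SketchProcessingException("missing property: attribute");
      }
      if (!(value instanceof String)) {
        throw new SketchProcessingException("wrong datatype for property: attribute");
      }
      attribute = (String) value;
    }

    /** Processes the given records and returns their IDs (an assumption of this sketch). */
    String[] process(Map<String, Map<String, String>> records, String[] recordIds) {
      for (String id : recordIds) {
        Map<String, String> record = records.get(id);
        if (record != null) {
          record.computeIfPresent(attribute, (key, text) -> text.toUpperCase(Locale.ROOT));
        }
      }
      return recordIds;
    }
  }

  public static void main(String[] args) throws SketchProcessingException {
    Map<String, Object> configuration = new HashMap<>();
    configuration.put("attribute", "title");

    UpperCasePipelet pipelet = new UpperCasePipelet();
    pipelet.configure(configuration); // once, right after instantiation

    Map<String, String> record = new HashMap<>();
    record.put("title", "report");
    Map<String, Map<String, String>> records = new HashMap<>();
    records.put("doc-1", record);

    String[] processed = pipelet.process(records, new String[] {"doc-1"});
    System.out.println(processed[0] + " -> " + record.get("title")); // doc-1 -> REPORT
  }
}
```

Failing early in configure means a missing property or wrong datatype surfaces when the workflow is set up, not in the middle of processing records.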