This section focuses on text processing with the Infinity Service Bus (ISB).
It assumes that the reader is familiar with the general principles of Apache Camel.
Text processing consists of the following steps, as shown in the diagram below:

The following sections describe how each of these tasks can be performed with Camel Routes and Spring Bean Definitions. These can be deployed either through a standalone Camel Configuration or embedded in a Camel Trigger in a Stardust Process Model.
Files can be picked up with an Apache Camel File or FTP endpoint, for example:
<from uri="file://c:/data?filter=#fileFilter1"/>
or
<from uri="ftp://.."/>
Additional file filtering can be configured with org.apache.camel.component.file.AntPathMatcherGenericFileFilter, which specifies the files to include in and/or exclude from further processing. Excludes take precedence over includes: if a file matches both an exclude and an include pattern, it is excluded.
<bean id="fileFilter1" class="org.apache.camel.component.file.AntPathMatcherGenericFileFilter">
<property name="includes" value="M05_CMPCN_PBK*.TXT"/>
</bean>
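To make the precedence rule concrete, here is a minimal Java sketch of include/exclude matching. It is not Camel's implementation: the class name IncludeExcludeFilterSketch is hypothetical, Ant-style patterns are only approximated with java.nio glob patterns (which treat ** differently), and accepting every non-excluded file when no include pattern is given is an assumption.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.List;

/**
 * Hypothetical sketch of include/exclude filtering where excludes win.
 * Ant-style patterns are approximated with java.nio "glob" patterns.
 */
final class IncludeExcludeFilterSketch {

    private IncludeExcludeFilterSketch() {}

    /** Returns true if the file name matches at least one glob pattern. */
    private static boolean matchesAny(List<String> patterns, String fileName) {
        for (String pattern : patterns) {
            PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + pattern);
            if (matcher.matches(Paths.get(fileName))) {
                return true;
            }
        }
        return false;
    }

    /** Excludes are checked first, so they take precedence over includes. */
    static boolean accept(String fileName, List<String> includes, List<String> excludes) {
        if (matchesAny(excludes, fileName)) {
            return false;
        }
        // Assumption: with no include patterns, every non-excluded file is accepted.
        return includes.isEmpty() || matchesAny(includes, fileName);
    }
}
```

For example, accept("M05_CMPCN_PBK01.TXT", List.of("M05_CMPCN_PBK*.TXT"), List.of("*01.TXT")) returns false, because the file matches the exclude pattern even though it also matches the include pattern.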
Once a file is received, its content can be split into text blocks using the Apache Camel Splitter directive. Each text block produced by the split is processed separately. For convenience, the Infinity Service Bus provides splitters for line breaks and page breaks; other splitting options are available from Apache Camel itself.
Splits text into lines, using the line break (CR) as the separator.
<bean id="lineSplitter" class="com.infinity.integration.textprocessing.splitter.LineSplitter"/>
Splits text into pages, using the page break as the separator.
<bean id="pageBreakSplitter" class="com.infinity.integration.textprocessing.splitter.PageBreakSplitter"/>
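The following hypothetical Java sketch illustrates what the two splitters do to a text. The class TextSplitterSketch is illustrative only, and the separators are assumptions: CR LF, LF or CR for lines, and the form feed character (\f), a common page-break marker in text reports, for pages.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of line and page splitting (not the ISB splitter classes).
 * Assumed separators: CR LF, LF or CR for lines; form feed for pages.
 */
final class TextSplitterSketch {

    private TextSplitterSketch() {}

    /** Splits text into lines; String.split drops trailing empty lines. */
    static List<String> splitLines(String text) {
        return Arrays.asList(text.split("\r\n|\n|\r"));
    }

    /** Splits text into pages at each form feed character. */
    static List<String> splitPages(String text) {
        return Arrays.asList(text.split("\f"));
    }
}
```

In a route, each element of the returned list would correspond to one text block handed on by the Splitter directive.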
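Page filtering is based on the content of a particular field in the page. The field is defined by its location (row, column, size) and a filtering criterion (the expected value). The PageFilter bean (com.infinity.integration.textprocessing.filter.PageFilter) has the following properties: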
| Property | Description | Type |
|---|---|---|
| column | Column (character offset within the line) where the field starts | int |
| row | Line on which the field is located | int |
| actualValue | Value the field is compared against | String |
| size | Number of characters in the field | int |
| match | y to select pages whose field matches actualValue, n to select pages whose field does not match | char |
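The filter rule can be sketched in Java as follows. This is a hypothetical illustration, not the ISB PageFilter class: the class PageFilterSketch and its constructor are invented for the example, rows and columns are assumed to be 1-based (as in the DataExtractor example later in this section), and match = 'y' is assumed to keep matching pages while any other value keeps non-matching ones.

```java
/**
 * Hypothetical sketch of the page filter rule (not the ISB PageFilter class).
 * Assumptions: row and column are 1-based, lines are separated by LF or
 * CR LF, and match = 'y' keeps pages whose field equals actualValue while
 * any other value (such as 'n') keeps pages whose field differs.
 */
final class PageFilterSketch {
    private final int row;
    private final int column;
    private final int size;
    private final String actualValue;
    private final char match;

    PageFilterSketch(int row, int column, int size, String actualValue, char match) {
        this.row = row;
        this.column = column;
        this.size = size;
        this.actualValue = actualValue;
        this.match = match;
    }

    /** Returns up to size characters starting at (row, column), or "" if out of range. */
    String field(String page) {
        String[] lines = page.split("\r?\n", -1);
        if (row < 1 || row > lines.length) {
            return "";
        }
        String line = lines[row - 1];
        int start = column - 1;
        if (start < 0 || start >= line.length()) {
            return "";
        }
        return line.substring(start, Math.min(start + size, line.length()));
    }

    /** Keeps the page when the comparison result agrees with the match flag. */
    boolean accept(String page) {
        boolean equal = field(page).equals(actualValue);
        return match == 'y' ? equal : !equal;
    }
}
```

With the settings used in the full route example (row 1, column 1, size 6, actualValue EWDETL, match n), this sketch drops pages that start with EWDETL and keeps all others.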
Pages can be aggregated, and the resulting page groups processed further, with the Apache Camel Aggregator directive. The PageAssembler bean groups pageSize pages together.
<bean id="pageAssembler1" class="com.infinity.integration.textprocessing.assembler.PageAssembler">
<property name="pageSize" value="1"/>
</bean>
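The grouping behavior can be pictured with this hypothetical Java sketch, which mimics an aggregator whose completion check is "the group holds pageSize pages". The class PageAssemblerSketch is invented for illustration; returning a final, incomplete group as-is is an assumption (in a Camel route it would wait in the aggregation repository for another completion condition, such as a timeout).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of grouping pages in batches of pageSize
 * (not the ISB PageAssembler class).
 */
final class PageAssemblerSketch {

    private PageAssemblerSketch() {}

    static List<List<String>> group(List<String> pages, int pageSize) {
        if (pageSize < 1) {
            throw new IllegalArgumentException("pageSize must be at least 1");
        }
        List<List<String>> groups = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String page : pages) {
            current.add(page);
            // Completion check: the group is full, so emit it and start a new one.
            if (current.size() == pageSize) {
                groups.add(current);
                current = new ArrayList<>();
            }
        }
        // Assumption: a leftover, incomplete group is returned as the last group.
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }
}
```

With pageSize set to 2, three pages p1, p2, p3 yield the groups [p1, p2] and [p3].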
Beans of type com.infinity.integration.textprocessing.extractor.DataExtractor extract specific data from a text block and store it in a hash map attached to the exchange object of the route.
The start point of the extraction is either given directly as (row, column) in the text block, or found by searching for the pattern targetString, in which case extraction starts at (offsetRow, offsetColumn) from the first occurrence of targetString.
In both cases, up to maxCharacters characters are added to the extracted string. The extracted string is stored in the map under the key dataId and passed as data to the process that is started. dataType indicates the data type of the extracted metadata.
Routes may chain several DataExtractor beans to extract multiple metadata values.
| Property | Description | Type |
|---|---|---|
| maxCharacters | Maximum number of characters to retrieve | int |
| dataId | The data identifier under which the extracted value is stored | String |
| column | The column value in the text | int |
| row | The row (line) value in the text | int |
| searchType | The search type (the example below uses S) | char |
| defaultValue | Default value to use if the data is not found | String |
Example
The following example extracts up to 3 characters, starting at row 2, column 11, and stores them under the data ID QTY. An empty string is used as the default value. The operation is performed for every page.
<bean id="extractQTY" class="com.infinity.integration.textprocessing.extractor.DataExtractor">
<property name="maxCharacters" value="3"/>
<property name="dataId" value="QTY"/>
<property name="row" value="2"/>
<property name="column" value="11"/>
<property name="searchType" value="S"/>
<property name="defaultValue" value=""/>
</bean>
Input
1STSABC123XYZ12
2BC123XYZ1200STST66
Output
QTY=200
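Both extraction modes can be sketched in Java as below. This is a hypothetical illustration, not the ISB DataExtractor API: the class DataExtractorSketch and its method signatures are invented for the example. Rows and columns are assumed to be 1-based, which reproduces the result above (row 2, column 11 of the second input line is where 200 starts), and in search mode the offsets are assumed to be added to the row and column where targetString first occurs.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the two DataExtractor modes (not the ISB class).
 * Assumptions: rows and columns are 1-based, lines are separated by LF or
 * CR LF, and in search mode the offsets are added to the row and column
 * where the first occurrence of targetString starts.
 */
final class DataExtractorSketch {

    private DataExtractorSketch() {}

    /** Returns up to maxCharacters characters from (row, column), or null if out of range. */
    static String extractAt(String[] lines, int row, int column, int maxCharacters) {
        if (row < 1 || row > lines.length) {
            return null;
        }
        String line = lines[row - 1];
        int start = column - 1;
        if (start < 0 || start >= line.length()) {
            return null;
        }
        return line.substring(start, Math.min(start + maxCharacters, line.length()));
    }

    /** Position mode: stores the extracted value (or defaultValue) under dataId. */
    static void extractByPosition(String block, int row, int column, int maxCharacters,
                                  String dataId, String defaultValue, Map<String, String> data) {
        String value = extractAt(block.split("\r?\n"), row, column, maxCharacters);
        data.put(dataId, value != null ? value : defaultValue);
    }

    /** Search mode: starts at (offsetRow, offsetColumn) from the first targetString. */
    static void extractBySearch(String block, String targetString, int offsetRow, int offsetColumn,
                                int maxCharacters, String dataId, String defaultValue,
                                Map<String, String> data) {
        String[] lines = block.split("\r?\n");
        String value = null;
        for (int i = 0; i < lines.length; i++) {
            int index = lines[i].indexOf(targetString);
            if (index >= 0) {
                // Convert the 0-based match position to 1-based row/column, then apply offsets.
                value = extractAt(lines, i + 1 + offsetRow, index + 1 + offsetColumn, maxCharacters);
                break; // only the first occurrence is used
            }
        }
        data.put(dataId, value != null ? value : defaultValue);
    }
}
```

Usage with the example input: create a HashMap, call extractByPosition(page, 2, 11, 3, "QTY", "", data), and the map then holds QTY=200. Calling extractBySearch(page, "XYZ", 1, 0, 3, "QTY", "", data) reaches the same position by locating XYZ on row 1, column 11 and moving one row down.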
A data extraction strategy (com.infinity.integration.ems.extractor.DataExtractionStrategy) defines a list of data extractors plus other properties, such as the reason code and department in the following example.
| Property | Description | Type |
|---|---|---|
| status | Strategy status, for example PROCESS | String |
| extractors | Data extractor object list | List(com.infinity.integration.ems.extractor.DataExtractor) |
| reasonCode | Reason code | String |
| department | Department name | String |
Example
In the following example, the instrument and the quantity are extracted by the extractorInst and extractorQty beans.
<bean id="dataextractorexample2" class="com.infinity.integration.ems.extractor.DataExtractionStrategy">
<property name="status" value="PROCESS"/>
<property name="extractors">
<list>
<ref bean="extractorInst"/>
<ref bean="extractorQty"/>
</list>
</property>
<property name="reasonCode" value="REASON"/>
<property name="department" value="CMPCN"/>
</bean>
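The idea of a strategy that runs a list of extractors over one text block can be sketched in Java as follows. This is a hypothetical illustration, not the ISB DataExtractionStrategy class: each extractor is modeled simply as a dataId plus a function from text block to value, and the column positions in the usage are invented for the example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch of a data extraction strategy: a status, a list of
 * extractors, and extra properties such as reasonCode and department.
 */
final class DataExtractionStrategySketch {

    /** One extractor: the map key and the function that computes its value. */
    static final class Extractor {
        final String dataId;
        final Function<String, String> extract;

        Extractor(String dataId, Function<String, String> extract) {
            this.dataId = dataId;
            this.extract = extract;
        }
    }

    final String status;
    final List<Extractor> extractors;
    final String reasonCode;
    final String department;

    DataExtractionStrategySketch(String status, List<Extractor> extractors,
                                 String reasonCode, String department) {
        this.status = status;
        this.extractors = extractors;
        this.reasonCode = reasonCode;
        this.department = department;
    }

    /** Runs every extractor in list order; a later extractor overwrites a duplicate key. */
    Map<String, String> apply(String block) {
        Map<String, String> data = new LinkedHashMap<>();
        for (Extractor extractor : extractors) {
            data.put(extractor.dataId, extractor.extract.apply(block));
        }
        return data;
    }
}
```

For a block "ABC 200", a strategy with an INST extractor reading characters 0 to 2 and a QTY extractor reading characters 4 to 6 produces the map {INST=ABC, QTY=200}, which corresponds to the kind of data map passed to the process start.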
The workflow directives provide the authentication details and the name of the process to start, together with its required input data.
<to uri="ipp:authenticate:setCurrent?user=motu&amp;password=motu"/>
<to uri="ipp:process:start?processId=DataExtraction&amp;dataMap=${body}"/>
The following Camel Configuration combines these steps into a complete route with its bean definitions:
<route> <!-- File entry -->
<from uri="file://c:/data?filter=#fileFilter1"/>
<split> <!-- Split the file content via page breaks -->
<method ref="pageBreakSplitter1" method="splitBody" />
<aggregate strategyRef="pageAssembler1" aggregationRepositoryRef="pagesrepociprep">
<correlationExpression>
<constant>true</constant>
</correlationExpression>
<completionPredicate>
<method ref="pageAssembler1" method="isCompleted"/>
</completionPredicate>
<filter>
<method ref="pageFilter1" method="accept"/>
<setHeader headerName="ems-it-type">
<method ref="correlationexpressionagreportname" method="evaluate"/>
</setHeader>
<aggregate strategyRef="pageAssembler1" aggregationRepositoryRef="memoryRepository1" completionTimeout="10000">
<correlationExpression>
<header>ems-it-type</header>
</correlationExpression>
<to uri="ipp:authenticate:setCurrent?user=motu&amp;password=motu"/>
<to uri="ipp:process:start?processId=EMSProcessing&amp;Message=${body}"/>
</aggregate>
</filter>
</aggregate>
</split>
</route>
<!-- Beans -->
<bean id="fileFilter1" class="org.apache.camel.component.file.AntPathMatcherGenericFileFilter">
<property name="includes" value="**/NAFRPT.ASH*" />
</bean>
<bean id="pageBreakSplitter1" class="com.infinity.integration.ems.splitter.PageBreakSplitter"/>
<bean id="pageAssembler1" class="com.infinity.integration.ems.converter.assembler.PageAssembler">
<property name="pageSize" value="2"/>
</bean>
<bean id="pageFilter1" class="com.infinity.integration.textprocessing.filter.PageFilter">
<property name="column" value="1" />
<property name="actualValue" value="EWDETL" />
<property name="match" value="n" />
<property name="row" value="1" />
<property name="size" value="6" />
</bean>
<bean id="memoryRepository1" class="org.apache.camel.processor.aggregate.MemoryAggregationRepository" />