Text Processing

This section covers ISB's text processing functionality. The capabilities are similar to those provided by EXP EMS (Electronic Media Storage). EMS brings electronic copies of files into the workflow, creating new work or modifying existing work.

A basic use of EMS is to store and disseminate reports from backend systems to end users. A more complex use of EMS is to control tasks in the workflow based on success/failure reports of the backend systems. EMS functionality currently exists in PowerImage and EXP AG.

Control File

A control file defines how input files are read and how the extracted information is used in the workflow; in that respect an EMS control file is similar to a script. An extract is shown below:

INP1FDBP,0,-1,FDAY,FDAY,FDAY,FDAY,,,,,,COMPLETE,TEXT,T,TEXT,FDAY.PTS.INVESTOR.INP1FDBP.*,-1,-1
INP1FDBP,-1,DATE_MMDDYY,2,98
INP1FDBP,1,POWIMAGE,RPTCOFND,REPORT#,4,s,n,,2,130
INP1FDBP,2,POWIMAGE,RPTCOFND,COMPANY#,3,s,n,,1,4
INP1FDBP,3,POWIMAGE,RPTCOFND,FUND#,3,s,n,,2,4
INP1FDBP,-4,GROUP,RPTCOFND,y

File Conversion

EMS uses a control file to specify the processing details. Control files are picked up by a converter that generates a Spring configuration file. This XML file contains the definitions of the routes used to process data files.

convertems -convert -controlfile <Control File Path> -targetfile <Target File Path> -validate <Model File Path> [-verbose]
convertems -validate <Model File Path>
convertems -help

In the example command below, the control file citi_ems.fdl is processed by the converter and the generated Camel routes are stored in the file citi-camel-context.xml.

converter.bat "-convert" "-controlfile" "Citi_ems.fdl" "-targetfile" "citi-camel-context.xml"

Parameter Name   Description
controlfile      Control file location
targetfile       Generated file location
validate         Enables validation of the generated XML file against an XPDL model
modelfile        Model file location

If validation is enabled, the converter checks the data defined in the model and maps them to the data provided by the routes.

Data Extraction

The dataExtractor extracts a specific piece of data from a text block. The start point of the extraction is either given directly as (row,column) in the text block, or found by searching for a pattern targetString and starting at (offsetRow,offsetColumn) from the first occurrence of targetString. In both cases, up to maxCharacters characters are added to the extracted string. The extracted string is added to a map under the key dataId, and the map is passed as data to the process being started. dataType indicates the data type of the extracted metadata. Routes may use a chain of dataExtractor filters to extract multiple metadata items.
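
The pattern-based mode described above could be sketched as follows. This is only an illustration, not the ISB implementation: the class name, the method signature and the sample text are hypothetical, and it assumes that (offsetRow,offsetColumn) are zero-based offsets counted from the start of the first match of targetString.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of pattern-based extraction; not the actual DataExtractor class.
public class TargetStringExtractionSketch {

    /**
     * Locates the first occurrence of targetString, moves by (offsetRow, offsetColumn)
     * from the start of the match, and returns at most maxCharacters characters.
     * Returns defaultValue when the target or the resulting position does not exist.
     * Assumption: offsets are zero-based and relative to the start of the match.
     */
    static String extract(String[] lines, String targetString,
                          int offsetRow, int offsetColumn,
                          int maxCharacters, String defaultValue) {
        for (int r = 0; r < lines.length; r++) {
            int c = lines[r].indexOf(targetString);
            if (c < 0) {
                continue; // target not on this line, try the next one
            }
            int row = r + offsetRow;
            int col = c + offsetColumn;
            if (row < 0 || row >= lines.length || col < 0 || col >= lines[row].length()) {
                return defaultValue;
            }
            int end = Math.min(col + maxCharacters, lines[row].length());
            return lines[row].substring(col, end);
        }
        return defaultValue;
    }

    public static void main(String[] args) {
        // Made-up sample page, used only to exercise the sketch.
        String[] page = { "REPORT: DAILY", "FUND#: 042   COMPANY#: 007" };

        // Chain of extractions: each result is stored under its own dataId.
        Map<String, String> dataMap = new LinkedHashMap<>();
        dataMap.put("FUND", extract(page, "FUND#:", 0, 7, 3, ""));
        dataMap.put("COMPANY", extract(page, "COMPANY#:", 0, 10, 3, ""));
        System.out.println(dataMap); // prints {FUND=042, COMPANY=007}
    }
}
```

Collecting the results in one map mirrors the description above, where each extracted string is stored under its dataId before the map is handed to the started process.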

Routes Generation

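The Souk project generates Camel routes for text processing. Each route includes a file filter, a text splitter, a page assembler, one or more data extractors and the workflow directives, as in the example below.

Route example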

<route>
	<from uri="file://c:/data?filter=#fileFilter1"/>
	<split streaming="true">
		<method bean="linesplitterexample1" method="splitBody"/>
		<!--split the file and create a group for each record-->
		<aggregate strategyRef="pageassemblerexample1" aggregationRepositoryRef="pagesrepoexample1">
			<correlationExpression>
				<constant>true</constant>
			</correlationExpression>
			<completionPredicate>
				<method ref="pageassemblerexample1" method="isCompleted"/>
			</completionPredicate>
			<!--Adding Extractors...-->
			<bean ref="dataextractorexample1"/>
			<!--Adding Workflow directive-->
			<to uri="ipp:authenticate:setCurrent?user=motu&amp;password=motu"/>
			<to uri="ipp:process:start?processId=DataExtraction&amp;dataMap=${body}"/>
		</aggregate>
	</split>
</route>

File Filter

The file filter uses Camel's AntPathMatcherGenericFileFilter, which relies on Spring's AntPathMatcher, to specify the files to be included and/or excluded. Excludes take precedence over includes: if a file matches both an exclude and an include pattern, it is regarded as excluded.

<bean id="fileFilter1" class="org.apache.camel.component.file.AntPathMatcherGenericFileFilter">
		<property name="includes" value="M05_CMPCN_PBK*.TXT"/>
</bean>
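
The precedence rule can be illustrated with a small stand-alone sketch. It does not use the Camel class: it substitutes the JDK's glob matcher for Ant-style matching (the two syntaxes agree for simple `*` patterns such as the one above), and the exclude pattern `*_TEST.TXT` is a hypothetical value chosen for the example.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of "exclude wins over include"; not the Camel filter itself.
public class IncludeExcludeSketch {

    /** Returns true if the file name matches at least one of the glob patterns. */
    static boolean matchesAny(List<String> patterns, String fileName) {
        for (String pattern : patterns) {
            PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + pattern);
            if (matcher.matches(Paths.get(fileName))) {
                return true;
            }
        }
        return false;
    }

    /** Excludes are checked first, so a file matching both lists is rejected. */
    static boolean accept(List<String> includes, List<String> excludes, String fileName) {
        if (matchesAny(excludes, fileName)) {
            return false;
        }
        return matchesAny(includes, fileName);
    }

    public static void main(String[] args) {
        List<String> includes = Arrays.asList("M05_CMPCN_PBK*.TXT");
        List<String> excludes = Arrays.asList("*_TEST.TXT"); // hypothetical exclude pattern

        System.out.println(accept(includes, excludes, "M05_CMPCN_PBK_001.TXT"));  // true
        System.out.println(accept(includes, excludes, "M05_CMPCN_PBK_TEST.TXT")); // false
        System.out.println(accept(includes, excludes, "OTHER.TXT"));              // false
    }
}
```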

Data Extraction

Class: com.infinity.integration.ems.extractor.DataExtractor

This class extracts data from a text by specifying the number of characters to retrieve, their location in the text (row, column), and a search type.

Property       Description                                                        Type
maxCharacters  Maximum number of characters to retrieve                           int
dataId         The data identifier under which the extracted value will be stored String
column         The column value in the text                                       int
row            The row (line) value in the text                                   int
searchType     The search type, one of:                                           char
                 • FIRST_PAGE_FIRST_IDENTIFIER = 's'
                 • EACH_PAGE_FIRST_IDENTIFIER = 'S'
                 • FIRST_PAGE_MULTIPPLE_IDENTIFIERS = 'm'
                 • EACH_PAGE_MULTIPPLE_IDENTIFIERS = 'M'
                 • IDENTIFIER_IN_FILENAME = 'F'
defaultValue   Default value to be used in case the data is not found             String

Example

<bean id="extractQTY" class="com.infinity.integration.ems.extractor.DataExtractor">
		<property name="maxCharacters" value="3"/>
		<property name="dataId" value="QTY"/>
		<property name="row" value="2"/>
		<property name="column" value="11"/>
		<property name="searchType" value="S"/>
		<property name="defaultValue" value=""/>
</bean>

Input
1STSABC123XYZ12
2BC123XYZ1200STST66
Output
QTY=200
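
The example above can be reproduced with a minimal sketch of position-based extraction. The class and method names are hypothetical and the search type is not modelled. The sketch assumes that row and column are 1-based and that the leading digit of each input line is part of the text, which is the reading under which row 2, column 11 and three characters yield 200.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of (row, column) extraction; not the actual DataExtractor class.
public class PositionExtractionSketch {

    /**
     * Returns up to maxCharacters characters starting at the 1-based (row, column)
     * position, or defaultValue when that position lies outside the text.
     */
    static String extract(String text, int row, int column,
                          int maxCharacters, String defaultValue) {
        String[] lines = text.split("\n", -1);
        if (row < 1 || row > lines.length) {
            return defaultValue;
        }
        String line = lines[row - 1];
        if (column < 1 || column > line.length()) {
            return defaultValue;
        }
        int start = column - 1;
        int end = Math.min(start + maxCharacters, line.length());
        return line.substring(start, end);
    }

    public static void main(String[] args) {
        String input = "1STSABC123XYZ12\n2BC123XYZ1200STST66";

        // Same settings as the extractQTY bean: row 2, column 11, 3 characters.
        Map<String, String> dataMap = new LinkedHashMap<>();
        dataMap.put("QTY", extract(input, 2, 11, 3, ""));
        System.out.println(dataMap); // prints {QTY=200}
    }
}
```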

Data Extraction Strategy

Class: com.infinity.integration.ems.extractor.DataExtractionStrategy

The data extraction strategy defines a list of data extractors plus other properties (such as a reason code and a department in the following example).

Property    Description                 Type
status      Strategy status (enum?)     String
extractors  Data extractor object list  List(com.infinity.integration.ems.extractor.DataExtractor)
reasonCode  Reason code                 String
Department  Department name             String

Example

In the following example, the Instrument and Qty are extracted.

 <bean id="dataextractorexample2" class="com.infinity.integration.ems.extractor.DataExtractionStrategy">
	<property name="status" value="PROCESS"/>
	<property name="extractors">
		<list>
			<ref bean="extractorInst"/>
			<ref bean="extractorQty"/>
		</list>
	</property>
	<property name="reasonCode" value="RESAON"/>
	<property name="departement" value="CMPCN"/>
</bean>

Text splitters

Line Splitter

Splits a text into lines, using the line feed character ('\n') as the separator.

<bean id="lineSplitter" class="com.infinity.integration.ems.splitter.LineSplitter"/>

import java.io.File;
import java.util.List;

import org.apache.log4j.Logger;

// Imports of the project types (ISplitter, ServiceException, FileUtil, Line) are omitted here.
public class LineSplitter implements ISplitter
{
   /**
    * Logger for this class
    */
   private static final Logger logger = Logger.getLogger(LineSplitter.class);

   // The file content is split on the line feed character.
   private static final char LINE_DELIMITER = '\n';

   public List<Line> splitBody(File file) throws ServiceException
   {
      logger.info("splitBody --> Splitting file using LINE DELIMITER");
      // Typed list, so that the for-each loop over Line below compiles.
      List<Line> response = FileUtil.retrieveContent(file, LINE_DELIMITER);
      if (logger.isDebugEnabled())
      {
         logger.debug("\t file " + file.getAbsolutePath() + " using LINE DELIMITER");
         for (Line line : response)
         {
            logger.debug("\t" + line);
         }
      }
      logger.info("splitBody <-- File split, found <" + response.size() + "> lines...");
      return response;
   }
}

Pagebreak splitter

Splits a text into pages, using the page break as the separator.

<bean id="pageBreakSplitter" class="com.infinity.integration.ems.splitter.PageBreakSplitter"/>

Open question: is the break based on the number of lines?

Page Assembler

The page assembler's role is to aggregate a given number of pages (the page size) together.

<bean id="pageAssemble1" class="com.infinity.integration.ems.converter.assembler.PageAssembler">
	<property name="pageSize" value="1"/>
</bean>
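
The grouping behaviour can be sketched in plain Java. This is not the PageAssembler class, whose internals are not shown here: the names are hypothetical and no Camel types are used. The isCompleted method is named after the one referenced in the route's completion predicate, and keeping a trailing incomplete group is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of grouping pages by pageSize; not the actual PageAssembler.
public class PageAssemblerSketch {

    private final int pageSize;
    private final List<String> pending = new ArrayList<>();

    PageAssemblerSketch(int pageSize) {
        if (pageSize < 1) {
            throw new IllegalArgumentException("pageSize must be at least 1");
        }
        this.pageSize = pageSize;
    }

    /** Adds one page to the group currently being assembled. */
    void add(String page) {
        pending.add(page);
    }

    /** A group is complete once it holds pageSize pages. */
    boolean isCompleted() {
        return pending.size() >= pageSize;
    }

    /** Returns the current group and starts a new, empty one. */
    List<String> drain() {
        List<String> group = new ArrayList<>(pending);
        pending.clear();
        return group;
    }

    /** Groups the pages in order; a trailing incomplete group is kept (assumption). */
    static List<List<String>> assemble(List<String> pages, int pageSize) {
        PageAssemblerSketch assembler = new PageAssemblerSketch(pageSize);
        List<List<String>> groups = new ArrayList<>();
        for (String page : pages) {
            assembler.add(page);
            if (assembler.isCompleted()) {
                groups.add(assembler.drain());
            }
        }
        if (!assembler.pending.isEmpty()) {
            groups.add(assembler.drain());
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(assemble(Arrays.asList("p1", "p2", "p3"), 2)); // prints [[p1, p2], [p3]]
    }
}
```

With pageSize set to 1, as in the bean definition above, every page forms its own group.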

Workflow directives

The workflow directives include authentication details and the process name to start with its required input data.

	<to uri="ipp:authenticate:setCurrent?user=motu&amp;password=motu"/>
	<to uri="ipp:process:start?processId=DataExtraction&amp;dataMap=${body}"/>

Packages

Package                                       Classes
com.infinity.integration.ems.converter.utils  BeansUtils.java
                                              CamelContextUtils.java
                                              FileUtil.java
                                              KeyValue.java
                                              Line.java
                                              PageUtil.java
                                              StringUtil.java
com.infinity.integration.ems.converter        AttachProcessDirective.java
                                              Constant.java
                                              ConversionDirective.java
                                              Converter.java
                                              DataExtractionDirective.java
                                              DateDataExtractionDirective.java
                                              GroupingRecordDirective.java
                                              Indent.java
                                              PagesFilterDirective.java
                                              ProcessingDirective.java
                                              SplitPagesDirective.java
                                              StartProcessDirective.java
                                              StringDataExtractionDirective.java
                                              TimeDataExtractionDirective.java
                                              WorkflowDirective.java
com.infinity.integration.ems.converter.xml    ConversionDirectiveXmlGenerator.java
                                              ConversionDirectivesXmlGenerator.java
                                              FromBlockGenerator.java
                                              PagesFilterDirectiveXmlGenerator.java
                                              SpringBeanGenerator.java