SMILA 1.0 API documentation

org.eclipse.smila.importing.compounds.simple
Class SimpleCompoundExtractorService

java.lang.Object
  extended by org.eclipse.smila.importing.compounds.simple.SimpleCompoundExtractorService
All Implemented Interfaces:
CompoundExtractor

public class SimpleCompoundExtractorService
extends java.lang.Object
implements CompoundExtractor

Simple compound extractor that extracts only zip archives and gzip files.


Field Summary
protected  java.lang.String _encoding
          encoding.
protected  Log _log
          log.
protected  MimeTypeIdentifier _mimeTypeIdentifier
          mime type identifier service.
protected static java.lang.String APPLICATION_ZIP
          mime type for ZIP.
protected static java.util.Collection<java.lang.String> SUPPORTED_MIME_TYPES
          mime types for ZIP and GZIP.
 
Fields inherited from interface org.eclipse.smila.importing.compounds.CompoundExtractor
KEY_COMMENT, KEY_COMPOUNDS, KEY_COMPRESSED_SIZE, KEY_FILE_NAME, KEY_IS_COMPOUND, KEY_IS_ROOT_COMPOUND_RECORD, KEY_SIZE, KEY_TIME
 
Constructor Summary
SimpleCompoundExtractorService()
           
 
Method Summary
protected  void activate()
          service activation.
 boolean canExtract(java.io.File file)
          Can the file be extracted by the CompoundExtractor service?
 boolean canExtract(java.lang.String fileName, java.lang.String mimeType)
          Checks whether this service can handle the given file name and mime type.
 boolean canExtract(java.net.URL url, java.lang.String mimeType)
          Can the file be extracted by the CompoundExtractor service?
protected  void deactivate()
          service deactivation.
 java.util.Iterator<Record> extract(java.io.InputStream compoundInputStream, java.lang.String fileName, java.lang.String contentAttachmentName)
          Extract the compounds (recursively) and return an iterator over the resulting records that have been created from the extracted compound.
 java.util.Iterator<Record> extract(java.io.InputStream compoundInputStream, java.lang.String fileName, java.lang.String mimeType, java.lang.String contentAttachmentName)
          Extract the compounds (recursively) and return an iterator over the resulting records that have been created from the extracted compound.
 void setMimeTypeIdentifier(MimeTypeIdentifier mimeTypeIdentifier)
           
 void unsetMimeTypeIdentifier(MimeTypeIdentifier mimeTypeIdentifier)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

APPLICATION_ZIP

protected static final java.lang.String APPLICATION_ZIP
mime type for ZIP.

See Also:
Constant Field Values

SUPPORTED_MIME_TYPES

protected static final java.util.Collection<java.lang.String> SUPPORTED_MIME_TYPES
mime types for ZIP and GZIP.


_mimeTypeIdentifier

protected MimeTypeIdentifier _mimeTypeIdentifier
mime type identifier service.


_log

protected final Log _log
log.


_encoding

protected java.lang.String _encoding
encoding.

Constructor Detail

SimpleCompoundExtractorService

public SimpleCompoundExtractorService()
Method Detail

activate

protected void activate()
service activation.


deactivate

protected void deactivate()
service deactivation.


canExtract

public boolean canExtract(java.io.File file)
Can the file be extracted by the CompoundExtractor service? The service may or may not inspect the file more closely, or may simply guess by the file extension. A true result therefore does not guarantee that the file can be extracted without exceptions.

Specified by:
canExtract in interface CompoundExtractor
Parameters:
file - the file in question.
Returns:
true if the given file can be extracted, false if not.
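
As a rough illustration of the "guess by the file extension" case, the following sketch uses a hypothetical helper, looksExtractable, that is not part of SMILA. The extensions .zip and .gz are assumptions based on the class description, and the real service may decide differently. The helper never opens the file, which is why a true result is only a hint.

```java
import java.io.File;
import java.util.Locale;

final class ExtensionGuessSketch {

  /** Hypothetical helper: guesses from the file name alone, without reading the file. */
  static boolean looksExtractable(File file) {
    // Assumed extensions; the actual service may use other criteria.
    String name = file.getName().toLowerCase(Locale.ROOT);
    return name.endsWith(".zip") || name.endsWith(".gz");
  }

  public static void main(String[] args) {
    System.out.println(looksExtractable(new File("archive.ZIP"))); // prints true
    System.out.println(looksExtractable(new File("notes.txt")));   // prints false
  }
}
```

A text file renamed to broken.zip would also pass this check, so callers should still be prepared for extraction to fail.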

canExtract

public boolean canExtract(java.net.URL url,
                          java.lang.String mimeType)
Can the file be extracted by the CompoundExtractor service? The service may or may not inspect the file more closely, or may simply guess by the given mime type and the file extension. A true result therefore does not guarantee that the file can be extracted without exceptions.

Specified by:
canExtract in interface CompoundExtractor
Parameters:
url - URL in question
mimeType - mimetype (if any could be determined)
Returns:

canExtract

public boolean canExtract(java.lang.String fileName,
                          java.lang.String mimeType)
Checks whether this service can handle the given file name and mime type.

Specified by:
canExtract in interface CompoundExtractor
Parameters:
fileName - the name of the file in question.
mimeType - mimetype (if any could be determined)
Returns:
true if the file with the given name and mime type can be extracted, false if not.
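
One way such a check could combine the mime type with the file name is sketched below. This is not the SMILA implementation: the class, the canHandle method, the mime type strings, and the fallback to the file extension when no mime type could be determined are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Locale;

final class MimeTypeCheckSketch {

  // Assumed values; the real SUPPORTED_MIME_TYPES constants are not listed on this page.
  static final Collection<String> ASSUMED_MIME_TYPES =
      new HashSet<>(Arrays.asList("application/zip", "application/gzip"));

  /** Hypothetical check: prefer the mime type, fall back to the extension if it is unknown. */
  static boolean canHandle(String fileName, String mimeType) {
    if (mimeType != null) {
      return ASSUMED_MIME_TYPES.contains(mimeType.toLowerCase(Locale.ROOT));
    }
    if (fileName == null) {
      return false;
    }
    String name = fileName.toLowerCase(Locale.ROOT);
    return name.endsWith(".zip") || name.endsWith(".gz");
  }

  public static void main(String[] args) {
    System.out.println(canHandle("data.bin", "application/zip")); // prints true
    System.out.println(canHandle("data.zip", null));              // prints true
    System.out.println(canHandle("data.zip", "text/plain"));      // prints false
  }
}
```

In this sketch a known mime type always wins over the extension, which is a design choice of the example and not a documented behaviour of the service.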

extract

public java.util.Iterator<Record> extract(java.io.InputStream compoundInputStream,
                                          java.lang.String fileName,
                                          java.lang.String contentAttachmentName)
                                   throws CompoundExtractorException
Extract the compounds (recursively) and return an iterator over the resulting records that have been created from the extracted compound. The extractor should also return a Record for the compound itself, even if the content of that record is empty.

Specified by:
extract in interface CompoundExtractor
Parameters:
compoundInputStream - the input stream of the compound object.
fileName - the name of the file in question.
contentAttachmentName - name of attachment to store content of extracted elements in.
Returns:
an iterator over the records that resulted from the entries included in the compound, along with their content. The iterator is never null; it is empty if there are no records to extract.
Throws:
CompoundExtractorException

extract

public java.util.Iterator<Record> extract(java.io.InputStream compoundInputStream,
                                          java.lang.String fileName,
                                          java.lang.String mimeType,
                                          java.lang.String contentAttachmentName)
                                   throws CompoundExtractorException
Extract the compounds (recursively) and return an iterator over the resulting records that have been created from the extracted compound. The extractor should also return a Record for the compound itself, even if the content of that record is empty. This extract method extracts entries on the fly, i.e. you must not close the input stream before consuming the last record from the iterator.

Specified by:
extract in interface CompoundExtractor
Parameters:
compoundInputStream - the input stream of the compound object.
fileName - the name of the file in question.
mimeType - mimetype (if any could be determined)
contentAttachmentName - name of attachment to store content of extracted elements in.
Returns:
an iterator over the records that resulted from the entries included in the compound, along with their content. The iterator is never null; it is empty if there are no records to extract.
Throws:
CompoundExtractorException
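
The on-the-fly behaviour can be illustrated with plain java.util.zip classes. The sketch below does not use the SMILA API: LazyZipSketch and its methods are hypothetical, and the iterator yields entry names where the real service yields Record objects. It shows the calling pattern the description asks for, namely consuming the whole iterator before the input stream is closed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class LazyZipSketch {

  /** Returns an iterator that reads the next zip entry only when it is asked for. */
  static Iterator<String> entryNames(InputStream in) {
    final ZipInputStream zip = new ZipInputStream(in);
    return new Iterator<String>() {
      private ZipEntry current;
      private boolean fetched;

      private void fetch() {
        if (!fetched) {
          try {
            current = zip.getNextEntry(); // reads from the underlying stream here
          } catch (IOException e) {
            throw new UncheckedIOException(e);
          }
          fetched = true;
        }
      }

      @Override
      public boolean hasNext() {
        fetch();
        return current != null;
      }

      @Override
      public String next() {
        fetch();
        if (current == null) {
          throw new NoSuchElementException();
        }
        fetched = false; // the following call advances to the next entry
        return current.getName();
      }
    };
  }

  /** Builds a small in-memory zip with the given entry names (demo data only). */
  static byte[] sampleZip(String... names) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
      for (String name : names) {
        zip.putNextEntry(new ZipEntry(name));
        zip.write(name.getBytes(StandardCharsets.UTF_8));
        zip.closeEntry();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  /** Drains the iterator inside the try block, so the stream is closed only afterwards. */
  static List<String> readAll(byte[] zipBytes) {
    List<String> names = new ArrayList<>();
    try (InputStream in = new ByteArrayInputStream(zipBytes)) {
      Iterator<String> it = entryNames(in);
      while (it.hasNext()) {
        names.add(it.next());
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return names;
  }

  public static void main(String[] args) {
    System.out.println(readAll(sampleZip("a.txt", "b.txt"))); // prints [a.txt, b.txt]
  }
}
```

Returning the iterator from inside the try-with-resources block and consuming it later would read from a stream that has already been closed, which is the situation the method description warns against.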

setMimeTypeIdentifier

public void setMimeTypeIdentifier(MimeTypeIdentifier mimeTypeIdentifier)
Parameters:
mimeTypeIdentifier - the mimeTypeIdentifier to set

unsetMimeTypeIdentifier

public void unsetMimeTypeIdentifier(MimeTypeIdentifier mimeTypeIdentifier)
Parameters:
mimeTypeIdentifier - the mimeTypeIdentifier to unset
