SMILA (incubation) API documentation

org.eclipse.smila.utils.file
Class EncodingHelper

java.lang.Object
  extended by org.eclipse.smila.utils.file.EncodingHelper

public final class EncodingHelper
extends java.lang.Object

Utility class to help with common encoding problems.


Field Summary
static java.lang.String ENCODING_UTF_16BE
          Constant for the encoding UTF-16BE.
static java.lang.String ENCODING_UTF_16LE
          Constant for the encoding UTF-16LE.
static java.lang.String ENCODING_UTF_32BE
          Constant for the encoding UTF-32BE.
static java.lang.String ENCODING_UTF_32LE
          Constant for the encoding UTF-32LE.
static java.lang.String ENCODING_UTF_8
          Constant for the encoding UTF-8.
 
Method Summary
static java.lang.String convertToString(byte[] bytes)
          Converts a given byte[] to a String.
static java.lang.String getEncoding(byte[] bytes)
          Reads bytes and detects the encoding based on a potential BOM or on XML or HTML encoding information.
static java.lang.String getEncodingFromBOM(byte[] bom)
          Reads bytes and detects the encoding based on a potential BOM.
static java.lang.String getEncodingFromContent(byte[] bytes)
          Reads bytes and detects the encoding based on potential XML or HTML encoding information in tags.
static boolean isMarkup(byte[] bytes)
          Checks if the given byte array represents some kind of markup language (XML, HTML), by checking if the first non-whitespace character is a <.
static boolean isSupportedEncoding(java.lang.String charset)
          Checks if the given charset is supported by the current Java VM.
static byte[] removeBOM(byte[] originalBytes)
          Checks whether originalBytes contains a BOM and, if so, removes it from the byte array.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

ENCODING_UTF_32BE

public static final java.lang.String ENCODING_UTF_32BE
Constant for the encoding UTF-32BE.

See Also:
Constant Field Values

ENCODING_UTF_32LE

public static final java.lang.String ENCODING_UTF_32LE
Constant for the encoding UTF-32LE.

See Also:
Constant Field Values

ENCODING_UTF_8

public static final java.lang.String ENCODING_UTF_8
Constant for the encoding UTF-8.

See Also:
Constant Field Values

ENCODING_UTF_16BE

public static final java.lang.String ENCODING_UTF_16BE
Constant for the encoding UTF-16BE.

See Also:
Constant Field Values

ENCODING_UTF_16LE

public static final java.lang.String ENCODING_UTF_16LE
Constant for the encoding UTF-16LE.

See Also:
Constant Field Values

Method Detail

convertToString

public static java.lang.String convertToString(byte[] bytes)
                                        throws java.io.IOException
Converts a given byte[] to a String. The method tries to detect the encoding of the bytes by checking for a BOM and for markup encoding information. If no encoding is detected, or the detected encoding is invalid, the method tries to convert to a String using UTF-8. If this fails, it tries to convert using the platform's default encoding.

Parameters:
bytes - the bytes to convert to String
Returns:
the converted String
Throws:
java.io.IOException - if any error occurs
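
The fallback order described above can be sketched with standard JDK classes only. This is an illustration under stated assumptions, not the SMILA implementation: the class DecodeFallbackSketch, the method decode and the parameter detectedEncoding (standing in for the result of the detection step) are hypothetical names. Whether convertToString strips a BOM from its result is not stated here, so the sketch leaves the bytes untouched.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for the documented fallback chain; not SMILA code. */
final class DecodeFallbackSketch {
  private DecodeFallbackSketch() {
  }

  /** detectedEncoding is assumed to come from a prior detection step; it may be null. */
  static String decode(byte[] bytes, String detectedEncoding) {
    // Step 1: use the detected encoding if the JVM knows it.
    if (detectedEncoding != null) {
      try {
        if (Charset.isSupported(detectedEncoding)) {
          return new String(bytes, Charset.forName(detectedEncoding));
        }
      } catch (IllegalArgumentException e) {
        // Illegal charset name: fall through to UTF-8.
      }
    }
    // Step 2: strict UTF-8, reporting malformed input instead of replacing it.
    try {
      return StandardCharsets.UTF_8.newDecoder()
          .onMalformedInput(CodingErrorAction.REPORT)
          .onUnmappableCharacter(CodingErrorAction.REPORT)
          .decode(ByteBuffer.wrap(bytes))
          .toString();
    } catch (CharacterCodingException e) {
      // Step 3: platform default; undecodable bytes become replacement characters.
      return new String(bytes, Charset.defaultCharset());
    }
  }
}
```

The strict decoder matters in step 2: new String(bytes, UTF_8) never fails, so it could not trigger the fallback to the platform default.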

isSupportedEncoding

public static boolean isSupportedEncoding(java.lang.String charset)
Checks if the given charset is supported by the current Java VM.

Parameters:
charset - the name of the charset.
Returns:
true if the charset is supported, false otherwise
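
A check of this kind can be built on java.nio.charset.Charset.isSupported, which throws for null or syntactically illegal names instead of returning false. The following minimal sketch shows one way to fold those cases into a boolean result; CharsetSupportSketch and its method are hypothetical names, and how EncodingHelper itself treats null or illegal names is not documented here.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

/** Hypothetical stand-in; not the SMILA implementation. */
final class CharsetSupportSketch {
  private CharsetSupportSketch() {
  }

  /** Returns true only if the JVM can provide a Charset for the given name. */
  static boolean isSupported(String charset) {
    if (charset == null) {
      return false; // Charset.isSupported(null) would throw IllegalArgumentException
    }
    try {
      return Charset.isSupported(charset);
    } catch (IllegalCharsetNameException e) {
      return false; // syntactically illegal name, for example one containing spaces
    }
  }
}
```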

getEncoding

public static java.lang.String getEncoding(byte[] bytes)
                                    throws java.io.IOException
Reads bytes and detects the encoding based on a potential BOM or on XML or HTML encoding information.

Parameters:
bytes - the byte[] to detect an encoding in
Returns:
the encoding of the bytes, or null if encoding could not be detected
Throws:
java.io.IOException - if any error occurs

getEncodingFromBOM

public static java.lang.String getEncodingFromBOM(byte[] bom)
Reads bytes and detects the encoding based on a potential BOM.

Parameters:
bom - the byte[] to detect a BOM in
Returns:
the encoding of the bytes, or null if encoding could not be detected
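
BOM detection for the five encodings listed in the Field Summary comes down to comparing the leading bytes against the standard Unicode byte order marks. The sketch below is an illustration only; BomDetectSketch and its methods are hypothetical names, and the order of checks used inside EncodingHelper is an assumption.

```java
/** Hypothetical stand-in; not the SMILA implementation. */
final class BomDetectSketch {
  private BomDetectSketch() {
  }

  /** Returns the encoding name implied by a leading BOM, or null if none is found. */
  static String detect(byte[] b) {
    if (b == null) {
      return null;
    }
    // UTF-32 is checked first: its LE BOM (FF FE 00 00) starts with the UTF-16LE BOM (FF FE).
    if (startsWith(b, 0x00, 0x00, 0xFE, 0xFF)) {
      return "UTF-32BE";
    }
    if (startsWith(b, 0xFF, 0xFE, 0x00, 0x00)) {
      return "UTF-32LE";
    }
    if (startsWith(b, 0xEF, 0xBB, 0xBF)) {
      return "UTF-8";
    }
    if (startsWith(b, 0xFE, 0xFF)) {
      return "UTF-16BE";
    }
    if (startsWith(b, 0xFF, 0xFE)) {
      return "UTF-16LE";
    }
    return null;
  }

  /** Compares unsigned byte values of data against the expected prefix. */
  private static boolean startsWith(byte[] data, int... prefix) {
    if (data.length < prefix.length) {
      return false;
    }
    for (int i = 0; i < prefix.length; i++) {
      if ((data[i] & 0xFF) != prefix[i]) {
        return false;
      }
    }
    return true;
  }
}
```

Checking the four-byte marks before the two-byte marks is a design choice with a known trade-off: UTF-16LE text whose first character after the BOM is U+0000 is indistinguishable from a UTF-32LE BOM.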

removeBOM

public static byte[] removeBOM(byte[] originalBytes)
Checks whether originalBytes contains a BOM and, if so, removes it from the byte array. The number of bytes removed depends on whether the encoding uses a BOM. If the encoding does not use a BOM, the originalBytes are returned; otherwise the modified byte[] is returned.

Parameters:
originalBytes - the bytes to check for and remove the BOM
Returns:
the originalBytes if no BOM was found and removed, otherwise the originalBytes without the BOM
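
The contract above (return the input unchanged when no BOM is present, otherwise a shortened copy) can be sketched as follows. BomStripSketch and strip are hypothetical names, and the table of marks is the standard Unicode set for the encodings named in the Field Summary, not code taken from SMILA.

```java
import java.util.Arrays;

/** Hypothetical stand-in; not the SMILA implementation. */
final class BomStripSketch {
  // Longer marks come first so FF FE 00 00 (UTF-32LE) is not cut as FF FE (UTF-16LE).
  private static final int[][] BOMS = {
    {0x00, 0x00, 0xFE, 0xFF}, // UTF-32BE
    {0xFF, 0xFE, 0x00, 0x00}, // UTF-32LE
    {0xEF, 0xBB, 0xBF},       // UTF-8
    {0xFE, 0xFF},             // UTF-16BE
    {0xFF, 0xFE},             // UTF-16LE
  };

  private BomStripSketch() {
  }

  /** Returns the same array if no BOM is found, otherwise a copy without the BOM. */
  static byte[] strip(byte[] original) {
    if (original == null) {
      return null;
    }
    for (int[] bom : BOMS) {
      if (startsWith(original, bom)) {
        return Arrays.copyOfRange(original, bom.length, original.length);
      }
    }
    return original;
  }

  private static boolean startsWith(byte[] data, int[] prefix) {
    if (data.length < prefix.length) {
      return false;
    }
    for (int i = 0; i < prefix.length; i++) {
      if ((data[i] & 0xFF) != prefix[i]) {
        return false;
      }
    }
    return true;
  }
}
```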

getEncodingFromContent

public static java.lang.String getEncodingFromContent(byte[] bytes)
                                               throws java.io.IOException
Reads bytes and detects the encoding based on potential XML or HTML encoding information in tags. Returns the encoding if the document is XML or HTML and an encoding is defined; null otherwise. Stops searching for an encoding. Does not allow a BOM at the start of the bytes.

Parameters:
bytes - the byte[] to detect an encoding in
Returns:
the encoding of the bytes, or null if encoding could not be detected
Throws:
java.io.IOException - if any error occurs
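
One common way to read declared encodings from markup is to decode a prefix of the bytes as ISO-8859-1 (which maps every byte to exactly one character) and search it for an XML declaration or an HTML meta tag. The sketch below illustrates that technique only. ContentEncodingSketch, the 1024-byte scan limit and both regular expressions are assumptions, not SMILA's actual parsing rules, and the approach works only for ASCII-compatible encodings.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in; not the SMILA implementation. */
final class ContentEncodingSketch {
  private static final int MAX_SCAN = 1024; // assumed limit on how far to look

  // <?xml version="1.0" encoding="..."?> at the very start of the document
  private static final Pattern XML_DECL = Pattern.compile(
      "^\\s*<\\?xml[^>]*?encoding\\s*=\\s*[\"']([A-Za-z0-9._:-]+)[\"']",
      Pattern.CASE_INSENSITIVE);

  // <meta charset="..."> or <meta ... content="text/html; charset=...">
  private static final Pattern HTML_META = Pattern.compile(
      "<meta[^>]*?charset\\s*=\\s*[\"']?([A-Za-z0-9._:-]+)",
      Pattern.CASE_INSENSITIVE);

  private ContentEncodingSketch() {
  }

  /** Returns the declared encoding name as written in the markup, or null. */
  static String detect(byte[] bytes) {
    if (bytes == null) {
      return null;
    }
    int n = Math.min(bytes.length, MAX_SCAN);
    String head = new String(bytes, 0, n, StandardCharsets.ISO_8859_1);
    Matcher xml = XML_DECL.matcher(head);
    if (xml.find()) {
      return xml.group(1);
    }
    Matcher meta = HTML_META.matcher(head);
    if (meta.find()) {
      return meta.group(1);
    }
    return null;
  }
}
```

The returned name is whatever the document declares, so a caller would typically still validate it (for example with isSupportedEncoding) before decoding.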

isMarkup

public static boolean isMarkup(byte[] bytes)
                        throws java.io.IOException
Checks if the given byte array represents some kind of markup language (XML, HTML), by checking if the first non-whitespace character is a <. Does not allow a BOM at the start of the bytes.

Parameters:
bytes - the byte[] to check for markup content
Returns:
true if the bytes contain xml or html markup, false otherwise
Throws:
java.io.IOException - if any error occurs
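
The heuristic described above (skip leading whitespace, then test for <) is simple enough to sketch directly. MarkupCheckSketch and looksLikeMarkup are hypothetical names; the sketch assumes an ASCII-compatible encoding and treats only space, tab, CR and LF as whitespace, which may differ from what EncodingHelper does.

```java
/** Hypothetical stand-in; not the SMILA implementation. */
final class MarkupCheckSketch {
  private MarkupCheckSketch() {
  }

  /** Returns true if the first non-whitespace byte is '<'. */
  static boolean looksLikeMarkup(byte[] bytes) {
    if (bytes == null) {
      return false;
    }
    for (byte b : bytes) {
      if (b == ' ' || b == '\t' || b == '\n' || b == '\r') {
        continue; // skip leading ASCII whitespace
      }
      // The first non-whitespace byte decides; a leading BOM byte therefore yields false.
      return b == '<';
    }
    return false; // empty or whitespace-only input
  }
}
```

Because a BOM byte is not whitespace, input that still carries a BOM is reported as non-markup, which matches the "does not allow a BOM" note; removing the BOM first avoids that.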
