org.eclipse.jgit.util
Class RawParseUtils

java.lang.Object
  extended by org.eclipse.jgit.util.RawParseUtils

public final class RawParseUtils
extends Object

Handy utility functions to parse raw object contents.


Field Summary
static Charset UTF8_CHARSET
          UTF-8 charset constant.
 
Method Summary
static int author(byte[] b, int ptr)
          Locate the "author " header line data.
static int commitMessage(byte[] b, int ptr)
          Locate the position of the commit message body.
static int committer(byte[] b, int ptr)
          Locate the "committer " header line data.
static String decode(byte[] buffer)
          Decode a buffer under UTF-8, if possible.
static String decode(byte[] buffer, int start, int end)
          Decode a buffer under UTF-8, if possible.
static String decode(Charset cs, byte[] buffer)
          Decode a buffer under the specified character set if possible.
static String decode(Charset cs, byte[] buffer, int start, int end)
          Decode a region of the buffer under the specified character set if possible.
static String decodeNoFallback(Charset cs, byte[] buffer, int start, int end)
          Decode a region of the buffer under the specified character set if possible.
static int encoding(byte[] b, int ptr)
          Locate the "encoding " header line.
static int endOfFooterLineKey(byte[] raw, int ptr)
          Locate the end of a footer line key string.
static int endOfParagraph(byte[] b, int start)
          Locate the end of a paragraph.
static String extractBinaryString(byte[] buffer, int start, int end)
          Decode a region of the buffer under the ISO-8859-1 encoding.
static int formatBase10(byte[] b, int o, int value)
          Format a base 10 numeric into a temporary buffer.
static IntList lineMap(byte[] buf, int ptr, int end)
          Index the region between [ptr, end) to find line starts.
static int match(byte[] b, int ptr, byte[] src)
          Determine if the bytes of b starting at ptr match src.
static int next(byte[] b, int ptr, char chrA)
          Locate the first position after a given character.
static int nextLF(byte[] b, int ptr)
          Locate the first position after the next LF.
static int nextLF(byte[] b, int ptr, char chrA)
          Locate the first position after either the given character or LF.
static int parseBase10(byte[] b, int ptr, MutableInteger ptrResult)
          Parse a base 10 numeric from a sequence of ASCII digits into an int.
static Charset parseEncoding(byte[] b)
          Parse the "encoding " header into a character set reference.
static int parseHexInt16(byte[] bs, int p)
          Parse 4 character base 16 (hex) formatted string to unsigned integer.
static int parseHexInt32(byte[] bs, int p)
          Parse 8 character base 16 (hex) formatted string to unsigned integer.
static int parseHexInt4(byte digit)
          Parse a single hex digit to its numeric value (0-15).
static long parseLongBase10(byte[] b, int ptr, MutableInteger ptrResult)
          Parse a base 10 numeric from a sequence of ASCII digits into a long.
static PersonIdent parsePersonIdent(byte[] raw, int nameB)
          Parse a name line (e.g. author, committer, tagger) into a PersonIdent.
static PersonIdent parsePersonIdent(String in)
          Parse a name string (e.g. author, committer, tagger) into a PersonIdent.
static PersonIdent parsePersonIdentOnly(byte[] raw, int nameB)
          Parse name data (e.g. as within a reflog) into a PersonIdent.
static int parseTimeZoneOffset(byte[] b, int ptr)
          Parse a Git style timezone string.
static int prev(byte[] b, int ptr, char chrA)
          Locate the first position before a given character.
static int prevLF(byte[] b, int ptr)
          Locate the first position before the previous LF.
static int prevLF(byte[] b, int ptr, char chrA)
          Locate the previous position before either the given character or LF.
static int tagger(byte[] b, int ptr)
          Locate the "tagger " header line data.
static int tagMessage(byte[] b, int ptr)
          Locate the position of the tag message body.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

UTF8_CHARSET

public static final Charset UTF8_CHARSET
UTF-8 charset constant.

Since:
2.2
Method Detail

match

public static final int match(byte[] b,
                              int ptr,
                              byte[] src)
Determine if the bytes of b starting at ptr match src.

Parameters:
b - the buffer to scan.
ptr - first position within b, this should match src[0].
src - the buffer to test for equality with b.
Returns:
ptr + src.length if the region b[ptr, ptr + src.length) equals src; else -1.
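
The contract above can be illustrated with a small stand-alone sketch. It is not the JGit source: the class name MatchSketch and the explicit bounds check are assumptions made for this example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-alone sketch of the documented contract; not the JGit source.
final class MatchSketch {
    /** Returns ptr + src.length if b[ptr, ptr + src.length) equals src, else -1. */
    static int match(byte[] b, int ptr, byte[] src) {
        if (ptr < 0 || ptr + src.length > b.length) {
            return -1; // assumption: too few bytes left counts as "no match"
        }
        for (int i = 0; i < src.length; i++) {
            if (b[ptr + i] != src[i]) {
                return -1;
            }
        }
        return ptr + src.length;
    }

    public static void main(String[] args) {
        byte[] raw = "author A. U. Thor".getBytes(StandardCharsets.US_ASCII);
        byte[] key = "author ".getBytes(StandardCharsets.US_ASCII);
        System.out.println(match(raw, 0, key)); // 7
        System.out.println(match(raw, 1, key)); // -1
    }
}
```

Returning the position after the match lets a caller continue parsing from the returned value without recomputing the key length.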

formatBase10

public static int formatBase10(byte[] b,
                               int o,
                               int value)
Format a base 10 numeric into a temporary buffer.

Formatting is performed backwards. The method starts at offset o-1 and ends at o-digits, where digits is the number of positions necessary to store the base 10 value.

The argument and return values from this method make it easy to chain writing, for example:

 final byte[] tmp = new byte[64];
 int ptr = tmp.length;
 tmp[--ptr] = '\n';
 ptr = RawParseUtils.formatBase10(tmp, ptr, 32);
 tmp[--ptr] = ' ';
 ptr = RawParseUtils.formatBase10(tmp, ptr, 18);
 tmp[--ptr] = 0;
 final String str = new String(tmp, ptr, tmp.length - ptr);
 

Parameters:
b - buffer to write into.
o - one offset past the location where writing will begin; writing proceeds towards lower index values.
value - the value to store.
Returns:
the new offset value o. This is the position of the last byte written. Additional writing should start at one position earlier.
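
For readers who want to see the backwards-filling technique itself, here is a minimal stand-alone sketch. It is not the JGit source: the class name FormatBase10Sketch is hypothetical, and the handling of zero and of negative values is an assumption of this example.

```java
import java.nio.charset.StandardCharsets;

// Stand-alone sketch of backwards base 10 formatting; not the JGit source.
final class FormatBase10Sketch {
    static int formatBase10(byte[] b, int o, int value) {
        if (value == 0) {
            b[--o] = '0';
            return o;
        }
        boolean negative = value < 0;
        // Widen to long so that Integer.MIN_VALUE can be negated safely.
        long v = negative ? -(long) value : value;
        while (v != 0) {
            b[--o] = (byte) ('0' + v % 10); // least significant digit first
            v /= 10;
        }
        if (negative) {
            b[--o] = '-'; // assumption: the sign is written in front of the digits
        }
        return o;
    }

    public static void main(String[] args) {
        byte[] tmp = new byte[16];
        int ptr = tmp.length;
        tmp[--ptr] = '\n';
        ptr = formatBase10(tmp, ptr, 32);
        tmp[--ptr] = ' ';
        ptr = formatBase10(tmp, ptr, 18);
        // Prints "18 32" followed by the newline stored in the buffer.
        System.out.print(new String(tmp, ptr, tmp.length - ptr, StandardCharsets.US_ASCII));
    }
}
```

Writing from the end of the buffer avoids counting digits first; the returned offset is where the formatted text begins.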

parseBase10

public static final int parseBase10(byte[] b,
                                    int ptr,
                                    MutableInteger ptrResult)
Parse a base 10 numeric from a sequence of ASCII digits into an int.

Digit sequences can begin with an optional run of spaces, and may start with a '+' or a '-' to indicate the sign. Any other character will cause the method to stop and return the current result to the caller.

Parameters:
b - buffer to scan.
ptr - position within buffer to start parsing digits at.
ptrResult - optional location to return the new ptr value through. If null the ptr value will be discarded.
Returns:
the value at this location; 0 if the location is not a valid numeric.
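
The parsing rules above (optional spaces, optional sign, stop at the first non-digit) can be sketched as follows. This is an illustration only, not the JGit source: the class name ParseBase10Sketch is hypothetical, an int[] of length 1 stands in for MutableInteger, and overflow is not handled.

```java
import java.nio.charset.StandardCharsets;

// Stand-alone sketch of the documented parsing rules; not the JGit source.
final class ParseBase10Sketch {
    /**
     * @param ptrResult hypothetical stand-in for MutableInteger: when non-null,
     *                  element 0 receives the position where parsing stopped.
     */
    static int parseBase10(byte[] b, int ptr, int[] ptrResult) {
        int r = 0;
        boolean negative = false;
        while (ptr < b.length && b[ptr] == ' ') {
            ptr++; // skip the optional run of leading spaces
        }
        if (ptr < b.length && (b[ptr] == '+' || b[ptr] == '-')) {
            negative = b[ptr] == '-';
            ptr++;
        }
        while (ptr < b.length && b[ptr] >= '0' && b[ptr] <= '9') {
            r = r * 10 + (b[ptr] - '0'); // overflow is ignored in this sketch
            ptr++;
        }
        if (ptrResult != null) {
            ptrResult[0] = ptr;
        }
        return negative ? -r : r;
    }

    public static void main(String[] args) {
        byte[] raw = "  -42 rest".getBytes(StandardCharsets.US_ASCII);
        int[] end = new int[1];
        System.out.println(parseBase10(raw, 0, end)); // -42
        System.out.println(end[0]);                   // 5
    }
}
```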

parseLongBase10

public static final long parseLongBase10(byte[] b,
                                         int ptr,
                                         MutableInteger ptrResult)
Parse a base 10 numeric from a sequence of ASCII digits into a long.

Digit sequences can begin with an optional run of spaces, and may start with a '+' or a '-' to indicate the sign. Any other character will cause the method to stop and return the current result to the caller.

Parameters:
b - buffer to scan.
ptr - position within buffer to start parsing digits at.
ptrResult - optional location to return the new ptr value through. If null the ptr value will be discarded.
Returns:
the value at this location; 0 if the location is not a valid numeric.

parseHexInt16

public static final int parseHexInt16(byte[] bs,
                                      int p)
Parse 4 character base 16 (hex) formatted string to unsigned integer.

The number is read in network byte order, that is, most significant nybble first.

Parameters:
bs - buffer to parse digits from; positions [p, p+4) will be parsed.
p - first position within the buffer to parse.
Returns:
the integer value.
Throws:
ArrayIndexOutOfBoundsException - if the string is not hex formatted.

parseHexInt32

public static final int parseHexInt32(byte[] bs,
                                      int p)
Parse 8 character base 16 (hex) formatted string to unsigned integer.

The number is read in network byte order, that is, most significant nybble first.

Parameters:
bs - buffer to parse digits from; positions [p, p+8) will be parsed.
p - first position within the buffer to parse.
Returns:
the integer value.
Throws:
ArrayIndexOutOfBoundsException - if the string is not hex formatted.

parseHexInt4

public static final int parseHexInt4(byte digit)
Parse a single hex digit to its numeric value (0-15).

Parameters:
digit - hex character to parse.
Returns:
numeric value, in the range 0-15.
Throws:
ArrayIndexOutOfBoundsException - if the input digit is not a valid hex digit.
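
One way the hex parsers could produce ArrayIndexOutOfBoundsException for bad input is a table lookup keyed by the byte value. The sketch below illustrates that idea under stated assumptions; it is not the JGit source, and the names HexSketch and DIGITS16 are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Stand-alone sketch of table-driven hex parsing; not the JGit source.
final class HexSketch {
    // Lookup table indexed by ASCII code; -1 marks bytes that are not hex digits.
    private static final byte[] DIGITS16 = new byte['f' + 1];
    static {
        Arrays.fill(DIGITS16, (byte) -1);
        for (char c = '0'; c <= '9'; c++) DIGITS16[c] = (byte) (c - '0');
        for (char c = 'a'; c <= 'f'; c++) DIGITS16[c] = (byte) (c - 'a' + 10);
        for (char c = 'A'; c <= 'F'; c++) DIGITS16[c] = (byte) (c - 'A' + 10);
    }

    static int parseHexInt4(byte digit) {
        // A byte outside the table bounds makes the array access itself throw.
        byte r = DIGITS16[digit];
        if (r < 0) {
            throw new ArrayIndexOutOfBoundsException("not a hex digit: " + digit);
        }
        return r;
    }

    static int parseHexInt16(byte[] bs, int p) {
        int r = 0;
        for (int i = 0; i < 4; i++) {
            r = (r << 4) | parseHexInt4(bs[p + i]); // most significant nybble first
        }
        return r;
    }

    public static void main(String[] args) {
        System.out.println(parseHexInt16("00ff".getBytes(StandardCharsets.US_ASCII), 0)); // 255
    }
}
```

The same loop run over 8 positions instead of 4 would give the 32-bit variant.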

parseTimeZoneOffset

public static final int parseTimeZoneOffset(byte[] b,
                                            int ptr)
Parse a Git style timezone string.

The sequence "-0315" will be parsed as the numeric value -195, as the lower two positions count minutes, not 100ths of an hour.

Parameters:
b - buffer to scan.
ptr - position within buffer to start parsing digits at.
Returns:
the timezone at this location, expressed in minutes.
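
The arithmetic behind the "-0315" example can be sketched as below: the digits are read as one number, then split into hours and minutes. This is a stand-alone illustration, not the JGit source; the class name TimeZoneSketch is hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Stand-alone sketch of the hours/minutes arithmetic; not the JGit source.
final class TimeZoneSketch {
    static int parseTimeZoneOffset(byte[] b, int ptr) {
        boolean negative = false;
        if (ptr < b.length && (b[ptr] == '+' || b[ptr] == '-')) {
            negative = b[ptr] == '-';
            ptr++;
        }
        int v = 0;
        while (ptr < b.length && b[ptr] >= '0' && b[ptr] <= '9') {
            v = v * 10 + (b[ptr] - '0');
            ptr++;
        }
        int hours = v / 100;   // upper positions count hours
        int minutes = v % 100; // lower two positions count minutes
        int total = hours * 60 + minutes;
        return negative ? -total : total;
    }

    public static void main(String[] args) {
        byte[] tz = "-0315".getBytes(StandardCharsets.US_ASCII);
        System.out.println(parseTimeZoneOffset(tz, 0)); // -195
    }
}
```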

next

public static final int next(byte[] b,
                             int ptr,
                             char chrA)
Locate the first position after a given character.

Parameters:
b - buffer to scan.
ptr - position within buffer to start looking for chrA at.
chrA - character to find.
Returns:
new position just after chrA.

nextLF

public static final int nextLF(byte[] b,
                               int ptr)
Locate the first position after the next LF.

This method stops on the first '\n' it finds.

Parameters:
b - buffer to scan.
ptr - position within buffer to start looking for LF at.
Returns:
new position just after the first LF found.

nextLF

public static final int nextLF(byte[] b,
                               int ptr,
                               char chrA)
Locate the first position after either the given character or LF.

This method stops on the first match it finds from either chrA or '\n'.

Parameters:
b - buffer to scan.
ptr - position within buffer to start looking for chrA or LF at.
chrA - character to find.
Returns:
new position just after the first chrA or LF to be found.
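
The forward scanners above share one pattern: advance until the wanted byte is consumed, then return the position after it. A minimal stand-alone sketch follows. It is not the JGit source; the class name NextSketch is hypothetical, and returning b.length when nothing is found is an assumption of this example.

```java
import java.nio.charset.StandardCharsets;

// Stand-alone sketch of the forward scanning pattern; not the JGit source.
final class NextSketch {
    static int next(byte[] b, int ptr, char chrA) {
        while (ptr < b.length) {
            if (b[ptr++] == chrA) {
                return ptr; // position just after chrA
            }
        }
        return ptr; // assumption: end of buffer when chrA is absent
    }

    static int nextLF(byte[] b, int ptr, char chrA) {
        while (ptr < b.length) {
            byte c = b[ptr++];
            if (c == chrA || c == '\n') {
                return ptr; // position just after whichever came first
            }
        }
        return ptr;
    }

    public static void main(String[] args) {
        byte[] raw = "a b\nc".getBytes(StandardCharsets.US_ASCII);
        System.out.println(next(raw, 0, ' '));   // 2
        System.out.println(nextLF(raw, 0, 'x')); // 4
    }
}
```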

prev

public static final int prev(byte[] b,
                             int ptr,
                             char chrA)
Locate the first position before a given character.

Parameters:
b - buffer to scan.
ptr - position within buffer to start looking for chrA at.
chrA - character to find.
Returns:
new position just before chrA, or -1 if not found.

prevLF

public static final int prevLF(byte[] b,
                               int ptr)
Locate the first position before the previous LF.

This method stops on the first '\n' it finds.

Parameters:
b - buffer to scan.
ptr - position within buffer to start looking for LF at.
Returns:
new position just before the first LF found, or -1 if not found.

prevLF

public static final int prevLF(byte[] b,
                               int ptr,
                               char chrA)
Locate the previous position before either the given character or LF.

This method stops on the first match it finds from either chrA or '\n'.

Parameters:
b - buffer to scan.
ptr - position within buffer to start looking for chrA or LF at.
chrA - character to find.
Returns:
new position just before the first chrA or LF to be found, or -1 if not found.

lineMap

public static final IntList lineMap(byte[] buf,
                                    int ptr,
                                    int end)
Index the region between [ptr, end) to find line starts.

The returned list is 1 indexed. Index 0 contains Integer.MIN_VALUE to pad the list out.

Using a 1 indexed list means that line numbers can be directly accessed from the list, so list.get(1) (aka get line 1) returns ptr.

The last element (index map.size()-1) always contains end.

Parameters:
buf - buffer to scan.
ptr - position within the buffer corresponding to the first byte of line 1.
end - 1 past the end of the content within buf.
Returns:
a line map indexing the start position of each line.
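
The layout of the returned map (padding at index 0, line starts from index 1, end as the last element) can be sketched as follows. This is an illustration only, not the JGit source: the class name LineMapSketch is hypothetical and a java.util.List<Integer> stands in for IntList.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Stand-alone sketch of the documented map layout; not the JGit source.
final class LineMapSketch {
    static List<Integer> lineMap(byte[] buf, int ptr, int end) {
        List<Integer> map = new ArrayList<>();
        map.add(Integer.MIN_VALUE); // index 0 is padding, so line numbers are 1 based
        while (ptr < end) {
            map.add(ptr); // start of the current line
            ptr = afterNextLF(buf, ptr, end);
        }
        map.add(end); // last element always holds end
        return map;
    }

    // Returns the position just after the next LF, or end if there is none.
    private static int afterNextLF(byte[] buf, int ptr, int end) {
        while (ptr < end) {
            if (buf[ptr++] == '\n') {
                return ptr;
            }
        }
        return end;
    }

    public static void main(String[] args) {
        byte[] raw = "a\nbc\n".getBytes(StandardCharsets.US_ASCII);
        List<Integer> map = lineMap(raw, 0, raw.length);
        System.out.println(map.get(1)); // 0, start of line 1
        System.out.println(map.get(2)); // 2, start of line 2
        System.out.println(map.get(3)); // 5, end
    }
}
```

With this layout, the length of line n is map.get(n + 1) - map.get(n), including its terminating LF if present.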

author

public static final int author(byte[] b,
                               int ptr)
Locate the "author " header line data.

Parameters:
b - buffer to scan.
ptr - position in buffer to start the scan at. Most callers should pass 0 to ensure the scan starts from the beginning of the commit buffer and does not accidentally look at the message body.
Returns:
position just after the space in "author ", so the first character of the author's name. If no author header can be located -1 is returned.
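
The reason for starting at the beginning of the buffer is that a header scan can stop at the blank line that separates headers from the message. The sketch below illustrates that idea for any header name. It is not the JGit source: the class name HeaderScanSketch and the method header are hypothetical, and ptr is assumed to point at the start of a line.

```java
import java.nio.charset.StandardCharsets;

// Stand-alone sketch of a header scan that never enters the body; not the JGit source.
final class HeaderScanSketch {
    /** Returns the position just after "name ", or -1 if the headers end first. */
    static int header(byte[] b, int ptr, String name) {
        byte[] key = (name + " ").getBytes(StandardCharsets.US_ASCII);
        while (ptr < b.length) {
            if (b[ptr] == '\n') {
                return -1; // blank line: the message body starts here
            }
            if (startsWith(b, ptr, key)) {
                return ptr + key.length;
            }
            // Skip to the start of the next line.
            while (ptr < b.length && b[ptr++] != '\n') {
                // keep scanning
            }
        }
        return -1;
    }

    private static boolean startsWith(byte[] b, int ptr, byte[] key) {
        if (ptr + key.length > b.length) {
            return false;
        }
        for (int i = 0; i < key.length; i++) {
            if (b[ptr + i] != key[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] commit = "tree 1234\nauthor A. U. Thor\n\nbody\n"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(header(commit, 0, "author")); // 17
    }
}
```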

committer

public static final int committer(byte[] b,
                                  int ptr)
Locate the "committer " header line data.

Parameters:
b - buffer to scan.
ptr - position in buffer to start the scan at. Most callers should pass 0 to ensure the scan starts from the beginning of the commit buffer and does not accidentally look at the message body.
Returns:
position just after the space in "committer ", so the first character of the committer's name. If no committer header can be located -1 is returned.

tagger

public static final int tagger(byte[] b,
                               int ptr)
Locate the "tagger " header line data.

Parameters:
b - buffer to scan.
ptr - position in buffer to start the scan at. Most callers should pass 0 to ensure the scan starts from the beginning of the tag buffer and does not accidentally look at the message body.
Returns:
position just after the space in "tagger ", so the first character of the tagger's name. If no tagger header can be located -1 is returned.

encoding

public static final int encoding(byte[] b,
                                 int ptr)
Locate the "encoding " header line.

Parameters:
b - buffer to scan.
ptr - position in buffer to start the scan at. Most callers should pass 0 to ensure the scan starts from the beginning of the buffer and does not accidentally look at the message body.
Returns:
position just after the space in "encoding ", so the first character of the encoding's name. If no encoding header can be located -1 is returned (and UTF-8 should be assumed).

parseEncoding

public static Charset parseEncoding(byte[] b)
Parse the "encoding " header into a character set reference.

Locates the "encoding " header (if present) by first calling encoding(byte[], int) and then returns the proper character set to apply to this buffer to evaluate its contents as character data.

If no encoding header is present, Constants.CHARSET is assumed.

Parameters:
b - buffer to scan.
Returns:
the Java character set representation. Never null.

parsePersonIdent

public static PersonIdent parsePersonIdent(String in)
Parse a name string (e.g. author, committer, tagger) into a PersonIdent.

Leading spaces are not trimmed from the string, so they will show up in the parsed name.

Parameters:
in - the string to parse a name from.
Returns:
the parsed identity or null in case the identity could not be parsed.

parsePersonIdent

public static PersonIdent parsePersonIdent(byte[] raw,
                                           int nameB)
Parse a name line (e.g. author, committer, tagger) into a PersonIdent.

When passing in a value for nameB callers should use the return value of author(byte[], int) or committer(byte[], int), as these methods provide the proper position within the buffer.

Parameters:
raw - the buffer to parse character data from.
nameB - first position of the identity information. This should be the first position after the space which delimits the header field name (e.g. "author" or "committer") from the rest of the identity line.
Returns:
the parsed identity or null in case the identity could not be parsed.

parsePersonIdentOnly

public static PersonIdent parsePersonIdentOnly(byte[] raw,
                                               int nameB)
Parse name data (e.g. as within a reflog) into a PersonIdent.

When passing in a value for nameB callers should use the return value of author(byte[], int) or committer(byte[], int), as these methods provide the proper position within the buffer.

Parameters:
raw - the buffer to parse character data from.
nameB - first position of the identity information. This should be the first position after the space which delimits the header field name (e.g. "author" or "committer") from the rest of the identity line.
Returns:
the parsed identity. Never null.

endOfFooterLineKey

public static int endOfFooterLineKey(byte[] raw,
                                     int ptr)
Locate the end of a footer line key string.

If the region at raw[ptr] matches ^[A-Za-z0-9-]+: (e.g. "Signed-off-by: A. U. Thor\n") then this method returns the position of the first ':'.

If the region at raw[ptr] does not match ^[A-Za-z0-9-]+: then this method returns -1.

Parameters:
raw - buffer to scan.
ptr - first position within raw to consider as a footer line key.
Returns:
position of the ':' which terminates the footer line key if this is otherwise a valid footer line key; otherwise -1.
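
The pattern ^[A-Za-z0-9-]+: can be checked with a single forward scan, sketched below. This is a stand-alone illustration, not the JGit source; the class name FooterKeySketch is hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Stand-alone sketch of the footer key scan; not the JGit source.
final class FooterKeySketch {
    static int endOfFooterLineKey(byte[] raw, int ptr) {
        int start = ptr;
        while (ptr < raw.length) {
            byte c = raw[ptr];
            boolean keyChar = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9') || c == '-';
            if (keyChar) {
                ptr++;
                continue;
            }
            // The key needs at least one character before the ':'.
            return (c == ':' && ptr > start) ? ptr : -1;
        }
        return -1; // buffer ended before any ':' was seen
    }

    public static void main(String[] args) {
        byte[] line = "Signed-off-by: A. U. Thor\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(endOfFooterLineKey(line, 0)); // 13
    }
}
```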

decode

public static String decode(byte[] buffer)
Decode a buffer under UTF-8, if possible. If the byte stream cannot be decoded that way, the platform default is tried and if that too fails, the fail-safe ISO-8859-1 encoding is tried.

Parameters:
buffer - buffer to pull raw bytes from.
Returns:
a string representation of the entire buffer, after decoding it as UTF-8 or one of the fallback character sets.

decode

public static String decode(byte[] buffer,
                            int start,
                            int end)
Decode a buffer under UTF-8, if possible. If the byte stream cannot be decoded that way, the platform default is tried and if that too fails, the fail-safe ISO-8859-1 encoding is tried.

Parameters:
buffer - buffer to pull raw bytes from.
start - first position within the buffer to take data from.
end - one position past the last location within the buffer to take data from.
Returns:
a string representation of the range [start,end), after decoding the region as UTF-8 or one of the fallback character sets.

decode

public static String decode(Charset cs,
                            byte[] buffer)
Decode a buffer under the specified character set if possible. If the byte stream cannot be decoded that way, the platform default is tried and if that too fails, the fail-safe ISO-8859-1 encoding is tried.

Parameters:
cs - character set to use when decoding the buffer.
buffer - buffer to pull raw bytes from.
Returns:
a string representation of the entire buffer, after decoding it through the specified character set or one of the fallback character sets.

decode

public static String decode(Charset cs,
                            byte[] buffer,
                            int start,
                            int end)
Decode a region of the buffer under the specified character set if possible. If the byte stream cannot be decoded that way, the platform default is tried and if that too fails, the fail-safe ISO-8859-1 encoding is tried.

Parameters:
cs - character set to use when decoding the buffer.
buffer - buffer to pull raw bytes from.
start - first position within the buffer to take data from.
end - one position past the last location within the buffer to take data from.
Returns:
a string representation of the range [start,end), after decoding the region through the specified character set.
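
The fallback chain described for the decode methods (requested character set, then platform default, then ISO-8859-1) can be sketched with the standard java.nio.charset API. This is an illustration only, not the JGit source; the class name DecodeSketch and the helper strict are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Stand-alone sketch of the documented fallback chain; not the JGit source.
final class DecodeSketch {
    // Decodes strictly: any malformed or unmappable input raises an exception.
    private static String strict(Charset cs, byte[] buf, int start, int end)
            throws CharacterCodingException {
        return cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(buf, start, end - start))
                .toString();
    }

    static String decode(Charset cs, byte[] buf, int start, int end) {
        try {
            return strict(cs, buf, start, end);
        } catch (CharacterCodingException e) {
            // fall through to the platform default
        }
        try {
            return strict(Charset.defaultCharset(), buf, start, end);
        } catch (CharacterCodingException e) {
            // fall through to the fail-safe encoding
        }
        // ISO-8859-1 maps every byte to a character, so this step cannot fail.
        return new String(buf, start, end - start, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] raw = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(StandardCharsets.UTF_8, raw, 0, raw.length)); // hello
    }
}
```

A variant without the final ISO-8859-1 step would propagate the CharacterCodingException instead, which is the behavior documented for decodeNoFallback.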

decodeNoFallback

public static String decodeNoFallback(Charset cs,
                                      byte[] buffer,
                                      int start,
                                      int end)
                               throws CharacterCodingException
Decode a region of the buffer under the specified character set if possible. If the byte stream cannot be decoded that way, the platform default is tried and if that too fails, an exception is thrown.

Parameters:
cs - character set to use when decoding the buffer.
buffer - buffer to pull raw bytes from.
start - first position within the buffer to take data from.
end - one position past the last location within the buffer to take data from.
Returns:
a string representation of the range [start,end), after decoding the region through the specified character set.
Throws:
CharacterCodingException - the input is not in any of the tested character sets.

extractBinaryString

public static String extractBinaryString(byte[] buffer,
                                         int start,
                                         int end)
Decode a region of the buffer under the ISO-8859-1 encoding. Each byte is treated as a single character in the 8859-1 character encoding, performing a raw binary->char conversion.

Parameters:
buffer - buffer to pull raw bytes from.
start - first position within the buffer to take data from.
end - one position past the last location within the buffer to take data from.
Returns:
a string representation of the range [start,end).
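
The raw binary->char conversion amounts to widening each byte to an unsigned value, as in this minimal sketch. It is not the JGit source, and the class name BinaryStringSketch is hypothetical.

```java
// Stand-alone sketch of a byte-per-char conversion; not the JGit source.
final class BinaryStringSketch {
    static String extractBinaryString(byte[] buffer, int start, int end) {
        StringBuilder r = new StringBuilder(end - start);
        for (int i = start; i < end; i++) {
            // Mask to 0..255 so that bytes above 0x7f do not become negative chars.
            r.append((char) (buffer[i] & 0xff));
        }
        return r.toString();
    }

    public static void main(String[] args) {
        byte[] raw = {0x41, (byte) 0xff};
        String s = extractBinaryString(raw, 0, raw.length);
        System.out.println(s.length());       // 2
        System.out.println((int) s.charAt(1)); // 255
    }
}
```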

commitMessage

public static final int commitMessage(byte[] b,
                                      int ptr)
Locate the position of the commit message body.

Parameters:
b - buffer to scan.
ptr - position in buffer to start the scan at. Most callers should pass 0 to ensure the scan starts from the beginning of the commit buffer.
Returns:
position of the user's message buffer.
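
In a commit buffer the message follows the first blank line after the headers. The simplified sketch below locates it by scanning line by line. It is not the JGit source: the class name MessageStartSketch is hypothetical, ptr is assumed to point at the start of a line, and returning b.length when no blank line exists is an assumption of this example.

```java
import java.nio.charset.StandardCharsets;

// Simplified stand-alone sketch of locating the message body; not the JGit source.
final class MessageStartSketch {
    static int messageStart(byte[] b, int ptr) {
        while (ptr < b.length) {
            if (b[ptr] == '\n') {
                return ptr + 1; // first byte after the blank separator line
            }
            // Skip the current header line, including its LF.
            while (ptr < b.length && b[ptr++] != '\n') {
                // keep scanning
            }
        }
        return b.length; // assumption: no blank line means an empty message
    }

    public static void main(String[] args) {
        byte[] commit = "tree 1234\nauthor x\n\nSubject\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(messageStart(commit, 0)); // 20
    }
}
```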

tagMessage

public static final int tagMessage(byte[] b,
                                   int ptr)
Locate the position of the tag message body.

Parameters:
b - buffer to scan.
ptr - position in buffer to start the scan at. Most callers should pass 0 to ensure the scan starts from the beginning of the tag buffer.
Returns:
position of the user's message buffer.

endOfParagraph

public static final int endOfParagraph(byte[] b,
                                       int start)
Locate the end of a paragraph.

A paragraph is ended by two consecutive LF bytes.

Parameters:
b - buffer to scan.
start - position in buffer to start the scan at. Most callers will want to pass the first position of the commit message (as found by commitMessage(byte[], int)).
Returns:
position of the LF at the end of the paragraph; b.length if no paragraph end could be located.
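
The rule "a paragraph is ended by two consecutive LF bytes" can be sketched as a simple scan. This stand-alone illustration covers only that documented rule and is not the JGit source; the class name ParagraphSketch is hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Stand-alone sketch of the documented two-LF rule; not the JGit source.
final class ParagraphSketch {
    static int endOfParagraph(byte[] b, int start) {
        for (int ptr = start; ptr + 1 < b.length; ptr++) {
            if (b[ptr] == '\n' && b[ptr + 1] == '\n') {
                return ptr; // the LF that ends the paragraph's last line
            }
        }
        return b.length; // no paragraph end found
    }

    public static void main(String[] args) {
        byte[] msg = "subject\n\nbody".getBytes(StandardCharsets.US_ASCII);
        System.out.println(endOfParagraph(msg, 0)); // 7
    }
}
```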


Copyright © 2013. All Rights Reserved.