public final class RawParseUtils extends Object
Modifier and Type | Field and Description |
---|---|
static Charset | UTF8_CHARSET - UTF-8 charset constant. |

Modifier and Type | Method and Description |
---|---|
static int | author(byte[] b, int ptr) - Locate the "author " header line data. |
static int | commitMessage(byte[] b, int ptr) - Locate the position of the commit message body. |
static int | committer(byte[] b, int ptr) - Locate the "committer " header line data. |
static String | decode(byte[] buffer) - Decode a buffer under UTF-8, if possible. |
static String | decode(byte[] buffer, int start, int end) - Decode a buffer under UTF-8, if possible. |
static String | decode(Charset cs, byte[] buffer) - Decode a buffer under the specified character set if possible. |
static String | decode(Charset cs, byte[] buffer, int start, int end) - Decode a region of the buffer under the specified character set if possible. |
static String | decodeNoFallback(Charset cs, byte[] buffer, int start, int end) - Decode a region of the buffer under the specified character set if possible. |
static int | encoding(byte[] b, int ptr) - Locate the "encoding " header line. |
static int | endOfFooterLineKey(byte[] raw, int ptr) - Locate the end of a footer line key string. |
static int | endOfParagraph(byte[] b, int start) - Locate the end of a paragraph. |
static String | extractBinaryString(byte[] buffer, int start, int end) - Decode a region of the buffer under the ISO-8859-1 encoding. |
static int | formatBase10(byte[] b, int o, int value) - Format a base 10 numeric into a temporary buffer. |
static IntList | lineMap(byte[] buf, int ptr, int end) - Index the region between [ptr, end) to find line starts. |
static int | match(byte[] b, int ptr, byte[] src) - Determine if b[ptr] matches src. |
static int | next(byte[] b, int ptr, char chrA) - Locate the first position after a given character. |
static int | nextLF(byte[] b, int ptr) - Locate the first position after the next LF. |
static int | nextLF(byte[] b, int ptr, char chrA) - Locate the first position after either the given character or LF. |
static int | parseBase10(byte[] b, int ptr, MutableInteger ptrResult) - Parse a base 10 numeric from a sequence of ASCII digits into an int. |
static Charset | parseEncoding(byte[] b) - Parse the "encoding " header into a character set reference. |
static int | parseHexInt16(byte[] bs, int p) - Parse 4 character base 16 (hex) formatted string to unsigned integer. |
static int | parseHexInt32(byte[] bs, int p) - Parse 8 character base 16 (hex) formatted string to unsigned integer. |
static int | parseHexInt4(byte digit) - Parse a single hex digit to its numeric value (0-15). |
static long | parseLongBase10(byte[] b, int ptr, MutableInteger ptrResult) - Parse a base 10 numeric from a sequence of ASCII digits into a long. |
static PersonIdent | parsePersonIdent(byte[] raw, int nameB) - Parse a name line (e.g. |
static PersonIdent | parsePersonIdent(String in) - Parse a name string (e.g. |
static PersonIdent | parsePersonIdentOnly(byte[] raw, int nameB) - Parse a name data (e.g. |
static int | parseTimeZoneOffset(byte[] b, int ptr) - Parse a Git style timezone string. |
static int | prev(byte[] b, int ptr, char chrA) - Locate the first position before a given character. |
static int | prevLF(byte[] b, int ptr) - Locate the first position before the previous LF. |
static int | prevLF(byte[] b, int ptr, char chrA) - Locate the previous position before either the given character or LF. |
static int | tagger(byte[] b, int ptr) - Locate the "tagger " header line data. |
static int | tagMessage(byte[] b, int ptr) - Locate the position of the tag message body. |

public static final Charset UTF8_CHARSET
public static final int match(byte[] b, int ptr, byte[] src)
Parameters:
b - the buffer to scan.
ptr - first position within b, this should match src[0].
src - the buffer to test for equality with b.

public static int formatBase10(byte[] b, int o, int value)
Formatting is performed backwards. The method starts at offset o-1 and ends at o-digits, where digits is the number of positions necessary to store the base 10 value.
The argument and return values from this method make it easy to chain writing, for example:
final byte[] tmp = new byte[64];
int ptr = tmp.length;
tmp[--ptr] = '\n';
ptr = RawParseUtils.formatBase10(tmp, ptr, 32);
tmp[--ptr] = ' ';
ptr = RawParseUtils.formatBase10(tmp, ptr, 18);
tmp[--ptr] = 0;
final String str = new String(tmp, ptr, tmp.length - ptr);
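To make the backwards-writing convention concrete, the following is a standalone sketch that mimics the documented behavior without depending on JGit. The class name FormatBase10Sketch and its method bodies are assumptions for illustration, not the library's implementation.

```java
import java.nio.charset.StandardCharsets;

class FormatBase10Sketch {
    // Hypothetical stand-in for RawParseUtils.formatBase10 (not JGit's code).
    // Writes digits from o-1 downwards and returns the index of the last byte written.
    static int formatBase10(byte[] b, int o, int value) {
        if (value == 0) {
            b[--o] = '0';
            return o;
        }
        boolean negative = value < 0;
        // Use a long so that Integer.MIN_VALUE can be made positive safely.
        long v = Math.abs((long) value);
        while (v != 0) {
            b[--o] = (byte) ('0' + (v % 10));
            v /= 10;
        }
        if (negative) {
            b[--o] = '-';
        }
        return o;
    }

    // Chains two writes, right to left, into one temporary buffer.
    static String demo() {
        byte[] tmp = new byte[16];
        int ptr = tmp.length;
        ptr = formatBase10(tmp, ptr, 32);
        tmp[--ptr] = ' ';
        ptr = formatBase10(tmp, ptr, -18);
        return new String(tmp, ptr, tmp.length - ptr, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "-18 32"
    }
}
```

Because each call returns the position of the last byte written, the caller can keep decrementing the same index to prepend separators between values.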
Parameters:
b - buffer to write into.
o - one offset past the location where writing will begin; writing proceeds towards lower index values.
value - the value to store.
Returns:
the new offset value o. This is the position of the last byte written. Additional writing should start at one position earlier.

public static final int parseBase10(byte[] b, int ptr, MutableInteger ptrResult)
Digit sequences can begin with an optional run of spaces before the sequence, and may start with a '+' or a '-' to indicate sign position. Any other characters will cause the method to stop and return the current result to the caller.
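The scanning rules above (skip leading spaces, accept one optional sign, stop at the first non-digit) can be sketched in plain Java. This is an illustration only: the class ParseBase10Sketch is hypothetical, and an int[1] array stands in for JGit's MutableInteger so the example has no external dependencies.

```java
import java.nio.charset.StandardCharsets;

class ParseBase10Sketch {
    // Hypothetical stand-in for RawParseUtils.parseBase10 (not JGit's code).
    // ptrResult may be null; otherwise ptrResult[0] receives the stop position.
    static int parseBase10(byte[] b, int ptr, int[] ptrResult) {
        int r = 0;
        boolean negative = false;
        // Optional run of spaces before the digit sequence.
        while (ptr < b.length && b[ptr] == ' ') {
            ptr++;
        }
        // Optional sign.
        if (ptr < b.length && (b[ptr] == '-' || b[ptr] == '+')) {
            negative = b[ptr] == '-';
            ptr++;
        }
        // Digits; any other character stops the scan.
        while (ptr < b.length && b[ptr] >= '0' && b[ptr] <= '9') {
            r = r * 10 + (b[ptr] - '0');
            ptr++;
        }
        if (ptrResult != null) {
            ptrResult[0] = ptr;
        }
        return negative ? -r : r;
    }

    public static void main(String[] args) {
        byte[] buf = "  -42 rest".getBytes(StandardCharsets.US_ASCII);
        int[] end = new int[1];
        int value = parseBase10(buf, 0, end);
        System.out.println(value + " stopped at " + end[0]); // prints "-42 stopped at 5"
    }
}
```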
Parameters:
b - buffer to scan.
ptr - position within buffer to start parsing digits at.
ptrResult - optional location to return the new ptr value through. If null the ptr value will be discarded.

public static final long parseLongBase10(byte[] b, int ptr, MutableInteger ptrResult)
Digit sequences can begin with an optional run of spaces before the sequence, and may start with a '+' or a '-' to indicate sign position. Any other characters will cause the method to stop and return the current result to the caller.
Parameters:
b - buffer to scan.
ptr - position within buffer to start parsing digits at.
ptrResult - optional location to return the new ptr value through. If null the ptr value will be discarded.

public static final int parseHexInt16(byte[] bs, int p)
The number is read in network byte order, that is, most significant nybble first.
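Reading "most significant nybble first" means each new hex digit shifts the accumulated value left by four bits. The sketch below shows that idea in isolation. ParseHexSketch is a hypothetical class, and unlike the documented method it signals bad input with IllegalArgumentException rather than ArrayIndexOutOfBoundsException.

```java
import java.nio.charset.StandardCharsets;

class ParseHexSketch {
    // Converts one ASCII hex character to its value 0-15.
    static int hexDigit(byte c) {
        int v = Character.digit(c & 0xff, 16);
        if (v < 0) {
            // Assumption: the sketch uses a different exception type than JGit.
            throw new IllegalArgumentException("not a hex digit: " + (char) (c & 0xff));
        }
        return v;
    }

    // Hypothetical stand-in for parseHexInt16: parses positions [p, p+4).
    static int parseHexInt16(byte[] bs, int p) {
        int r = 0;
        for (int i = 0; i < 4; i++) {
            r = (r << 4) | hexDigit(bs[p + i]);
        }
        return r;
    }

    public static void main(String[] args) {
        byte[] buf = "ffff".getBytes(StandardCharsets.US_ASCII);
        System.out.println(parseHexInt16(buf, 0)); // prints 65535
    }
}
```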
Parameters:
bs - buffer to parse digits from; positions [p, p+4) will be parsed.
p - first position within the buffer to parse.
Throws:
ArrayIndexOutOfBoundsException - if the string is not hex formatted.

public static final int parseHexInt32(byte[] bs, int p)
The number is read in network byte order, that is, most significant nybble first.
Parameters:
bs - buffer to parse digits from; positions [p, p+8) will be parsed.
p - first position within the buffer to parse.
Throws:
ArrayIndexOutOfBoundsException - if the string is not hex formatted.

public static final int parseHexInt4(byte digit)
Parameters:
digit - hex character to parse.
Throws:
ArrayIndexOutOfBoundsException - if the input digit is not a valid hex digit.

public static final int parseTimeZoneOffset(byte[] b, int ptr)
The sequence "-0315" will be parsed as the numeric value -195, as the lower two positions count minutes, not 100ths of an hour.
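The arithmetic behind that example is: split the signed number into hours (value / 100) and minutes (value % 100), then combine them as hours * 60 + minutes. A minimal sketch follows, assuming a String input for brevity instead of the byte buffer and offset the real method takes. TimeZoneOffsetSketch is a hypothetical name.

```java
class TimeZoneOffsetSketch {
    // Hypothetical stand-in: converts "+hhmm" / "-hhmm" to minutes from UTC.
    static int parseTimeZoneOffset(String tz) {
        int v = Integer.parseInt(tz); // accepts a leading '+' or '-'
        int hours = v / 100;          // e.g. -315 / 100 == -3
        int minutes = v % 100;        // e.g. -315 % 100 == -15
        return hours * 60 + minutes;
    }

    public static void main(String[] args) {
        System.out.println(parseTimeZoneOffset("-0315")); // prints -195
        System.out.println(parseTimeZoneOffset("+0530")); // prints 330
    }
}
```

Java's integer division and remainder both truncate toward zero, so the sign carries through to the hours and minutes parts consistently.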
Parameters:
b - buffer to scan.
ptr - position within buffer to start parsing digits at.

public static final int next(byte[] b, int ptr, char chrA)
Parameters:
b - buffer to scan.
ptr - position within buffer to start looking for chrA at.
chrA - character to find.

public static final int nextLF(byte[] b, int ptr)
This method stops on the first '\n' it finds.
Parameters:
b - buffer to scan.
ptr - position within buffer to start looking for LF at.

public static final int nextLF(byte[] b, int ptr, char chrA)
This method stops on the first match it finds from either chrA or '\n'.
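A short sketch of this kind of forward scan, written without JGit, may help. It returns the first position after whichever of chrA or '\n' appears first. The class NextLFSketch is hypothetical, and returning the buffer length when neither character is found is an assumption of this sketch.

```java
import java.nio.charset.StandardCharsets;

class NextLFSketch {
    // Hypothetical stand-in for nextLF(byte[], int, char) (not JGit's code).
    static int nextLF(byte[] b, int ptr, char chrA) {
        while (ptr < b.length) {
            byte c = b[ptr++];
            if (c == chrA || c == '\n') {
                return ptr; // first position after the match
            }
        }
        return ptr; // assumption: end of buffer when nothing matched
    }

    public static void main(String[] args) {
        byte[] line = "author A <a@b>\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(nextLF(line, 0, '<')); // prints 10
    }
}
```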
Parameters:
b - buffer to scan.
ptr - position within buffer to start looking for chrA or LF at.
chrA - character to find.

public static final int prev(byte[] b, int ptr, char chrA)
Parameters:
b - buffer to scan.
ptr - position within buffer to start looking for chrA at.
chrA - character to find.

public static final int prevLF(byte[] b, int ptr)
This method stops on the first '\n' it finds.
Parameters:
b - buffer to scan.
ptr - position within buffer to start looking for LF at.

public static final int prevLF(byte[] b, int ptr, char chrA)
This method stops on the first match it finds from either chrA or '\n'.
Parameters:
b - buffer to scan.
ptr - position within buffer to start looking for chrA or LF at.
chrA - character to find.

public static final IntList lineMap(byte[] buf, int ptr, int end)
Index the region between [ptr, end) to find line starts.
The returned list is 1 indexed. Index 0 contains Integer.MIN_VALUE to pad the list out.
Using a 1 indexed list means that line numbers can be directly accessed from the list, so list.get(1) (aka get line 1) returns ptr.
The last element (index map.size()-1) always contains end.
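The layout of the returned list is easier to see with a small example. The sketch below builds the same shape of index using a java.util.List<Integer> in place of JGit's IntList. LineMapSketch is a hypothetical class and the code is an illustration of the documented layout, not the library's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.nio.charset.StandardCharsets;

class LineMapSketch {
    // Hypothetical stand-in for RawParseUtils.lineMap (not JGit's code).
    static List<Integer> lineMap(byte[] buf, int ptr, int end) {
        List<Integer> map = new ArrayList<>();
        map.add(Integer.MIN_VALUE); // index 0 is padding so that line 1 is at index 1
        while (ptr < end) {
            map.add(ptr); // start of the current line
            // Advance to the first position after the next LF, or to end.
            while (ptr < end && buf[ptr] != '\n') {
                ptr++;
            }
            if (ptr < end) {
                ptr++;
            }
        }
        map.add(end); // last element always holds end
        return map;
    }

    public static void main(String[] args) {
        byte[] buf = "a\nbc\n".getBytes(StandardCharsets.US_ASCII);
        List<Integer> map = lineMap(buf, 0, buf.length);
        System.out.println(map.get(1)); // start of line 1: prints 0
        System.out.println(map.get(2)); // start of line 2: prints 2
        System.out.println(map.get(map.size() - 1)); // prints 5
    }
}
```

With this layout, the bytes of line n span from map.get(n) up to, but not including, map.get(n + 1).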
Parameters:
buf - buffer to scan.
ptr - position within the buffer corresponding to the first byte of line 1.
end - 1 past the end of the content within buf.

public static final int author(byte[] b, int ptr)
Parameters:
b - buffer to scan.
ptr - position in buffer to start the scan at. Most callers should pass 0 to ensure the scan starts from the beginning of the commit buffer and does not accidentally look at message body.

public static final int committer(byte[] b, int ptr)
Parameters:
b - buffer to scan.
ptr - position in buffer to start the scan at. Most callers should pass 0 to ensure the scan starts from the beginning of the commit buffer and does not accidentally look at message body.

public static final int tagger(byte[] b, int ptr)
Parameters:
b - buffer to scan.
ptr - position in buffer to start the scan at. Most callers should pass 0 to ensure the scan starts from the beginning of the tag buffer and does not accidentally look at message body.

public static final int encoding(byte[] b, int ptr)
Parameters:
b - buffer to scan.
ptr - position in buffer to start the scan at. Most callers should pass 0 to ensure the scan starts from the beginning of the buffer and does not accidentally look at the message body.

public static Charset parseEncoding(byte[] b)
Locates the "encoding " header (if present) by first calling encoding(byte[], int) and then returns the proper character set to apply to this buffer to evaluate its contents as character data.
If no encoding header is present, Constants.CHARSET is assumed.
Parameters:
b - buffer to scan.

public static PersonIdent parsePersonIdent(String in)
Leading spaces won't be trimmed from the string, i.e. will show up in the parsed name afterwards.
Parameters:
in - the string to parse a name from.

public static PersonIdent parsePersonIdent(byte[] raw, int nameB)
When passing in a value for nameB callers should use the return value of author(byte[], int) or committer(byte[], int), as these methods provide the proper position within the buffer.
Parameters:
raw - the buffer to parse character data from.
nameB - first position of the identity information. This should be the first position after the space which delimits the header field name (e.g. "author" or "committer") from the rest of the identity line.

public static PersonIdent parsePersonIdentOnly(byte[] raw, int nameB)
When passing in a value for nameB callers should use the return value of author(byte[], int) or committer(byte[], int), as these methods provide the proper position within the buffer.
Parameters:
raw - the buffer to parse character data from.
nameB - first position of the identity information. This should be the first position after the space which delimits the header field name (e.g. "author" or "committer") from the rest of the identity line.

public static int endOfFooterLineKey(byte[] raw, int ptr)
If the region at raw[ptr] matches ^[A-Za-z0-9-]+: (e.g. "Signed-off-by: A. U. Thor\n") then this method returns the position of the first ':'.
If the region at raw[ptr] does not match ^[A-Za-z0-9-]+: then this method returns -1.
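The pattern check can be sketched as a single forward scan over key characters. The example below follows the documented pattern ^[A-Za-z0-9-]+: literally, so it requires at least one key character before the colon. FooterKeySketch is a hypothetical class, and how the real method treats edge cases such as an empty key is not covered here.

```java
import java.nio.charset.StandardCharsets;

class FooterKeySketch {
    // Hypothetical stand-in for endOfFooterLineKey (not JGit's code).
    static int endOfFooterLineKey(byte[] raw, int ptr) {
        int start = ptr;
        while (ptr < raw.length) {
            byte c = raw[ptr];
            boolean keyChar = (c >= 'A' && c <= 'Z')
                    || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9')
                    || c == '-';
            if (keyChar) {
                ptr++;
                continue;
            }
            // The key ends here: valid only if it is non-empty and followed by ':'.
            return (c == ':' && ptr > start) ? ptr : -1;
        }
        return -1; // ran off the buffer without finding ':'
    }

    public static void main(String[] args) {
        byte[] footer = "Signed-off-by: A. U. Thor\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(endOfFooterLineKey(footer, 0)); // prints 13
        byte[] prose = "not a footer".getBytes(StandardCharsets.US_ASCII);
        System.out.println(endOfFooterLineKey(prose, 0)); // prints -1
    }
}
```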
Parameters:
raw - buffer to scan.
ptr - first position within raw to consider as a footer line key.

public static String decode(byte[] buffer)
Parameters:
buffer - buffer to pull raw bytes from.
Returns:
a string representation of the range [start,end), after decoding the region through the specified character set.

public static String decode(byte[] buffer, int start, int end)
Parameters:
buffer - buffer to pull raw bytes from.
start - start position in buffer.
end - one position past the last location within the buffer to take data from.
Returns:
a string representation of the range [start,end), after decoding the region through the specified character set.

public static String decode(Charset cs, byte[] buffer)
Parameters:
cs - character set to use when decoding the buffer.
buffer - buffer to pull raw bytes from.
Returns:
a string representation of the range [start,end), after decoding the region through the specified character set.

public static String decode(Charset cs, byte[] buffer, int start, int end)
Parameters:
cs - character set to use when decoding the buffer.
buffer - buffer to pull raw bytes from.
start - first position within the buffer to take data from.
end - one position past the last location within the buffer to take data from.
Returns:
a string representation of the range [start,end), after decoding the region through the specified character set.

public static String decodeNoFallback(Charset cs, byte[] buffer, int start, int end) throws CharacterCodingException
Parameters:
cs - character set to use when decoding the buffer.
buffer - buffer to pull raw bytes from.
start - first position within the buffer to take data from.
end - one position past the last location within the buffer to take data from.
Returns:
a string representation of the range [start,end), after decoding the region through the specified character set.
Throws:
CharacterCodingException - the input is not in any of the tested character sets.

public static String extractBinaryString(byte[] buffer, int start, int end)
Parameters:
buffer - buffer to pull raw bytes from.
start - first position within the buffer to take data from.
end - one position past the last location within the buffer to take data from.
Returns:
a string representation of the range [start,end).

public static final int commitMessage(byte[] b, int ptr)
Parameters:
b - buffer to scan.
ptr - position in buffer to start the scan at. Most callers should pass 0 to ensure the scan starts from the beginning of the commit buffer.

public static final int tagMessage(byte[] b, int ptr)
Parameters:
b - buffer to scan.
ptr - position in buffer to start the scan at. Most callers should pass 0 to ensure the scan starts from the beginning of the tag buffer.

public static final int endOfParagraph(byte[] b, int start)
A paragraph is ended by two consecutive LF bytes or CRLF pairs.
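One way to picture this rule is to walk the buffer line by line until a line begins with LF or CR (an empty line), then step back over the terminator of the last non-empty line. The sketch below does that in plain Java. EndOfParagraphSketch is a hypothetical class, and its exact return position for each edge case is an assumption of the sketch rather than a statement about JGit.

```java
import java.nio.charset.StandardCharsets;

class EndOfParagraphSketch {
    // Hypothetical stand-in for endOfParagraph (not JGit's code).
    static int endOfParagraph(byte[] b, int start) {
        int ptr = start;
        // Skip whole lines until one starts with LF or CR, i.e. is empty.
        while (ptr < b.length && b[ptr] != '\n' && b[ptr] != '\r') {
            while (ptr < b.length && b[ptr] != '\n') {
                ptr++;
            }
            if (ptr < b.length) {
                ptr++; // step past the LF that ended this line
            }
        }
        // Back up over the LF (and CR, if any) that ended the paragraph's last line.
        if (ptr > start && b[ptr - 1] == '\n') {
            ptr--;
        }
        if (ptr > start && b[ptr - 1] == '\r') {
            ptr--;
        }
        return ptr;
    }

    public static void main(String[] args) {
        byte[] msg = "first line\nsecond\n\nnext para".getBytes(StandardCharsets.US_ASCII);
        int end = endOfParagraph(msg, 0);
        System.out.println(end); // prints 17
        System.out.println(new String(msg, 0, end, StandardCharsets.US_ASCII));
    }
}
```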
Parameters:
b - buffer to scan.
start - position in buffer to start the scan at. Most callers will want to pass the first position of the commit message (as found by commitMessage(byte[], int)).
Returns:
the position of the end of the paragraph, or b.length if no paragraph end could be located.

Copyright © 2014. All rights reserved.