public abstract class PackParser extends Object
Parses a pack stream and imports it for an ObjectInserter.
Applications can acquire an instance of a parser from ObjectInserter's ObjectInserter.newPackParser(InputStream) method.
Implementations of ObjectInserter should subclass this type and provide their own logic for the various on*() event methods declared to be abstract.
Modifier and Type | Class and Description |
---|---|
static class | PackParser.ObjectTypeAndSize: Type and size information about an object in the database buffer. |
static class | PackParser.Source: Location data is being obtained from. |
static class | PackParser.UnresolvedDelta: Information about an unresolved delta in this pack stream. |
Modifier | Constructor and Description |
---|---|
protected | PackParser(ObjectDatabase odb, InputStream src): Initialize a pack parser. |
Modifier and Type | Method and Description |
---|---|
protected byte[] | buffer(): Get a temporary byte array for use by the caller. |
protected abstract boolean | checkCRC(int oldCRC): Check the current CRC matches the expected value. |
ObjectIdSubclassMap<ObjectId> | getBaseObjectIds(): Get set of objects the incoming pack assumed for delta purposes. |
String | getLockMessage(): Get the message to record with the pack lock. |
ObjectIdSubclassMap<ObjectId> | getNewObjectIds(): Get the new objects that were sent by the user. |
PackedObjectInfo | getObject(int nth): Get the information about the requested object. |
int | getObjectCount(): Get the number of objects in the stream. |
long | getPackSize(): Get the size of the newly created pack. |
ReceivedPackStatistics | getReceivedPackStatistics(): Returns the statistics of the parsed pack. |
List<PackedObjectInfo> | getSortedObjectList(Comparator<PackedObjectInfo> cmp): Get all of the objects, sorted by their name. |
boolean | isAllowThin(): Whether a thin pack (missing base objects) is permitted. |
boolean | isCheckEofAfterPackFooter(): Whether the EOF should be read from the input after the footer. |
protected boolean | isCheckObjectCollisions(): Whether received objects are verified to prevent collisions. |
boolean | isExpectDataAfterPackFooter(): Whether there is data expected after the pack footer. |
protected PackedObjectInfo | newInfo(AnyObjectId id, PackParser.UnresolvedDelta delta, ObjectId deltaBase): Construct a PackedObjectInfo instance for this parser. |
protected abstract boolean | onAppendBase(int typeCode, byte[] data, PackedObjectInfo info): Provide the implementation with a base that was outside of the pack. |
protected abstract void | onBeginOfsDelta(long deltaStreamPosition, long baseStreamPosition, long inflatedSize): Event notifying start of a delta referencing its base by offset. |
protected abstract void | onBeginRefDelta(long deltaStreamPosition, AnyObjectId baseId, long inflatedSize): Event notifying start of a delta referencing its base by ObjectId. |
protected abstract void | onBeginWholeObject(long streamPosition, int type, long inflatedSize): Event notifying the start of an object stored whole (not as a delta). |
protected PackParser.UnresolvedDelta | onEndDelta(): Event notifying the current object. |
protected abstract void | onEndThinPack(): Event indicating a thin pack has been completely processed. |
protected abstract void | onEndWholeObject(PackedObjectInfo info): Event notifying the current object. |
protected abstract void | onInflatedObjectData(PackedObjectInfo obj, int typeCode, byte[] data): Invoked for commits, trees, tags, and small blobs. |
protected abstract void | onObjectData(PackParser.Source src, byte[] raw, int pos, int len): Store (and/or checksum) a portion of an object's data. |
protected abstract void | onObjectHeader(PackParser.Source src, byte[] raw, int pos, int len): Store (and/or checksum) an object header. |
protected abstract void | onPackFooter(byte[] hash): Provide the implementation with the original stream's pack footer. |
protected abstract void | onPackHeader(long objCnt): Provide the implementation with the original stream's pack header. |
protected abstract void | onStoreStream(byte[] raw, int pos, int len): Store bytes received from the raw stream. |
PackLock | parse(ProgressMonitor progress): Parse the pack stream. |
PackLock | parse(ProgressMonitor receiving, ProgressMonitor resolving): Parse the pack stream. |
protected abstract int | readDatabase(byte[] dst, int pos, int cnt): Read from the database's current position into the buffer. |
protected PackParser.ObjectTypeAndSize | readObjectHeader(PackParser.ObjectTypeAndSize info): Read the header of the current object. |
protected abstract PackParser.ObjectTypeAndSize | seekDatabase(PackedObjectInfo obj, PackParser.ObjectTypeAndSize info): Reposition the database to re-read a previously stored object. |
protected abstract PackParser.ObjectTypeAndSize | seekDatabase(PackParser.UnresolvedDelta delta, PackParser.ObjectTypeAndSize info): Reposition the database to re-read a previously stored object. |
void | setAllowThin(boolean allow): Configure this index pack instance to allow a thin pack. |
void | setCheckEofAfterPackFooter(boolean b): Ensure EOF is read from the input stream after the footer. |
protected void | setCheckObjectCollisions(boolean check): Enable checking for collisions with existing objects. |
void | setExpectDataAfterPackFooter(boolean e): Set if there is additional data in InputStream after pack. |
protected void | setExpectedObjectCount(long expectedObjectCount): Set the expected number of objects in the pack stream. |
void | setLockMessage(String msg): Set the lock message for the incoming pack data. |
void | setMaxObjectSizeLimit(long limit): Set the maximum allowed Git object size. |
void | setNeedBaseObjectIds(boolean b): Configure this index pack instance to keep track of the objects assumed for delta bases. |
void | setNeedNewObjectIds(boolean b): Configure this index pack instance to keep track of new objects. |
void | setObjectChecker(ObjectChecker oc): Configure the checker used to validate received objects. |
void | setObjectChecking(boolean on): Configure the checker used to validate received objects. |
protected void | verifySafeObject(AnyObjectId id, int type, byte[] data): Verify the integrity of the object. |
protected PackParser(ObjectDatabase odb, InputStream src)
odb
- database the parser will write its objects into.
src
- the stream the parser will read.
public boolean isAllowThin()
Returns true if a thin pack (missing base objects) is permitted.
public void setAllowThin(boolean allow)
Thin packs are sometimes used during network transfers to allow a delta to be sent without a base object. Such packs are not permitted on disk.
allow
- true to enable a thin pack.
protected boolean isCheckObjectCollisions()
protected void setCheckObjectCollisions(boolean check)
By default PackParser looks for each received object in the repository. If the object already exists, the existing object is compared byte-for-byte with the newly received copy to ensure they are identical. The receive is aborted with an exception if any byte differs. This check is necessary to prevent an evil attacker from supplying a replacement object into this repository in the event that a discovery enabling SHA-1 collisions is made.
This check may be very costly to perform, and some repositories may have other ways to segregate newly received object data. The check is enabled by default, but can be explicitly disabled if the implementation can provide the same guarantee, or is willing to accept the risks associated with bypassing the check.
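The byte-for-byte comparison described above can be illustrated with a minimal, standalone sketch. It does not use JGit: the class CollisionCheckSketch, its in-memory map, and the string object names are all hypothetical stand-ins for a repository and its object IDs.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory object store; an illustration only, not JGit code. */
final class CollisionCheckSketch {
    private final Map<String, byte[]> objects = new HashMap<>();

    /**
     * Accepts content under the given name. If an object with that name
     * already exists, every byte must match or the receive is aborted.
     */
    void receive(String name, byte[] content) {
        byte[] existing = objects.get(name);
        if (existing != null && !Arrays.equals(existing, content)) {
            throw new IllegalStateException("collision detected for " + name);
        }
        // Keep the first copy; an identical duplicate changes nothing.
        objects.putIfAbsent(name, content.clone());
    }

    public static void main(String[] args) {
        CollisionCheckSketch store = new CollisionCheckSketch();
        store.receive("abc123", new byte[] {1, 2, 3});
        store.receive("abc123", new byte[] {1, 2, 3}); // identical copy is accepted
        try {
            store.receive("abc123", new byte[] {1, 2, 4}); // one byte differs
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The cost mentioned above is visible here: every duplicate requires reading and comparing the full existing content.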
check
- true to enable collision checking (strongly encouraged).
public void setNeedNewObjectIds(boolean b)
By default an index pack doesn't save the new objects that were created when it was instantiated. Setting this flag to true allows the caller to use getNewObjectIds() to retrieve that list.
b
- true to enable keeping track of new objects.
public void setNeedBaseObjectIds(boolean b)
By default an index pack doesn't save the objects that were used as delta bases. Setting this flag to true will allow the caller to use getBaseObjectIds() to retrieve that list.
b
- true to enable keeping track of delta bases.
public boolean isCheckEofAfterPackFooter()
public void setCheckEofAfterPackFooter(boolean b)
b
- true if the EOF should be read; false if it is not checked.
public boolean isExpectDataAfterPackFooter()
public void setExpectDataAfterPackFooter(boolean e)
e
- true if there is additional data in InputStream after pack. This requires the InputStream to support the mark and reset functions.
public ObjectIdSubclassMap<ObjectId> getNewObjectIds()
public ObjectIdSubclassMap<ObjectId> getBaseObjectIds()
public void setObjectChecker(ObjectChecker oc)
Usually object checking isn't necessary, as Git implementations only create valid objects in pack files. However, additional checking may be useful if processing data from an untrusted source.
oc
- the checker instance; null to disable object checking.
public void setObjectChecking(boolean on)
Usually object checking isn't necessary, as Git implementations only create valid objects in pack files. However, additional checking may be useful if processing data from an untrusted source.
This is shorthand for:
setObjectChecker(on ? new ObjectChecker() : null);
on
- true to enable the default checker; false to disable it.
public String getLockMessage()
public void setLockMessage(String msg)
msg
- if not null, the message to associate with the incoming data while it is locked to prevent garbage collection.
public void setMaxObjectSizeLimit(long limit)
If an object is larger than the given size, pack parsing throws an exception and aborts.
limit
- the Git object size limit. If zero, there is no limit.
public int getObjectCount()
The object count is only available after parse(ProgressMonitor) has returned. The count may have been increased if the stream was a thin pack and missing base objects were appended onto it by the subclass.
public PackedObjectInfo getObject(int nth)
The object information is only available after parse(ProgressMonitor) has returned.
nth
- index of the object in the stream. Must be between 0 and getObjectCount() - 1.
public List<PackedObjectInfo> getSortedObjectList(Comparator<PackedObjectInfo> cmp)
The object information is only available after
parse(ProgressMonitor)
has returned.
To maintain lower memory usage and good runtime performance, this method
sorts the objects in-place and therefore impacts the ordering presented
by getObject(int).
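The effect of in-place sorting on index-based access can be shown with a small standalone sketch. The class InPlaceSortSketch and its String entries are hypothetical; they only mimic the relationship between an index lookup and a sorted-list accessor sharing one backing array.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical holder whose sorted view reorders the backing array. */
final class InPlaceSortSketch {
    private final String[] entries;

    InPlaceSortSketch(String... entries) {
        this.entries = entries.clone();
    }

    /** Index lookup, analogous to fetching the nth object. */
    String get(int nth) {
        return entries[nth];
    }

    /** Sorts the backing array itself; a null comparator means natural order. */
    List<String> getSortedList(Comparator<String> cmp) {
        Arrays.sort(entries, cmp);
        return new ArrayList<>(Arrays.asList(entries));
    }

    public static void main(String[] args) {
        InPlaceSortSketch s = new InPlaceSortSketch("c", "a", "b");
        System.out.println(s.get(0)); // stream order
        s.getSortedList(null);
        System.out.println(s.get(0)); // order after the in-place sort
    }
}
```

A caller that needs the original stream order should therefore read it before requesting the sorted list.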
cmp
- comparison function; if null, objects are sorted by ObjectId.
public long getPackSize()
This will also include the pack index size if an index was created. This method should only be called after pack parsing is finished.
public ReceivedPackStatistics getReceivedPackStatistics()
This should only be called after pack parsing is finished.
Returns the ReceivedPackStatistics.
public final PackLock parse(ProgressMonitor progress) throws IOException
progress
- callback to provide progress feedback during parsing. If null, NullProgressMonitor will be used.
Returns the pack lock, if one was requested by setting setLockMessage(String).
IOException
- the stream is malformed, or contains corrupt objects.
public PackLock parse(ProgressMonitor receiving, ProgressMonitor resolving) throws IOException
receiving
- receives progress feedback during the initial receiving objects phase. If null, NullProgressMonitor will be used.
resolving
- receives progress feedback during the resolving objects phase.
Returns the pack lock, if one was requested by setting setLockMessage(String).
IOException
- the stream is malformed, or contains corrupt objects.
protected PackParser.ObjectTypeAndSize readObjectHeader(PackParser.ObjectTypeAndSize info) throws IOException
After the header has been parsed, this method automatically invokes
onObjectHeader(Source, byte[], int, int)
to allow the
implementation to update its internal checksums for the bytes read.
When this method returns the database will be positioned on the first byte of the deflated data stream.
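As background for what reading an object header involves, the following standalone sketch decodes the variable-length type-and-size header used by the Git pack format: the first byte carries a continuation bit, a 3-bit type, and the low 4 bits of the size, and each following byte adds 7 more size bits. The class ObjectHeaderSketch is hypothetical and is not how PackParser itself is implemented.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical decoder for a pack object header; an illustration only. */
final class ObjectHeaderSketch {
    final int type;
    final long size;

    private ObjectHeaderSketch(int type, long size) {
        this.type = type;
        this.size = size;
    }

    /** Reads one header and leaves the stream on the first byte after it. */
    static ObjectHeaderSketch read(InputStream in) throws IOException {
        int c = readByte(in);
        int type = (c >> 4) & 7;   // bits 6..4 hold the type code
        long size = c & 0x0f;      // bits 3..0 hold the low size bits
        int shift = 4;
        while ((c & 0x80) != 0) {  // bit 7 set: another size byte follows
            c = readByte(in);
            size |= ((long) (c & 0x7f)) << shift;
            shift += 7;
        }
        return new ObjectHeaderSketch(type, size);
    }

    private static int readByte(InputStream in) throws IOException {
        int b = in.read();
        if (b < 0) {
            throw new EOFException("truncated object header");
        }
        return b;
    }

    public static void main(String[] args) throws IOException {
        // 0xB5 0x02 encodes type 3 with inflated size 37.
        InputStream in = new ByteArrayInputStream(new byte[] {(byte) 0xB5, 0x02, 0x78});
        ObjectHeaderSketch h = read(in);
        System.out.println("type=" + h.type + " size=" + h.size);
    }
}
```

After read returns, the next byte in the stream is the start of the deflated data, which matches the positioning guarantee stated above.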
info
- the info object to populate.
Returns info, after populating.
IOException
- the size cannot be read.
protected void verifySafeObject(AnyObjectId id, int type, byte[] data) throws CorruptObjectException
id
- identity of the object to be checked.
type
- the type of the object.
data
- raw content of the object.
CorruptObjectException
protected byte[] buffer()
protected PackedObjectInfo newInfo(AnyObjectId id, PackParser.UnresolvedDelta delta, ObjectId deltaBase)
id
- identity of the object to be tracked.
delta
- if the object was previously an unresolved delta, this is the delta object that was tracking it. Otherwise null.
deltaBase
- if the object was previously an unresolved delta, this is the ObjectId of the base of the delta. The base may be outside of the pack stream if the stream was a thin-pack.
protected void setExpectedObjectCount(long expectedObjectCount)
The object count in the pack header is not always correct for some Dfs pack files. For example, an INSERT pack always assumes 1 object in the header, since the actual object count is unknown when the pack is written.
If an external implementation wants to overwrite the expectedObjectCount, it should call this method during onPackHeader(long).
expectedObjectCount
- a long.
protected abstract void onStoreStream(byte[] raw, int pos, int len) throws IOException
This method is invoked during parse(ProgressMonitor)
as data is
consumed from the incoming stream. Implementors may use this event to
archive the raw incoming stream to the destination repository in large
chunks, without paying attention to object boundaries.
The only component of the pack not supplied to this method is the last 20
bytes of the pack that comprise the trailing SHA-1 checksum. Those are
passed to onPackFooter(byte[]).
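The split between the streamed body and the 20-byte trailer can be sketched as follows. This standalone class is hypothetical: it borrows the event names onStoreStream and onPackFooter for readability, but it does not extend PackParser, and returning a boolean from the footer event is an assumption made so the result is easy to inspect.

```java
import java.io.ByteArrayOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical archiver that checksums the raw stream as it is stored. */
final class StreamArchiveSketch {
    private final ByteArrayOutputStream archive = new ByteArrayOutputStream();
    private final MessageDigest sha1;

    StreamArchiveSketch() {
        sha1 = newSha1();
    }

    /** Archives a chunk of the raw stream, ignoring object boundaries. */
    void onStoreStream(byte[] raw, int pos, int len) {
        archive.write(raw, pos, len);
        sha1.update(raw, pos, len);
    }

    /** Appends the trailer and reports whether it matches the stored bytes. */
    boolean onPackFooter(byte[] hash) {
        byte[] computed = sha1.digest();
        archive.write(hash, 0, hash.length);
        return MessageDigest.isEqual(computed, hash);
    }

    byte[] archived() {
        return archive.toByteArray();
    }

    /** Helper for callers that need a SHA-1 of a whole array. */
    static byte[] sha1Of(byte[] data) {
        return newSha1().digest(data);
    }

    private static MessageDigest newSha1() {
        try {
            return MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the digest is updated chunk by chunk, the chunk sizes chosen by the parser do not affect the final comparison.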
raw
- buffer to copy data out of.
pos
- first offset within the buffer that is valid.
len
- number of bytes in the buffer that are valid.
IOException
- the stream cannot be archived.
protected abstract void onObjectHeader(PackParser.Source src, byte[] raw, int pos, int len) throws IOException
Invoked after any of the onBegin() events. The entire header is supplied in a single invocation, before any object data is supplied.
src
- where the data came from.
raw
- buffer to read data from.
pos
- first offset within buffer that is valid.
len
- number of bytes in buffer that are valid.
IOException
- the stream cannot be archived.
protected abstract void onObjectData(PackParser.Source src, byte[] raw, int pos, int len) throws IOException
This method may be invoked multiple times per object, depending on the size of the object, the size of the parser's internal read buffer, and the alignment of the object relative to the read buffer.
Invoked after onObjectHeader(Source, byte[], int, int).
src
- where the data came from.
raw
- buffer to read data from.
pos
- first offset within buffer that is valid.
len
- number of bytes in buffer that are valid.
IOException
- the stream cannot be archived.
protected abstract void onInflatedObjectData(PackedObjectInfo obj, int typeCode, byte[] data) throws IOException
obj
- the object info, populated.
typeCode
- the type of the object.
data
- inflated data for the object.
IOException
- the object cannot be archived.
protected abstract void onPackHeader(long objCnt) throws IOException
objCnt
- number of objects expected in the stream.
IOException
- the implementation refuses to work with this many objects.
protected abstract void onPackFooter(byte[] hash) throws IOException
hash
- the trailing 20 bytes of the pack; this is a SHA-1 checksum of all of the pack data.
IOException
- the stream cannot be archived.
protected abstract boolean onAppendBase(int typeCode, byte[] data, PackedObjectInfo info) throws IOException
This event only occurs on a thin pack for base objects that were outside of the pack and came from the local repository. Usually an implementation uses this event to compress the base and append it onto the end of the pack, so the pack stays self-contained.
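One way to picture "compress the base and append it onto the end of the pack" is the standalone sketch below. It writes a pack-style object header (type and size), deflates the content, and records the offset and CRC-32 of the appended bytes. The class AppendBaseSketch and its fields lastOffset and lastCrc are hypothetical; a real implementation would instead populate the supplied PackedObjectInfo and write to its own storage.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.CRC32;
import java.util.zip.Deflater;

/** Hypothetical in-memory pack tail; an illustration only, not JGit code. */
final class AppendBaseSketch {
    private final ByteArrayOutputStream pack = new ByteArrayOutputStream();
    long lastOffset; // where the most recent entry starts
    int lastCrc;     // CRC-32 over the entry's header and deflated data

    void appendBase(int typeCode, byte[] data) {
        ByteArrayOutputStream entry = new ByteArrayOutputStream();

        // Header: 3-bit type plus low 4 size bits, then 7 size bits per byte.
        long size = data.length;
        int c = (typeCode << 4) | (int) (size & 0x0f);
        size >>>= 4;
        while (size > 0) {
            entry.write(c | 0x80); // continuation bit: more size bytes follow
            c = (int) (size & 0x7f);
            size >>>= 7;
        }
        entry.write(c);

        // Body: the complete content, zlib-deflated.
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        byte[] buf = new byte[512];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            entry.write(buf, 0, n);
        }
        deflater.end();

        byte[] bytes = entry.toByteArray();
        CRC32 crc = new CRC32();
        crc.update(bytes, 0, bytes.length);
        lastOffset = pack.size();
        lastCrc = (int) crc.getValue();
        pack.write(bytes, 0, bytes.length);
    }

    byte[] packBytes() {
        return pack.toByteArray();
    }
}
```

Recording the offset before writing and the CRC over the exact appended bytes is what allows the entry to be indexed like any other object in the pack.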
typeCode
- type of the base object.
data
- complete content of the base object.
info
- packed object information for this base. Implementors must populate the CRC and offset members if returning true.
Returns true if the info should be included in the object list returned by getSortedObjectList(Comparator), false if it should not be included.
IOException
- the base could not be included into the pack.
protected abstract void onEndThinPack() throws IOException
This event is invoked only if a thin pack has delta references to objects external from the pack. The event is called after all of those deltas have been resolved.
IOException
- the pack cannot be archived.
protected abstract PackParser.ObjectTypeAndSize seekDatabase(PackedObjectInfo obj, PackParser.ObjectTypeAndSize info) throws IOException
If the database is computing CRC-32 checksums for object data, it should reset its internal CRC instance during this method call.
obj
- the object position to begin reading from. This is from newInfo(AnyObjectId, UnresolvedDelta, ObjectId).
info
- object to populate with type and size.
Returns the info object.
IOException
- the database cannot reposition to this location.
protected abstract PackParser.ObjectTypeAndSize seekDatabase(PackParser.UnresolvedDelta delta, PackParser.ObjectTypeAndSize info) throws IOException
If the database is computing CRC-32 checksums for object data, it should reset its internal CRC instance during this method call.
delta
- the object position to begin reading from. This is an instance previously returned by onEndDelta().
info
- object to populate with type and size.
Returns the info object.
IOException
- the database cannot reposition to this location.
protected abstract int readDatabase(byte[] dst, int pos, int cnt) throws IOException
dst
- the buffer to copy read data into.
pos
- position within dst to start copying data into.
cnt
- ideal target number of bytes to read. Actual read length may be shorter.
IOException
- the database cannot be accessed.
protected abstract boolean checkCRC(int oldCRC)
This method is invoked when an object is read back in from the database and its data is used during delta resolution. The CRC is validated after the object has been fully read, allowing the parser to verify there was no silent data corruption.
Implementations are free to ignore this check by always returning true if they are performing other data integrity validations at a lower level.
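The record-then-verify pattern described above can be sketched with java.util.zip.CRC32. The class CrcCheckSketch is hypothetical and standalone; its onData and reset methods stand in for the points where an implementation would feed bytes to its checksum and where it would reset on a seek.

```java
import java.util.zip.CRC32;

/** Hypothetical CRC tracker; an illustration only, not JGit code. */
final class CrcCheckSketch {
    private final CRC32 crc = new CRC32();

    /** Called when repositioning to re-read an object. */
    void reset() {
        crc.reset();
    }

    /** Called for each chunk of object bytes, in order. */
    void onData(byte[] raw, int pos, int len) {
        crc.update(raw, pos, len);
    }

    /** The CRC accumulated so far, truncated to 32 bits. */
    int current() {
        return (int) crc.getValue();
    }

    /** True if the bytes read since the last reset match the prior CRC. */
    boolean checkCRC(int oldCRC) {
        return current() == oldCRC;
    }

    public static void main(String[] args) {
        byte[] data = {10, 20, 30, 40, 50, 60};
        CrcCheckSketch c = new CrcCheckSketch();
        c.onData(data, 0, 4); // first scan arrives in two chunks
        c.onData(data, 4, 2);
        int recorded = c.current();

        c.reset();            // re-read during delta resolution
        c.onData(data, 0, data.length);
        System.out.println(c.checkCRC(recorded));
    }
}
```

The chunk boundaries of the first scan and the re-read need not agree, since CRC-32 depends only on the byte sequence.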
oldCRC
- the prior CRC that was recorded during the first scan of the object from the pack stream.
protected abstract void onBeginWholeObject(long streamPosition, int type, long inflatedSize) throws IOException
streamPosition
- position of this object in the incoming stream.
type
- type of the object; one of Constants.OBJ_COMMIT, Constants.OBJ_TREE, Constants.OBJ_BLOB, or Constants.OBJ_TAG.
inflatedSize
- size of the object when fully inflated. The size stored within the pack may be larger or smaller, and is not yet known.
IOException
- the object cannot be recorded.
protected abstract void onEndWholeObject(PackedObjectInfo info) throws IOException
info
- object information.
IOException
- the object cannot be recorded.
protected abstract void onBeginOfsDelta(long deltaStreamPosition, long baseStreamPosition, long inflatedSize) throws IOException
deltaStreamPosition
- position of this object in the incoming stream.
baseStreamPosition
- position of the base object in the incoming stream. The base must be before the delta, therefore baseStreamPosition < deltaStreamPosition. This is not the position returned by a prior end object event.
inflatedSize
- size of the delta when fully inflated. The size stored within the pack may be larger or smaller, and is not yet known.
IOException
- the object cannot be recorded.
protected abstract void onBeginRefDelta(long deltaStreamPosition, AnyObjectId baseId, long inflatedSize) throws IOException
deltaStreamPosition
- position of this object in the incoming stream.
baseId
- name of the base object. This object may be later in the stream, or might not appear at all in the stream (in the case of a thin-pack).
inflatedSize
- size of the delta when fully inflated. The size stored within the pack may be larger or smaller, and is not yet known.
IOException
- the object cannot be recorded.
protected PackParser.UnresolvedDelta onEndDelta() throws IOException
IOException
- the object cannot be recorded.
Copyright © 2020 Eclipse JGit Project. All rights reserved.