Component Report

Analyze a component for possible memory waste and other inefficiencies.

Introduction

A heap dump contains millions of objects. But which of those belong to your component? And what conclusions can you draw from them? This is where the Component Report can help.

Before starting, one has to decide what constitutes a component. Typically, a component is either a set of classes in a common root package or a set of classes loaded by the same class loader.

Using this root set of objects, the component report calculates a customized retained set. This retained set includes all objects kept alive by the root set. Additionally, it assumes that all objects which have become finalizable actually have been finalized and that all soft references have been cleared.

Executing the Component Report

To run the report for a common root package, select the component report from the toolbar and provide a regular expression to match the package:

Regular expression to match common root package to be used for the component report.
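
For example, to analyze everything below a hypothetical root package com.example.shop (the package name is an assumption for illustration), a pattern like the following matches the package and all of its sub-packages:

    com\.example\.shop\..*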

Alternatively, one can group the class histogram by class loader and then right-click the appropriate class loader and select the component report:

Group histogram by class loader.

Overview

The component report is rendered as HTML. It is stored in a ZIP file next to the heap dump file.

Overview section of the component report.
  1. Details about the size, the number of classes, the number of objects and the number of different class loaders.
  2. The pie chart shows the size of the component relative to the total heap size.
  3. The Top Consumers section lists the biggest objects, classes, class loaders and packages retained by the component. It provides a good overview of what is actually kept alive by the component.
  4. The Retained Set section displays all retained objects, grouped by class.

Duplicate Strings

Duplicate Strings are a prime example of memory waste: multiple char arrays with identical content. To find the duplicates, the report groups the char arrays by their value and lists every value that occurs in 10 or more instances.

The content of the char arrays typically suggests how to reduce the duplication:
  • Sometimes the duplicate strings are used as keys or values in hash maps. For example, when reading heap dumps, MAT itself used to read the char constant denoting the type of an attribute into memory. It turned out that the heap was littered with many 'L's for references, 'B's for bytes, 'Z's for booleans, etc. By replacing the char with an int, MAT could save some of the precious memory. Alternatively, Enumerations could do the same trick.
  • When reading XML documents, fragments like UTF-8, tag names or tag content remain in memory. Again, think about using Enumerations for the repetitive content.
  • Another option is interning the String. This adds the string to a pool of strings maintained by the String class. For each unique string, the pool keeps one instance alive. However, if you are interning, make sure to do it responsibly: a big pool of strings has maintenance costs, and one cannot rely on interned strings being garbage collected. A sketch of these ideas follows this list.
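
As a minimal sketch of these ideas (the class and method names are hypothetical, not MAT code), the snippet below replaces a repetitive one-character type code with an enum constant and keeps a private canonicalization map as a more controllable alternative to String.intern():

    import java.util.HashMap;
    import java.util.Map;

    public class DuplicateStringDemo {

        // Idea 1: replace repetitive type-code strings (the 'L'/'B'/'Z'
        // example above) with enum constants.
        enum AttributeType { REFERENCE, BYTE, BOOLEAN }

        // Idea 2: a private canonicalization pool. Unlike String.intern(),
        // its lifetime ends when this object becomes unreachable.
        private final Map<String, String> pool = new HashMap<>();

        public String canonicalize(String value) {
            // Return the previously stored instance if one exists,
            // so freshly read duplicates become garbage immediately.
            String existing = pool.putIfAbsent(value, value);
            return existing != null ? existing : value;
        }
    }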

Empty Collections

Even if collections are empty, they usually consume memory through their internal object array. Imagine a tree structure where every node eagerly creates array lists to hold its children, but only a few nodes actually have children.

One remedy is the lazy initialization of the collections: create the collection only when it is actually needed. To find out who is responsible for the empty collections, use the immediate dominators command.
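
A minimal sketch of lazy initialization for the tree example above (the class is hypothetical):

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    public class TreeNode {
        // Created lazily: leaf nodes never pay for a backing array.
        private List<TreeNode> children;

        public void addChild(TreeNode child) {
            if (children == null) {
                children = new ArrayList<>(4); // small initial capacity
            }
            children.add(child);
        }

        public List<TreeNode> getChildren() {
            // Share one immutable empty list instead of allocating one per node.
            return children == null ? Collections.<TreeNode>emptyList() : children;
        }
    }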

Collection Fill Ratio

Just like empty collections, collections with only a few elements also take up a lot of memory. Again, the backing array of the collection is the main culprit. Examining the fill ratios in a heap dump from a production system hints at which initial capacity to use.
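
For example, assuming a hypothetical finding that most of the analyzed collections never hold more than three elements, they could be sized explicitly instead of relying on the defaults:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class SizedCollections {
        // An ArrayList grows its backing array to 10 slots by default;
        // sizing it for the observed maximum wastes nothing.
        private final List<String> tags = new ArrayList<>(3);

        // A HashMap with capacity 4 and the default load factor of 0.75
        // holds 3 entries before it has to resize.
        private final Map<String, String> attributes = new HashMap<>(4);
    }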

Soft Reference Statistics

Soft references are cleared by the virtual machine in response to memory demand. Usually, soft references are used to implement caches: keep the objects around while there is sufficient memory, and clear them when free memory becomes low. A sketch of this pattern follows the list below.
  • Usually objects are cached because they are expensive to re-create. Across a whole application, soft-referenced objects might carry very different costs. However, the virtual machine cannot know this and clears the objects based on a least-recently-used heuristic. From the outside, this is very unpredictable and difficult to fine-tune.
  • Furthermore, soft references can impose a stop-the-world phase during garbage collection. Oversimplified, the GC marks the object graph behind the soft references while the virtual machine is stopped.
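
A minimal sketch of such a soft-reference cache (the class name and loader parameter are assumptions for illustration):

    import java.lang.ref.SoftReference;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Function;

    public class SoftCache<K, V> {
        // Values are reachable only through SoftReferences, so the VM
        // may clear them under memory pressure.
        private final Map<K, SoftReference<V>> map = new ConcurrentHashMap<>();

        public V get(K key, Function<K, V> loader) {
            SoftReference<V> ref = map.get(key);
            V value = (ref != null) ? ref.get() : null;
            if (value == null) {
                value = loader.apply(key); // expensive re-creation
                map.put(key, new SoftReference<>(value));
            }
            return value;
        }
    }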

Finalizer Statistics

Objects whose classes implement the finalize method are included in the component report, because those objects can have serious implications for the memory of a Java Virtual Machine (see the sketch after this list):
  • Whenever an object with a finalizer is created, a corresponding java.lang.ref.Finalizer object is created. If the object is only reachable via its finalizer, it is placed in the queue of the finalizer thread and processed. Only then will the next garbage collection actually free the memory. Therefore it takes at least two garbage collections until the memory is freed.
  • In Sun's current virtual machine implementation, the finalizer thread is a single thread processing the finalizer objects sequentially. One blocking finalizer can therefore easily keep alive big chunks of memory (all the other objects waiting to be finalized).
  • Depending on the actual algorithm, finalizers may require a stop-the-world pause during garbage collection. This, of course, can have serious implications for the responsiveness of the whole application.
  • Last but not least, when the finalizer is executed is up to the VM and therefore unpredictable.
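
As a minimal illustration (the class and its native handle are hypothetical), merely declaring finalize is what triggers the costs above; an explicit release method keeps finalization as a safety net only:

    public class NativeBuffer {
        private long nativeHandle; // placeholder for a native resource

        // Preferable: deterministic cleanup invoked by the caller.
        public void release() {
            // hypothetical native cleanup of nativeHandle
            nativeHandle = 0;
        }

        // Merely declaring this method makes the VM register a
        // java.lang.ref.Finalizer for every instance, with the
        // life-cycle costs described in the list above.
        @Override
        protected void finalize() throws Throwable {
            try {
                release(); // safety net only
            } finally {
                super.finalize();
            }
        }
    }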

Map Collision Ratios

This section analyzes the collision ratios of hash maps. Maps place the values in different buckets based on the hash code of the keys. If multiple keys map to the same bucket, the elements inside the bucket are typically compared linearly.

High collision ratios can indicate sub-optimal hash codes. This is not a memory problem (a better hash code does not save space) but rather a performance problem, because of the linear access inside the buckets.
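
The hypothetical key class below shows how a poor hashCode drives the collision ratio up, and a small fix:

    public final class PointKey {
        private final int x;
        private final int y;

        public PointKey(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof PointKey)) {
                return false;
            }
            PointKey p = (PointKey) o;
            return x == p.x && y == p.y;
        }

        // Poor: ignoring y makes all points with the same x collide,
        // degrading map lookups to a linear scan of one bucket.
        //   public int hashCode() { return x; }

        // Better: mix both fields so keys spread across buckets.
        @Override
        public int hashCode() {
            return 31 * x + y;
        }
    }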