Interpretation of jStat (Heap Memory) to Suspect a Java Memory Leak

In this blog I am going to discuss memory leaks in Java: how we can observe that there is a memory leak, and then how to analyse it.

Heap

Let’s first understand heap and non-heap memory.

The Java heap is the memory allocated to applications running in the JVM. Objects in heap memory can be shared between threads. The practical limit for Java heap size is typically 4-8 GB, even with Oracle’s current-generation G1 collector, because of the impact of garbage collection pauses.

The heap has three main parts:

  • Eden Space: The pool from which memory is initially allocated for most objects.
  • Survivor Space: The pool containing objects that have survived the garbage collection of the Eden space.
  • Tenured Generation: The pool containing objects that have existed for some time in the survivor space.

Non-heap memory is the memory allocated to the application for loading classes and other metadata. It consists of:

  • Permanent Generation: The pool containing all the reflective data of the virtual machine itself, such as class and method objects. With Java VMs that use class data sharing, this generation is divided into read-only and read-write areas.
  • Code Cache: The HotSpot Java VM also includes a code cache, containing memory that is used for compilation and storage of native code.

For basic monitoring of the heap we can use the jstat utility. The jstat utility uses the built-in instrumentation in the Java HotSpot VM to provide information about the performance and resource consumption of running applications. The tool can be used when diagnosing performance issues, in particular issues related to heap sizing and garbage collection. The jstat utility does not require the VM to be started with any special options, because the built-in instrumentation in the Java HotSpot VM is enabled by default. This utility is included in the JDK download for all operating system platforms supported by Oracle.

jstat [ generalOption | outputOptions vmid [interval[s|ms] [count]] ]
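For example, to sample the garbage-collected heap of a running JVM every 10 seconds, with a timestamp column like the samples later in this blog (the process id 12345 is just a placeholder; use the pid of your own JVM):

jstat -gc -t 12345 10s

Adding a count as the last argument (for example jstat -gc -t 12345 10s 100) stops the sampling after that many rows; otherwise press Ctrl+C.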

Below are the options for the jstat tool, but in this blog I will discuss only the gc option.

Option Displays…
class Statistics on the behavior of the class loader.
compiler Statistics on the behavior of the HotSpot Just-in-Time compiler.
gc Statistics on the behavior of the garbage-collected heap.
gccapacity Statistics of the capacities of the generations and their corresponding spaces.
gccause Summary of garbage collection statistics (same as -gcutil), with the cause of the last and current (if applicable) garbage collection events.
gcnew Statistics of the behavior of the new generation.
gcnewcapacity Statistics of the sizes of the new generation and its corresponding spaces.
gcold Statistics of the behavior of the old and permanent generations.
gcoldcapacity Statistics of the sizes of the old generation.
gcpermcapacity Statistics of the sizes of the permanent generation.
gcutil Summary of garbage collection statistics.
printcompilation HotSpot compilation method statistics.

Let’s discuss the behaviour of garbage collection to understand a memory leak. When jstat is run with the gc option, the following details are printed.

Garbage-collected heap statistics
Column Description
S0C Current survivor space 0 capacity (KB).
S1C Current survivor space 1 capacity (KB).
S0U Survivor space 0 utilization (KB).
S1U Survivor space 1 utilization (KB).
EC Current eden space capacity (KB).
EU Eden space utilization (KB).
OC Current old space capacity (KB).
OU Old space utilization (KB).
PC Current permanent space capacity (KB).
PU Permanent space utilization (KB).
YGC Number of young generation GC events.
YGCT Young generation garbage collection time (seconds).
FGC Number of full GC events.
FGCT Full garbage collection time (seconds).
GCT Total garbage collection time (seconds).

Let’s see how we can interpret it. Here is a sample from one of the applications, captured at various points in time and especially around GC events.

Example 1

Timestamp        S0C    S1C    S0U    S1U      EC       EU        OC         OU       PC     PU    YGC     YGCT    FGC    FGCT     GCT 
46583.0 152832.0 153600.0 74470.6  0.0   741376.0 366911.0 2097152.0  1986626.3  72768.0 65442.1    432   32.979  19     23.297   56.276
46593.0 152832.0 153600.0 74470.6  0.0   741376.0 568393.9 2097152.0  1986626.3  72768.0 65447.5    432   32.979  19     23.297   56.276
46603.0 150144.0 153600.0  0.0    0.0   741376.0  9453.8  2097152.0  1006314.0  72128.0 64552.8    433   33.054  20     24.699   57.753    
47340.3 149696.0 158400.0 139822.9  0.0   731776.0 552192.7 2097152.0  1911806.9  72128.0 65312.9    446   34.058  20     24.699   58.758
47350.4 149696.0 158400.0 139822.9  0.0   731776.0 700174.6 2097152.0  1911806.9  72128.0 65330.9    446   34.058  20     24.699   58.758
47360.3 159680.0 124864.0  0.0    0.0   731776.0 103510.6 2097152.0  1047206.0  70976.0 64632.1    447   34.143  21     25.999   60.141
47370.4 159680.0 124864.0  0.0    0.0   731776.0 192150.3 2097152.0  1047206.0  70976.0 64633.6    447   34.143  21     25.999   60.141
58039.5 215488.0 265728.0 215475.3  0.0   517120.0 501874.6 2097152.0  1955789.1  66304.0 65702.4    696   53.018  35     46.035   99.053
58040.5 215488.0 265728.0 215475.3 146995.9 517120.0 517120.0 2097152.0  2014621.6  66304.0 65703.1    697   53.018  35     46.035   99.053
58041.5 280896.0 202880.0  0.0   202841.1 501760.0   0.0    2097152.0  2026636.8  66304.0 65703.1    697   53.110  36     46.035   99.145
58042.5 280896.0 202880.0  0.0    0.0   501760.0 26749.5  2097152.0  1227003.7  66112.0 64806.5    697   53.110  36     47.578  100.688
67032.7 296896.0 237952.0  0.0   237914.2 492224.0 492072.2 2097152.0  1998610.5  66176.0 65760.1   1043   77.918  52     71.807  149.725
67033.7 296896.0 313536.0 224865.4  0.0   421504.0   0.0    2097152.0  2036326.4  66176.0 65760.1   1044   78.017  53     71.807  149.824
67034.7 296896.0 313536.0  0.0    0.0   421504.0 36119.0  2097152.0  1296443.8  66176.0 64804.6   1044   78.017  53     73.261  151.278

 

Let’s take one row and understand it.

Timestamp        S0C    S1C    S0U    S1U      EC       EU        OC         OU       PC     PU    YGC     YGCT    FGC    FGCT     GCT 
46583.0 152832.0 153600.0 74470.6  0.0   741376.0 366911.0 2097152.0  1986626.3  72768.0 65442.1    432   32.979  19     23.297   56.276

 

In this row we can see the Eden space utilization (EU) and the old space utilization (OU).

This process has been running for 46583 seconds (about 13 hours) and has gone through 432 young GCs and 19 full GCs; the total time spent in GC is about 56 seconds (23 seconds in full GC and 33 seconds in young GC). That works out to roughly one full GC every 40 minutes and one young GC every 2 minutes, which initially looks healthy: a full GC every 30-50 minutes is fine for the system.
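To make the arithmetic explicit, here is a quick back-of-the-envelope check using only the counters from that row:

Full GC interval   ≈ 46583 s / 19 full GCs    ≈ 2452 s ≈ 41 minutes
Young GC interval  ≈ 46583 s / 432 young GCs  ≈ 108 s  ≈ 1.8 minutes
GC overhead        ≈ 56.276 s / 46583 s       ≈ 0.12 % of run time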

Below is a sample showing frequent GC.

Example 2

Timestamp        S0C    S1C    S0U    S1U      EC       EU        OC         OU       PC     PU    YGC     YGCT    FGC    FGCT     GCT   
408558.1 349504.0 349504.0  0.0    0.0   349568.0 341903.6 2097152.0  2097142.3  65536.0 64873.5   9488  780.312 1652  4592.517 5372.829
408558.1 349504.0 349504.0  0.0    0.0   349568.0 341903.6 2097152.0  2097142.3  65536.0 64873.5   9488  780.312 1652  4592.517 5372.829

 

If we look at Example 2, there have been approximately 1652 full GCs in about 113 hours, which means a full GC occurs roughly every 4 minutes. This is not healthy for the system. It has also spent approximately 5372 seconds (about 1.5 hours) in garbage collection, time during which the application was essentially paused; that is roughly 1.3% of the total run time.
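The same back-of-the-envelope arithmetic for Example 2:

Full GC interval  ≈ 408558 s / 1652 full GCs  ≈ 247 s ≈ 4.1 minutes
GC overhead       ≈ 5372.8 s / 408558 s       ≈ 1.3 % of run time (≈ 1.5 hours paused in total)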

Here is another jstat sample showing the same problem.

Example 3

Timestamp        S0C    S1C    S0U    S1U      EC       EU        OC         OU       PC     PU    YGC     YGCT    FGC    FGCT     GCT   
333695.7 349504.0 349504.0  0.0    0.0   349568.0 349568.0 2097152.0  2097152.0  65536.0 64908.4   6720  535.716 2043  5042.068 5577.784
333695.8 349504.0 349504.0  0.0    0.0   349568.0 349568.0 2097152.0  2097152.0  65536.0 64908.4   6720  535.716 2043  5042.068 5577.784

If we look at Example 3, there have been approximately 2043 full GCs in about 93 hours, which means a full GC occurs roughly every 2.7 minutes. The process has also spent approximately 5577 seconds (about 1.55 hours) in garbage collection, roughly 1.7% of the total run time. Note also that in these rows Eden is completely full (EU equals EC) and the old generation is completely full (OU equals OC), so the heap is exhausted. This is even worse than the previous jstat example.

Example 4
Timestamp        S0C    S1C    S0U    S1U      EC       EU        OC         OU       PC     PU    YGC     YGCT    FGC    FGCT     GCT 
46603.0 150144.0 153600.0  0.0    0.0   741376.0  9453.8  2097152.0  1006314.0  72128.0 64552.8    433   33.054  20     24.699   57.753
47360.3 159680.0 124864.0  0.0    0.0   731776.0 103510.6 2097152.0  1047206.0  70976.0 64632.1    447   34.143  21     25.999   60.141
58042.5 280896.0 202880.0  0.0    0.0   501760.0 26749.5  2097152.0  1227003.7  66112.0 64806.5    697   53.110  36     47.578  100.688
67034.7 296896.0 313536.0  0.0    0.0   421504.0 36119.0  2097152.0  1296443.8  66176.0 64804.6   1044   78.017  53     73.261  151.278

Now let’s go back to the earlier example (the extract shown in Example 4, which collects only the rows taken just after each full GC) and analyse it to understand how we can determine the memory leak. After each full GC the old space utilization (OU) keeps increasing: at 46603 seconds it was approximately 1006314.0 KB, and by 67034 seconds it had grown to approximately 1296443.8 KB. In about 5.6 hours the system has accumulated roughly 283 MB of additional retained heap. This is a clear memory leak, and after some time the system will throw an OutOfMemoryError; at this rate it is only a matter of time before it crashes. As the retained heap increases over time, the interval between two full GCs also shrinks, because less and less heap space is actually being reclaimed.
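Putting rough numbers on that trend (an estimate only, assuming the leak rate stays constant):

Retained-heap growth  ≈ (1296443.8 - 1006314.0) KB / (67034.7 - 46603.0) s
                      ≈ 290130 KB / 20432 s ≈ 14 KB/s ≈ 50 MB per hour
Old gen headroom      ≈ 2097152 KB - 1296444 KB ≈ 800708 KB ≈ 780 MB
Time to exhaustion    ≈ 780 MB / 50 MB per hour ≈ 15-16 hours

So at this rate the old generation would be full of retained objects within roughly a day, after which full GCs can no longer reclaim meaningful space and the application fails with an OutOfMemoryError.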

Now let’s analyse the same application. For this, take a heap dump using the following command.

jmap -dump:file=<file_name> <pid>
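In practice it often helps to ask for a binary dump of only the live objects, which forces a full GC before dumping so the file contains only reachable objects (the pid and file name below are placeholders):

jmap -dump:live,format=b,file=/tmp/heap.hprof 12345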

Once the heap dump is taken, process it using a memory analyser. There are many tools for this; I like the Eclipse Memory Analyzer Tool (MAT). Here is the link for MAT:
Memory Analyzer (MAT)

After opening the leak suspects report in MAT, two main leak suspects are visible.

Leak Suspect 1

One instance of "org.hibernate.impl.SessionFactoryImpl" loaded by "org.apache.catalina.loader.WebappClassLoader @ 0x740db1d00" occupies 1,011,397,080 (40.25%) bytes. The memory is accumulated in one instance of "java.util.HashMap$Entry[]" loaded by "<system class loader>".

Keywords
org.hibernate.impl.SessionFactoryImpl
org.apache.catalina.loader.WebappClassLoader @ 0x740db1d00
java.util.HashMap$Entry[]

 

This is basically because of improperly written queries in the system. If we expand this suspect, we find that the Hibernate statistics are the problem here.

heap1

In the figure above it is visible that approximately 1 GB of heap is retained by the Hibernate statistics. Looking further at the table and key level, here is a snapshot.

heap2

In the figure above one can easily see that the query is not a parameterized (prepared) statement and that this is consuming memory: every statement with a different literal value becomes a separate key in the Hibernate query statistics and is kept in the retained heap. The solution is to use a parameterized query like the one below.

“FROM com.abc.dto.xxxxxxDTO thd where thd.xxxid=:transactionID”
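Here is a minimal sketch of what that looks like in code, assuming an org.hibernate.Session named session, the Hibernate 3.x Query API, java.util.List, and a caller-supplied numeric transactionId; the DTO name is kept masked as in the query above, so treat it as a placeholder rather than a real class.

// Bad: concatenating the value produces a different query string on every
// call, and each distinct string becomes a separate entry in the Hibernate
// query statistics, which is what retains the heap here.
List bad = session.createQuery(
        "FROM com.abc.dto.xxxxxxDTO thd where thd.xxxid=" + transactionId).list();

// Good: one constant query string with a named parameter, reused for every call.
Query query = session.createQuery(
        "FROM com.abc.dto.xxxxxxDTO thd where thd.xxxid = :transactionID");
query.setParameter("transactionID", transactionId);
List good = query.list();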

heap3

For a highly transactional system this is a real killer, because the number of entries in the retained heap is approximately 1.8 million.

Leak Suspect 2

The classloader/component "org.apache.catalina.loader.WebappClassLoader @ 0x740db1d00" occupies 457,168,920 (18.19%) bytes. The memory is accumulated in one instance of "java.util.concurrent.ConcurrentHashMap$Segment[]" loaded by "<system class loader>".

Keywords
org.apache.catalina.loader.WebappClassLoader @ 0x740db1d00
java.util.concurrent.ConcurrentHashMap$Segment[]

 

This leak suspect is a ConcurrentHashMap that keeps growing and never gets cleared. It is consuming approximately 450 MB of space in the old generation, and the garbage collector cannot collect it because the map is still strongly referenced, so everything it holds is retained.

heap4
Exploring further, we can see that a session map is retaining the heap. To handle this, the developer needs to remove entries from the map and release the memory after use; otherwise the application will slowly run into an OutOfMemoryError. A minimal sketch of this kind of fix follows the figure below.

heap5
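Here is a minimal sketch of that kind of fix, assuming a hypothetical application-scoped cache; the class and method names are illustrative, not taken from the actual code.

import java.util.concurrent.ConcurrentHashMap;

// Hypothetical application-scoped cache of per-session data. The map itself
// lives as long as the web application, so every entry that is never removed
// stays strongly reachable and accumulates in the old generation.
public class SessionDataCache {

    private static final ConcurrentHashMap<String, Object> SESSION_DATA =
            new ConcurrentHashMap<String, Object>();

    public static void put(String sessionId, Object data) {
        SESSION_DATA.put(sessionId, data);
    }

    public static Object get(String sessionId) {
        return SESSION_DATA.get(sessionId);
    }

    // The fix: remove the entry when the session ends (for example from
    // HttpSessionListener.sessionDestroyed) so the GC can reclaim the data.
    public static void remove(String sessionId) {
        SESSION_DATA.remove(sessionId);
    }
}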

In the next blog I will discuss some more memory leaks and their solutions.

 

