Wednesday, February 8, 2012

JAVA Garbage Collection


JAVA Garbage Collection
I was reading some of the articles about java garbage collection, so thought of posting it. This post is completely based on my personal understanding of garbage collection and can’t claim that everything I am writing is correct. So, please read it at your own risk ;)

From the name it is clear that Garbage Collection is removing garbage from your neighborhood. By reading garbage word itself, we must be feeling to clean that out. Obviously we all like to live in clean place (except few who prefers to live in mess :) ). Same applies to a JAVA applications, there having too much of garbage can cause OutOfMemory error and our application may crash. Our first task is to find out what could be the garbage for a java application.
Lets look into JAVA memory model first.




- PC Register: It stores the pointer to the next JVM instruction to be executed. It will keep on changing as the execution progresses to next JVM instruction. So, it does not look like garbage.
- Method Stacks: In above diagram I have mentioned 2 stacks - JVM and native method stacks.
JVM stacks keeps frames for every thread. Thread adds one frame for every method execution. Every frame stores local variables, intermediate calculations of a method execution, method parameters and return value. Once the thread execution completes, the respective frame will be removed from stack. So, this memory area also does not seem to be a candidate for garbage collection. Native Method Stacks is also similar to normal stacks except that It keeps frames for a native method execution.
- Method Area: The method area stores class structure. That includes methods, constructors, constants and static fields.This can be garbage collected sometimes. If a class becomes unreferenced by application, then JVM will unload this class by using Garbage collection. This can happen in some specific scenarios, I will not be discussing this type of garbage collection in this blog.
- Heap: Heap stores class instances(objects) and arrays. Throughout execution of any JAVA application, we create numerous objects. These objects are of varying size depending on class structure. This is the area which consumes maximum of memory among all memory areas. JVM does not provide control to programmer to explicitly remove any object from Heap, once that object is no longer in use. Due to large sizes of objects we must need to deallocate this memory time to time once we feel that some of the unused objects are no longer in use. To reclaim the memory occupied by the unused object JVM uses it’s own Garbage Collector, which runs as a daemon thread by JVM and reclaims unused objects from the heap memory.

Now we know the memory area which requires to be garbage collected.

Now we can formally say:
Garbage Collection is a process using which JVM reclaims the object from heap memory, once the object is assumed to be no longer needed.
In C++, It was programmer’s responsibility to manually release these dynamically allocated objects, so it was more error prone. In case where due to programmer’s error if lots of unused objects are present in heap memory and not reclaimed, then it may lead to Out Of Memory error and ultimately the application will crash. In JAVA, this task is done automatically by JVM, so it frees programmer from this complicated task and let him concentrate more on logic implementation.
JVM uses a daemon thread called Garbage Collector(GC) to execute the garbage collection task. Before reclaiming an object from Heap memory, it invokes finalize() method. The finalize() method is declared in the java.lang.Object class. It has below signature:
protected void finalize () throws throwable { //  }
finalize() method is called just before garbage collection. It is not called immediately after the object goes out of scope. So, we should not rely on this method for normal program operation. The intent of this method is to release system resources such as open files or open sockets before getting collected.

When an object will be garbage collected ?
An object will be eligible for garbage collection when it is unreachable from application code. Below are the 4 scenarios, which can make an object unreachanble:
1) When reference to an object is set to null explicitly.
method()1{
Object obj = new Object();
obj = null;
-----------
-----------
}
In above code, we create an instance of class Object. After that we explicitly set the reference obj to null. After this statement, we can’t reach the object created in 1st statement, so this object is eligible for garbage collection.
2) When object reference goes out of scope.
method2(){

for(int i=0;i<10;i++){

Object obj = new Object();

break;

}

----------------------------------

----------------------------------
}
In above code, once execution comes out of for loop, the object created inside the loop becomes unreachable. So, it became eligible for garbage collection.
3) Island of Isolation
If an object obj1 has internal reference to another object obj2 and obj2 has internal reference to obj1 but none of them has any outside reference. In this case both of the objects will not be reachable, then both obj1 and obj2 will be eligible for garbage collection.
class A{
B b;;
}
class B{
A a;
}
class C{
public static void main(String[] args){
A a = new A();
a.b = new B();
B b = new B();
B.a=new A();
a =null;
b=null;
//after above 2 lines a and b both are set to null. The objects referenced by a and     
// b has internal references to objects of type B and A respectively, but we can’t
// access those 2 objects. So those will be garbage collected.
}
}
4) Weak references
Weak reference does not prevent referred object from being garbage collected. Any object which does not have any other reference apart from weak reference, can be garbage collected.

Can we programmatically invoke GC?
We can't force JVM to run GC. We can only request JVM to do garbage collection, after that it is up to JVM whether to run GC immediately or after some time. System.gc() method is used for this purpose.


Now lets discuss some of the approaches of garbage collectors:
1) Reference Counting Collectors
In this approach JVM keeps count of references of every object in heap. Once  the count becomes zero the object will be reclaimed from heap. One of the disadvantage is overhead of incrementing/decrementing the counter. Another disadvantage is that it cannot identify those objects for garbage collection which comes under Island of Isolation category.
2) Tracing Collectors
In this approach, JVM first traces out the graph of object references starting with root node and marks all the referred objects as reachable. Other objects are assumed to be unreachable and later be reclaimed from heap. One of the tracing algorithm is “Mark and Sweep”, in which first phase is marking the object as reachable and in sweep phase it frees up the memory occupied by unmarked objects.
3) Compacting Collectors
Compacting collectors moves live objects towards one end of heap area. So, other end of heap become contiguous free area, from which new objects can be allocated.  This process could be slow because of overhead of moving live objects to one end of heap. This includes marking the objects as live, copying live objects to other end of heap and updating the object references to point to new location. Below picture shows the status of heap memory before and after GC run:
4) Copying Collectors
In this approach, all the live objects are copied to a new area in heap as contiguous area similar to compacting collectors. But it does not use mark and sweep algorithm. It copies live objects on the fly and updates the referencing pointers. This algorithm is known as “Stop and Copy”. It also divides heap into two parts similar to compacting collectors. It uses only one heap area out of two areas at once. Once one area is full, it moves all the live objects from this area to another heap area. One of the disadvantage of this approach is that it requires more heap area, since it uses only half of the heap area at once.
5) Generational Collectors
In above algorithms every object in being scanned and/or moved in every GC run.  With large number of objects in memory, these algorithms may not perform good. Also in practical scenario we may have observed that some objects have very short life and some objects have very long life. So, copying those long lived objects multiple times is not a good strategy from performance point of view. Generation collectors are designed to address these performance issues. Generational collectors divides the heap into multiple young and old heap areas. After surviving few GC runs the object is promoted to next level(from young to old). In this approach GC runs more often in younger heap area and less often in older heap area. That makes sure that short lived objects are reclaimed quickly and long lived objects are not scanned/moved more often.
The default arrangement of generations is shown in below picture:
(picture from http://www.oracle.com/technetwork/java/gc-tuning-5-138395.html)

Young generation is divided into one Eden and two Survivor spaces. Initially after object creation, it gets allocated into Eden space. If object from Eden survives after next minor garbage collection, then it moves to Survivor 1 and then to Survivor 2. After major garbage collection if the object survives then it is promoted to Tenured area.  The perm generation stores data needed by the virtual machine to describe objects that do not have an equivalence at the Java language level. Example: objects describing classes and methods.
6) Train(Incremental) Collectors
JVM usually pauses other running programs while running GC. Collecting larger heap area may take longer time, that will ultimately cause other programs to stop for a longer time. To address this issue, incremental approach is used. In this approach, GC runs on a portion of heap area in one run than running on whole heap memory. This approach avoids other running programs to pause for longer time and does garbage collection incrementally.


JVM provides parameters to set JVM memory areas to different values. We can also set JVM parameters to set Heap memory components to different values to tune GC. Depending on our java application requirements, we may need more younger heap memory and lesser tenured memory or vice versa. I am not covering the JVM parameters used for these purposes.

Please let me know your feedback on this post.

4 comments:

  1. Very good explanation.....

    ReplyDelete
  2. Simple and crisp, that is what i would like to describe this article. This deals with one of the most important concepts of OOPS in a simple yet detailed manner.

    ReplyDelete