Saturday, October 20, 2012

Timer vs ScheduledThreadPoolExecutor

Many times we come across scenarios where we need a background thread that keeps running and executes tasks at scheduled times. The JDK provides java.util.Timer and java.util.concurrent.ScheduledThreadPoolExecutor for this purpose.

java.util.Timer
Timer was introduced in JDK 1.3.
There is a single background thread for every Timer object, and it executes all of that timer's tasks. We create an instance of TimerTask (which implements Runnable) and add it to a Timer object; the Timer takes care of running the TimerTask at the scheduled time. Timer runs its tasks sequentially, so if four tasks are scheduled on the same Timer object for the same time and one of them takes a long time, the remaining tasks may get delayed. We should keep this in mind while designing our system.


code example:


import java.util.Timer;

public class TimerTest {

    public static void main(String[] args) {
        Timer timer = new Timer();
        // run MyTask once, after a delay of 1000 ms
        timer.schedule(new MyTask(), 1000);
    }
}

class MyTask extends java.util.TimerTask {
    public void run() {
        System.out.println("Timer task is running");
    }
}


In the above code, I scheduled a task to run after 1000 milliseconds (1 second).

A task can be scheduled for one-time or recurring execution. Refer to the javadoc for the various overloads of Timer.schedule().
The Timer class also provides a cancel() method to terminate the timer and discard any scheduled tasks; an individual task can be cancelled with TimerTask.cancel().

Timer internally maintains a queue into which all TimerTask objects are inserted. A TimerTask moves through four states: VIRGIN, SCHEDULED, EXECUTED, CANCELLED. When a task is created, its initial state is VIRGIN. Once it is handed to a Timer via schedule(), its state changes to SCHEDULED. After execution its state becomes EXECUTED. If we mark a task as CANCELLED (by calling its cancel() method), it will never be picked up by the Timer for execution.
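As a quick illustration of these states, the minimal sketch below (class name is my own) schedules a task and cancels it before it can run; TimerTask.cancel() returns true when it prevented a scheduled execution:

```java
import java.util.Timer;
import java.util.TimerTask;

public class CancelDemo {
    public static void main(String[] args) {
        Timer timer = new Timer();
        TimerTask task = new TimerTask() {
            public void run() { System.out.println("this never runs"); }
        };
        timer.schedule(task, 5000);          // task state: VIRGIN -> SCHEDULED
        boolean prevented = task.cancel();   // state: CANCELLED; true if execution was prevented
        System.out.println("cancelled before execution: " + prevented);
        timer.cancel();                      // terminate the timer's background thread
    }
}
```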

java.util.concurrent.ScheduledThreadPoolExecutor
As seen above, Timer has a single worker thread that executes multiple tasks sequentially. That may not fulfill your purpose if you want parallel execution or if you have long-running tasks.

ScheduledThreadPoolExecutor provides multiple worker threads by using a thread pool. It extends the ThreadPoolExecutor class and uses the pool of threads to execute multiple tasks in parallel.
It provides scheduling methods similar to Timer's. Refer to the javadoc for method details.
ScheduledThreadPoolExecutor was introduced in JDK 1.5.


Code example:


import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ScheduledThreadPoolExecutorTest {

    public static void main(String[] args) {
        // creates a thread pool of size 2
        ScheduledThreadPoolExecutor threadPool = new ScheduledThreadPoolExecutor(2);
        threadPool.schedule(new MyTask1(), 1, TimeUnit.SECONDS);
        threadPool.schedule(new MyTask2(), 1, TimeUnit.SECONDS);
        // allow the already-scheduled tasks to run, then let the pool threads exit
        threadPool.shutdown();
    }
}

class MyTask1 implements Runnable {
    public void run() {
        System.out.println("Task1 is running");
    }
}

class MyTask2 implements Runnable {
    public void run() {
        System.out.println("Task2 is running");
    }
}


The above code runs MyTask1 and MyTask2 in parallel after 1 second, using the 2 threads in the pool.

If you look at the javadoc of ScheduledThreadPoolExecutor, you can see that it provides a richer set of methods than Timer.
It internally maintains a java.util.concurrent.DelayQueue to hold the runnable tasks. In a DelayQueue, an element can be taken only after its delay has expired.
You can also use Future and Callable via its submit() method, which returns a Future representing the pending result of the task.
It provides a shutdown() method to prevent further submissions; tasks that are already running are allowed to complete before the pool terminates.
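As a minimal sketch of this (class and method names are my own), the schedule() overload that takes a Callable returns a ScheduledFuture whose get() blocks until the result is available:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class FutureDemo {
    static int computeAnswer() { return 42; }

    public static void main(String[] args) throws Exception {
        ScheduledThreadPoolExecutor pool = new ScheduledThreadPoolExecutor(1);
        // schedule a Callable; the returned ScheduledFuture represents the pending result
        Callable<Integer> task = FutureDemo::computeAnswer;
        ScheduledFuture<Integer> future = pool.schedule(task, 100, TimeUnit.MILLISECONDS);
        System.out.println("result = " + future.get()); // blocks until the task completes
        pool.shutdown();
    }
}
```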

In my opinion, one should go with ScheduledThreadPoolExecutor when multiple worker threads are required for parallel execution, or when the requirement is complex and a richer infrastructure is needed. For simpler needs, Timer and TimerTask are sufficient.

Thursday, May 10, 2012

Java utility to invoke Restful web services using Spring RestTemplate


This is a utility/framework to invoke RESTful web services. Internally it uses Spring RestTemplate. Apart from the Spring jars, it uses a few external jars; if your application already includes them, you do not need to add them again.

You can download the RestFramework.rar file containing all the resources from below link:
https://docs.google.com/open?id=0B8O-miA80x0gUTBlSHoxQ0paWGc

The zip file contains below files:
  1. RestfulWS.jar – jar to be included in your application; it has all the classes of this utility.
  2. RestfulWS-src.jar – jar containing the source code.
  3. External lib – other external jars required by RestfulWS.jar. Include these if not already present in your application.
  4. Test – folder with all the test classes. Refer to these to see how to use this utility.
  5. Class diagram.jpg – class diagram.

Class Diagram:



If you are not able to view the class diagram here, refer to the Class diagram.jpg file included in RestFramework.rar.
In the class diagram you can see there are a couple of concrete classes as well as a couple of abstract classes. The concrete classes can be instantiated and used directly by passing the URL, HTTP method, and other required parameters through the constructor and invoker method. The abstract classes at the lower end of the hierarchy are meant to be used as templates: you extend them and supply the required parameters (URL, HTTP method, etc.) by overriding the corresponding methods in your concrete class.
This utility supports JSON- as well as XML-based REST web services. If your request/response involves some other kind of data (for example, some web service calls may return an array of bytes), you can extend any suitable class in this hierarchy and add message converters (marshaller/unmarshaller) for it.

Details of interfaces, abstract classes and concrete classes:

    1) RestfulWebServiceInterface<WI,WO>
This is the topmost interface, parameterized with:
WI – input class type for the web service
WO – class type for the web service response

It has only one method, execute(), which is used to invoke the web service.

    2) GenericRestfulWSInvoker<WI,WO>
This is an abstract class that implements RestfulWebServiceInterface<WI,WO>. It provides the implementation of the execute() method of that interface. It also has multiple protected methods that execute() uses internally; these can be overridden in subclasses if required.
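A rough sketch of this shape (my own simplified reconstruction for illustration, not the actual framework source) looks like the following; a real doInvoke() would delegate to Spring RestTemplate:

```java
// Simplified reconstruction of the hierarchy described above, for illustration only
interface RestfulWebServiceInterface<WI, WO> {
    WO execute(WI input);
}

abstract class GenericRestfulWSInvoker<WI, WO> implements RestfulWebServiceInterface<WI, WO> {
    // template method: a fixed flow, with format-specific steps left to subclasses
    public WO execute(WI input) {
        String request = marshal(input);     // e.g. to JSON or XML
        String response = doInvoke(request); // e.g. via Spring RestTemplate
        return unmarshal(response);
    }

    protected abstract String marshal(WI input);
    protected abstract String doInvoke(String request);
    protected abstract WO unmarshal(String response);
}

public class InvokerSketch {
    public static void main(String[] args) {
        // a toy "invoker" that echoes the request instead of calling a real service
        RestfulWebServiceInterface<Integer, String> invoker =
                new GenericRestfulWSInvoker<Integer, String>() {
                    protected String marshal(Integer in) { return in.toString(); }
                    protected String doInvoke(String req) { return "echo:" + req; }
                    protected String unmarshal(String resp) { return resp; }
                };
        System.out.println(invoker.execute(7)); // echo:7
    }
}
```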

    3) JsonBasedRestfulWSInvoker<WI,WO>
This is a concrete class that extends GenericRestfulWSInvoker<WI,WO>. It overrides those protected methods of GenericRestfulWSInvoker<WI,WO> that need to behave differently when the request and response type of the web service is JSON. It can be instantiated and used to invoke a web service directly.
Refer to Test2.java in the Test folder for sample code.



    4) XMLBasedRestfulWSInvoker<WI,WO>
This is a concrete class that extends GenericRestfulWSInvoker<WI,WO>. It overrides those protected methods of GenericRestfulWSInvoker<WI,WO> that need to behave differently when the request and response type of the web service is XML. It can be instantiated and used to invoke a web service directly.
Refer to Test1.java in the Test folder for sample code.

    5) RestfulWSInvokerTemplateInterface<I,O,WI,WO>
This interface provides a more generic, extended way of invoking a web service. Use it when your web service layer is isolated from the rest of your application and you don't want the web service layer's VOs (for request and response) to be exposed to the other layers.
For e.g.:

In the above diagram you can see we have separate business and web service layers. The business layer passes an object of type I. The web service layer accepts the instance of I and converts it into an instance of WI, which is then turned into the request used to invoke the web service call. The response is mapped to an instance of WO, which is then converted into an instance of O and sent back to the business layer. In this scenario the business layer knows nothing about WI, WO, or other web-service-specific details. If you follow this template pattern, you create one concrete class for each web service call, containing all the details of that particular call.

This interface has only one method, invoke(), to invoke the web service.
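A hypothetical sketch of that template flow (I and O are the business-layer types, WI and WO the web-service types; the class and method names here are my own illustration, not the framework's):

```java
interface RestfulWSInvokerTemplateInterface<I, O, WI, WO> {
    O invoke(I input);
}

// Illustrative template: the business layer only ever sees I and O
abstract class TemplateSketch<I, O, WI, WO>
        implements RestfulWSInvokerTemplateInterface<I, O, WI, WO> {
    public O invoke(I input) {
        WI wsRequest = toWebServiceInput(input);   // business object -> WS request VO
        WO wsResponse = callWebService(wsRequest); // the actual REST call
        return toBusinessOutput(wsResponse);       // WS response VO -> business object
    }

    protected abstract WI toWebServiceInput(I input);
    protected abstract WO callWebService(WI request);
    protected abstract O toBusinessOutput(WO response);
}

public class TemplateSketchDemo {
    public static void main(String[] args) {
        TemplateSketch<Integer, String, String, String> ws =
                new TemplateSketch<Integer, String, String, String>() {
                    protected String toWebServiceInput(Integer i) { return "req-" + i; }
                    protected String callWebService(String req) { return req + "-ok"; }
                    protected String toBusinessOutput(String resp) { return resp.toUpperCase(); }
                };
        System.out.println(ws.invoke(1)); // REQ-1-OK
    }
}
```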

    6) GenericRestfulWSInvokerTemplate<I, O, WI, WO>
This abstract class implements RestfulWSInvokerTemplateInterface<I,O,WI,WO> and extends GenericRestfulWSInvoker<WI,WO>. It provides the implementation of the invoke() method and defines some abstract methods required by the template.

    7) JsonBasedRestfulWSInvokerTemplate<I,O,WI,WO>
This is an abstract class that extends GenericRestfulWSInvokerTemplate<I, O, WI, WO>. It overrides those protected methods of GenericRestfulWSInvoker<WI,WO> that need to behave differently when the request and response type of the web service is JSON. Extend it to create a concrete template class and invoke the web service through an instance of that class.
Refer to Test4.java in the Test folder for sample code; it uses the template class JsonBasedRestfulTemplateImpl.java.

    8) XMLBasedRestfulWSInvokerTemplate<I,O,WI,WO>
This is an abstract class that extends GenericRestfulWSInvokerTemplate<I, O, WI, WO>. It overrides those protected methods of GenericRestfulWSInvoker<WI,WO> that need to behave differently when the request and response type of the web service is XML. Extend it to create a concrete template class and invoke the web service through an instance of that class.
Refer to Test3.java in the Test folder for sample code; it uses the template class XMLBasedRestfulTemplateImpl.java.





Please let me know your thoughts about this utility; that will help me improve it and make it more generic.








Tuesday, May 1, 2012

Python link checker to find broken links in a website

Recently I was taking the online course "CS 101: Building a Search Engine" at www.udacity.com, where I learnt the basics of Python programming, crawlers, and search engines. I thought of building a small crawler-based utility to locate broken links in a website. It crawls every page of your site and checks whether all the links are accessible; for any error it reports the URL along with its HTTP error code. This utility is suitable for websites that have multiple pages connected through links (using "href") and that do not have too many form submissions or AJAX calls.

Algorithm:
  • Accepts 2 parameters: 1) the start page URL and 2) the max depth to which the crawler runs.
  • It first crawls the root page and adds all its links to the to-be-crawled list.
  • Once a page is crawled, it is removed from the to-be-crawled list and added to the crawled list along with its status code (for an accessible page the status is "OK", for an HTTP error it is the HTTP status code, and for a malformed URL it is "invalid url").
  • It keeps crawling links until it reaches the max depth.
  • After crawling finishes, it writes a file named "site-health.txt" containing all the URLs along with their status.
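The depth-limited crawl loop above can be sketched language-neutrally; the Java sketch below (the utility itself is written in Python) substitutes a hypothetical in-memory "site" map for real HTTP fetching and link extraction:

```java
import java.util.*;

public class CrawlSketch {
    // hypothetical stand-in for fetching a page and extracting its <a href> links
    static Map<String, List<String>> site = Map.of(
            "/", List.of("/a", "/b"),
            "/a", List.of("/c"),
            "/b", List.of(),
            "/c", List.of());

    static Map<String, String> crawl(String root, int maxDepth) {
        Map<String, String> status = new LinkedHashMap<>();   // crawled list + status
        List<String> toCrawl = new ArrayList<>(List.of(root)); // to-be-crawled list
        for (int depth = 0; depth <= maxDepth && !toCrawl.isEmpty(); depth++) {
            List<String> next = new ArrayList<>();
            for (String url : toCrawl) {
                if (status.containsKey(url)) continue;         // already crawled
                List<String> links = site.get(url);
                status.put(url, links != null ? "OK" : "invalid url");
                if (links != null) next.addAll(links);
            }
            toCrawl = next;
        }
        return status;
    }

    public static void main(String[] args) {
        System.out.println(crawl("/", 2)); // e.g. {/=OK, /a=OK, /b=OK, /c=OK}
    }
}
```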

Note:
  • This utility could be most useful during the release or support phase of a project, where after every new build you want to make sure all the links are working.
  • It does not crawl pages behind AJAX calls.
  • It only follows links of the form <a href="<url>"></a>.
  • It crawls only links internal to the domain of the root URL; it does not crawl links external to that domain. For example, if your root URL is www.a.com and your website links to an external site www.b.com, it will crawl all the links inside the www.a.com domain and check the link to www.b.com, but it won't crawl the links on www.b.com itself. If you want to add more domains for crawling, edit the source code as follows:
    • find the statement domain = get_domain(root) in the source code and change it to
                   domain = get_domain(root, "b", "c", <other domains>)
                  it will then crawl the root URL's domain plus domains b, c, and the others given in that statement.
  • I have tested it with Python 2.7.2. Please make sure you have Python 2.7.2 or a later version installed on your machine.
Source code:
https://docs.google.com/open?id=0B8O-miA80x0gS2JnSkVqTkZtTWs

How to use:
  • Download check-web-health.py from the above link and open the source code in an editor.
  • Go to the last line of the code: check_web_health('http://google.com',2)
  • Edit this line to check_web_health(<url of start page>,<max depth of crawling>)
  • Save and run the program.
  • After the program exits, you will find a file named "site-health.txt" in the same directory as check-web-health.py. Each line of this file has a URL along with its status.

My knowledge of Python is intermediate, so there may well be some issues with this utility. Please use it at your own risk :)

Please let me know the issues you faced while using this utility.

Thanks

Wednesday, February 15, 2012

Masking confidential information in log files with logback framework


Recently I came across a logging issue wherein I had to mask some confidential information before logging it. I was using the logback framework. The simplest approach is to programmatically mask the information before the log statement.

E.g.: Suppose we have a method logCCDetails() that logs credit card details. Below is the pseudo code for logging with masking.

void logCCDetails() {
    Logger logger = LoggerFactory.getLogger(MyClass.class);
    logger.info(mask(accountNumber));
    logger.info(accountType);
}


// method to mask the confidential number
String mask(String acctNum) {
    // logic to replace all the digits of the account number with 'X',
    // then return the masked account number
}


So basically we use a utility method mask() to mask the confidential information before the logger.info() statement. This approach works well when the logger.info() statements for confidential information are in our own application code. When the logging statements live in some external jar's code, we can't add a mask() call before them, so this approach does not work.
In my case I was using Spring's RestTemplate (in the spring-web jar) and Apache HttpClient (in the commons-httpclient jar) to invoke web service calls with JSON requests/responses. Apache HttpClient internally logs every JSON request/response and the headers before and after each web service call. Some of those calls carried confidential information in the request/response, and my task was to mask it in the logs.
Below is the entry defined in logback.xml for web service calls:

<appender name="RESTSERVICE"
class="ch.qos.logback.core.rolling.RollingFileAppender">
<file>restservice.log</file>
<append>true</append>
<rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
<fileNamePattern>restservice-%d{yyyy-MM-dd}.log</fileNamePattern>
<MaxHistory>1</MaxHistory>
</rollingPolicy>
<layout class="ch.qos.logback.classic.PatternLayout">
<Pattern>%d %-5p [%X{IPAddress}] [%X{SessionId}] %c - %m%n</Pattern>
</layout>
</appender>


As per the logback documentation, there is a conversion specifier called "replace" that can be used to replace strings in a log statement. Its format is:
replace(p){r, t} : replaces occurrences of 'r', a regex, with its replacement 't' in the string produced by the sub-pattern 'p'. For example, "%replace(%msg){'\s', ''}" removes all spaces contained in the event message. This can be used in the above configuration inside the <Pattern> tag:
<Pattern>%d %-5p [%X{IPAddress}] [%X{SessionId}] %c - %replace(%msg){'\s', ''}%n</Pattern>


I tried this in multiple ways, but it didn't work as expected. I was probably doing something wrong.

The solution I then tried was to plug in my own custom pattern layout class. In the above configuration, the configured layout class is ch.qos.logback.classic.PatternLayout, provided by logback. I extended this class and overrode its doLayout() method as follows:


public class RestServicePatternLayout extends PatternLayout {

    @Override
    public String doLayout(LoggingEvent event) {
        String message = super.doLayout(event);
        if (event.getLoggerRemoteView().getName().equalsIgnoreCase("httpclient.wire.content")) {
            message = MaskingUtil.maskConfidentialInfo(message);
        }
        return message;
    }
}


public class MaskingUtil {
    // e.g. the pattern inside the json for confidential info could be "confInfo":"1234"
    public static String maskConfidentialInfo(String message) {
        // match the confidential field and replace its value with X's
        return message.replaceAll("(\"confInfo\"\\s*:\\s*\")[^\"]+(\")", "$1XXXX$2");
    }
}


This method returns the string back to the calling method, and that is what gets printed in the log files. In the above code I compare the logger name with the string "httpclient.wire.content"; this is the name Apache HttpClient uses to initialize the LogFactory for logging web service requests/responses. You can find this string in the org.apache.commons.httpclient.Wire class.
Once the pattern matches, we mask the confidential information with X's. In the code above, a separate utility method does this masking.

After this, I configured this custom PatternLayout class in logback.xml as follows:

<appender name="RESTSERVICE"
class="ch.qos.logback.core.rolling.RollingFileAppender">
<file>restservice.log</file>
<append>true</append>
<rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
<fileNamePattern>restservice-%d{yyyy-MM-dd}.log</fileNamePattern>
<MaxHistory>1</MaxHistory>
</rollingPolicy>
<layout class="com.my.logging.RestServicePatternLayout">
<Pattern>%d %-5p [%X{IPAddress}] [%X{SessionId}] %c - %m%n</Pattern>
</layout>
</appender>


One advantage I see is that in logback.xml we can attach our custom layout only to specific appenders, which helps performance. The appender's logger should be configured so that the layout runs only for log statements that may contain confidential information; we don't need to run it for every log statement. Since we do a string comparison before every actual log write, running it everywhere could hurt performance. To avoid this, I configured the "RESTSERVICE" appender only for the logger below:



<logger name="httpclient.wire" additivity="false">
<level value="debug" />
<appender-ref ref="RESTSERVICE" />
</logger>
The logger name "httpclient.wire" is used only for web service requests/responses, so our custom layout runs only for those log statements. This way we avoid unnecessary string comparisons for other log statements.







Wednesday, February 8, 2012

JAVA Garbage Collection


I was reading some articles about Java garbage collection, so I thought of posting my notes. This post is based entirely on my personal understanding of garbage collection, and I can't claim that everything in it is correct. So please read it at your own risk ;)

From the name it is clear that garbage collection is about removing garbage from your neighborhood. Just reading the word "garbage" makes us want to clean it out; we all like to live in a clean place (except the few who prefer a mess :) ). The same applies to a Java application: too much garbage there can cause an OutOfMemory error and crash the application. Our first task is to find out what counts as garbage for a Java application.
Let's look at the Java memory model first.




- PC Register: stores the pointer to the next JVM instruction to be executed, and keeps changing as execution progresses. So it does not look like garbage.
- Method Stacks: the diagram above shows 2 stacks - the JVM stacks and the native method stacks.
The JVM stack keeps frames, one stack per thread. A thread pushes one frame for every method invocation. Each frame stores the local variables, intermediate results, method parameters, and return value of a method execution. Once the method completes, its frame is popped off the stack. So this memory area also does not seem to be a candidate for garbage collection. The native method stacks are similar, except that they keep frames for native method executions.
- Method Area: the method area stores class structure, which includes methods, constructors, constants, and static fields. This can sometimes be garbage collected: if a class becomes unreferenced by the application, the JVM can unload it using garbage collection. This happens only in some specific scenarios, and I will not be discussing that kind of garbage collection in this post.
- Heap: the heap stores class instances (objects) and arrays. Throughout the execution of any Java application we create numerous objects, of varying sizes depending on class structure. This is the area that consumes the most memory of all the memory areas. The JVM does not give the programmer a way to explicitly remove an object from the heap once it is no longer in use. Because objects can occupy a lot of space, this memory must be reclaimed from time to time as objects fall out of use. To reclaim the memory occupied by unused objects, the JVM uses its own garbage collector, which runs as a daemon thread and reclaims unused objects from heap memory.

Now we know the memory area which requires to be garbage collected.

Now we can formally say:
Garbage collection is the process by which the JVM reclaims objects from heap memory once they are assumed to be no longer needed.
In C++ it was the programmer's responsibility to manually release dynamically allocated objects, which was error-prone: if, due to programmer error, lots of unused objects remain in heap memory unreclaimed, the application may run out of memory and ultimately crash. In Java this task is done automatically by the JVM, freeing the programmer from this complicated chore and letting them concentrate on the logic.
The JVM uses a daemon thread called the garbage collector (GC) to do the collection. Before reclaiming an object from heap memory, it invokes the object's finalize() method, which is declared in the java.lang.Object class with this signature:
protected void finalize() throws Throwable { }
finalize() is called just before garbage collection, not immediately after the object goes out of scope, so we should not rely on it for normal program operation. Its intent is to release system resources, such as open files or sockets, before the object is collected.

When will an object be garbage collected?
An object becomes eligible for garbage collection when it is unreachable from the application code. Below are 4 scenarios that can make an object unreachable:
1) When the reference to an object is explicitly set to null.
void method1() {
    Object obj = new Object();
    obj = null;
    // ...
}
In the above code we create an instance of Object and then explicitly set the reference obj to null. After that statement we can no longer reach the object created in the first statement, so it is eligible for garbage collection.
2) When the object reference goes out of scope.
void method2() {
    for (int i = 0; i < 10; i++) {
        Object obj = new Object();
        break;
    }
    // ...
}
In the above code, once execution leaves the for loop, the object created inside the loop becomes unreachable, so it is eligible for garbage collection.
3) Island of Isolation
If an object obj1 holds an internal reference to another object obj2, and obj2 holds an internal reference back to obj1, but neither has any outside reference, then neither object is reachable and both obj1 and obj2 become eligible for garbage collection.
class A {
    B b;
}
class B {
    A a;
}
class C {
    public static void main(String[] args) {
        A a = new A();
        a.b = new B();
        B b = new B();
        b.a = new A();
        a = null;
        b = null;
        // After the above 2 lines, a and b are both null. The objects they referenced
        // still hold internal references to objects of type B and A respectively,
        // but we can't reach any of them, so all of them will be garbage collected.
    }
}
4) Weak references
A weak reference does not prevent the referred object from being garbage collected: any object that has no references other than weak references can be collected.
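A small sketch of this with java.lang.ref.WeakReference (whether the weakly referenced object is actually collected after a GC request is up to the JVM):

```java
import java.lang.ref.WeakReference;

public class WeakRefDemo {
    public static void main(String[] args) {
        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong);
        System.out.println(weak.get() != null); // true: the strong reference keeps it alive
        strong = null;                          // now only the weak reference remains
        System.gc();                            // request a collection (not guaranteed)
        System.out.println(weak.get());         // typically null once the object is collected
    }
}
```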

Can we programmatically invoke GC?
We can't force the JVM to run the GC; we can only request a collection, and it is up to the JVM whether to run the GC immediately, later, or not at all. The System.gc() method is used for this purpose.
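For example (the numbers printed will vary, and the JVM may ignore the request entirely):

```java
public class GcRequestDemo {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        byte[] garbage = new byte[10_000_000];
        long used = rt.totalMemory() - rt.freeMemory();
        garbage = null;          // the 10 MB array is now unreachable
        System.gc();             // request, not force, a collection
        long usedAfter = rt.totalMemory() - rt.freeMemory();
        System.out.println("used before request: " + used + ", after: " + usedAfter);
    }
}
```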


Now let's discuss some garbage collector approaches:
1) Reference Counting Collectors
In this approach the JVM keeps a reference count for every object in the heap; once the count reaches zero, the object is reclaimed. One disadvantage is the overhead of incrementing/decrementing the counters. Another is that it cannot identify for collection those objects that fall under the island-of-isolation category.
2) Tracing Collectors
In this approach the JVM first traces the graph of object references starting from the root nodes and marks all reachable objects. The remaining objects are assumed unreachable and are later reclaimed from the heap. One tracing algorithm is "mark and sweep": the mark phase flags objects as reachable, and the sweep phase frees the memory occupied by the unmarked ones.
3) Compacting Collectors
Compacting collectors move live objects towards one end of the heap, so the other end becomes a contiguous free area from which new objects can be allocated. This can be slow because of the overhead of moving live objects: marking them as live, copying them to one end of the heap, and updating object references to point to the new locations. The picture below shows the state of heap memory before and after a GC run:
4) Copying Collectors
In this approach all live objects are copied to a new contiguous area of the heap, as with compacting collectors, but without a mark-and-sweep pass: live objects are copied on the fly and the referencing pointers are updated. This algorithm is known as "stop and copy". It divides the heap into two halves and uses only one half at a time; once that half is full, it moves all the live objects to the other half. One disadvantage is that it requires more heap space, since only half of the heap is usable at any time.
5) Generational Collectors
In the above algorithms, every object is scanned and/or moved in every GC run. With a large number of objects in memory, they may not perform well. In practice we also observe that some objects are very short-lived while others live for a very long time, so copying the long-lived objects again and again is a poor strategy performance-wise. Generational collectors are designed to address this. They divide the heap into young and old areas; after surviving a few GC runs, an object is promoted to the next level (from young to old). The GC runs more often in the younger areas and less often in the older ones, which ensures that short-lived objects are reclaimed quickly while long-lived objects are not scanned or moved too often.
The default arrangement of generations is shown in the picture below:
(picture from http://www.oracle.com/technetwork/java/gc-tuning-5-138395.html)

The young generation is divided into one Eden space and two survivor spaces. A newly created object is first allocated in Eden. If it survives the next minor garbage collection, it moves into a survivor space (and is copied between the two survivor spaces on subsequent minor collections). If the object survives long enough, it is promoted to the tenured area. The perm generation stores data the virtual machine needs to describe objects that have no equivalent at the Java language level, for example objects describing classes and methods.
6) Train(Incremental) Collectors
The JVM usually pauses other running threads while the GC runs, and collecting a large heap can take a long time, causing a long pause. To address this, an incremental approach is used: the GC runs on one portion of the heap at a time rather than on the whole heap. This keeps pauses short and performs garbage collection incrementally.
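The mark phase of a tracing collector (approach 2 above) can be sketched on a toy object graph; note how it naturally treats the island of isolation (objects 4 and 5 below) as garbage, which reference counting cannot. The graph and IDs here are my own illustration:

```java
import java.util.*;

public class MarkSweepSketch {
    // toy object graph: id -> the ids it references
    static Map<Integer, List<Integer>> refs = Map.of(
            0, List.of(1, 2),   // 0 is the root
            1, List.of(),
            2, List.of(3),
            3, List.of(),
            4, List.of(5),      // 4 and 5 form an unreachable island
            5, List.of(4));

    // mark phase: everything reachable from the root
    static Set<Integer> mark(int root) {
        Set<Integer> reachable = new HashSet<>();
        Deque<Integer> stack = new ArrayDeque<>(List.of(root));
        while (!stack.isEmpty()) {
            int id = stack.pop();
            if (reachable.add(id)) stack.addAll(refs.get(id));
        }
        return reachable;
    }

    public static void main(String[] args) {
        Set<Integer> live = mark(0);
        // sweep phase: everything not marked is garbage
        Set<Integer> garbage = new HashSet<>(refs.keySet());
        garbage.removeAll(live);
        System.out.println("live=" + live + " garbage=" + garbage);
    }
}
```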


The JVM provides parameters to size the various memory areas, including the components of the heap, which can be used to tune the GC. Depending on our application's requirements, we may need a larger young generation and a smaller tenured generation, or vice versa. I am not covering those JVM parameters here.

Please let me know your feedback on this post.

Tuesday, January 17, 2012

Server Push/Comet Programming/Long Polling


What is Server Push?
In a normal client-server architecture the client requests web pages from the server and gets back a response; the client pulls data from the server. In some cases we may want the server to push data to the browser as soon as it is available, without the client explicitly requesting it.
Examples:
1) In email sites when new email arrives in inbox, server pushes the new mail back to browser without user refreshing the page.
2) In social networking sites, showing the new notifications to user without page refresh.
3) In chat applications, pushing the chat message back to targeted user.

In all above scenarios, server pushes the data back to browser without explicit client request. This approach is called Server Push.

In the scenario I am describing, the client asks the server to execute a task whose status is expected to be available in 2-5 minutes. The actual execution happens in an external system: the web application submits the task to that system and waits until it notifies our application with a status message.
With the server push approach, after receiving the client request the server returns the request thread immediately, without any status, and lets "server push" handle delivering the data (the task execution status) to the browser once it is available.
I shall explain the solution to this problem in detail in the sections below.

Why Server Push?
While looking for a solution for this kind of long-running task, I found the following 3 approaches:

Approach 1: One approach is to use the normal client-server request-response cycle. The client requests the server for a response; after receiving the request, the server creates a thread and keeps it waiting until the response is available. With this traditional approach, execution proceeds as below:

step 1: Server creates a thread after receiving the client request.
step 2: Server starts task execution.
step 3: Inside the service() method of the Servlet:
do{
    waiting for task completion
    waiting....
    waiting....
    waiting...
    still waiting...... :(
    timeout occurred
}while(task not completed and not timed out);
step 4: Server sends the response back to the browser and releases the thread.

This approach keeps one server thread busy for the total task execution time (in this case 2-5 minutes). That may not be a scalable solution.
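The blocking flow above can be sketched with a latch standing in for the external system's completion signal. This is only an illustrative sketch: BlockingStatusCheck, awaitStatus, and the timeout values are hypothetical names, not part of the original application.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Sketch of Approach 1: the request thread blocks until the task
// completes or the wait times out. One server thread is held for
// the full task duration, which is what limits scalability.
public class BlockingStatusCheck {

    // Simulates the wait inside service(): block up to timeoutMillis
    // for the external system to count down the latch.
    static String awaitStatus(CountDownLatch done, long timeoutMillis) {
        try {
            boolean finished = done.await(timeoutMillis, TimeUnit.MILLISECONDS);
            return finished ? "COMPLETED" : "TIMEOUT";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "INTERRUPTED";
        }
    }

    public static void main(String[] args) {
        CountDownLatch done = new CountDownLatch(1);
        // The "external system" finishes the task after 100 ms on another thread.
        new Thread(() -> {
            try { Thread.sleep(100); } catch (InterruptedException ignored) {}
            done.countDown();
        }).start();
        // The request thread blocks here, exactly like the do/while loop above.
        System.out.println(awaitStatus(done, 5000));
    }
}
```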

Approach 2: Another approach for this kind of long-running task is to refresh the page using the <meta> tag. In this approach, we refresh the page multiple times and keep polling the server for the task execution status, with each refresh happening after a certain time interval. But even this approach does not look very scalable: we open and close the connection multiple times, and during each connection one server thread is busy polling for the status.

Approach 3 (Server Push/Comet Programming/Long Polling/Reverse Ajax): In this approach, after receiving the client request the server submits the task for execution and returns the current client thread immediately, saving the HttpResponse object in its memory. Once the status is available, it uses the saved HttpResponse object to send the response back to the browser, where JavaScript captures the status message and displays it. The execution flow is explained below:

step 1: Server creates a thread after receiving the client request.
step 2: Server starts task execution.
step 3: Server returns the thread to the thread pool.
step 4: Once the status is available, the server takes a thread from the thread pool and commits the status to the HttpResponse object.
step 5: JavaScript uses this status message to display the status text in the browser.

Looking at the execution steps of the three approaches discussed above, approach 3 seems the most scalable.


JBoss support for Server Push
I used JBoss 6 as the application server for this feature. JBoss Web can use the Apache Portable Runtime (APR) to provide superior scalability, performance, and better integration with native server technologies. APR is a highly portable library that also provides advanced I/O functionality. Server push can be done using APR connections, because they support asynchronous, non-blocking processing (most likely responding to an event raised from some other source).
JBoss Web provides the org.jboss.servlet.http.HttpEventServlet interface, which can be used to support server push functionality. The server push servlet implements HttpEventServlet and overrides the event() method of this interface, which allows servlets to process I/O asynchronously. For this servlet, the event() method is invoked instead of the service() method of HttpServlet.
The following event types exist:
-       EventType.BEGIN - called at the beginning of the processing of the connection.
-       EventType.READ - indicates that input data is available.
-       EventType.END - called to end the processing of the request. After this event has been processed, the request and response objects, as well as all their dependent objects, will be recycled.
-       EventType.EOF - used for chunked data transfer.
-       EventType.ERROR - called in case of an I/O error. The request and response objects, as well as all their dependent objects, will be recycled and used to process other requests.
-       EventType.TIMEOUT - the connection timed out according to the timeout value which has been set. This timeout value is set in the JBoss configuration (in jbossweb/server.xml) and can be overridden inside the event() method.

A typical lifecycle of a request consists of a series of events such as:
BEGIN -> READ -> READ -> READ -> TIMEOUT -> END.

In this case, the lifecycle will be:
BEGIN -> "Push Data to Browser" -> END
        Or,
BEGIN -> TIMEOUT -> BEGIN -> TIMEOUT -> BEGIN -> TIMEOUT -> ... until data is available to push back to the browser.

For details about APR connections and HttpEventServlet, you can go through the JBoss documentation below:
http://docs.jboss.org/jbossweb/3.0.x/apr.html
http://docs.jboss.org/jbossweb/3.0.x/aio.html



Work-flow explanation






Step # 1 – Client connects to Server
-        Client JavaScript makes an asynchronous HTTP request to the server once the user requests any of the long-running tasks.
-        The server stores the HttpResponse object in application memory (key = any unique id identifying the user), but returns the thread immediately.
Step # 2 – Task execution status is received from external source
-        When the status is received from the external source, the server creates a new thread.
-        This thread retrieves the status details from the external source.
-        Then, this thread retrieves the stored HttpResponse object from application memory using the "unique id" as key and pushes the status data into the response object.
-        After pushing the data, the connection is closed.
     
Step # 3 – Client displays data
-        The client displays the data to the user inside the correct "div" on the web page.
-        The client re-connects to the server if any more task statuses are pending, and the loop repeats.

Note - The HttpResponse object needs to be kept in application memory shared across the whole application; we can use a cache or some other kind of storage for this purpose. This store interacts with both browser requests and external source requests. Depending on the complexity of your requirement, you can choose a different kind of component for storing the HttpResponse objects.
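The shared store of pending responses (the memoryManager used in the pseudo code) can be as simple as a wrapper around a ConcurrentHashMap. The sketch below is hypothetical: the method names mirror the calls in the pseudo code, but any thread-safe keyed store would do. In the real application the values would be HttpServletResponse objects; a type parameter is used here so the idea stands alone.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Application-wide store for pending responses, keyed by the
// per-user unique id. ConcurrentHashMap makes it safe to access
// from both the servlet threads and the external-source threads.
public class ResponseMemoryManager<V> {

    private final ConcurrentMap<String, V> pending = new ConcurrentHashMap<>();

    public void addToMemory(String uniqueId, V response) {
        pending.put(uniqueId, response);
    }

    public V getFromMemory(String uniqueId) {
        return pending.get(uniqueId);
    }

    // Returns the removed value, or null if none was stored.
    public V removeFromMemory(String uniqueId) {
        return pending.remove(uniqueId);
    }

    public static void main(String[] args) {
        ResponseMemoryManager<String> mgr = new ResponseMemoryManager<>();
        mgr.addToMemory("user-42", "responseObject");
        System.out.println(mgr.getFromMemory("user-42"));
        mgr.removeFromMemory("user-42");
    }
}
```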

Pseudo code with explanation

Step 1: JBoss configuration
Add the below entry inside the jbossweb.sar/server.xml file to enable the APR protocol:
<Connector connectionTimeout="20000" port="8484" protocol="org.apache.coyote.http11.Http11AprProtocol"
enableLookups="false" address="${jboss.bind.address}" redirectPort="${jboss.web.https.port}" />
With the above configuration, every request which goes through port 8484 will use an APR connection. The connection timeout will be 20000 milliseconds. This timeout value can be overridden programmatically inside the HttpEventServlet implementation.
After this, the Tomcat native package needs to be installed on the server. Details about installing the package are available at http://docs.jboss.org/jbossweb/3.0.x/apr.html.



Step 2: Creating Http APR Connection:
From the browser, JavaScript opens the APR connection to the server. Below is the JavaScript function which does this task:
openConnection: function () {
       $.ajax({
           type: "GET",
           dataType: 'jsonp',
           async: true,
           jsonp: 'jsonp_callback',
           url: '/myApp/taskStatusServlet'
       });
   }
In this function I have given the data type as 'jsonp'. Use this data type if you are using port 8484 for the APR connection and some other port for normal HTTP connections. Due to the same-origin security policy, browsers do not allow requesting data from a server in a different domain; JSONP is the solution to this problem.
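As a side note, JSONP works by having the server return not raw JSON but a small script that calls the callback function named in the request (here, the jsonp_callback parameter). A minimal, hypothetical sketch of that wrapping step on the server side:

```java
// Sketch: a JSONP response wraps the JSON payload in a call to the
// client-supplied callback function, so the browser can execute it
// via a <script> tag even when the response comes from another port
// or domain.
public class JsonpResponse {

    // Build the script body sent back to the browser.
    static String wrap(String callback, String json) {
        return callback + "(" + json + ");";
    }

    public static void main(String[] args) {
        // The callback name would come from the jsonp_callback request parameter.
        System.out.println(wrap("jsonp_callback", "{\"status\":\"DONE\"}"));
    }
}
```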

After this, we need to write a servlet which implements HttpEventServlet. This servlet handles all the APR connection requests.

public class TaskStatusServlet extends HttpServlet implements HttpEventServlet {

    /**
     * Process the given event.
     *
     * @param event the event that will be processed
     * @throws IOException
     * @throws ServletException
     */
    public void event(HttpEvent event) throws IOException, ServletException {
        HttpServletRequest request = event.getHttpServletRequest();
        HttpServletResponse response = event.getHttpServletResponse();
        HttpSession session = request.getSession(false);
        // If the session has expired, close the connection and return
        if (null == session) {
            event.close();
            return;
        }
        String sessionId = session.getId();

        switch (event.getType()) {
        case BEGIN:
            Logger.info("BEGIN for session: " + sessionId, this.getClass());
            // get the unique id from the session
            String uniqueId = (String) session.getAttribute("uniqueId");
            // store the response object in application memory, keyed by the unique id
            memoryManager.addToMemory(uniqueId, response);
            event.setTimeout(120000); // set the connection timeout to 2 minutes
            break;
        case ERROR:
            Logger.info("ERROR for session: " + sessionId, this.getClass());
            event.close();
            break;
        case END:
            Logger.info("END for session: " + sessionId, this.getClass());
            event.close();
            break;
        case EOF:
            Logger.info("EOF for session: " + sessionId, this.getClass());
            event.close();
            break;
        case TIMEOUT:
            Logger.info("TIMEOUT for session: " + sessionId, this.getClass());
            uniqueId = (String) session.getAttribute("uniqueId");
            // remove the unique id and HttpResponse object from application memory
            memoryManager.removeFromMemory(uniqueId);
            // set the content type before obtaining the writer
            response.setContentType("text/javascript");
            PrintWriter writer = response.getWriter();
            // call the JavaScript function to open another long-polling connection
            writer.println("openConnection();");
            writer.flush();
            writer.close();
            event.close();
            break;
        case READ:
            Logger.info("READ for session: " + sessionId, this.getClass());
            // This event will never occur in our scenario
            /*
             * InputStream is = request.getInputStream();
             * byte[] buf = new byte[512];
             * while (is.available() > 0) {
             *     int n = is.read(buf); // can throw an IOException
             *     if (n > 0) {
             *         log("Read " + n + " bytes: " + new String(buf, 0, n)
             *                 + " for session: " + sessionId);
             *     } else {
             *         return;
             *     }
             * }
             */
            break;
        }
    }
}

Step 3: Receiving response from external source
For this component, we exposed a web service that the external source calls to push the task execution status. The relevant part of its implementation:

// retrieve the stored response object for this user
HttpServletResponse res = memoryManager.getFromMemory(uniqueId);
// send the response back to the browser
res.setContentType("text/javascript");
try {
    PrintWriter writer = res.getWriter();
    // call the JavaScript function that updates the status in the browser
    writer.println("updateStatus('" + status + "');");
    writer.flush();
    writer.close();
} catch (IOException ioe) {
    Logger.error("Exception while writing status to browser", ioe, this.getClass());
}

// remove the HttpResponse object from application memory after sending the response back to the browser
memoryManager.removeFromMemory(uniqueId);
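One caveat here: the status string is spliced directly into the generated updateStatus('...') call, so a quote or newline inside the status would break (or inject into) the script sent to the browser. A small escaping helper avoids that; the class and method names below are hypothetical, shown only as a sketch.

```java
// Sketch: escape a status string before embedding it in the generated
// updateStatus('...') call, so special characters in the status cannot
// break the JavaScript sent to the browser.
public class JsStringEscaper {

    static String escapeForJs(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\\': out.append("\\\\"); break; // backslash
                case '\'': out.append("\\'");  break; // single quote
                case '"':  out.append("\\\""); break; // double quote
                case '\n': out.append("\\n");  break; // newline
                case '\r': out.append("\\r");  break; // carriage return
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String status = "Task 'build' done";
        // Safe to embed in the generated script.
        System.out.println("updateStatus('" + escapeForJs(status) + "');");
    }
}
```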


Step 4: Displaying the status message in the browser
In the above step we invoked the JavaScript function updateStatus() to display the data in the browser. I am not giving a code snippet of this JavaScript function, as it depends completely on the HTML structure of your web page. You only need to select the correct <div> and add the message passed from the server side inside this div.