Thursday, May 10, 2012

Java utility to invoke Restful web services using Spring RestTemplate


This is a utility/framework for invoking RESTful web services. Internally it uses Spring RestTemplate to make the calls. Apart from the Spring jars, it depends on a few external jars; if your application already includes them, you do not need to add them again.

You can download RestFramework.rar, which contains all the resources, from the link below:
https://docs.google.com/open?id=0B8O-miA80x0gUTBlSHoxQ0paWGc

The archive contains the following files:
  1. RestfulWS.jar – the jar to include in your application. It contains all the classes of this utility.
  2. RestfulWS-src.jar – the jar containing the source code.
  3. External lib – the other external jars required by RestfulWS.jar. Include these if they are not already present in your application.
  4. Test – a folder containing all the test classes. Refer to these to see how to use this utility.
  5. Class diagram.jpg – the class diagram.

Class Diagram:



If you are not able to view this class diagram, refer to the Class diagram.jpg file included in RestFramework.rar.
As the class diagram shows, there are a couple of concrete classes and a couple of abstract classes. The concrete classes can be instantiated directly and used by passing the URL, HTTP method, and other required parameters through the constructor and the invoker method. The abstract classes at the lower end of the class hierarchy are meant to be used as templates: extend one and supply the required parameters (URL, HTTP method, etc.) by overriding the corresponding methods in your concrete class.
This utility supports JSON as well as XML based REST web services. If your request/response involves some other kind of data (for example, some web service calls return an array of bytes as the response), you can extend any suitable class in this hierarchy and add message converters (marshaller/unmarshaller) for it.

Details of interfaces, abstract classes and concrete classes:

    1) RestfulWebServiceInterface<WI,WO>
This is the topmost interface, parameterized with:
WI - input class type for the web service
WO - class type for the web service response

It has only one method, execute(), which is used to invoke the web service.

    2) GenericRestfulWSInvoker<WI,WO>
This is an abstract class which implements RestfulWebServiceInterface<WI,WO>. It implements the execute() method of RestfulWebServiceInterface. It also has several protected methods, which execute() uses internally. These methods can be overridden in subclasses if required.

    3)    JsonBasedRestfulWSInvoker<WI,WO>
This is a concrete class which extends GenericRestfulWSInvoker<WI,WO>. It overrides those protected methods of GenericRestfulWSInvoker<WI,WO> that need to behave differently when the request and response type of the web service is JSON. This class can be instantiated and the web service invoked directly through the instance.
See Test2.java in the Test folder for sample code that uses this class.



    4)   XMLBasedRestfulWSInvoker<WI,WO>
This is a concrete class which extends GenericRestfulWSInvoker<WI,WO>. It overrides those protected methods of GenericRestfulWSInvoker<WI,WO> that need to behave differently when the request and response type of the web service is XML. This class can be instantiated and the web service invoked directly through the instance.
See Test1.java in the Test folder for sample code that uses this class.

    5)  RestfulWSInvokerTemplateInterface<I,O,WI,WO>
This interface provides a more generic, extended way of invoking a web service. Use it if your web service layer is isolated from your application’s other layers and you don’t want to expose the web service layer’s VOs (for request and response) to those layers.
For example:

In the above diagram, the business layer and the web service layer are separate. The business layer passes an object of type I. The web service layer accepts the instance of I and converts it into an instance of WI. This WI is then converted into the request and sent to invoke the web service. The response is mapped to an instance of WO, which is then converted into an instance of O and sent back to the business layer. In this scenario the business layer knows nothing about WI, WO, or other web service specific details. If you follow this template pattern, you need to create one concrete class for each web service call, containing all the details of that particular call.

This interface has only one method, invoke(), to invoke the web service.

    6)  GenericRestfulWSInvokerTemplate<I, O, WI, WO>
This abstract class implements the RestfulWSInvokerTemplateInterface<I,O,WI,WO> interface and extends GenericRestfulWSInvoker<WI,WO>. It provides the implementation of the invoke() method. It also declares the abstract methods required for this template.

    7)   JsonBasedRestfulWSInvokerTemplate<I,O,WI,WO>
This is an abstract class which extends GenericRestfulWSInvokerTemplate<I, O, WI, WO>.
It overrides those protected methods of GenericRestfulWSInvoker<WI,WO> that need to behave differently when the request and response type of the web service is JSON. Extend this class to create a concrete template class; the web service can then be invoked directly through an instance of the extended class.
See Test4.java in the Test folder for sample code that uses this class. The folder also has a template class, JsonBasedRestfulTemplateImpl.java, which is used by Test4.java.

    8)  XMLBasedRestfulWSInvokerTemplate<I,O,WI,WO>
This is an abstract class which extends GenericRestfulWSInvokerTemplate<I, O, WI, WO>.
It overrides those protected methods of GenericRestfulWSInvoker<WI,WO> that need to behave differently when the request and response type of the web service is XML. Extend this class to create a concrete template class; the web service can then be invoked directly through an instance of the extended class.
See Test3.java in the Test folder for sample code that uses this class. The folder also has a template class, XMLBasedRestfulTemplateImpl.java, which is used by Test3.java.





Please let me know your thoughts about this utility; that will help me improve it and make it more generic.








Tuesday, May 1, 2012

Python link checker to find broken links in a website

Recently I took the online course "CS 101: Building A Search Engine" at www.udacity.com. In this course I learnt the basics of Python programming, crawlers, and search engines. I thought of building a small utility that uses a crawler to locate broken links in a web site. This utility crawls every page of your web site and checks whether all the links are accessible. In case of an error it reports the URL along with its HTTP error code. This utility is suitable for web sites whose pages are connected through links (using "href") and that do not rely heavily on form submissions and AJAX calls.
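
To make the status check concrete, here is a minimal sketch of how a single URL could be classified. It is not the code from check-web-health.py: the function name check_url and the timeout value are my own assumptions, and it is written for Python 3's standard library rather than the Python 2.7.2 the utility was tested with. Treating unreachable hosts as "invalid url" is also an assumption.

```python
import urllib.error
import urllib.request


def check_url(url, timeout=10):
    """Return "OK", an HTTP error code, or "invalid url" for one URL.

    Hypothetical helper, not taken from check-web-health.py.
    """
    try:
        response = urllib.request.urlopen(url, timeout=timeout)
        response.close()
        return "OK"
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        return err.code
    except (urllib.error.URLError, ValueError):
        # Malformed URL, unknown scheme, or a host that cannot be reached
        # (assumption: all of these are reported as "invalid url").
        return "invalid url"
```

HTTPError has to be caught before URLError because it is a subclass of it; otherwise every HTTP error would be reported as "invalid url".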

Algorithm:
  • Accepts 2 parameters: 1) the start page URL, 2) the max depth to which the crawler runs.
  • It first crawls the root page and adds all its links to the to-be-crawled list.
  • Once a page is crawled, it is removed from the to-be-crawled list and added to the crawled list along with its status (for an accessible page the status is "OK", for an HTTP error the status is the HTTP status code, and for wrong URLs the status is "invalid url").
  • It keeps crawling the links until it reaches the max depth.
  • After it finishes crawling, it writes a file named "site-health.txt" containing all the URLs along with their status.
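
The steps above can be sketched roughly as follows. This is an illustration in Python 3, not the actual source of the utility. The names crawl, write_report, and fetch are hypothetical, the report line format ("url status") is a guess, and I am assuming that the root page is depth 0 and that links found on a page at the max depth are not followed. The fetch argument is any function that takes a URL and returns a (status, links) pair, which keeps the sketch runnable without network access.

```python
from collections import deque


def crawl(root, max_depth, fetch):
    """Breadth-first crawl from root, following links down to max_depth.

    fetch(url) is supplied by the caller and returns (status, links).
    Returns a dict mapping every visited URL to its status.
    """
    to_crawl = deque([(root, 0)])  # the to-be-crawled list, with depths
    crawled = {}                   # the crawled list: url -> status
    while to_crawl:
        url, depth = to_crawl.popleft()
        if url in crawled:
            continue  # already checked through another link
        status, links = fetch(url)
        crawled[url] = status
        if depth < max_depth:
            for link in links:
                if link not in crawled:
                    to_crawl.append((link, depth + 1))
    return crawled


def write_report(crawled, path="site-health.txt"):
    """Write one "url status" line per crawled URL (format is assumed)."""
    with open(path, "w") as out:
        for url, status in crawled.items():
            out.write("%s %s\n" % (url, status))
```

Using a queue gives a breadth-first order, so every page is first reached at its shallowest depth and the depth limit behaves predictably.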

Note:
  • This utility is most useful during the release or support phase of a project, where after every new build you want to make sure that all the links are working.
  • It does not crawl pages which have AJAX calls.
  • It only crawls pages which have links of the form <a href="<url>"></a>.
  • It crawls only those links which are internal to the domain of the root URL. It does not crawl links external to the root domain. For example, if your root URL is www.a.com and your web site has a link to an external site www.b.com, then it will check all the links inside the www.a.com domain as well as the link to www.b.com itself, but it won't crawl the links found on www.b.com. If you want to add more domains for crawling, you need to edit the source code as follows:
    • Find the statement domain = get_domain(root) in the source code and change it to
                   domain = get_domain(root, "b", "c", <other domains>)
                  It will then crawl the root URL's domain as well as domains b, c, and the other domains given in the above statement.
  • I have tested it using Python 2.7.2. Please make sure that you have Python 2.7.2 or a later version installed on your machine.
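
To illustrate the two link rules in the notes above (only <a href> links are picked up, and only links inside the allowed domains are followed), here is a small sketch using Python 3's standard library. It does not reproduce get_domain from the utility. LinkExtractor, extract_links, host_of, and should_follow are hypothetical names, and comparing full host names such as www.a.com is my assumption about what "domain" means here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag, resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return  # links created by scripts or other tags are ignored
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(urljoin(self.base_url, value))


def extract_links(base_url, html):
    """Return the absolute URLs of all <a href="..."> links in html."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


def host_of(url):
    """Lower-cased host name of a URL that includes its scheme."""
    return (urlparse(url).hostname or "").lower()


def should_follow(url, root, extra_hosts=()):
    """True if url belongs to the root's host or to one of extra_hosts."""
    allowed = {host_of(root)} | {h.lower() for h in extra_hosts}
    return host_of(url) in allowed
```

With this split, an external link can still be checked once for its status, while should_follow decides whether the links found on that page are added to the to-be-crawled list.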
Source code:
https://docs.google.com/open?id=0B8O-miA80x0gS2JnSkVqTkZtTWs

How to use:
  • Download check-web-health.py from the above link and open the source code in edit mode.
  • Go to the last line of the code. It reads: check_web_health('http://google.com',2)
  • Edit this line to check_web_health(<url of start page>,<max depth of crawling>)
  • Save and run the program.
  • After the program exits, look for a file named "site-health.txt" in the same directory as check-web-health.py. Each line in this file has a URL along with its status.

My knowledge of Python programming is at an intermediate level, so there may well be some issues with this utility. Please use it at your own risk :)

Please let me know about any issues you face while using this utility.

Thanks