Monday, December 21, 2009

Java Collections : Which one to use

When I initially started using Java and to be frank, until quite recently, I was never sure whether I am using the right collection for in my code.
Initially though, I never bothered about it and was using anyone which suites my needs (eg: Vector to store list of values and HashMap to store key value pair). I rarely used other implementations, unless my senior asked me to use it.

But later I did realise that using the right implementation for the right piece of work is important and can improve your code and the overall performance of your application.

I have put down a segment here for java developers who are having trouble making the decision on the right collection. to make it simpler for a java developer, I have put it in our language syntax, with if / else loop (The one we use most when we are starting up)

//Choose the collection
if ("Need to store key value pair")
{
  Use Map;
   //To choose the implementation
   if ("Map must be sorted by keys" && "Dont need to store null key")
   {
     Use TreeMap; //If speed is a major criteria, then you better use HashMap and sort it by your self.
   }
   else if ("Map must be fully synchronised" && "Must not allow null key and values")
   {
     Use HashTable;
   }
   else if ("Map must be thread safe while insertion" && "Must allow null key and values")
   {
     Use ConcurrentHashMap;
   }
   else if ("Map need not be thread safe" && "Must allow null key and values" && "Needs to be fast")
   {
     Use HashMap;
   }
   else if ("Need to store multiple values per key")
   {
     Use MultiMap;
   }
   else if ("Map must be fully synchronised" && "Must not allow null key and values" && "The order of insertion must be maintained")
   {
     Use LinkedHashMap;
   }
}
else if ("Must not store duplicates" && "Order of elements stored does not matter") //also, dont need key value pair, but have to store single values
{
   Use Set;
   //To choose the implementation
   if ("Set needs to be sorted")
   {
     Use TreeSet; //If speed is a major criteria, then you better use HashSet and sort it by your self.
   }
   else if ("Set need not be sorted" && "Speed is important")
   {
     Use HashSet;
   }
}
else if ("Will need to store duplicates" || "Order of elements stored matters" || "Need to have control over where the element is added")
{
   Use List;
   //To choose the implementation
   if ("Need to add and remove more frequently" || "Need the functionality of a Queue/Stack/Double-ended Queue")
   {
     Use LinkedList;
   }
   else if ("List must be synchronized")
   {
     Use Vector;
   }
   else if ("List need not be synchronized" && "Speed is important")
   {
     Use ArrayList;
   }
}
else if ("Amount of data is predictable" && "Amount of data is less")
{
   Use Array; //If searching is more important, use ordered array, else if insertion speed is important, use unordered array
}

There are ofcourse a lot more implementations of the collections. But I have tried to put in a few which are frequently used by us. And I believe, most of the operations can be handled by those mentioned above.

To know more about each implementation or collection, you can go through the Java API.

Please note that the above is based on my knowledge and if you see any mistakes, I am ready to correct myself.

2 comments:

  1. My opinion for synchronized List is to use Collection.synchronisedList().

    I is more efficient and faster than Vector.
    As all the methods are synchronized in vector, and it must be avoided.

    what do say?

    Yateesh

    ReplyDelete
  2. Good point Yateesh.

    Both Vector and SynchronizedList are synchronized.

    In case of vector the synchronization is at method level, Example:
    public synchronized E get(int index) {
    return (E)elementData[index];
    }

    And in case of SynchronizedList, the statements within the methods are synchronized on the list object. Example :
    public E get(int index) {
    synchronized(mutex) {return list.get(index);}
    }
    where mutex is the collection object.

    When you do Collections.synchronizedList(new ArrayList()); you end up creating 2 instances of List. Whereas, in case of Vector, you are creating only one instance.

    I agree that Vector is a legacy class and ArrayList is the modern version of it. But I don’t see much advantage of using SynchronizedList when compared to Vector in most of the cases. If there is, I would like to know, so that I can correct my mistakes.

    If too much synchronization is a worry in the project, the concurrent package in Java 5+ is a thing to look at. It has a good collection of new classes which allow multiple threads to access the collection object at the same time and locks only when elements are added to / removed from the collection object.

    ReplyDelete