Monday, October 15, 2007

Choosing the Right Collection

Here are my thoughts on choosing the right collection for your application written in Java although conceptually speaking I think the same can be applied to other programming languages as well because the concept of "data structures" are the same.

What are Collections?
-------------------------
Collections are essentially a group of elements in data structure terms and in Java its a group of objects which are provided by the java.util package. Essentially they are containers which hold a collection of objects and also provide functionality to modify the data structure dynamically.

Types of Collections
------------------------
Note:- Iteration order means the way in which the objects will be iterated over or the way in which the elements of the collection will be accessed or inserted into the collection.

List
  • ArrayList - Indexed Collection and Random Access List. Iteration order is insertion order.
  • LinkedList - A Doubly Linked List and Sequential Access with FIFO behaviour. Iteration order is insertion order.
  • Vector - Synchronized version of the ArrayList with FIFO behaviour. Iteration order is insertion order.

Set

  • HashSet - A Collection that does not hold any duplicates. Iteration order is undefined.
  • LinkedHashSet - A collection with a doubly linked list with FIFO behaviour. Iteration order is insertion order.

Map

  • HashMap - Key-Value Pair collection with no synchornization and permits nulls. Iteration order is undefined.
  • LinkedHashMap - Key-Value Pair Collection with a doubly linked list with FIFO behaviour. Iteration order is insertion order of keys or access order of keys.
  • Hashtable - Key-Value Pair collection with synchronization and does not permit nulls. Iteration order is undefined.

Generally speaking for most general purposes you can use ArrayList, Arrays and LinkedList for most applications however certain scenarios and use cases in an application demands that you use the right collection for efficiency purposes. The following are some scenarios I can think of where I choose an appropriate collection to get the job done.

Email Service: -

In some applications you may choose to send the email asynchronously in this kind of a scenario we can use an ArrayList where we add the email message to the list to be sent later.

System Properties : -

During an application startup most applications look for System properties defined in a properties file which allow the application to be initialized with the necessary system defined properties for the application to function properly. In this scenario we can use HashMap or even Hashtable according to the situation. HashMap is preferred except when you need to be thread safe or you dont want nulls.

User Management :-

Imagine you have a User who may have many roles in a system and the application must ensure that he must not have duplicate roles or in others words the same role should not be assigned to him twice. HashSet is a good candidate for such scenarios.

Guidelines:-

1) ArrayList and Vector give best performance because they access objects using index. Vector takes slightly more time but it is negligible.

2) LinkedList gives worst performance when accessing objects at the middle because it has to scan nodes to access objects.

3) HashSet gives better performance than TreeSet. Use TreeSet when you want ordering.

4) Use Hashtable when you want to have a thread safe collection otherwise use HashMap. Use TreeMap when you want ordering.

What does the readers of my blog have to say about this? Please feel free to post your comments on this topic.

No comments: