In the IT application world we often see applications that do not perform well, and as a result we spend a lot of money on hardware, operations support and firefighting in the production environment. But have we analyzed why we end up in these situations? While introspecting some of my previous systems I found a few common developer mistakes that drag system performance down. Here are some common mistakes which we as developers make often and which can be avoided easily.

1: Poor Code Structure

Often we start a construct and end up with improper closure. It is very common that when we initiate a process we create an instance, but at the end we do not close it and leave it for garbage collection or database intelligence to handle. Yes, garbage collection and database auto-close session management are made to handle such issues, but not to cover mistakes committed by developers. Garbage collection and other such utilities should never be relied on as the default, because they are meant for exceptional cases only. So for every such construct we should have a proper opening and a proper closure.

Connection conn = null;
PreparedStatement ps = null;
ResultSet rs = null;
try {
    // Do stuff
} catch (SQLException ex) {
    // Exception handling stuff
} finally {
    if (rs != null) {
        try { rs.close(); } catch (SQLException e) { /* ignored */ }
    }
    if (ps != null) {
        try { ps.close(); } catch (SQLException e) { /* ignored */ }
    }
    if (conn != null) {
        try { conn.close(); } catch (SQLException e) { /* ignored */ }
    }
}

The finally block could be improved to avoid the null checks:

finally {
    try { rs.close(); } catch (Exception e) { /* ignored */ }
    try { ps.close(); } catch (Exception e) { /* ignored */ }
    try { conn.close(); } catch (Exception e) { /* ignored */ }
}

Even better, the finally block can delegate the closing to a small utility such as Apache Commons DbUtils:

finally {
    DbUtils.closeQuietly(rs);
    DbUtils.closeQuietly(ps);
    DbUtils.closeQuietly(conn);
}

When developers skip exception handling and proper closure of connections, the code starts misbehaving and it results in the following:

  • Connections pile up in the database and the number of open connections keeps increasing
  • The database connection pool gets exhausted and no connections are left for further database operations
  • Unwanted objects remain active and fill up the heap
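All of this can also be avoided in modern Java by using try-with-resources (Java 7 and later), which closes the resources automatically in reverse order even when an exception is thrown. Below is a minimal sketch; dataSource, the SQL text and the bind value are illustrative placeholders, not part of the original example:

// try-with-resources closes rs, ps and conn automatically, even on exceptions
// dataSource, the query and the bind value are illustrative placeholders
try (Connection conn = dataSource.getConnection();
     PreparedStatement ps = conn.prepareStatement("SELECT ID, NAME FROM EMPLOYEE WHERE ID = ?")) {
    ps.setInt(1, 1234);
    try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            // Do stuff
        }
    }
} catch (SQLException ex) {
    // Exception handling stuff
}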

2: Not using the PreparedStatement

This mistake is very common; developers have been writing JDBC code in Java, C# and other languages for years, yet the same mistake keeps getting repeated.

In almost all cases (with a few exceptions) you should write a PreparedStatement rather than a static Statement. The following reasons favour the PreparedStatement:

  • It removes syntax errors originating from bad string concatenation when inlining bind values.
  • It removes SQL injection vulnerabilities caused by bad string concatenation when inlining bind values.
  • It avoids mistakes with "sophisticated" data types, such as TIMESTAMP, binary data, and others.
  • PreparedStatements can be kept open for a while and reused with new bind values instead of being closed immediately (useful in Postgres, for instance; see the reuse sketch after the examples below).
  • It can make use of adaptive cursor sharing (Oracle-speak) in more sophisticated databases. This helps prevent hard-parsing SQL statements for every new set of bind values.

It appears that some developers have reservations about using PreparedStatements for one of these reasons:

  • Developers are not aware of PreparedStatements
  • Developers think that PreparedStatements are slower
  • Developers assume that writing a PreparedStatement takes more effort

Static statement:

Statement stmt = con.createStatement();
stmt.executeUpdate("UPDATE EMPLOYEES SET SALARY = 2000 WHERE ID = 1234");

PreparedStatement:

PreparedStatement pstmt = con.prepareStatement("UPDATE EMPLOYEES SET SALARY = ? WHERE ID = ?");
pstmt.setBigDecimal(1, new BigDecimal("153833.00"));
pstmt.setInt(2, 110592);
pstmt.executeUpdate();
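To illustrate the reuse point from the list above, a PreparedStatement can be prepared once and then executed many times with new bind values, for example as a JDBC batch. This is only a sketch; the Employee type and the employees collection are assumed for illustration:

// Prepare once, then bind and execute repeatedly with new values
// Employee and the employees collection are illustrative assumptions
PreparedStatement pstmt = con.prepareStatement("UPDATE EMPLOYEES SET SALARY = ? WHERE ID = ?");
for (Employee e : employees) {
    pstmt.setBigDecimal(1, e.getSalary());
    pstmt.setInt(2, e.getId());
    pstmt.addBatch();
}
pstmt.executeBatch();   // one round trip for the whole batch
pstmt.close();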

3: Select All Columns in Queries

This mistake is very common and frequent, and it affects two aspects: the database execution plan and the performance of the Java application.

Bad effects on the Java application:

If developers select * (or use the ORM default) and the database returns a set of X columns (could be 10, 20, 50) which is reused across various methods, a lot of data is transferred from the database into a JDBC ResultSet. Even if the system does not use the data from the ResultSet, it has been transferred over the network and loaded into memory by the JDBC driver. This is a waste of I/O and memory, especially when only a few columns are actually in use and the query joins several multi-column tables.

This has two effects: one is network traffic and the other is unwanted usage of Java application memory.

Bad effects on the database execution plan:

The effect of selecting all columns may actually be much worse than the effect on the Java application. Enterprise databases like Oracle, DB2 and others perform a lot of SQL transformation when calculating the best execution plan for a query. It may well be that some parts of the query could be "transformed away", knowing that they won't contribute to the projection (SELECT clause) or to the filtering predicates, but a "select *" query prevents that.

In a nutshell, never ever use a select * query. Never blindly reuse the same projection for various queries; try to reduce the projection to the data that is really required.
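As a simple illustration, a query that selects only the columns it actually uses keeps the ResultSet, the network transfer and the driver's memory footprint small. The table and column names below are only examples:

// Fetch only the columns that are actually used by the application
PreparedStatement ps = con.prepareStatement("SELECT ID, NAME, SALARY FROM EMPLOYEES WHERE DEPT_ID = ?");
ps.setInt(1, 10);
ResultSet rs = ps.executeQuery();
while (rs.next()) {
    int id = rs.getInt("ID");
    String name = rs.getString("NAME");
    // work with the selected columns only
}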

4: Using NOT as a Boolean reverse of a statement and use of the IN clause

Generally we think that two negatives make a positive and vice versa, but that is not always true in SQL, so the developer should choose the expression very carefully, because one change can take the code in the wrong direction. Here is one example:

Let us see what A IN (a, b) really means:

A IN (a, b)
-> A = ANY (a, b)
-> A = a OR A = b

and then, as developers often do, we reverse it as NOT (A IN (a, b)) and see what it really means:

NOT (A IN (a, b))
-> A NOT IN (a, b)
-> A != ALL (a, b)
-> A != a AND A != b

When we look at the above expressions it seems that one is simply the inverse of the other, but in SQL that is not always the case. In particular, NULL values break the symmetry: if the list contains a NULL, A NOT IN (a, NULL) is never true, so the query silently returns no rows. This can lead to surprising results that the developer did not anticipate, and the program can misbehave. So we have to choose the expressions very carefully; prefer more steps written in a direct way.

In queries it is also commonly seen that developers pass a very long list to the IN clause, or use a nested query inside it, which is not good at all for data processing. It has a multifold effect, causing slow queries and a poor execution plan on the database.

5: Using too many functions in the queries

Developers tend to use functions in queries to get the desired result faster, but choosing the appropriate function, and the side on which it is applied, is very important. Here is an example:

Select * from employee_details where to_char(salary_credit_date,'dd/mm/yyyy')='01/10/2013'

In this query the function is applied to the column and the result is then compared with a string, which has a huge cost on the database and typically also prevents an index on salary_credit_date from being used. Instead, it should be written as follows:

Select * from employee_details where salary_credit_date=to_date('01/10/2013','dd/mm/yyyy')

So both the developer and the reviewer should keep such mistakes in mind, as they drag the database performance down.

6: Caching of Data

While programming it is very important to reduce network calls and the number of query hits to the database. For that it is suggested to cache data in the application wherever possible. Let us see what the typical problems are:

  • The same query is called multiple times in one transaction, each time for different columns.
  • The same query with the same result is called in different transactions, and the transaction rate is very high.

To reduce the network calls, the following can be done:

  • Make sure all the columns needed from the same query are fetched in a single run. But this does not mean that one should use select *.
  • For smaller tables (like code lists of country, state, city), cache the entire table in memory in order to avoid the network calls.

Also, wherever required, use normalization to define the codes and cache these code tables in memory, as in the sketch below.
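A minimal sketch of such an in-memory cache for a small code table is shown below. CountryDao, Country and findAll() are hypothetical names, and in a real system the cache would need to be refreshed when the reference data changes:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Load a small reference table once and serve all lookups from memory.
// CountryDao, Country and findAll() are illustrative placeholders.
public class CountryCache {
    private final Map<String, Country> byCode = new ConcurrentHashMap<>();

    public CountryCache(CountryDao dao) {
        for (Country c : dao.findAll()) {   // one query instead of one query per lookup
            byCode.put(c.getCode(), c);
        }
    }

    public Country get(String code) {
        return byCode.get(code);            // served from memory, no network call
    }
}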

7: Caching of Sequence

A sequence is used for various counters in tables, like transaction id, employee id etc. But fetching every value directly from the sequence is costly, because each increment has to be recorded on the database, and if the same sequence is used by a high-rate transactional system, it may impact performance badly. In order to avoid this, cache the sequence if possible. Here is an example (the sequence name and cache size below are illustrative):

CREATE SEQUENCE transaction_id_seq
  START WITH 1
  INCREMENT BY 1
  MAXVALUE 999999999999999999999999999
  CACHE 1000;   -- CACHE pre-allocates values in memory instead of writing to the dictionary on every call

It is possible that this leaves gaps in the sequence values, but that is usually OK; if gaps are not acceptable, it is suggested to use the NOCACHE option and accept the extra cost.

8: Assuming that 50ms is fast query execution

When a query returns results in 50ms, the developer thinks it is sufficient and the program will work, but it is not enough. Think twice about how many queries the program has: in a typical scenario, if there are 20 queries in one end-to-end transaction, it takes one second to process the entire transaction, which means only 1 transaction per second can be processed per thread. Whereas if we reduce the query time towards microseconds, our transaction processing capability increases multifold.

In conclusion, if both developer and reviewer take care of these small aspects of coding, the result can be high-performing, scalable code.


SQL Mistakes which Drag System Performance Java, JDBC, Hibernate Part 2


Batch processing is a highly data-intensive operation, and most architects recommend executing it near the data, using built-in database capabilities like PL/SQL in Oracle. This is true, but the real challenge comes when architects have to conform to an organization standard of using Java and an ORM layer like Hibernate in order to keep their product database agnostic.

In this blog I will talk about one such challenge, where the use case is to deal with more than 20 million records in the database, apply processing to each record, and then save the output back into the database. Below is a typical requirement in batch processing:

  • Pick up data records from the database
  • Apply some business logic to each record
  • Save the manipulated data back into the database

The obvious solution is to use database technologies like PL/SQL or ETL tools for better performance. But using a Java application with the Hibernate ORM layer, and all this with a performance target of processing 10 million records in 30 minutes, makes the task challenging.

The challenges with Java + Hibernate based processing are:

  • Limited heap size and out-of-memory errors during processing
  • Network overhead in pulling data from the database to the application
  • Hibernate overhead as compared to native JDBC calls

Let's quickly jump to the solution part and talk about something positive about Java.

Java performance can be scaled using its multithreading capability, and that is the hint to solve the puzzle. The problem can be solved if the input data can be partitioned into independent data sets and each set is processed in parallel. The following picture illustrates the partitioning approach:

[Figure: the input record set is partitioned into independent data sets, each processed in parallel by a batch instance]

There can be many partitioning approaches; the following lists some of the possible ones. Selecting a partitioning approach has to be done on a case-by-case basis.

  1. Fixed and Even Break-Up of Record Set
  • This involves breaking the input record set into an even number of portions (e.g. 10, where each portion has exactly 1/10th of the entire record set). Each portion is then processed by one instance of the batch/extract application.
  • In order to use this approach, preprocessing is required to split the record set. The result of this split is a lower and upper bound placement number which can be passed as input to the batch/extract application to restrict its processing to its portion alone.
  • Preprocessing can be a large overhead, as it has to calculate and determine the bounds of each portion of the record set.
  2. Break-Up by a Key Column
  • This involves breaking up the input record set by a key column such as a location code, and assigning the data for each key to a batch instance.
  3. Addition of a Processing Indicator
  • This involves adding a new column to the input table which acts as an indicator. As a preprocessing step, all indicators are marked non-processed. During the record fetch stage of the batch application, records are read on the condition that they are marked non-processed, and once they are read (with lock) they are marked processing. When a record is completed, the indicator is updated to either complete or error. Many instances of the batch application can be started without any change, as the additional column ensures that a record is only processed once (a minimal sketch of this approach follows the list).
  • With this option, I/O on the table increases. In the case of an updating batch application, the impact is reduced, as a write has to occur anyway.
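Below is a minimal JDBC sketch of the processing-indicator approach. The INPUT_RECORDS table, the STATUS column and its values are assumptions for illustration, the ROWNUM limit is Oracle-style syntax, and the locking details differ between databases:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Claim a batch of unprocessed rows by flipping their indicator, then mark each one done.
// Table, column names, status values and the ROWNUM limit are illustrative assumptions.
public class IndicatorBatch {

    public static int claimBatch(Connection conn, int batchSize) throws SQLException {
        try (PreparedStatement claim = conn.prepareStatement(
                "UPDATE INPUT_RECORDS SET STATUS = 'PROCESSING' "
                + "WHERE STATUS = 'NOT_PROCESSED' AND ROWNUM <= ?")) {
            claim.setInt(1, batchSize);
            int claimed = claim.executeUpdate();   // these rows now belong to this batch instance
            conn.commit();                         // assumes auto-commit is disabled
            return claimed;
        }
    }

    public static void markDone(Connection conn, int recordId, boolean ok) throws SQLException {
        try (PreparedStatement done = conn.prepareStatement(
                "UPDATE INPUT_RECORDS SET STATUS = ? WHERE ID = ?")) {
            done.setString(1, ok ? "COMPLETE" : "ERROR");
            done.setInt(2, recordId);
            done.executeUpdate();
        }
    }
}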

Still, one problem remains: a large data set cannot be pulled into Java memory in one go. For this, Hibernate provides scroll(), which takes advantage of server-side cursors for queries that return many rows of data.

Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
ScrollableResults customers = session.getNamedQuery("GetCustomers")
    .setCacheMode(CacheMode.IGNORE)
    .scroll(ScrollMode.FORWARD_ONLY);
int count = 0;
while ( customers.next() ) {
    // process the current record here
    if ( ++count % 20 == 0 ) { session.flush(); session.clear(); }  // flush a batch of updates and release memory
}
tx.commit();
session.close();

In this code snippet, inside the while loop, create batches of, say, 500 records and hand each batch to a thread from a thread pool dedicated to the batch program (a sketch follows below). The size of the batch and the number of concurrent threads processing batches should be tuned based on the memory allocated to the Java instance.
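A minimal sketch of that idea is shown below. The batch size, the pool size and the processBatch() body are placeholders to be tuned and filled in for the real use case:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Collect records into fixed-size batches and hand each batch to a worker thread.
// BATCH_SIZE, POOL_SIZE and processBatch() are illustrative placeholders.
public class BatchDispatcher {
    private static final int BATCH_SIZE = 500;
    private static final int POOL_SIZE = 8;

    private final ExecutorService pool = Executors.newFixedThreadPool(POOL_SIZE);
    private List<Object> currentBatch = new ArrayList<>(BATCH_SIZE);

    public void add(Object record) {
        currentBatch.add(record);
        if (currentBatch.size() == BATCH_SIZE) {
            final List<Object> batch = currentBatch;
            currentBatch = new ArrayList<>(BATCH_SIZE);
            pool.submit(() -> processBatch(batch));   // process this batch on a pool thread
        }
    }

    private void processBatch(List<Object> batch) {
        // apply the business logic to each record and persist the results
    }
}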

In my next blog, I will talk about Batch processing Strategies and Best Practices.