In IT application world we see that applications are not performing well, due to this we spend lot of money in hardware, support by operations team and fire fighting in production environment. But have we analyzed that why we are ending up on these situations? While introspecting of some of my previous systems I found that there are few common mistakes by developers, which drag the system performance. Here are some common mistakes, which we as a developer perform often and which can be avoided easily.
1: Poor Code Structure: Often we start a construct and end up with improper closure. It is very common that when we initiate a process we create an instance but at the end do not close the instance and leave it for the garbage collection or database intelligence to handle, ?Yes? garbage collection and database auto close session management is made for handle such issues but not for mistakes committed by developers. These Garbage collections and other utilities should never be assumed explicit because they are meant for the exception cases only. So for every such construct we should have proper opening and closure of statement.
Finally block could be improved to avoid the null check.
Finally block should be closed properly by closing connections.
Now generally developers avoid exception, closure of connection and code start misbehaving and it result into following:
- Loading up the connections in database and no of connections increases
- Database connection pools exhausts and no more connection for database operation
- Unwanted memory objects are being active to fill the heap.
2: Not using the PreparedStatement
This mistake is very common and it has been years that this developers are writing code in Java using JDBC, C# and any other language but still same repeated mistake.
Let me explain that almost all cases(except some cases) you should write PreparedStatement rather than static statement. Following are reasons which prevails us to write PreparedStatement.
It removed syntax errors originating from bad string concatenation when inlining bind values.
- It removed SQL injection vulnerabilities from bad string concatenation when inlining bind values.
- It can avoid mistakes for the ?sophisticated? data types, such as TIMESTAMP, binary data, and others.
- Statement can be kept open as PreparedStatements around for a while, reusing them with new bind values instead of closing them immediately (useful in Postgres, for instance).
- It can make use of adaptive cursor sharing(Oracle-speak) in more sophisticated databases. This helps prevent hard-parsing SQL statements for every new set of bind values.
It appears that some developers have reservation from using PreparedStatements for any of these reasons:
- Developers are not aware about PreparedStatements
- Developers think that PreparedStatements are slower
- Developers assume that writing a PreparedStatement takes more effort
Select All Columns in Queries
This mistake is very common and frequent and effect both the aspects one is database execution plan and your performance of Java application.
Bad effects on the Java application:
If developer are selecting * or use default of ORM and database set of X columns(could be 10, 20 50), which they are reusing among various methods, which means it is transferring lots of data from the database into a JDBC ResultSet. Even if system is not using the data from the ResultSet, it has been transferred over the network and loaded into your memory by the JDBC driver. This is a waste of I/O and memories specially when there are only few columns are in use and have multiple joins of various multi column tables.
Now this has two effects, one network traffic and second is unwanted usage of java application memory.
Bad effects on the database execution plan:
Effect of selecting all columns may actually be much worse than the effects on the Java application. Enterprise databases like Oracle, DB2, other such databases perform a lot of SQL transformation when calculating the best execution plan for ?select *? query. It may well be that some parts of select query can be ?transformed away?, knowing that they won?t contribute to the projection (SELECT clause) or to the filtering predicates.
In nutshell never ever use the Select * query. Never use the same projection for various queries try to reduce the projection to the data that is really required.
4: Using of NOT as Boolean reverse of any ststement and use of In clause
Generally we think that two negative will become one positive and vice versa but that is not always true so developer should choose the equation very carefully because one change can take code into wrong direction. Here is one example:
Can we see what A IN (a,b) really means:
and as per developer we reverse it like NOT (A IN (a, b)) and see what really means:
When we look on the above expression it seems that it is inverse of one another but actually it is not and this can lead to different magical result which developer cannot thing and program can behave. So we have to choose the expressions very carefully. Choose more steps but in direct way.
In the queries generally it is seen that users give in clause for the very long array or use the nested query which is not good at all for the processing of data. It has multifold effect and causes slowness of queries and poor execution plan on database.
4: Using too many functions in the queries
Generally developers are tend to use functions in queries for the faster and desired result but choosing appropriate function is very important. Here is example:
In this query function will be applied on the column and then it will be compared with the string which will have huge cost on database. Instead of this it should be as follows:
So user and reviewer should keep in mind of such mistakes which drag the database performance down.
5: Caching of Data
While doing the programming it is very important to reduce the network calls and no of query hits to database. For that it is suggested to cache the data in application wherever it is possible. Let us see what are problems:
- Call same query multiple times in one transaction for different columns.
- Call same query with same result in different transactions and transaction rate is very high.
To reduce the network calls following can be done
- Make sure all the columns from the same query must be executed in single run. But it does not mean that one should use the select *
- For the smaller tables(like codes list of country, state, city), cache the entire table in memory in order to avoid the network calls.
Also wherever required use the normalization to define the codes and cache these tables in memory.
6: Caching of Sequence
Sequence is used various counters in tables like transaction id, employee id etc. But using these from the sequence is costly because every time it increment then it commits the value on database and if same sequence is used for the transactional system, it may impact the performance badly. In order to avoid this if possible cache the sequence. Here is example.
It might possible that it may give the gap in sequence but it is OK to have if that is not the case it is suggested to use the
7: Assuming that 50ms is fast query execution
When a query is returning results in 50ms developer think it is sufficient and my program will work but it not enough, think twice that how many queries you have in program, if in typical scenario if there are 20 queries in one end to end transaction so it takes one second to process the entire transaction which mean per thread only 1 transactions could be processes. Whereas if we reduce the time of query to microsecond our transaction processing capability increases multifold.
As a conclusion if developer and reviewer both take care of small aspects of coding it can be high performing scalable code.