JPA performance tip – 1: surrogate keys.

This post is the first in a series of JPA performance tips.

The tips come from my own work with JPA and from running into issues that are not documented.

These are non-obvious performance pitfalls. If you do not address them from the beginning, they can lead to a lot of rework or, worse, to unresolvable issues that you simply have to grin and bear.

So please use my experience and build your systems correctly from the beginning.

Surrogate key: A surrogate key is a primary key with no business meaning. Surrogate keys are generally produced in one of two ways:

  • Generated from a database sequence (the usual choice, but not the only one)
  • Generated by a GUID generator

A primary key generated from a sequence is of type Number in the database (translating to long in Java). We will discuss how a ‘long’ primary key can improve performance in a database.
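To make the two kinds of surrogate key concrete, here is a minimal plain-Java sketch. It is only an illustration: the class and method names are hypothetical, and the AtomicLong merely stands in for a real database sequence.

```java
import java.util.UUID;
import java.util.concurrent.atomic.AtomicLong;

/** Sketch: the two common ways of producing a surrogate key. */
final class SurrogateKeySketch {

    // Stand-in for a database sequence; a real sequence lives in the database.
    private static final AtomicLong SEQUENCE = new AtomicLong(0);

    /** Sequence-style key: a compact number that grows by one each time. */
    static long nextSequenceKey() {
        return SEQUENCE.incrementAndGet();
    }

    /** GUID-style key: globally unique, but 36 characters long when stored as text. */
    static String nextGuidKey() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        long first = nextSequenceKey();
        long second = nextSequenceKey();
        String guid = nextGuidKey();
        System.out.println("sequence keys: " + first + ", " + second);
        System.out.println("guid key: " + guid + " (length " + guid.length() + ")");
    }
}
```

Neither key carries any business meaning. The difference is in size and in who generates the value.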

Let us take two example tables:

‘Person’: contains the details of a person

‘Person_phones’: one or more phones for a person

The Person entity class is created as follows:

@Entity
public class Person {

    @Id
    private String id;

    @OneToMany(cascade = CascadeType.ALL, fetch = FetchType.EAGER, mappedBy = "person")
    private Set<Person_phones> phones = new HashSet<Person_phones>();

    private String firstName;
    private String lastName;
    private String type;

    // constructors, getters and setters omitted
}

The Person_phones class is:

@Entity
public class Person_phones {

    @Id
    private String id;

    private String phoneNumber;

    @ManyToOne
    @JoinColumn(name="person", referencedColumnName="id")
    private Person person;

    // constructors, getters and setters omitted
}

 

The following code snippet creates the person and the person phone. The id in both the person and person_phones tables is assigned using setter methods.

@Transactional
public String create(String firstName, String lastName, String type)
{
    Person per = new Person();
    per.setId("3093");
    per.setFirstName(firstName);
    per.setLastName(lastName);
    per.setType(type);

    Person_phones phone = new Person_phones("9873692304", per);
    phone.setId("firstPhone");

    Person newPer = persRepository.save(per);
    System.out.println("New ID: " + newPer.getId());

    return "success";
}

 

Here the CRUD operations are performed using the CrudRepository interface provided by Spring Data.

If we run the code with show_sql set to true, we see the following output:


Hibernate: select person0_.id as id1_1_1_, person0_.first_name as first_na2_1_1_, person0_.last_name as last_nam3_1_1_, person0_.type as type4_1_1_, phones1_.person as person3_1_3_, phones1_.id as id1_2_3_, phones1_.id as id1_2_0_, phones1_.person as person3_2_0_, phones1_.phone_number as phone_nu2_2_0_ from person person0_ left outer join person_phones phones1_ on person0_.id=phones1_.person where person0_.id=?
Hibernate: insert into person (id, first_name, last_name, type) values (null, ?, ?, ?)
Hibernate: select person_pho0_.id as id1_2_0_, person_pho0_.person as person3_2_0_, person_pho0_.phone_number as phone_nu2_2_0_ from person_phones person_pho0_ where person_pho0_.id=?

Hibernate: insert into person_phones (person, phone_number, id) values (?, ?, ?)

 

We see that there is a select query before each insert!

Now, if the person and person_phones tables contain millions of records, consider the cost of that extra select query. But the point to explore is: why did this happen?

To get the answer, we need to go into the code behind Spring Data's CrudRepository. When we call repository.save(), control goes to the following class and code snippet:

@Transactional(readOnly = true)
public class SimpleJpaRepository<T, ID extends Serializable> implements JpaRepository<T, ID>,
        JpaSpecificationExecutor<T> {

    @Transactional
    public <S extends T> S save(S entity) {

        if (entityInformation.isNew(entity)) {
            em.persist(entity);
            return entity;
        } else {
            return em.merge(entity);
        }
    }

    // rest of the class omitted
}

 

By default, entityInformation.isNew(entity) decides whether the entity is new by looking at the value of its primary key. For a non-primitive key type (String, Long, a composite key class and so on) the entity is considered new only when the key is null. For a primitive numeric key (long, int etc.) it is considered new only when the key is 0.

In our case the primary key is a String that the application assigns before calling save(), so it is never null: entityInformation.isNew(entity) returns false and em.merge() is called. The same happens with any other application-assigned key, such as a composite key class.

In the case of em.merge(), Hibernate checks whether the entity already exists by issuing a select query. Hence we get the select queries in the example above.

If the primary key is a generated surrogate (generation type auto, sequence or table), i.e. not application data, the key is still unset when save() is called, so em.persist() is called and there is only an insert query, saving valuable I/O.
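The decision between persist and merge can be modelled in a few lines of plain Java. This is a simplified sketch of the default check, not the Spring Data source code: the class name IsNewSketch is hypothetical, and special cases such as version attributes are left out.

```java
/**
 * Simplified model of how save() chooses between persist and merge.
 * Illustration only; not the library's actual implementation.
 */
final class IsNewSketch {

    /** Returns true when the entity would be treated as new (persist), false otherwise (merge). */
    static boolean isNew(Object id, Class<?> idType) {
        if (!idType.isPrimitive()) {
            // String, Long, composite key class: new only while the key is unset
            return id == null;
        }
        if (id instanceof Number) {
            // long, int etc.: new only while the key still has its default value 0
            return ((Number) id).longValue() == 0L;
        }
        throw new IllegalArgumentException("Unsupported primitive id type: " + idType);
    }

    public static void main(String[] args) {
        System.out.println(isNew("3093", String.class)); // false: assigned String key, merge
        System.out.println(isNew(null, Long.class));     // true: generated key not set yet, persist
        System.out.println(isNew(0L, long.class));       // true: primitive key at default, persist
        System.out.println(isNew(42L, long.class));      // false: key already set, merge
    }
}
```

The first call corresponds to the Person example: the key "3093" is set before save(), so the entity is routed to merge and the extra select follows.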

Conclusion: Always have generated surrogate keys for all tables. The relationships (OneToMany, ManyToOne) can be maintained outside the primary keys. Application-assigned primary keys, including composite keys, will have a performance impact while inserting new rows through save().

This article can also be viewed at https://bootcamptechblog.wordpress.com/2015/09/25/jpa-performance-tip-1-surrogate-keys/

Working with Quartz Scheduler Part 1

Introduction

This article gives guidelines for using the Quartz scheduler in a transactional application.

This article focuses on the basic setup of Quartz, the holiday calendar and job dependency.

Part II of the article will delve into misfire handling, exception management and configuration setup.

There are several tasks in applications, especially transactional applications, which should run at a scheduled date and time.

The scheduler should take care of the following concepts or requirements:

  1. EOD process: The scheduled tasks should run at the “End Of Day” every day.
  2. Cut-off time: The EOD for the current day is considered to be reached when the cut-off time has passed. The EOD processes are triggered only after the cut-off time has passed. Transactions happening in the system after the cut-off time are considered the “next day's” transactions and are not included in the current day's EOD process, e.g. the cut-off time for the current day is set as 8:00 PM.
  3. Holiday calendar: Some of the EOD processes should not run on the following days:
    1. Bank holidays
    2. Weekends

 

Any scheduler should satisfy the above conditions.
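The cut-off and holiday rules can be sketched in plain Java before any scheduler is involved. This is a minimal illustration under stated assumptions: the class and method names are hypothetical, a transaction exactly at 8:00 PM is assumed to still belong to the current day, and "next day" means the next calendar day.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.Collections;
import java.util.Set;

/** Sketch of the cut-off and holiday rules; not part of Quartz. */
final class CutOffSketch {

    // 8:00 PM, as in the example above
    static final LocalTime CUT_OFF = LocalTime.of(20, 0);

    /** Business date of a transaction: the same day up to the cut-off, the next day after it. */
    static LocalDate businessDate(LocalDateTime transactionTime) {
        LocalDate day = transactionTime.toLocalDate();
        return transactionTime.toLocalTime().isAfter(CUT_OFF) ? day.plusDays(1) : day;
    }

    /** The EOD process is skipped on weekends and on the given bank holidays. */
    static boolean shouldRunEod(LocalDate day, Set<LocalDate> bankHolidays) {
        DayOfWeek dayOfWeek = day.getDayOfWeek();
        boolean weekend = dayOfWeek == DayOfWeek.SATURDAY || dayOfWeek == DayOfWeek.SUNDAY;
        return !weekend && !bankHolidays.contains(day);
    }

    public static void main(String[] args) {
        LocalDateTime beforeCutOff = LocalDateTime.of(2015, 9, 25, 19, 59);
        LocalDateTime afterCutOff = LocalDateTime.of(2015, 9, 25, 20, 30);
        System.out.println(businessDate(beforeCutOff));
        System.out.println(businessDate(afterCutOff));
        System.out.println(shouldRunEod(LocalDate.of(2015, 9, 26), Collections.emptySet()));
    }
}
```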

The recommended scheduler is based on the “Quartz” framework (http://quartz-scheduler.org/).

Reasons for recommending Quartz:

  1. Open source.
  2. Active community support.
  3. Most well-known scheduler for Java, used by many financial applications.
  4. Proven performance, scalable to a clustered environment.
  5. Ability to incorporate a holiday calendar.
  6. Abilities can be extended by using Terracotta extensions (with license fee) for:
    1. Graphical scheduler
    2. Capability to deploy different jobs at different nodes
    3. Running in-memory jobs that take terabytes (like interest calculation for millions of wallets) using “BigMemory”

Basic setup

The following components are needed to set up the scheduler.

Maven

Include the following Maven dependency:

<dependencies>
  <dependency>
    <groupId>org.quartz-scheduler</groupId>
    <artifactId>quartz</artifactId>
    <version>2.2.1</version>
   </dependency>
</dependencies>

 

web.xml: add the following

<?xml version="1.0" encoding="UTF-8"?>
<web-app version="3.0" xmlns="http://java.sun.com/xml/ns/javaee"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/web-app_3_0.xsd">
	<context-param>
		<param-name>quartz:config-file</param-name>
		<param-value>quartz.properties</param-value>
	</context-param>
	<context-param>
		<param-name>quartz:start-on-load</param-name>
		<param-value>true</param-value>
	</context-param>
	<context-param>
		<param-name>quartz:wait-on-shutdown</param-name>
		<param-value>true</param-value>
	</context-param>
	<context-param>
		<param-name>quartz:shutdown-on-unload</param-name>
		<param-value>true</param-value>
	</context-param>
	<listener>
		<listener-class>org.quartz.ee.servlet.QuartzInitializerListener</listener-class>
	</listener>
	<servlet>
		<servlet-name>schedServlet</servlet-name>
		<servlet-class>com.mc.scheduler.SchedulerServlet</servlet-class>
		<load-on-startup>0</load-on-startup>
	</servlet>
	<servlet-mapping>
		<servlet-name>schedServlet</servlet-name>
		<url-pattern>/aa*</url-pattern>
	</servlet-mapping>
</web-app>

Key points

  1. The Quartz properties are given in the “quartz.properties” file (described later).
  2. The listener class provided by the Quartz distribution initializes the scheduler factory based on the context parameters defined in web.xml.
  3. The servlet, in its init() method, initializes the scheduler and puts the jobs into it.

quartz.properties:

Put the quartz.properties file in the resources folder of the project structure, “src/main/resources”, so that it can be picked up from the classpath.

# Main Quartz configuration

org.quartz.scheduler.skipUpdateCheck = true
org.quartz.scheduler.instanceName = MyQuartzScheduler
org.quartz.scheduler.jobFactory.class = org.quartz.simpl.SimpleJobFactory
org.quartz.threadPool.class = org.quartz.simpl.SimpleThreadPool
org.quartz.threadPool.threadCount = 5

The meaning of the elements in the properties file is given here (http://quartz-scheduler.org/documentation/quartz-2.2.x/configuration). We will update the properties file based on performance benchmarking. For development, the above properties file suffices.

Important classes:

SchedulerFactory: This class is created in the listener provided by Quartz. The factory class creates a Scheduler.

Scheduler: An instance of the Scheduler interface maintains a registry of “JobDetail” and “Trigger” objects. It also provides the APIs for adding jobs, triggers, calendars etc. The Scheduler instance is produced by the SchedulerFactory class.

Job: This is the interface to be implemented by classes that intend to do processing at a scheduled time. Examples of jobs are “InterestAccrualManager”, “GLPostingManager” etc. The Job instance implements the execute() method, which is the entry point of the processing that needs to be performed.

JobDetail: Conveys the details of a given Job instance. Quartz does not actually store the Job instance in the Scheduler, but adds a JobDetail to its registry, which allows it to create an instance of the Job. Jobs have “name” and “group” attributes, which allow them to be uniquely added to a scheduler.

Trigger: A trigger is the mechanism by which a job in the scheduler gets invoked. There can be more than one trigger for a single job, but one trigger can be associated with only one job via its JobDetail. There are various types of triggers, such as:

  • SimpleTrigger: Helps in setting up simple jobs, such as triggering a job at a fixed frequency.
  • CronTrigger: Uses OS cron-like syntax for setting up the trigger for a job. The details of the cron setup are given here (http://www.quartz-scheduler.org/documentation/quartz-2.x/tutorials/crontrigger)

Builder classes: The JobDetail and Trigger instances are created by the builder classes. There are different builders for jobs and triggers.

Calendar: The Quartz Calendar classes are different from the Java Calendar. Two methods of the Quartz Calendar interface matter most here:

isTimeIncluded()

getNextIncludedTime()

There are various calendars available; the most important for interest and GL use cases are AnnualCalendar and HolidayCalendar.

Calendar instances, when added to the scheduler and referenced by a trigger, prevent the trigger from firing under certain conditions, e.g. a trigger set to fire every day at 8:00 PM will not fire on a holiday if a holiday calendar is defined in the trigger.
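The idea behind an exclusion calendar can be shown with a small plain-Java model. This is only a sketch of the concept, not the Quartz classes: the class HolidayCalendarSketch and its method names are hypothetical, it works on whole days instead of millisecond timestamps, and its "next included day" may be the given day itself.

```java
import java.time.LocalDate;
import java.util.HashSet;
import java.util.Set;

/** Plain-Java model of an exclusion calendar; not the Quartz implementation. */
final class HolidayCalendarSketch {

    private final Set<LocalDate> excluded = new HashSet<>();

    /** Marks a day as a holiday, i.e. a day on which triggers must not fire. */
    void addExcludedDate(LocalDate date) {
        excluded.add(date);
    }

    /** A day is "included" (a trigger may fire) unless it was excluded. */
    boolean isDayIncluded(LocalDate date) {
        return !excluded.contains(date);
    }

    /** First included day on or after the given day. */
    LocalDate getNextIncludedDay(LocalDate from) {
        LocalDate day = from;
        while (!isDayIncluded(day)) {
            day = day.plusDays(1);
        }
        return day;
    }

    public static void main(String[] args) {
        HolidayCalendarSketch holidays = new HolidayCalendarSketch();
        holidays.addExcludedDate(LocalDate.of(2015, 9, 10));
        holidays.addExcludedDate(LocalDate.of(2015, 9, 11));
        System.out.println(holidays.isDayIncluded(LocalDate.of(2015, 9, 10)));
        System.out.println(holidays.getNextIncludedDay(LocalDate.of(2015, 9, 10)));
    }
}
```

A trigger that consults such a calendar simply skips any fire time for which the "included" check returns false.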

Steps for setting up a scheduler

  1. Get an instance of StdSchedulerFactory

String key = "org.quartz.impl.StdSchedulerFactory.KEY";

ServletContext servletContext = cfg.getServletContext();

StdSchedulerFactory factory = (StdSchedulerFactory) servletContext.getAttribute(key);

try {
    Scheduler quartzScheduler = factory.getScheduler("MyQuartzScheduler");
    quartzScheduler.start();

    // Scheduling top level jobs
    scheduleJob(quartzScheduler);
} catch (SchedulerException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}

  2. Create Scheduler: The scheduler is given a name

Scheduler quartzScheduler = factory.getScheduler("MyQuartzScheduler");

 

  3. Start Scheduler:

quartzScheduler.start();

 

  4. Add Job to a scheduler: Create a JobDetail

JobDetail jobDetail = JobBuilder.newJob(PrintInfoJob.class)
        .withIdentity("printInfoJob", Scheduler.DEFAULT_GROUP)
        .usingJobData(map)
        .build();

 

 

  5. Create a trigger

CronTrigger trigger = (CronTrigger) TriggerBuilder.newTrigger()
        .withIdentity("trigger1", Scheduler.DEFAULT_GROUP)
        .withSchedule(CronScheduleBuilder.cronSchedule("0 * 0-23 ? * MON-FRI"))
        .build();
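The cron expression used in the trigger is easier to read once it is split into its fields. A Quartz cron expression has six mandatory fields (seconds, minutes, hours, day-of-month, month, day-of-week) and an optional year. The following plain-Java sketch only labels the fields and does not validate them; the class name CronFieldsSketch is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: labels the fields of a Quartz-style cron expression. */
final class CronFieldsSketch {

    private static final String[] FIELD_NAMES = {
        "seconds", "minutes", "hours", "day-of-month", "month", "day-of-week", "year"
    };

    /** Splits the expression on whitespace and pairs each part with its field name. */
    static Map<String, String> describe(String cron) {
        String[] parts = cron.trim().split("\\s+");
        if (parts.length < 6 || parts.length > 7) {
            throw new IllegalArgumentException("Expected 6 or 7 fields, got " + parts.length);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < parts.length; i++) {
            fields.put(FIELD_NAMES[i], parts[i]);
        }
        return fields;
    }

    public static void main(String[] args) {
        describe("0 * 0-23 ? * MON-FRI")
                .forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Read this way, "0 * 0-23 ? * MON-FRI" fires at second 0 of every minute, in every hour, Monday to Friday. The "?" means that no specific day-of-month is given.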

 

Additional Steps

  1. Adding a holiday calendar: create an instance of HolidayCalendar

HolidayCalendar hc = new HolidayCalendar();

 

  2. Add holidays to the HolidayCalendar instance

Calendar gCal = GregorianCalendar.getInstance();
gCal.set(Calendar.MONTH, Calendar.SEPTEMBER);
gCal.set(Calendar.DATE, 10);
hc.addExcludedDate(gCal.getTime());

  3. Add the HolidayCalendar to the Scheduler

quartzScheduler.addCalendar("Holidays", hc, true, true);

 

  4. Modify the Trigger Builder to take into account the HolidayCalendar

CronTrigger trigger = (CronTrigger) TriggerBuilder.newTrigger()
        .withIdentity("trigger1", Scheduler.DEFAULT_GROUP)
        .withSchedule(CronScheduleBuilder.cronSchedule("0 * 0-23 ? * MON-FRI"))
        // refer to the calendar by the name under which it was added to the scheduler
        .modifiedByCalendar("Holidays")
        .build();

 

Job Dependency

The requirement is to have an order of jobs, or job dependency, where one job runs only when another job has completed. If the first job fails, the second job is not triggered at all.

e.g. Job2 depends on Job1

Order of execution: Job1 → Job2

If Job1 fails, Job2 is not triggered.

Quartz does not provide a direct interface to create a job dependency.

However, the requirement can be met with the concept of “job chaining”, where ordered jobs are “chained” to run together.

This can be achieved in two ways in Quartz:

  1. Job chaining with Quartz listeners
  2. Job chaining with JobDataMap

Both approaches require some coding, and there is no direct configuration to define the job dependency.

We will use the second option to implement the job dependency/chaining, as it is less involved and more direct.

The JobDataMap is the data that is passed to the job from a trigger or another job. We use the JobDataMap to pass the information on the “next” job to run in the chain after completion of the current job.

The following are the implementation details of job chaining.

Suppose that in an application there are two jobs that run periodically:

  • Print card job
  • Dispatch card job

Clearly, dispatch can happen only when the card is printed. So there is a dependency.

Implementation details:

Create an abstract class ChainableJob that has an abstract method doExecute().

All jobs having a dependency will extend ChainableJob and implement the job logic in doExecute().

The abstract class implements the following methods:

execute():

  • calls the abstract method doExecute() (implemented by the subclass) to execute the current job
  • once the job is executed, calls the chain() method to schedule the dependent job.

@Override
public void execute(JobExecutionContext context) throws JobExecutionException
{
    doExecute(context);

    // if a next job is defined in the JobDataMap, chain it, passing on the JobDataMap
    if (context.getJobDetail().getJobDataMap().get(NEXT_JOB_CLASS) != null)
    {
        try
        {
            chain(context);
        } catch (SchedulerException e) {
            e.printStackTrace();
        }
    }
}

 

chain():

The chain method does the following:

  • Gets the details of the job which is next in the chain (the dependent job) from the JobDataMap of the top level job
  • Creates the JobDetail of the dependent job
  • Schedules the job for immediate running using a “SimpleTrigger”

/**
 * Schedules the next job in line
 * @param context
 * @throws SchedulerException
 */
@SuppressWarnings("unchecked")
private void chain(JobExecutionContext context) throws SchedulerException {

    JobDataMap map = context.getJobDetail().getJobDataMap();

    // remove the chaining entries so that the dependent job does not chain itself again
    Class<? extends Job> jobClass = (Class<? extends Job>) map.remove(NEXT_JOB_CLASS);
    String jobName = (String) map.remove(NEXT_JOB_NAME);
    String jobGroup = (String) map.remove(NEXT_JOB_GROUP);

    JobDetail jobDetail = JobBuilder.newJob(jobClass)
            .withIdentity(jobName, jobGroup)
            .usingJobData(map)
            .build();

    SimpleTrigger trigger = (SimpleTrigger) TriggerBuilder.newTrigger()
            .withIdentity(jobName + "Trigger", jobGroup + "Trigger")
            .startNow()
            .build();

    System.out.println("Chaining " + jobName);
    // use the scheduler that is running the current job
    context.getScheduler().scheduleJob(jobDetail, trigger);
}

Steps for defining the top level job

  1. Define the dependent job details in the JobDataMap. The following details need to be defined:
    1. Dependent job class
    2. Dependent job name
    3. Dependent job group

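JobDataMap map = new JobDataMap();

// the keys must be the same constants that chain() reads from the map
map.put(ChainableJob.NEXT_JOB_CLASS, DispatchJob.class);
map.put(ChainableJob.NEXT_JOB_NAME, "DispatchJob");
map.put(ChainableJob.NEXT_JOB_GROUP, Scheduler.DEFAULT_GROUP);
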
  2. Define the top level job, e.g. printCardJob, using the JobBuilder code snippet, and add the JobDataMap while creating the job

JobDetail jobDetail = JobBuilder
        .newJob(PrintInfoJob.class)
        .withIdentity("printInfoJob", Scheduler.DEFAULT_GROUP)
        .usingJobData(map)
        .build();

  3. Create the trigger using TriggerBuilder
  4. Schedule the top level job
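
The chaining pattern can be tried out without Quartz. The following plain-Java sketch only simulates the idea: a java.util.Map stands in for the JobDataMap, the next job is started directly instead of being scheduled with a trigger, and all class names and the "failPrint" flag are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simulation of job chaining through a data map; not Quartz code. */
final class ChainSketch {

    static final String NEXT_JOB_CLASS = "chainedJobClass";
    static final List<String> LOG = new ArrayList<>();

    /** Runs its own logic first, then starts the next job if one is named in the map. */
    abstract static class ChainableJob {
        final void execute(Map<String, Object> dataMap) throws Exception {
            doExecute(dataMap); // an exception here means the next job never starts
            Object next = dataMap.remove(NEXT_JOB_CLASS);
            if (next != null) {
                Class<? extends ChainableJob> nextClass =
                        ((Class<?>) next).asSubclass(ChainableJob.class);
                nextClass.getDeclaredConstructor().newInstance().execute(dataMap);
            }
        }

        abstract void doExecute(Map<String, Object> dataMap) throws Exception;
    }

    static class PrintCardJob extends ChainableJob {
        @Override
        void doExecute(Map<String, Object> dataMap) {
            if (Boolean.TRUE.equals(dataMap.get("failPrint"))) {
                throw new IllegalStateException("printer offline");
            }
            LOG.add("printed");
        }
    }

    static class DispatchCardJob extends ChainableJob {
        @Override
        void doExecute(Map<String, Object> dataMap) {
            LOG.add("dispatched");
        }
    }

    /** Runs the print job with the dispatch job chained after it and returns what happened. */
    static List<String> run(boolean failPrint) {
        LOG.clear();
        Map<String, Object> dataMap = new HashMap<>();
        dataMap.put(NEXT_JOB_CLASS, DispatchCardJob.class);
        dataMap.put("failPrint", failPrint);
        try {
            new PrintCardJob().execute(dataMap);
        } catch (Exception e) {
            LOG.add("failed: " + e.getMessage());
        }
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // print succeeds, dispatch follows
        System.out.println(run(true));  // print fails, dispatch is never started
    }
}
```

Because the next job is started only after doExecute() returns normally, a failure in the first job stops the chain, which is the dependency behaviour described above.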