Processing Large Volumes of Data

Introduction

What is the best way to process large volumes of data in an OLTP application? Should it be "near data processing" or "far data processing"? Processing near the data means database procedures, PL/SQL, ETL, ELT, OOD, and the like; processing away from the data means application code in Java, C++, and so on. Both approaches have pros and cons.

Processing Near Data

Generally two types of processing occur: batch processing and real-time processing. Batch processing works on a group of records where the processing of each record is very similar, such as salary calculation and disbursement, loyalty calculation, interest calculation and posting, trading day-end processing, bill generation, and settlement of bills. Real-time processing covers transactions handled as they arrive, such as an online fund transfer, a cash withdrawal, or the purchase of an item and its payment. Batch processing should be carried out near the data, while real-time processing should be carried out near the application objects. Certain systems do perform batch processing far from the data (social networking sites, for example), but they have valid reasons for distributed processing and time is not critical for them; where time is a crucial factor and the batch window is small, it is preferable to move the processing near the data. Network overhead, the chance of data loss, process disconnection, and system availability are the main reasons to move processing near the data.
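As a minimal sketch of what "near data" batch processing can look like from a Java application: the interest posting is pushed into the database as one set-based UPDATE instead of pulling every account row across the network. The ACCOUNTS table, its columns, and the connection details are hypothetical placeholders, not a specific system's schema.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// "Near data" batch job: the whole interest calculation runs inside the
// database as a single set-based statement, so no account rows travel to
// the application tier. Table and column names are hypothetical.
public class InterestPostingJob {

    public static void main(String[] args) throws SQLException {
        String url = "jdbc:oracle:thin:@//dbhost:1521/OLTP"; // placeholder connection string
        try (Connection con = DriverManager.getConnection(url, "app_user", "app_password")) {
            con.setAutoCommit(false);

            // Contrast this with fetching every row into Java, computing the
            // interest in a loop, and writing each row back over the network.
            String sql = "UPDATE accounts "
                       + "SET balance = balance + (balance * interest_rate / 12) "
                       + "WHERE status = 'ACTIVE'";

            try (PreparedStatement ps = con.prepareStatement(sql)) {
                int updated = ps.executeUpdate();
                System.out.println("Accounts posted: " + updated);
            }
            con.commit();
        }
    }
}
```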

Processing Far Data

Processing large volumes of data far from where they live is not very attractive in the data-analytics world. There are cases where far data processing is beneficial, such as form processing, bulk (offline) data-entry processing, and analysis of social networking data for presentation to users. It is also a good fit when the system processes an individual transaction carved out of a bulk transaction stream; for such cases far data processing works well.
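By contrast, here is a minimal sketch of a "far data" style individual transaction handled in application code: the business rules for a single fund transfer live in the Java tier, and only the two affected rows are touched in the database. The AccountRepository interface and its methods are hypothetical stand-ins for whatever data-access layer the system actually uses.

```java
// "Far data" handling of one transaction: validation and orchestration
// happen in the application tier; the data volume per call is tiny, so
// network round trips are not the bottleneck.
public class TransferService {

    private final AccountRepository accounts; // hypothetical data-access interface

    public TransferService(AccountRepository accounts) {
        this.accounts = accounts;
    }

    public void transfer(String fromId, String toId, long amountInCents) {
        long fromBalance = accounts.balanceOf(fromId);
        if (fromBalance < amountInCents) {
            throw new IllegalStateException("Insufficient funds");
        }
        accounts.debit(fromId, amountInCents);
        accounts.credit(toId, amountInCents);
    }
}

// Hypothetical repository abstraction over the OLTP database.
interface AccountRepository {
    long balanceOf(String accountId);
    void debit(String accountId, long amountInCents);
    void credit(String accountId, long amountInCents);
}
```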

High Availability of 99.999% is Overrated

While everyone looks for high availability and service uptime, five nines is genuinely difficult to sustain: it effectively means about five minutes of downtime per year, and those five minutes do not come free; there is a real cost attached. Before signing up for 99.999% uptime, we need to ask a few basic questions (a quick downtime calculation follows the list):

  • Is it really required? It is very expensive.
  • Can it be achieved with the allocated budget?
  • Is there an alternative that covers the risk and the requirement behind the five nines?
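The "five minutes" figure is plain arithmetic, not a vendor number. A small calculation of the yearly downtime budget for common availability targets:

```java
// Allowed downtime per year for common availability targets.
// 99.999% works out to roughly 5.26 minutes per year.
public class AvailabilityBudget {

    public static void main(String[] args) {
        double minutesPerYear = 365.25 * 24 * 60;
        double[] targets = {0.99, 0.999, 0.9999, 0.99999};

        for (double target : targets) {
            double allowedDowntime = minutesPerYear * (1 - target);
            System.out.printf("%.3f%% uptime -> %.2f minutes of downtime per year%n",
                    target * 100, allowedDowntime);
        }
    }
}
```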

Sustaining five nines is too expensive.

In the real world, sustaining high availability is costly. It requires more physical or cloud infrastructure, plus the software, its configuration, and the manpower to maintain it, and all of this adds complexity. More moving parts mean more complexity and more points of failure: these additional components can fail due to misconfiguration, bugs, and interoperability issues.

Better process management can give high availability with limited resources.

There are many practices that can improve availability but are rarely considered in a deployment. Here are a few things we tend to miss (a minimal example of one such check, disk-partition monitoring, is sketched after the list):

  • Do we have test servers?
  • Do we monitor log files?
  • Do we have network-wide monitoring in place?
  • Do we verify backups?
  • Do we monitor disk partitions?
  • Do we monitor server system logs for disk errors and warnings?
  • Do we watch disk-subsystem logs for errors? (The hardware component most likely to fail is a disk.)
  • Do we have server analytics? Do we collect server system metrics?
  • Do we perform fire drills?
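As one concrete example from the list above, here is a minimal sketch of a disk-partition usage check using the standard java.nio.file API. The 90% alert threshold and the plain console output are assumptions; a real deployment would feed these numbers into its monitoring and alerting system.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.FileSystems;

// Walk every mounted file store and flag the ones above a usage threshold.
public class DiskUsageCheck {

    private static final double ALERT_THRESHOLD = 0.90; // assumed 90% cutoff

    public static void main(String[] args) throws IOException {
        for (FileStore store : FileSystems.getDefault().getFileStores()) {
            long total = store.getTotalSpace();
            if (total == 0) {
                continue; // skip pseudo file systems with no capacity
            }
            long used = total - store.getUsableSpace();
            double usedRatio = (double) used / total;

            String status = usedRatio >= ALERT_THRESHOLD ? "ALERT" : "ok";
            System.out.printf("%-6s %s: %.1f%% used%n", status, store, usedRatio * 100);
        }
    }
}
```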