Everyone wants high availability and service uptime, but five nines is really difficult to achieve: effectively, we are talking about roughly five minutes of downtime per year. Holding downtime to five minutes does not come free; it has a cost associated with it. When everyone is asking for 99.999% uptime, we need to ask a few basic questions:

  • Is it really required? It is very expensive.
  • Can it be achieved with the allocated budget?
  • Is there an alternative that covers the same risks and requirements as five nines?
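
To make the "five minutes" figure concrete, here is a minimal sketch of the arithmetic behind each availability level. The function name `downtime_minutes_per_year` is made up for illustration, and the year length of 365.25 days is an assumption; using 365 days changes the result only slightly.

```python
# Minutes in an average year (365.25 days, an assumption that averages in leap years).
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(nines: int) -> float:
    """Return the yearly downtime allowed by an availability of N nines."""
    unavailability = 10 ** (-nines)  # e.g. 5 nines -> 0.00001
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in range(2, 6):
        print(f"{n} nines: {downtime_minutes_per_year(n):9.2f} minutes/year")
```

Each extra nine cuts the allowed downtime by a factor of ten, which is why the cost questions above matter more with every nine that is asked for.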

Sustaining five nines is too expensive.

In the real world, sustaining high availability is very costly. It requires more physical or cloud infrastructure, along with the software, its configuration, and the manpower to maintain it, all of which add complexity. More moving parts add more complexity to the system and more points of failure. These additional components can themselves fail due to misconfiguration, bugs, and interoperability issues.

Better process management can give high availability with limited resources.

There are many processes that can improve availability but that we rarely consider in a deployment. Here are a few things we often miss.

Do we have test servers?

Do we monitor logfiles?

Do we have network-wide monitoring in place?

Do we verify backups?

Do we monitor disk partitions?

Do we monitor server system logs for disk errors and warnings?

Do we watch disk subsystem logs for errors? (The most likely hardware component to fail is a disk.)

Do we have server analytics? Do we collect server system metrics?

Do we perform fire drills?
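
As one illustration of how cheap some of these checks are, disk partition monitoring can be sketched in a few lines. This is a minimal sketch rather than a monitoring product: the function name `partitions_over_threshold`, the 90% threshold, and the example path `/` are all assumptions to adapt to your own servers.

```python
import shutil


def partitions_over_threshold(paths, max_used_fraction=0.9):
    """Return (path, used_fraction) for each path whose filesystem is too full."""
    alerts = []
    for path in paths:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
        if usage.total == 0:
            continue  # skip pseudo-filesystems that report no capacity
        used_fraction = usage.used / usage.total
        if used_fraction > max_used_fraction:
            alerts.append((path, used_fraction))
    return alerts


if __name__ == "__main__":
    # "/" is only an example; list the mount points that matter on your servers.
    for path, fraction in partitions_over_threshold(["/"]):
        print(f"WARNING: {path} is {fraction:.0%} full")
```

Run from cron or a scheduler, a check like this catches a filling partition long before it turns into an outage.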

 

One Thought on “High Availability of 99.999 is overrated”

  1. Abhishek on August 18, 2014 at 1:22 pm said:

    Five 9s are hard to achieve because the 9s apply to the entire system, not to a single component. The database may be up the whole year, but if the storage tier goes down, or the application server, the load balancer, or any other component goes awry, it effectively brings down the 9s. To put it in perspective, achieving five 9s means only 5.26 minutes of downtime per year.

    If we have deployed an HA system that fails over automatically in one minute, and a failure happens 6 times in a year, then our five 9s are gone as well. And building an HA system with one-minute automatic failover takes a lot of effort.
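
The two points in the comment above can be checked with a little arithmetic. The sketch below is illustrative only: the tier names and their individual availabilities are hypothetical, the function names are made up, and multiplying availabilities assumes that component failures are independent, which real systems only approximate.

```python
from math import prod

# Minutes in an average year (365.25 days, an assumption).
MINUTES_PER_YEAR = 365.25 * 24 * 60


def serial_availability(component_availabilities):
    """Availability of a chain in which every component must be up at once.

    Assumes independent failures, so the individual availabilities multiply.
    """
    return prod(component_availabilities)


def failover_downtime_minutes(failovers_per_year, minutes_per_failover):
    """Total yearly downtime spent waiting for automatic failover."""
    return failovers_per_year * minutes_per_failover


if __name__ == "__main__":
    # Hypothetical tiers, each at five nines on its own.
    tiers = {
        "database": 0.99999,
        "storage": 0.99999,
        "application server": 0.99999,
        "load balancer": 0.99999,
    }
    system = serial_availability(tiers.values())
    print(f"system availability: {system:.5%}")
    print(f"system downtime: {MINUTES_PER_YEAR * (1 - system):.1f} minutes/year")

    budget = MINUTES_PER_YEAR * 1e-5  # five-nines downtime budget
    spent = failover_downtime_minutes(6, 1.0)
    print(f"failover downtime: {spent:.1f} min against a budget of {budget:.2f} min")
```

Under these assumptions, four five-nines tiers in series allow roughly four times the downtime of a single tier, and six one-minute failovers alone exceed the yearly five-nines budget.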
