Everyone wants high availability and service uptime, but five nines is really difficult to achieve: in effect, we are talking about roughly five minutes of downtime per year. Keeping downtime that low does not come free; it has a cost associated with it. When everyone asks for 99.999% uptime, we need to ask a few basic questions:
- Is it really required? It is very expensive.
- Can it be achieved with the allocated budget?
- Is there an alternative that covers the risk and the requirement behind five nines?
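The "five minutes" figure follows from simple arithmetic, which can be sketched as below. This is only an illustration: the function name `downtime_minutes_per_year` is made up for this example, and it assumes an average year of 365.25 days.

```python
# Yearly downtime budget for a given availability target.
# Assumption: an average year of 365.25 days (leap years included).
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the allowed downtime, in minutes per year, for a target such as 99.999."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = downtime_minutes_per_year(target)
        print(f"{target}% uptime allows {minutes:.2f} minutes of downtime per year")
```

Five nines works out to roughly 5.26 minutes per year, while three nines (99.9%) allows almost nine hours. Comparing those two budgets is a useful first step when asking whether five nines is really required.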
Sustaining five nines is too expensive.
In the real world, sustaining high availability is costly. It requires more physical or cloud infrastructure, along with the software, its configuration, and the manpower to maintain it, all of which add complexity. More moving parts add more complexity to the system and more points of failure. These additional components can fail due to misconfiguration, bugs, and interoperability issues.
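A rough way to see why more moving parts hurt: when a request depends on every component in a chain being up, the chain's availability is the product of the individual availabilities. The sketch below assumes failures are independent, which real systems rarely satisfy exactly, and the function name `series_availability` is hypothetical.

```python
from math import prod  # available in Python 3.8+


def series_availability(component_availabilities):
    """Availability of a chain in which every component must be up.

    Assumption: component failures are independent of each other.
    Each value is a fraction between 0 and 1 (0.9999 means 99.99%).
    """
    values = list(component_availabilities)
    if any(not 0 <= a <= 1 for a in values):
        raise ValueError("each availability must be between 0 and 1")
    return prod(values)


if __name__ == "__main__":
    # Five components that each achieve four nines on their own.
    chain = [0.9999] * 5
    print(f"combined availability: {series_availability(chain) * 100:.4f}%")
```

Under these assumptions, five components that each reach 99.99% combine to about 99.95%, so every extra dependency in the critical path eats into the downtime budget.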
Better process management can give high availability with limited resources.
There are many processes that can improve availability, but we often do not consider them in a deployment. Here are a few things we miss a lot:
- Do we have test servers?
- Do we monitor log files?
- Do we have network-wide monitoring in place?
- Do we verify backups?
- Do we monitor disk partitions?
- Do we monitor server system logs and disk subsystem logs for errors and warnings? (The hardware component most likely to fail is a disk.)
- Do we have server analytics? Do we collect server system metrics?
- Do we perform fire drills?
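Some of these checks are cheap to automate. As a minimal sketch, the snippet below covers two items from the list: watching disk partition usage and scanning log lines for disk errors. The 90% threshold, the error patterns, and all function names are assumptions made for illustration; real kernel and driver messages vary by system, so the patterns would need tuning.

```python
import re
import shutil

# Hypothetical patterns; adjust them to the messages your systems actually log.
DISK_ERROR_PATTERN = re.compile(
    r"I/O error|medium error|bad sector|SMART.*fail", re.IGNORECASE
)


def disk_usage_percent(path="."):
    """Return the used space, as a percentage, of the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:  # some virtual filesystems report a size of zero
        return 0.0
    return 100 * usage.used / usage.total


def partitions_over_threshold(paths, threshold_percent=90.0):
    """Return the paths whose filesystem usage is at or above the threshold."""
    return [p for p in paths if disk_usage_percent(p) >= threshold_percent]


def find_disk_errors(log_lines):
    """Return the log lines that match one of the disk error patterns."""
    return [line.rstrip("\n") for line in log_lines if DISK_ERROR_PATTERN.search(line)]


if __name__ == "__main__":
    print("partitions nearly full:", partitions_over_threshold(["."]))
    sample_log = [
        "service started\n",
        "sda: I/O error, dev sda, sector 123\n",
    ]
    print("suspicious log lines:", find_disk_errors(sample_log))
```

In practice a script like this would run on a schedule and feed an alerting system, but even a simple periodic report answers two of the questions above.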