With cyber incidents now commonplace, it is no longer a matter of if but when they will occur. Software issues, system outages, phishing, ransomware, or natural disasters can all compromise an organization’s most valuable asset, its data, at any given time. Businesses are entirely dependent on their technology solutions to function. These solutions enable organizations to process data, service customers, and conduct the business transactions necessary to generate revenue. Downtime can have a significant financial and reputational impact on the business. A disruption can result from several causes, including a hardware or software failure, a natural disaster, human error, or a cyberattack. Because disasters are inevitable, the real question is whether a plan is in place to recover quickly when unexpected downtime occurs.
When disaster strikes it is crucial to have the right strategy with the proper and right-sized recovery solution to reduce the time to recovery to a level acceptable to the business. A disaster recovery (DR) site is a crucial component of a DR strategy. A good DR strategy enables organizations to minimize the financial and operational impact of unplanned downtime. The primary types of DR sites include:
- Cold – This type of site is the most barebones and typically the least expensive of the three. In the event of a disaster this type of site would require the greatest level of effort and require the longest recovery time. A cold site is a great choice for an organization that can afford to have a prolonged outage measured in weeks or even months. The site has power, cooling, and usually some network infrastructure in place. There is no compute or storage infrastructure. An organization would need to deploy and configure compute, storage, and possibly network infrastructure prior to recovering any data. Once the basic data infrastructure is online in the cold site, you can then recover data and applications. Backups of production systems and workloads would need to be transferred to the site to recover from. Cold DR sites often have no recovery plan in place—the entire recovery process is performed ad-hoc at time of need. Few IT teams are agile enough to perform such a recovery before a business simply ceases to exist.
- Hot – This type of site is the most comprehensive and expensive of the three. In the event of a disaster, this type of site would require minimal effort and provide the ability to be back online almost immediately. A hot site is a great choice for an organization that needs to minimize downtime and can incur higher maintenance costs. The site has power, cooling, and network infrastructure in place as well as compute and storage infrastructure. Backups of production systems and workloads are performed in real-time and data from the production site is synchronized. Hot DR sites are typically associated with a well-documented recovery action plan. This plan gets tested at least annually to ensure the organization can recover at any time without facing delays in recovery due to unknown surprises. This level of planning and testing comes with increased costs when compared to a cold DR site, but the costs are accepted because the success rate of recovery is substantially higher.
- Warm – This type of site is a balance between cold and hot sites. In the event of a disaster, this type of site would require less effort and time than a cold site but more than a hot site. A warm site is a great choice for an organization that can afford to have an outage that lasts from hours to days and wants to reduce costs compared with a hot site. The site has power, cooling, and network infrastructure in place as well as compute and storage infrastructure. Backups of production systems and workloads may be replicated to the site at intervals ranging from days to weeks. Warm DR sites trade real-time replication for slower recovery from offsite backups. This increases both the amount of data that may be lost in a disaster and the time required to restore data. The loss of data and increased recovery time might be justified by the reduced costs. As with a hot DR site, a well-written and regularly tested recovery action plan is critical to a successful recovery. Unfortunately, many organizations using a warm DR site fail to document a recovery action plan or fail to test the plan regularly, which lowers their chances of a successful recovery.
Downtime has the potential to be costly. The cost of a DR site varies depending on the type of site. Organizations must balance the cost of a DR site against the cost of downtime. To accomplish this, an organization needs to understand the cost associated with each type of site, the length of time it will take to recover with each type of site, the probability of a disruption, and the cost of an outage whose duration is determined by each site’s recovery time.
To identify the optimal type of DR site for your organization, you need to determine where the intersection lies between cost of a DR site and the cost of an outage. A shorter recovery time comes at a greater cost, but the longer the outage the greater the cost for the downtime. You must also consider how much data is at risk by considering how often data is backed up or replicated.
To determine where this intersection occurs, conduct a risk assessment to understand the annual rate of occurrence and the impact of a disaster. Let’s look at the following simplified theoretical scenario. Organization Z is reviewing DR site options and determines the following annual costs:
- Cold site – Would cost $50,000 annually and allow them to recover their systems in approximately two weeks.
- Warm site – Would cost $125,000 annually and allow them to recover their systems in approximately 24 hours.
- Hot site – Would cost $300,000 annually and allow them to recover their systems in approximately an hour.
The organization determines that there is a 20% probability that it will experience a disaster in any given year. Organization Z also determines that it would lose $50,000 for every hour its systems were down. To determine which solution provides the best balance, let’s calculate the annualized per-hour cost of downtime by multiplying the annual probability by the hourly cost of an outage. This would be $50,000 x 20%, which equals $10,000.
Using that information the annual cost of leveraging each site type when factoring the site cost and the impact of downtime from a disaster would be the following:
- Cold site – Annual site cost is $50,000, potential downtime cost based on a recovery time of two weeks (14 days) x 24 hours a day equals 336 hours x $10,000 per hour equals $3,360,000. The annualized cost is $50,000 + $3,360,000, which equals $3,410,000.
- Warm site – Annual site cost is $125,000, potential downtime cost based on a recovery time of twenty-four hours (24 hours) x $10,000 per hour equals $240,000. The annualized cost is $125,000 + $240,000, which equals $365,000.
- Hot site – Annual site cost is $300,000, potential downtime cost based on a recovery time of one hour (1 hour) x $10,000 per hour equals $10,000. The annualized cost is $300,000 + $10,000, which equals $310,000.
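The same arithmetic can be sketched in a few lines of Python, which makes it easy to rerun the comparison with different inputs. This is a minimal illustration, not a complete risk model: the `DrSite` class and the `annualized_cost` and `compare_sites` functions are hypothetical names introduced here, and the figures are the simplified Organization Z assumptions from the scenario above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrSite:
    """Hypothetical container for one DR site option."""
    name: str
    annual_site_cost: float  # dollars per year to maintain the site
    recovery_hours: float    # estimated hours to recover systems


def annualized_cost(site, hourly_outage_cost, annual_probability):
    """Site cost plus probability-weighted downtime cost for one year."""
    annualized_hourly_loss = hourly_outage_cost * annual_probability
    return site.annual_site_cost + site.recovery_hours * annualized_hourly_loss


def compare_sites(sites, hourly_outage_cost, annual_probability):
    """Return a mapping of site name to annualized cost."""
    return {
        site.name: annualized_cost(site, hourly_outage_cost, annual_probability)
        for site in sites
    }


# Figures taken from the Organization Z scenario (simplified assumptions).
SITES = [
    DrSite("Cold", 50_000, 14 * 24),  # two weeks expressed in hours
    DrSite("Warm", 125_000, 24),
    DrSite("Hot", 300_000, 1),
]
HOURLY_OUTAGE_COST = 50_000
ANNUAL_PROBABILITY = 0.20

if __name__ == "__main__":
    costs = compare_sites(SITES, HOURLY_OUTAGE_COST, ANNUAL_PROBABILITY)
    for name, cost in costs.items():
        print(f"{name}: ${cost:,.0f}")
    print("Lowest annualized cost:", min(costs, key=costs.get))
```

Because the result depends directly on the probability and hourly loss estimates, changing `ANNUAL_PROBABILITY` or `HOURLY_OUTAGE_COST` and rerunning the comparison is a quick way to see how sensitive the choice of site is to those assumptions.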
Based on the calculations, a hot site appears to be the most appropriate choice to balance the cost of a DR site with the cost Organization Z would incur from an outage. When determining a DR strategy, it is important to ensure that requirements related to any regulatory or contractual obligations are considered as well. If your organization provides a critical service and regulatory or contractual obligations stipulate the need to recover quickly, a hot site may be the only option.
In conclusion, IT departments need to understand the numerous options available when developing a DR strategy and recognize that costs can vary significantly. A sound, well-balanced DR strategy is critical to a quick recovery. Many factors need to be considered so that an organization can make a prepared and informed decision, choosing the solution that most appropriately meets its needs while being cost-effective. It is important to reassess periodically to ensure that the DR strategy is still aligned with the organization’s needs and provides the best value. As the saying goes, “an ounce of prevention is worth a pound of cure.”