Regardless of size, complexity or industry, every business will experience an event that puts its operations at risk. It is not a matter of if, but when. The key is how well your company recovers. The goal of disaster recovery (DR) planning is to provide a structured approach for responding to unplanned incidents and to minimize their negative impact on business operations.
1. Conduct a thorough inventory.
The first step in developing a disaster recovery plan is to thoroughly identify and document your enterprise’s devices, systems and facilities. Many organizations perform annual inventory auditing, but real asset management is often neglected, leaving businesses with limited knowledge of what is in their data centers. Compile a list of hardware, such as servers, network equipment and security appliances. Compile a list of software platforms and applications as well as data repositories. Document the physical infrastructure, listing all on- and off-site facilities, including equipment and power.
2. Define business needs.
Understanding business requirements is essential to creating an effective DR strategy, so your DR plan should be developed in conjunction with your business continuity plan, which includes an analysis of business units and operations and interviews with key personnel. To determine DR priorities, put each important information system through a business impact analysis to identify and evaluate how its loss would affect business operations.
Consider the organization’s Recovery Time Objective (RTO, its limit for downtime) and Recovery Point Objective (RPO, its tolerance for data loss). Some enterprises have zero tolerance for data loss, resulting in a low RPO, while others, with a higher RPO, can tolerate a loss of several days of data. For most businesses, downtime costs money; therefore, their RTO will be low. Work with key decision makers to determine how long your organization can afford to be without service if a disaster occurs.
Gather technical requirements by identifying critical software applications and data along with the hardware required to run them. Prioritizing applications based on business needs is imperative for developing recovery strategies that minimize loss. Rank your applications: which are tier 1, tier 2 or tier 3 in terms of importance to your business? Understand and define application interdependencies. What feeds your mission-critical applications?
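As a rough illustration of how tiers and interdependencies combine, the sketch below derives one possible recovery order: a system is never restored before the systems that feed it, and among the systems that are ready, the most critical tier goes first. The application names, tiers and dependencies are invented for the example, and the approach (a topological sort with a priority queue) is just one reasonable way to do this, not a prescribed method.

```python
import heapq
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical applications. Tier 1 is the most critical; "depends_on"
# lists the systems that must be running first. Every dependency must
# itself appear as a key in this dictionary.
APPS = {
    "order-portal": {"tier": 1, "depends_on": ["database", "auth"]},
    "database": {"tier": 1, "depends_on": []},
    "auth": {"tier": 1, "depends_on": ["database"]},
    "reporting": {"tier": 2, "depends_on": ["database"]},
    "intranet-wiki": {"tier": 3, "depends_on": ["auth"]},
}


def recovery_order(apps):
    """Return app names with dependencies first, preferring lower tier numbers."""
    sorter = TopologicalSorter(
        {name: info["depends_on"] for name, info in apps.items()}
    )
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies

    ready = []  # min-heap of (tier, name) for apps whose dependencies are restored
    order = []
    while sorter.is_active():
        for name in sorter.get_ready():
            heapq.heappush(ready, (apps[name]["tier"], name))
        _, name = heapq.heappop(ready)  # most critical app that can start now
        order.append(name)
        sorter.done(name)  # marks it restored, which may unblock dependents
    return order


print(recovery_order(APPS))
```

With this sample data, the tier 2 reporting system waits until the tier 1 order portal is back, even though both depend only on systems that come up early.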
3. Define roles and responsibilities.
Define the key roles and responsibilities involved in disaster recovery, including which tasks should be completed and by whom. The DR team may include roles such as recovery manager, facilities coordinator, technical coordinator, administrative coordinator, network coordinator, applications coordinator and computer operations coordinator. The recovery manager acts as the authority during the DR process, while each of the other roles leads a specialized team in recovery tasks. Ensure that each defined role has a primary and backup person assigned to it and that the responsibilities are communicated to all parties involved.
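If role assignments are kept in a structured form, the rule that every role needs both a primary and a backup person can be checked automatically. The following is a minimal sketch under that assumption; the `RoleAssignment` type, the `coverage_gaps` helper and the names of the people are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RoleAssignment:
    role: str
    primary: Optional[str]  # None means nobody is assigned yet
    backup: Optional[str]


def coverage_gaps(assignments):
    """Return human-readable problems found in the role assignments."""
    problems = []
    for a in assignments:
        if not a.primary:
            problems.append(f"{a.role}: no primary assigned")
        if not a.backup:
            problems.append(f"{a.role}: no backup assigned")
        if a.primary and a.primary == a.backup:
            problems.append(f"{a.role}: primary and backup are the same person")
    return problems


# Invented sample data for illustration only.
ASSIGNMENTS = [
    RoleAssignment("recovery manager", "A. Rivera", "B. Chen"),
    RoleAssignment("network coordinator", "C. Patel", None),
    RoleAssignment("applications coordinator", "D. Okafor", "D. Okafor"),
]

for problem in coverage_gaps(ASSIGNMENTS):
    print(problem)
```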
4. Establish a communications plan.
Next, develop a plan for communicating information to relevant parties before, during and after a disaster. Establish a protocol for notifying management, identifying which situations require immediate notification and determining who will act as the spokesperson to keep the enterprise updated on DR progress. Develop similar notification procedures for other stakeholders.
Your business must be able to respond promptly, accurately and confidently, communicating the appropriate information to different audiences based on their interests and needs. Be sure contact information for each audience is continuously updated and accessible during an event. Stakeholders may include customers, employees, media, suppliers, investors and government regulators. Identify scenarios that would require communication to each audience, and prepare scripted messages that can be modified and used during an event.
During a disaster, communication systems, such as phone and email, may be disrupted. Establish alternate methods for disseminating information until services are restored.
5. Coordinate with providers.
When developing a DR strategy, review existing Service Level Agreements (SLAs) with current external resource providers to ensure that business needs will be met during a DR event. If you use a colocation facility, determine what measures are in place for network and power failures. Also, confirm that IT personnel are authorized and able to enter the facility at any time if necessary. Work with cloud providers to hold images of critical applications that are ready to go should the primary site go down. While warm images are more costly than cold ones, these pre-provisioned resources and mirrored data stores are essential for high-priority workloads. Determine the level of support guaranteed by other vendors, such as the networking provider at your primary site, in a DR event.
6. Validate and revisit your backup strategy on a regular basis.
The DR plan should include a strategy to ensure that all critical systems are backed up, and it is essential to accurately identify the frequency of backups. High availability is pricey, and not all assets are worth the cost of protecting them. The enterprise’s RPO and prioritized data should drive the decisions regarding which data, systems and applications should be backed up and which storage solution should be used.
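One simple way to see how the RPO drives backup frequency: if a system is backed up every N hours, a failure just before the next backup loses roughly N hours of data, so the backup interval should not exceed the RPO. The sketch below applies that check to a few invented systems. It is a deliberate simplification that ignores how long a backup takes to complete and any replication lag.

```python
from datetime import timedelta

# Hypothetical systems with an agreed RPO and the current backup interval.
SYSTEMS = {
    "orders-db": {"rpo": timedelta(minutes=15), "backup_interval": timedelta(minutes=5)},
    "file-share": {"rpo": timedelta(hours=4), "backup_interval": timedelta(hours=24)},
    "archive": {"rpo": timedelta(days=3), "backup_interval": timedelta(days=1)},
}


def rpo_violations(systems):
    """Return the names of systems whose backup interval exceeds their RPO."""
    return sorted(
        name
        for name, s in systems.items()
        if s["backup_interval"] > s["rpo"]
    )


# file-share is backed up daily but can tolerate only 4 hours of data loss.
print(rpo_violations(SYSTEMS))
```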
7. Develop multiple plans to address different business interruption events.
Planning for every kind of event is impossible and cost-prohibitive, but it is possible to prepare for the interruptions that are most likely to impact your business. Define disaster scenarios, considering which area of the company will be affected and the short- and long-term consequences for each. For example, categories may include isolated areas of operation (e.g., a fire in a switch closet), entire operations (e.g., loss of connectivity to a service provider), an isolated geographic area (e.g., flooding in a data center) or a large geographic area (e.g., an earthquake). Determine how to respond to each potential disruption to your business, giving priority to the most likely scenarios.
8. Implement the right technologies and services.
Traditional DR planning requires that you account for everything from capacity to network infrastructure to bandwidth. Current technologies have made it much easier to recover from a disaster, but there’s no one-size-fits-all solution. Low RPOs and RTOs, for example, require DR solutions that use data replication technology. Remote replication of data between primary and secondary data centers may require WAN optimization tools that offer high bandwidth and high reliability. If a business demands minimal downtime and high availability of data, it will need the budget to pay for cutting-edge technologies to meet those requirements.
Many businesses are moving data operations to the cloud, which has fueled disaster recovery as a service (DRaaS). DRaaS has helped more organizations be better prepared for disasters since resources (e.g., CPU and RAM), VM replication, load balancing and network setup and configuration can all be pre-configured as part of the service. With cloud-based disaster recovery, IT organizations can automate more tasks, increasing productivity while decreasing DR testing time. Moreover, the DRaaS functionality can be tested more regularly. When choosing this approach, businesses must pay attention to the SLA and understand the provider’s commitment to business continuity.
9. Test. Test. Test.
Once the DR plan has been implemented, the next step is to test the plan extensively and regularly. Without thorough testing, a DR plan is ineffective. All roles should be involved in the testing, so each person can practice the procedures. Simulate the failure and recovery, noting the difference between the expected RPO and RTO and the actual restore point and time achieved. Determine what, if anything, needs to be changed to meet the objectives. If implementing DRaaS, ask the provider how recovery data will be tested and validated, then confirm that the provider is testing as promised. In an ideal situation, a company should fail over to its DR environment and run its entire business there for a period of time. This is considered best practice.
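The comparison between expected and achieved objectives can be recorded with a few timestamps from each test. The sketch below assumes three are captured: when the simulated failure occurred, the point in time of the most recent recoverable copy of the data, and when service was restored. The function name, field names and sample values are illustrative only.

```python
from datetime import datetime, timedelta


def evaluate_dr_test(failure_at, last_good_copy_at, restored_at, rto, rpo):
    """Compare what a DR test achieved against the RTO and RPO targets."""
    achieved_rto = restored_at - failure_at        # downtime actually experienced
    achieved_rpo = failure_at - last_good_copy_at  # window of data actually lost
    return {
        "achieved_rto": achieved_rto,
        "achieved_rpo": achieved_rpo,
        "rto_met": achieved_rto <= rto,
        "rpo_met": achieved_rpo <= rpo,
    }


# Invented test run: targets are 2 hours of downtime and 15 minutes of data loss.
result = evaluate_dr_test(
    failure_at=datetime(2024, 3, 1, 9, 0),
    last_good_copy_at=datetime(2024, 3, 1, 8, 50),
    restored_at=datetime(2024, 3, 1, 11, 30),
    rto=timedelta(hours=2),
    rpo=timedelta(minutes=15),
)
print(result)
```

In this sample run the data-loss target is met (10 minutes lost against a 15-minute RPO), but recovery took 2.5 hours against a 2-hour RTO, which is the kind of gap the test is meant to surface.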
10. Keep your documentation current.
A good DR plan considers all technologies, systems and applications currently in use. Because technology and configurations are continually changing, a DR plan is not complete unless it is reviewed and updated regularly. Reasons to modify a DR plan include hardware and software updates, new or retired applications, staff changes and new facilities. Live DR events can help you see where there is room for improvement in the DR plan based on staff input and lessons learned from the event. Scheduling regular reviews of the plan is beneficial, but DR plans should ideally be updated when an essential factor in your organization changes. Frequent updates create a more reliable plan.
By understanding your company’s assets and needs, prioritizing business systems, employing sensible technologies and planning carefully, you will be better positioned to recover operations with minimum impact when disaster strikes.