A disaster recovery (DR) plan is an essential part of your company’s Business Continuity Plan (BCP), the living document that details how your organization guards against disasters ranging from viruses to terrorist attacks to hurricanes and other natural disasters. And while people often use the terms business continuity and disaster recovery interchangeably, the two mean very different things.
A BCP is business-centric, proactive, and addresses more than the IT infrastructure. Sound business continuity planning includes the identification and prioritization of the hundreds of functions an organization needs following a disruption, so it can continue to operate during and after a disaster with minimal downtime or outages.
A DR plan specifically addresses the processes your company will use to recover access to the software, data, hardware, etc., needed to resume your standard, business-critical functions. Your DR plan should provide for redundant data center infrastructure—servers, software, network connections, storage—to support your applications and enable your operations to function effectively.
While BC and DR planning have different focuses, they share the same overarching goal, which is to prepare your business to withstand a disaster and to be able to recover quickly with the least possible damage. Planning for the unexpected—whether it’s a technical failure, violent weather, cyberterrorism, or human error—helps ensure that business remains up and running, even amid the most extreme challenges.
Here are some vital elements to include in your DR planning, with action steps to get you on the right track and tips to help you avoid big mistakes along the way.
Element 1: Business Impact Analysis, RPO, and RTO
A good first step is conducting a Business Impact Analysis (BIA) to identify your most critical systems and processes, as well as the effect of their malfunction. A BIA will determine the functions or activities in your organization considered essential and those which are non-critical. Critical functions include any business activity that’s mandated by law, fulfills a financial obligation, maintains cash flow, safeguards an irreplaceable asset or plays a key role in maintaining market share. Once you have identified which processes are essential, you will assign the following metrics to calculate your company’s level of tolerance for loss and the target time you set for recovery after a disaster has struck.
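The first, your Recovery Point Objective (RPO), is focused on data and your company’s tolerance for losing it. RPO is expressed as a span of time: the maximum amount of recent data, measured back from the moment of failure, that you can afford to lose. It is determined by looking at the time between data backups and the amount of data that could be lost in between them. If backups run every four hours, for example, up to four hours of data is at risk.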
The second, your Recovery Time Objective (RTO), is the target time you set for the recovery of your IT and business activities after you’ve experienced a disaster. The goal of the RTO is to calculate how quickly you need to recover, which then dictates the type of preparations you need to implement and the budget you should allocate towards business continuity.
If, for example, you find that your RTO is five hours, meaning your business can survive with systems down for that long, then you will need a high level of preparation and a larger budget to make sure you can recover your critical systems quickly. On the other hand, if the RTO is two days, then you can probably budget less and invest in less advanced solutions.
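To make the two metrics concrete, here is a minimal sketch in Python that checks a proposed backup schedule and restore estimate against a set of targets. The class name, function names, and all of the numbers are illustrative assumptions rather than values from any particular product or standard.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryTargets:
    """Targets set during the BIA (the values used below are illustrative)."""
    rpo: timedelta  # maximum tolerable window of lost data
    rto: timedelta  # maximum tolerable downtime


def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    # If a failure hits just before the next backup runs, everything
    # written since the previous backup is lost.
    return backup_interval


def meets_targets(targets, backup_interval, estimated_restore_time):
    """Return (rpo_ok, rto_ok) for a proposed backup and recovery setup."""
    rpo_ok = worst_case_data_loss(backup_interval) <= targets.rpo
    rto_ok = estimated_restore_time <= targets.rto
    return rpo_ok, rto_ok


targets = RecoveryTargets(rpo=timedelta(hours=1), rto=timedelta(hours=5))
# Nightly backups with a 3-hour restore: recovery is fast enough,
# but up to 24 hours of data could be lost.
print(meets_targets(targets, timedelta(hours=24), timedelta(hours=3)))
# prints (False, True)
```

In this hypothetical case the restore time fits the five-hour RTO, but nightly backups miss a one-hour RPO, which points toward more frequent backups or replication rather than faster restore hardware.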
You must define your acceptable recovery time. How quickly you must restore your data and critical systems to resume operations is a serious decision. Understanding how long you can wait to access and apply your data will yield clarity about which solution—data center, cloud, onsite, or Disaster Recovery as a Service (DRaaS)—is best for your company.
Element 2: Risk Assessment
With this business impact analysis in place, you can set priorities as part of your DR plan by conducting a risk assessment. The risk assessment is a vital step in the DR planning process: it identifies potential hazards, the high-value assets at stake (customer information and other sensitive data), and how those assets align with critical business functions. As you develop your DR plan, and as part of the risk assessment, you must be able to answer the following questions:
- What types of hazards or disasters (man-made or natural) could occur to disrupt the business?
- How could each of these disasters impact the IT functions the business relies on to operate?
The greater the potential impact, the greater the resources that should be allocated to restore a system or process. While you may never be able to plan for all contingencies, it’s imperative to have solutions for the most critical functions that are at risk in a disaster.
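One common way to turn those two questions into a ranked list is a simple likelihood-times-impact score. The sketch below assumes a 1-to-5 scale for each factor; the scale, the hazard names, and the ratings are hypothetical and would come from your own assessment.

```python
from dataclasses import dataclass


@dataclass
class Hazard:
    name: str
    likelihood: int  # 1 (rare) to 5 (almost certain); hypothetical scale
    impact: int      # 1 (minor) to 5 (severe); hypothetical scale

    @property
    def score(self) -> int:
        # Higher scores deserve more recovery resources.
        return self.likelihood * self.impact


def rank_hazards(hazards):
    """Sort hazards from highest to lowest risk score."""
    return sorted(hazards, key=lambda h: h.score, reverse=True)


hazards = [
    Hazard("Regional hurricane", likelihood=2, impact=5),
    Hazard("Ransomware infection", likelihood=4, impact=4),
    Hazard("Accidental file deletion", likelihood=5, impact=2),
]
for h in rank_hazards(hazards):
    print(f"{h.score:>2}  {h.name}")
```

A single multiplied score is a rough guide only. A rare but catastrophic event can score the same as a frequent nuisance, so the ranking is a starting point for discussion, not a substitute for it.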
Element 3: Establish Priorities
To establish priorities, assemble an appropriate team for your impact analysis, keeping in mind that everyone thinks their area of responsibility is the most important. Gather leaders from IT and various divisions to make the hard decisions about the real operational priorities. Your disaster recovery plan will only be as good as your answers to the following: What applications and infrastructure must be restored immediately if disaster strikes? What is essential for productivity?
One strategy is to divide your applications into levels or tiers. Tier 1 should include the mission-critical applications you need immediately. Tier 2 covers applications you need within 12 to 24 hours. Tier 3 includes apps that can wait to be restored for a few days. In addition to data and information systems, your risk assessment should focus on communications infrastructure, communications strategy (both internal and external), secure access and authorization to critical systems, and re-establishing a suitable work environment.
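The tiering described above can be sketched as a small classification function. The cutoff for Tier 1 is an assumption made for illustration (anything needed sooner than 12 hours), and the application names and timings are hypothetical.

```python
from datetime import timedelta


def assign_tier(needed_within: timedelta) -> int:
    """Map how soon an application is needed to a recovery tier.

    Cutoffs are assumptions for illustration: anything needed sooner
    than 12 hours is treated as Tier 1.
    """
    if needed_within < timedelta(hours=12):
        return 1
    if needed_within <= timedelta(hours=24):
        return 2
    return 3


# Hypothetical application inventory: name -> how soon it must be back
apps = {
    "internal-wiki": timedelta(days=3),
    "order-processing": timedelta(hours=1),
    "email": timedelta(hours=18),
}

# Restore lower tiers first, and the most urgent apps first within a tier.
restore_order = sorted(apps, key=lambda name: (assign_tier(apps[name]), apps[name]))
for name in restore_order:
    print(f"Tier {assign_tier(apps[name])}: {name}")
```

Keeping the inventory as data makes it easy for the cross-functional team to review and argue over the timings without touching the logic.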
Avoid this mistake: Do not overlook the needs of the people who will be carrying out your disaster recovery plan, usually under severe stress. Establish an emergency chain of command and communication strategy, so everyone is in the loop. Also, make sure food and drink are readily available, and provide lodging when necessary.
Element 4: Ensure Adequate Resources
Managing disaster recovery on your own requires significant investment in capital, time, and expertise. Even resource-rich companies have to decide how much internal effort to focus on disaster recovery planning vs. growing the business.
Many companies choose an experienced partner to help disaster-proof their systems. A vendor can bring expertise and a programmatic approach to ensure your disaster recovery solutions meet the needs of your business and your IT capabilities.
DR experts today advise that backup data be kept offsite in a secure location, preferably a data center that is unlikely to be affected by the same disaster. Modern technology also offers the option to secure your organization’s data and critical applications in a hosted cloud environment. Either option allows applications and data to be delivered on demand.
Element 5: Disaster Recovery as a Service (DRaaS)
Some organizations implement a hybrid disaster recovery scenario. With this approach, managed solutions are used to reduce complexity, and remote managed backup services minimize operational impact and risks. You can pair your internal disaster recovery efforts with a DR provider that offers on-premises, data center, and cloud recovery options.
Because cloud-based disaster recovery as a service (DRaaS) solutions can lower your DR costs, they have become an attractive option. A solid DRaaS plan will use virtual machine (VM) replication to move an entire application to the provider’s cloud.
These virtualization and automation advances allow for plenty of flexibility; suppliers can now let companies choose services to support all or only some of their applications, based on their needs. DRaaS solutions also provide improved testing. With the cloud, your infrastructure is highly stable and available, so you can test your services more regularly and overcome the inertia of infrequent or inadequate recovery testing.
Avoid this mistake: Don’t stop short. Disaster recovery solutions require a well-researched, well-written plan, alignment to business risk, and regular testing, all of which require leadership commitment and adequate resources.
The ramifications of downtime, that is, being without the applications and infrastructure your business depends on, can be severe in both the short term and the long term, and can include loss of revenue, loss of productivity, reputational damage, and potential compliance violations.
This article was first published on OnRamp.
As OnRamp’s Marketing Manager, Carolina leads the content strategy, SEO, product launch, and communication efforts at OnRamp. With experience in managed hosting, cloud computing and VoIP, she translates complex concepts into simple terms that potential customers and partners can understand and use to build compliant IT solutions.
Connect with Carolina Curby-Lucier on LinkedIn