In my role as an Inside Solutions Architect at 11:11 Systems, my objective is to match 11:11 Systems’ range of solutions to customers’ data protection needs, from simple off-site BaaS to fully managed DRaaS solutions. That objective is easier to achieve when customers come to the table with an effective and reliable recovery plan. So how do you make an effective and reliable recovery plan? Here are a few key considerations when constructing your recovery plan.
1. Understand your valuable data assets, core systems and processes.
“Kind of obvious!” I hear you say. Well, yes and no… it is not always the usual suspects. Everyone would immediately point to production data as their most valuable data, and that is not wrong. It is just that there’s more to valuable data than production data alone.
Companies need to be able to take a step back from production and look at their data estate holistically. Yes, we want production data/systems protected and up and running ASAP, but what about data/systems that sit upstream or downstream from production data? Go a step further and look at how upstream or downstream suppliers outside the business could impact your valuable data assets, core systems and processes.
For example, say you are a manufacturing business taking online orders and you encounter a major issue. Invoking your recovery plan, you fail over your business to run from your DR site, only to discover the dispatch department can’t print address labels because their label printing PC and courier scheduling software were also affected and are currently offline. Or you fail over but neglected to include your Payroll system (not a production system… right?). I am sure you would want the ability to pay your employees (and I am sure the employees would too!).
Tip: Before writing a recovery plan, run a tabletop exercise with stakeholders from each area of the business and run scenarios:
“What if we lost x/y/z system?”
“What data is impacted?”
“Who is impacted?”
“What upstream/downstream services are affected?”
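The “what if we lost x/y/z system?” question can also be explored with a small dependency map before the tabletop session. The sketch below is a minimal illustration only: the system names and the DEPENDS_ON map are hypothetical, loosely based on the manufacturing example above, and a real estate would need its own inventory.

```python
from collections import deque

# Hypothetical dependency map: each system lists the systems it depends on.
DEPENDS_ON = {
    "online_orders": ["order_database"],
    "dispatch_labels": ["online_orders", "courier_scheduling"],
    "courier_scheduling": [],
    "payroll": ["hr_database"],
    "order_database": [],
    "hr_database": [],
}


def impacted_by(failed, depends_on):
    """Return every system that directly or indirectly depends on `failed`."""
    # Invert the map so we can walk from a system to its dependents.
    dependents = {name: [] for name in depends_on}
    for system, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(system)

    # Breadth-first walk over the downstream systems.
    seen = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)


print(impacted_by("order_database", DEPENDS_ON))
# ['dispatch_labels', 'online_orders']
```

Even a rough map like this tends to surface the “not the usual suspects” systems, such as label printing, that stakeholders can then confirm or correct in the room.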
2. Ensure your data is clean.
Another obvious one, but there is little point having a data protection/recovery plan if the data you are recovering isn’t clean. So how do you know your data is clean? There is an old sys admin saying, “backups are your last line of defence,” and never more so than in today’s environment.
Backups should be one part of a multi-layered data protection strategy. Leverage the features of your backup vendor to perform checks on the data. For example, Veeam offers SureBackup, SureReplica and Secure Restore, and Zerto offers test failover; all of these help build confidence that your data is clean and ready to be recovered, should the need arise.
Tip: Incorporate regular testing/scanning of your backup data; it should also be considered part of your valuable data assets. With that in mind, leverage features such as immutability and air-gapped backups to ensure that valuable backup data is also protected.
3. Have a clean place to put your recovered data.
There is little point having clean data to recover if you do not have a clean place to recover it to. A consideration for your recovery plan: “Do I have adequate resources available to recover?” Again, it is not always the usual suspects when it comes to resources:
Time: Something we cannot buy from a vendor or create, especially when the CEO is standing by our desk during an incident shouting “GET IT BACK ONLINE NOW.” Look to incorporate automation to speed up recovery.
Infrastructure: Not everyone has infrastructure ready to go. Look to leverage 3rd party MSPs and/or hyperscale cloud providers to provision the required resources.
People: Do the staff have the necessary skills/process knowledge to recover within the timescales required?
Tip: Include your DR infrastructure/systems in your security scanning/patching schedule
4. Have a backup plan for your recovery plan.
To quote Mike Tyson, “Everyone has a plan until they get punched in the mouth,” which is a subtle way of saying you can do all the planning you like, but until the event you are planning for happens, you will never know how the plan holds up. Fortunately, technology today means we can simulate most scenarios and look to automate recovery where possible.
However, automation should not mean teams are off the hook for knowing the full recovery process. If you have a 100-step automated recovery plan and it fails at step 57 during a live event, your staff need to know how to progress that recovery plan from step 57 onwards. Have a backup for your recovery plan!
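One way to make “carry on from step 57” practical is to have the automation report exactly where it stopped and accept a starting step when it is re-run. The sketch below is a hedged illustration, not any vendor’s runbook format: run_plan, the step names and the simulated DNS failure are all hypothetical.

```python
def run_plan(steps, start_at=1):
    """Run steps in order from `start_at` (1-based).

    Returns the number of the step that failed, or None if all steps ran.
    """
    for number, (name, action) in enumerate(steps, start=1):
        if number < start_at:
            continue  # already completed in an earlier run
        try:
            action()
        except Exception as error:
            print(f"Step {number} ({name}) failed: {error}")
            return number
        print(f"Step {number} ({name}) done")
    return None


# Hypothetical three-step plan where the second step fails the first time.
state = {"dns_fixed": False}


def restore_database():
    pass


def update_dns():
    if not state["dns_fixed"]:
        raise RuntimeError("DNS record not updated")


def start_web_tier():
    pass


steps = [
    ("restore database", restore_database),
    ("update DNS", update_dns),
    ("start web tier", start_web_tier),
]

failed = run_plan(steps)            # stops and reports step 2
state["dns_fixed"] = True           # staff resolve the issue by hand
result = run_plan(steps, start_at=failed)  # resume from the failed step
```

The design choice here is that the plan never restarts from step 1 by default after a failure; staff decide where to resume, which only works if they understand what each step does.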
OK, you have your recovery plan. How often are you testing it? Once a year? Every 6 months? Are you performing full failovers? Much like your backups, you should consider incremental recovery failovers. Still perform your annual/bi-annual “big bang” failover, but also look to run regular single server/system recoveries. They will be less overwhelming for staff and easier to run on a more frequent basis, which should ensure the failover testing schedule is not neglected. Also, pick a different service/system each time.
Tip: Your recovery plan should *NOT* be a static process/document. It should evolve with your environment, so tie it in with your Change Management system. Add a consideration to your change process: “Does this impact our ability to recover with our existing recovery plan?”
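Picking a different system each time is easier if the choice is not left to memory. As a minimal sketch, assuming you keep a simple record of when each system was last recovery-tested, the function below selects the system that has gone longest without a test. The system names and dates in LAST_TESTED are made up for illustration.

```python
from datetime import date

# Hypothetical record of the last single-system recovery test.
# None means the system has never been tested.
LAST_TESTED = {
    "payroll": date(2024, 1, 15),
    "online_orders": date(2024, 3, 1),
    "label_printing": None,
}


def next_system_to_test(last_tested):
    """Pick never-tested systems first, then the least recently tested one."""
    return min(
        last_tested,
        key=lambda name: (
            last_tested[name] is not None,   # False sorts before True
            last_tested[name] or date.min,   # oldest test date first
        ),
    )


print(next_system_to_test(LAST_TESTED))
# label_printing
```

After each test, update the record with the test date so the next run moves on to another system.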
For example, you create a recovery plan in January, but need to “dust it off” in September to recover from an incident. That is 9 months’ worth of infrastructure/application/process changes that could impact the ability to recover. Learn to keep your recovery plan up to date.
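If your change records can be exported, the gap between the plan’s last review and today can be checked rather than guessed. The following is a rough sketch under stated assumptions: the record fields (id, approved, affects_recovery), the change IDs and the dates are all hypothetical and do not reflect any particular Change Management product.

```python
from datetime import date

# Hypothetical date the recovery plan was last reviewed.
PLAN_LAST_REVIEWED = date(2024, 1, 31)

# Hypothetical export of change records.
CHANGES = [
    {"id": "CHG-001", "approved": date(2024, 1, 10), "affects_recovery": True},
    {"id": "CHG-002", "approved": date(2024, 4, 2), "affects_recovery": True},
    {"id": "CHG-003", "approved": date(2024, 6, 18), "affects_recovery": False},
    {"id": "CHG-004", "approved": date(2024, 8, 5), "affects_recovery": True},
]


def changes_to_review(changes, last_reviewed):
    """List changes approved after the last plan review that touch recovery."""
    return [
        change["id"]
        for change in changes
        if change["approved"] > last_reviewed and change["affects_recovery"]
    ]


print(changes_to_review(CHANGES, PLAN_LAST_REVIEWED))
# ['CHG-002', 'CHG-004']
```

A non-empty list is a prompt to revisit the plan before an incident forces you to “dust it off.”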
If you need help getting started, aligning stakeholders, or even testing recovery, the 11:11 Consulting Services team have a lot of experience helping organizations like yours put together business and disaster recovery plans.