{"id":33514,"date":"2019-06-22T18:02:50","date_gmt":"2019-06-22T14:02:50","guid":{"rendered":"https:\/\/www.msp360.com\/resources\/?p=33514"},"modified":"2023-12-13T16:09:58","modified_gmt":"2023-12-13T12:09:58","slug":"disaster-recovery-testing","status":"publish","type":"post","link":"https:\/\/www.msp360.com\/resources\/blog\/disaster-recovery-testing\/","title":{"rendered":"Disaster Recovery Testing: Best Practices and Scenarios"},"content":{"rendered":"<p>Disaster recovery testing is the process of ensuring that an organization can restore its data and applications and continue operations after a service interruption, a critical IT failure, or a complete disruption. It is necessary to document this process and review it from time to time with your clients. This ensures that you know how to save your client in the event of a failure. Keep reading to learn more about disaster recovery testing scenarios and disaster recovery testing best practices. <!--more--><\/p>\n<h2>Introduction<\/h2>\n<p>It\u2019s important to ensure that you and your customers are on the same page, not only to manage expectations but also to make sure that you can point to a list of requirements you have fulfilled if something (or everything) goes wrong.<\/p>\n<p>That list should include a plan for regularly testing disaster recovery scenarios, again so that you can demonstrate you\u2019ve done your due diligence, and so that you can find and eliminate potential problems before they become real ones.<\/p>\n<p>Several variables affect how much disaster recovery testing you\u2019ll need to do, what you\u2019ll charge, and what expectations your clients will have. 
These include the size of the company you\u2019re supporting, its budget for a DR solution, the complexity of its data structures and networks (whether everything is hosted on your network, some of it on the client\u2019s internal network, or more at additional service providers), the amount of data to be backed up, and so forth.<\/p>\n<div class=\"call-to-action\">\n<div class=\"call-to-action__left\" style=\"width: 65%;\">\n<div class=\"call-to-action__tag\">FREE whitepaper<\/div>\n<div class=\"call-to-action__title\">Backup and Disaster Recovery on AWS<\/div>\n<div class=\"call-to-action__text\">Every minute of downtime means money lost.<br \/>\nPlan your perfect disaster recovery strategy on AWS:<\/div>\n<!--HubSpot Call-to-Action Code --><span class=\"hs-cta-wrapper hs-cta-deferred\" id=\"hs-cta-wrapper-1877aad4-920c-45f8-a82a-42c9c41af323\" data-portal=\"5442029\" data-id=\"1877aad4-920c-45f8-a82a-42c9c41af323\"><span class=\"hs-cta-node hs-cta-1877aad4-920c-45f8-a82a-42c9c41af323\" id=\"hs-cta-1877aad4-920c-45f8-a82a-42c9c41af323\"><!--[if lte IE 8]><div id=\"hs-cta-ie-element\"><\/div><![endif]--><a href=\"https:\/\/cta-redirect.hubspot.com\/cta\/redirect\/5442029\/1877aad4-920c-45f8-a82a-42c9c41af323\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"hs-cta-img\" id=\"hs-cta-img-1877aad4-920c-45f8-a82a-42c9c41af323\" style=\"border-width:0px;\" src=\"https:\/\/no-cache.hubspot.com\/cta\/default\/5442029\/1877aad4-920c-45f8-a82a-42c9c41af323.png\" alt=\"CTA\"><\/a><\/span><\/span><!-- end HubSpot Call-to-Action Code -->\n<\/div>\n<div class=\"call-to-action__right\" style=\"width: 35%;\"><img decoding=\"async\" src=\"https:\/\/www.msp360.com\/resources\/wp-content\/uploads\/2019\/10\/Backup-and-DR-WP-icon.png\" alt=\"whitepaper icon\" \/><\/div>\n<\/div>\n<h2>Disaster Recovery Testing Scenarios<\/h2>\n<p>There are many potential disasters, but we can categorize them into several major groups:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignleft 
wp-image-31501 size-full\" src=\"https:\/\/www.msp360.com\/resources\/wp-content\/uploads\/2019\/06\/DR_scenarios-Equipment_failures.png\" alt=\"DR scenarios - Equipment failures\" width=\"168\" height=\"188\" \/><\/p>\n<p>&nbsp;<\/p>\n<p><strong>Equipment failures<\/strong><\/p>\n<p>These range from server meltdowns to storage failures, communications breakdowns, and power failures.<\/p>\n<div class=\"clear\" style=\"clear: both;\"><\/div>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignleft wp-image-31503 size-full\" src=\"https:\/\/www.msp360.com\/resources\/wp-content\/uploads\/2019\/06\/DR_scenarios-User_errors.png\" alt=\"DR scenarios - User errors\" width=\"168\" height=\"188\" \/><strong style=\"font-size: 1rem;\">User errors<\/strong><\/p>\n<p>Probably the most common type of disaster: a user accidentally deletes anything from a single file to a whole database, or an update applied to a database erases data or crashes the database server. While these might seem more like backup issues than DR issues, they become disaster recovery scenarios when recovering from them means migrating servers from one cloud to another or to an offsite server.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignleft wp-image-31504 size-full\" src=\"https:\/\/www.msp360.com\/resources\/wp-content\/uploads\/2019\/06\/DR_scenarios-Natural_disasters.png\" alt=\"DR scenarios - Natural disasters\" width=\"168\" height=\"188\" \/><\/p>\n<p><strong>Natural disasters<\/strong><\/p>\n<p id=\"last\">Floods and hurricanes come to mind first, but wildfires, earthquakes, tsunamis, landslides, and even cicadas bursting out from beneath the basement have all caused data loss or system outages. The results might include extended power loss, destruction of the data center, evacuation of personnel (or staff unable to get to work), loss of network connectivity, or even devastation of a wide area, including branch offices and power, phone, and other utilities.<\/p>\n<div 
id=\"slidebox\"><a class=\"close\">\u00a0<\/a><!--HubSpot Call-to-Action Code --><span class=\"hs-cta-wrapper hs-cta-deferred\" id=\"hs-cta-wrapper-a8864b01-95db-44e6-b545-031f240c4fbc\" data-portal=\"5442029\" data-id=\"a8864b01-95db-44e6-b545-031f240c4fbc\"><span class=\"hs-cta-node hs-cta-a8864b01-95db-44e6-b545-031f240c4fbc\" id=\"hs-cta-a8864b01-95db-44e6-b545-031f240c4fbc\"><!--[if lte IE 8]><div id=\"hs-cta-ie-element\"><\/div><![endif]--><a href=\"https:\/\/cta-redirect.hubspot.com\/cta\/redirect\/5442029\/a8864b01-95db-44e6-b545-031f240c4fbc\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"hs-cta-img\" id=\"hs-cta-img-a8864b01-95db-44e6-b545-031f240c4fbc\" style=\"border-width:0px;\" src=\"https:\/\/no-cache.hubspot.com\/cta\/default\/5442029\/a8864b01-95db-44e6-b545-031f240c4fbc.png\" alt=\"CTA\"><\/a><\/span><\/span><!-- end HubSpot Call-to-Action Code --><\/div>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignleft wp-image-31505 size-full\" src=\"https:\/\/www.msp360.com\/resources\/wp-content\/uploads\/2019\/06\/DR_scenarios-Staff_loss.png\" alt=\"DR scenarios - Staff loss\" width=\"168\" height=\"188\" \/><\/p>\n<p><strong>Loss of key staff<\/strong><\/p>\n<p>You should know how to get the network passwords in case your admin is hit by a bus, but you should also know who has the password to your cryptocurrency wallet, and who has the passwords needed to change your network connection or order supplies.<\/p>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignleft wp-image-31506 size-full\" src=\"https:\/\/www.msp360.com\/resources\/wp-content\/uploads\/2019\/06\/DR_scenarios-Malware_risks.png\" alt=\"DR scenarios - Malware risks\" width=\"168\" height=\"188\" \/><\/p>\n<p><strong>Malware risks<\/strong><\/p>\n<p>This is a growing category. It has evolved from viruses written by amateur hackers to show how clever they were, to financially motivated worms, Trojans, and ransomware engineered by
sophisticated professional hackers, to malware designed to steal data, which has moved beyond even expert programmers to operations run by nation-states.<\/p>\n<p>These threats are not only pervasive and persistent but also constantly evolving. Retrofitting a building to protect it against earthquakes might only need to be done once, but if you don\u2019t keep your malware protection and DR infrastructure up to date, you\u2019ll be at risk within days.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignleft wp-image-31507 size-full\" src=\"https:\/\/www.msp360.com\/resources\/wp-content\/uploads\/2019\/06\/DR_scenarios-Unexpected_events.png\" alt=\"DR scenarios - Unexpected events\" width=\"168\" height=\"188\" \/><\/p>\n<p>&nbsp;<\/p>\n<p><strong>Other unexpected events<\/strong><\/p>\n<p>This need not be an alien invasion; it could be as simple as a distracted driver taking a shortcut through your lobby (or your server room).<\/p>\n<p>&nbsp;<\/p>\n<p><span class=\"further-reading \">Further reading<\/span>\u00a0<a href=\"https:\/\/www.msp360.com\/resources\/blog\/disaster-recovery-scenario-example\/\">Real-Life Disaster Recovery Scenario<\/a><\/p>\n<h2>Methods for Disaster Recovery Testing<\/h2>\n<p>This isn\u2019t as simple as picking one of the methods below. You might need to use all of them. Some ensure that business practices align with the disaster recovery plan, some account for ongoing changes to your systems (or your customers\u2019 systems), and some test the hardware and software by simulating a disaster and restoring a file, a system, or a data center to full functionality.<\/p>\n<p>All of these plans should be reviewed regularly, and testing should be ongoing. This doesn\u2019t necessarily mean running through a full plan every month: you might test some part of each plan weekly, a larger part monthly, and run a full test once a year. 
The important thing is to test regularly and to make sure that any additions to the business are reflected in the DR plan.<\/p>\n<h3>Walkthrough test<\/h3>\n<p>This is a step-by-step review of the plan with the client, reading through it to make sure that everyone is aware of every step, that nothing has been overlooked, and that nothing added since the last review is missing.<\/p>\n<h3>Tabletop test<\/h3>\n<p>This kind of test is a \u2018what if\u2019 scenario. Lay out a specific kind of disaster, and ask each team member what they would do. A representative of every department should attend, and knowledge of business processes is critical. This may reveal gaps in the plan, which can be addressed before they cause a DR failure.<\/p>\n<h3>Technical Tests<\/h3>\n<h4>Parallel<\/h4>\n<p>A parallel test restores a system that hasn\u2019t actually failed to an alternate location. The real system continues to run, and there\u2019s no interruption to business services. This is safe, and it not only tests the backup and restore systems but can also reveal potential problems. An inexpensive way to do this is to run the restore in a virtual machine in the cloud, rather than dedicating a physical server somewhere.<\/p>\n<p><span class=\"further-reading \">Further reading<\/span> <a href=\"https:\/\/www.msp360.com\/resources\/blog\/how-to-perform-physical-to-virtual-restores\/\">How to Perform Physical to Virtual Restores with MSP360 Backup<\/a><\/p>\n<p>For instance, if a server is spun up in the cloud with a software or OS version that doesn\u2019t exactly match the production system, the restored system might malfunction, or a user or service on it might lack the proper credentials. Attempting to restore the production system somewhere else can reveal all of these problems.<\/p>\n<p>However, a parallel test is not a full test. 
A parallel system can test backup and restore functionality and help iron out permissions and other issues, but because the restored system is never actually put into service with users accessing it, other steps, such as redirecting Domain Name System (DNS) entries to the proper place, go untested. And without production loads, it won\u2019t be clear whether the new system has enough capacity to run the applications.<\/p>\n<h4>Live, or \u201cfull interruption\u201d testing<\/h4>\n<p>This actually takes the main system down and attempts to recover it. It\u2019s a more thorough test, but if the recovery attempts fail, it can cause serious and expensive downtime, and in some cases it may not be possible at all due to public safety or regulatory concerns. An alternative is to migrate the main system to an alternate location, perhaps from a virtual machine on the main server to an alternate VM on another server. This can still cause disruptions, but if the restore fails or runs into connectivity or other problems, migrating back to the original server is normally faster than rebuilding it from scratch.<\/p>\n<p>A third alternative is to restore to an alternate server or VM without bringing down the main server, then change network addresses or DNS entries to move traffic to the alternate server, leaving the main server online but idle.<\/p>\n<p>This can be taken a step further by using a load balancer to spread traffic across the main server and the alternate, with either one able to drop out of the pair if necessary or after the test. 
This can be done without service interruption, but the load balancer will add cost and complexity to the system as a whole.<\/p>\n<p><span class=\"further-reading \">Further reading<\/span> <a href=\"https:\/\/www.msp360.com\/resources\/blog\/disaster-recovery-faq\/\">Disaster Recovery FAQ: Essential Definitions for IT Pros and MSPs<\/a><\/p>\n<h2>Disaster Recovery Testing Best Practices<\/h2>\n<p>Disaster recovery testing best practices are shaped by budget. It\u2019s possible to put a clustered, multi-node system in place that can recover when one service, one server, or even a whole data center in one location goes down. The issue is cost.<\/p>\n<p><span class=\"further-reading \">Further reading<\/span> <a href=\"https:\/\/www.msp360.com\/resources\/blog\/data-recovery-best-practices-for-msps\/\">Data Recovery Best Practices<\/a><\/p>\n<p>Migrating a service from one server to another is easy and cheap. Migrating servers is more costly, and migrating whole data centers is much more expensive. It\u2019s a question of how much you (or the client) are willing to spend. You\u2019ll have to find a balance between cost and availability, and that balance may not be the same for every line of business or department: archived accounting records don\u2019t necessarily need to be available within a second of a failure, while the website, e-commerce system, or production database may need to be available 24x7x365.<\/p>\n<h3>Perform disaster recovery testing frequently. Create a schedule for testing<\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-31508 alignleft\" src=\"https:\/\/www.msp360.com\/resources\/wp-content\/uploads\/2019\/06\/Disaster_recovery_testing_schedule.png\" alt=\"Disaster recovery testing schedule\" width=\"168\" height=\"189\" \/>This is critical to maintaining service in the event of a disaster. 
Many organizations have discovered that a system wasn\u2019t working properly only after a disaster took their systems down and they couldn\u2019t restore them. The only way to find and fix these problems before they bankrupt the business is to test regularly and thoroughly.<\/p>\n<h3>Thoroughly document your tests<\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-31509 alignleft\" src=\"https:\/\/www.msp360.com\/resources\/wp-content\/uploads\/2019\/06\/Documenting_disaster_recovery_test.png\" alt=\"Documenting disaster recovery test\" width=\"168\" height=\"189\" \/>Documentation is your friend. There is often resistance to documenting business practices, and to documenting disaster recovery tests. However, these records will not only help you find gaps in protection at the next review but also document your efforts to keep things running, which is essential if there\u2019s an actual problem and everyone is pointing fingers at someone else.<\/p>\n<h3>Test both your DR solution and your people<\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-31510 alignleft\" src=\"https:\/\/www.msp360.com\/resources\/wp-content\/uploads\/2019\/06\/Testing_DR_solution_and_people.png\" alt=\"Testing DR solution and people\" width=\"168\" height=\"189\" \/>The tests should cover not only the equipment and software but also the people. Give department heads a scenario like this: customer ABC Enterprises has lost their entire data center in a mudslide that took out the building. We need to restore their data center to AWS instances and find terminals for their employees so they can configure services and get work done for at least the next three months, until the building can be evaluated and new systems purchased. 
What do we need to do, where is the documentation for their systems, and what\u2019s our first step?<\/p>\n<h3>Review and update your DR plan regularly<\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-31511 alignleft\" src=\"https:\/\/www.msp360.com\/resources\/wp-content\/uploads\/2019\/06\/Updating_DR_plan.png\" alt=\"Updating DR plan\" width=\"168\" height=\"189\" \/>Even if a plan is in place and has been successfully tested, it still needs to be reviewed and updated regularly. It\u2019s easy for any user with a credit card and a little knowledge to spin up a new server in the cloud, or to clone a database and store it in a different service. It\u2019s up to you to review your clients\u2019 systems regularly to ensure that everything critical is covered and secured.<\/p>\n<p>You can put policies in place that forbid people from branching out on their own, but if they aren\u2019t aware of those policies, you could still lose critical data. You need to review regularly, get buy-in from users and departments, keep developing your disaster recovery testing best practices, and make sure that everything is covered.<\/p>\n<p>Learn more about <a href=\"https:\/\/www.msp360.com\/resources\/blog\/disaster-recovery-planning\/\">disaster recovery planning<\/a>:<br \/>\n<span class=\"further-reading \">Further reading<\/span> <a href=\"https:\/\/www.msp360.com\/resources\/blog\/disaster-recovery-plan-checklist\/\">Disaster Recovery Plan Checklist<\/a><\/p>\n<h2>Conclusion<\/h2>\n<p>Disaster recovery plans cannot remain static. They have to evolve to include additions to the business, and they must be tested and checked for gaps in coverage. It\u2019s also critical to ensure that all of the relevant managers and IT personnel understand the plan and know where to get the necessary information in the event of a disaster. 
There are so many ways to fail, and the only way to avoid failure is to keep updating, testing, and retesting the plan.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Disaster recovery testing is the process of ensuring that an organization can restore its data and applications and continue operations after a service interruption, a critical IT failure, or a complete disruption. It is necessary to document this process and review it from time to time with your clients. This ensures that you know how [&hellip;]<\/p>\n","protected":false},"author":46,"featured_media":44454,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[886,889,878],"tags":[922],"class_list":["post-33514","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-backup-and-dr-guides","category-msp-business-guides","category-msp-university","tag-draas"],"acf":[],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.msp360.com\/resources\/wp-json\/wp\/v2\/posts\/33514","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.msp360.com\/resources\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.msp360.com\/resources\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.msp360.com\/resources\/wp-json\/wp\/v2\/users\/46"}],"replies":[{"embeddable":true,"href":"https:\/\/www.msp360.com\/resources\/wp-json\/wp\/v2\/comments?post=33514"}],"version-history":[{"count":2,"href":"https:\/\/www.msp360.com\/resources\/wp-json\/wp\/v2\/posts\/33514\/revisions"}],"predecessor-version":[{"id":56958,"href":"https:\/\/www.msp360.com\/resources\/wp-json\/wp\/v2\/posts\/33514\/revisions\/56958"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.msp360.com\/resources\/wp-json\/wp\/v2\/media\/44454"}],"wp:attachment":[{"href":"https:\/\/www.msp360.com\/resources\/wp-json\/wp\/v2\/media?parent=33514"}],"wp:term":[{"taxono
my":"category","embeddable":true,"href":"https:\/\/www.msp360.com\/resources\/wp-json\/wp\/v2\/categories?post=33514"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.msp360.com\/resources\/wp-json\/wp\/v2\/tags?post=33514"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}