Contrary to what some IT professionals believe, backups are not “set it and forget it”. You need to plan them carefully, then build and configure a solution. After that, expect to fix issues from time to time until your setup is flawless.
And even once your backup setup is flawless, major parts of your infrastructure may change (for example, you might decide to switch from on-premises solutions to cloud applications and cloud-native data), and so you start all over again.
Your backups need attention and they can fail. And, while you may be angry with the vendor of the backup solution, you are still responsible for returning the data to a working state. In this article, we define the typical areas for backup failures and provide guidance on how you should deal with them.
1. Damaged Backups
Damaged backups may not be the most frequent cause of losing backup data, but they are certainly one of the most frightening. Just imagine a situation where your solution reports that backups have been uploaded to storage successfully and then, one day, when you need to recover them, your data turns out to be corrupt.
Why is this a terrifying scenario? Because, without proper recovery tests, you are never sure that your data is recoverable. As an IT pro managing backups, you should either check your data consistency frequently and perform recovery tests, or you should be aware that your data could be corrupt.
But what are the typical reasons for data to be damaged? Here are the most common cases:
- Ransomware. Modern-day crypto-lockers can recognize backup storage and backup data and encrypt them along with the production data.
- Damaged backup media. Typically, this happens with on-premises backup media. For example, you might fail to notice that one of your drives in a RAID array has gone down and, while your storage is still safe, it's one step closer to disaster.
- Failed backup solution. Today, you can find an established backup vendor with a proven and tested backup solution that will get your data to the storage in one piece. However, if you use outdated or freeware backup solutions, there is a chance that some of the data on your storage is inconsistent and you won't be able to recover at least some of it.
So how do you protect yourself against damaged backups? First of all, you need to employ a modern-day backup solution with a proven, stable history. Some backup solutions even include automated data consistency checking in their offering.
Secondly, you should test the recoverability of your backups on a regular basis. A successful test restore is the only reliable evidence that you can get your data back.
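A recovery test can be partly automated by restoring a backup to a scratch location and comparing checksums against the original files. The following is a minimal sketch under that assumption; the function names `sha256_of` and `find_mismatches` are hypothetical, and it only works while the source files have not changed since the backup was taken.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(source_dir: Path, restored_dir: Path) -> list[str]:
    """Compare every file under source_dir with its restored counterpart.

    Returns the relative paths that are missing from, or differ in,
    the restored tree. An empty list means the test restore matched.
    """
    mismatches = []
    for source_file in sorted(source_dir.rglob("*")):
        if not source_file.is_file():
            continue
        relative = source_file.relative_to(source_dir)
        restored_file = restored_dir / relative
        if (not restored_file.is_file()
                or sha256_of(source_file) != sha256_of(restored_file)):
            mismatches.append(relative.as_posix())
    return mismatches
```

An empty result from `find_mismatches` suggests the restored copy is byte-for-byte identical. Any path in the list is a file to investigate before you need it in a real recovery.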
2. Missing or Failed Backups
Sometimes people think that their backups are fine when, in fact, the backups are not merely damaged but do not exist at all. This typically happens when a system administrator relies on memory to start the backup instead of setting up a schedule.
The second reason for missing backups is that administrators forget to, or do not know how to, set up alerts about backup issues.
So, to avoid losing your backups completely, you should:
- Create an automated schedule
- Make sure that your backups are not interrupted
- Set up notifications about backups failing to complete
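One simple form of notification is a freshness check: if the last successful backup is older than the schedule allows, raise an alert. The sketch below illustrates the idea only. The function name `check_backup_freshness` is hypothetical, and how you obtain the last success time and deliver the message depends on your backup tool.

```python
from datetime import datetime, timedelta
from typing import Optional


def check_backup_freshness(
    last_success: Optional[datetime],
    now: datetime,
    max_age: timedelta,
) -> Optional[str]:
    """Return an alert message if the latest successful backup is
    missing or older than max_age; return None if it is fresh enough."""
    if last_success is None:
        return "ALERT: no successful backup has ever been recorded"
    age = now - last_success
    if age > max_age:
        hours = age.total_seconds() / 3600
        return f"ALERT: last successful backup is {hours:.1f} hours old"
    return None


# Example: a daily schedule with two hours of slack.
message = check_backup_freshness(
    last_success=datetime(2024, 1, 8, 12, 0),
    now=datetime(2024, 1, 10, 12, 0),
    max_age=timedelta(hours=26),
)
print(message)  # ALERT: last successful backup is 48.0 hours old
```

Checking the age of the last success, not just reacting to failure events, also catches the case where the backup job never started at all.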
Further reading: Backup Retention Policy and Scheduling Best Practices
3. Backup Is Too Slow
A less dramatic technical issue with backups is that they might take ages to actually finish, thus slowing down your progress and messing with your recovery time and recovery point objectives.
The main reasons for slow backups are:
- Slow Internet connection, network performance issues. Here you should find an appropriate time and settings that will allow you to use the maximum-possible network throughput without affecting your company’s operations. If there is no way you can enhance your upload speeds, then you should think about backup data prioritization and choose which mission-critical data to upload first.
- Slow performance of the backed-up computer. It might turn out that the computer you are trying to back up is overloaded with tasks in the given time frame, which might result in performance issues during the backup or even downtime for the whole computer. So, when choosing a time frame to perform the backup, you should check the peak workloads and avoid interfering with normal computer operations.
- Wrongly chosen backup type. There are different backup types. Some are more suitable for file-level backup and will work great with single files; others will upload a full copy of your machine or selected partitions to storage. So you should choose the most suitable backup type for the given dataset.
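To judge whether a backup fits into a given time frame, a rough estimate of upload time helps. The sketch below uses decimal units (1 GB = 8,000 megabits) and assumes a steady share of the uplink is available. The function names, the `utilization` parameter, and the numeric priority scheme are all illustrative assumptions, not features of any particular backup product.

```python
def estimate_upload_hours(size_gb: float, uplink_mbps: float,
                          utilization: float = 0.8) -> float:
    """Estimate the hours needed to upload size_gb gigabytes over an
    uplink of uplink_mbps megabits per second.

    utilization is the assumed fraction of the uplink the backup may use.
    """
    if uplink_mbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("uplink_mbps must be > 0 and utilization in (0, 1]")
    size_megabits = size_gb * 8 * 1000  # 1 GB = 1000 MB = 8000 megabits
    seconds = size_megabits / (uplink_mbps * utilization)
    return seconds / 3600


def datasets_that_fit(datasets, uplink_mbps, window_hours, utilization=0.8):
    """Pick datasets to upload within window_hours, most critical first.

    datasets is a list of (name, size_gb, priority) tuples, where a lower
    priority number means more critical. Datasets that do not fit in the
    remaining time are skipped and left for a later window.
    """
    remaining = window_hours
    selected = []
    for name, size_gb, _priority in sorted(datasets, key=lambda d: d[2]):
        needed = estimate_upload_hours(size_gb, uplink_mbps, utilization)
        if needed <= remaining:
            selected.append(name)
            remaining -= needed
    return selected
```

For example, 450 GB over a fully used 100 Mbps uplink comes to about 10 hours, which already exceeds a typical overnight window. That is the point at which prioritizing mission-critical data, or switching to a backup type that uploads less, starts to matter.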
4. Backup Is Inaccessible
Lastly, and quite surprisingly, you could lose access to your backup storage or backup media, thus losing precious time during a disaster. There are several main reasons for losing access:
- Forgotten or lost credentials. First and foremost, you should keep your passwords safe, in one place, where you can access them. Also, don't share your administrative credentials, and when you rotate your passwords (which is not recommended unless they have been compromised), you should not forget to update your password management system with the new data.
- Compromised credentials. Malefactors can change your credentials on the backup storage, so you cannot access backups and recover the data. This is the second reason why you should keep your passwords safe, watch the access logs to the backup media and storage for any suspicious activity, and restrict your users from accessing the backup storage. The fewer people have access, the smaller the chances that you will be successfully hit.
Diversify the Risks
To finish, we give one more piece of practical advice. You should stick to the 3-2-1 backup strategy to diversify the technical risks to your backups.
The 3-2-1 backup strategy means that you keep at least three copies of your data, on two different types of storage media, with at least one copy kept offsite. Such a strategy spreads the risk, so that a single failure is unlikely to destroy every copy. Setting up a backup strategy like this is costly and hard but, in the end, it is what keeps your backups safe.
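The rule as it is commonly stated (three copies, two types of storage media, one copy offsite) is simple enough to check against an inventory of your copies. Here is a minimal sketch; the `BackupCopy` class, its fields, and the sample locations are hypothetical, and the production copy is counted as one of the three.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    location: str    # free-form label, e.g. "office NAS"
    media_type: str  # e.g. "disk", "tape", "cloud"
    offsite: bool    # True if stored away from the primary site


def satisfies_3_2_1(copies: list[BackupCopy]) -> bool:
    """Check for at least 3 copies, on at least 2 media types,
    with at least 1 copy offsite."""
    media_types = {copy.media_type for copy in copies}
    has_offsite = any(copy.offsite for copy in copies)
    return len(copies) >= 3 and len(media_types) >= 2 and has_offsite


inventory = [
    BackupCopy("production server", "disk", offsite=False),
    BackupCopy("office NAS", "disk", offsite=False),
    BackupCopy("cloud bucket", "cloud", offsite=True),
]
print(satisfies_3_2_1(inventory))  # True
```

Dropping the cloud copy from this inventory makes the check fail on all three conditions at once, which shows how much a single offsite copy on different media contributes.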