Redundant array of independent disks, better known as RAID, is a data storage virtualization technology that combines multiple physical storage devices into a single logical unit. Depending on the configuration, RAID improves data integrity, read and write speeds, or both: striping spreads data across drives for speed, while redundancy mechanisms such as mirroring and parity let the array survive a drive failure without losing data. Many people naturally wonder whether RAID can replace the good old data backup altogether; in this article, we will describe how RAID and backup actually complement each other rather than compete with each other.
Spoiler: no, RAID is not a backup. But it can be a big help.
What is RAID?
There are many RAID setups, each with its own pros and cons. While we won't go into detail on all of them, we will briefly explain the most popular setups, their applications, and how you can leverage them when designing backup plans.
There are two main ways to implement RAID: in hardware or in software. Hardware RAID relies on a dedicated controller connected to the disks, which manages the array and presents it to the operating system as a single drive. Software RAID, by contrast, is managed by the operating system itself and is by far the easiest and cheapest option, since it requires no additional hardware or proprietary firmware.
Now let's break down the most popular RAID setups:
RAID 0 (data striping)
RAID 0 splits your files into blocks and distributes them across your physical drives, increasing overall read and write speeds because the drives work in parallel and their throughput adds up. However, RAID 0 offers no data redundancy and therefore does nothing to protect your data. It's great if you're constantly reading or writing data, yet it serves no purpose in a backup strategy. Worse, if any one of the disks fails, all of the data on the array is permanently gone, because files are not stored whole on any single disk but spread in blocks across all of them.
Pros: High performance
Cons: No redundancy, so a single disk failure destroys the whole array
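To make the striping idea concrete, here is a minimal Python sketch of how RAID 0 could distribute blocks. It is a toy model, not how any real RAID driver works: each "disk" is just a list of byte blocks, and the block size and the function names `stripe` and `unstripe` are hypothetical choices for illustration.

```python
from typing import List


def stripe(data: bytes, num_disks: int, block_size: int) -> List[List[bytes]]:
    """Split data into fixed-size blocks and deal them round-robin across disks."""
    disks: List[List[bytes]] = [[] for _ in range(num_disks)]
    for offset in range(0, len(data), block_size):
        block_number = offset // block_size
        # Block 0 goes to disk 0, block 1 to disk 1, ... then wrap around.
        disks[block_number % num_disks].append(data[offset:offset + block_size])
    return disks


def unstripe(disks: List[List[bytes]]) -> bytes:
    """Reassemble the data by reading one block from each disk in turn."""
    out = bytearray()
    row = 0
    while any(row < len(disk) for disk in disks):
        for disk in disks:
            if row < len(disk):
                out += disk[row]
        row += 1
    return bytes(out)


disks = stripe(b"ABCDEFGH", num_disks=2, block_size=2)
print(disks)            # [[b'AB', b'EF'], [b'CD', b'GH']]
print(unstripe(disks))  # b'ABCDEFGH'

# Simulate losing disk 1: only every other block survives.
print(unstripe([disks[0]]))  # b'ABEF'
```

Because each disk holds only every other block, no single disk contains a usable copy of the file, which is why one failed drive takes down the whole array.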
RAID 1 (data mirroring)
RAID 1 simply mirrors your data between two drives. Whenever you write data to one drive in the pair, that data is automatically written to the other drive as well. Because both drives hold identical copies, you get no increase in write performance (although some implementations can speed up reads by serving them from either drive). The primary function of RAID 1 is redundancy: if one disk fails, its partner still holds the same information and keeps working. When you replace the faulty drive, the data from the partner disk is copied to the new one, restoring the original pair. This solution is not very cost-effective, as you need twice the raw storage to hold each file, whereas RAID 0 uses the full capacity of every drive. However, it's well suited to critical or sensitive data, since a single drive failure will not cost you any of it.
Pros: Disk failure tolerance
Cons: Higher storage costs
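The mirroring behaviour described above can be modelled in a few lines of Python. This is a hypothetical toy sketch rather than a real driver: the `Mirror` class and its methods are invented for illustration, each disk is a dictionary mapping block numbers to bytes, and a failed disk is represented by `None`.

```python
from typing import Dict, List, Optional

Disk = Dict[int, bytes]  # block number -> block contents


class Mirror:
    """Toy RAID 1: every write goes to all healthy disks; reads use any one of them."""

    def __init__(self, num_disks: int = 2) -> None:
        self.disks: List[Optional[Disk]] = [{} for _ in range(num_disks)]

    def write(self, block: int, data: bytes) -> None:
        for disk in self.disks:
            if disk is not None:  # None marks a failed disk
                disk[block] = data

    def read(self, block: int) -> bytes:
        for disk in self.disks:
            if disk is not None:
                return disk[block]
        raise IOError("all mirror members have failed")

    def fail(self, index: int) -> None:
        self.disks[index] = None

    def replace(self, index: int) -> None:
        """Install a new disk and rebuild it by copying a surviving member."""
        survivor = next(d for d in self.disks if d is not None)
        self.disks[index] = dict(survivor)


array = Mirror()
array.write(0, b"payroll.xlsx")
array.fail(0)                 # the first drive dies...
print(array.read(0))          # b'payroll.xlsx' (still served by the second drive)
array.replace(0)              # ...and is rebuilt from its partner
print(array.disks[0] == array.disks[1])  # True
```

The cost trade-off is visible in the model: every block is stored once per disk, so two disks give you the usable capacity of one.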
RAID 5 (data striping and parity)
As one of the most balanced and therefore most popular RAID configurations, RAID 5 offers good redundancy and relatively good performance. It requires at least three drives and, depending on the implementation, supports up to 16. Data is striped across the drives in blocks together with parity information computed from those blocks, and the parity is distributed across all the drives rather than kept on a dedicated one. If a drive fails, the blocks it held can be recalculated from the data and parity on the remaining drives. That means RAID 5 can withstand the loss of one drive and keep working without data loss.
Pros: Fast reads, disk failure tolerance
Cons: Relatively slow writes (every write also has to update parity), limited scalability (up to 16 drives)
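RAID 5 parity is computed with XOR, which is what makes single-drive recovery possible: XOR-ing everything that survives in a stripe reproduces the missing block. The Python sketch below is a toy model under stated assumptions: the block contents are made up, the function names are hypothetical, and `parity_disk` shows only one possible rotation scheme (real controllers and software RAID use their own layouts).

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks (the parity function used by RAID 5)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_disk(stripe_index: int, num_disks: int) -> int:
    """One possible rotation: parity starts on the last disk and moves left each stripe."""
    return (num_disks - 1 - stripe_index) % num_disks


def updated_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Parity after rewriting one data block: XOR out the old data, XOR in the new."""
    return xor_blocks([old_parity, old_data, new_data])


# One stripe on a 3-drive array: two data blocks plus one parity block.
d0, d1 = b"RAID", b"FIVE"
p = xor_blocks([d0, d1])

# If the drive holding d0 fails, XOR of the survivors recovers it.
print(xor_blocks([d1, p]))  # b'RAID'

# Rewriting d1 means reading the old data and parity, then writing new parity.
new_d1 = b"5IVE"
p = updated_parity(p, d1, new_d1)
print(p == xor_blocks([d0, new_d1]))  # True

print([parity_disk(s, 3) for s in range(4)])  # [2, 1, 0, 2]
```

The extra read-and-recompute step in `updated_parity` is why RAID 5 writes are slower than its reads.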
RAID 10 (data striping and mirroring)
RAID 10 combines RAID 0 and RAID 1 into a nested setup that offers both high performance and disk failure tolerance. It is essentially a RAID 0 built out of RAID 1 pairs. The files you store on RAID 10 are first split into blocks, which are striped across several RAID 1 pairs (the RAID 0 layer); each pair then mirrors its blocks between its two drives (the RAID 1 layer). This arrangement gives you the performance of RAID 0 and the redundancy of RAID 1. But while RAID 0 and RAID 1 each require at least two physical storage devices, RAID 10 requires at least four.
Pros: Higher performance, disk failure tolerance
Cons: High cost, limited scalability
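The nested layout can be expressed as a simple mapping from logical blocks to physical disks. The Python sketch below assumes a hypothetical layout in which disks are grouped into mirror pairs (0, 1), (2, 3), and so on, with blocks striped round-robin across the pairs; real implementations may arrange disks differently, and the function names are invented for illustration.

```python
from typing import Set, Tuple


def locate(block: int, num_disks: int) -> Tuple[int, int]:
    """Return the two physical disks that hold a logical block.

    Assumed layout: disks form mirror pairs (0, 1), (2, 3), ...; blocks are
    striped round-robin across the pairs (RAID 0 layer) and each pair keeps
    two identical copies (RAID 1 layer).
    """
    if num_disks < 4 or num_disks % 2 != 0:
        raise ValueError("this sketch assumes an even number of disks, at least four")
    pair = block % (num_disks // 2)
    return (2 * pair, 2 * pair + 1)


def survives(failed: Set[int], num_disks: int) -> bool:
    """The array keeps working as long as no mirror pair loses both of its disks."""
    return all(not {2 * p, 2 * p + 1} <= failed for p in range(num_disks // 2))


print([locate(b, 4) for b in range(4)])  # [(0, 1), (2, 3), (0, 1), (2, 3)]
print(survives({0, 2}, 4))  # True: each pair still has one working disk
print(survives({0, 1}, 4))  # False: both copies of pair (0, 1) are gone
```

Under this layout, RAID 10 can sometimes survive two failed disks, but only if they belong to different mirror pairs.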
There are a handful of other RAID levels, each with its own applications, advantages, and drawbacks. For the sake of brevity, we won't examine all of them. Still, the cases above should give you a clear picture of what RAID is and how it is used for data storage. Now let's assess the role RAID can play in a sensible backup design.
RAID is not a Backup
No RAID level can completely replace backup. RAID 0, for instance, does not replicate data across its disks at all, so the failure of any one disk will inevitably result in data loss. RAID 1, on the other hand, stores at least two copies of every file on separate disks, so the data survives the failure of either disk. In that sense, RAID 1 (or RAID 5, which can also withstand the failure of one drive) can be regarded as a form of local backup. Keep in mind, though, that RAID replicates every change immediately: if a file is accidentally deleted, overwritten, or corrupted, that change lands on every drive in the array at once.
However, a RAID array only becomes a backup when it holds a copy of your files. If the RAID holds the one and only copy of your files, you cannot consider it a backup; if it holds a copy of files that also live elsewhere, a RAID 1 or RAID 5 array can count as one. But having just one backup is never enough: industry best practice calls for at least two local backups on two physically independent storage devices plus at least one cloud backup.
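The difference between redundancy and backup comes down to one behaviour: a mirror replicates every change, including deletions. The toy Python model below (the function name and the file data are hypothetical) shows an accidental delete wiping a file from both drives of a RAID 1 pair, while an independent copy taken earlier still has it.

```python
from typing import Dict, List

Files = Dict[str, bytes]  # file name -> contents


def mirrored_delete(drives: List[Files], name: str) -> None:
    """RAID 1 behaviour: a delete is applied to every drive in the mirror."""
    for drive in drives:
        drive.pop(name, None)


mirror: List[Files] = [{"report.docx": b"Q3 figures"}, {"report.docx": b"Q3 figures"}]

# A backup is an independent, point-in-time copy kept somewhere else.
backup: Files = dict(mirror[0])

mirrored_delete(mirror, "report.docx")  # an accidental deletion...
print(any("report.docx" in drive for drive in mirror))  # False: gone from both drives
print(backup["report.docx"])                            # b'Q3 figures'
```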
Although RAID cannot replace backup on its own, it can very well complement your backup storage devices or services. For instance, you can send backups of your critical data directly to a RAID 1 array, so that each backup is stored on two drives and survives the failure of either one. You can also back up the RAID itself, which is fairly easy: your operating system sees a RAID array as a regular drive, so you can back up its data the same way you back up your computer's internal storage.
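Because the array appears as an ordinary volume, an ordinary file copy is enough to back it up. Here is a minimal Python sketch using the standard library's `shutil.copytree`; the function name `backup_folder` and the mount points in the example comment are hypothetical, and a real setup would add scheduling, retention, and verification.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_folder(source: Path, destination_root: Path) -> Path:
    """Copy `source` into a new timestamped folder under `destination_root`."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = destination_root / f"{source.name}-{stamp}"
    # copytree creates `target` (and any missing parent folders) and raises
    # FileExistsError if it already exists, so an older backup is never overwritten.
    shutil.copytree(source, target)
    return target


# Example (hypothetical mount points: a RAID volume and an external USB drive):
# backup_folder(Path("/mnt/raid1/projects"), Path("/media/usb-backup"))
```

Keeping each run in its own timestamped folder means an accidental deletion on the RAID does not propagate into older backups.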
Leveraging RAID for the 3-2-1 Backup Strategy
As we've already mentioned, a sound backup strategy usually means keeping at least two local backups on two independent storage devices and at least one cloud backup. This follows the so-called 3-2-1 rule: keep three copies of your data, on at least two separate storage devices, with at least one copy off-site. If you use a RAID (or even NAS) setup for data storage, here's how it can fit into your overall backup design:
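As a rough illustration, here is a small Python sketch that checks a backup plan against the common reading of the 3-2-1 rule (three copies including the original, on at least two separate devices, at least one off-site). The `Copy` type, the `meets_3_2_1` function, and the sample plan are all hypothetical; note that the sketch treats a RAID array as a single device, however many disks it contains.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Copy:
    label: str     # what this copy is
    device: str    # device or service holding it; a RAID array counts as ONE device
    offsite: bool  # True for a cloud or other off-site copy


def meets_3_2_1(copies: List[Copy]) -> bool:
    """At least 3 copies, on at least 2 separate devices, with at least 1 off-site."""
    devices = {c.device for c in copies}
    return len(copies) >= 3 and len(devices) >= 2 and any(c.offsite for c in copies)


# A hypothetical plan with a RAID 1 NAS as the local backup target.
plan = [
    Copy("working files", device="workstation SSD", offsite=False),
    Copy("nightly backup", device="RAID 1 NAS", offsite=False),
    Copy("cloud backup", device="cloud storage", offsite=True),
]
print(meets_3_2_1(plan))      # True
print(meets_3_2_1(plan[:2]))  # False: only two copies, none off-site
```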
So that is the role a RAID setup can play in your effort to follow the 3-2-1 rule. Let us reiterate once again: RAID in itself is not a worthy replacement for backups, but rather a handy extension of your data storage arrangement. Used properly, and depending on the level you choose, RAID can keep your data available through a drive failure, speed up reads and writes, or both. Its real convenience, however, is that it can serve both as a backup destination and as a backup source. So if you're planning to set up a RAID at your workplace, be sure to take advantage of these benefits when designing your backup plans.