What is RAID 5 — RAID parity explained
It’s hard to talk about servers without RAID coming up. If you’re considering RAID for your server and want to know if RAID 5 is right for you, or if you just want to learn more about RAID in general, you’ve come to the right place. We’ll go over that and more in this article.
What is RAID?
RAID is an acronym meaning “Redundant Array of Independent Disks”. As the name implies, RAID creates an array of multiple hard disks in order to provide redundancy. An array simply means a collection of drives that are presented to the operating system as a single logical device. The “redundancy” in RAID is a key feature of most RAID types, used to provide additional reliability for storing data on less-than-perfect hard drives. As a side benefit, by combining many drives into one array, RAID can also improve disk access speed and increase available disk space.
Why should I consider RAID for my server?
On a typical home computer, a potential drive failure is not something we always think about. As long as you have backups, it can be an annoying inconvenience, but that’s about it. Given that hard drives fail at a rate of between 1% and 10% per year, a typical home computer is unlikely to see a drive failure before the computer becomes obsolete anyway.
For servers, the picture is quite a bit different. Servers often have more than one hard drive, which multiplies the chances that one of them will fail. If a server goes offline, it’s often more than a minor inconvenience to wait for data to be restored from backups. Depending on what the server was used for, potentially thousands of users will be unable to reach the services they expect to be online 24/7.
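To see how quickly the odds add up, here is a rough sketch in Python. It assumes that drives fail independently of one another and uses a 5% annual failure rate, a value picked from the middle of the range above purely for illustration. The function name is our own.

```python
def prob_any_drive_fails(annual_failure_rate: float, drives: int) -> float:
    """Chance that at least one of `drives` drives fails within a year.

    Assumes every drive fails independently with the same annual rate,
    which is a simplification (drives from one batch often fail together).
    """
    # Probability that ALL drives survive the year, then take the complement.
    return 1.0 - (1.0 - annual_failure_rate) ** drives


if __name__ == "__main__":
    for n in (1, 4, 8, 12):
        print(f"{n:2d} drives: {prob_any_drive_fails(0.05, n):.1%}")
```

Under these assumptions, the risk of seeing at least one failure per year grows steadily with every drive you add to the server.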
Because RAID uses several disks to create one virtual array, it is possible to use RAID to keep multiple copies of data active at one time. Depending upon the RAID type selected, this added redundancy can allow your server to remain online even if one or more drives fail. This is an inexpensive way to improve the performance and reliability of your server.
Does RAID always improve reliability?
As alluded to already, some types of RAID improve reliability, by allowing one, or sometimes more, drives to fail without losing data. To understand why we use the words “some types” and “sometimes” here, it is important to understand the different types of RAID, also known as RAID levels. Some of these provide redundancy, some improve performance, and some do both. Understanding how they work can help you decide which RAID type is right for you.
What RAID types work best?
For server use, there are a few RAID types that are popular for their reliability, performance, and cost. What type works best for you depends upon your particular circumstances.
First off, every server should avoid RAID 0, because it provides no redundancy. If a single drive fails, all data will be lost. RAID 1, on the other hand, does provide data redundancy through mirroring, but none of the other advantages of RAID, and so is not often used for demanding server applications.
The most commonly used RAID levels for servers and web hosting are RAID 5, RAID 6, and RAID 10. In today’s article, we will be discussing RAID 5, a type of “parity RAID”. RAID 5 is ideal in situations where you want to store the most data for the least money, and still retain adequate data protection and performance. We will also touch upon RAID 6, as it is similar to RAID 5. To read about RAID 10, which offers better performance at the expense of available disk space, read our article on RAID 10.
How does RAID 5 work?
RAID 5 is a type of RAID that offers redundancy using a technique known as “parity”. Parity is a type of extra data that is calculated and stored alongside the data the user wants to write to the hard drive. This extra data can be used to verify the integrity of stored data, and also to calculate any “missing” data if some of your data cannot be read (such as when a drive fails).
To explain how it does this, think back to high school algebra class, with equations like “9 = X + 4. Solve for X”. In this case, “X” is unknown data that was previously stored on a drive that has failed. “4”, meanwhile, is data that is stored on a drive you can read, and “9” is parity data stored on a third drive, that was previously calculated for redundancy purposes. By solving for X, we can work out that the missing data should have been “5”. This allows you to have redundancy without storing a full extra copy of your data, saving disk space compared to RAID 1 or RAID 10.
RAID 5 parity uses a conceptually similar mathematical function called “XOR” to calculate parity. This allows it to reconstruct data when one drive fails. RAID levels that use this type of redundancy are RAID 3, 4, 5, and 6, with RAID 5 and RAID 6 being the only commonly used types. RAID 5 can protect against a single drive failure, whereas RAID 6 can protect against two drive failures. RAID 5 and RAID 6 are otherwise nearly identical, offering similar performance, cost, compatibility, and reliability.
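The XOR idea can be sketched in a few lines of Python. This is a simplified illustration, not how any particular RAID controller is implemented: it treats each “drive” as a short block of bytes, keeps all parity in one place (real RAID 5 spreads parity across the drives), and uses a helper name of our own choosing.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together, one byte position at a time."""
    if not blocks:
        raise ValueError("need at least one block")
    if any(len(b) != len(blocks[0]) for b in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) groups the bytes found at the same offset on every drive.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data "drives" plus one parity block calculated from them.
data = [b"RAID", b"five", b"demo"]
parity = xor_blocks(data)

# Pretend the second drive has failed: XOR everything that is left.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

assert rebuilt == data[1]  # the missing block is recovered exactly
```

XOR works here because applying it twice with the same value cancels out, so combining the surviving blocks with the parity leaves only the block that was lost.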
In order to perform this feat, a RAID 5 array sets aside “one drive’s worth” of disk space for parity data, whereas RAID 6 sets aside “two drives’ worth” of disk space for parity data. For this reason, RAID 5 requires fewer hard drives but RAID 6 can provide protection against more serious failures. This makes RAID 5 popular for smaller arrays (minimum of 3 drives), and RAID 6 popular for larger disk arrays (minimum of 4 drives).
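As a quick worked example, the usable space of a parity array can be estimated as follows. This sketch assumes all drives are the same size and ignores filesystem and controller overhead; the function and the level labels are illustrative names, not part of any RAID tool.

```python
# Drives' worth of space reserved for parity, and minimum drive counts.
PARITY_DRIVES = {"raid5": 1, "raid6": 2}
MIN_DRIVES = {"raid5": 3, "raid6": 4}


def usable_capacity_tb(level: str, drives: int, drive_size_tb: float) -> float:
    """Estimate usable space for a RAID 5 or RAID 6 array of identical drives."""
    if level not in PARITY_DRIVES:
        raise ValueError(f"unsupported level: {level}")
    if drives < MIN_DRIVES[level]:
        raise ValueError(f"{level} needs at least {MIN_DRIVES[level]} drives")
    return (drives - PARITY_DRIVES[level]) * drive_size_tb


if __name__ == "__main__":
    for level in ("raid5", "raid6"):
        print(level, usable_capacity_tb(level, drives=6, drive_size_tb=4.0), "TB")
```

With six 4 TB drives, for instance, RAID 5 leaves five drives’ worth of space for data and RAID 6 leaves four.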
When would I use RAID 5?
RAID 5 was more popular in the past than today, but still has a number of advantages:
- RAID 5 offers data redundancy, so if one drive fails, you can recover from this. Most RAID types offer this, except RAID 0 which does not.
- Because of its single-parity data storage, RAID 5 offers the most usable disk space of any redundant RAID type. You only lose “one drive’s worth” of disk space for a RAID 5 array, no matter how many drives it has in it.
- RAID 5 only requires 3 hard drives, whereas RAID 10 and RAID 6 require 4 or more drives.
- Disk read performance and “sequential write” performance on RAID 5 are at least as good as, and sometimes superior to, other RAID levels.
- With SSDs becoming more popular, RAID 5 is seeing a new use, as SSDs are very fast but typically offer less disk space than hard drives. The space efficiency of RAID 5 makes the most of that limited capacity, while the speed of SSDs helps offset its slower writes.
- Because of its performance and disk space features, RAID 5 is ideal for storing backups, videos, or other large data that is not frequently updated.
Why shouldn’t I use RAID 5?
Although RAID 5 is popular, it has some important disadvantages which often make other RAID types more appropriate:
- RAID 5 (and other parity RAID types) suffers from very poor “random write performance”. Each small write requires reading the old data and old parity, then writing the new data and new parity, so a single request turns into several disk operations. This is a problem for many server use cases, especially for databases, which are very “random write heavy”.
- RAID 5 is not supported (or performs very poorly) with most inexpensive “fakeraid” or “onboard” RAID controllers, which work best with RAID 0 or 1.
- With very large arrays, rebuilding an array after a drive failure can take a very long time (sometimes several days). During the rebuild process, there is a real risk that a second drive will fail, or that part of a drive cannot be read. In either case, the array cannot be rebuilt and all data may be lost. RAID 6 is becoming more popular for this reason, as it can tolerate 2 drive failures. RAID 1 and RAID 10, meanwhile, can rebuild from a failure much more quickly.
- For decades, hard drives have gotten bigger and bigger, but their speed has increased much more modestly. Therefore, the advantages of RAID 5 (extra disk space) have become less important than its disadvantages (slow speeds). This makes RAID 10 a better option in most cases.
- To overcome some performance limitations of RAID 5, hardware RAID controllers sometimes include dedicated “XOR Processors”, large write caches, or both. Although this often improves RAID 5 performance, these types of RAID controllers are very expensive. Similar performance can be obtained from cheaper RAID cards or software RAID when using RAID 10 instead.
As you can see, RAID 5 has advantages for large data that rarely changes or SSD-based disk arrays. That said, RAID 6 is better for highly reliable large arrays, and RAID 10 is better for high performance arrays. What you ultimately choose should depend upon your specific needs.
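The random write penalty listed above comes from keeping parity in sync with the data. The Python sketch below shows one common way a parity array can handle a small write, often called “read-modify-write”. It is a simplified model with helper names of our own, not the code of any specific controller or software RAID.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def small_write(stripe: list[bytes], parity: bytes, index: int, new_data: bytes):
    """Replace one data block and update parity without touching other blocks."""
    old_data = stripe[index]   # disk read 1: the block being overwritten
    old_parity = parity        # disk read 2: the current parity block
    # Remove the old data from the parity, then mix in the new data.
    new_parity = xor_bytes(xor_bytes(old_parity, old_data), new_data)
    new_stripe = list(stripe)
    new_stripe[index] = new_data   # disk write 1: the new data block
    return new_stripe, new_parity  # disk write 2: the new parity block


stripe = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_bytes(xor_bytes(stripe[0], stripe[1]), stripe[2])

stripe, parity = small_write(stripe, parity, index=1, new_data=b"ZZZZ")

# The shortcut gives the same parity as recalculating from every data block.
assert parity == xor_bytes(xor_bytes(stripe[0], stripe[1]), stripe[2])
```

In this model, one small logical write costs two reads and two writes, whereas a mirrored setup such as RAID 1 or RAID 10 only has to write the new data to each copy.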
RAID: Learning More
This should be a good primer on RAID and give you the information you need to decide if RAID 5 is right for you. However, RAID is a big topic, so if you’d like to learn more, check out one of our upcoming RAID articles:
- Understanding RAID levels: RAID 5, RAID 6, RAID 10, RAID 50, RAID 60, RAID 0, and RAID 1.
- Configuring RAID in CentOS
- Choosing between software and hardware RAID
- How to buy a Hardware RAID card
- Do I need backups, or is RAID good enough?
If you’d like an easy way to get started with a RAID-enabled dedicated server, IOFLOOD.com would be glad to help. Contact us today to see if an IOFLOOD server is right for you.