Hardware raid or software raid? Our take on the great religious debate
I get asked this question a lot: “Do I need hardware raid, or is software raid good enough?” You may have noticed I/O FLOOD only offers software raid, so it will come as no surprise that we think software raid is “good enough”. What might surprise you, however, is that we believe software raid is far better than hardware raid in nearly all cases.
First, to put this in perspective, you have to keep in mind what we believe about technology. I/O FLOOD believes in efficient, effective technology. Specifically, technology that:
- Solves the actual problem you are having.
- Gives excellent bang for the buck.
- Is low maintenance and reliable.
- Is easy to set up and get working.
Judging our options against these benchmarks, we had to decide quite a while ago whether we should support software raid, hardware raid, or both. In the process of researching this decision, we learned a few things, some of them unexpected:
First of all, for raid in general, and raid 10 in particular, the stripe size matters to performance… A LOT. The bigger the stripe size, the better your random i/o performance, especially for larger files, and especially for reads. Hardware raid cards default to a small stripe size, and in many cases it cannot be raised to an optimal level. If your workload consists of reading large files for many simultaneous clients (common for serving videos), we found that a large stripe size can improve the performance of your drives fourfold or more. This limitation in most hardware raid cards was one strike against them.
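One way to picture why stripe size matters for this kind of workload is to count how many drives a single read lands on. The sketch below is a simplified model, not a benchmark: it assumes plain round-robin striping across a hypothetical four stripe members, and the function name `members_touched` is ours, made up for illustration. A read that touches every drive makes all of them seek for one client, while a read that fits inside one large stripe leaves the other drives free to serve other clients.

```python
KIB = 1024
MIB = 1024 * KIB


def members_touched(offset, length, chunk_size, n_members):
    """Return the set of stripe members a contiguous read lands on.

    Simplified model: chunks are laid out round-robin across members,
    which ignores mirroring and any controller-specific layout.
    """
    first_chunk = offset // chunk_size
    last_chunk = (offset + length - 1) // chunk_size
    return {chunk % n_members for chunk in range(first_chunk, last_chunk + 1)}


# A 1 MiB read against a small (64 KiB) and a large (2 MiB) stripe size.
# Both stripe sizes are illustrative assumptions, not recommendations.
small = members_touched(0, 1 * MIB, 64 * KIB, 4)
large = members_touched(0, 1 * MIB, 2 * MIB, 4)

print(len(small), len(large))  # prints "4 1"
```

In this model the small stripe size drags all four members into one 1 MiB read, while the large stripe size keeps the same read on a single member.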
Secondly, I’m sure you’ve heard that you should use enterprise or raid edition drives for your raid arrays, or you risk data loss. We’ve been very concerned about this and researched the issue extensively. What we found is that this problem stems from the fact that hardware raid cards will not wait longer than a few seconds for a drive to return data before they mark the drive as bad. Enterprise and raid edition drives are aware of this problem: they reply to the raid controller that they were simply unable to read the data, and the raid card rebuilds the data from another drive without marking that drive as bad. Desktop class hard drives will keep trying to read the data for an extended period, and get dropped from the raid array as a result. What we learned in our research was that software raid arrays do not have this problem, as linux software raid will patiently wait for a drive to finish its error recovery, and will only drop a drive from an array if it has truly failed. What this means is that, instead of using Western Digital Raid Edition drives for example, you can safely use Western Digital Black Edition drives, which are identical except for two key points: 1) the Raid Edition drives have TLER (a feature required for reliable operation in hardware raid), and 2) the Raid Edition drives cost a LOT more money. By using software raid, you can get the same performance and reliability as hardware raid, while buying dramatically less expensive hard drives.
The third thing we found is that hardware raid can be a real pain in the ass! In order to use hardware raid, you’ve got to first figure out what card you want to be using, and extensively test its quality and reliability. If that card is ever discontinued or temporarily unavailable, you need to do the same thing again with a new card. You will also need to set up drivers to configure the card, and often these drivers have to be compiled into the linux kernel. Every time you update the kernel, the drives connected to the card will disappear from the system until you rebuild driver support into the kernel. It’s also more difficult to boot off a hardware raid, so you either need to work around that problem at install time, which is a serious pain, or you need to install your OS onto a separate drive, at added hassle and expense. If you want to monitor your raid array or repair a broken array, you typically have two options: 1) use the raid card’s own software under linux, which in our experience is often difficult to install and unreliable, or 2) reboot the server and do the work in the bios. We found both of these options to be extremely inconvenient for us and our customers. Finally, if you want to receive notifications about disk status changes or failures, you typically have to go through the same set of hassles. Because of these hassles, drive failures often go unnoticed for an extended period of time, making it much more likely that you lose an entire raid array and all of the data on it.
With software raid we don’t have any of these problems. Looking into the raid status is as simple as a one line shell command: “cat /proc/mdstat”. Setting up an email address to receive raid status changes and failure notifications is similarly easy. Repairing an array takes just a couple of commands over SSH, and does not require any software to be installed. And installing your OS onto a software raid in the first place does not require any special install-time drivers or additional hardware.
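To give a feel for how little is involved in checking a software raid, here is a minimal sketch that scans the text of /proc/mdstat for degraded arrays. It is an illustration under assumptions rather than a monitoring tool: the `SAMPLE` text is made up to resemble typical mdstat output, the function name `degraded_arrays` is ours, and the script relies on the usual `[UU_U]` style status marker, where an underscore stands for a missing member.

```python
import re

# Made-up sample resembling /proc/mdstat output; not taken from a real server.
SAMPLE = """\
Personalities : [raid1] [raid10]
md0 : active raid1 sdb1[1] sda1[0]
      1048512 blocks super 1.2 [2/2] [UU]

md1 : active raid10 sdd2[3] sdc2[2](F) sdb2[1] sda2[0]
      1953260544 blocks super 1.2 512K chunks 2 near-copies [4/3] [UU_U]

unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Map array name to its status marker for arrays with a missing member."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        # An array section starts with a line such as "md0 : active raid1 ...".
        header = re.match(r"^(md\w+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # The following line carries a marker such as "[4/3] [UU_U]".
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if status and current:
            if "_" in status.group(3):
                degraded[current] = status.group(3)
            current = None
    return degraded


print(degraded_arrays(SAMPLE))  # prints {'md1': 'UU_U'}
```

On a real server you would pass in the contents of /proc/mdstat instead of the sample, for example `degraded_arrays(open("/proc/mdstat").read())`.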
Next, there’s the issue of cost. Quite simply, a decent raid card is going to cost $300 or more, and then the cost of hardware-raid-compatible drives is at least 30% more than otherwise identical non-hardware-raid drives. Software raid is of course free. Simply upgrading to hardware raid would require us to charge at least 50% more for a typical server! If this massive expense provided a compelling improvement in speed, reliability, or other important metrics, we would consider offering it and recommending it despite the high cost. What we found is that it doesn’t offer added speed, reliability or anything else our users really need. In terms of bang for the buck, hardware raid fails miserably.
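To make the arithmetic concrete, here is a rough sketch of the extra hardware cost using the two figures above: a $300 card and a 30% premium on drives. The drive count and the $100 per-drive price are hypothetical numbers picked only for illustration, and the function name `hardware_raid_premium` is ours. The sketch covers only the added hardware cost; it does not try to reproduce the server price comparison.

```python
def hardware_raid_premium(n_drives, drive_price, card_price=300.0, drive_markup=0.30):
    """Extra hardware cost of going with hardware raid.

    card_price and drive_markup default to the figures quoted in the text;
    n_drives and drive_price are supplied by the caller.
    """
    return card_price + n_drives * drive_price * drive_markup


# Hypothetical server: four drives at $100 each (illustrative numbers only).
extra = hardware_raid_premium(n_drives=4, drive_price=100.0)
print(f"${extra:.2f}")
```

For this hypothetical four-drive server, the card and the drive premium together come to $420 of added hardware before any other costs.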
Finally, there’s the issue of reliability. How safe is my data? People get raid because they care about this a lot, and so it’s no wonder that someone might be willing to spend a little extra on hardware raid if it kept their data safer. Unfortunately, we found the opposite to be true here. Because linux software raid is a mature and stable product, we’ve had far fewer issues with linux raids losing data than with hardware raids. With hardware raid, if the raid card goes pear-shaped or forgets about your drive array, that’s it, there’s no coming back from that. If the hardware raid card fails outright, you can usually put in a new hardware raid card of exactly the same model and get your array back, but only if you have a spare card lying around, and only if the new card decides to play ball. In fact, the hardware raid card itself is a huge point of failure, often failing about as frequently as the hard drives it is designed to protect. Because these are niche products with a limited lifespan, the software and firmware quality can sometimes leave something to be desired. This means that things might work well most of the time, but you never know if a bug is going to cause you to lose everything. With linux software raid, you don’t have this problem because it is open source and has been used and abused by a huge number of people over many years. The maturity of software raid means that you are far less likely to have raid related problems crop up, and, even if your entire system blows up, you can move the drives to another server and the array will show up with no issues. For all of these reasons, we consider software raid to be the far superior solution if you care about the reliability and safety of your data.
Getting back to our original point, we had to decide whether to offer software raid, hardware raid, or both. Our deciding factors were:
- Does it solve the actual problem you are having?
- Does it give excellent bang for the buck?
- Is it low maintenance and reliable?
- Is it easy to set up and get working?
In all of these areas, we feel that linux software raid is miles ahead of hardware raid. There are a small number of cases where hardware raid is the right solution, but for our typical customer who has 6 or fewer sata hard drives and is running the linux operating system, we find hardware raid to offer no benefit whatsoever, and many drawbacks. Because of this, we only offer software raid on our servers.
Not everyone agrees with us. Many people think hardware raid is absolutely essential, and they won’t order a server without it. This means we don’t make every sale that we could. We could offer hardware raid for the simple reason that people want it, and we might make a few bucks off those people, but we’ve chosen not to. We feel that this is the right decision for us and for our customers, so it’s worth losing a sale every now and then to maintain that integrity.
Don’t agree with us? Feel we missed some important information when making our decision? Leave a comment below to set the record straight.