So, recently I started looking to see if there was any nice hardware around to provide a solid enclosure for a FreeNAS-based homemade NAS storage system. In looking into this, I ran across this page: Freenas Raid Overview. What really caught my eye was the statement “CAUTION: RAID5 ‘died’ back in 2009” and a link to this article: Why RAID 5 stops working in 2009. Worried that I had made a fatal error in my existing 12TB (6x2TB) RAID 5 setup, I read on and realized something wasn’t right. And it got worse: a follow-up article in 2013, Has RAID5 stopped working, by the same author continued on in error. “What’s the problem?” you might ask. Well, it is a failure to understand fundamental math.
See, the author (and, to be fair, lots of people) makes a mistake when looking at the probability of separate events combined. They make the assumption that if you have six separate events each with a given probability of happening, and you put them all together, then as a whole you’ve increased your chance of that event happening. That’s completely wrong. Your overall probability is no greater than the individual probabilities. Each individual event has no effect on the other events. So since you have six 2TB disks with a max URE failure rate (probability of failure to read) of 1 in 1×10^14, you are still only looking at the failure of that 2TB disk, not of the 12TB of storage. If you really want to try to account for combined events, you can take the chances of having two drives fail with a URE at the same time. This is done by multiplying the probabilities together. So 1/(1×10^14) times 1/(1×10^14) equals a 1/(1×10^28) probability of failure, that is a URE rate of 1 in 1×10^28! All failure probabilities are completely independent. And it gets better from there:
1. Given the probability and statistics error described above, you are only looking at the chance of failure for each individual disk, not the whole storage array. So you have a 1 in 1×10^14 probability of a read failure for a 2TB disk during the recovery of any disk. Yes, this technically gets worse as drive sizes increase, but you would need to read each individual, COMPLETELY FULL 2TB disk, in whole 6.25 times (for the needed 12.5TB of data) to hit this probability of failure point on that disk. For a 4TB disk you have to read the entire full disk 3.125 times, so worse odds, but in most setups this still is unlikely to occur during a rebuild (unless you’ve just got bad luck).
2. That 1 in 1×10^14 is the MAX unrecoverable read error rate. That means that you should get no more than that number of failures. You are actually likely to get fewer failures than that, so you can expect to be able to read more than 12.5 TB of data before a failure. See, more good news!
3. When RAID 5 is in recovery mode, you are not reading a full 2 TB of data off your full 2TB disks to rebuild your failed drive. The parity information to recover the drive is only the total usable storage divided by the number of drives in the array. For a 2TB x 6 array (12TB of raw storage) you get 10TB of usable storage. That 10TB is divided by 6 to give you about 1.67 TB of data needed to be read off each individual 2 TB drive to recover the failed drive in the array. So, again, your odds get better.
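The arithmetic behind the three points above can be reproduced with a short sketch. It assumes 1 TB = 10^12 bytes and reads the spec figure as one URE per 10^14 bits; the function names are made up for illustration.

```python
# Assumptions (not from any datasheet API): 1 TB = 10**12 bytes,
# and the URE spec is taken as one error per 10**14 bits read.
BITS_PER_URE = 10**14
BYTES_PER_TB = 10**12


def tb_per_ure(bits_per_ure=BITS_PER_URE):
    """Amount of data, in TB, that corresponds to 10**14 bits."""
    return bits_per_ure / 8 / BYTES_PER_TB


def full_reads(disk_tb):
    """How many times a completely full disk must be read to reach that amount."""
    return tb_per_ure() / disk_tb


def per_drive_read_tb(drives, disk_tb):
    """Point 3's estimate: usable capacity divided by the number of drives."""
    usable_tb = (drives - 1) * disk_tb  # RAID 5 gives up one drive's worth to parity
    return usable_tb / drives


print(tb_per_ure())                        # 12.5
print(full_reads(2), full_reads(4))        # 6.25 3.125
print(round(per_drive_read_tb(6, 2), 2))   # 1.67
```

This only checks the unit conversions; it says nothing about how the per-bit figure should be combined across drives, which is what the comments below argue about.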
Yes, the chance of failure does go up as drives get larger (assuming URE doesn’t improve), and, yes, you should ALWAYS have offsite (a different raid box) backup for anything you don’t want to risk losing (good disaster recovery strategy anyways). But RAID 5 isn’t dead and is still an excellent choice for good performance, reliability, and cost.
And here is my real life example: I made the mistake of purchasing Seagate “green” 2TB drives for my original 6x2TB NAS box. These drives have a little bug: they report “failed” even when they haven’t really failed when they are used with some hardware RAID solutions. For 4 months after I installed these drives, I had a drive failure just about every three weeks and had to do a rebuild of 5TB of data (take failed drive out, format it blank, stick it back in, rebuild). That’s about five RAID 5 rebuilds before I finally gathered the funds to replace all the drives with WD red NAS drives (no failures since). Oh, and each time I swapped out a green drive for a red drive, that was another RAID 5 rebuild, so six more rebuilds for a total of eleven. Guess what, I got lucky and there were no URE events during any of those rebuilds and no data was lost (yes, I have off-site backup as well). Of course when I say luck, I mean my odds were pretty good that I wouldn’t have a catastrophic failure as the other author claimed I would. 😉
It’s nice to see some sense after all the FUD, mainly by FreeNAS’s lot.
RAID5, ECC, RAM – all are “imperative!!”. As a second-generation ZFS developer mentioned, that kind of messaging makes someone more likely to avoid it and pick a WORSE solution.
The other thing that people assume is that a failure will lose the entire pool. Maybe with old RAID cards, because these just gave up. But with modern ones, and especially software RAID (especially with the rise of ZFS), you would only lose that area of data – MUCH easier to restore from backup. It’s not like you have to pull down a whole 10TB from your online backup of your pirate media.
Which is the next point – media – most people claiming RAID5 is dead are using their home setups for media – forgetting that RAID is for uptime, not for backup.
Late to the party, but I don’t quite agree. I think you got the math wrong here 🙂
First, I wouldn’t consider UREs while the redundancy is still operational. Then, it’s obvious that RAID5 has an incredibly high chance of being able to recover the URE. We’re not worried that RAID5 will fail on us while redundancy is working. We’re worried that rebuilding the redundancy after a disk failure event happened would not work.
When trying to read 5x2TB of data to recover a failed disk in a 6x2TB RAID 5 array, each read operation fails with a probability of 1/(10^14). This error probability is not “per disk”, but per bit read, as you can see in the WD red datasheet (https://www.wdc.com/content/dam/wdc/website/downloadable_assets/eng/spec_data_sheet/2879-800002.pdf). And the chance of having an URE while reading 10TB is therefore 10TB * 1/10^14 = 80%!
Experience shows that this probably isn’t correct. The reason for this is that the actual chance for an URE is much less than what’s specified in the datasheet. This is the actual reason why raid5s aren’t failing left and right.
You can easily see this if you look at one of your 2TB disks. With an URE chance of 1/10^14, you could only read the data off the disk roughly 3 times before a bit-read would fail on average (50% expected value of an URE). Experience shows us that this is probably not true.
But you are also making another wrong, or at least very bold, assumption in that article. You say that “all disk failures and probabilities are completely independent”. And this is, in my opinion, the reason why Raid5 is really dead, because they are not. Usually, people use disks of the same type in their raid. More than that, they buy them together. Those disks have most likely been produced by the same machine at the same factory, have been transported in the same trucks and ships (and experienced the same shocks), have experienced the same PSU ripples in the same thermal environment and have read and written exactly the same amount of data in their lifespan. Surely, it’s quite a bold statement to say that if one of those disks fails, all the others are completely fine. In these scenarios, disk failures are NOT independent.
Sorry, my math above is utterly wrong. The core of the statement isn’t 🙂
The chance of an URE is 1/10^14 per bit read. The chance of success is therefore 1 – 1/10^14.
We’re interested in a complete read of 10TB (8 * 10^13 bits) without an URE. Basically, we’re tossing a coin that has an incredibly high chance for heads (the chance is 99.999999999999%) many, many times (8*10^13 times). So the overall chance of success, meaning we never hit tails while tossing the coin, is (1 – 1/10^14)^(8 * 10^13) = 45%. If the chance of an URE was 1/10^14, then a raid5 rebuild would only succeed in 45% of cases.
This is why raid5 is dead.
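The 45% figure in the comment above can be checked numerically. The sketch below assumes, as that comment does, an independent URE chance of 1/10^14 per bit and a 10TB read; `rebuild_success` is a name made up here. It uses `math.log1p` because `1 - 1e-14` loses precision when raised to a huge power directly in floating point.

```python
import math


def rebuild_success(bits_read, p_ure_per_bit):
    """Chance of reading `bits_read` bits with no URE: (1 - p) ** n.

    Computed as exp(n * log1p(-p)) so the tiny per-bit probability is not
    lost to floating-point rounding.
    """
    return math.exp(bits_read * math.log1p(-p_ure_per_bit))


bits = 10 * 10**12 * 8   # 10 TB expressed in bits (8 * 10**13)
p = 1e-14                # assumed per-bit URE probability from the datasheet figure

success = rebuild_success(bits, p)
print(f"chance of a clean rebuild: {success:.3f}")       # about 0.449
print(f"chance of at least one URE: {1 - success:.3f}")  # about 0.551
print(f"naive n * p estimate: {bits * p:.1f}")           # 0.8, the earlier overestimate
```

The naive product n * p is the expected number of UREs, not a probability, which is why the first attempt (80%) and the corrected one (55% failure, 45% success) differ.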
I have to disagree: the probabilities of independent events are still independent probabilities. The URE failure chance is only based on the data of a single disk having a failure. The way probability works, if you try to combine the chance of multiple independent events happening at the same time, your chance of that combined event goes WAY down. To demonstrate, let’s make the odds much worse for drives.
Let’s say a drive can have a URE in 1/100. Let’s say we have 5 of these drives. If we thought that the error rate of raid 5 comes from having all five drives having a chance of failure, then we take each drive’s chance of failure (URE is independent of other drives), and we multiply their chances of failure.
1/100 * 1/100 * 1/100 * 1/100 * 1/100 = 1/10000000000
Our odds of all those drives having a URE failure are TINY.
Reality is it only takes one drive to fail (URE) during recovery, it’s really just our original 1/100 failure chance, but that’s the same for any raid design (once it’s in degraded state). Nothing has become worse just because you are using raid 5.
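As a rough check on the 1/100 example, this sketch multiplies the five probabilities exactly using Python's `fractions` module. For comparison, it also computes the chance that at least one of the five drives has a URE, which is the quantity the other commenters in this thread use. Both assume the drives are independent; the variable names are illustrative only.

```python
from fractions import Fraction

p = Fraction(1, 100)   # assumed URE chance per drive, from the example above
drives = 5

# Chance that ALL five drives hit a URE: multiply the independent probabilities.
all_fail = p ** drives

# Chance that AT LEAST ONE drive hits a URE: complement of "none of them do".
at_least_one = 1 - (1 - p) ** drives

print(all_fail)               # 1/10000000000
print(float(at_least_one))    # roughly 0.049
```

Which of the two numbers matters for a degraded array is exactly the point in dispute between the post and the comments.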
PS. I’m still using that same Raid 5 array, another 5 years later with the same 6 2TB WD Red Drives (though I’m looking to upgrade to SSD now…)
Wow, I’ve just been researching this too! And I proved my initial belief to be wrong, that the more drives in a RAID 5 array the less chance of failure, as there’d be fewer reads per drive. It works out to the same chances.
The math is as follows:
1) One URE in 1×10^14 reads (for the math, we’re considering it a random chance), and a URE results in failure
2) Reading 12.5TB of data gives 1×10^14 reads
3) This approaches the formula 1-(1/e), or 63.21% probability of failure, not almost 100% as that original “RAID 5 is Dead in 2009” article states.
4) You can prove that David’s math is wrong when stating “They make the assumption that if you have six separate events each with a given probability of happening, and you put them all together, then as a whole you’ve increased your chance of that event happening. That’s completely wrong. Your overall probability is no greater than the individual probabilities.” Think of it as reading 1 bit from 1×10^14 drives vs 1×10^14 bits from 1 drive. Once again, it approaches the 63.21% probability of failure. The examples work better with higher numbers, so a coin toss 50% chance isn’t really a good example, but even at a 10% chance of failure it can be calculated: 1-(((10-1)/10)^10)=.6513 (or 65.13%).
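The limit in points 3) and 4) can be checked with a few lines. This sketch assumes n independent reads, each failing with chance 1/n, and evaluates 1 - (1 - 1/n)^n for growing n; `p_at_least_one` is a made-up helper name.

```python
import math


def p_at_least_one(n):
    """Chance of at least one failure in n tries, each failing with chance 1/n.

    Equals 1 - (1 - 1/n) ** n, written with log1p/expm1 so it stays accurate
    for very large n such as 10**14.
    """
    return -math.expm1(n * math.log1p(-1.0 / n))


for n in (10, 100, 10**14):
    print(n, round(p_at_least_one(n), 4))
# 10 -> 0.6513, 100 -> 0.634, 10**14 -> 0.6321

print(round(1 - 1 / math.e, 4))   # 0.6321, the limiting value
```

The n = 10 row matches the 65.13% worked out by hand above, and the n = 10^14 row is indistinguishable from 1 - 1/e at this precision.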
There is another article, “Why RAID5 Still Works, Usually.” In the comments, nicko88 states that he rebuilt a 16TB RAID 5 system 20 times without a failure, a total of 640TB reads. So I will agree that, based on anecdotal data, the chance of a URE must be much less than 1 in 1×10^14; some have stated closer to 1 in 1×10^18 or so.
Finally, I’m not sure how a hard drive reads data, but I thought it read in increments of bytes, not a per bit read, so that would dramatically decrease the number of reads. So I think assumption 2 isn’t correct for real-world situations. Which would mean that the 1 x 10^14 URE can be correct, but you just get much more than 1 bit of data per read.
My probability math isn’t wrong; probability math isn’t based on bits or bytes, it’s just a mathematical equation. The manufacturer states a probability of an event happening for a single drive. When you consider multiple events of independent drives, the probability of two of those events happening at the same time goes down.