How can a single disk in a hardware SATA RAID-10 array bring the entire array to a screeching halt?

I hate to say “don’t use SATA” in critical production environments, but I’ve seen this situation quite often. SATA drives are generally not meant for the duty cycle you describe, although you did spec drives specifically rated for 24×7 operation in your setup. In my experience, SATA drives can fail in unpredictable ways, often affecting the entire storage array, even with RAID 1+0 as you’ve used. Sometimes a drive fails in a manner that stalls the entire bus. One thing to check is whether you’re using SAS expanders in your setup; that can make a difference in how the remaining disks are affected by a drive failure.

It may have made more sense to go with midline/nearline (7200 RPM) SAS drives instead of SATA. There’s a small price premium over SATA, but the drives will operate and fail more predictably. Error correction and reporting in the SAS interface/protocol are more robust than in SATA. So even with drives whose mechanics are the same, the SAS protocol difference may have prevented the pain you experienced during your drive failure.
