The Different RAID Levels and their Applications

Original Article Date: 2006-02-27

You've probably heard the word RAID in connection with computers. You probably also know that it has something to do with disks and protecting data.

In truth, RAID is becoming more widespread. Once the preserve of high-end, mission-critical enterprise computing, RAID can now be found on all but the most budget home computers.

But what exactly is it? And when is a specific type of RAID useful or appropriate for my specific computing application? And what pitfalls should I be aware of in trusting my valuable data to a RAID?

RAID is an acronym: Redundant Array of Independent (or Inexpensive) Disks. With the exception of RAID Level 0, it is a redundant system of data protection that writes your data onto more than one physical disk, so that the loss of a physical disk does not mean the loss of your data. In fact, your computer will continue to operate as if nothing had happened (apart from a small warning that a failure has occurred). This is why RAID has been so popular in the server market. Servers need to be up 24/7, and need to be protected against random disk failures. When a disk fails, it can be replaced and the RAID "re-built" at whatever time is most convenient to the user or IT admin.

Beyond this basic description are a number of different RAID levels, which are arbitrary numbers based upon standards that were defined nearly twenty years ago. The RAID level describes exactly how the data is saved to multiple disks. Each RAID level has specific benefits and drawbacks, the most important of these being:

  • Level of data protection
  • Read performance
  • Write performance
  • Storage capacity efficiency
  • Cost

So selecting the appropriate RAID level for your application is vital!

The three most commonly used RAID levels are 0, 1 and 5:

- RAID 0 provides the best performance at the lowest price but has NO redundancy
- RAID 1 provides the simplest and cheapest form of redundancy, but at the expense of poor capacity efficiency
- RAID 5 provides the most efficient storage capacity usage of the redundant levels, but is more costly than RAID 0 or 1.

Let's go through each of these in more detail...

RAID 0 (Striping)

The lowest RAID level, RAID 0 (or striping), is actually a misnomer. It isn't really a RAID, because there is no data redundancy! Because your data is written across two or more disks without any duplication, the loss of any one of the physical disks will result in the loss of ALL your data across ALL drives. So the chance of losing your data is actually higher than with a single drive. The more drives you add to your RAID 0 (the maximum number is limited only by the number of ports on the RAID controller card), the higher the risk of losing all your data.
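
To put a rough number on that increased risk, here is a minimal sketch in Python. It assumes every drive fails independently and with the same probability over whatever period you care about. The 3% figure and the function name are illustrative assumptions, not measured failure rates.

```python
def raid0_loss_probability(p_drive: float, n_drives: int) -> float:
    """Chance that a RAID 0 loses its data, i.e. that at least one drive fails.

    Assumes independent drive failures, each with probability p_drive.
    """
    if not 0.0 <= p_drive <= 1.0:
        raise ValueError("p_drive must be between 0 and 1")
    if n_drives < 1:
        raise ValueError("n_drives must be at least 1")
    # The array survives only if every single drive survives.
    return 1.0 - (1.0 - p_drive) ** n_drives


if __name__ == "__main__":
    # 3% per drive is an illustrative assumption only.
    for n in (1, 2, 4, 8):
        print(f"{n} drive(s): {raid0_loss_probability(0.03, n):.1%}")
```

Under these assumptions the risk with two drives is a little under double that of a single drive, and it keeps climbing with every drive you add.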

So why on earth would anyone want to use RAID 0? Because RAID 0 writes data across more than one disk in a striped fashion, it can provide up to 30% better disk read and write performance than a single volume. And because disk failure rates these days are very low, as long as your vital data is stored elsewhere, RAID 0 merely carries the risk of the inconvenience of replacing the failed unit and reinstalling any applications in the unlikely event of a disk failure.
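
For the curious, striping can be pictured as dealing blocks out to the drives in turn, like cards. The sketch below assumes a stripe unit of exactly one block, which is a simplification (real controllers use a configurable stripe size), and the function name is hypothetical.

```python
from typing import Tuple


def locate_striped_block(logical_block: int, n_drives: int) -> Tuple[int, int]:
    """Map a logical block number to (drive index, block offset on that drive).

    Assumes a stripe unit of one block, dealt round-robin across the drives.
    """
    if logical_block < 0:
        raise ValueError("logical_block must not be negative")
    if n_drives < 2:
        raise ValueError("RAID 0 needs at least two drives")
    offset, drive = divmod(logical_block, n_drives)
    return drive, offset


if __name__ == "__main__":
    # Consecutive blocks land on alternating drives.
    for block in range(6):
        print(block, locate_striped_block(block, 2))
```

Because neighbouring blocks sit on different drives, a large read or write can keep all the drives busy at once, which is where the performance gain comes from.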

A very good example of the use of RAID 0 is a scratch (or swap) volume, used by graphics, video editing and other high-intensity workstation applications. A scratch volume is a disk that is used by Photoshop, for instance, to store temporary uncompressed image data, just like a very large cache. Because this data is used frequently by the application, increasing the disk performance will result in a direct increase in application performance, so RAID 0, with its higher performance, is highly beneficial here. And because the data is temporary rather than vital, the loss of a disk, and with it all the data on the RAID, will merely be a nuisance instead of cause for heart failure.

RAID 1 (Mirroring)

The most commonly used RAID level, RAID 1 is often called mirroring, as data stored on one drive is mirrored onto its pair. So a RAID 1 is always composed of two disks. Write performance is about the same as for a single drive, but reads are faster, as the controller can access both disks at the same time when reading. The controller is usually cheap, and rebuilding the array (re-mirroring) after a disk failure is simple. The main drawback to RAID 1 is that you only get 50% of the total storage capacity of the two drives, e.g. 2 x 36GB SCSI drives (72GB total physical storage) in RAID 1 only give you 36GB as a useable data volume. It's also limited to 2 drives. So when you are using a large number of disks, you can't combine them all in a single RAID 1, and because of the poor capacity efficiency, it becomes expensive in terms of $ per GB.
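
The mirroring idea is simple enough to model in a few lines. This is only a toy sketch using Python dictionaries in place of disks; the class and method names are made up for illustration and do not correspond to any real controller's interface.

```python
class ToyMirror:
    """Toy two-disk RAID 1: every write goes to both disks."""

    def __init__(self) -> None:
        self.disks = [{}, {}]            # block number -> data, one dict per disk
        self.failed = [False, False]

    def write(self, block: int, data: bytes) -> None:
        # Each write is duplicated onto every working disk.
        for index, disk in enumerate(self.disks):
            if not self.failed[index]:
                disk[block] = data

    def read(self, block: int) -> bytes:
        # Either disk can serve a read; take the first one still working.
        for index, disk in enumerate(self.disks):
            if not self.failed[index]:
                return disk[block]
        raise IOError("both disks have failed")

    def fail(self, index: int) -> None:
        self.failed[index] = True
        self.disks[index] = {}

    def rebuild(self, index: int) -> None:
        """Swap in a fresh disk and re-mirror it from the survivor."""
        survivor = 1 - index
        if self.failed[survivor]:
            raise IOError("no working disk to rebuild from")
        self.disks[index] = dict(self.disks[survivor])
        self.failed[index] = False
```

Writing a block, failing either disk and reading the block back still returns the data, and `rebuild` is nothing more than a straight copy from the surviving disk.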

That said, RAID 1 is popular because it represents a cheap and convenient way of providing data redundancy in smaller systems and servers. Through the use of onboard RAID controllers, RAID 1 is now being used in home desktops around the world, in addition to the millions of low-end servers that are its traditional preserve.

RAID 5 (Parity)

RAID 5 brings with it the hallmark of superb storage capacity efficiency with full redundancy. RAID 5 writes your data across multiple drives (minimum of 3 drives) using a relatively complex system that writes parity information (bitwise checksums) to the last drive in any given stripe. Each stripe starts at a different drive, so the parity information is spread throughout the array. In this way, should ANY one drive in the array fail, the other drives have enough information to keep going without losing data.
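
The parity trick rests on the XOR operation: XOR all the data blocks in a stripe together to get the parity block, and XOR-ing any surviving blocks with the parity gives back the missing one. Here is a minimal sketch of that idea for a single stripe. The block contents and helper name are illustrative, and it ignores how parity is rotated between drives.

```python
from functools import reduce
from operator import xor
from typing import Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """Bytewise XOR of equally sized blocks."""
    if not blocks or len({len(b) for b in blocks}) != 1:
        raise ValueError("need at least one block, all of the same length")
    return bytes(reduce(xor, column) for column in zip(*blocks))


# One stripe on a four-drive array: three data blocks plus one parity block.
data = [b"RAID", b"five", b"demo"]
parity = xor_blocks(data)

# Pretend the drive holding data[1] has died: XOR everything that is left.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
```

The same calculation works whichever single block is missing, including the parity block itself, which is why the array can lose any one drive.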

If you have four drives in the array, you will have the capacity of three drives as a useable volume. If you have 8 drives in the array, you can use the total volume of 7 of them, i.e.:

V = (n - 1) * v

where:
V = the total useable volume, in bytes
n = the total number of physical drives in the array
v = the individual drive size, in bytes.
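
As a quick worked example of the formula, the following sketch computes the useable volume for a couple of array sizes. The function name and the 36GB drive size are chosen purely for illustration, with 1GB taken as 10^9 bytes.

```python
GB = 10**9  # decimal gigabyte, as used on drive labels


def raid5_usable_bytes(n_drives: int, drive_bytes: int) -> int:
    """V = (n - 1) * v for a RAID 5 of n equal drives of v bytes each."""
    if n_drives < 3:
        raise ValueError("RAID 5 needs at least three drives")
    if drive_bytes <= 0:
        raise ValueError("drive_bytes must be positive")
    return (n_drives - 1) * drive_bytes


if __name__ == "__main__":
    for n in (4, 8):
        usable = raid5_usable_bytes(n, 36 * GB)
        print(f"{n} x 36GB drives -> {usable // GB}GB useable")
```

So four 36GB drives give 108GB of useable space, and eight give 252GB.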

This makes RAID 5 the ideal solution in any array of more than 4 physical drives, as it brings with it the convenience of a single large logical volume for data storage, combined with excellent data storage capacity efficiency.

The main drawback to RAID 5 is write performance. Because the RAID sub-system must calculate and write parity on every write, write performance is somewhat slower. Most applications, however, are read-intensive, so it is unlikely most users will notice this downside.
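
To see where the extra work comes from, consider changing a single block in a stripe. A common approach is "read-modify-write": read the old data and the old parity, then write the new data and the new parity, so one logical write turns into four disk operations. The sketch below shows just the parity arithmetic for that case. The function name is hypothetical, and real controllers may use other strategies such as rewriting whole stripes.

```python
def update_parity(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    """New parity after one data block changes: old_parity XOR old_data XOR new_data."""
    if not len(old_data) == len(new_data) == len(old_parity):
        raise ValueError("all blocks must be the same length")
    # XOR-ing out the old data and XOR-ing in the new data avoids
    # having to read the other data blocks in the stripe.
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))
```

The result is identical to recalculating the parity from every data block in the stripe, but it only touches the one data drive and the parity drive.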

Other RAID Levels

Other RAID Levels exist. For instance, what happened to RAID 2, 3 and 4? They were in the original RAID specification, but never caught on, and so I won't go into them here.

The other most commonly used RAID levels are actually combinations of other RAID units:

RAID 10 - Mirror followed by stripe. Allows pairs of mirrored drives (RAID 1) to be striped (RAID 0). This gives the best performance (both read and write) achievable by any RAID with full redundancy, and so is suitable for enterprise database applications, but is expensive. Four drives minimum.

RAID 0+1 - Stripe followed by mirror. Not commonly used, as it's usually more appropriate to stripe mirrored drives than to mirror striped drives. This is because RAID 10 keeps running as long as each RAID 1 unit in the stripe still has one working drive, whereas RAID 0+1 will fail as soon as a drive has failed on both sides of the mirror. Four drives minimum.

RAID 50 - Parity followed by stripe. The largest and most complex of all RAIDs, RAID 50 is basically a series of 2 or more RAID 5s striped. It provides improved storage capacity efficiency over RAID 10, and improved write-performance over RAID 5, but, like RAID 10, is expensive. Six drives minimum.

RAID 6 - The new kid on the block, currently being pushed by Intel. It's the same as RAID 5, except it can tolerate TWO drive failures in any given array, instead of one. However, one additional drive is needed for parity, i.e. the formula is V = (n - 2) * v, so it is not as storage efficient. Also, many professional RAID solutions come with "hot-spare" capability (discussed below), making a RAID 5 a virtual RAID 6 anyway, so I'm not sure RAID 6 will catch on. Four drives minimum.

JBOD - "Just a Bunch Of Disks". This is not a true RAID, but is just a way of stringing a number of physical drives end-to-end to make a composite volume of a size equivalent to the total of all the disks in the JBOD. There are no performance enhancements, and it brings no benefits or penalties in terms of drive redundancy or failure. If one of the drives in the JBOD fails, only the data on the failed drive will be lost, unlike in a RAID 0, where all the data across all drives would be lost. Two drives minimum.
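
The difference in fault tolerance between RAID 10 and RAID 0+1 described above can be checked by brute force on the smallest case, four drives. The sketch below tries every possible pair of failed drives. It assumes a simple RAID 0+1 in which a striped set is unusable as soon as one of its members fails. The drive numbering and function names are illustrative.

```python
from itertools import combinations
from typing import Callable, FrozenSet

DRIVES = (0, 1, 2, 3)
# For RAID 10 these are the mirrored pairs; for RAID 0+1 the striped sets.
GROUPS = ((0, 1), (2, 3))


def raid10_survives(failed: FrozenSet[int]) -> bool:
    # Every mirrored pair must keep at least one working drive.
    return all(any(d not in failed for d in pair) for pair in GROUPS)


def raid01_survives(failed: FrozenSet[int]) -> bool:
    # At least one striped set must be completely intact.
    return any(all(d not in failed for d in stripe) for stripe in GROUPS)


def count_survivable(survives: Callable[[FrozenSet[int]], bool], n_failed: int) -> int:
    """How many of the possible n_failed-drive failures the array survives."""
    return sum(survives(frozenset(c)) for c in combinations(DRIVES, n_failed))


if __name__ == "__main__":
    print("RAID 10 :", count_survivable(raid10_survives, 2), "of 6")
    print("RAID 0+1:", count_survivable(raid01_survives, 2), "of 6")
```

Both layouts survive any single failure, but of the six possible two-drive failures RAID 10 survives four while RAID 0+1 survives only two.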

Hardware and Software RAID

It is very important to appreciate that some RAID controllers come with an on-board I/O processor and some do not. Virtually all "on-board" (included with the mainboard) RAID solutions and most RAID Controller cards under $150 do NOT come with their own processors. These software RAID controllers instead use a certain percentage of your CPU power through the OS and a "software" driver to perform the process of splitting and replicating data to the array.

If you have plenty of spare CPU power then this is not really a problem, but for premium RAID 0, 5, 10 and 50 performance you will need a professional hardware RAID controller card that comes with its own I/O Processor. These are easily visible as a large block of silicon through a physical inspection of the card. The I/O processor on these controllers performs all the necessary processing of stripe, mirror and parity operations in order to build and synchronize the RAID without calling on external resources such as CPU power.

Let's also not forget that some operating systems (notably Windows Server and Linux) come with their own software RAID implementations. The performance of these OS-driven RAIDs is essentially the same as that of a software RAID controller, but with the added convenience of a more friendly configuration environment within the OS. So if your OS has its own RAID solution, use that instead of the on-board software RAID controller.

Hot-Swap

You've probably heard the term "hot-swap" or "hot-plug" in relation to computers. It is actually a generic term that can be applied to any PC component. It means that the device can be plugged in or swapped out while the PC is still on, or "hot".

However, the term most commonly relates to hard drives, specifically a physical sub-assembly that comes with the chassis, or is added on, and allows hard drives to be attached to trays that can be slid in and out of a bay at the front of the machine. At the back of the bays is a backplane that receives the SATA/SCSI and power connectors on the drive and is connected to the disk controller via a SCSI cable or multiple SATA cables. (Compliance with hot-swap was the main reason behind the new SATA power connector, which allows hot-swapping without dangerous voltage arcing.)

Hot-swap is sometimes confused with the term "hot-spare" (see below), but the two concepts are unrelated. Hot-swap is integral to RAID for server applications, as it allows the replacement of failed drives and the re-building of arrays whilst the machine is on and, more importantly, while the OS is still running, to prevent loss of service to your users or customers.

Not all RAID Controllers support hot-swap, so it's worth checking if this is a feature that is vital. Most, if not all, hardware RAID controllers, however, will support it. Also bear in mind that most on-board controllers will only rebuild the degraded RAID within their own BIOS, and will not allow you to do this within the OS. So if 24/7 uptime is critical, you must use a professional hardware RAID solution.

Hot-Spare

Most professional hardware RAID controllers come with a "hot-spare" option that allows you to specify a spare hard drive that will not be used until a failure within the array occurs. Immediately upon this happening, the hot-spare will replace the failed drive within the array, as if the user or admin had just swapped out the drive physically. This has the advantage that the RAID will remain redundant and fault-resilient should another drive fail in the time it takes to get the failed drive replaced.
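
As a rough illustration of the behaviour, the sketch below models what happens to an array's member list when a drive fails, with and without a hot-spare on standby. The drive labels and function name are made up, and a real controller would also have to rebuild the data onto the spare before the array is fully redundant again.

```python
from typing import List, Optional, Tuple


def fail_drive(
    members: List[str], spare: Optional[str], failed: str
) -> Tuple[List[str], Optional[str], bool]:
    """Return (new members, remaining spare, rebuild_started) after a failure."""
    if failed not in members:
        raise ValueError(f"{failed} is not a member of the array")
    if spare is None:
        # No spare: the array runs degraded until someone swaps the drive.
        return [d for d in members if d != failed], None, False
    # The spare takes the failed drive's slot and the rebuild starts at once.
    return [spare if d == failed else d for d in members], None, True
```

With a spare defined, `fail_drive(["d0", "d1", "d2"], "spare", "d1")` hands back a full three-drive member list straight away, whereas without one the array is left a drive short until the admin intervenes.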

Summary of RAID Level Pros and Cons

Level    Fault Tolerance  Performance                 Storage Capacity Efficiency  Overall Value   Application
-------  ---------------  --------------------------  ---------------------------  --------------  -----------
RAID 0   Worse            Excellent                   Good                         Cheap           Video/Image/Engineering Workstation Applications
RAID 1   Good             Good Read, Average Write    Poor                         Average         Entry Servers, Desktops
RAID 5   Good             Good Read, Average Write    Excellent                    Average         Mid-Range Servers, especially DAS, NAS or high-capacity storage
RAID 10  Good             Excellent                   Poor                         Very Expensive  High-performance enterprise servers - e.g. database
RAID 50  Good             Excellent Read, Good Write  Good                         Expensive       High-performance enterprise servers - e.g. database
JBOD     None             Average                     Good                         Cheap           Non-critical large volume storage

As you can see from the above table, there is no "right" RAID solution. Each one has its pros and cons, and which one you pick will depend upon your priorities in regard to data security, performance, storage efficiency and cost.
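
If you want to put figures on the storage efficiency side of that trade-off, the sketch below works out the useable fraction of raw capacity for each level, given a number of identical drives. The minimum drive counts are the ones quoted in this article. The RAID 50 entry assumes exactly two striped RAID 5 groups, and the names used are illustrative.

```python
# level -> (minimum drives, function giving useable drives out of n)
LEVELS = {
    "RAID 0":  (2, lambda n: n),
    "RAID 1":  (2, lambda n: 1),
    "RAID 5":  (3, lambda n: n - 1),
    "RAID 6":  (4, lambda n: n - 2),
    "RAID 10": (4, lambda n: n // 2),
    "RAID 50": (6, lambda n: n - 2),   # assumes two striped RAID 5 groups
    "JBOD":    (2, lambda n: n),
}


def capacity_efficiency(level: str, n_drives: int) -> float:
    """Useable fraction of the raw capacity of n_drives identical drives."""
    minimum, usable = LEVELS[level]
    if n_drives < minimum:
        raise ValueError(f"{level} needs at least {minimum} drives")
    if level == "RAID 1" and n_drives != 2:
        raise ValueError("RAID 1 as described here uses exactly two drives")
    if level in ("RAID 10", "RAID 50") and n_drives % 2:
        raise ValueError(f"{level} needs an even number of drives in this sketch")
    return usable(n_drives) / n_drives


if __name__ == "__main__":
    for level in ("RAID 0", "RAID 5", "RAID 6", "RAID 10", "RAID 50", "JBOD"):
        print(f"{level:8s} {capacity_efficiency(level, 8):.1%}")
```

With eight drives, for example, RAID 5 leaves 87.5% of the raw capacity useable, RAID 6 and RAID 50 leave 75%, and RAID 10 leaves 50%.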

All of our workstations and servers come with onboard software RAID 0 and 1 included. But if you require RAID levels beyond this, or want a professional hardware RAID solution, you can choose to upgrade from a full range of SATA and SCSI RAID controller cards from the top four manufacturers - 3ware, Adaptec, LSI and Intel.

Understanding RAID, and choosing the level that is appropriate to your application, is one of the most important steps you can take when ordering a new system. So if you have any questions about which one to go for, just let us know, and we'll be glad to help!

Best regards,

Ben Ranson
Chief Systems Engineer