The Different RAID Levels and their Applications
Original Article Date: 2006-02-27
You've all probably heard the word RAID in respect to computers. You probably
also know that it's something to do with disks and protecting data.
In truth, RAID is becoming more common. Once the preserve of high-end
mission-critical enterprise computing, RAID can now be found on all but the
most budget home computers.
But what exactly is it? And when is a specific type of RAID useful or
appropriate for my specific computing application? And what pitfalls should I
be aware of in trusting my valuable data to a RAID?
RAID is an acronym for Redundant Array of Independent
(or Inexpensive) Disks. With the exception of RAID Level
0, it is a redundant system of data protection that writes your
data onto more than one physical disk, so that the loss of a physical disk does
not mean the loss of your data. In fact, your computer will continue to operate
as if nothing had happened (apart from a small warning that a failure has
occurred). This is why RAID has been so popular in the server market. Servers
need to be up 24/7, and need to be protected against random disk failures. When a
disk fails, it can be replaced, and the RAID "re-built" at a
time that is most convenient to the user or IT admin.
Beyond this basic description are a number of different RAID levels,
which are arbitrary numbers based upon standards that were defined nearly
twenty years ago. The RAID level describes exactly how the data is saved to
multiple disks. Each RAID level has specific benefits and drawbacks, the most
important of these being:
- Level of data protection
- Read performance
- Write performance
- Storage capacity efficiency
- Cost
So selecting the appropriate RAID level for your application is vital!
The three most commonly used RAID levels are 0, 1 and 5:
- RAID 0 provides the best performance at the lowest price but
has NO redundancy
- RAID 1 provides the simplest and cheapest form of
redundancy, but at the expense of poor capacity efficiency
- RAID 5 provides the most efficient storage capacity usage,
but is more costly than RAID 0 or 1.
Let's go through each of these in more detail...
RAID 0 (Striping)
The lowest RAID level, RAID 0 (or striping), is actually a
misnomer. It isn't really a RAID, because there is no data redundancy!
Because your data is written across two or more disks without any
duplication, the loss of any one of the physical disks will
result in the loss of ALL your data across ALL drives. So the
chance of losing your data is actually higher than with a single
drive. The more drives you add to your RAID 0 (the maximum number is limited
only by the number of ports on the RAID controller card), the higher the risk
of losing all your data.
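To put a rough number on that risk, here is a minimal Python sketch. It assumes that drives fail independently of one another and uses a made-up 3% chance of failure per drive over some period; both are illustrative assumptions, not measured figures. The function name is hypothetical.

```python
def raid0_failure_probability(per_drive_failure_prob: float, drive_count: int) -> float:
    """Chance that at least one drive fails, which is enough to lose a RAID 0.

    Assumes every drive fails independently with the same probability.
    """
    if not 0.0 <= per_drive_failure_prob <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if drive_count < 1:
        raise ValueError("need at least one drive")
    # The array survives only if every single drive survives.
    return 1.0 - (1.0 - per_drive_failure_prob) ** drive_count


# Illustrative only: 3% per drive is an assumed figure.
for drives in (1, 2, 4, 8):
    risk = raid0_failure_probability(0.03, drives)
    print(f"{drives} drive(s): {risk:.1%} chance of losing the array")
```

Under these assumptions the risk grows with every drive added, which is the point made above.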
So why on earth would anyone want to use RAID 0? Because RAID 0 writes data
across more than one disk in a striped fashion, it can provide up to 30%
better disk read and write performance than a single volume. And
because disk failure rates these days are very low, as long as your vital
data is stored elsewhere, using RAID 0 merely carries the risk of
the inconvenience of replacing the failed unit and reinstalling any applications
in the unlikely event of a disk failure.
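The striping itself can be sketched in a few lines of Python. This is a simplified illustration that assumes a stripe unit of exactly one block and a simple round-robin layout; real controllers use larger, configurable stripe sizes. The function name `locate_block` is hypothetical.

```python
from typing import Tuple


def locate_block(logical_block: int, drive_count: int) -> Tuple[int, int]:
    """Map a logical block number to (drive index, block offset on that drive).

    Assumes round-robin striping with a stripe unit of one block.
    """
    if drive_count < 2:
        raise ValueError("striping needs at least two drives")
    if logical_block < 0:
        raise ValueError("block number cannot be negative")
    return logical_block % drive_count, logical_block // drive_count


# Consecutive logical blocks land on alternating drives, so a large
# sequential read or write keeps all drives busy at the same time.
for block in range(6):
    drive, offset = locate_block(block, drive_count=2)
    print(f"logical block {block} -> drive {drive}, offset {offset}")
```

It also shows why a single failure is fatal: every drive holds a slice of every large file.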
A very good example of the use of RAID 0 is a scratch (or swap) volume,
used by graphics, video editing and other high-intensity workstation
applications. A scratch volume is a disk that is used by Photoshop, for
instance, to store temporary uncompressed image data, just like a very large
cache. Because this data is used frequently by the application, increasing the
disk performance will result in a direct increase in application performance,
so RAID 0, with its higher performance, is highly beneficial here. And
because the data is temporary and not vital, the loss of a disk,
resulting in loss of all the data on the RAID, will merely be a nuisance
instead of a cause for heart failure.
RAID 1 (Mirroring)
The most commonly used RAID level, RAID 1 is often called mirroring,
as data stored on one drive is mirrored onto its pair. So a RAID 1 always
consists of two disks. Write performance is about the same as for a single drive,
but reads are faster, as the controller can access both disks at the same
time when reading. The controller is usually cheap, and rebuilding the array
(re-mirroring) after a disk failure is simple. The main drawback to RAID 1 is
that you only get 50%
of the total storage capacity of the two drives, e.g. 2 x 36GB SCSI drives
(72GB total physical storage) in RAID 1 only give you 36GB as a useable data
volume. It's also limited to 2 drives. So when you are using a large number of
disks, you can't put them all into a single RAID 1, and because of the poor
capacity efficiency, it becomes expensive in terms of $ per GB.
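As a rough illustration of mirroring, the Python sketch below models a two-drive mirror in memory. It is a toy model under simplifying assumptions (no rebuild, no read balancing), and the class and method names are hypothetical rather than taken from any real controller.

```python
class MirrorPair:
    """Toy model of a two-drive RAID 1: every write goes to both drives."""

    def __init__(self, block_count: int) -> None:
        self.drives = [[None] * block_count, [None] * block_count]
        self.failed = [False, False]

    def usable_blocks(self) -> int:
        # Only one drive's worth of space is usable: 50% efficiency.
        return len(self.drives[0])

    def write(self, block: int, data: bytes) -> None:
        for index, drive in enumerate(self.drives):
            if not self.failed[index]:
                drive[block] = data

    def read(self, block: int) -> bytes:
        # Any surviving drive holds a complete copy of the data.
        for index, drive in enumerate(self.drives):
            if not self.failed[index]:
                return drive[block]
        raise IOError("both drives have failed")

    def fail(self, index: int) -> None:
        self.failed[index] = True


mirror = MirrorPair(block_count=4)
mirror.write(0, b"payroll")
mirror.fail(0)               # lose one drive
print(mirror.read(0))        # the data is still readable from the other
```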
That said, RAID 1 is popular because it is a cheap and convenient
way of providing data redundancy in smaller systems and servers. Through the
use of onboard RAID controllers, RAID 1 is now being used in home desktops
around the world, in addition to the millions of low-end servers that are its
traditional preserve.
RAID 5 (Parity)
RAID 5 brings with it the hallmark of superb storage capacity efficiency
with full redundancy. RAID 5 writes your data across multiple
drives (minimum of 3 drives) using a relatively complex system that writes parity
information (bitwise checksums) to one drive in any given
stripe. The parity moves to a different drive from one stripe to the next, so the
checksum information is distributed throughout the array. In this way, should ANY
one drive in the array fail, the other drives have enough information to keep
going without losing data.
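The parity idea is easier to see in code. The Python sketch below assumes the parity is a plain bitwise XOR of the data blocks in a stripe, which is the usual scheme for RAID 5, and ignores how stripes are laid out on disk. The helper name `xor_blocks` and the sample data are hypothetical.

```python
from functools import reduce
from typing import Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """Bitwise XOR of equally sized byte blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    if any(len(block) != len(blocks[0]) for block in blocks):
        raise ValueError("all blocks must be the same size")
    # zip(*blocks) walks the blocks one byte position at a time.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a four-drive array: three data blocks plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Pretend the drive holding data[1] has died. XOR-ing everything that is
# left (the other data blocks and the parity) gives the missing block back.
recovered = xor_blocks([data[0], data[2], parity])
assert recovered == data[1]
```

The same calculation works whichever single block is missing, including the parity block itself.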
If you have four drives in the array, you will have the capacity of three
drives as a useable volume. If you have 8 drives in the array, you can use the
total volume of 7 of them, i.e.:
V = (n - 1) * v
where:
V = the total useable volume, in bytes
n = the total number of physical drives in the array
v = the individual drive size, in bytes.
This makes RAID 5 the ideal solution in any array of more than 4 physical
drives, as it brings with it the convenience of a single large logical volume
for data storage, combined with excellent data storage capacity efficiency.
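The formula above is simple enough to check with a short Python sketch. The function names are hypothetical, and the drive size used in the loop is just an example figure.

```python
def raid5_usable_capacity(drive_count: int, drive_size: int) -> int:
    """V = (n - 1) * v, for n drives of size v."""
    if drive_count < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return (drive_count - 1) * drive_size


def raid5_efficiency(drive_count: int) -> float:
    """Fraction of the raw capacity that is usable for data."""
    return raid5_usable_capacity(drive_count, 1) / drive_count


# Efficiency improves as drives are added, because only one drive's
# worth of space is ever given up to parity.
for n in (3, 4, 8):
    usable = raid5_usable_capacity(n, drive_size=36)  # 36GB drives as an example
    print(f"{n} drives: {usable}GB usable, {raid5_efficiency(n):.0%} efficient")
```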
The only drawback to RAID 5 is write performance. Because the RAID sub-system
must recalculate and write the parity on every write, writes are somewhat slower
than reads. Most applications, however, are read-intensive, so it is unlikely
most users will notice this downside.
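To see where the extra work comes from, here is a Python sketch of the commonly described "read-modify-write" approach for a small write: read the old data and old parity, work out the new parity, then write both back. It assumes XOR parity, and the function names are hypothetical; actual controllers may use other strategies, such as rewriting whole stripes.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equally sized byte blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must be the same size")
    return bytes(x ^ y for x, y in zip(a, b))


def updated_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """New parity after one data block changes.

    XOR-ing the old data out and the new data in avoids reading the
    rest of the stripe, but still costs two reads and two writes.
    """
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)


stripe = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]
parity = xor_bytes(xor_bytes(stripe[0], stripe[1]), stripe[2])

new_block = b"\xaa\x55"
new_parity = updated_parity(parity, stripe[1], new_block)
stripe[1] = new_block

# The shortcut gives the same parity as recomputing it from scratch.
assert new_parity == xor_bytes(xor_bytes(stripe[0], stripe[1]), stripe[2])
```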
Other RAID Levels
Other RAID Levels exist. For instance, what happened to RAID 2, 3 and 4? They
were in the original RAID specification, but never caught on, and so I won't go
into them here.
Most of the other commonly used RAID levels are actually combinations of the
levels above:
RAID 10 - Mirror followed by stripe. Allows pairs of mirrored
drives (RAID 1) to be striped (RAID 0). This gives the best performance
(both read and write) achievable by any RAID with full redundancy,
and so is suitable for enterprise database applications, but is expensive.
Four drives minimum.
RAID 0+1 - Stripe followed by mirror. Not commonly used, as
it's usually more appropriate to stripe mirrored drives than to mirror
striped drives. This is because RAID 10 keeps running as long as each RAID 1
pair in the stripe still has one working drive, whereas RAID 0+1 fails as soon
as a drive has failed on each side of the mirror. Four drives minimum.
RAID 50 - Parity followed by stripe. The largest and most
complex of all RAIDs, RAID 50 is basically a series of 2 or more RAID 5s
striped. It provides improved storage capacity efficiency over RAID 10, and
improved write-performance over RAID 5, but, like RAID 10, is expensive.
Six drives minimum.
RAID 6 - The new kid on the block, currently being
pushed by Intel. It's the same as RAID 5, except it can tolerate TWO drive
failures in any given array, instead of one. However, one more drive's worth of
capacity is given up to redundancy, i.e. the formula is V = (n - 2) * v, so it is
not as storage efficient. Also, many professional RAID solutions come with "hot-spare"
capability (discussed below), which gives a RAID 5 some of the same protection,
provided the rebuild onto the spare finishes before a second drive fails, so I'm
not sure RAID 6 will catch on. Four drives minimum.
JBOD - "Just a Bunch Of Disks". This is not a true RAID, but is
just a way of stringing a number of physical drives end-to-end to make a
composite volume of a size equivalent to the total of all the disks in the
JBOD. There are no performance enhancements, and it brings no benefits or
penalties in terms of drive redundancy or failure. If one of the drives in the JBOD
fails, only the data on the failed drive will be lost, unlike in a RAID 0,
where all the data across all drives would be lost. Two drives minimum.
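The difference between RAID 10 and RAID 0+1 described above can be checked by brute force. The Python sketch below counts how many two-drive failures would kill each layout. It assumes a specific drive numbering (RAID 10 pairs drives 0+1, 2+3 and so on; RAID 0+1 mirrors the first half of the drives against the second half), and all function names are hypothetical.

```python
from itertools import combinations
from typing import Callable, Set


def raid10_survives(failed: Set[int], drive_count: int) -> bool:
    """RAID 10 dies only if both drives of the same mirrored pair fail."""
    pairs = [{2 * i, 2 * i + 1} for i in range(drive_count // 2)]
    return not any(pair <= failed for pair in pairs)


def raid01_survives(failed: Set[int], drive_count: int) -> bool:
    """RAID 0+1 dies once each striped half has lost at least one drive."""
    half = drive_count // 2
    first_half_ok = not any(drive < half for drive in failed)
    second_half_ok = not any(drive >= half for drive in failed)
    return first_half_ok or second_half_ok


def fatal_double_failures(survives: Callable[[Set[int], int], bool],
                          drive_count: int) -> int:
    """Count the two-drive failure combinations the layout cannot survive."""
    if drive_count < 4 or drive_count % 2:
        raise ValueError("need an even number of drives, four or more")
    return sum(1 for pair in combinations(range(drive_count), 2)
               if not survives(set(pair), drive_count))


for n in (4, 6):
    total = len(list(combinations(range(n), 2)))
    print(f"{n} drives: RAID 10 loses data in "
          f"{fatal_double_failures(raid10_survives, n)} of {total} cases, "
          f"RAID 0+1 in {fatal_double_failures(raid01_survives, n)} of {total}")
```

With the same drives, RAID 0+1 has more ways to lose the array, and the gap widens as drives are added.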
Hardware and Software RAID
It is very important to appreciate that some RAID controllers come with an
on-board I/O processor and some do not. Virtually all
"on-board" (included with the mainboard) RAID solutions and most RAID
Controller cards under $150 do NOT come with their own processors. These software
RAID controllers instead use a certain percentage of your
CPU power through the OS and a "software"
driver to perform the process of splitting and replicating data to the
array.
If you have plenty of spare CPU power then this is not really a problem, but for
premium RAID 0, 5, 10 and 50 performance you will need a professional hardware
RAID controller card that comes with its own I/O Processor. These
are easily visible as a large block of silicon through a physical inspection of
the card. The I/O processor on these controllers performs all the necessary
processing of stripe, mirror and parity operations in order to build and
synchronize the RAID without calling on external resources such as CPU power.
Let's also not forget that some operating systems (notably
Windows Server and Linux) come with their own software RAID implementations.
The performance of these OS-driven RAIDs is essentially the same as that of a
software RAID controller, but with the added convenience of a more friendly
configuration environment within the OS. So if your OS has its own RAID
solution, use that instead of the on-board software RAID controller.
Hot-Swap
You've probably heard the term "hot-swap" or "hot-plug" in relation to
computers. It is actually a generic term that can be applied to any PC
component. It means that the device can be plugged in or swapped out while the
PC is still on, or "hot".
However, the term most commonly relates to hard drives, specifically a physical
sub-assembly that comes with the chassis, or is added on, that allows hard drives
to be attached to trays that can be slid in and out of a bay at the front of
the machine. At the back of the bays is a backplane that
receives the SATA/SCSI and power connectors on the drive and is connected to
the disk controller via a SCSI cable or multiple SATA cables. (Compliance with
hot-swap was the main reason behind the new SATA power connector, which allows
hot-swapping without dangerous voltage arcing.)
Hot-swap is sometimes confused with the term "hot-spare" (see below), but the
two concepts are unrelated. Hot-swap is integral to RAID for server
applications, as it allows the replacement of failed drives and the re-building
of arrays whilst the machine is on, and more importantly, while the OS is still
running, to prevent loss of service to your users or customers.
Not all RAID Controllers support hot-swap, so it's worth checking if this is a
feature that is vital. Most, if not all, hardware RAID controllers, however,
will support it. Also bear in mind that most on-board controllers will only
rebuild the degraded RAID within their own BIOS, and will not allow
you to do this within the OS. So if 24/7 uptime is critical, you must use a
professional hardware RAID solution.
Hot-Spare
Most professional hardware RAID controllers come with a "hot-spare" option that
allows you to specify a spare hard drive that will not be used until a failure
within the array occurs. As soon as this happens, the hot-spare
replaces the failed drive within the array, as if the user or admin
had just swapped out the drive physically. This has the advantage that the RAID
will still remain redundant, and fault-resilient, should another drive fail in
the time it takes to get the failed drive replaced.
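The hot-spare behaviour can be sketched as a small piece of bookkeeping in Python. This is only a model of the decision the controller makes, under the assumption that a spare is promoted immediately and then rebuilt onto; the class, field and drive names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RaidArray:
    members: List[str]
    hot_spares: List[str] = field(default_factory=list)
    rebuilding: List[str] = field(default_factory=list)

    def drive_failed(self, drive: str) -> Optional[str]:
        """Drop a failed member and promote a hot-spare if one is available.

        Returns the name of the promoted spare, or None if the array has
        to stay degraded until someone replaces the drive by hand.
        """
        self.members.remove(drive)
        if not self.hot_spares:
            return None
        spare = self.hot_spares.pop(0)
        self.members.append(spare)
        self.rebuilding.append(spare)  # the rebuild onto the spare starts now
        return spare


array = RaidArray(members=["disk0", "disk1", "disk2"], hot_spares=["spare0"])
promoted = array.drive_failed("disk1")
print(f"promoted {promoted}; members are now {array.members}")
```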
Summary of RAID Level Pros and Cons
| Level | Fault Tolerance | Performance | Storage Capacity Efficiency | Overall Value | Application |
| --- | --- | --- | --- | --- | --- |
| RAID 0 | Worse | Excellent | Good | Cheap | Video/Image/Engineering Workstation Applications |
| RAID 1 | Good | Good Read, Average Write | Poor | Average | Entry Servers, Desktops |
| RAID 5 | Good | Good Read, Average Write | Excellent | Average | Mid-Range Servers, especially DAS, NAS or high-capacity storage |
| RAID 10 | Good | Excellent | Poor | Very Expensive | High-performance enterprise servers - e.g. database |
| RAID 50 | Good | Excellent Read, Good Write | Good | Expensive | High-performance enterprise servers - e.g. database |
| JBOD | None | Average | Good | Cheap | Non-critical large volume storage |
As you can see from the above table, there is no "right" RAID solution. Each
one has its pros and cons, and which one you pick will depend upon your
priorities in regard to data security, performance, storage efficiency and
cost.
All of our workstations and servers come with onboard software RAID 0 and 1
included. But if you require RAID levels beyond this, or want a professional
hardware RAID solution, you can choose to upgrade from a full range of SATA
and SCSI RAID controller cards from the top four manufacturers - 3ware,
Adaptec, LSI and Intel.
Understanding RAID and choosing the level appropriate to your application is one
of the most important decisions you can make when ordering a new system. So if
you have any questions about which one to go for, just let us know, and we'll
be glad to help!
Best regards,
Ben Ranson
Chief Systems Engineer