RAID (Redundant Array of Independent Disks) Fundamentals and Configuration

In the late 1980s, the rapid adoption of computers for business processes drove the growth of applications and databases, which in turn increased the demand for storage capacity. At that time, data was stored on large, expensive disks called Single Large Expensive Drives (SLEDs). A single disk could not provide the required performance and flexibility.

HDDs are susceptible to failure due to mechanical wear and tear and other environmental factors. An HDD failure may result in data loss. The solutions available during the 1980s could not meet the availability and performance demands of applications.

The 1988 paper “A Case for Redundant Arrays of Inexpensive Disks (RAID)” described the use of small-capacity, inexpensive disk drives as an alternative to the large-capacity drives common on mainframe computers. The term RAID has since been redefined to refer to independent disks, reflecting advances in storage technology. RAID storage has now grown from an academic concept to an industry standard.

RAID IMPLEMENTATION

You can implement RAID in two ways –

  • Hardware RAID
  • Software RAID

Software RAID – It uses host-based software to provide RAID functions. It is implemented at the software level and does not depend on a hardware controller to manage the RAID array. Software RAID provides a cost-effective and simple solution compared to hardware RAID. It has the following limitations –

  • Performance – Software RAID can affect overall system performance because it uses additional CPU cycles.
  • Features – It does not support all RAID levels.
  • Operating System Compatibility – It is tied to the host operating system.

Hardware RAID – To implement hardware RAID, specialized hardware is required on the host or on the array. A RAID controller card attaches to the disks and communicates with the host over the PCI bus. Manufacturers also integrate RAID controllers on motherboards. This integrated RAID reduces overall cost but does not provide the flexibility required for high-end storage systems.

RAID ARRAY COMPONENTS

A RAID array is an enclosure that contains a number of HDDs and the supporting hardware and software to implement RAID. The HDDs inside a RAID array are usually contained in smaller sub-enclosures. These sub-enclosures, or physical arrays, hold a number of HDDs and may also contain other supporting hardware, such as power supplies. A subset of disks within the RAID array can be grouped to form logical associations called logical arrays, also known as RAID groups.

Logical arrays are comprised of logical volumes (LVs). The operating system recognizes the LVs as if they were physical HDDs managed by the RAID controller. The number of HDDs in a logical array depends on the RAID level.

RAID LEVELS

RAID levels are defined on the basis of striping, mirroring, and parity techniques. These techniques determine the performance and availability of data.

  • Striping – A RAID set is a group of disks. On each disk in the set, a predefined number of contiguously addressable disk blocks is defined as a strip. The set of aligned strips that spans all the disks within the RAID set is called a stripe. Strip size (also called stripe depth) defines the number of blocks in a strip and is the maximum amount of data that can be written to or read from a single HDD before the next HDD in the set is accessed. Striped RAID does not protect data unless parity or mirroring is used, but striping may significantly improve I/O performance.
  • Mirroring – Mirroring is a technique in which the same data is stored on two different HDDs, yielding two copies of the data. If one disk fails, the data remains available on the remaining disk, and the controller continues to service the host’s data requests from the surviving disk of the mirrored pair. When the failed disk is replaced with a new disk, the controller copies the data from the surviving disk of the pair. This process is transparent to the host. Mirroring provides redundancy and enables faster recovery, but it is not a substitute for data backup. It improves read performance because read requests can be serviced by both disks, but every write must go to both disks. Because it requires twice the storage capacity, mirroring is considered expensive and is preferred for mission-critical applications.
  • Parity – Parity is a method of protecting striped data from HDD failure without the cost of mirroring. An additional HDD is added to the stripe width to hold parity, a mathematical construct that allows re-creation of missing data. Parity is a redundancy check that ensures full protection of data without maintaining a full set of duplicate data. Parity information can be stored on a dedicated HDD or distributed across all HDDs. For example, consider a set of five disks in which four store data and the fifth stores parity (conceptually, the sum of the elements in each row); if any one disk fails, the missing values can be recalculated by subtracting the sum of the remaining elements from the parity value. In practice, parity is computed with a bitwise XOR operation, as the sketch after this list illustrates. Parity calculation is a function of the RAID controller.
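The following is a minimal bash sketch of XOR parity. The byte values and variable names (d1, d2, d3) are arbitrary illustrations, not the output of any RAID tool:

    # Three data strips, expressed as example byte values.
    d1=$(( 0xA3 )); d2=$(( 0x5C )); d3=$(( 0x77 ))
    # The parity strip is the XOR of the data strips.
    parity=$(( d1 ^ d2 ^ d3 ))

    # Suppose the disk holding d2 fails. XORing the surviving strips
    # with the parity recreates the missing data.
    recovered=$(( d1 ^ d3 ^ parity ))
    printf 'parity=0x%02X  recovered d2=0x%02X\n' "$parity" "$recovered"

Running this prints recovered d2=0x5C, matching the original value, because XOR is its own inverse.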

RAID 0 – In a RAID 0 configuration, data is striped across the HDDs in a RAID set. It utilizes the full storage capacity by distributing strips of data over multiple HDDs in a RAID set. However, RAID 0 provides no data protection; if any drive fails, the data on it is lost.

RAID 1 – In a RAID 1 configuration, data is mirrored to improve fault tolerance. A RAID 1 group consists of at least two HDDs. As explained in mirroring, every write is written to both disks.

Nested RAID – Most data centers require both data redundancy and performance from their RAID arrays. RAID 0+1 and RAID 1+0 combine the performance benefits of RAID 0 with the redundancy benefits of RAID 1. They use striping and mirroring techniques and combine their benefits. These types of RAID require an even number of disks, the minimum being four. RAID 1+0 is also known as RAID 10 (ten) or RAID 1/0. Similarly, RAID 0+1 is also known as RAID 01 or RAID 0/1. RAID 1+0 performs well for workloads that use small, random, write-intensive I/O. Some applications that benefit from RAID 1+0 (see the mdadm sketch after this list) include the following:

  • High transaction rate Online Transaction Processing (OLTP)
  • Large messaging installations
  • Database applications that require high I/O rate, random access, and high availability
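As a sketch of how such an array might be created on Linux with mdadm (the partition names /dev/sdb1 through /dev/sde1 are hypothetical placeholders):

    # Hypothetical four-disk RAID 1+0 (RAID 10) array; substitute
    # real partitions prepared with type fd.
    mdadm --create /dev/md10 --level=10 --raid-devices=4 \
        /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1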

RAID 3 –  RAID 3 stripes data for high performance and uses parity for improved fault tolerance. Parity information is stored on a dedicated drive so that data can be reconstructed if a drive fails. For example, of five disks, four are used for data and one is used for parity. RAID 3 always reads and writes complete stripes of data across all disks, as the drives operate in parallel. There are no partial writes that update one out of many strips in a stripe.

RAID 4 – Similar to RAID 3, RAID 4 stripes data for high performance and uses a dedicated parity disk for improved fault tolerance. Unlike RAID 3, however, the data disks in RAID 4 can be accessed independently, so a specific data element can be read or written on a single disk without transferring an entire stripe.

RAID 5 –  RAID 5 is a very versatile RAID implementation. It is similar to RAID 4 because it uses striping and the drives (strips) are independently accessible. The difference between RAID 4 and RAID 5 is the parity location. In RAID 4, parity is written to a dedicated drive, creating a write bottleneck for the parity disk. In RAID 5, parity is distributed across all disks. The distribution of parity in RAID 5 overcomes the write bottleneck.  RAID 5 is preferred for messaging, data mining, medium-performance media serving, and relational database management system (RDBMS) implementations in which database administrators (DBAs) optimize data access.

RAID 6 –  RAID 6 works the same way as RAID 5 except that RAID 6 includes a second parity element to enable survival in the event of the failure of two disks in a RAID group.  RAID 6 implementation requires at least four disks. RAID 6 distributes the parity across all the disks. The write penalty in RAID 6 is more than that in RAID 5; therefore, RAID 5 writes perform better than RAID 6. The rebuild operation in RAID 6 may take longer than that in RAID 5 due to the presence of two parity sets.

RAID CONFIGURATION ON LINUX (CentOS)

RAID configuration looks complicated, but it is simple. Remember that RAID 0 requires at least two disks, RAID 1 requires at least two disks, and RAID 5 requires at least three disks. First, prepare the disk partitions and set the partition type to fd (Linux raid autodetect). You have already seen how to create partitions and how to change the partition type.
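As a minimal sketch (assuming a hypothetical spare disk /dev/sdb), the partition type can be set interactively with fdisk:

    # /dev/sdb is a placeholder; point fdisk at your own disk.
    fdisk /dev/sdb
    # Inside fdisk: n creates a new partition, t changes its type
    # (enter fd for Linux raid autodetect), and w writes and exits.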

Next, create the different RAID arrays.
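The following mdadm invocations are a sketch; the partition names (/dev/sdb1, /dev/sdc1, and so on) are hypothetical placeholders for partitions of type fd:

    # RAID 0 (striping) across two disks:
    mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdb1 /dev/sdc1

    # RAID 1 (mirroring) across two disks:
    mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdb2 /dev/sdc2

    # RAID 5 (striping with distributed parity) across three disks:
    mdadm --create /dev/md2 --level=5 --raid-devices=3 /dev/sdb3 /dev/sdc3 /dev/sdd3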

VERIFY RAID CREATION
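Assuming the arrays created above, the following commands show their status:

    # Kernel summary of all active md arrays:
    cat /proc/mdstat

    # Detailed information about a specific array:
    mdadm --detail /dev/md0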

If there are no problems, we can format md0 using mkfs.ext3, like so:
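    # Create an ext3 filesystem on the new array:
    mkfs.ext3 /dev/md0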

To attach the array to /var/cache, we just add the following line to /etc/fstab:
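    # device     mount point   type   options    dump  fsck order
    /dev/md0     /var/cache    ext3   defaults   0     0

The defaults mount options and 0 0 flags shown here are the usual choice; adjust them to your needs.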

Now, you need to create a configuration file for mdadm so that the RAID arrays start properly on boot. You can do so manually, but you can just as easily create the file via the mdadm tool itself by running the following command:
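    # Scan all disks for RAID superblocks and record the arrays found:
    mdadm --examine --scan > /etc/mdadm.conf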

This command scans all available disks on the system and looks for RAID markers. It collects all of this information and places it in /etc/mdadm.conf. This information is then used by CentOS during booting to re-create the arrays.
