The following is an advanced topic for dedicated server users.
In this article we're replacing one of the drives in a RAID1 array. A RAID1 array mirrors the data from one drive to another so that, should one drive fail, the data is safe on the other.
The commands are specific to CentOS 5, but they should work on most Linux distributions that use software RAID.
First, let's check on our array with the following command:
# cat /proc/mdstat
A working array looks something like this:
Personalities : [raid1] [raid10] [raid0] [raid6] [raid5] [raid4]
md0 : active raid1 sdb1 sda1
      4200896 blocks [2/2] [UU]
md1 : active raid1 sdb2 sda2
      2104448 blocks [2/2] [UU]
md2 : active raid1 sdb3 sda3
      726266432 blocks [2/2] [UU]
unused devices: <none>
If a drive has failed, it will look something like this:
Personalities : [raid1] [raid10] [raid0] [raid6] [raid5] [raid4]
md0 : active raid1 sda1
      4200896 blocks [2/1] [U_]
md1 : active raid1 sda2
      2104448 blocks [2/1] [U_]
md2 : active raid1 sda3
      726266432 blocks [2/1] [U_]
unused devices: <none>
Notice the underscore beside each U, and that sdb1-3 are no longer listed. This shows that only one drive is active in each array. The drive holding partitions sdb1-3 has dropped out of the arrays, so we have no mirror.
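That check is easy to script for monitoring. A minimal sketch: `check_degraded` is a name of our own choosing, and it simply looks for an underscore inside the [UU]-style status markers:

```shell
# check_degraded FILE
# Exit 0 (true) if FILE, in /proc/mdstat format, shows any array
# with a missing member (an "_" inside its [UU] status marker).
check_degraded() {
    grep -q '\[U*_U*\]' "$1"
}

if check_degraded /proc/mdstat; then
    echo "RAID degraded"
else
    echo "RAID healthy"
fi
```

Taking the file as an argument means you can also run it against saved copies of /proc/mdstat.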
You could also use the mdadm command to show the state of the array.
# mdadm -D /dev/md0
/dev/md0:
        Version : 0.90
  Creation Time : Tue Oct 27 16:33:05 2009
     Raid Level : raid1
     Array Size : 4200896 (4.01 GiB 4.30 GB)
  Used Dev Size : 4200896 (4.01 GiB 4.30 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sun Feb 21 04:22:55 2010
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : 20a50c8c:453c794a:776c2c25:004bd7b2
         Events : 0.68

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
That's a working array. In our case, with sdb dead, only one device would be listed as active.
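That "State :" line can be pulled out in a script too. A small sketch: `array_state` is our own helper, reading from a file so it works on a saved capture of the output (e.g. `mdadm -D /dev/md0 > /tmp/md0.txt`):

```shell
# array_state FILE
# Print the value of the "State :" line from `mdadm -D` output
# that has been saved to FILE.
array_state() {
    sed -n 's/^ *State : //p' "$1" | head -n1
}
```

On a degraded mirror, mdadm reports something like "clean, degraded" here instead of just "clean".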
The next step is to take the server down, replace the faulty drive, and bring it back up. This is typically a 15-minute job for data center staff.
When the server is back up, we need to recreate the partitions on the replacement drive and add them back to the RAID arrays.
Run fdisk to print the partition table of the working drive.
# fdisk /dev/sda

Command (m for help): p

Disk /dev/sda: 750.1 GB, 750156374016 bytes
255 heads, 63 sectors/track, 91201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1         523     4200997   fd  Linux raid autodetect
/dev/sda2             524         785     2104515   fd  Linux raid autodetect
/dev/sda3             786       91201   726266520   fd  Linux raid autodetect

Command (m for help): q
We now duplicate that layout on /dev/sdb with fdisk.
# fdisk /dev/sdb
Use "n" to create 3 primary partions, and then use "t" to change their type to "fd" which is the type Id for "Linux raid autodetect".
Once this is done, write the partition table out with "w" and quit fdisk.
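If you'd rather not retype the numbers, sfdisk can dump one drive's partition table and replay it onto the other in a single step. A sketch with a preview mode bolted on for safety: `copy_partition_table` is our own wrapper name, and the pipeline it wraps overwrites the destination drive's table, so double-check the device names before running it for real.

```shell
# copy_partition_table SRC DST MODE
# Replays SRC's partition table onto DST using sfdisk.
# MODE "preview" only prints the pipeline; anything else runs it.
copy_partition_table() {
    src=$1; dst=$2; mode=$3
    if [ "$mode" = "preview" ]; then
        echo "sfdisk -d $src | sfdisk $dst"
    else
        sfdisk -d "$src" | sfdisk "$dst"
    fi
}

# Sanity-check the command first, then run it for real:
copy_partition_table /dev/sda /dev/sdb preview
# copy_partition_table /dev/sda /dev/sdb run
```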
Next, we add the three partitions to their arrays with mdadm:
# mdadm /dev/md0 --add /dev/sdb1
# mdadm /dev/md1 --add /dev/sdb2
# mdadm /dev/md2 --add /dev/sdb3
We can watch the rebuild using /proc/mdstat again:
Personalities : [raid1] [raid10] [raid0] [raid6] [raid5] [raid4]
md0 : active raid1 sdb1 sda1
      4200896 blocks [2/2] [UU]
md1 : active raid1 sdb2 sda2
      2104448 blocks [2/2] [UU]
md2 : active raid1 sdb3 sda3
      726266432 blocks [2/1] [U_]
      [==>..................]  recovery = 12.4% (90387008/726266432) finish=141.6min speed=74828K/sec
unused devices: <none>
In the output above, the third array (md2) can be seen recovering, with an estimated 141.6 minutes to go. The first two, being much smaller, have already recovered.
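Rather than re-running cat by hand, you can poll /proc/mdstat until the recovery lines disappear. A sketch: `rebuild_in_progress` is our own helper, taking a file argument so it can be tried against saved output.

```shell
# rebuild_in_progress FILE
# True while FILE (in /proc/mdstat format) shows a resync or
# recovery still running.
rebuild_in_progress() {
    grep -Eq 'recovery|resync' "$1"
}

# Poll once a minute until the arrays are fully mirrored again.
while rebuild_in_progress /proc/mdstat; do
    sleep 60
done
echo "rebuild complete"
```

For an interactive view, "watch cat /proc/mdstat" refreshes the same output every couple of seconds.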