The following is an advanced topic for dedicated server users.
In this article we're replacing one of the drives in a RAID1 array. A RAID1 array mirrors the data from one drive to another so that, should one drive fail, the data is safe on the other.
The commands are specific to CentOS 5, but they should work on most Linux distributions that use software RAID.
First, let's check on our array with the following command:
# cat /proc/mdstat
A working array looks something like this:
Personalities : [raid1] [raid10] [raid0] [raid6] [raid5] [raid4]
md0 : active raid1 sdb1 sda1
      4200896 blocks [2/2] [UU]
md1 : active raid1 sdb2 sda2
      2104448 blocks [2/2] [UU]
md2 : active raid1 sdb3 sda3
      726266432 blocks [2/2] [UU]
unused devices: <none>
If a drive has failed, it will look something like this:
Personalities : [raid1] [raid10] [raid0] [raid6] [raid5] [raid4]
md0 : active raid1 sda1
      4200896 blocks [2/1] [U_]
md1 : active raid1 sda2
      2104448 blocks [2/1] [U_]
md2 : active raid1 sda3
      726266432 blocks [2/1] [U_]
unused devices: <none>
Notice the underscore beside each U, and that sdb1-3 are no longer listed. This shows that only one drive is active in each array. The drive holding partitions sdb1-3 has dropped out of the arrays, so we have no mirror.
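That check is easy to script for monitoring. A minimal sketch: `check_degraded` is a name of our own choosing, and it simply looks for an underscore inside the [UU]-style status markers:

```shell
# check_degraded FILE
# Exit 0 (true) if FILE, in /proc/mdstat format, shows any array
# with a missing member (an "_" inside its [UU] status marker).
check_degraded() {
    grep -q '\[U*_U*\]' "$1"
}

if check_degraded /proc/mdstat; then
    echo "RAID degraded"
else
    echo "RAID healthy"
fi
```

Taking the file as an argument means you can also run it against saved copies of /proc/mdstat.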
You could also use the mdadm command to show the state of the array.
# mdadm -D /dev/md0
/dev/md0:
        Version : 0.90
  Creation Time : Tue Oct 27 16:33:05 2009
     Raid Level : raid1
     Array Size : 4200896 (4.01 GiB 4.30 GB)
  Used Dev Size : 4200896 (4.01 GiB 4.30 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sun Feb 21 04:22:55 2010
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : 20a50c8c:453c794a:776c2c25:004bd7b2
         Events : 0.68

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
That's a working array. In our case, with sdb dead, only one device would be listed as active.
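That "State :" line can be pulled out in a script too. A small sketch: `array_state` is our own helper, reading from a file so it works on a saved capture of the output (e.g. `mdadm -D /dev/md0 > /tmp/md0.txt`):

```shell
# array_state FILE
# Print the value of the "State :" line from `mdadm -D` output
# that has been saved to FILE.
array_state() {
    sed -n 's/^ *State : //p' "$1" | head -n1
}
```

On a degraded mirror, mdadm reports something like "clean, degraded" here instead of just "clean".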
The next step is to take the server down, replace the faulty drive, and bring it back up. This is typically a 15-minute job for data center staff.
When the server is back up, we need to recreate the partitions on the replacement drive and add them back to the RAID arrays.
Run fdisk to print the partition table of the working drive.
# fdisk /dev/sda

Command (m for help): p

Disk /dev/sda: 750.1 GB, 750156374016 bytes
255 heads, 63 sectors/track, 91201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1         523     4200997   fd  Linux raid autodetect
/dev/sda2             524         785     2104515   fd  Linux raid autodetect
/dev/sda3             786       91201   726266520   fd  Linux raid autodetect

Command (m for help): q
We now duplicate that layout on /dev/sdb with fdisk.
# fdisk /dev/sdb
Use "n" to create 3 primary partions, and then use "t" to change their type to "fd" which is the type Id for "Linux raid autodetect".
Once this is done, write the partition table out with "w" and quit fdisk.
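If you'd rather not retype the numbers, sfdisk can dump one drive's partition table and replay it onto the other in a single step. A sketch with a preview mode bolted on for safety: `copy_partition_table` is our own wrapper name, and the pipeline it wraps overwrites the destination drive's table, so double-check the device names before running it for real.

```shell
# copy_partition_table SRC DST MODE
# Replays SRC's partition table onto DST using sfdisk.
# MODE "preview" only prints the pipeline; anything else runs it.
copy_partition_table() {
    src=$1; dst=$2; mode=$3
    if [ "$mode" = "preview" ]; then
        echo "sfdisk -d $src | sfdisk $dst"
    else
        sfdisk -d "$src" | sfdisk "$dst"
    fi
}

# Sanity-check the command first, then run it for real:
copy_partition_table /dev/sda /dev/sdb preview
# copy_partition_table /dev/sda /dev/sdb run
```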
Next, we add the three partitions to their arrays with mdadm:
# mdadm /dev/md0 --add /dev/sdb1
# mdadm /dev/md1 --add /dev/sdb2
# mdadm /dev/md2 --add /dev/sdb3
We can watch the rebuild using /proc/mdstat again:
Personalities : [raid1] [raid10] [raid0] [raid6] [raid5] [raid4]
md0 : active raid1 sdb1 sda1
      4200896 blocks [2/2] [UU]
md1 : active raid1 sdb2 sda2
      2104448 blocks [2/2] [UU]
md2 : active raid1 sdb3 sda3
      726266432 blocks [2/1] [U_]
      [==>..................]  recovery = 12.4% (90387008/726266432) finish=141.6min speed=74828K/sec
unused devices: <none>
In the output above, the third array (md2) can be seen recovering, with an estimated 141.6 minutes to go. The first two, being much smaller, have already recovered.
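Rather than re-running cat by hand, you can poll /proc/mdstat until the recovery lines disappear. A sketch: `rebuild_in_progress` is our own helper, taking a file argument so it can be tried against saved output.

```shell
# rebuild_in_progress FILE
# True while FILE (in /proc/mdstat format) shows a resync or
# recovery still running.
rebuild_in_progress() {
    grep -Eq 'recovery|resync' "$1"
}

# Poll once a minute until the arrays are fully mirrored again.
while rebuild_in_progress /proc/mdstat; do
    sleep 60
done
echo "rebuild complete"
```

For an interactive view, "watch cat /proc/mdstat" refreshes the same output every couple of seconds.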