Jun 12, 2010

Recovering Failed RAID Disk on Linux

Objective:

If the primary disk fails, boot the OS from the secondary disk in a software RAID 1 array (GRUB is not installed on the secondary disk).

Procedure:

1. If the disk is hot-swappable, simply remove it. If it isn't, you'll need to schedule downtime and remove the disk.

2. Replace the failed disk and restart your machine.

a. If the failed disk isn't the boot disk, skip to step 7.
b. If the failed disk is the boot disk, continue with step 3.

3. Boot into rescue mode from the first installation CD, mount the boot filesystem under a temporary mount point, and start the GRUB shell:

# mkdir /tmp/recovery
# mount /dev/sda1 /tmp/recovery
# cd /tmp/recovery
# grub --batch
This may take a while, as GRUB probes the system and tries to guess where all of your drives are.

4. Once grub is finished probing, do the following at the "grub>" prompt:

grub> root (hd0,0)
root (hd0,0)
Filesystem type is ext2fs, partition type 0xfd
grub> setup (hd0)
setup (hd0)
...
Running "install /grub/stage1 (hd0) (hd0)1+16 p (hd0,0)/grub/stage2
/grub/grub.conf"... succeeded
grub> exit

5. While still running from the CD, verify the GRUB device map. The boot filesystem is mounted at /tmp/recovery, so the file is read from there:

# cat /tmp/recovery/grub/device.map
(hd0) /dev/sda
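
If you would rather not eyeball the file, the check can be scripted. This is only a minimal sketch: `grub_map_device` is a hypothetical helper name (not part of GRUB), and it assumes the usual device.map layout of one "(hdN) /dev/xxx" pair per line.

```shell
# Print the device that a GRUB legacy device.map assigns to a BIOS drive.
# grub_map_device is a hypothetical helper, not a GRUB command.
# $1 = path to device.map, $2 = drive name without parentheses (e.g. hd0)
grub_map_device() {
    # Build "(hd0)" in the shell, then let awk match it against field 1
    awk -v drive="($2)" '$1 == drive { print $2 }' "$1"
}

# Usage, assuming the boot filesystem is still mounted at /tmp/recovery:
#   grub_map_device /tmp/recovery/grub/device.map hd0
```

The printed device should be the disk that is still healthy, since that is the one the BIOS will boot from.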

6. Make sure the GRUB device map points hd0 at the surviving disk (for example, /dev/hdc on an IDE system if /dev/hda has failed). Then leave the mount point, unmount the boot filesystem, and reboot the system.

# cd /
# umount /tmp/recovery
# reboot

7. After replacing the disk, check the RAID status:

# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda1[0]
104320 blocks [2/1] [U_]

md1 : active raid1 sdb2[1] sda2[0]
522048 blocks [2/2] [UU]

md2 : active raid1 sda3[0]
4610560 blocks [2/1] [U_]

unused devices: <none>
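
In the output above, a status such as [U_] means the array is running with one member missing, while [UU] means both halves of the mirror are present. As a rough sketch, the degraded arrays can be picked out with a small awk filter; `degraded_arrays` is a hypothetical function name, and it assumes the mdstat layout shown above (array name on one line, block count and status on the next).

```shell
# List md arrays that are running with a missing member.
# degraded_arrays is a hypothetical helper, not part of mdadm.
# Reads /proc/mdstat-style text on stdin; prints one array name per line.
degraded_arrays() {
    awk '
        # Remember the array named on the header line, e.g. "md0 : active raid1 ..."
        /^md[0-9]+ :/ { name = $1 }
        # A status field containing an underscore marks a missing member
        /blocks/ && /\[[U_]*_[U_]*\]/ { print name }
    '
}

# Usage on a live system:
#   degraded_arrays < /proc/mdstat
```

For the sample output in step 7, this would report md0 and md2.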

8. Copy the partition table from the surviving disk to the new disk with sfdisk, so that both disks end up with the same layout:
# sfdisk -d /dev/sda > mirror
# sfdisk /dev/sdb < mirror
Afterwards the two partition tables should be identical apart from the device names.
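
To confirm that the two layouts really match, one option is to dump both tables, mask out the device names, and diff the results. This is a sketch only: `normalize_dump` is a hypothetical helper, and it assumes the device name is the only thing that differs between the two dumps.

```shell
# Replace the device name in an sfdisk dump with a placeholder so that
# dumps from two different disks can be compared line by line.
# normalize_dump is a hypothetical helper, not part of sfdisk.
# $1 = device path (e.g. /dev/sda); the dump is read from stdin.
normalize_dump() {
    sed "s|$1|DISK|g"
}

# Usage on a live system (run as root):
#   sfdisk -d /dev/sda | normalize_dump /dev/sda > /tmp/sda.layout
#   sfdisk -d /dev/sdb | normalize_dump /dev/sdb > /tmp/sdb.layout
#   diff /tmp/sda.layout /tmp/sdb.layout && echo "partition tables match"
```

If diff prints nothing, the new disk has the same partition layout as the surviving one.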

9. Now add the new disk's partitions back into their arrays:
# mdadm -a /dev/md0 /dev/sdb1
# mdadm -a /dev/md1 /dev/sdb2
# mdadm -a /dev/md2 /dev/sdb3
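
On a machine with many arrays, typing these by hand gets tedious. The sketch below prints (but does not run) one "mdadm -a" command per array, derived from /proc/mdstat. `readd_commands` is a hypothetical helper, and it assumes the new disk is partitioned identically, so that partition N on the surviving disk pairs with partition N on the replacement.

```shell
# Print one "mdadm -a" command per array, pairing each array with the
# partition on the new disk that has the same number as the surviving member.
# readd_commands is a hypothetical helper; it only prints, it runs nothing.
# Reads /proc/mdstat-style text on stdin.
# $1 = surviving disk name (e.g. sda), $2 = replacement disk name (e.g. sdb)
readd_commands() {
    awk -v old="$1" -v new="$2" '
        /^md[0-9]+ :/ {
            # Member devices start at field 5: "md0 : active raid1 sda1[0]"
            for (i = 5; i <= NF; i++) {
                member = $i
                sub(/\[.*/, "", member)          # strip the "[0]" role suffix
                if (index(member, old) == 1) {   # member lives on the surviving disk
                    part = substr(member, length(old) + 1)
                    print "mdadm -a /dev/" $1 " /dev/" new part
                }
            }
        }
    '
}

# Usage on a live system: review the output before running any of it.
#   readd_commands sda sdb < /proc/mdstat
```

Fed the mdstat output from step 7, this prints the same three commands as above.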

10. Check the RAID details with the following commands:
# mdadm -D /dev/md0
# mdadm -D /dev/md1
# mdadm -D /dev/md2
Once the RAID sync has finished, restart the system and check the status again.
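
While a mirror is rebuilding, /proc/mdstat shows a progress line containing "recovery" or "resync". A minimal sketch for waiting on that is below; `sync_in_progress` is a hypothetical helper, and it simply assumes that the absence of those two words means the rebuild is done.

```shell
# Succeed (exit 0) while any array in the given mdstat-style file is still
# rebuilding; fail (exit 1) once no resync or recovery line remains.
# sync_in_progress is a hypothetical helper, not part of mdadm.
# $1 = file to read, defaulting to /proc/mdstat
sync_in_progress() {
    grep -Eq 'resync|recovery' "${1:-/proc/mdstat}"
}

# Usage on a live system: poll every 30 seconds until the rebuild finishes.
#   while sync_in_progress; do sleep 30; done
#   cat /proc/mdstat
```

Rebooting before the loop exits would interrupt the rebuild, so it is worth letting it finish first.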
