Replacing a failed drive in a RAID1 software RAID
RAID1 configurations offer redundancy by mirroring data across two drives. If one drive fails, the other continues operating, ensuring data integrity. This guide explains how to replace a failed drive and rebuild the RAID1 array using the mdadm utility.
Each Elastic Metal server uses a RAID1 configuration after installation from the Scaleway console. If you want to change the RAID configuration of the server, you can modify the RAID array using rescue mode.
We recommend backing up your data before proceeding.
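You can inspect the current layout of the software RAID at any time from a shell on the server. The commands below are a minimal sketch; they assume the array and disk names used in this guide (/dev/md0, /dev/sda, /dev/sdb), which may differ on your server.
# Show how disks and partitions map to the RAID arrays
lsblk
# Show the state of all software RAID arrays known to the kernel
cat /proc/mdstat
# Show detailed information about a specific array
mdadm --detail /dev/md0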
Before you start
To complete the actions presented below, you must have:
- A Scaleway account logged into the console
- Owner status or IAM permissions allowing you to perform actions in the intended Organization
- An Elastic Metal server with at least two disks in RAID1
Removing the failed disk from the RAID configuration
- Boot the server into rescue mode from the Scaleway console.
- Log in to the server using the rescue account:
ssh em-XXX@<your_elastic_metal_ip>
Tip: The rescue credentials are available from your server’s status page in the Scaleway console.
- Run the following command to make sure all disk caches are written to the disk:
sync
- Mark the failed disk as failed using mdadm:
mdadm --manage /dev/md0 --fail /dev/sdb2
- Visualize the existing mdadm RAID devices by running the following command:
cat /proc/mdstat
An output similar to the following displays:
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md126 : active (auto-read-only) raid1 sdb3[1] sda3[0]
      974869504 blocks super 1.2 [2/2] [UU]
      resync=PENDING
      bitmap: 8/8 pages [32KB], 65536KB chunk

md127 : active (auto-read-only) raid1 sdb2[1](F) sda2[0]
      523264 blocks super 1.2 [2/2] [UU]

unused devices: <none>
The faulty device is marked with (F).
- Remove the failed disk using the mdadm --manage command:
mdadm --manage /dev/md0 --remove /dev/sdb2
- Contact technical support to replace the failed disk with a working one.
If the command fails due to the device being busy, ensure the disk is unmounted and re-check the status.
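If the device reports as busy, the following sketch shows one way to check whether the partition is still mounted or in use before retrying, and to confirm it is gone from the array afterwards. It assumes the device names used above (/dev/sdb2, /dev/md0); adapt them to your setup.
# Check whether the partition is mounted or still referenced
lsblk /dev/sdb2
mount | grep sdb2
# Unmount it if needed, then retry the removal
umount /dev/sdb2
mdadm --manage /dev/md0 --remove /dev/sdb2
# Confirm the failed partition no longer appears in the array
cat /proc/mdstat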
Adding the replacement disk to the RAID
- Once the failed disk is replaced, copy the partition table of the source disk to the new disk:
sfdisk -d /dev/sda | sfdisk /dev/sdb
Important: The sfdisk command above replaces the entire partition table on the new disk with that of the source disk. Modify the command if you need to preserve other partition information on the disk.
- Create a mirror of the source disk using the mdadm command:
mdadm --manage /dev/md0 --add /dev/sdb2
- Verify the status of the configuration:
mdadm --detail /dev/md0
Tip: Use the following command to show the progress of the recovery of the mirror disk:
cat /proc/mdstat
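To check that the partition copy worked and to follow the rebuild until it finishes, you can use the commands below. This is an illustrative sketch based on the disk names used in this guide; rebuild time depends on the size of the partitions.
# Compare the partition layout of the source and replacement disks
sfdisk -l /dev/sda
sfdisk -l /dev/sdb
# Refresh the resynchronization progress every five seconds
watch -n 5 cat /proc/mdstat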
Post-replacement checks
- Check for consistency in the RAID setup:
cat /proc/mdstat
- Monitor the RAID health and status regularly:
mdadm --detail /dev/md0
- Set up email alerts to detect RAID issues early:
mdadm --monitor --scan --daemonise --mail=root@example.com
Monitoring RAID health regularly helps you detect degraded arrays early and avoid unexpected downtime or data loss.
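To keep alerts running across reboots, the monitor is usually started by the system rather than by hand, and the destination address can be set in the mdadm configuration file instead of on the command line. The sketch below assumes a Debian-based system where the file is /etc/mdadm/mdadm.conf (on other distributions it may be /etc/mdadm.conf); root@example.com is a placeholder address.
# In /etc/mdadm/mdadm.conf: address that receives alerts about degraded arrays
MAILADDR root@example.com
# Send a one-off test alert for every array to verify mail delivery
mdadm --monitor --scan --oneshot --test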