Replacing a failed drive in a RAID1 software RAID
RAID1 configurations offer redundancy by mirroring data across two drives. If one drive fails, the other continues operating, ensuring data integrity. This guide explains how to replace a failed drive and rebuild the RAID1 array using the mdadm utility.
Each Elastic Metal server uses a RAID1 configuration after installation from the Scaleway console. If you want to change the RAID configuration of the server, you can modify the RAID array using rescue mode.
We recommend backing up your data before proceeding.
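You can inspect the current layout of the software RAID at any time from a shell on the server. The commands below are a minimal sketch; they assume the array and disk names used in this guide (/dev/md0, /dev/sda, /dev/sdb), which may differ on your server.
# Show how disks and partitions map to the RAID arrays
lsblk
# Show the state of all software RAID arrays known to the kernel
cat /proc/mdstat
# Show detailed information about a specific array
mdadm --detail /dev/md0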
Before you start
To complete the actions presented below, you must have:
- A Scaleway account logged into the console
- Owner status or IAM permissions allowing you to perform actions in the intended Organization
- An Elastic Metal server with at least two disks in RAID1
Removing the failed disk from the RAID configuration
- Boot the server into rescue mode from the Scaleway console.
- Log in to the server using the rescue account:
ssh em-XXX@<your_elastic_metal_ip>
Tip: The rescue credentials are available from your server’s status page in the Scaleway console.
- Run the following command to make sure all disk caches are written to the disk:
sync
- Mark the failed disk as failed using mdadm:
mdadm --manage /dev/md0 --fail /dev/sdb2
- Visualize the existing mdadm RAID devices by running the following command:
cat /proc/mdstat
An output similar to the following displays:
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md126 : active (auto-read-only) raid1 sdb3[1] sda3[0]
      974869504 blocks super 1.2 [2/2] [UU]
      resync=PENDING
      bitmap: 8/8 pages [32KB], 65536KB chunk

md127 : active (auto-read-only) raid1 sdb2[1](F) sda2[0]
      523264 blocks super 1.2 [2/2] [UU]

unused devices: <none>
The faulty device is marked with (F).
- Remove the failed disk using the mdadm --manage command:
mdadm --manage /dev/md0 --remove /dev/sdb2
- Contact technical support to replace the failed disk with a working one.
If the command fails due to the device being busy, ensure the disk is unmounted and re-check the status.
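If the device reports as busy, the following sketch shows one way to check whether the partition is still mounted or in use before retrying, and to confirm it is gone from the array afterwards. It assumes the device names used above (/dev/sdb2, /dev/md0); adapt them to your setup.
# Check whether the partition is mounted or still referenced
lsblk /dev/sdb2
mount | grep sdb2
# Unmount it if needed, then retry the removal
umount /dev/sdb2
mdadm --manage /dev/md0 --remove /dev/sdb2
# Confirm the failed partition no longer appears in the array
cat /proc/mdstat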
Adding the replacement disk to the RAID
- Once the failed disk is replaced, copy the partition table of the source disk to the new disk:
sfdisk -d /dev/sda | sfdisk /dev/sdb
Important: The sfdisk command above replaces the entire partition table on the new disk with that of the source disk. Modify the command if you need to preserve other partition information on the disk.
- Create a mirror of the source disk using the mdadm command:
mdadm --manage /dev/md0 --add /dev/sdb2
- Verify the status of the configuration:
mdadm --detail /dev/md0
Tip: Use the following command to show the progress of the recovery of the mirror disk:
cat /proc/mdstat
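To check that the partition copy worked and to follow the rebuild until it finishes, you can use the commands below. This is an illustrative sketch based on the disk names used in this guide; rebuild time depends on the size of the partitions.
# Compare the partition layout of the source and replacement disks
sfdisk -l /dev/sda
sfdisk -l /dev/sdb
# Refresh the resynchronization progress every five seconds
watch -n 5 cat /proc/mdstat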
Post-replacement checks
- Check for consistency in the RAID setup:
cat /proc/mdstat
- Monitor the RAID health and status regularly:
mdadm --detail /dev/md0
- Set up email alerts to detect RAID issues early:
mdadm --monitor --scan --daemonise --mail=root@example.com
Monitoring RAID health regularly helps you detect degraded arrays early and avoid unexpected downtime or data loss.
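To keep alerts running across reboots, the monitor is usually started by the system rather than by hand, and the destination address can be set in the mdadm configuration file instead of on the command line. The sketch below assumes a Debian-based system where the file is /etc/mdadm/mdadm.conf (on other distributions it may be /etc/mdadm.conf); root@example.com is a placeholder address.
# In /etc/mdadm/mdadm.conf: address that receives alerts about degraded arrays
MAILADDR root@example.com
# Send a one-off test alert for every array to verify mail delivery
mdadm --monitor --scan --oneshot --test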