The identifier sda
refers to your hard disk.
Diagnosis of a failing disk
Smartmontools
is a set of tools that controls and monitors a disk using the SMART standard (Self-Monitoring, Analysis, and Reporting Technology System).
It consists of two parts:
smartd
, a daemon that allows you to periodically check your hard drivessmartctl
, a command line tool to view the status of the hard disk
The tool supports the vast majority of modern hard drives.
Before you start
To complete the actions presented below, you must have:
- A Dedibox account logged into the console
- A Dedibox dedicated server
How to check a single-disk server
- Log into your server using SSH.
- Run the following command from the root account (or precede it with
sudo
):smartctl -a /dev/sdaNote
How to check a Dell multi-disk server
Dell PERC H200 controller
On these servers, the physical disks are referred to as sg*
devices.
-
Log into your server using SSH.
-
Run the following commands:
smartctl -a -T permissive /dev/sg0smartctl -a -T permissive /dev/sg1smartctl -a -T permissive /dev/sg2TipAs the devices can be positioned a little further away, do not hesitate to test up to
sg5
if you do not have conclusive results.
Dell PERC H310 controller
Two possibilities exist for this type of controller:
megaclisas-status
andmegacli
The first one displays the status of the RAID volume, whilst the second one displays the SMART status of the disks.
- Log into your server using SSH.
- Update the APT package lists cache, and install the required packages:
apt updateapt install megaclisas-status megacli
- Run the following command to display the status of the RAID volume:
megaclisas-status
- Run the following little script to retrieve the SMART values of your disks:
DEVICE=/dev/sdafor i in $(megacli -pdlist -a0 | grep Id | cut -d":" -f2); doecho "============================== $i =============================="smartctl -s on -a -d megaraid,${i} ${DEVICE} -T permissivedone
How to check an HP multi-disk server
- Log into your server using SSH.
- Run the following command to display the status of the RAID:
An output similar to the following example displays:ssacli ctrl all show configSmart Array P410 in Slot 1 (sn: PACCRID111003N3)array A (SATA, Unused Space: 0 MB)logicaldrive 1 (1.8 TB, RAID 1, OK)physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SATA, 2 TB, OK)physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SATA, 2 TB, OK)
- Run the following command to display the SMART values of the disks:
then run:smartctl -a -d cciss,0 /dev/sg0smartctl -a -d cciss,1 /dev/sg0
How to use SMARTD to monitor your disks
smartd
allows you to monitor your disks and to be alerted (depending on the configuration) by email in case of failure.
There is no guarantee as to the result of SMARTD and the time remaining before the disk fails completely. However, we strongly encourage you to back up and request a replacement rapidly.
How to configure SMARTD
Below, you find an example of a single-disk server installed on a Debian-like machine.
The following commands are to be executed as root
or via sudo
.
- Log into your server using SSH.
- Enable basic SMART options:
smartctl -s on -o on -S on /dev/sda
- Check that the disk is healthy:
smartctl -H /dev/sda
- Edit the file
/etc/smartd.conf
, to set up automated tests:-
Start by commenting out the following line:
DEVICESCAN -d removable -n standby -m root -M exec /usr/share/smartmontools/smartd-runner -
Then add a line similar to the following example:
/dev/sda -a -d sat -o on -S on -s (S/../.././01|L/../1/03) -m root -M exec /usr/share/smartmontools/smartd-runner
-
The example above allows you to test your hard disk as follows:
- A short test (S) every day at 1am (01)
- A long test (L), every Monday (1) at 3am (03)
- Activate the daemon by uncommenting the line
start_smartd=yes
in the file/etc/default/smartmontools
. - Start the daemon by running the following command:
service smartmontools start
If a problem is detected, it will send a default mail to root (-m root).
You can redirect the mails sent to the root
user to your personal mailbox or send this mail directly to another address.
How to run tests manually
To run SMART tests manually, use the following commands:
smartctl -t short /dev/sda
to run a short test on your disksmartctl -t long /dev/sda
to run a long test on your disk
Once the tests are completed, you can check the results with the following command:
smartctl -l selftest /dev/sda
How to report disk failures
If you notice any errors when running a SMART diagnosis on your disk, open a support ticket and ask for the disk to be replaced, indicating the serial number with the result of the smartctl
command:
=== START OF INFORMATION SECTION ===Device Model: SAMSUNG HD103UJSerial Number: S13PJ1KQ513170 <----------------------- Serial NumberFirmware Version: 1AA01113User Capacity: 1 000 204 886 016 bytesDevice is: In smartctl database [for details use: -P show]ATA Version is: 8ATA Standard is: ATA-8-ACS revision 3bLocal Time is: Fri Oct 29 11:20:27 2010 CEST
For more information on Smartmontools, refer to the official documentation.