


Hard disk testing

S.M.A.R.T.


This is a clever acronym for a useful technology.
We may not even know it, but a hard disk contains many sensors that we can check to monitor the drive's health.

What does S.M.A.R.T. stand for?
Self-Monitoring, Analysis and Reporting Technology.

A really damaged hard disk: S.M.A.R.T. won't help in this case.

History

1992 - IBM introduces an option on AS/400 SCSI-2 hard disk drives. It is just a single binary bit (0 means the drive is working, 1 means it is going to fail).

1995 - Compaq, Seagate, Quantum and Conner create Intellisafe, which measures different health parameters on a hard disk drive.

The result is a new standard: S.M.A.R.T.


Standard?

Every company applies the standard in its own way.
Some add their own attributes.
Depending on the company, the values are stored in different units of measurement.
For example, a time value could be stored in minutes or in hours. Luckily, the software that checks the S.M.A.R.T. parameters knows how to tell them apart.

S.M.A.R.T. option in the computer BIOS

To access the BIOS, press DEL or F2 while the computer is booting.
Somewhere under Advanced Options (the exact location depends on the motherboard manufacturer), we will see an option:
HDD S.M.A.R.T. Capability [Enabled or Disabled]

S.M.A.R.T. BIOS option.

The internal disk sensors are always working. If we choose Enabled and the motherboard detects that some values are wrong, a warning message is displayed during the boot process, saying that the hard drive is going to fail.

Warning message when disk is going to fail.

We recommend enabling the HDD S.M.A.R.T. Capability option, as well as the High Temperature Warning option for the CPU.
Note that the test and monitoring software works whether the HDD S.M.A.R.T. Capability option is enabled or disabled.






Reading the S.M.A.R.T. values

smartmontools

This is the software we are going to use to read the attributes and their values.
It is open source.
The first version was released in October 2002.
Home page: http://smartmontools.sourceforge.net/

Windows version:
Get it from the download page: http://sourceforge.net/projects/smartmontools/files/smartmontools/
Get the latest release smartmontools-X.XX-X.win32-setup.exe
Executable files are: smartctl.exe and smartd.exe

Linux version:
The GNU/Linux package is smartmontools, available for most distributions.

To install in Fedora 14:

yum install smartmontools

To run the main utility:

smartctl

smartd is a service that monitors the hard disk parameter values.
smartctl is used to see the parameters and their values right now, and also to run a test.

Supported hard disk types

It works with IDE (PATA), SATA and SCSI disks connected to the mainboard with their corresponding cable.
A development version also works with RAID controllers, USB adapters and NAS devices (the GNU/Linux version supports more devices). To test these devices, we use the -d parameter.
The updated support list is:
http://sourceforge.net/apps/trac/smartmontools/wiki/TocSupport#SupportedDevices
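
As a hedged illustration of the -d parameter, the Python sketch below only builds the smartctl command line; it does not run it. The helper name build_smartctl_command is made up for this example. The device types shown (sat, used by many USB-to-SATA bridges, and megaraid,N, used for disks behind some RAID controllers) come from the smartctl manual; whether a particular adapter works still depends on the support list.

```python
import shlex


def build_smartctl_command(device, device_type=None, smartctl="smartctl"):
    """Return the argument list for 'smartctl -a', optionally with '-d TYPE'.

    This helper is hypothetical; it is not part of smartmontools.
    """
    command = [smartctl]
    if device_type:
        # -d tells smartctl how to talk to the device (for example through a bridge).
        command += ["-d", device_type]
    command += ["-a", device]
    return command


if __name__ == "__main__":
    # /dev/sdb is an assumed device name for a USB disk; adjust it for the real system.
    print(shlex.join(build_smartctl_command("/dev/sdb", "sat")))
    print(shlex.join(build_smartctl_command("/dev/sda", "megaraid,0")))
```

Building the command as a list keeps the device type and the device name as separate arguments, which avoids quoting problems if the list is later passed to a process launcher.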


Sample output

smartctl.exe -a /dev/sda

=== START OF INFORMATION SECTION ===
Model Family:     Maxtor DiamondMax Plus 8 family
Device Model:     Maxtor 6E040L0
Serial Number:    E16ZH97N
Firmware Version: NAR61JA0
User Capacity:    41.110.142.976 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 0
Local Time is:    Wed Jul 15 10:02:12 2011
SMART support is: Available - device has SMART capability.
                  Enabled status cached by OS, trying SMART RETURN STATUS cmd.
SMART support is: Enabled

In this information section, we see the disk manufacturer.
The Device Model is a good value to search for on the World Wide Web. If we google this ID, we may find the manual and some pictures.
The Serial Number is usually printed on the drive label, so this tool helps us to physically identify a disk when we are working with several of them.
The User Capacity is the raw size reported by the drive, in bytes. The operating system shows less usable space because of the partition table (MBR), the file system structures (the FAT, and so on) and the units it counts in.
ATA Version and ATA Standard indicate which revision of the ATA specification the drive follows, which is related to the disk speed capability.
Local Time indicates the computer time.
SMART support is available on most drives (roughly 90% of the time); only really old drives lack it.
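
The Python sketch below reads a few fields of the information section and prints the same byte count in decimal and in binary units, which is one common reason why the size shown by an operating system differs from User Capacity. The sample text is copied from the output above; the function names are invented for this example, and the parsing assumes the "Key: value" layout shown here.

```python
SAMPLE = """\
Model Family:     Maxtor DiamondMax Plus 8 family
Device Model:     Maxtor 6E040L0
Serial Number:    E16ZH97N
Firmware Version: NAR61JA0
User Capacity:    41.110.142.976 bytes
"""


def parse_information(text):
    """Map each 'Key: value' line of the information section to a dict entry."""
    info = {}
    for line in text.splitlines():
        key, separator, value = line.partition(":")
        if separator and key.strip():
            info[key.strip()] = value.strip()
    return info


def capacity_bytes(user_capacity):
    """Extract the byte count, ignoring locale separators such as '.' or ','."""
    number = user_capacity.split()[0]
    return int("".join(ch for ch in number if ch.isdigit()))


if __name__ == "__main__":
    info = parse_information(SAMPLE)
    size = capacity_bytes(info["User Capacity"])
    print(info["Device Model"], info["Serial Number"])
    print(f"{size / 10**9:.2f} GB (decimal), {size / 2**30:.2f} GiB (binary)")
```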



=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (1021) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging support.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  17) minutes.

SMART Attributes Data Structure revision number: 16

This section is less interesting. Among other settings, it shows the status of the last self-test and the recommended polling time for the short and extended self-tests (2 and 17 minutes on this disk).
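
If we want to use these times in a script, the recommended polling times can be picked out of the text. The Python sketch below is only an illustration: the sample is copied from the output above, the function name polling_minutes is invented, and the regular expression assumes the exact two-line layout printed by this smartctl version.

```python
import re

SAMPLE = """\
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  17) minutes.
"""

# The label and its value are on two consecutive lines, so \s+ also matches the line break.
POLLING = re.compile(
    r"(Short|Extended) self-test routine\s+"
    r"recommended polling time:\s+\(\s*(\d+)\) minutes"
)


def polling_minutes(text):
    """Return the recommended polling times, for example {'short': 2, 'extended': 17}."""
    return {kind.lower(): int(minutes) for kind, minutes in POLLING.findall(text)}


if __name__ == "__main__":
    print(polling_minutes(SAMPLE))
```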


Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0027   220   220   063    Pre-fail  Always       -       9385
  4 Start_Stop_Count        0x0032   253   253   000    Old_age   Always       -       719
  5 Reallocated_Sector_Ct   0x0033   253   253   063    Pre-fail  Always       -       0
  6 Read_Channel_Margin     0x0001   253   253   100    Pre-fail  Offline      -       0
  7 Seek_Error_Rate         0x000a   253   252   000    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0027   249   239   187    Pre-fail  Always       -       34129
  9 Power_On_Minutes        0x0032   233   233   000    Old_age   Always       -       573h+26m
 10 Spin_Retry_Count        0x002b   253   252   157    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x002b   253   252   223    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   252   252   000    Old_age   Always       -       739
192 Power-Off_Retract_Count 0x0032   253   253   000    Old_age   Always       -       715
193 Load_Cycle_Count        0x0032   253   253   000    Old_age   Always       -       1501
194 Temperature_Celsius     0x0032   253   253   000    Old_age   Always       -       35
195 Hardware_ECC_Recovered  0x000a   253   252   000    Old_age   Always       -       10943
196 Reallocated_Event_Count 0x0008   253   253   000    Old_age   Offline      -       0
197 Current_Pending_Sector  0x0008   253   253   000    Old_age   Offline      -       0
198 Offline_Uncorrectable   0x0008   253   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0008   199   199   000    Old_age   Offline      -       0
200 Multi_Zone_Error_Rate   0x000a   253   252   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   253   252   000    Old_age   Always       -       0
202 TA_Increase_Count       0x000a   253   252   000    Old_age   Always       -       0
203 Run_Out_Cancel          0x000b   253   252   180    Pre-fail  Always       -       0
204 Shock_Count_Write_Opern 0x000a   253   252   000    Old_age   Always       -       0
205 Shock_Rate_Write_Opern  0x000a   253   252   000    Old_age   Always       -       0
207 Spin_High_Current       0x002a   253   252   000    Old_age   Always       -       0
208 Spin_Buzz               0x002a   253   252   000    Old_age   Always       -       0
209 Offline_Seek_Performnce 0x0024   188   184   000    Old_age   Offline      -       0
 99 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
100 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
101 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0


This is the most interesting part: we can see all the attributes and their values.
The most important attributes are:

  • 5 - Reallocated_Sector_Ct counts the bad sectors whose data has been moved to a special reserved area.
  • 9 - Power_On_Minutes or Power_On_Hours (POH) is the internal aging counter that almost all disks have.
  • 10 - Spin_Retry_Count counts the extra spin-up attempts needed to reach operational speed when the first attempt was unsuccessful.
  • 194 - Temperature_Celsius is the current hard disk temperature.
  • 196 - Reallocated_Event_Count counts the remapping operations performed when a bad sector is found, both successful and unsuccessful attempts.
  • 197 - Current_Pending_Sector counts the bad sectors waiting to be moved.
  • 198 - Offline_Uncorrectable counts the bad sectors that cannot be moved: really damaged sectors.

VALUE is the current normalized value of the attribute; higher is better.
WORST is the worst (lowest) value recorded in the drive's lifetime.
THRESH is the threshold of the attribute, in the range 0 to 255. If VALUE is less than or equal to THRESH, the attribute has failed, which means that the data on this disk is in danger.
RAW_VALUE is the value as the drive stores it; each vendor has its own algorithm to convert RAW_VALUE to VALUE.
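
The rule for VALUE and THRESH can be checked automatically. The Python sketch below parses a few rows copied from the table above and lists the attributes whose VALUE is less than or equal to THRESH. It is a minimal illustration, not how smartctl itself decides: the function names are invented, and the parsing assumes the ten-column layout shown in this output.

```python
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   253   253   063    Pre-fail  Always       -       0
  9 Power_On_Minutes        0x0032   233   233   000    Old_age   Always       -       573h+26m
 10 Spin_Retry_Count        0x002b   253   252   157    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0032   253   253   000    Old_age   Always       -       35
"""


def parse_attributes(text):
    """Turn each attribute row into a dict; the header and other lines are skipped."""
    attributes = []
    for line in text.splitlines():
        # RAW_VALUE is the tenth column and may contain spaces, so split at most 9 times.
        fields = line.split(None, 9)
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attributes.append({
            "id": int(fields[0]),
            "name": fields[1],
            "value": int(fields[3]),
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            "type": fields[6],
            "raw": fields[9].strip(),
        })
    return attributes


def failed_attributes(attributes):
    """Apply the rule from the text: an attribute has failed if VALUE <= THRESH."""
    return [a for a in attributes if a["value"] <= a["thresh"]]


if __name__ == "__main__":
    attributes = parse_attributes(SAMPLE)
    for attribute in failed_attributes(attributes):
        print("FAILED:", attribute["id"], attribute["name"])
    print(len(attributes), "attributes checked")
```

On the sample rows no attribute is at or below its threshold, so only the final count is printed.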



SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.



The last section is a summary of the self-tests.

To run a self-test right now (you can keep using the computer in the meantime):

The fast one, maybe two minutes:

smartctl -t short /dev/sda

The slow one, a sector-by-sector check:

smartctl -t long /dev/sda

Wait two or three minutes for the short test, then type the following to see the results:

smartctl -a /dev/sda

The long test takes far longer to finish (the recommended polling time for this disk is 17 minutes).
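
The same sequence (start the test, wait, read the report) can be scripted. The Python sketch below is a hedged example rather than a recommended tool: the function names are invented, /dev/sda and the three-minute wait are taken from the text above, and it assumes smartctl is on the PATH and that the script has the rights needed to access the disk.

```python
import subprocess
import time


def self_test_commands(device, kind="short"):
    """Return the two commands used above: start the test, then show the report."""
    if kind not in ("short", "long"):
        raise ValueError("kind must be 'short' or 'long'")
    return [["smartctl", "-t", kind, device], ["smartctl", "-a", device]]


def run_self_test(device, kind="short", wait_minutes=3,
                  run=subprocess.run, sleep=time.sleep):
    """Start a self-test, wait, and return the text printed by 'smartctl -a'."""
    start, report = self_test_commands(device, kind)
    # smartctl can exit with a non-zero status for reasons other than a failed
    # command, so the exit code is deliberately not treated as an error here.
    run(start, check=False)
    sleep(wait_minutes * 60)
    result = run(report, check=False, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # Needs smartctl installed and, usually, administrator rights.
    print(run_self_test("/dev/sda", "short", wait_minutes=3))
```

The run and sleep parameters exist only so the function can be tried without a real disk; in normal use they keep their defaults.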

Recommendations:


Power On Hours (POH, attribute number 9) is a very important attribute. Vendors usually say that around five years is a reasonable lifetime for a consumer hard disk.
Five years of use at eight hours a day adds up to 14600 POH (5 x 365 x 8), so this is a key attribute to check.
Temperature (attribute number 194) is the live temperature in Celsius. More than 60 °C (140 °F) is dangerous in most cases. Always keep the hard disk cool.
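
The arithmetic behind these two recommendations is easy to reproduce. The Python sketch below recomputes the 14600 POH figure, converts a raw value in the "573h+26m" form shown in the sample output to hours, and converts Celsius to Fahrenheit. The function names are invented for this example, and the raw value format is assumed to be exactly the one printed above.

```python
def expected_power_on_hours(years, hours_per_day):
    """Power-on hours accumulated over a number of years of daily use."""
    return years * 365 * hours_per_day


def raw_minutes_to_hours(raw):
    """Convert a raw value such as '573h+26m' to fractional hours."""
    hours, _, minutes = raw.partition("h+")
    return int(hours) + int(minutes.rstrip("m")) / 60


def celsius_to_fahrenheit(celsius):
    """Standard temperature conversion."""
    return celsius * 9 / 5 + 32


if __name__ == "__main__":
    budget = expected_power_on_hours(5, 8)
    used = raw_minutes_to_hours("573h+26m")
    print(f"{used:.1f} of {budget} hours used")
    print(f"60 C is {celsius_to_fahrenheit(60):.0f} F")
```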


Other related links about hard disks


Here you can hear sounds of damaged hard disks:
http://datacent.com/hard_drive_sounds.php

Graphical user interface for smartmontools:
http://gsmartcontrol.berlios.de


Freeware Windows software HD Tune:
http://www.hdtune.com/

Seagate Tools for Hard Disk Diagnostic:
http://www.seagate.com/www/en-us/support/downloads/seatools

For a complete hard disk erase, DBAN (Darik's Boot And Nuke):
http://www.dban.org/



September 2011.

Use main page comments for questions.


Copyright NoQuest.com Contact NoQuest.com