Windows and the current state of S.M.A.R.T. tooling that understands NVMe
Posted by jpluimers on 2021/09/16
I had trouble with two Intel 600p NVMe SSD devices: both showed read errors.
It turned out that only a few tools know how to get S.M.A.R.T. health information from them, and even then they did not explain the read errors.
I'm going to RMA them, but in case anyone else needs to get health information from NVMe SSD devices, here is what each tool can do:
- [Wayback] CrystalDiskInfo version 8.12.7 ([Wayback/Archive.is] source code) can read the health data and immediately showed one of the SSDs as bad, but it cannot do a full read scan of the device.
Support for NVMe health monitoring via the Intel RST drivers was added in [Wayback/Archive.is] CrystalDiskInfo 8.1.0 Beta2 (found via [Wayback] #1223 (Allow access of NVMe drives behind Intel RST drivers) – smartmontools); other NVMe monitoring code had already been in there for quite some time before that.
- [Wayback] HD Tune version 2.55, though it dates from 2008, can do a full disk scan and shows the read errors perfectly.
- [Wayback/Archive.is] GSmartControl version 1.1.3, despite being only a few months old, cannot read the health data because it bundles the badly outdated smartmontools version 6.6:
    smartctl 6.6 2017-11-05 r4594 [x86_64-w64-mingw32-w10-b19043] (sf-6.6-1)
    Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
- [Wayback] smartmontools version 7.2 (via the command-line tool smartctl) supports reading health data on Windows 10 via the Microsoft NVMe drivers (see [Wayback] NVMe_Support – smartmontools and [Wayback] Changeset 4348 – smartmontools); download via [Wayback/Archive.is] S.M.A.R.T. Monitoring Tools – Browse Files at SourceForge.net. I used this command for it:
    smartctl.exe --xall /dev/sdc
    smartctl 7.2 2020-12-30 r5155 [x86_64-w64-mingw32-w10-b19043] (sf-7.2-1)
    Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
    ...
    Warning Comp. Temp. Threshold: 70 Celsius
    ...
    === START OF SMART DATA SECTION ===
    SMART overall-health self-assessment test result: FAILED!
    - available spare has fallen below threshold
    - media has been placed in read only mode
    ...
- [Wayback] SpeedFan version 4.52 can only read the temperature and a reference temperature (which seems to be the [Wayback] Intel NVMe Warning Composite Temperature Threshold), likely because it is already 5 years old.
- [Wayback] Intel Memory and Storage Tool (the successor of the now deprecated [Wayback] Intel SSD Toolbox, which only supported Windows versions before 10) only gets the health status when doing a full diagnostic scan.
The above two links are to StorageReview, as links to the Intel site cannot be archived in the Wayback Machine or Archive.is due to "you need to sign on". These are the unarchived Intel links: Intel® Memory and Storage Tool (GUI) (download page) and Intel® SSD Toolbox (download page).
So basically, CrystalDiskInfo and HD Tune are my first line of checking for drive issues, followed by smartmontools to get text output, then by vendor-specific tools to assist with the RMA.
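Since smartctl gives plain text output, the health verdict can be picked out by a small script. Below is a minimal Python sketch, assuming the report layout shown in this post; the function name `parse_overall_health` is made up for illustration and is not part of smartmontools.

```python
import re


def parse_overall_health(report):
    """Return (verdict, reasons) from smartctl text output.

    verdict is e.g. "PASSED" or "FAILED", or None when the
    self-assessment line is missing. reasons holds the dash-prefixed
    lines that directly follow a failed verdict.
    """
    lines = report.splitlines()
    for index, line in enumerate(lines):
        match = re.search(r"self-assessment test result:\s*(\S+)", line)
        if not match:
            continue
        verdict = match.group(1).rstrip("!")
        reasons = []
        for follow in lines[index + 1:]:
            stripped = follow.strip()
            # Accept "-" and the en dash that blog rendering may substitute.
            if stripped.startswith(("-", "–")):
                reasons.append(stripped[1:].strip())
            else:
                break
        return verdict, reasons
    return None, []


# Sample text copied from the failing drive in this post.
sample = """=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
- available spare has fallen below threshold
- media has been placed in read only mode
SMART/Health Information (NVMe Log 0x02)
"""

verdict, reasons = parse_overall_health(sample)
print(verdict)
for reason in reasons:
    print("  " + reason)
```

To feed it live data, the report could be captured with `subprocess.run(["smartctl", "--xall", "/dev/sdc"], capture_output=True, text=True).stdout`. The device name is the one from my system and will differ elsewhere, and smartctl can exit with a non-zero status for a failing disk, so the exit code alone should not be treated as a script error.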
In the past, I used another smartmontools wrapper, HDD Guardian, but it was discontinued and shipped an even older smartmontools version than GSmartControl. Source: Closed: HDD Guardian – Home.
On Intel 600p becoming locked in read-only mode after failure:
- [Archive.is] Intel Clarifies 600p SSD Endurance Limitations, But TBW Ratings Can Be Misleading (Updated) | Tom’s Hardware
- [Archive.is] Intel Quietly Increases The 600p SSD Series Endurance Ratings | Tom’s Hardware
Start of Intel RMA procedure via [Wayback] Warranty Information.
My case looks remarkably similar to [Wayback] Full Diagnostic Scan always fails during Read Scan on my SSD 600p Series 256GB – Intel Community.
A few screenshots of the tools I used for health information:
–jeroen
smartctl 7.2 2020-12-30 r5155 [x86_64-w64-mingw32-w10-b19043] (sf-7.2-1)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, http://www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: INTEL SSDPEKKW010T7
Serial Number: BTPY7425047S1P0H
Firmware Version: PSF121C
PCI Vendor/Subsystem ID: 0x8086
IEEE OUI Identifier: 0x5cd2e4
Controller ID: 1
NVMe Version: 1.2
Number of Namespaces: 1
Namespace 1 Size/Capacity: 1.024.209.543.168 [1,02 TB]
Namespace 1 Formatted LBA Size: 512
Local Time is: Thu Sep 09 23:05:57 2021 WEDT
Firmware Updates (0x12): 1 Slot, no Reset required
Optional Admin Commands (0x0006): Format Frmw_DL
Optional NVM Commands (0x001e): Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
Log Page Attributes (0x03): S/H_per_NS Cmd_Eff_Lg
Maximum Data Transfer Size: 32 Pages
Warning Comp. Temp. Threshold: 70 Celsius
Critical Comp. Temp. Threshold: 80 Celsius
Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 9.00W - - 0 0 0 0 5 5
1 + 4.60W - - 1 1 1 1 30 30
2 + 3.80W - - 2 2 2 2 30 30
3 - 0.0700W - - 3 3 3 3 10000 300
4 - 0.0050W - - 4 4 4 4 2000 10000
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 0
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 27 Celsius
Available Spare: 89%
Available Spare Threshold: 10%
Percentage Used: 4%
Data Units Read: 87.913.671 [45,0 TB]
Data Units Written: 93.586.190 [47,9 TB]
Host Read Commands: 2.442.911.031
Host Write Commands: 2.603.238.360
Controller Busy Time: 33.761
Power Cycles: 20
Power On Hours: 27.452
Unsafe Shutdowns: 20
Media and Data Integrity Errors: 76
Error Information Log Entries: 76
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Error Information (NVMe Log 0x01, 16 of 64 entries)
Num ErrCount SQId CmdId Status PELoc LBA NSID VS
0 76 7 0x00c8 0x0281 - 98033472 1 -
1 75 7 0x006d 0x0281 - 98033378 1 -
2 74 7 0x0053 0x0281 - 98033408 1 -
3 73 2 0x00a9 0x0281 - 184697906 1 -
4 72 8 0x00f7 0x0281 - 98033454 1 -
5 71 1 0x00c1 0x0281 - 184697856 1 -
6 70 1 0x0012 0x0281 - 98099712 1 -
7 69 1 0x008a 0x0281 - 98033408 1 -
8 68 1 0x0073 0x0281 - 184697856 1 -
9 67 1 0x00be 0x0281 - 98099712 1 -
10 66 1 0x0066 0x0281 - 98033408 1 -
11 65 1 0x0082 0x0281 - 184697856 1 -
12 64 1 0x004d 0x0281 - 98099712 1 -
13 63 1 0x0031 0x0281 - 98033408 1 -
14 62 1 0x003d 0x0281 - 184697856 1 -
15 61 1 0x00d7 0x0281 - 98099712 1 -
… (48 entries not read)
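The bracketed terabyte figures next to "Data Units Read" and "Data Units Written" can be reproduced by hand: NVMe reports these counters in units of 1000 blocks of 512 bytes. Here is a small Python sketch of that arithmetic, assuming the dot-grouped number format my locale produces; the helper names are made up for illustration.

```python
# One NVMe "data unit" is 1000 blocks of 512 bytes.
BYTES_PER_DATA_UNIT = 1000 * 512


def parse_grouped_int(text):
    """Parse a locale-grouped counter such as '87.913.671' or '87,913,671'."""
    return int(text.replace(".", "").replace(",", ""))


def data_units_to_terabytes(units):
    """Convert an NVMe data unit counter to decimal terabytes (10**12 bytes)."""
    return units * BYTES_PER_DATA_UNIT / 1e12


read_units = parse_grouped_int("87.913.671")
written_units = parse_grouped_int("93.586.190")

print(f"read:    {data_units_to_terabytes(read_units):.1f} TB")     # 45.0 TB
print(f"written: {data_units_to_terabytes(written_units):.1f} TB")  # 47.9 TB
```

Both results match what smartctl printed for this drive.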
smartctl 7.2 2020-12-30 r5155 [x86_64-w64-mingw32-w10-b19043] (sf-7.2-1)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, http://www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: INTEL SSDPEKKW010T7
Serial Number: BTPY750500091P0H
Firmware Version: PSF121C
PCI Vendor/Subsystem ID: 0x8086
IEEE OUI Identifier: 0x5cd2e4
Controller ID: 1
NVMe Version: 1.2
Number of Namespaces: 1
Namespace 1 Size/Capacity: 1.024.209.543.168 [1,02 TB]
Namespace 1 Formatted LBA Size: 512
Local Time is: Fri Sep 10 12:45:25 2021 WEDT
Firmware Updates (0x12): 1 Slot, no Reset required
Optional Admin Commands (0x0006): Format Frmw_DL
Optional NVM Commands (0x001e): Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
Log Page Attributes (0x03): S/H_per_NS Cmd_Eff_Lg
Maximum Data Transfer Size: 32 Pages
Warning Comp. Temp. Threshold: 70 Celsius
Critical Comp. Temp. Threshold: 80 Celsius
Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 9.00W - - 0 0 0 0 5 5
1 + 4.60W - - 1 1 1 1 30 30
2 + 3.80W - - 2 2 2 2 30 30
3 - 0.0700W - - 3 3 3 3 10000 300
4 - 0.0050W - - 4 4 4 4 2000 10000
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 0
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
- available spare has fallen below threshold
- media has been placed in read only mode
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x09
Temperature: 34 Celsius
Available Spare: 0%
Available Spare Threshold: 10%
Percentage Used: 0%
Data Units Read: 9.254.093 [4,73 TB]
Data Units Written: 11.527.518 [5,90 TB]
Host Read Commands: 140.969.433
Host Write Commands: 114.176.387
Controller Busy Time: 2.393
Power Cycles: 22
Power On Hours: 27.149
Unsafe Shutdowns: 19
Media and Data Integrity Errors: 231
Error Information Log Entries: 231
Warning Comp. Temperature Time: 10
Critical Comp. Temperature Time: 0
Error Information (NVMe Log 0x01, 16 of 64 entries)
Num ErrCount SQId CmdId Status PELoc LBA NSID VS
0 231 2 0x00eb 0x0281 - 1805240240 1 -
1 230 4 0x0056 0x0281 - 1800434846 1 -
2 229 8 0x00c9 0x0281 - 1796255884 1 -
3 228 2 0x002a 0x0281 - 1791635165 1 -
4 227 6 0x00e5 0x0281 - 1790971578 1 -
5 226 2 0x0007 0x0281 - 1786802123 1 -
6 225 4 0x00cc 0x0281 - 1786313640 1 -
7 224 2 0x00e3 0x0281 - 1781969081 1 -
8 223 6 0x0013 0x0281 - 1781535958 1 -
9 222 2 0x0001 0x0281 - 1779073197 1 -
10 221 4 0x00d7 0x0281 - 1777200551 1 -
11 220 6 0x007a 0x0281 - 1776878020 1 -
12 219 2 0x0078 0x0281 - 1772367509 1 -
13 218 2 0x005e 0x0281 - 1772238514 1 -
14 217 8 0x0067 0x0281 - 1762821297 1 -
15 216 4 0x0094 0x0281 - 1762710670 1 -
… (48 entries not read)
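The "Critical Warning: 0x09" value in this second report is a bit field, and decoding it gives exactly the two failure reasons smartctl lists under the FAILED verdict. A minimal Python sketch follows, assuming the bit meanings from the NVMe SMART / Health Information log page. The descriptions are paraphrased (two of them reuse the smartctl wording above) and the function name is made up for illustration.

```python
# Bits of the NVMe "Critical Warning" byte (bits 0 to 4 only;
# later specification versions define more).
CRITICAL_WARNING_BITS = {
    0: "available spare has fallen below threshold",
    1: "temperature is above or below a threshold",
    2: "NVM subsystem reliability has been degraded",
    3: "media has been placed in read only mode",
    4: "volatile memory backup device has failed",
}


def decode_critical_warning(value):
    """Return the descriptions of all bits that are set in value."""
    return [
        text
        for bit, text in CRITICAL_WARNING_BITS.items()
        if value & (1 << bit)
    ]


# 0x09 is binary 01001: bit 0 and bit 3 are set.
for reason in decode_critical_warning(0x09):
    print(reason)
```

The healthy drive reports 0x00, which decodes to an empty list.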