The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests

NVMe and SATA health data on ESXi: some links to investigate

Posted by jpluimers on 2021/08/25

Somehow, the health data of my NVMe and SATA drives does not show up as health information in the web UI of my ESXi playground rig.

So far, I have noticed that ESXi runs a smartd but does not ship with smartctl, and that the health data does not end up in the web user interface. So you cannot easily see the state of NVMe and SATA devices.

Still, these devices deteriorate over time and eventually die, so below are some links to investigate later.

The goal is to use my own thresholds to set warning and error levels.
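A minimal sketch of what such custom thresholds could look like in Python. The names `Threshold` and `classify` and all numbers are hypothetical illustrations, not something ESXi or smartd provides. The `higher_is_worse` flag is there because raw counters (like a reallocated sector count) get worse as they rise, while normalised SMART values get worse as they fall.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Hypothetical warning/error limits for one health attribute."""
    warn: int
    error: int
    higher_is_worse: bool = True


def classify(value: int, threshold: Threshold) -> str:
    """Return 'ok', 'warning' or 'error' for a value against its limits."""
    if threshold.higher_is_worse:
        if value >= threshold.error:
            return "error"
        if value >= threshold.warn:
            return "warning"
    else:
        if value <= threshold.error:
            return "error"
        if value <= threshold.warn:
            return "warning"
    return "ok"


# Example limits (made up): a raw counter and a normalised value.
raw_reallocated = Threshold(warn=1, error=10)
normalised_value = Threshold(warn=95, error=90, higher_is_worse=False)

print(classify(3, raw_reallocated))    # warning
print(classify(7, normalised_value))   # error
```

Where the values come from (log entries, a script on the host, something else) is still open; this only covers the comparison step.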

Some log entries:

syslog.log:2021-04-16T18:28:26Z jumpstart[65941]: UnresolvedVmfsVolume: deviceName=eui.0000000001000000e4d25c0e8dc74e01:1,lvmName=5ad4aeea-630efcbc-c307-0cc47aaa9742,label=IntelNVMe1TB-BTPY7425047S1P0H(VMFS),fsUuid=5ad4aeea-6954841c-470e-0cc47aaa9742
syslog.log:2021-04-16T18:30:57Z smartd: [warn] eui.0000000001000000e4d25c0e8dc74e01: REALLOCATED SECTOR CT below threshold (7 < 90)
syslog.log:2021-04-16T18:53:25Z jumpstart[65944]: UnresolvedVmfsVolume: deviceName=naa.600605b00aa054a0ff0000210221eaf8:1,lvmName=552f5788-ee485725-ce41-001f29022aed,label=850EVO1TBR1B(VMFS),fsUuid=552f5788-33e30274-8dba-001f29022aed
vmkernel.log:2021-04-17T16:58:58.665Z cpu8:66219)ScsiDeviceIO: 3001: Cmd(0x4395014c7140) 0x1a, CmdSN 0xf60 from world 67512 to dev "naa.600605b00aa054a0ff0000210221eaf8" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
vmkernel.log:2021-04-17T17:29:02.656Z cpu0:67578)ScsiDeviceIO: 3001: Cmd(0x4395015c34c0) 0x85, CmdSN 0xfbb from world 67512 to dev "naa.600605b00aa054a0ff0000210221eaf8" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
vmkernel.log:2021-04-17T17:59:06.658Z cpu0:68128)ScsiDeviceIO: 3001: Cmd(0x43950d7af780) 0x4d, CmdSN 0x1011 from world 67512 to dev "naa.600605b00aa054a0ff0000210221eaf8" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
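The entries above can be picked apart with a small script. This is a sketch under assumptions: the regular expressions are derived only from the six lines shown here, and the function names are made up. The lookup tables hold just the standard SCSI codes that occur in these lines: operation codes 0x1a, 0x4d and 0x85, sense key 0x5, and the two additional sense codes 0x20/0x00 and 0x24/0x00.

```python
import re

# smartd warnings as they appear in syslog.log above.
SMARTD_RE = re.compile(
    r"smartd: \[(?P<level>\w+)\] (?P<device>\S+): "
    r"(?P<attribute>.+?) below threshold \((?P<value>\d+) < (?P<threshold>\d+)\)"
)

# Failed SCSI commands as they appear in vmkernel.log above.
SCSI_RE = re.compile(
    r'Cmd\(0x[0-9a-f]+\) (?P<opcode>0x[0-9a-f]+), .* to dev "(?P<device>[^"]+)" failed '
    r"H:0x[0-9a-f]+ D:0x[0-9a-f]+ P:0x[0-9a-f]+ "
    r"Valid sense data: (?P<key>0x[0-9a-f]+) (?P<asc>0x[0-9a-f]+) (?P<ascq>0x[0-9a-f]+)"
)

# Only the codes seen in the log lines above; extend as needed.
OPCODES = {0x1A: "MODE SENSE(6)", 0x4D: "LOG SENSE", 0x85: "ATA PASS-THROUGH(16)"}
SENSE_KEYS = {0x5: "ILLEGAL REQUEST"}
ASC_ASCQ = {
    (0x20, 0x00): "INVALID COMMAND OPERATION CODE",
    (0x24, 0x00): "INVALID FIELD IN CDB",
}


def parse_smartd(line):
    """Return device, attribute, value and threshold of a smartd warning, or None."""
    m = SMARTD_RE.search(line)
    if m is None:
        return None
    return {
        "device": m["device"],
        "attribute": m["attribute"],
        "value": int(m["value"]),
        "threshold": int(m["threshold"]),
    }


def decode_scsi_failure(line):
    """Return a readable form of a failed ScsiDeviceIO command, or None."""
    m = SCSI_RE.search(line)
    if m is None:
        return None
    opcode = int(m["opcode"], 16)
    key = int(m["key"], 16)
    asc_ascq = (int(m["asc"], 16), int(m["ascq"], 16))
    return {
        "device": m["device"],
        "command": OPCODES.get(opcode, m["opcode"]),
        "sense_key": SENSE_KEYS.get(key, m["key"]),
        "detail": ASC_ASCQ.get(asc_ascq, "unknown"),
    }
```

Read this way, the three vmkernel.log entries say that the naa.600605b00aa054a0ff0000210221eaf8 device rejected MODE SENSE(6), ATA PASS-THROUGH(16) and LOG SENSE as illegal requests. That might be related to the missing health data for that device, but that is a guess that still needs investigating.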

Some links

Smartmontools

Google searches

Other ways of getting SMART data

  • [Wayback/Archive.is] ESXi S.M.A.R.T. health monitoring for hard drives (2040405) and [Wayback] VMware ESXi S.M.A.R.T Health Monitoring | ESX Virtualization both talk about the smartinfo.sh script, which by now has become a binary, /usr/lib/vmware/vm-support/bin/smartinfo, that shows similar results. Note that the Power-on Hours are unreliable: for most drives they are non-persistent and are actually the Power-on Hours since the last reboot.
    • There are a lot more goodies in the /usr/lib/vmware/vm-support/bin directory that I want to look into:
      altlocaltgz.sh
      cat-newest-vmkernel-core.sh
      censor-shell-log.sh
      debug-hung-vm
      dump-upit-info.py
      dump-vmdk-rdm-info.sh
      dump-vmfs-traces.sh
      dump-vvol-traces.sh
      dvsData.sh
      encryption-epilog.sh
      encryption-prolog.sh
      extract_hp_docs.py
      hostd.sh
      localtgz.sh
      monitorCoreDump.sh
      nicinfo.sh
      nvmeinfo.sh
      partedUtil.sh
      rdmainfo.sh
      smartinfo
      storageHostProfiles.sh
      swfw.sh
      vFlash.sh
      vsanIscsiTarget.sh
      vsanIscsiTargetVitConf.sh
      vsanIscsiTargetVitStatus.py
      zdumps.sh

jeroen
