“Solution” on ESXi 6.7 `smartinfo` throwing `error Cannot open device`
Posted by jpluimers on 2022/01/05
After writing Some notes on ESXi smartinfo throwing error Cannot open device, I bit the bullet and experimented with disabling the vmw_ahci driver (shown as vmw_ahci in the storage adapters view), which forces the legacy sata_ahci driver to be used (shown as ahci in the storage adapters view).
Poof! All problems were gone.
All? Not all: on the console and terminal, there seems to be some throttling going on, as I observed the read or write speed limit to be capped somewhere close to 60 MB/s. More on that in a future post.
Basically, the smartinfo trouble tripped me into thinking the device was bad, when I should have understood that the latency warnings in the vmkernel.log file indicated the vmw_ahci driver was the culprit. Oh well: never too old to learn.
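As an illustration of what to look for, here is a minimal shell sketch for spotting those warnings. The log line below is fabricated around the message fragment quoted in the references (the device name, timestamp and latency values are invented); only the grep pattern carries over to a real /var/log/vmkernel.log on the ESXi host:

```shell
# Fabricated vmkernel.log excerpt; the "performance has deteriorated"
# wording comes from the warnings quoted below, the rest is made up.
cat > /tmp/vmkernel-sample.log <<'EOF'
2022-01-05T10:00:00Z cpu2:66666)WARNING: ScsiDeviceIO: Device t10.ATA_____Samsung_SSD_860_EVO performance has deteriorated. I/O latency increased from average value of 1832 microseconds to 152405 microseconds.
EOF

# Count how many latency-deterioration warnings the log contains;
# on a real host, point this at /var/log/vmkernel.log instead.
grep -c "performance has deteriorated" /tmp/vmkernel-sample.log
```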
References
I’m not alone on this; these posts also discuss latency issues:
- [Wayback] Esxi 6.7 ssd i/o latency spikes – VMware Technology Network VMTN (I fixed spelling mistakes)
  Seeing latency spikes in excess of 100 ms on my 2 Samsung 860 EVO SSDs. The throughput is fine, getting 500 Mbps+ on them, but the vmkwarning.log has “performance has deteriorated. I/O latency increased from average value” warnings. This is a home server, that’s why I’m using regular Samsung SSDs.
  …
  I’ve seen similar issues (high latency spikes of 200 ms) with even the latest 6.7U3 as of September 2020. This in combination with a Samsung EVO 860 SSD.
  Which driver is your SATA port using?
  Have you tried disabling the VMware AHCI driver? (The below command and a reboot will make ESXi use the ahci driver instead of the vmw_ahci driver.)
  You can check this under storage adapters through the host UI. When using the ahci driver you should be seeing ALL your SATA ports. (The vmw_ahci only showed me 1 or 2.)
- [Wayback] ESXI 6.7: Strange storage issues | ServeTheHome Forums
  This is more or less what I am trying right now. I found the info on: ESXi 6.5 vmw_ahci SSD extreme high latency and vm freeze issues – vDrone.
  My server has been running with this config for 12 hours now. Too soon to draw any conclusion, but at least your info makes me believe that I might be in the right direction.
- [Wayback] ESXi 6.5 vmw_ahci SSD extreme high latency and vm freeze issues – vDrone
  Since the upgrade to ESXi 6.5 on my homelab I got bugged with a strange issue.
  All my SATA SSDs got really high latency and my VMs randomly froze when I generated big file copy actions. Latencies in the 5000/15000 ms! Yes IEW!
- [Wayback] Fix VMWare ESXi 6.* slow disk performance on HP b120i controller | Johan Draaisma
- [Wayback] HomeLab – SuperMicro 5028D-TNT4 Storage Driver Performance Issues and Fix – VIRTUALIZATION IS LIFE!
This is different from the transfer speed issues that were part of ESXi 4, though I have a gut feeling there is some correlation:
- [Wayback] Esx4 very slow throughtput in service console or V… – VMware Technology Network VMTN
- [Wayback] Slow disk read speed on ESX4 | vNotion (the web-site is gone, but this article was archived)
On AHCI: Advanced Host Controller Interface – Wikipedia
On native drivers versus legacy drivers:
- [Wayback] Troubleshooting native drivers in ESXi 5.5 or later
A new native driver model feature is introduced in VMware ESXi 5.5 that replaces an older model that employs a Linux compatibility layer.
In this article:
- Drivers using the new model are referred to as native drivers
- Drivers using the old model are referred to as legacy VMKLinux drivers
This article provides troubleshooting information for the native driver model.
The following inbox Native Drivers are included in default installation of ESXi 5.5: – See more at: http://www.virtuallyghetto.com/2013/11/esxi-55-introduces-new-native-device.html
- [Wayback] ESXi 5.5 introduces a new Native Device Driver Architecture Part 2
…
A new concept of driver priority loading is introduced with the Native Device Driver model and the diagram below provides the current ordering of how device drivers are loaded.
As you can see, OEM drivers have the highest priority and by default Native Drivers are loaded before “legacy” vmklinux drivers. On a clean installation of ESXi 5.5 you should see at least two of these directories: /etc/vmware/default.map.d/ and /etc/vmware/driver.map.d/, which contain driver map files pertaining to Native Device and “legacy” vmklinux drivers.
…
Steps
This is how you disable the native vmw_ahci driver:
- Suspend all virtual machines
- Bring the ESXi box into maintenance mode, for instance by running
esxcli system maintenanceMode set --enable true
- On the console or terminal, run this:
esxcli system module set --enabled=false --module=vmw_ahci
reboot
- After booting, get the ESXi box out of maintenance mode, for instance by running
esxcli system maintenanceMode set --enable false
- Power on all suspended virtual machines
To re-enable the native vmw_ahci driver (in case not all your SATA devices are recognised):
- Suspend all virtual machines
- Bring the ESXi box into maintenance mode, for instance by running
esxcli system maintenanceMode set --enable true
- On the console or terminal, run this:
esxcli system module set --enabled=true --module=vmw_ahci
reboot
- After booting, get the ESXi box out of maintenance mode, for instance by running
esxcli system maintenanceMode set --enable false
- Power on all suspended virtual machines
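The maintenance-mode/module/reboot sequence from both step lists can be sketched as one small shell function. This is only a sketch: set_vmw_ahci is a helper name of my own, and RUN=echo makes it a dry run that prints the esxcli commands instead of executing them, so it can be reviewed before running for real on the ESXi shell (with RUN= empty):

```shell
#!/bin/sh
# Sketch: toggle the vmw_ahci module. set_vmw_ahci is a made-up helper
# name; RUN=echo turns this into a dry run that only prints the commands.
RUN=echo

set_vmw_ahci() {
  enabled="$1"   # true: use native vmw_ahci; false: fall back to legacy ahci
  $RUN esxcli system maintenanceMode set --enable true
  $RUN esxcli system module set --enabled="$enabled" --module=vmw_ahci
  $RUN reboot
  # After the reboot, leave maintenance mode again:
  # esxcli system maintenanceMode set --enable false
}

# Disable the native driver (dry run prints the three commands):
set_vmw_ahci false
```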
Suspending and resuming all virtual machines can be done on the console/terminal; see VMware ESXi console: viewing all VMs, suspending and waking them up: part 5 for how.
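As a rough sketch of that suspend-all step: `vim-cmd vmsvc/getallvms` lists the VMs with their IDs, and `vim-cmd vmsvc/power.suspend <vmid>` suspends one. The getallvms output below is a fabricated sample; on a real host you would pipe the actual command instead and drop the `echo` dry-run prefix:

```shell
# Fabricated sample of `vim-cmd vmsvc/getallvms` output (column layout
# approximates the real command; names and IDs are invented).
cat > /tmp/getallvms-sample.txt <<'EOF'
Vmid   Name      File                    Guest OS        Version   Annotation
1      vm-one    [datastore1] one.vmx    otherGuest64    vmx-14
2      vm-two    [datastore1] two.vmx    otherGuest64    vmx-14
EOF

# Extract the numeric Vmid column and suspend each VM (echo = dry run):
awk 'NR > 1 { print $1 }' /tmp/getallvms-sample.txt | while read -r vmid; do
  echo vim-cmd vmsvc/power.suspend "$vmid"
done
```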
Results: storage adapters
With the native vmw_ahci driver, the storage adapters are these:
Storage adapters with vmw_ahci enabled
With the legacy ahci driver, the storage adapters are these:
Storage adapters with vmw_ahci disabled
What you see is that the Patsburg 6 Port SATA AHCI Controller has been split from a single vmw_ahci adapter into six separate ahci adapters.
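Besides the host UI, you can confirm which driver each adapter ended up with from the shell via `esxcli storage core adapter list`. Here is a sketch that counts adapters per driver; the sample output below is fabricated (three adapters instead of the six in my screenshots, and approximated columns), just to show the shape of the parsing:

```shell
# Fabricated sample of `esxcli storage core adapter list` output after
# the switch; the column layout approximates the real command.
cat > /tmp/adapters-sample.txt <<'EOF'
HBA Name  Driver  Link State  UID           Capabilities  Description
--------  ------  ----------  ------------  ------------  -----------
vmhba0    ahci    link-n/a    sata.vmhba0                 (0000:00:1f.2) Intel Patsburg 6 Port SATA AHCI Controller
vmhba1    ahci    link-n/a    sata.vmhba1                 (0000:00:1f.2) Intel Patsburg 6 Port SATA AHCI Controller
vmhba2    ahci    link-n/a    sata.vmhba2                 (0000:00:1f.2) Intel Patsburg 6 Port SATA AHCI Controller
EOF

# Skip the header and separator rows, then count adapters per driver:
awk 'NR > 2 { count[$2]++ } END { for (d in count) print d, count[d] }' /tmp/adapters-sample.txt
```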
Results: read/write speeds and latency when suspending/unsuspending all virtual machines
With vmw_ahci, the latency looks like this:
Latency issues after a resume/suspend cycle of all virtual machines: the latency stays up in the 15 millisecond region
With ahci, the latency looks like this:
Latency after a resume/suspend cycle of all virtual machines: the latency goes down to the 2 millisecond region
For both cases, the read and write rates are roughly the same (and OK for an EVO 860 SATA device). The latency drops a lot (with prolonged vmw_ahci use it goes up to 15+ seconds, that is 15000+ milliseconds):
[Wayback] SSD 860 EVO 2.5″ SATA III 500GB Memory & Storage – MZ-76E500B/AM | Samsung US
Speeds are consistent, even under heavy workloads and multi-tasking allowing for faster file transfer. The 860 EVO performs at sequential read speeds up to 550 MB/s* with Intelligent TurboWrite technology, and sequential write speeds up to 520 MB/s. The TurboWrite buffer size* is upgraded from 12 GB to 78 GB.
* Performance may vary based on SSD’s firmware version and system hardware & configuration. Sequential write performance measurements are based on Intelligent TurboWrite technology. Sequential performance measurements based on CrystalDiskMark v.5.0.2 and Iometer 1.1.0. The sequential write performances within the Intelligent TurboWrite region are 300 MB/s for 250/500 GB and 500 MB/s for 1 TB.
* Test system configuration: Intel Core i5-3550 CPU @ 3.3 GHz, 4 GB, OS – Windows 7 Ultimate x64, Chipset: ASUS P8H77-V .
* The TurboWrite buffer size varies based on the capacity of the SSD: 12 GB for the 250 GB model, 22 GB for the 500 GB model, 42 GB for the 1 TB model and 78 GB for the 2/4 TB models. For more information on the TurboWrite, please visit http://www.samsungssd.com.
–jeroen