ESX 3.5 U3 and ESXi 3.5 U3 are the first ESX 3.x releases with official VMware support for the internal networked SAS storage in the Intel® Modular Server. VMware has identified an interoperability issue between ESX 3.5 U3 and the dual SAS Storage Controller Modules (SCMs). The SCMs provide access to and management of virtual disk storage. They also provide RAID functionality for the SAS Hard Disk Drives (HDDs) that plug into the Intel® Modular Server chassis backplane. Because of this issue, neither ESX nor the SCMs will fail over between the SCMs when an internal SAS link, SAS controller port, or SAS expander fails. Figure 1 below illustrates the internal organization of these SAS components.
When any of these SAS link, port, or expander failures occurs, the surviving SCM expects the host OS (ESX) to initiate an explicit failover. That failover would cover the virtual disks built on the HDDs owned by the now-inaccessible SCM. ESX, however, expects the surviving SCM to perform an implicit failover of those HDDs automatically. Because neither side initiates the failover, the HDDs remain with the SCM on the failed SAS path, and their virtual disks stay inaccessible to ESX.
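The mismatch above comes down to a simple rule. After a SAS path failure, the affected virtual disks are recovered only if at least one side initiates the ownership change. The following Python sketch is a toy model of that reasoning, not VMware or Intel code. The names `FailoverPolicy` and `disks_recovered` are hypothetical, and the model ignores everything except which side initiates failover.

```python
from dataclasses import dataclass


@dataclass
class FailoverPolicy:
    """Hypothetical toy model: which side moves LUN ownership after a SAS path failure."""
    host_initiates_explicit: bool   # host sends an explicit failover request
    controller_does_implicit: bool  # surviving controller moves ownership on its own


def disks_recovered(policy: FailoverPolicy, scm_failed: bool) -> bool:
    """Return True if the affected virtual disks become reachable again."""
    if scm_failed:
        # Per the article, a whole-SCM failure always triggers takeover by the peer SCM.
        return True
    # SAS link, port, or expander failure: some side must initiate the move.
    return policy.host_initiates_explicit or policy.controller_does_implicit


# ESX 3.5 U3 waits for implicit failover; the SCM waits for an explicit request.
esx35_u3 = FailoverPolicy(host_initiates_explicit=False, controller_does_implicit=False)

print(disks_recovered(esx35_u3, scm_failed=False))  # False: SAS path failure, nobody moves the HDDs
print(disks_recovered(esx35_u3, scm_failed=True))   # True: SCM failure, peer SCM takes over
```

Setting `host_initiates_explicit=True` models the ALUA-based explicit failover that VMware planned for a future release. In this toy model, that change alone is enough to recover from a SAS path failure.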
VMware plans to resolve this interoperability issue in a future ESX release by adding explicit failover support based on the industry-standard SCSI Asymmetric Logical Unit Access (ALUA) model. This support will cover external storage arrays and internal storage controllers, such as the Intel® Modular Server SCM, that comply with the SCSI ALUA standard specified in INCITS T10 SPC-3 or later.
Failover caused by the failure of an SCM itself is not affected by this interoperability issue. If either SCM fails, the other SCM automatically takes control of all HDDs that belonged to the failed SCM. ESX then automatically detects that the remaining SCM now presents the virtual disks built on those HDDs. These are the virtual disks the failed SCM formerly presented. ESX then directs all SCSI I/O requests for them to the remaining SCM.
To resolve this issue manually, use either of the two procedures below. Either one transfers ownership of any virtual disks that became inaccessible because of a SAS link, SAS controller port, or SAS expander failure. Ownership moves to the SCM that is not physically attached to the failed SAS path.
Use the Intel Modular Server user interface to select the virtual disks that are inaccessible because of the SAS path failure. Change their ownership to the SCM that is not attached to the failed SAS path. Wait until the Intel Modular Server user interface shows that the operation has completed.
Physically remove the SCM that is attached to the failed SAS path. This triggers failover of all virtual disks owned by that SCM to the SCM that is not attached to the failed SAS path. Wait until the Intel Modular Server user interface shows that the operation has completed.
Based on VMware KB 1007394