Before the upgrade to vCenter Server 4.1, HA was configured and working on the ESXi hosts. After the upgrade, you experience these symptoms:
- The ESXi host reconfigures successfully for HA, but immediately displays an error
- In the Summary tab, you see the error:
HA agent on <hostname> in <cluster name> has an error: error while running health check script
- The /var/log/vmware/vpx/vpxa.log contains messages similar to:
cmd=monitornodes -domain=vmware failed with error 3
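If you want to confirm this symptom before applying a workaround, a minimal Python sketch along these lines can scan the log, for example a copy retrieved from the host. It assumes the message appears exactly as quoted above, possibly preceded by a timestamp or other prefix. `find_ha_health_check_failures` and `scan_log` are hypothetical helpers for illustration, not part of any VMware tool.

```python
import re
from typing import Iterable, List

# Built from the message quoted in this article. Real log lines usually carry a
# timestamp and other prefixes, so the pattern is searched for anywhere in the
# line. The trailing word boundary keeps "error 30" from matching "error 3".
HA_HEALTH_CHECK_FAILURE = re.compile(
    r"cmd=monitornodes\s+-domain=vmware\s+failed\s+with\s+error\s+3\b"
)


def find_ha_health_check_failures(lines: Iterable[str]) -> List[int]:
    """Return the 1-based numbers of lines that contain the failure message."""
    return [
        number
        for number, line in enumerate(lines, start=1)
        if HA_HEALTH_CHECK_FAILURE.search(line)
    ]


def scan_log(path: str = "/var/log/vmware/vpx/vpxa.log") -> List[int]:
    """Scan a log file; the default path is the one named in this article."""
    with open(path, encoding="utf-8", errors="replace") as log:
        return find_ha_health_check_failures(log)
```

An empty result only means the quoted message was not found; it does not rule out other HA problems.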
This issue occurs when the HA agents on the ESXi hosts are not upgraded properly. On hosts that already experience the problem, the agents must be replaced with correctly installed ones.
This issue is resolved in vSphere 4.1 Update 1 and vSphere 4.0 Update 3. For information on updating your vCenter Server and ESXi hosts to vSphere 4.1 Update 1, see Upgrading vCenter Server, Update Manager and ESX/ESXi to vSphere 4.1 Update 1 (1034497).
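The version guidance above reduces to a simple lookup. The sketch below encodes only the two release lines this article names and returns `None` for anything else, because the article says nothing about other releases. The function name and the split of a version into a release line and an Update level are assumptions made for illustration; since the article refers to updating both vCenter Server and the ESXi hosts, apply the check to each of them.

```python
from typing import Optional

# Minimum Update level that contains the fix, per release line, as stated in
# this article. Release lines not listed here are outside its scope.
FIXED_IN = {
    "4.0": 3,  # vSphere 4.0 Update 3
    "4.1": 1,  # vSphere 4.1 Update 1
}


def includes_ha_agent_fix(release: str, update: int) -> Optional[bool]:
    """Return True/False for release lines this article covers, None otherwise.

    `release` is the major.minor version (for example "4.1") and `update` is
    the Update level, with 0 meaning the initial release.
    """
    minimum = FIXED_IN.get(release)
    if minimum is None:
        return None
    return update >= minimum
```

For example, `includes_ha_agent_fix("4.1", 0)` is `False`, so a 4.1 GA installation is still exposed and needs one of the workarounds.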
To work around this issue, replace the agents using one of these options:
- Re-install the HA agents via vCenter Server:
  - Put the affected host into maintenance mode.
  - Remove the host from the vCenter Server inventory.
  - Without rebooting the host, add it back to an HA cluster in vCenter Server.
  - Exit maintenance mode.
- Re-install the HA agents manually on the host:
  - In the vSphere Client connected to vCenter Server, right-click the host and choose Disconnect.
  - Log in to the ESXi host using Tech Support Mode. For more information, see Tech Support Mode for Emergency Support (1003677).
  - Run these commands to uninstall the vCenter Server agent (vpxa) and the HA agent from the ESXi host:
  - Without rebooting the host, reconnect it to the HA cluster in vCenter Server.
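The ordering constraints in the two workaround options can be expressed as checklists validated against the actions actually taken. In this minimal sketch the step names are hypothetical labels that paraphrase the bullets above; they are not commands or vSphere API calls, and the uninstall commands for the manual option are not reproduced. The only rule beyond step order is the one both options state: the host must not be rebooted before it is added back or reconnected.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

REBOOT = "reboot_host"  # hypothetical label for a host reboot


@dataclass(frozen=True)
class Workaround:
    steps: Tuple[str, ...]
    # The host must not be rebooted between these two steps.
    no_reboot_between: Tuple[str, str]


REINSTALL_VIA_VCENTER = Workaround(
    steps=("enter_maintenance_mode", "remove_from_inventory",
           "add_to_ha_cluster", "exit_maintenance_mode"),
    no_reboot_between=("remove_from_inventory", "add_to_ha_cluster"),
)

REINSTALL_MANUALLY = Workaround(
    steps=("disconnect_host", "log_in_tech_support_mode",
           "uninstall_vcenter_and_ha_agents", "reconnect_to_ha_cluster"),
    no_reboot_between=("uninstall_vcenter_and_ha_agents",
                       "reconnect_to_ha_cluster"),
)


def check_workaround(performed: Sequence[str], plan: Workaround) -> List[str]:
    """Return the problems found in a record of performed steps (empty means OK)."""
    problems: List[str] = []

    # Every planned step must appear, and the ones present must be in plan order.
    positions = []
    for step in plan.steps:
        if step in performed:
            positions.append(performed.index(step))
        else:
            problems.append(f"missing step: {step}")
    if positions != sorted(positions):
        problems.append("steps performed out of order")

    # No reboot may occur inside the no-reboot window.
    start, end = plan.no_reboot_between
    if start in performed and end in performed:
        window = performed[performed.index(start) + 1:performed.index(end)]
        if REBOOT in window:
            problems.append(f"host rebooted between {start} and {end}")
    return problems
```

A non-empty result lists what to revisit; for example, a reboot recorded between removing and re-adding the host is reported, because both options require the host to stay up until it is back in the cluster.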
To avoid this issue, perform these steps after the upgrade, starting before you reconnect the hosts to vCenter Server:
- Ensure that HA is enabled on the vCenter Server cluster.
- Reconnect the host to vCenter Server.
- Exit maintenance mode.
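The preventive steps hinge on enabling HA on the cluster before the host is reconnected. A minimal Python sketch of that order as a next-step function follows, assuming the host is disconnected and in maintenance mode after the upgrade. `HostState` and `next_preventive_step` are hypothetical names, and the state would have to be gathered by other means, for example by checking the vSphere Client.

```python
from dataclasses import dataclass


@dataclass
class HostState:
    """Hypothetical snapshot of the facts the preventive steps depend on."""
    cluster_ha_enabled: bool
    host_connected: bool
    host_in_maintenance_mode: bool


def next_preventive_step(state: HostState) -> str:
    """Return the next step of the preventive sequence, or "done"."""
    if state.host_connected and not state.cluster_ha_enabled:
        # The article's order requires HA to be enabled before reconnecting.
        return "order violated: host reconnected before HA was enabled"
    if not state.cluster_ha_enabled:
        return "enable HA on the vCenter Server cluster"
    if not state.host_connected:
        return "reconnect the host to vCenter Server"
    if state.host_in_maintenance_mode:
        return "exit maintenance mode"
    return "done"
```

Walking a freshly upgraded host through it yields the three steps above in order, then `"done"`.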
Based on VMware KB 1027628