Understanding IRQ Sharing in ESX Server
Note: This article is intended to help you identify possible IRQ sharing issues and potential solutions. For detailed information on IRQs and IRQ specifications, contact your system vendor.
To enable PCI devices to interrupt the CPU, every device on the PCI bus is assigned an IRQ number. The VMkernel uses discovery and interrupt rerouting mechanisms provided by the BIOS to assign these IRQ numbers. In some cases, however, the hardware design ties two or more devices to the same interrupt controller pin, so those devices end up sharing the same IRQ. Under normal circumstances, there is no performance impact and IRQ sharing goes unnoticed.
In ESX Server 3.5, devices can be owned either by the service console or by VMkernel. Interrupt processing overhead for devices owned by the service console is much higher than for devices owned by VMkernel because of an extra context switch needed for interrupt processing. Because of this, devices with high interrupt rates should not be assigned to the service console.
There is little effect on performance when IRQ sharing occurs between two devices with low interrupt rates and both devices are owned by the service console or both are owned by VMkernel. However, when one of the devices sharing the IRQ is owned by the service console and the other is owned by VMkernel, the performance impact can be significant. The impact is more severe (and more easily observed) when the interrupt rate on either device is high.
The performance impact has two causes:
- Interrupt lines shared between the service console and VMkernel incur higher overhead because of extra context switches. When a shared interrupt is raised, the VMkernel has no direct way of determining which device caused it. The CPU therefore runs the interrupt service routines for all devices using that interrupt, one after another, until it finds the device that caused the interrupt. When all the sharing devices are owned by the VMkernel, running this chain of interrupt routines does not take much time. However, when the IRQ is shared between VMkernel and the service console, executing the chain results in context switches on each interrupt. This has a significant performance impact.
- IRQ sharing limits interrupt processing to a single CPU. ESX Server was designed to make full use of the available hardware resources for optimal performance. Under normal conditions, interrupt processing is fanned out to different cores on the system, selecting the core that is least utilized. However, when the IRQ for a device in VMkernel is shared with that of a device owned by the service console, interrupt processing is limited to CPU #0. For devices with high interrupt rates, CPU #0 becomes a processing bottleneck. This problem is further aggravated by the fact that the service console also runs on CPU #0, even though its resource consumption is minimal.
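The first cause can be illustrated with a toy model. The sketch below is not VMkernel code and not a measurement. It assumes, purely for illustration, that every handler on the shared line runs for each interrupt and that each service console handler costs one switch into the service console and one switch back. The names `Handler`, `context_switches_per_interrupt`, and `total_context_switches` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Handler:
    """One interrupt service routine registered on a shared IRQ line."""
    device: str
    owner: str  # "VMK" (VMkernel) or "COS" (service console)


def context_switches_per_interrupt(chain):
    """Context switches needed to walk the handler chain once.

    Assumption (not from the article): each COS-owned handler costs two
    switches, one into the service console and one back to the VMkernel.
    """
    return 2 * sum(1 for handler in chain if handler.owner == "COS")


def total_context_switches(chain, interrupts):
    """Context switches accumulated over a number of interrupts."""
    return interrupts * context_switches_per_interrupt(chain)


# A NIC sharing its line with a service console USB controller.
mixed = [Handler("vmnic1", "VMK"), Handler("usb-uhci", "COS")]
# Two VMkernel devices sharing a line.
vmk_only = [Handler("vmnic1", "VMK"), Handler("aic79xx", "VMK")]

print(total_context_switches(mixed, 1_000_000))     # 2000000
print(total_context_switches(vmk_only, 1_000_000))  # 0
```

Under these assumptions the overhead scales with the interrupt rate of the busiest device on the line, which is why a quiet USB controller sharing an IRQ with a busy NIC is costly even though the USB controller itself rarely interrupts.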
Determining if IRQ Sharing Issues Affect Your System's Performance
The tell-tale sign of IRQ sharing between VMkernel and the service console is a high number of interrupts being serviced by PCPU0 (CPU #0 on the physical host) while the other CPUs are relatively lightly loaded. The high interrupt rates might sometimes render the service console unusable and cause high variation in ESX Server performance.
The most common service console device that causes IRQ sharing is the USB controller. On the VMkernel side, the network and storage controllers are susceptible to IRQ sharing. IRQ sharing is more common in dual- and quad-port NICs and storage HBAs than in single-port controllers. However, IRQ sharing is not restricted to these devices. The effect is more visible under I/O-intensive loads (with high interrupt rates). The problem might manifest itself as variation in performance or as an absolute drop in ESX Server performance.
To determine if your setup suffers from IRQ sharing, list the IRQ assignment in VMkernel by typing the following at the command line of the service console:
> cat /proc/vmware/interrupts
This lists the interrupt usage. The output looks similar to the following example:
Vector PCPU 0 PCPU 1 PCPU 2 PCPU 3
0x21: 0 0 0 0 VMK ACPI Interrupt
0x29: 1 0 0 0 <COS irq 1 (ISA edge)>, VMK keyboard
0x31: 4 0 0 0 <COS irq 3 (ISA edge)>
0x39: 4 0 0 0 <COS irq 4 (ISA edge)>
0x41: 0 0 0 0 <COS irq 6 (ISA edge)>
0x49: 0 0 0 0 <COS irq 7 (ISA edge)>
0x51: 0 0 0 0 <COS irq 8 (ISA edge)>
0x59: 0 0 0 0 <COS irq 12 (ISA edge)>
0x61: 0 0 0 0 <COS irq 13 (ISA edge)>
0x69: 43762 0 0 0 COS irq 14 (ISA edge)
0x71: 0 0 0 0 <COS irq 15 (ISA edge)>
0x79: 7917 542 3583 8292 <COS irq 16 (PCI level)>, VMK aic79xx
0x81: 1544 0 0 0 COS irq 17 (PCI level), VMK aic79xx
0x89: 1212177 0 0 0 COS irq 19 (PCI level), VMK vmnic1
0x91: 90997 0 0 0 COS irq 18 (PCI level), VMK vmnic0
0x99: 152 0 0 0 <COS irq 20 (PCI level)>, VMK qla2300
0xdf: 8904447 10869326 11006262 10844861 VMK timer
0xe1: 74 5582 12007 16005 VMK monitor
0xe9: 60854 443843 477389 509724 VMK resched
0xec: 0 0 0 0 VMK ucodeUpdate
0xf1: 3 40 68 100 VMK tlb
0xf9: 243265 0 0 0 VMK noop
0xfc: 0 0 0 0 VMK thermal
0xfd: 0 0 0 0 VMK lint1
0xfe: 0 0 0 0 VMK error
0xff: 0 0 0 0 VMK spurious
This output lists the interrupt vectors with the number of interrupts fielded by each physical CPU (PCPU). The last column lists the device or devices associated with the particular IRQs. Along with the device name, the owner of the device is also listed. COS refers to service console ownership, and VMK refers to VMkernel ownership. (In older versions of ESX Server, the service console was named the console operating system, or COS.)
For the sake of analysis, you can ignore service console devices whose names are enclosed in angle brackets (<>), because the service console does not load a driver for them. Any other vector that lists both a COS entry and a VMK device is shared between the service console and VMkernel. Shared vectors with low interrupt counts (for example, 0x81) can also be ignored because their performance impact is minimal.
In the above example, vectors 0x81, 0x89, and 0x91 are shared between service console and VMK devices. Vector 0x99 also lists a COS IRQ next to a VMK device, but the COS entry is in angle brackets, so no service console driver is loaded for it. All interrupts for these four vectors are fielded by PCPU 0, while the other PCPUs show zero interrupt counts. The count for 0x81 is low enough that you do not need to analyze it. In contrast, vectors 0x89 and 0x91 (for vmnic1 and vmnic0) show high interrupt counts. The service console IRQs shared with vmnic1 and vmnic0 are IRQ 19 and IRQ 18, respectively.
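The manual scan described above can be sketched as a small script that runs on a workstation against saved output of /proc/vmware/interrupts. This is a minimal sketch, not a VMware tool: the function names and the `min_interrupts` cutoff of 10000 are assumptions chosen for illustration, and the parser assumes the line layout shown in the example output. It applies the angle-bracket rule, so COS entries in angle brackets are never treated as shared.

```python
import re

# Matches lines such as "0x89: 1212177 0 0 0 COS irq 19 (PCI level), VMK vmnic1".
LINE_RE = re.compile(r"^(0x[0-9a-fA-F]+):\s+((?:\d+\s+)+)(.+)$")
# A COS entry without angle brackets: the service console has a driver loaded.
LOADED_COS_RE = re.compile(r"^COS irq (\d+)")

SAMPLE = """\
Vector PCPU 0 PCPU 1 PCPU 2 PCPU 3
0x69: 43762 0 0 0 COS irq 14 (ISA edge)
0x79: 7917 542 3583 8292 <COS irq 16 (PCI level)>, VMK aic79xx
0x81: 1544 0 0 0 COS irq 17 (PCI level), VMK aic79xx
0x89: 1212177 0 0 0 COS irq 19 (PCI level), VMK vmnic1
0x91: 90997 0 0 0 COS irq 18 (PCI level), VMK vmnic0
0x99: 152 0 0 0 <COS irq 20 (PCI level)>, VMK qla2300
0xdf: 8904447 10869326 11006262 10844861 VMK timer
"""


def parse_vmware_interrupts(text):
    """Turn the listing into dicts with vector, per-PCPU counts, and owners."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # header line or unrecognized layout
        vector, counts, description = match.groups()
        rows.append({
            "vector": vector,
            "counts": [int(c) for c in counts.split()],
            "owners": [part.strip() for part in description.split(",")],
        })
    return rows


def shared_vectors(rows, min_interrupts=10000):
    """Return (vector, COS IRQ, VMK device) for busy vectors shared with a loaded COS driver."""
    flagged = []
    for row in rows:
        matches = [LOADED_COS_RE.match(owner) for owner in row["owners"]]
        cos_irqs = [int(m.group(1)) for m in matches if m]
        vmk_devices = [o[len("VMK "):] for o in row["owners"] if o.startswith("VMK ")]
        if cos_irqs and vmk_devices and sum(row["counts"]) >= min_interrupts:
            flagged.append((row["vector"], cos_irqs[0], vmk_devices[0]))
    return flagged


print(shared_vectors(parse_vmware_interrupts(SAMPLE)))
# [('0x89', 19, 'vmnic1'), ('0x91', 18, 'vmnic0')]
```

Passing `min_interrupts=0` also reports 0x81, which is shared but too quiet to matter in this example.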
In this example, you need to remove the device conflicting with vmnic0 and vmnic1 from the service console. To find the offending device in the service console, list the interrupt vectors:
> cat /proc/interrupts
This lists the IRQ lines assigned to the devices in the service console.
0: 1744046 vmnix-edge timer
1: 3 vmnix-edge keyboard
2: 163071 vmnix-edge VMnix interrupt
14: 44022 vmnix-edge ide0
17: 1499 vmnix-level usb-uhci, ehci-hcd
18: 91236 vmnix-level usb-uhci
19: 1303228 vmnix-level usb-uhci
You can see that both IRQ 18 and IRQ 19 are in use by usb-uhci, a USB Universal Host Controller device.
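The same lookup can be sketched in code for saved output of /proc/interrupts from the service console. This is an illustrative sketch under the assumption that each line has the layout shown above (IRQ number, a single count column, interrupt type, then a comma-separated list of drivers). The function names are hypothetical.

```python
import re

# Matches lines such as "17: 1499 vmnix-level usb-uhci, ehci-hcd".
COS_LINE_RE = re.compile(r"^(\d+):\s+(\d+)\s+(\S+)\s+(.+)$")

COS_SAMPLE = """\
0: 1744046 vmnix-edge timer
1: 3 vmnix-edge keyboard
2: 163071 vmnix-edge VMnix interrupt
14: 44022 vmnix-edge ide0
17: 1499 vmnix-level usb-uhci, ehci-hcd
18: 91236 vmnix-level usb-uhci
19: 1303228 vmnix-level usb-uhci
"""


def parse_cos_interrupts(text):
    """Map each service console IRQ number to the drivers registered on it."""
    table = {}
    for line in text.splitlines():
        match = COS_LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines that do not follow the assumed layout
        irq, _count, _kind, names = match.groups()
        table[int(irq)] = [name.strip() for name in names.split(",")]
    return table


def drivers_on_irqs(table, irqs):
    """Look up the drivers behind the COS IRQs found to be shared."""
    return {irq: table.get(irq, []) for irq in irqs}


# IRQ 19 and IRQ 18 are the service console IRQs shared with vmnic1 and vmnic0.
print(drivers_on_irqs(parse_cos_interrupts(COS_SAMPLE), [19, 18]))
# {19: ['usb-uhci'], 18: ['usb-uhci']}
```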
Resolving the IRQ Sharing Issue
To resolve IRQ sharing conflicts:
- Disable the problematic device, if it is not used.
- Move the device to a different PCI slot.
- Coalesce the processing of service console device interrupts (ESX 3.5 Update 5 only).
Note: Before disabling a USB device, check with your hardware vendor whether it is used by a remote access card.
Disabling the Device
If the conflicting device in the service console is unused (here, the USB controller), disable the device. Remove the usb-uhci module by using the command:
> rmmod usb-uhci
For your own system, replace usb-uhci with the appropriate device you determine from your particular output.
Other alternatives include:
- Preventing the service console from loading the driver for the device. To do this, remove the references to the driver from the file /etc/modules.conf.
- Disabling USB devices in the BIOS (on systems that support it). Disabling USB controllers in the BIOS also prevents the USB drivers from loading on subsequent reboots.
- Installing ESXi Server instead of ESX Server on hardware known to have an interrupt sharing problem.
Note: If the conflict can be avoided through a BIOS setting (for example, disabling USB or possibly downgrading it to version 1.1), that method is preferred because the fix persists through future upgrades. If you run rmmod or modify modules.conf, you must repeat the process after an upgrade.
After the usb-uhci module is removed, the output of /proc/interrupts is:
0: 8690704 vmnix-edge timer
1: 3 vmnix-edge keyboard
2: 2238513 vmnix-edge VMnix interrupt
14: 212504 vmnix-edge ide0
17: 1715 vmnix-level ehci-hcd
Notice that COS IRQ 18 and IRQ 19 are no longer in use. Run a networking workload over vmnic0 and vmnic1 and check /proc/vmware/interrupts. The output looks like this:
Vector PCPU 0 PCPU 1 PCPU 2 PCPU 3
0x71: 0 0 0 0 <COS irq 15 (ISA edge)>
0x79: 32895 36823 68596 74985 <COS irq 16 (PCI level)>,VMK aic79xx
0x81: 1760 0 0 0 COS irq 17 (PCI level), VMK aic79xx
0x89: 1596687 26796 534484 289717 <COS irq 19 (PCI level)>, VMK vmnic1
0x91: 344252 1035 1951 768 <COS irq 18 (PCI level)>, VMK vmnic0
0x99: 616 0 0 0 <COS irq 20 (PCI level)>,VMK qla2300
0xdf: 45701368 115529363 127721751 128342600 VMK timer
Note that interrupts are now fanned out across all available CPUs. COS irq 18 and COS irq 19 are now within angle brackets, which signifies that no module is loaded for them in the service console and there is no interrupt sharing.
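Because the counters appear to be cumulative, the fan-out is easier to judge from the difference between two snapshots than from the raw totals. The sketch below assumes that the before and after listings in this article come from the same boot, which the article does not state. The helper names are hypothetical.

```python
def interrupt_delta(before, after):
    """Per-PCPU interrupts fielded between two snapshots of one vector."""
    return [a - b for b, a in zip(before, after)]


def pcpu_shares(counts):
    """Fraction of a vector's interrupts fielded by each PCPU."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [count / total for count in counts]


# Vector 0x89 (vmnic1), copied from the two listings in this article.
before = [1212177, 0, 0, 0]
after = [1596687, 26796, 534484, 289717]

delta = interrupt_delta(before, after)
print(delta)  # [384510, 26796, 534484, 289717]
print(round(pcpu_shares(before)[0], 2))  # 1.0
print(round(pcpu_shares(delta)[0], 2))   # 0.31
```

Under that assumption, PCPU 0 fielded every vmnic1 interrupt while the IRQ was shared, and roughly 31 percent of them once the usb-uhci module was removed.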
Moving the Device
In certain cases, the particular service console device cannot be disabled because it is required for the correct functioning of the service console (for example, certain Ethernet controllers). When this is the case, try moving the card to a different PCI slot. Because the interrupt lines allocated to devices are determined by their physical locations in the machine, changing the slots in which the cards are inserted might cause a reassignment of IRQ numbers. Be sure to recheck all controllers for interrupt sharing after making this change.
Coalescing the Processing of Service Console Device Interrupts (ESX 3.5 Update 5 Only)
As described above, the performance impact of IRQ sharing has two causes: the context switch overhead of the service console and the processing bottleneck on CPU #0. Either can degrade throughput for VMkernel devices that share IRQs with the service console. In addition, the processing bottleneck on CPU #0 might also affect service console performance. For cases where the impact is largely due to context switch overhead, ESX 3.5 Update 5 includes a method to improve performance by coalescing the processing of service console device interrupts that share an IRQ with VMkernel devices.
This method is disabled by default. When it is enabled, interrupts are coalesced and processed only after a configurable threshold is crossed. This decreases the number of spurious interrupts processed by the service console when the interrupts are generated by a VMkernel device sharing the IRQ, so performance can improve because there are fewer context switches. However, this approach penalizes the processing of genuine interrupts from service console devices, because unacknowledged interrupts are redelivered until the coalescing threshold is crossed. In the typical scenario, where a service console device with a low interrupt rate shares an IRQ with a VMkernel device with a high interrupt rate, this is a worthwhile trade-off and should result in an overall performance gain.
Valid values for the coalescing threshold are 0 through 10. The default is 0, which disables coalescing. A nonzero value sets the number of interrupts that are coalesced before one is processed by the service console for any shared IRQ. We recommend using the lowest value at which you observe an acceptable performance improvement. In ESX 3.5 Update 5, you can query and set the coalescing threshold from the VMware Infrastructure Client as follows:
Click the Configuration tab.
Click Advanced Settings.
View or set Irq.IRQNumHostPend.
Alternatively, run the following commands from the service console:
> esxcfg-advcfg --get /Irq/IRQNumHostPend
> esxcfg-advcfg --set <value> /Irq/IRQNumHostPend
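If you script this setting, it helps to reject out-of-range values before the command is issued. The following is a minimal sketch that only builds the argument list for the set command shown above. It does not run anything, and the function name `irq_coalescing_command` is hypothetical.

```python
def irq_coalescing_command(threshold):
    """Build the esxcfg-advcfg arguments that set the coalescing threshold.

    Valid values are 0 through 10; 0 disables coalescing.
    """
    if isinstance(threshold, bool) or not isinstance(threshold, int):
        raise TypeError("threshold must be an integer")
    if not 0 <= threshold <= 10:
        raise ValueError("threshold must be between 0 and 10")
    return ["esxcfg-advcfg", "--set", str(threshold), "/Irq/IRQNumHostPend"]


print(" ".join(irq_coalescing_command(2)))
# esxcfg-advcfg --set 2 /Irq/IRQNumHostPend
```

Starting from a low value such as 1 or 2 and increasing it only if needed follows the recommendation to use the lowest value that gives an acceptable improvement.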
Based on VMware KB 1003710