On modern processors, hardware-assisted virtualization outperforms classical binary translation for most workloads. But hardware virtualization has a potential problem: virtualization exits are expensive. While hardware virtualization executes guest instructions at native speed, guest/VMM transitions can sap performance. Hardware designers attacked this problem both by reducing guest/VMM transition costs and by adding architectural extensions such as nested paging support to avoid exits.
This paper proposes complementary software techniques for reducing the exit frequency. In the simplest form, our VMM inspects guest code dynamically to detect back-to-back pairs of instructions that both exit. By handling both instructions of such a pair when the first one exits, we avoid the second exit and save 50% of the pair's transition costs. We then generalize from pairs to clusters of instructions that may include loops and other control flow. We use a dynamic binary translator to generate, and cache, custom translations for handling exits. The analysis cost is paid once, when the translation is generated, and is amortized over many future executions.
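The pair-handling idea can be illustrated with a toy model, sketched here in Python under simplifying assumptions: guest code is straight-line, each instruction occupies one program-counter slot, and whether an instruction exits is known statically. The names `Insn`, `PairTranslator`, and `translation_for` are hypothetical. This is not VMware's implementation, and it covers only pairs, not general clusters with loops and control flow. The model counts guest/VMM transitions with and without pairing, and caches the per-site analysis so it runs once per exiting instruction site rather than once per execution.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    pc: int      # hypothetical: one pc slot per instruction
    exits: bool  # True if executing this instruction natively traps to the VMM


def count_exits_naive(trace: list[Insn]) -> int:
    """Without pairing, every exiting instruction costs one guest/VMM round trip."""
    return sum(1 for insn in trace if insn.exits)


class PairTranslator:
    """Toy model of exit-pair handling with a translation cache (assumption, not the paper's code).

    On an exit at `pc`, look one instruction ahead in the static code. If that
    instruction also exits, the cached "translation" emulates both, so the
    second exit never happens. The analysis result is cached per pc.
    """

    def __init__(self, code: dict[int, Insn]):
        self.code = code                # pc -> Insn (static guest code)
        self.cache: dict[int, int] = {}  # pc -> instructions handled per exit
        self.analyses = 0               # how many times the look-ahead ran

    def translation_for(self, pc: int) -> int:
        if pc not in self.cache:
            self.analyses += 1
            nxt = self.code.get(pc + 1)
            self.cache[pc] = 2 if (nxt is not None and nxt.exits) else 1
        return self.cache[pc]

    def run(self, trace: list[Insn]) -> int:
        """Replay a dynamic trace and return the number of exits taken."""
        exits = 0
        i = 0
        while i < len(trace):
            insn = trace[i]
            if insn.exits:
                exits += 1
                i += self.translation_for(insn.pc)  # skip the paired instruction too
            else:
                i += 1                              # runs natively, no exit
        return exits


if __name__ == "__main__":
    # Hypothetical loop body: an exiting pair, a non-exiting instruction, a lone exit.
    body = [Insn(0, True), Insn(1, True), Insn(2, False), Insn(3, True)]
    code = {insn.pc: insn for insn in body}
    trace = body * 100  # the body executes 100 times

    t = PairTranslator(code)
    print("naive exits: ", count_exits_naive(trace))  # 300
    print("paired exits:", t.run(trace))              # 200
    print("analyses:    ", t.analyses)                # 2
```

In this demo the exiting pair drops from two exits to one per iteration (the 50% saving for that pair), while the lone exit is unchanged, so the total falls from three exits per iteration to two. The look-ahead runs only twice regardless of the iteration count, which mirrors the "pay once, amortize over many executions" design choice.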
Our techniques have been fully implemented and validated in recent versions of VMware products. We show that clusters can provide some of the same benefits for device I/O performance as device paravirtualization. Moreover, we demonstrate that clusters often enable substantial gains for nested virtual machines, delivering speedups as high as 1.68x. Intuitively, this result stems from the fact that transitions between the inner guest and the inner VMM are extremely costly, because they are implemented in software by the outer VMM.
Ole Agesen, Jim Mattson, Radu Rugina, Jeffrey Sheldon (VMware)