Modern chip-level multiprocessors (CMPs) contain multiple processor cores sharing a common last-level cache, memory interconnects, and other hardware resources. Workloads running on separate cores compete for these resources, often resulting in highly variableperformance. It is generally desirable to co-schedule workloads that have minimal resource contention, in order to improve both performance and fairness. Unfortunately, commodity processors expose only limited information about the state of shared resources such as caches to the software responsible for scheduling workloads that execute concurrently. To make informed resource management decisions, it is important to obtain accurate measurements of per-workload cache occupancies and their impact on performance, often summarized by utility functions such as miss-ratio curves (MRCs). In this paper, we first introduce an efficient online technique for estimating the cache occupancy of individual software threads using only commonly-available hardware performance counters.
Carl Waldspurger, Puneet Zaroo (VMware); Richard West (Boston University); Xiao Zhang (University of Rochester)