A Runtime Driver Verification System Using VProbes

Bo Chen
VMware, Inc.
chenb@vmware.com

Abstract

We introduce DDV, a new runtime verification system for device drivers for the VMware ESXi hypervisor. We took an innovative approach of building the core runtime verification system on VMware VProbes, a dynamic instrumentation framework similar to DTrace and SystemTap but optimized for virtualized environments. Typical use of these dynamic instrumentation frameworks is limited to the collection of data for offline troubleshooting, and we are not aware of any previous use of VProbes or similar frameworks to build complex systems for runtime analysis and verification. Our experience showed that this approach is not only feasible but has also made DDV much easier to develop, evolve, and use. Using runtime check and fault injection logic based on the ESXi driver API specification, DDV has already found 40 unique defects in several extensively tested device drivers widely used in data centers. Compared to similar driver verification systems such as the Windows Driver Verifier or KEDR for Linux, it has unique advantages in comprehensiveness and extensibility due to our approach.

1. Background and Introduction

Device drivers are critical to the reliability of operating systems. In mainstream operating systems, most device drivers still run in privileged mode and share the same address space as the rest of the operating system kernel. Therefore, defects in a device driver can and usually do crash the operating system.

In today’s virtualized data centers, the “operating system” that manages physical devices is the bare metal (Type 1) hypervisor, such as VMware ESXi. The hypervisor multiplexes the hardware resources to guest operating systems through virtual devices. Therefore, as shown in Figure 1, the guest OS now runs only the drivers for the virtual devices (typically only one or two for each class of device), and the hypervisor must deal with the multitude of device drivers from various device vendors: a recent version of ESXi ships with dozens of drivers and supports many more through a certification program. Moreover, a fault in a driver for a guest OS affects only the virtual machine (VM) using it, whereas a fault in a driver for the hypervisor can affect all the VMs running on it, hundreds in many cases.

Unfortunately, device drivers are quite prone to defects. Analysis by Chou et al. of Linux 2.4.1 showed that drivers have far higher defect rates than the kernel average and account for over 70% of all bugs in the kernel [1]. In Windows XP, driver defects caused 57% of all system crashes [2]. Our own data at VMware showed that a similar portion of ESXi faults were actually due to defects in device drivers.

Figure 1. Management of Physical and Virtual Devices in the Virtualized Data Center.


Multiple complementary techniques must be used for detecting defects in such critical yet error-prone pieces of software. These include the two most commonly used methods in software defect detection: manual code inspection and testing. However, both face unique challenges when used for device drivers. Effective manual code inspection requires expertise in the code. For device drivers, the expertise required includes several distinct but interconnecting functional areas: the hardware, the firmware, the software structure of the driver, and the operating system it runs on. Few people, if any, who work for either the device vendor or the OS vendor have such knowledge. Testing, which requires control of input and observation of output, faces a similar problem, because direct observability and controllability of hardware and firmware are poor in software testing.

Verification is another category of methods typically used for critical software, and is commonly divided into static and dynamic verification. Static verification, which checks if software meets certain correctness properties without running it, can be more thorough but is typically more prone to false positives (i.e., incorrectly reporting a defect or violation). Several types of static verification are already in use for ESXi, including its device drivers. Dynamic verification, also known as runtime verification, checks and sometimes guides the execution of the software. It often makes conventional testing much more effective by checking properties of the software not checked by the test cases, and in some techniques by expanding the coverage of its execution paths.

For runtime verification of device drivers, an obvious focal point is the driver API. The driver API, provided by OS and hypervisor vendors to driver developers, defines exactly how drivers access resources and services from the kernel and vice versa, in a programmatic manner. This is precisely the type of specification or “contract” that various verification techniques can help to check. Moreover, the API defines the boundary between the kernel and the driver, and is an area where a large percentage of driver defects reside or are exhibited. It is thus not surprising that a runtime driver verification tool at the driver API layer has been available since the first version of ESXi. In that design, drivers to be verified are built against a different set of driver API headers, which redirect the driver’s calls to the runtime verifier (see Figure 2). The verifier performs applicable checks prior to the call, then calls the actual driver API function and may perform additional checks after the call. If fault injection, another feature of the dynamic driver verifier, is turned on, the verifier does not call the actual driver API function and instead returns a “fake” value that signifies a fault. For example, to inject a heap allocation failure, it skips the kmalloc call from the driver and simply returns NULL. However, requiring the driver to be recompiled against a different set of headers whenever it needs to be verified turned out to cause too much hassle in regular testing. In addition, writing the verification features in C proved to be laborious and prone to error. Such errors occur in code that shares the same address space as the driver, and their symptoms are often indistinguishable from those of driver defects.

Therefore, we turned our eyes to VMware’s dynamic instrumentation framework, VProbes [3]. Like other dynamic instrumentation frameworks such as DTrace [4] and SystemTap [5], it enables users to specify events of interest and the actions to be performed when the events occur. These instrumentations, also known as probes, are injected on demand into the running binary, requiring no recompilation, rebooting, or reloading, and have no effect on the system when inactive. Thus, implementing a driver verification system on VProbes would not only make it more convenient to use in regular driver testing, but would also open it to use cases such as detecting and diagnosing potential driver issues on a live system. Additionally, scripts written in Emmett, the language that VProbes uses to specify what to do in response to which events, are extensively checked for errors prior to and during execution. VProbes guarantees that the only effects a script can have on the instrumented system (besides performance overhead) are those specified by users through well-defined methods.

Figure 2. Compile-Time Instrumentation of kmalloc() in the old DDV design.


Still, VProbes and other dynamic instrumentation frameworks are generally used to collect data for troubleshooting or other types of analysis. Like DTrace’s D language or the SystemTap scripting language, Emmett provides a set of features designed mainly to collect, filter, and aggregate data. Other features typically found in general-purpose programming languages, such as dynamically allocated memory and string or array operations, are notably missing. Thus, although two other major analytical tools are based on VProbes (one for detecting data races and the other for VM intrusion detection [6]), both use VProbes to collect data and then use components written in another language for offline analysis of the data. We are not aware of any major tool that performs online analysis as complex as driver verification in VProbes or other dynamic instrumentation frameworks. We were also concerned about the performance overhead of running these online analyses in Emmett, considering that some driver API functions might be called hundreds of thousands of times per second when processing I/O traffic.

Despite the concerns, we proceeded with developing DDV, which consists of a core supporting framework and several types of basic verification features for the ESXi driver API. We did encounter a number of challenges due to limits in the Emmett language or other aspects of VProbes, but we were able to overcome most of them with innovative use of VProbes features, which are discussed in sections 3 and 4 of this paper. We were able to cover both the Linux-like VMKLinux driver API and the new VMware ESXi Native Driver API (which has more than two hundred API functions with applicable runtime verification rules) and developed many new checking, fault-injection, rare-event-injection, and supporting features. Compared to the original runtime verification system written in C, which covers only parts of the VMKLinux Driver API and has fewer features, the VProbes-based system was developed in less time, has fewer lines of code, and has been used to detect many more defects, demonstrating the effectiveness of VProbes for complex runtime analysis.

Building DDV on VProbes also gave it two additional advantages not possible with other approaches we considered:

  • Comprehensive observability and controllability – With a unified interface, VProbes can listen to a variety of events throughout every layer of the ESXi virtualized environment. During these events it can gather and correlate data from different transient or persistent data sources, and it can perform several types of state modifications. This enables development of features that, for example, monitor an I/O operation all the way from the guest OS down to the ESXi kernel I/O stack, through the driver interface and to the driver itself.
  • Easy third-party extensibility – Driver development is often a collaborative effort between device and OS vendors, who might not have each other’s source code. Because VProbes does not require using or compiling the source in order to function, each party can develop verification or diagnostic features specific to its component and integrate them during use. The user-defined “static probes,” which do require source code modification, can provide implementation-independent events and data to share among various features and components.

2. Architecture

At a high level, the way DDV works is very simple (see Figure 3). A user starts DDV by having VProbes load the DDV Emmett script that contains probes for all instrumented API functions. When an instrumented function is called, VProbes notifies DDV, which first determines if it is a valid call from the driver being verified. If it is, the applicable check features as defined in the probe body are executed. If fault injection is requested for the call, register or memory state modification that simulates the fault is done, and the call to the actual API function is skipped. Otherwise the function executes as normal, and applicable post-call DDV checks are executed. Logging of errors or diagnostic information can also happen in any of these steps, as required by the user. DDV factors out the common verification and support features to a “library” of Emmett functions, so that the logic specific to each instrumented function can be written in as few as 1 line of code and rarely more than 10. A command line interface (CLI) was written in Python for users to start, stop, configure, and control DDV.
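The control flow described above can be sketched in Python as follows. This is only an illustration of the sequence of steps, not DDV's Emmett code: the names ProbeSpec and handle_call, and the representation of checks as callables that return an error string or None, are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ProbeSpec:
    """Per-function verification logic (hypothetical structure)."""
    pre_checks: List[Callable[[tuple], Optional[str]]] = field(default_factory=list)
    post_checks: List[Callable[[tuple, object], Optional[str]]] = field(default_factory=list)
    fault_value: object = None   # value handed back when a fault is injected
    inject_fault: bool = False   # set by a fault-injection command


def handle_call(spec, caller_module, dut_module, real_fn, args, log):
    """Process one intercepted API call in the order described in the text."""
    if caller_module != dut_module:
        return real_fn(*args)            # not the driver being verified
    for check in spec.pre_checks:
        error = check(args)
        if error:
            log.append(error)
    if spec.inject_fault:
        return spec.fault_value          # the real function is skipped
    result = real_fn(*args)
    for check in spec.post_checks:
        error = check(args, result)
        if error:
            log.append(error)
    return result
```

In this sketch the per-function part is just the contents of a ProbeSpec, which mirrors the idea that shared logic lives in a library while each instrumented function contributes only a few lines.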

Figure 3. Core DDV Architecture.


3. Runtime Checks

3.1 Local Checks
The simplest verification features to implement are those we called “local checks,” which are checks that can be performed using information directly accessible from the driver API call context. Examples include checking if a parameter is not NULL, or if it is in a specific range, or if a specific flag is used in a specific execution context (e.g., regular “world” vs. interrupt vs. “bottom half” context). Implementing them in DDV is not very different from writing them in C.

Note that these types of checks can be, and often are, already implemented in the kernel, in which case the kernel might return an error code to the caller, log a warning or error message, or in some cases halt (i.e., crash) the system, depending on whether it is a debug or release build of ESXi. Nevertheless, DDV still implements them for two reasons. Theoretically, as a runtime verification system, DDV does not care whether or how the kernel actually implements these checks, and instead is only concerned with the specification of the API. Practically, having DDV checks in addition to kernel checks is often useful, because DDV tries to log a variety of information useful for debugging the issue, whereas checks in the kernel typically focus on minimizing performance overhead and preventing the error from doing damage to the system.

3.2 Object State Checks
Almost all driver API functions are used by drivers to access one or more objects defined by the kernel. These objects can be hardware resources, such as heap memory, PCI ports, and interrupt numbers; kernel services, such as timers, work queues, and locks; or upper-layer abstractions such as network or SCSI device handles. An API call might try to get or change the states of these objects and is only legitimate when the object is in certain states according to the API specification. Object state check features need to maintain the objects’ states throughout the transitions and check for violations of transition rules.

DDV provides a set of common functions to implement object state transition rules. For example, for the VMKLinux driver API function disable_irq(), the state transition rule is defined by the line ddvTryAdjustBy(KOT_IRQ_DISABLE, irq, 1), which means “try to increase the value of the kernel object of type KOT_IRQ_DISABLE (i.e., the IRQ disable count) with the key equal to the parameter irq, by 1.” If there is no object of type KOT_IRQ_DISABLE with key irq, DDV reports an error about an attempt to change an invalid or nonexistent KOT_IRQ_DISABLE object.
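The semantics of such a rule can be modeled with a small Python sketch. The class ObjectStateStore and its methods are hypothetical stand-ins for the DDV library functions; only the behavior described above (adjust an existing object, report an error for a missing one) is taken from the text.

```python
class ObjectStateStore:
    """Toy model of tracked kernel object states (hypothetical names)."""

    def __init__(self):
        self._states = {}   # (object_type, key) -> integer state
        self.errors = []

    def create(self, object_type, key, value=0):
        """Start tracking an object, e.g. when an IRQ is requested."""
        self._states[(object_type, key)] = value

    def try_adjust_by(self, object_type, key, delta):
        """Add delta to the state, or record an error if the object is unknown."""
        slot = (object_type, key)
        if slot not in self._states:
            self.errors.append(
                f"invalid or nonexistent {object_type} object, key={key}")
            return False
        self._states[slot] += delta
        return True

    def get(self, object_type, key):
        return self._states.get((object_type, key))
```

With this model, the disable_irq() rule corresponds to store.try_adjust_by("KOT_IRQ_DISABLE", irq, 1).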

Storing the states of these objects in the VProbes Emmett language proved to be a challenge. The only collection data type in the Emmett language that supports read and write of individual elements is an associative array known as a bag. The capacity of a bag must be statically declared, and both the keys and values must be 64-bit integers. Thus, to store several properties associated with a kernel object, we must either pack them into one 64-bit integer if possible, or have more than one bag entry, each associated with a property. In the latter case a single kernel object type (e.g., ESXi native driver API heap allocation) is expressed in two or more DDV object types (e.g., the heapID and the size of the allocations).

In our initial implementation, each type was stored in a separate bag. Thus we had bags named timer_state, heap_allocSize, heap_heapID, and so on. The capacity of each bag was decided manually according to the maximum concurrent number of that type of object that a driver can use. For example, we found that drivers typically use fewer than 512 timers concurrently, so the capacity of timer_state was 512. However, the VProbes heap shared by all these bags has a relatively limited capacity, and deciding the size of the bags became a difficult balancing act to ensure enough capacity for each type of object while also ensuring that their total size did not exceed the VProbes heap size. A setting of the sizes that could accommodate all types of objects for one driver might not be suitable for another, or even for different devices using the same driver.

Thus in the current implementation, all types of objects are stored in a single large bag that consumes almost the entire VProbes heap. Each key used in this bag is composed of the same key used in the previous multibag implementation, plus a new object type ID in the upper 16 bits. This is possible because the keys we use to identify a kernel object are either the x86-64 canonical address of the object, for which the upper 16 bits are sign extensions of bit 47 and thus contain no information, or a number that uses 32 bits or fewer, such as the IRQ or I/O port number. This design avoids any waste of limited VProbes heap caused by varying kernel object usage by different drivers. The example shown in Figure 4 illustrates how two heap allocations and one timer state stored at one of the allocated heap locations are handled under the previous and current models.
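The key layout can be illustrated with plain integer arithmetic. The following Python sketch assumes the type ID occupies bits 48 to 63 as described; the function names and the type ID values used in the usage note are made up for the example.

```python
TYPE_SHIFT = 48
LOW_MASK = (1 << TYPE_SHIFT) - 1      # bits 0..47
U64_MASK = (1 << 64) - 1


def pack_key(type_id, key):
    """Place a 16-bit type ID above the low 48 bits of the original key."""
    if not 0 <= type_id < (1 << 16):
        raise ValueError("type ID must fit in 16 bits")
    return (type_id << TYPE_SHIFT) | (key & LOW_MASK)


def unpack_key(packed):
    """Recover (type_id, key), restoring the sign extension of bit 47."""
    type_id = packed >> TYPE_SHIFT
    low = packed & LOW_MASK
    if low & (1 << 47):
        low |= U64_MASK & ~LOW_MASK   # canonical address in the upper half
    return type_id, low
```

For instance, unpack_key(pack_key(3, 0xFFFF800012345678)) gives back (3, 0xFFFF800012345678), and a small key such as IRQ number 11 round-trips unchanged because its bit 47 is clear.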

Figure 4. DDV object state storage using multiple bags vs. using one bag.



3.3 Object Leak Checks

A special type of object state transition rule violation is not calling the necessary release/cleanup function. It includes not only heap memory leaks, but also failure to unmap DMA memory, release I/O ports or interrupt numbers, destroy locks or timers, and so on. In fact, almost every type of kernel object tracked by DDV expects a release/cleanup function call from the driver. Failure to make that call could lead to problems much later, making diagnosis very difficult. Because there is no driver-initiated state transition to check against (the driver API call is simply missing), checks for this type of violation must be triggered elsewhere.

An obvious trigger is the exit of the module cleanup function called during driver unload. DDV intercepts this event and reports any objects that remain in its object state storage as leaks, because all objects requested and used by the driver should be cleaned up/released by this time. Information about the leaked objects is logged in the error messages to assist debugging.

However, in many cases the driver is never unloaded, or the object leak is causing problems before the driver unloads. For this, DDV allows checks for leaks between any two points in time for which the quantity of some types of object should remain the same. These points are defined for specific features and can be events that DDV can directly instrument, such as before and after a function call or beginning and completion of an I/O request. They can also be signaled by the user for events not easily observable by DDV. At these points DDV takes a snapshot of the quantity of these types of objects for runtime comparison, and the states of these objects can be logged for offline diagnostics.
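The comparison between two such points can be sketched as a per-type count taken at each point. This Python example is a simplified model under the assumption that live objects are available as (type, key) pairs; the function names are hypothetical.

```python
from collections import Counter


def snapshot(live_objects):
    """Count live objects per type; live_objects yields (type, key) pairs."""
    return Counter(object_type for object_type, _key in live_objects)


def leaked_between(before, after):
    """Return {type: growth} for every type whose count increased."""
    return {t: after[t] - before[t] for t in after if after[t] > before[t]}
```

A caller would take one snapshot at the start point (for example, the beginning of an I/O request), another at the end point, and report any nonempty result of leaked_between as a suspected leak.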

4. Fault Injection and Rare Event Simulation

The specification of a driver API function defines whether and how it can “fail.” The driver must correctly handle these potential failures, but because many of them are rare and hard to test, they are a major source of defects. We have added these failure modes to the Emmett scripts of the fallible functions, which can be triggered through one of the basic random or deterministic fault-injection commands, or through synthetic fault injection and rare event simulation scenarios.

Every failure mode of a function can be synthesized through state modification primitives provided by VProbes. In most cases DDV simply changes the return value of a function. For example, returning 0 from kmalloc indicates a heap allocation failure. To do this, DDV uses a built-in VProbes function to modify a register (the return value in x86-64 is stored in rax) and uses another VProbes built-in to skip the execution of the intercepted function. For some other API functions, the failure response can also include specific values written to a memory block specified by the caller, which DDV modifies before returning. The driver API specification also defines whether a call always returns immediately or can block for some amount of time, which DDV can simulate through the VProbes timed delay function. For example, when a driver tries to acquire a spinlock, DDV can delay for some time before calling the spinlock acquisition function, as if the spinlock were locked.

4.1 Basic Fault Injection
4.1.1 Random Fault Injection
When issuing a random fault injection command, the user specifies the percent chance that calls to a specific driver API function will be randomly made to fail. The user can also use a consecutive failure option, in which case if the random number determines that a call will fail, several consecutive calls after it will also fail. This mode is very useful for testing the error handling within error-handling code. Calls to the function will continue to fail randomly until the user resets the failure chance back to 0.
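A minimal Python model of this behavior is shown below. It assumes that the consecutive-failure option gives the total length of a failure burst; the class and parameter names are hypothetical, and the random source is injectable so that the behavior can be reproduced in tests.

```python
import random


class RandomFaultInjector:
    """Fail calls with a given percent chance, optionally in bursts."""

    def __init__(self, percent, consecutive=1, rng=None):
        self.percent = percent            # 0 disables injection
        self.consecutive = consecutive    # total failures per burst (assumed)
        self._rng = rng or random.Random()
        self._remaining = 0               # forced failures left in the burst

    def should_fail(self):
        if self._remaining > 0:
            self._remaining -= 1
            return True
        if self.percent > 0 and self._rng.random() * 100 < self.percent:
            self._remaining = self.consecutive - 1
            return True
        return False
```

Setting percent back to 0 corresponds to the reset described above: no new failures are started after any burst already in progress completes.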

4.1.2 Deterministic Fault Injection

DDV provides two commands to deterministically cause specific calls to fail. In the first one, the name of a fallible function and an integer N are specified. DDV causes the Nth call to that function after the command is issued to fail. In the second one, DDV causes the Nth call to any fallible function after the command to fail. In both cases a function call is made to fail at most once. The three basic fault injection commands can be issued externally through DDV’s CLI, or internally in DDV’s Emmett script (e.g., when handling a DDV instrumented function call).
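The two deterministic commands amount to countdown counters that fire once. The following Python sketch models them; NthCallInjector and its method names are hypothetical and do not correspond to actual DDV command names.

```python
class NthCallInjector:
    """One-shot countdowns for 'fail the Nth call' commands."""

    def __init__(self):
        self._per_function = {}    # function name -> calls left until failure
        self._any_function = None  # calls left until failure, any function

    def fail_nth_call_to(self, name, n):
        self._per_function[name] = n

    def fail_nth_call(self, n):
        self._any_function = n

    def should_fail(self, name):
        """Called once per intercepted fallible call."""
        fail = False
        if name in self._per_function:
            self._per_function[name] -= 1
            if self._per_function[name] == 0:
                del self._per_function[name]   # fires at most once
                fail = True
        if self._any_function is not None:
            self._any_function -= 1
            if self._any_function == 0:
                self._any_function = None      # fires at most once
                fail = True
        return fail
```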

4.2 Synthetic Fault Injection and Event Simulation

4.2.1 Exhaustive Fault Injection at Driver Loading
A driver requests most of its resources during the driver module loading and the device probing and initialization phases. Typically, when the request for some resources fails, the driver either tries to clean up and release previously acquired resources, or it retries the resource request. This leads to complex error-handling paths and is prone to defects. We implemented an exhaustive fault injection feature to efficiently explore the error-handling paths of driver loading. With this feature, DDV’s Python-based command-line client repeatedly loads and unloads the driver, and during the Nth driver loading attempt, it issues the “fail the Nth fallible API call” command to DDV. In other words, every fallible API call (such as a resource allocation request) during driver loading will have failed exactly once after this feature is used.

Sometimes a driver makes tens of thousands or more fallible calls during driver loading, for example when a networking driver preallocates the ring buffers for each of its ports. Failing each of these calls in its own load-unload cycle would take an unacceptable amount of time. We implemented an optimization in which the driver is first loaded once without any fault injection so that DDV can record the stack traces of all fallible calls. The sequence of stack traces is then analyzed to find long repeating patterns, which indicate resource allocations in a loop. In the subsequent exhaustive fault injection process, faults are injected only for the first few and last few repeats of the call pattern, which most likely will uncover any error-handling issue in that loop if one exists.
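The selection of injection targets from the recorded trace sequence can be sketched as follows. This Python example makes a simplifying assumption that is not in the original: it only collapses runs of identical consecutive stack traces (a repeating pattern of length one), whereas the analysis described above looks for repeating patterns in general. The function name and the keep and min_run parameters are hypothetical.

```python
def select_injection_targets(traces, keep=2, min_run=6):
    """Return 1-based call indices at which to inject a fault.

    traces is the sequence of stack-trace identifiers recorded during a
    fault-free load. Runs of at least min_run identical traces are reduced
    to their first keep and last keep calls. min_run should be at least
    2 * keep so the two slices do not overlap.
    """
    targets = []
    start = 0
    while start < len(traces):
        end = start
        while end + 1 < len(traces) and traces[end + 1] == traces[start]:
            end += 1
        run = range(start, end + 1)
        if len(run) >= min_run:
            picked = list(run[:keep]) + list(run[-keep:])
        else:
            picked = list(run)
        targets.extend(i + 1 for i in picked)
        start = end + 1
    return targets
```

The command-line client would then run one load-unload cycle per returned index N, issuing the "fail the Nth fallible API call" command before each load.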

4.2.2 Lock Contention Simulation
Both VMKLinux and the ESXi Native Driver API contain many functions operating on different types of spinlocks and semaphores. Certain correctness and performance issues might be more likely to be revealed when these locks are under some level of contention. Thus we built a lock-contention simulation feature that issues random-delay injection commands to each of the blocking lock functions, and random fault-injection commands to the nonblocking lock functions (e.g., variations of *try_lock, which fail when the lock cannot be immediately acquired). This makes it convenient for users to control the amount of lock contention simulated for a driver, using a single command.

4.2.3 Memory Pressure Simulation
The driver APIs also provide a myriad of functions for allocating memory under the abstractions of heaps, slabs, memory pools, and (for network drivers) packet buffers. A memory pressure simulation feature enables users to set when these allocations fail based on random chance, requested size, contiguity requirement, or some total available amount of memory to simulate. The last one relies on current memory allocation status and causes an allocation to fail only if it is going to exceed the simulated amount of total memory.
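The last mode, a simulated total amount of memory, can be modeled with simple bookkeeping of outstanding allocations. In this Python sketch the class and method names are hypothetical; the only behavior taken from the text is that an allocation fails when it would push usage past the simulated total.

```python
class SimulatedMemoryLimit:
    """Fail allocations that would exceed a simulated total amount of memory."""

    def __init__(self, total_bytes):
        self.total_bytes = total_bytes
        self.in_use = 0
        self._sizes = {}   # address -> size of each live allocation

    def should_fail_alloc(self, size):
        """Decide, before the real allocation runs, whether to inject a failure."""
        return self.in_use + size > self.total_bytes

    def record_alloc(self, address, size):
        """Call after a successful allocation."""
        self._sizes[address] = size
        self.in_use += size

    def record_free(self, address):
        """Call when the driver frees an allocation; unknown addresses are ignored."""
        self.in_use -= self._sizes.pop(address, 0)
```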

4.2.4 Context-Aware SCSI Command and Network Packet Injection
SCSI commands and network packets are the respective focal points of storage and networking stacks. Some types of commands/packets are rarely generated by real upper-layer workloads, but the I/O stack and driver must nevertheless handle them correctly. Testing of ESXi drivers already involves the use of tools that can craft specific SCSI commands and networking packets and inject them into the I/O stack. DDV can make these tools more effective by using its observation of the runtime status of the driver and its fault injection capability. For example, a few of the most elusive storage driver defects we have seen are due to race conditions between a SCSI reset and some other rare event. DDV can help reproduce these by triggering the reset when it observes these rare events. As another example, DDV can detect the injection of a specific type of SCSI command and inject faults during its processing.

5. Supporting Services

5.1 Call Filtering
Currently DDV instruments only one driver, known as the driver-under-test (DuT), at a time. However, unlike the original driver verifier, in which instrumentation is placed on the DuT, DDV instruments the driver API functions; thus, calls to them by any driver are intercepted by VProbes. So DDV first needs to filter out calls not from the DuT, which is done by comparing the module ID of the caller against that of the DuT.

In ESXi the kernel assigns a module ID to each kernel module (including all drivers) as they load. DDV acquires this ID by having a probe at the kernel function for driver loading. If the driver is already loaded, DDV must get the module ID from its CLI client, which queries the kernel about the driver. The module ID of the API caller is retrieved by DDV from a stack structure in the kernel that records current and previous module IDs of the executing thread (called a world in ESXi).

DDV can also filter out calls from the DuT if necessary. For example, if the use case has very low tolerance of instrumentation overhead, and the user wants to turn off verification features for some very frequently called API functions, DDV can be instructed to do so at runtime.

5.2 Error Logging and Call Tracing
When VProbes intercepts a function call, the following information is directly available: function name, parameter values, return value, register states, time of entry and return, current PCPU ID, and the stack trace. DDV automatically logs this information if call tracing is enabled. Information specific to the instrumented function, such as relevant memory states, and those specific to the DDV feature can also be logged. These logs can be very useful in analyzing a detected bug. The user can adjust log verbosity at runtime for all or individual functions. The user can also choose to log only when DDV detects an error. Logs are written to the VProbes ring buffer, which is periodically drained by a separate process; thus it adds little overhead to the instrumented functions. We initially used text logging for all error and debug messages and for call tracing. However, when logging is enabled for some frequently called functions, the VProbes ring buffer often overflowed due to the high volume of data. To resolve this issue, a binary logging scheme was adopted that uses numbers to designate fixed text information. It achieved an average of 5x reduction of bytes logged for most error messages and function call traces.
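The idea behind the binary scheme, replacing fixed text with a number and expanding it again offline, can be sketched in Python. The record layout (a 16-bit message ID followed by one 64-bit argument), the message table, and the function names are all assumptions for illustration; the actual DDV log format is not described in this paper.

```python
import struct

# Hypothetical table mapping fixed message text to small integers.
MESSAGE_IDS = {"timer initialized while active": 1}
MESSAGE_TEXT = {number: text for text, number in MESSAGE_IDS.items()}

# Little-endian, no padding: 16-bit message ID, 64-bit argument (10 bytes).
RECORD = struct.Struct("<HQ")


def encode(message, argument):
    """Produce the compact record written at probe time."""
    return RECORD.pack(MESSAGE_IDS[message], argument)


def decode(record):
    """Expand a record back to text in the offline log reader."""
    message_id, argument = RECORD.unpack(record)
    return f"{MESSAGE_TEXT[message_id]}: {argument:#x}"
```

Here the 10-byte record stands in for a text line several times its length, and the expansion cost is paid by the process that drains the buffer rather than by the instrumented function.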

5.3 Configuration and Runtime Communication
The user can configure DDV before running it using DDV’s CLI, which in turn changes values in a configuration file. The file is actually an Emmett script file containing #define name value lines included in DDV by the Emmett preprocessor. These options include the name and type (VMKLinux or ESXi Native) of the DuT, special check or event injection features to enable, global log level, and so on.

The user can also control various aspects of DDV at runtime through the CLI. This is accomplished by activating a special probe called UserPoke, which enables the passing of up to 10 parameters at a time to the DDV Emmett code that handles the UserPoke probe. The first parameter is used as an opcode to multiplex different control commands. Two-way runtime communication with other kernel components is also possible by having DDV intercept prespecified function calls to get information or commands through the parameters, and pass information back by changing its return value or memory.
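Using the first parameter as an opcode is a standard multiplexing pattern, sketched below in Python. The dispatcher function, the opcode numbers, and the handlers in the usage note are hypothetical; only the limit of 10 parameters and the opcode-first convention come from the text.

```python
def make_dispatcher(handlers):
    """Build a handler that routes a poke to a command by its first parameter."""

    def on_user_poke(*params):
        if not 1 <= len(params) <= 10:
            raise ValueError("expected 1 to 10 parameters")
        opcode, *operands = params
        handler = handlers.get(opcode)
        if handler is None:
            return f"unknown opcode {opcode}"
        return handler(*operands)

    return on_user_poke
```

A table such as {1: set_log_level, 2: set_random_fault_chance} would then let a single probe carry every runtime control command, with the remaining parameters interpreted by the selected handler.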

6. Results and Evaluation

6.1 Driver Defects Found
To date, DDV was able to find at least one previously unknown defect in every device driver we used it on, for a total of 40 unique defects in 15 ESXi device drivers. These drivers were all extensively tested by hardware vendors and VMware prior to DDV verification, and most have also been used widely in the field with one or more versions of ESXi. In particular, the 9 drivers using the VMKLinux API were ported from Linux drivers, and several of the defects DDV discovered also existed in their Linux counterparts. DDV also unexpectedly exposed 6 defects in the ESXi I/O stack and device-management code.

Defects discovered by DDV checks are reported with the specific driver API specification they violated and function call level context information. Defects revealed using its fault injection features are reported with the fault that was injected. For both types of defects, DDV can be requested to log all relevant driver API calls and kernel object states, plus information about other events of interest through ad hoc user extension of DDV. These types of information are often very valuable in debugging: in our experience, finding the root cause of a DDV-reported issue typically takes only one to two hours. On one occasion, five different defects in a driver were detected and their root causes determined in about two hours. Among the driver defects, 19 were detected by check features and another 21 by fault injection. We describe several of the defects below. Source code is omitted to avoid revealing the identity of the driver and vendor. Among the defects are:

  • Double initialization of a timer – The timer initialization function was called for an already active timer under certain circumstances, which resets its content and causes various mysterious failures. Calling the function for an active timer is a violation of driver API specification and was immediately reported by DDV.
  • Different sizes used for dma_alloc_coherent and dma_free_coherent calls for a DMA buffer – A default size was used for the allocation of a coherent DMA buffer during driver loading, but the variable storing the size could be adjusted after loading, and the potentially different size was used in freeing the buffer. DDV detected the violation of a rule about equal size for use in allocation and freeing. The defect was undetected in the Linux version of the driver for 9 years before its recent discovery and fix.
  • Unsigned int used as loop variable in an error-handling loop – A loop was used to allocate a series of buffers, and when one of the allocations fails, another loop was used to release the already allocated buffers. Both loops used the same unsigned int i as the loop variable. But the buffer release loop was descending (because i already contained the index of the last allocated buffer) and terminated when i < 0. This would never happen for an unsigned int, thereby creating an infinite loop. DDV used fault injection to exercise this path, which had never been tested before.
  • Memory leak during interrupt allocation failure – When the request of an interrupt number failed during the device probing process, the driver failed to release some allocated memory during the cleanup operation, causing a memory leak. Both the fault injection and memory leak checking features in DDV played a role in detecting this defect.
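
The unsigned loop variable defect can be illustrated with a minimal C sketch. This is not the driver's code, which is omitted above; the names `inject_alloc`, `inject_free`, `fail_at`, and `alloc_all` are hypothetical, and the simple counter-based failure hook only stands in for the kind of fault a tool such as DDV injects. The comment shows the buggy loop form, and the code shows one way to write the cleanup so that it terminates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NUM_BUFS 8

/* Hypothetical fault injection hook: the allocation whose call index
 * equals fail_at returns NULL; -1 means no failure is injected. */
static int fail_at = -1;
static int alloc_calls = 0;
static int live_bufs = 0;   /* buffers currently allocated, for leak checks */

static void *inject_alloc(size_t size)
{
    if (alloc_calls++ == fail_at)
        return NULL;
    live_bufs++;
    return malloc(size);
}

static void inject_free(void *p)
{
    live_bufs--;
    free(p);
}

/* Allocates NUM_BUFS buffers; on failure releases those already allocated. */
int alloc_all(void *bufs[NUM_BUFS])
{
    unsigned int i;

    assert(bufs != NULL);
    for (i = 0; i < NUM_BUFS; i++) {
        bufs[i] = inject_alloc(64);
        if (bufs[i] == NULL) {
            /* Buggy form:  for (i--; i >= 0; i--) inject_free(bufs[i]);
             * An unsigned i is never negative, so "i >= 0" is always true
             * and the loop wraps around instead of stopping. */
            while (i > 0) {          /* fixed: test before decrementing */
                i--;
                inject_free(bufs[i]);
                bufs[i] = NULL;
            }
            return -1;
        }
    }
    return 0;
}
```

Setting `fail_at` to the index of any allocation drives execution into the cleanup path, which ordinary functional testing rarely reaches; checking `live_bufs` afterward plays the role of a leak check.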

6.2 Performance Overhead
Performance overhead to the software being verified is always a concern for various runtime verification techniques. Because drivers sit in a performance-critical path of the kernel and the whole ESXi-virtualized stack, the tolerance for overhead is low for DDV. We measured the overhead of DDV instrumentation of a call to a driver API function, in terms of added CPU cycles per call, as well as the total effect on typical storage and networking drivers’ throughput and latency, in terms of percent slowdown. The former metric is primarily used to help us in finding the main sources of overhead and reducing them, while the latter is used to decide whether the overhead is acceptable.

On a test server using Intel Xeon E5620 CPUs and the debug build of ESXi 5.5, we measured the number of CPU cycles (using rdtsc) of various driver API calls, with and without DDV instrumentation, and calculated their differences. We found that DDV added about 6,000 to 8,000 cycles to functions for which DDV instruments only their entries, and double those amounts to functions for which both entries and exits are instrumented. Close to 3,000 cycles are used by VProbes just to intercept the function call (the overhead of having an empty probe). Another approximately 1,000 cycles are used by DDV code to prepare for check and fault injection features, such as call filtering. These 4,000 cycles are the “fixed” cost of DDV instrumentation. The remaining cost depends on the number and complexity of verification rules applicable to each driver API. It is dominated by Emmett bag operations, which cost about 900 to 1,200 cycles each, compared to tens of cycles for each scalar (int) variable access. DDV typically performs one to three bag operations, used to read and write kernel object states, per driver API function call. We have already used several techniques to reduce the number of bag operations.
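
As a rough check on these figures, the per-call overhead can be modeled as the fixed cost plus the cost of the bag operations. The sketch below is only a back-of-the-envelope model built from the numbers quoted above; the function names are hypothetical, and the model ignores scalar accesses and other rule logic.

```c
#include <assert.h>

/* Approximate figures quoted in the text (debug build, Xeon E5620). */
enum {
    EMPTY_PROBE_CYCLES = 3000,  /* VProbes interception alone */
    DDV_SETUP_CYCLES   = 1000   /* call filtering and similar preparation */
};

/* Estimated cycles added by one instrumented probe point (entry or exit). */
unsigned estimate_probe_cycles(unsigned bag_ops, unsigned cycles_per_bag_op)
{
    /* Guard against nonsensical inputs that would overflow the estimate. */
    assert(bag_ops <= 1000 && cycles_per_bag_op <= 100000);
    return EMPTY_PROBE_CYCLES + DDV_SETUP_CYCLES + bag_ops * cycles_per_bag_op;
}

/* Estimated cycles added per API call; instrumenting the exit as well
 * doubles the cost, as reported in the measurements. */
unsigned estimate_call_cycles(unsigned bag_ops, unsigned cycles_per_bag_op,
                              int instrument_exit)
{
    unsigned per_probe = estimate_probe_cycles(bag_ops, cycles_per_bag_op);
    return instrument_exit ? 2 * per_probe : per_probe;
}
```

Under this model, two bag operations at about 1,000 cycles each give 6,000 cycles for an entry-only probe, and three at 1,200 cycles give 7,600, which is roughly in line with the measured range of 6,000 to 8,000 cycles.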

For example, Bloom filters implemented as bit arrays on 64-bit integers, mapped to each of the more than 220 driver API functions, are used to determine if certain per-function properties have changed from the default value (stored in an int) to a function-specific one stored in a bag.
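
The idea can be sketched in C as follows. DDV itself is written in Emmett, so this is only an illustration of the technique under stated assumptions: the bound `NUM_API_FUNCS`, the functions `set_override` and `get_property`, and the plain array standing in for an Emmett bag are all hypothetical. With one bit per function the filter is exact; a true Bloom filter would hash several keys onto shared bits and accept occasional false positives.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_API_FUNCS 256                     /* assumed bound; the text says "more than 220" */
#define NUM_WORDS ((NUM_API_FUNCS + 63) / 64) /* 64 function bits per integer */

static uint64_t overridden[NUM_WORDS];        /* bit set => function has its own value */
static int override_value[NUM_API_FUNCS];     /* stand-in for the expensive bag */
static const int DEFAULT_VALUE = 0;           /* default shared by all other functions */
static int bag_lookups = 0;                   /* counts simulated bag operations */

/* Record a function-specific value and mark the function in the bit array. */
void set_override(unsigned func_id, int value)
{
    assert(func_id < NUM_API_FUNCS);
    overridden[func_id / 64] |= UINT64_C(1) << (func_id % 64);
    override_value[func_id] = value;
}

/* Cheap bit test first; the costly lookup happens only when the bit is set. */
int get_property(unsigned func_id)
{
    assert(func_id < NUM_API_FUNCS);
    if ((overridden[func_id / 64] >> (func_id % 64)) & 1) {
        bag_lookups++;
        return override_value[func_id];
    }
    return DEFAULT_VALUE;
}
```

Because most functions keep the default, the common case costs a shift and a mask on a scalar rather than a bag operation of roughly a thousand cycles.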

We have also tested the performance of several networking drivers on test beds that enable them to run near their maximum possible speeds. Using netperf as the benchmark, we found that throughput was reduced by 5% to 20% when DDV was turned on, depending on the driver. We consider this an acceptable level of overhead. Note that the slowdown is dominated by instrumentation of several API functions on hot data paths, namely slab memory allocation and freeing, and DMA maps and unmaps, which are sometimes called hundreds of thousands of times per port. In hypothetical use cases in which the 5% to 20% overhead is unacceptable, DDV can be configured to turn off the instrumentation of these functions, at the cost of losing the checks related to them; we found that the slowdown then becomes too small to be noticed. At an even higher level, when quality assurance teams run their test suites on the drivers, the overall slowdown due to the activation of DDV checks is also too small to be noticed. This has great practical importance, because any significant slowdown of the full driver test suite, which already takes days to run for each driver, could affect product test and release schedules and increase resistance to DDV’s adoption.

7. Related Work

7.1 OS-Level Dynamic Instrumentation Frameworks
Two dynamic instrumentation frameworks similar to VProbes are DTrace, originally developed for the Sun (now Oracle) Solaris operating system and subsequently ported to many other UNIX-like operating systems, and SystemTap, developed for Linux. Of the three, DTrace was developed the earliest and is perhaps the most popular and well-known. As mentioned earlier, DTrace and SystemTap are designed for collecting data for real-time display or offline analysis. Our search of the public “libraries” of DTrace and SystemTap scripts on the Internet, and of publications about the frameworks’ usage, shows that this is indeed how they are used. We have not found any other use of them for complex runtime verification similar to DDV’s. Tanaka et al. [7] have developed a SCSI fault injection tool using SystemTap that is similar in concept to doing SCSI fault injection in VProbes. There have been two other major analytical tools based on VProbes (both by VMware interns): one for detecting data races and the other for VM intrusion detection. But in both cases VProbes is used to collect execution traces for offline analysis.

7.2 Runtime Driver-Verification Tools

The only other runtime driver-verification tools we are aware of are Windows Driver Verifier [8], which has been available since Windows 2000, and the more recent KEDR [9] project, which focuses on instrumenting Linux kernel modules. The Windows Driver Verifier has many useful features but seems to check only a select number of error-prone functions and areas, rather than the entire driver API, and its individual features can only be turned on or off rather than configured in many ways as in DDV. Unlike DDV, the Windows Driver Verifier requires a reboot to enable and disable it. We were also unable to find any information about extending its functionality, and it is likely that the features are coupled with the kernel and cannot be extended by third parties. KEDR is similar to DDV in that it is presented as a framework for other features and can be turned on and off without rebooting. But it does not have the capability to execute complex logic at runtime, so its verification features are mostly implemented through analyzing logs offline.

8. Conclusion and Future Work

Our experience developing DDV on VProbes shows that it is possible to build this type of complex runtime analysis tool on scriptable dynamic instrumentation systems. The scripting languages they provide are already sufficiently expressive and efficient for complex logic, though sometimes creativity might be required from the user to compensate for features not yet available. The overhead of running complex logic on these frameworks is also quite low and should be acceptable for most test and even production environments. The dynamic nature of these frameworks provides many advantages in ease of development and use. Many types of “advanced” features using information from several layers of the system can be developed on, or benefit from, the excellent systemwide observability and controllability of these frameworks.

As a runtime driver-verification framework, DDV has proved to be highly effective. Even though the verification features we have already implemented are relatively basic, they have detected 40 driver defects that would otherwise be difficult to detect. DDV has also been useful for finding the root causes of these and other driver defects. Its ease of use and low overhead, mainly due to our choice of VProbes, helped its adoption for testing a large number of drivers. Compared to other runtime driver-verification tools, DDV has unique advantages in comprehensiveness and extensibility that we plan to exploit further in the future.

Although we will continue to enhance DDV’s core framework, the main direction of our future work on DDV is the development of more “advanced” verification features. This includes continued effort to integrate DDV with SCSI command and network packet injection tools; the full-stack observability and controllability from DDV can greatly enhance their effectiveness. We are also investigating features to help expose and diagnose concurrency-related defects in drivers and other parts of the kernel, for example by analyzing or altering the behavior of synchronization primitives and kernel scheduling for the code being verified. Adding offline analysis of DDV logs is also being considered. In short, there is a very large body of work in runtime verification that we can learn from and contribute to.

Acknowledgments

The author would like to thank his colleagues who have directly contributed to the DDV project, including Zongyun Lai, Shu Wu, Shujun Ou, Yizhong Zhang, and Ichiro Kobayashi. Also, DDV would not be possible without the wonderful VProbes framework built by the VMware VProbes team or their continued support of the DDV project: Radu Rugina, Ricardo Gonzalez, Alok Kataria, Vivek Thampi, and others played key roles in developing new features needed by DDV.

References

  1. A. Chou, J. Yang, B. Chelf, S. Hallem, and D. Engler, “An Empirical Study of Operating Systems Errors,” 18th ACM Symposium on Operating Systems Principles, 2001.
  2. A. Ganapathi, V. Ganapathi, and D. Patterson, “Windows XP Kernel Crash Analysis,” 20th Large Installation System Administration Conference, 2006.
  3. M. Carbone, A. Kataria, R. Rugina, and V. Thampi, “VProbes: Deep Observability Into the ESXi Hypervisor,” VMware Technical Journal, Summer 2014.
  4. B. Cantrill, M. Shapiro, and A. Leventhal, “Dynamic Instrumentation of Production Systems,” USENIX Annual Technical Conference, 2004.
  5. V. Prasad, W. Cohen, F. Eigler, M. Hunt, J. Keniston, and B. Chen, “Locating System Problems Using Dynamic Instrumentation,” Linux Symposium, 2005.
  6. A. Dehnert, “Intrusion Detection Using VProbes,” VMware Technical Journal, Winter 2012.
  7. K. Tanaka, M. Hamaguchi, T. Sato, and K. Tatsukawa, “SCSI Fault Injection Test,” Linux Symposium, 2008.
  8. Microsoft, “About Driver Verifier.”
  9. V. Rubanov and E. Shatokhin, “Runtime Verification of Linux Kernel Modules Based on Call Interception,” 4th IEEE International Conference on Software Testing, Verification and Validation, 2011.