VMware, Inc./Massachusetts Institute of Technology
Many current intrusion detection systems (IDS) are vulnerable to intruders because they run under the same operating system as a potential attacker. Since an attacker often attempts to co-opt the operating system, the IDS is vulnerable to subversion. While some systems escape this flaw, they generally do so by modifying the hypervisor. VMware® VProbes technology allows administrators to look inside a running virtual machine, set breakpoints, and inspect memory from a virtual machine host. We aim to leverage VProbes to build an IDS for Linux guests that is significantly harder for an attacker to subvert, while also allowing the use of a common off-the-shelf hypervisor.
A common mechanism for defending computers against malicious attackers is the intrusion detection system (IDS). Network IDSes monitor network traffic to detect intrusions, while host-based IDSes monitor activity on specific machines. A common variety of host-based IDS watches the kernel-application interface, monitoring the system calls that are used. Based on the sequences of system calls used and their arguments, these IDSes aim to determine whether or not an attack is underway.
While intrusion detection systems are not fully effective, they have proven to be useful tools for catching some attacks. Since a host-based IDS runs on the host it is protecting, it is vulnerable to a virus or other attacker that seeks to disable it. An attacker might block network connectivity the IDS requires to report results, disable hooks it uses to gather information, or entirely kill the detection process. This is not a theoretical risk. Viruses in the wild, such as SpamThru, Beast, Win32.Glieder.AF, or Winevar, directly counter anti-virus software installed on their hosts.
The robustness of a host-based IDS can be improved by running it outside the virtual machine and using capabilities exposed by the hypervisor to monitor the guest and gather the information an in-guest agent would ordinarily use.
VMware ESXi™ supports VProbes, a mechanism for examining the state of an ESXi host or virtual machine, similar to the Oracle Solaris Dynamic Tracing (DTrace) facility in the Oracle Solaris operating system. VProbes allows users to place user-defined probes in the ESXi kernel (VMkernel), the monitor, or within the guest. Probes are written in a C-like language called Emmett, and perform computation, store data, and output results to a log on the ESXi host. While primarily used to diagnose performance or correctness issues, VProbes also can be used to supply data to the IDS.
Our IDS is architected as two components:
- The gatherer uses VProbes to retrieve system call information from the guest virtual machine. This allows it to run outside of the guest while still gathering information from within the guest.
- The analyzer uses the data gathered to decide whether or not a system call sequence is suspicious. The analysis component is similar to components in other intrusion detection systems, and can use the same types of algorithms to identify attacks.
One advantage of splitting the gatherer from the analyzer is modularity. Two major variants of the gatherer exist currently: one for Linux and one for Microsoft Windows, with specialized code for 32-bit versus 64-bit Linux and the different operating system versions. All of these variants produce the same output format, enabling attack recognition strategies to be implemented independently in the analysis component. The gatherer does not need to change based on the attack recognition strategy in use. The analysis component can be oblivious to the operating system, architecture, or version. In addition, it is possible to run several analyzers in parallel and combine results. Running the analyzers with saved data rather than live data could be useful for regression testing of analyzers or detecting additional past exploits as improved analyzers are developed.
The division is as strict as it is for a different reason: language. VProbes provides quite limited support for string manipulation and data structures. Additionally, the interpreter has relatively low limits on how much code probes can include. While these limitations are likely solvable, separating the two components required significantly less work and allows the analyzer to use the functionality of Python or other modern languages without reimplementation.
The gatherer essentially outputs data that looks like the output of the Linux strace utility, with names and arguments of some system calls decoded. Depending on what seems most useful to the analysis component, this may eventually involve more or less symbolic decoding of names and arguments.
The gatherer is also responsible for outputting the thread, process, and parent process IDs corresponding to each system call, as well as the program name (the comm value on Linux or ImageFileName on Microsoft Windows, along with the binary path). Analysis scripts use this data to build a model of normal behavior and search for deviations. Generally, these scripts build separate models for each program, as different programs have different normal behavior.
The data gathering component uses VProbes to gather syscall traces from the kernel. Gatherers exist for 32-bit and 64-bit Linux (across a wide range of versions), and 64-bit Microsoft Windows 7. The Linux gatherers share the bulk of their code. The Microsoft Windows gatherer is very similar in structure and gathers comparable data, but does not share any code.
When the gatherer runs, VProbes sets a breakpoint on the syscall entry point in the kernel code. When a syscall starts to execute, the breakpoint activates, transferring execution to our callback. The callback extracts the system call number and arguments from the registers or stack where they are stored. In addition, the probe retrieves data about the currently running process, which the analysis components need in order to separate threads, sequence the system calls correctly, and associate the syscalls with the correct per-program profile.
Optionally, the gatherer can decode system call names and arguments. The Linux callback has the name and argument format for several system calls hardcoded. For numeric arguments, the probe simply prints the argument value. For system calls that take strings or other types of pointers as arguments, it prints the referenced data. It also prints the name of the system call in use. Since the current analysis scripts do not examine syscall arguments, this capability was not implemented for the Microsoft Windows gatherer.
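The decoding step can be illustrated outside of Emmett. The following is a minimal Python sketch, not the gatherer's actual code: it assumes a 64-bit Linux guest, where the system call number arrives in rax and the arguments in rdi, rsi, rdx, r10, r8, and r9. The register snapshot dictionary, the function names, and the small name table are hypothetical and cover only a handful of calls.

```python
# Illustrative only: decode a syscall from a register snapshot taken at the
# syscall entry point of an x86-64 Linux guest. The dict-based snapshot and
# the helper names are assumptions made for this sketch.

X86_64_ARG_REGS = ("rdi", "rsi", "rdx", "r10", "r8", "r9")

# A few x86-64 Linux syscall numbers with the number of arguments each takes.
SYSCALLS = {
    0: ("read", 3),
    1: ("write", 3),
    2: ("open", 3),
    3: ("close", 1),
    59: ("execve", 3),
}


def decode_syscall(regs):
    """Return (name, args) for the syscall described by a register dict."""
    number = regs["rax"]
    # Unknown calls keep their number and report all six argument registers.
    name, nargs = SYSCALLS.get(number, (f"syscall_{number}", len(X86_64_ARG_REGS)))
    args = [regs.get(reg, 0) for reg in X86_64_ARG_REGS[:nargs]]
    return name, args


def format_call(regs):
    """Render the call in an strace-like form with raw numeric arguments."""
    name, args = decode_syscall(regs)
    return f"{name}({', '.join(hex(a) for a in args)})"
```

For example, `format_call({"rax": 3, "rdi": 5})` renders a close call on descriptor 5. Printing pointed-to strings, as the Linux gatherer does, would additionally require reading guest memory, which this sketch leaves out.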
Another optional feature reports the return values from system calls. As with argument values, current analysis scripts do not utilize this data. Consequently, while support is available for Linux, it was not implemented for Microsoft Windows.
Writing a gatherer requires two key steps. First, relevant kernel data structures storing the required information must be identified. Second, the offsets of specific fields must be made available to the Emmett script. While the general layout changes little from one version of the kernel to another, the precise offsets do vary. As a result, an automated mechanism to find and make available these offsets is desirable.
3.1.1 Relevant Kernel Data Structures
The first step in implementing the gatherer is to find where the Linux or Microsoft Windows kernel stores the pertinent data. While a userspace IDS could use relatively well-defined, clearly documented, and stable interfaces such as system calls, or read /proc to gather the required information, we are unable to run code from the target system. As a result, we must directly access kernel memory. Determining the relevant structures is a process that involves reading the source, disassembling system call implementations, or looking at debugging symbols.
In Linux, the key data structure is the struct task_struct. This contains pid (the thread ID), tgid (the process ID), and pointers to the parent process’s task_struct. We output the thread and process IDs, as well as the parent process ID, to aid in tracking fork calls. On Microsoft Windows, broadly equivalent structures exist (Table 1).
| Breakpoint at | syscall_call, sysenter_do_call (32-bit); system_call |
Table 1: Key fields in the Linux and Microsoft Windows process structures
Identifying the running program is surprisingly difficult. The simplest piece of information to retrieve is the comm field within the Linux task_struct. This field identifies the first 16 characters of the process name, without path information. Unfortunately, this makes it difficult to distinguish an init script (which tends to use large numbers of execve and fork calls) from the program it starts (which may never use execve or fork). Hence, full paths are desirable.
Path data is available through the struct mm_struct, referenced by the mm field of the task_struct. By recursively traversing the mount and dentry structures referenced through the mm_struct exe_file field, the full path of the binary being executed can be retrieved. Since exe_file is the executed binary, the entry for shell scripts tends to be /bin/bash, while Python scripts typically have a /usr/bin/python2.7 entry, and so on. Therefore, it is important to identify the current program based on both the comm and exe_file fields—simply using comm is insufficient because of init scripts, while exe_file cannot distinguish between different interpreted programs.
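The traversal can be modeled in a few lines. The sketch below is a simplified Python model, not the Emmett implementation: the `Dentry` and `Mount` classes are hypothetical stand-ins that keep only the links needed for the walk (a parent pointer per directory entry, and a root, mount point, and parent per mount). It climbs parent links and hops to the parent mount whenever it reaches the root of the current mount.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Dentry:
    """Simplified directory entry: a name and a link to the parent entry."""
    name: str
    parent: Optional["Dentry"] = None


@dataclass(eq=False)
class Mount:
    """Simplified mount: its root entry, and where it is attached (if anywhere)."""
    root: Dentry
    mountpoint: Optional[Dentry] = None
    parent: Optional["Mount"] = None


def full_path(dentry, mount):
    """Walk from a file's dentry up to the global root, crossing mounts."""
    parts = []
    while True:
        if dentry is mount.root:
            if mount.parent is None:
                break  # reached the root of the root filesystem
            # Continue from the directory this filesystem is mounted on.
            dentry, mount = mount.mountpoint, mount.parent
            continue
        parts.append(dentry.name)
        dentry = dentry.parent
    return "/" + "/".join(reversed(parts))


# Usage: /usr is a separate filesystem mounted on the root filesystem.
root_dir = Dentry("/")
usr_dir = Dentry("usr", root_dir)
root_mount = Mount(root=root_dir)
usr_root = Dentry("/")
usr_mount = Mount(root=usr_root, mountpoint=usr_dir, parent=root_mount)
python_bin = Dentry("python2.7", Dentry("bin", usr_root))
```

Here `full_path(python_bin, usr_mount)` yields `/usr/bin/python2.7`. The loop length grows with path depth, which is consistent with path computation being one of the more expensive parts of the probe.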
In Microsoft Windows, finding this data poses different challenges. The program name (without a path) is easy to find. It is stored in the ImageFileName field of the EPROCESS structure. As with Linux, the full path is harder to find. On Windows, the EPROCESS structure SeAuditProcessCreationInfo.ImageFileName->Name field is a UNICODE_STRING containing the path to the process binary. Unlike Linux, recursive structure traversal is not required to read the path from the field. However, Microsoft Windows stores this path as a UTF-16 string. As a result, ASCII file names have alternating null bytes, which means Emmett’s usual string copy functions do not work. Instead, we individually copy alternating bytes of the Unicode string into an Emmett variable. This converts non-ASCII Unicode characters into essentially arbitrary ASCII characters. We believe these will be rare in program paths. Additionally, since the current analysis scripts treat the path as an opaque token, a consistent, lossy conversion is acceptable.
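The effect of copying alternating bytes is easy to demonstrate. The following Python sketch is an illustration of the conversion rather than the Emmett code itself; the function name and the example path are assumptions. It keeps the low-order byte of each little-endian UTF-16 code unit and discards the high-order byte.

```python
def utf16le_low_bytes(raw: bytes) -> bytes:
    """Keep every other byte of a UTF-16LE buffer, starting with the first.

    For ASCII text this drops the interleaved null bytes and recovers the
    original string. For other characters the high byte is lost, so the
    result is a consistent but essentially arbitrary byte.
    """
    return raw[0::2]


# Hypothetical example path, encoded the way a UNICODE_STRING buffer holds it.
path = "\\Windows\\System32\\svchost.exe"
encoded = path.encode("utf-16-le")
```

With this input, `utf16le_low_bytes(encoded)` returns the ASCII bytes of the path. A non-ASCII character such as U+0142 is stored as the bytes 0x42 0x01, so it is converted to 0x42, the letter B. The same input always produces the same output, which is all an analyzer that treats the path as an opaque token needs.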
3.1.2 Accessing Fields from Emmett
Even after the correct structures and fields are found, a challenge remains. Although instrumentation has the structure address, it needs to know the offset of each field of interest within the structure so it can access appropriate memory and read data. Unfortunately, kernels are not designed to make the format of internal data structures easily accessible to third-party software. Emmett has two main features that make this feasible: the offat* family of built-in functions and sparse structure definitions.
3.1.2.1 Finding Offsets at Runtime
The offat* family of functions allows finding these offsets at runtime. Each function scans memory from a specified address, checking for instructions that use offsets in particular ways. Frequently, symbol files are used to find the appropriate address.
Emmett supplies three functions in this family. The first is offatret, which finds the ret instruction and returns the offset loaded into rax to be returned. By passing the address of a simple accessor function such as the Windows nt!PsGetProcessImageFileName to offatret, we can find the offset of a structure field such as the EPROCESS ImageFileName. The second is offatseg, which finds the first offset applied to the FS or GS segment registers. These registers are commonly used for thread-local state, making them helpful for finding thread-specific structures such as the task_struct in Linux or the Microsoft Windows ETHREAD. With a Microsoft Windows guest, offatseg(&nt!PsGetCurrentProcess) finds the offset of the CurrentThread pointer within the KPCR structure. Finally, offatstrcpy searches for calls to a specific function address and returns the offset used to load RSI for the call. This could be used to find the offset of a string element in a structure, but is not currently used by any gatherers.
The offat* functions offer the advantage of allowing a single Emmett script to be used against kernel binaries with different offset values. As a result, the VProbes distribution includes library code that uses offat* to find the current PID and certain other fields, which was used for the Microsoft Windows gatherer. However, offat* requires finding an appropriate accessor in which to search for offsets and is dependent on the precise assembly code generated by the compiler. Consequently, another mechanism was desirable for new gatherer code.
3.1.2.2 Encoding Offsets in Scripts
Emmett also supports sparse structure definitions, allowing required offsets to be conveniently encoded within the script. A sparse structure is defined similar to a normal C structure, except that some fields can be omitted. Prefixing a field definition with an @ character and an offset enables Emmett to use the specified offset instead of computing an offset based on the preceding fields in the structure and their size. Given a way to find offsets, this allows only relevant fields to be specified, ignoring those that are not needed.
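The idea behind a sparse structure, naming only the fields of interest and pinning each to an explicit offset, can be shown with Python's `struct` module. This is an analogy rather than Emmett syntax, and the offsets below are invented for illustration: real values depend on the kernel version and configuration.

```python
import struct

# Hypothetical offsets for three task_struct fields. Only the fields the
# probe needs are listed; everything in between is simply ignored.
TASK_STRUCT_LAYOUT = {
    "pid": (0x2A4, "<i"),    # thread ID, 32-bit little-endian integer
    "tgid": (0x2A8, "<i"),   # process ID
    "comm": (0x4B0, "16s"),  # 16-byte, NUL-padded program name
}


def read_field(memory, base, name, layout=TASK_STRUCT_LAYOUT):
    """Read one named field from a structure located at `base` in `memory`."""
    offset, fmt = layout[name]
    (value,) = struct.unpack_from(fmt, memory, base + offset)
    if isinstance(value, bytes):
        # Trim at the first NUL, as a C string reader would.
        value = value.split(b"\0", 1)[0].decode("ascii", "replace")
    return value


# Usage: fabricate a memory image holding one structure at address 0x100.
memory = bytearray(0x800)
struct.pack_into("<i", memory, 0x100 + 0x2A4, 1234)
struct.pack_into("<i", memory, 0x100 + 0x2A8, 1200)
struct.pack_into("16s", memory, 0x100 + 0x4B0, b"proftpd")
```

Reading `read_field(memory, 0x100, "comm")` then returns `proftpd`. Supporting a different kernel build means changing only the layout table, which mirrors how regenerated offsets feed into an otherwise unchanged script.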
For the Linux gatherer, a Linux module assembles the necessary offsets. When loaded, it exposes a file in the /proc/ directory. The file contains offsets of a number of important fields in the kernel, formatted as #define statements. The file can be copied to the host and #included in a script that uses those offsets directly or to define sparse structures. Currently, users must compile the module, copy it to a machine running the correct Linux version, and load it into the kernel. In the future, we plan to provide a script to extract the relevant offsets directly from debug symbols in the module binary, instead of needing to load the module.
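Because the offsets file consists of #define statements, it is also simple to consume from tooling outside of Emmett, for example to compare the offsets of two kernel builds. The sketch below is a hypothetical Python helper; the macro names in the sample are invented, and the exact contents of the real file are not specified here.

```python
import re

# Matches lines of the form "#define NAME 123" or "#define NAME 0x7b".
_DEFINE = re.compile(r"^\s*#define\s+(\w+)\s+(0[xX][0-9a-fA-F]+|\d+)\s*$")


def parse_offsets(text):
    """Return a {name: offset} dict from #define-style lines, skipping others."""
    offsets = {}
    for line in text.splitlines():
        match = _DEFINE.match(line)
        if match:
            name, value = match.groups()
            base = 16 if value.lower().startswith("0x") else 10
            offsets[name] = int(value, base)
    return offsets


# Hypothetical sample with invented macro names and values.
SAMPLE = """\
/* generated for one specific kernel build */
#define OFFSET_TASK_PID 0x2a4
#define OFFSET_TASK_TGID 0x2a8
#define OFFSET_TASK_COMM 1200
"""
```

Calling `parse_offsets(SAMPLE)` returns the three offsets as integers and ignores the comment line.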
In the Microsoft Windows gatherer, the requisite offsets are extracted directly from the debugging symbols. A script uses the pdbparse Python library to convert ntkrnlmp.pdb, which Microsoft makes available for use with debuggers, into Emmett structure definitions containing the fields of interest. The output file can be #included from the gatherer, and the structures can be traversed just as they would be in kernel code.
While this technique requires updating the script for different kernel versions, we find it more convenient. One advantage is that files with offsets can be generated automatically, and the Emmett preprocessor can be used to incorporate the current offsets into the rest of the code. Using sparse structures also allows the Emmett compiler to perform type checking. The process of writing scripts is less prone to error when the compiler distinguishes between pointer and non-pointer members, or between an offset in the ETHREAD structure and one in the EPROCESS structure. In addition, code is much clearer when the same names and syntax can be used as are present in the source or debugger output. Therefore, while the Microsoft Windows gatherer uses both offat* and sparse structures, the Linux gatherer uses only the latter.
While our work focused on gatherers, we wrote two simple analyzers to validate that the technique was sound. Both analyzers build profiles on a per-program basis. Programs are identified by their comm value and binary path (or Microsoft Windows equivalent). Recall that using only the former causes an init script and the daemon it runs to share a profile, while using only the latter combines all programs in a given interpreted language into one profile.
Both analyzers are passed logs of a normally executing system, as well as a potentially compromised system. They use this information to build a profile of normal behavior. If the potentially compromised system deviates from the profile, they report a potential attack.
3.2.1 Syscall whitelist
The simplest form of a profile is a simple whitelist of allowed system calls. As the normal log is read, the analyzer notes which system calls are used, such as open, read, write, and so on. When the potentially compromised log is read, the analyzer sees if any new system calls are used. If any are, it reports a possible intrusion.
While extremely simple, this analyzer can detect some attacks. We installed a proftpd server that was vulnerable to CVE-2010-4221 and attacked it using Metasploit Project’s exploit. Under normal operation, an FTP server has a very simple system call pattern: it mostly just opens, reads, writes, and closes files. An attack, however, often uses the execve function. Since the daemon does not normally use execve, it does not appear in the whitelist and the analyzer immediately triggers.
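A whitelist analyzer of this kind fits in a few lines of Python. The sketch below is a minimal illustration and not our analyzer script: it assumes each log has already been parsed into (program, syscall) pairs, where the program key combines the comm value and the binary path. The function names and sample data are hypothetical.

```python
from collections import defaultdict


def build_whitelist(normal_log):
    """Collect, per program, the set of system calls seen in the normal log."""
    allowed = defaultdict(set)
    for program, syscall in normal_log:
        allowed[program].add(syscall)
    return allowed


def find_violations(whitelist, suspect_log):
    """Return every (program, syscall) pair not covered by the whitelist."""
    return [
        (program, syscall)
        for program, syscall in suspect_log
        if syscall not in whitelist.get(program, set())
    ]


# Usage with invented data: an FTP daemon that normally only handles files.
ftpd = ("proftpd", "/usr/sbin/proftpd")
normal = [(ftpd, "open"), (ftpd, "read"), (ftpd, "write"), (ftpd, "close")]
suspect = [(ftpd, "open"), (ftpd, "read"), (ftpd, "execve"), (ftpd, "write")]
violations = find_violations(build_whitelist(normal), suspect)
```

In this example `violations` contains only the execve call. One design choice to note: a program that never appears in the normal log has an empty whitelist, so every call it makes is reported. Whether that is desirable depends on how complete the training data is.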
One advantage of this technique is that it requires little tuning and has few false positives. The profile is simple enough that building a complete “normal” profile is quite manageable. A disadvantage, of course, is that it detects relatively few intrusions. For example, an attack on a web server running CGI scripts is quite hard to detect, since a web server uses a much wider variety of system calls.
A mechanism commonly used in the academic intrusion detection literature is sequence time-delay embedding, or stide. In this technique, the profile consists of all n-tuples of consecutive system calls. For example, with n=3 and a system call sequence of open, read, write, read, write, close, the 3-tuples would be (open, read, write), (read, write, read), (write, read, write), and (read, write, close). To scan a trace for an intrusion, the analyzer checks for tuples that are not found in the training data. If enough such tuples are found in a sliding window, an intrusion is likely. The length of the tuples, the size of the sliding window, and the threshold of how many tuples are required to trigger the system can be tuned to achieve an acceptable level of false positives and negatives.
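The following Python sketch shows one way to express the technique. It is a minimal illustration under our own naming, not the analyzer used in our experiments, and the parameter values in the usage example are arbitrary rather than tuned.

```python
def ngrams(calls, n):
    """Return all n-tuples of consecutive system calls, in order."""
    return [tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)]


def train(calls, n):
    """Build the profile: the set of n-tuples seen during normal operation."""
    return set(ngrams(calls, n))


def anomalous_windows(profile, calls, n, window, threshold):
    """Return start indices of windows holding at least `threshold` unseen tuples."""
    misses = [t not in profile for t in ngrams(calls, n)]
    alerts = []
    # Traces shorter than one full window are checked as a single window.
    for start in range(max(len(misses) - window + 1, 1)):
        if sum(misses[start:start + window]) >= threshold:
            alerts.append(start)
    return alerts


# Usage: the sequence from the text, then a trace with an unexpected execve.
normal = ["open", "read", "write", "read", "write", "close"]
profile = train(normal, 3)
suspect = ["open", "read", "execve", "write", "close"]
alerts = anomalous_windows(profile, suspect, n=3, window=3, threshold=2)
```

Training on `normal` produces exactly the four 3-tuples listed above. Every 3-tuple in `suspect` contains execve and is therefore unseen, so the single window starting at index 0 is reported.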
We had difficulty getting this technique to produce reasonable error rates. One refinement that may help is to build tuples on a per-thread basis, so that multi-threaded programs or programs with multiple instances running at once do not interleave system calls. The runs were performed with only a handful of system calls reported to the analyzer. Using a more complete set might be more successful. Finally, we could try larger sets of training data and further tweak the parameters.
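The per-thread refinement amounts to splitting the trace by thread ID before building tuples. The Python sketch below illustrates this under the assumption that each record is a (thread ID, syscall) pair; the function names are hypothetical and this refinement has not been evaluated.

```python
from collections import defaultdict


def split_by_thread(records):
    """Group syscalls by thread ID, preserving each thread's own ordering."""
    per_thread = defaultdict(list)
    for tid, syscall in records:
        per_thread[tid].append(syscall)
    return dict(per_thread)


def per_thread_profile(records, n):
    """Build the n-tuple profile from each thread's sequence separately."""
    profile = set()
    for calls in split_by_thread(records).values():
        for i in range(len(calls) - n + 1):
            profile.add(tuple(calls[i:i + n]))
    return profile


# Usage: two threads run the same open, read, close sequence, interleaved.
interleaved = [
    (1, "open"), (2, "open"),
    (1, "read"), (2, "read"),
    (1, "close"), (2, "close"),
]
```

Built from `interleaved` with n=3, the per-thread profile contains the single tuple (open, read, close). Building tuples from the interleaved stream directly would instead record tuples such as (open, open, read), which depend on scheduling rather than on program behavior.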
One concern for security software such as this is the performance overhead it entails. Since VProbes primarily causes a slowdown when the probe triggers, we expect performance to depend largely on how often system calls occur. In a workload involving a large number of system calls, we expect performance to suffer. In a largely computational workload, however, we expect performance to be roughly the same as without instrumentation.
To measure the performance overhead of instrumentation, we ran an Apache httpd instance in a virtual machine and used ab to measure how long it took to make a large number of requests to the server. Several different file sizes were tested, as well as static and dynamic content, to get a sense of how different system call patterns could affect performance. For this test, a version of the Linux instrumentation was used that printed approximately a dozen system calls and decoded their arguments. We found that for small static files (a couple hundred or thousand bytes), the overhead was prohibitive. Larger files were more reasonable: a 13 KB file had roughly 5 percent overhead compared to an uninstrumented server, and 20 KB or larger files had overhead well under 1 percent. We also considered a common web application, MediaWiki, to see how dynamic sites compared. An 11 KB page had approximately 13 percent overhead, while a 62 KB page saw 3 percent overhead.
Our current instrumentation is not optimized. To get a better sense of which portions of the instrumentation are slow, we enabled printing of all system calls and experimented with removing various parts of the instrumentation. Without instrumentation, downloading a 10 KB file 10,000 times took 4.7 seconds on average. With full instrumentation, the same downloads took approximately 14.4 seconds. Of that 9.7 second increase, setting the breakpoint appears to account for approximately 28 percent and computing the path to the binary for approximately 42 percent. With 70 percent of the added runtime cost apparently due to these two components, optimizing either could have a significant impact.
For the breakpoint, VProbes supplies two ways to trigger probes. Users can set a breakpoint at an arbitrary guest address (as we currently do) or trigger when a certain limited set of events occurs, such as taking a page fault, changing CR3, or exiting hardware virtualization. This latter type of probe is significantly faster. If one of these probe points can be used instead of the current breakpoint, it could nearly eliminate the 28 percent of the added cost that is attributable to the breakpoint.
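The figures above can be turned into absolute costs with a short calculation. The Python snippet below only rearranges the numbers already reported (4.7 seconds, 14.4 seconds, 10,000 requests, and the 28 and 42 percent shares); the function name is ours, and the per-component and per-request values are derived rather than separately measured.

```python
def overhead_breakdown(baseline_s, instrumented_s, requests, shares):
    """Split the added run time into per-component costs using given shares."""
    increase_s = instrumented_s - baseline_s
    result = {name: increase_s * share for name, share in shares.items()}
    # Whatever the named components do not explain is reported as "other".
    result["other"] = increase_s * (1.0 - sum(shares.values()))
    result["total increase"] = increase_s
    result["added ms per request"] = increase_s / requests * 1000.0
    return result


breakdown = overhead_breakdown(
    baseline_s=4.7,
    instrumented_s=14.4,
    requests=10_000,
    shares={"breakpoint": 0.28, "binary path": 0.42},
)
```

Rounded to two decimal places, this gives about 2.72 seconds for the breakpoint, 4.07 seconds for the path computation, and 2.91 seconds for everything else, or roughly 0.97 milliseconds of added time per request.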
For computing the path to the binary, more data is gathered than necessary. Simply retrieving the base name of the binary (bash, python2.7, and so on) is likely to be sufficient in combination with the comm value, and should be substantially faster. Alternatively, the binary path can be cached and only recomputed when the CR3 or another property changes.
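The caching idea can be sketched as follows. This Python model is an assumption about how such a cache might look, not an implemented optimization: it keys the cache on the CR3 value, and the `invalidate` method is a hypothetical hook for whatever other property signals that the binary has changed. Given Emmett's limited data structures, an actual probe might instead remember only the most recent CR3 and path.

```python
class PathCache:
    """Remember the binary path computed for each address space (CR3 value)."""

    def __init__(self, compute_path):
        self._compute_path = compute_path  # the expensive traversal
        self._paths = {}
        self.misses = 0

    def lookup(self, cr3, task):
        """Return the cached path for `cr3`, computing it on first use."""
        if cr3 not in self._paths:
            self.misses += 1
            self._paths[cr3] = self._compute_path(task)
        return self._paths[cr3]

    def invalidate(self, cr3):
        """Forget one entry, for example when the address space is reused."""
        self._paths.pop(cr3, None)


# Usage with a stand-in for the traversal; the lambda is purely illustrative.
cache = PathCache(lambda task: "/usr/sbin/" + task)
first = cache.lookup(0x1000, "proftpd")
second = cache.lookup(0x1000, "proftpd")
```

After the two lookups, `cache.misses` is 1: the second system call from the same address space reuses the stored path instead of repeating the traversal.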
VProbes provides a viable replacement for an in-guest agent for a system call-based intrusion detection system. Our system is divided into two components: a gatherer that uses VProbes and an analyzer that determines whether a sequence of system calls is suspicious. The unoptimized performance of the gatherer likely is acceptable, depending on the workload, and several opportunities exist for further optimization.
While we focused on data gathering rather than analysis, we have an end-to-end proof of concept that uses whitelisted system calls to successfully detect certain attacks on a simple FTP server. More elaborate analysis techniques have been amply studied in the literature and could be combined with our instrumentation.
- Apache. ApacheBench, http://httpd.apache.org/docs/2.2/programs/ab.html
- Tal Garfinkel. Traps and pitfalls: Practical problems in system call interposition based security tools. In Proc. Network and Distributed Systems Security Symposium, pages 163–176, 2003.
- jduck. ProFTPD 1.3.2rc3 – 1.3.3b Telnet IAC Buffer Overflow (Linux), http://www.metasploit.com/modules/exploit/linux/ftp/proftp_telnet_iac
- Andrew P. Kosoresow and Steven A. Hofmeyr. Intrusion detection via system call traces. IEEE Softw., 14(5):35–42, September 1997.
- pdbparse, http://code.google.com/p/pdbparse/
- Raghunathan Srinivasan. Protecting anti-virus software under viral attacks. Master’s thesis, Arizona State University, 2007.
- Kymie M. C. Tan, Kevin S. Killourhy, and Roy A. Maxion. Undermining an anomaly-based intrusion detection system using common exploits. In RAID, pages 54–73. Springer-Verlag, 2002.
- Christina Warrender, Stephanie Forrest, and Barak Pearlmutter. Detecting intrusions using system calls: Alternative data models. In IEEE Symposium on Security and Privacy, pages 133–145. IEEE Computer Society, 1999.