In fast-changing desktop environments, we are witnessing increasing use of Virtual Desktop Infrastructure (VDI) deployments due to better manageability and security. A smooth transition from physical to VDI desktops requires significant investment in the design and maintenance of hardware and software layers for adequate user base support. To understand and precisely characterize both the software and hardware layers, we present VMware View® Planner, a tool that can generate workloads that are representative of many user-initiated operations in VDI environments.
Such operations typically fall into two categories: user and admin operations. Typical user operations include typing words, launching applications, browsing web pages and PDF documents, checking email and so on. Typical admin operations include cloning and provisioning virtual machines, powering servers on and off, and so on. View Planner supports both types of operations and can be configured to allow VDI evaluators to more accurately represent their particular environment. This paper describes the challenges in building such a workload generator and the platform around it, as well as the View Planner architecture and use cases. Finally, we explain how we used View Planner to perform platform characterization and consolidation studies, and find potential performance optimizations.
As Virtual Desktop Infrastructure (VDI) [6,7,8] continues to drive toward more cost-effective, secure, and manageable solutions for desktop computing, it introduces many new challenges. One challenge is to provide a local desktop experience to end users while connecting to a remote virtual machine running inside a data center. A critical factor in remote environments is the remote user experience, which is predominantly driven by the underlying hardware infrastructure and remote display protocol. As a result, it is critical to optimize the remote virtualized environment to realize a better user experience. Doing so helps IT administrators to confidently and transparently consolidate and virtualize their existing physical desktops.
Detailed performance evaluations and studies of the underlying hardware infrastructure are needed to characterize the end-user experience. Very limited subjective studies and surveys on small data sets are available that analyze these factors [19,20,21], and results and techniques do not measure up to the scale of the requirements in VDI environments. For example, the required number of desktop users easily ranges from a few hundred to tens of thousands, depending on the type of deployment. Accordingly, there is a pressing need for an automated mechanism to characterize and plan huge installations of desktop virtual machines. This process should qualitatively and quantitatively measure how user experience varies with scale in VDI environments. These critical measurements enable administrators to make decisions about how to deploy for maximum return on investment (ROI) without compromising quality.
This paper presents an automated measurement tool that includes a typical office user workload, a robust technique to accurately measure end-to-end latency, and a virtual appliance to configure and manage the workload run at scale. Ideally, the workload needs to incorporate the many traditional applications that typical desktop users use in the office environment. These applications include Microsoft Office applications (Outlook, PowerPoint, Excel, Word), Adobe Reader, Windows Media Player, Internet Explorer, and so on. The challenge lies in simulating the operations of these applications so they can run at scale in an automated and robust manner. Additionally, the workload should be easily extensible to allow for new applications, and be configurable to apply the representative load for a typical end user. This paper discusses these challenges and illustrates how we solved them to build a robust and automated workload, as described in Section II.
The second key component to a VDI measurement framework is precisely quantifying the end-user experience. In a remote connected session, an end-user only receives display updates when a particular workload operation completes. Hence, we need to leverage the display information on the client side to accurately measure the response time of a particular operation. Section III presents how a watermarking technique can be used in a novel way to measure remote display latency. The watermarking approach was designed to:
- Use a new encoding approach that works even under adverse network conditions
- Place the watermark so that it does not interfere with application updates
- Adapt the watermark location to work with larger screen resolutions
The final piece is the automated framework that runs the VDI user simulation at scale. We built a virtual appliance with a simple and easy-to-use web interface. Using this interface, users can specify the hardware configuration, such as storage and server infrastructure, configure the workload, and execute the workload at scale. The framework, called VMware View Planner, is designed to simulate a large-scale deployment of virtualized desktop systems and study the effects on an entire virtualized infrastructure (Section IV). The tool scales from a few virtual machines to thousands of virtual machines distributed across a cluster of hosts. With View Planner, VDI administrators can perform scalability studies that closely resemble real-world VDI deployments and gather useful information about configuration settings, hardware requirements, and consolidation ratios. Figure 1 identifies the inputs, outputs, and use cases of View Planner.
Figure 1. View Planner: Inputs, Outputs, and Use cases
Using the View Planner tool, we enable the following use cases in the VDI environment:
- Workload generation. By configuring View Planner to simulate the desired number of users and configure applications, we can accurately represent the load presented in a given VDI deployment. Once the workload is running, resource usage can be measured at the servers, network, and storage infrastructure to determine if bottlenecks exist. Varying the load enables required sizing information to be obtained for a particular VDI architecture (Section V).
- Architectural comparisons. To determine the impact of a particular component of VDI architecture, we can configure a fixed load using View Planner and measure the latencies of administrative operations (provisioning, powering on virtual machines, and so on) and user operations (steady-state workload execution). Switching the component in question and measuring latencies again provides a comparison of the different options. A note of caution here is that both administrative and user operation latencies can vary significantly based on the underlying hardware architecture, as described in sections V.C and V.E.
- Scalability testing. In this use case, hardware is not varied. The system load is increased gradually until a resource of interest is exhausted. Typical resources measured include CPU, memory, and storage bandwidth. Example use cases are presented in section V.B.
There are many other use cases, such as remote display protocol performance characterization (section V.D), product features performance evaluation, and identification of performance bottlenecks. Section VI provides relevant and related work in VDI benchmarking.
2. Workload Design
Designing a workload to represent a typical VDI user is a challenging task. Not only is it necessary to capture a representative set of applications, it also is important to keep the following design goals in mind when developing the workload.
Scalability. The workload should run on thousands of virtual desktops simultaneously. When run at such scale, operations that take a few milliseconds on a physical desktop could slow down to several seconds on hardware that is configured in a suboptimal manner. As a result, it is important to build high tolerances for operations that are sensitive to load.
Robustness. When running thousands of virtual desktops, operations can fail. These might be transient or irreversible errors. Operations that are transient and idempotent can be retried, and experience shows they usually succeed. An example of a transient error is the failure of a PDF document to open. The open operation can be retried if it fails initially. For operations that are not idempotent and cannot be reversed, the application simply is excluded from the run from that point onwards. This has the negative effect of altering the intended workload pattern. After extensive experimentation, this tack was decided upon so the workload could complete and upload the results for other operations that complete successfully. Our experience shows that only a few operations fail when run at scale, and the overall results are not altered appreciably.
Extensibility. Understanding that one set of applications is not representative of all virtual desktop deployments, we chose a very representative set of applications in our workload and enable View Planner users to extend the workload with additional applications.
Configurability. Users should be able to control the workload using various configurations, such as the application mix, the load level to apply (light, medium, heavy), and the number of iterations.
We overcame many challenges during the process of ensuring our workload was scalable, robust, extensible, and configurable. To realize these goals, we needed to automate various tasks in a robust manner.
Command-line-based automation. If the application supports command-line operations, it is very easy to automate by simply invoking the commands from the MS-DOS command shell or Microsoft Windows PowerShell.
GUI-based automation. This involves interacting with the application just like a real user, such as clicking window controls, typing text into boxes, following a series of interactive steps (wizard interaction), and so on. To do this, the automation script must be able to recognize and interact with the various controls on the screen, either by having a direct reference to those controls (Click “Button1”, Type into “Textbox2”) or by knowing their screen coordinates (Click <100,200>). The user interfaces of Microsoft Windows applications are written using a variety of GUI frameworks. Windows applications written by Microsoft extensively use the Win32 API to implement windows, buttons, text boxes, and other GUI elements. Applications written by third-party vendors often use alternative frameworks. Popular examples include the Java-based SWT used by the Eclipse IDE, or the ActionScript-based Adobe Flash. Automating applications with a Win32 API-based GUI is relatively straightforward with the AutoIT scripting language. Automating applications that use alternative frameworks for the GUI is not straightforward and requires other tools.
API-based automation. This involves interacting with the application by invoking its APIs to perform specific actions. Microsoft’s COM API is a good example of this model. All Microsoft Office applications export a COM interface. Using the COM API, it is possible to do almost everything that a user can do using the GUI. API-based automation is chosen over GUI-based automation when the GUI elements are very complicated and cannot be accessed directly. For example, it is very difficult in Microsoft Outlook to click on an individual mail item using direct GUI controls, let alone obtain information about the mail item, such as the identity of the sender. On the other hand, the Microsoft Outlook COM API provides a rich interface that lets you locate and open a mail item, retrieve information about the sender, receiver, attachments, and more.
The next sections provide a description of the workload composition, discuss how to avoid the workload starting at the same time, and illustrate how to perform timing measurements.
A. Workload Composition
Instead of building a monolithic workload that executes tasks in a fixed sequence, we took a building-block approach, composing each application of its constituent operations. For example, for the Microsoft Word application we identified Open, Modify, Save, and Close as the operations. This approach gave us great flexibility in sequencing the workload in any way desired. The applications and their operations are listed in Table 1.
|ID||Application||Operations|
|2||Excel||[“OPEN”, “COMPUTE”, “SAVE”, “CLOSE”, “MINIMIZE”, “MAXIMIZE”, “ENTRY”]|
|3||Word||[“OPEN”, “MODIFY”, “SAVE”, “CLOSE”, “MINIMIZE”, “MAXIMIZE”]|
|4||AdobeReader||[“OPEN”, “BROWSE”, “CLOSE”, “MINIMIZE”, “MAXIMIZE”]|
|5||IE_ApacheDoc||[“OPEN”, “BROWSE”, “CLOSE”]|
|6||Powerpoint||[“OPEN”, “RUNSLIDESHOW”, “MODIFYSLIDES”, “APPENDSLIDES”, “SAVEAS”, “CLOSE”, “MINIMIZE”, “MAXIMIZE”]|
|7||Outlook||[“OPEN”, “READ”, “RESTORE”, “CLOSE”, “MINIMIZE”, “MAXIMIZE”, “ATTACHMENT-SAVE”]|
|10||Video||[“OPEN”, “PLAY”, “CLOSE”]|
|12||Webalbum||[“OPEN”, “BROWSE”, “CLOSE”]|
Table 1: Applications with their IDs and operations
B. Avoiding Synchronized Swimming
Operations in a desktop typically happen at discrete intervals of time, often in bursts that consume many CPU cycles and memory. We do not want all desktops to execute the same sequence of operations for two reasons. It is not representative of a typical VDI deployment, and it causes resource over commitment. To avoid synchronized swimming among desktops, the execution sequence in each desktop is randomized so the desktops perform different things at any given instant of time and the load is distributed evenly (Figure 2).
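The randomization can be sketched as follows. This is a minimal illustration, not the View Planner implementation: the function name, the choice of seeding by desktop name, and the use of whole applications as the shuffled unit are all assumptions made for the example.

```python
import random

# Application names taken from Table 1; the list itself is illustrative.
APPLICATIONS = ["Excel", "Word", "AdobeReader", "IE_ApacheDoc",
                "Powerpoint", "Outlook", "Video", "Webalbum"]

def execution_order(desktop_name, apps=APPLICATIONS):
    """Return a shuffled copy of the application list for one desktop.

    Seeding by desktop name (an assumption) gives each desktop its own
    order while keeping that order reproducible across runs.
    """
    rng = random.Random(desktop_name)
    order = list(apps)
    rng.shuffle(order)
    return order

for name in ("desktop-1", "desktop-2", "desktop-3"):
    print(name, execution_order(name))
```

Because every desktop draws its own permutation, bursts from the same operation are unlikely to line up across many virtual machines at the same instant.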
Figure 2. Randomized execution of different operations across different desktop virtual machines. While the figure is not drawn to scale, the box width shows the relative timing of different operations.
C. Timing Measurements
The time taken to perform desktop operations ranges from a few milliseconds to tens of seconds. A high-resolution timer is needed for operations that complete very quickly. The QueryPerformanceCounter (QPC) call in Microsoft Windows cannot be relied upon because it uses the virtualized timestamp counter, which may not be accurate. Consequently, the timestamp counter must be fetched from the performance counter registers on the host CPU. The virtual machine monitor in VMware software enables a virtual machine to issue an rdpmc() call into the host machine to fetch hardware performance counters. To measure the latency of an operation, we simply wrap the operation with these rdpmc() calls to obtain a much more reliable measurement. Since the rdpmc() call is intercepted by the hypervisor, translated, and issued to the host, it can take more cycles than desired. Our measurements indicate this call consumes approximately 150,000 cycles on a 3 GHz processor, or approximately 50 microseconds. The operations measured take at least 50 milliseconds to complete, which means the overhead of an rdpmc() call is at most about 0.1 percent.
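The overhead arithmetic can be checked with a few lines of Python. The figures are the ones quoted above; the function and parameter names are illustrative only.

```python
def rdpmc_overhead(cycles_per_call=150_000, cpu_hz=3e9,
                   operation_s=0.050, calls=1):
    """Return (seconds per call, overhead as a fraction of the operation)."""
    call_s = cycles_per_call / cpu_hz          # 150,000 cycles at 3 GHz
    return call_s, calls * call_s / operation_s

call_s, fraction = rdpmc_overhead()
print(f"{call_s * 1e6:.0f} us per call, "
      f"{fraction:.2%} of a 50 ms operation")
```

Passing `calls=2` models one call before and one after the operation, which doubles the relative cost for the shortest (50 ms) operations; longer operations see proportionally less overhead.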
D. Extending the Workload with Custom Applications
As mentioned earlier, a fixed set of applications is not representative of all possible virtual desktop deployments. View Planner is designed to be extensible, enabling users to plug in their own applications. Users must follow the same paradigm of identifying the operations that constitute their application. Since the base AutoIT workload included with View Planner is compiled into an executable, users cannot plug their code directly into the main workload. To address this issue, we implemented a custom application framework that uses TCP sockets to communicate between the main workload and the custom application script. Using this feature, users can add customized applications to their workload mix and identify the application suite that best fits their VDI deployments. For example, a healthcare company can implement a health-related application, mix it with typical VDI user workloads, and characterize its platform for VDI deployments.
E. Workload Scalability Enhancements
Virtualized environments make effective use of hardware by allowing multiple operating system instances to run simultaneously on a single computer. This greatly improves utilization and enables economies of scale. While there are innumerable benefits to virtualization, poorly designed virtual environments can cause unpredictability in the way applications behave, primarily due to resource over commitment. The goal of the View Planner workload is to reliably detect and report poor designs in virtual desktop environments. Because the View Planner workload is technically another application running inside virtual desktops, it is susceptible to the same unpredictability and failures under load. To make the process of timing measurement and reporting reliable, we built mechanisms into the workload that ensure the workload runs to completion even under the most stressful conditions.
1) Idempotent Operations and Retries
To make the workload more manageable, we split the operations of each application into the smallest unit possible. We also designed most of these operations to be idempotent, so that failed operations can be retried without disturbing the flow of operations. Our experience indicates that many operations fail due to transient load errors and typically succeed if tried again. As a result, the software retries the operation (just as a normal user would) three times before declaring a failure. While the retry mechanism has significantly improved the success rate of individual operations under high load, some operations might still fail.
Two options are available when an operation fails. The first option is to fail the workload because one of its constituent operations failed after three retries. Another option is to continue with the workload by ignoring further operations of the application that encountered a failure. We decided to leverage the second approach for two reasons:
- Our workload is composed of many applications and an even greater number of operations. Failing the entire workload for one or two failed applications discards all successful measurements and results in wasted time.
- Since our workload runs on multiple virtual desktops simultaneously, failures in a few desktops do not have a significant impact on the final result if we consider the successful operations of those desktops.
By selectively pruning failed applications from a few virtual machines, we are able to handle failures at a granular level and still count the successful measurements in the final result, resulting in improved robustness and less wasted time for users. We also flag the desktops that failed the run.
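The retry-and-prune policy can be sketched as below. This is a hedged illustration rather than the actual workload code: `run_operation` is a hypothetical callable, and "three times" is interpreted here as three attempts in total, which is an assumption.

```python
MAX_ATTEMPTS = 3  # assumption: three attempts in total per operation

def run_workload(sequence, run_operation, max_attempts=MAX_ATTEMPTS):
    """Run (app, op) pairs; retry failures, prune apps that keep failing.

    `run_operation(app, op)` is a hypothetical callable that returns the
    measured latency in seconds or raises an exception on failure.
    """
    results = []         # (app, op, latency) for successful operations
    failed_apps = set()  # applications excluded from the rest of the run
    for app, op in sequence:
        if app in failed_apps:
            continue     # skip further operations of a failed application
        for attempt in range(max_attempts):
            try:
                results.append((app, op, run_operation(app, op)))
                break
            except Exception:
                if attempt == max_attempts - 1:
                    failed_apps.add(app)
    return results, failed_apps
```

A non-empty `failed_apps` set is the natural place to flag the desktop as having failed the run while still uploading the measurements in `results`.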
2) Progress Checker (View Planner Watchdog)
In situations where extreme reliability and robustness are needed, a watchdog mechanism is needed to ensure things progress smoothly. We use this concept by employing a progress checker process, a very simple user-level process with an extremely low chance of failure. A progress file, a simple text file, keeps track of the workload progress by storing the number of operations completed. The progress checker studies workload progress by reading the progress file.
When the workload starts, it keeps an operations count in a known Microsoft Windows registry location and tries to launch the progress checker process. The workload fails to run if it cannot launch the progress checker. When the workload starts performing regular operations, it increments the count stored in the registry. The progress checker process periodically wakes up and reads the registry. It terminates the workload if progress is not detected.
The progress checker sleeps for three times the expected time taken by the longest running operation in the workload. This ensures the workload is not terminated accidentally. Finally, if the progress checker needs to terminate the workload, it does so and reports the timing measurements completed so far so they can be included in the final score.
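A watchdog loop of this kind might look like the following sketch. The callables `read_count`, `terminate`, and `is_done` are hypothetical stand-ins for reading the stored operation count, stopping the workload, and detecting normal completion; none of them are View Planner APIs.

```python
import time

def watch_progress(read_count, terminate, longest_op_s,
                   sleep=time.sleep, is_done=lambda: False):
    """Poll an operation counter and terminate the workload if it stalls.

    Returns True if the workload finished on its own, False if the
    watchdog had to terminate it.
    """
    interval = 3 * longest_op_s   # sleep three times the longest operation
    last = read_count()
    while not is_done():
        sleep(interval)
        current = read_count()
        if current == last and not is_done():
            terminate()           # no progress during a full interval
            return False
        last = current
    return True
```

Injecting `sleep` as a parameter keeps the loop testable without real delays, and the generous interval is what protects a slow but healthy workload from being stopped.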
3. End-User Measurement Framework
The second part of the View Planner framework is the precise measurement of the user experience from the client side. This section describes the novel measurement technique used to measure application response time from the client side. It presents the VDI watermarking approach as well as a brief description of our plugin (client agent) implementation.
A. View Planner Workload Watermarking
In our previous approaches [1, 2], the idea was to use the virtual channel to signal the start and end of events through the display watermarking on the screen. In these techniques, the display watermarking location overlapped with applications, resulting in a chance for the watermarking update to be overlapped by application updates. This results in the particular event being missed and the workload not progressing as expected. We needed a mechanism in which the watermarking location is disjoint from application rendering. To enable better decisions on the client side, we also required metadata encoded in the watermark. This allows us to identify the operation on the client side, as well as drive the workload from the client side to realize true VDI user simulation.
We designed our watermarking approach such that it:
- Uses a new encoding approach that works even under adverse network conditions
- Uses smart watermarking placement to ensure the watermark does not interfere with application updates
- Adapts to larger screen resolutions
To precisely determine application response time on the client side, we overlay metadata on top of the start menu button that travels with the desktop display. This metadata can be any type of useful information, such as the application operation type and number of events executed. Using this metadata, timing information for the application operation can be derived on the client side. We detect the metadata and record the timestamps at the start and end of the operation. The difference between these timestamps enables the estimation of application response time. There are many challenges associated with getting accurate metadata to the client side, making it unobtrusive to the application display, and ensuring it always is visible. As a result, the location of the metadata display and the codec used to encode the metadata are very important.
Table 1 illustrates an example of the encoding of different application operations. The table shows the applications (Firefox, Microsoft Office applications, Adobe Reader, Web Album, and so on) that are supported with View Planner along with their operations. Each application is assigned an application ID and has a set of operations. Using this approach, we can encode a PowerPoint “Open” operation in pixel values by doing the following:
- Note that the Microsoft PowerPoint application has application ID “6”
- See that “Open” is the first operation in the list of operations
- Calculate the encoding of each operation using the following formula: (application_id * NUM_SUBOPS) + (operation_id)
- Determine that the encoding of the Microsoft PowerPoint “Open” operation is (6 * 10) + 1 = 61
- The value 10 is used for NUM_SUBOPS since we assume the number of operations for a particular application will not exceed 10.
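The encoding steps above can be sketched in Python. The function names are illustrative; the decoder is an addition for the example and assumes operation IDs stay below NUM_SUBOPS so that the code can be split back unambiguously.

```python
NUM_SUBOPS = 10  # upper bound on operations per application, per the text

def encode_event(application_id, operation_id):
    """Combine an application ID and a 1-based operation position."""
    if not 0 <= operation_id < NUM_SUBOPS:
        # Assumption of this sketch: larger IDs would collide with the
        # next application's codes.
        raise ValueError("operation_id must be below NUM_SUBOPS")
    return application_id * NUM_SUBOPS + operation_id

def decode_event(code):
    """Split an event code back into (application_id, operation_id)."""
    return divmod(code, NUM_SUBOPS)

code = encode_event(6, 1)          # PowerPoint (ID 6), "Open" (1st operation)
print(code, decode_event(code))    # 61 (6, 1)
```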
After looking at one encoding example, let’s see how we send this encoded data to the client side and how robustly we can infer the code from the client side. As shown in Figure 3, we display the metadata on top of the start menu button since it does not interfere with the rendering of other applications. The watermark is composed of three lines of white and black pixels, each 20×1 pixels in size. The first line denotes the test ID, which is the event code for the currently running operation. The next two lines signal the start and end of a particular operation. Using the example shown at the bottom of Figure 3, we can explore how these three lines are used to monitor response time. When the workload executes the Microsoft PowerPoint “Open” operation, the workload watermarks the test ID location with the event code 61, as calculated earlier. The workload puts the sequence number “n”, the count of operations that have occurred (1 in this case), in the start location. The protocol sends the watermark codes to the client when a screen change occurs. The encoded watermarks are decoded as a Microsoft PowerPoint “Open” operation on the client side. When the client observes the start rectangle update, the “start” timestamp is recorded. When the “Open” operation is complete, the workload watermarks the test ID location again with the Microsoft PowerPoint “Open” event code (61) and a code (1000-n) in the end location. When the client observes the end rectangle update with the sum of the start ID and end ID equaling 1000, the “end” timestamp is recorded. The difference between the timestamps is recorded as the response time of the “Open” operation.
Figure 3. This figure shows the VDI desktop with the Microsoft PowerPoint application running in the background. The watermarking region is near the start menu and is shown on the left. The watermarking codes below show the PowerPoint Open operation. Figure 3a shows the start of the operation and Figure 3b shows the end of the operation.
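The client-side pairing of start and end codes can be sketched as follows. This is an illustrative model only: the class and method names are hypothetical, and it assumes the watermark lines have already been decoded into integers.

```python
SEQUENCE_SUM = 1000  # start code n and end code (1000 - n) sum to this value

class ResponseTimer:
    """Pair start and end watermark codes to compute response times."""

    def __init__(self):
        self.pending = {}    # sequence number n -> (event_code, start_time)
        self.latencies = []  # (event_code, n, seconds)

    def on_start(self, event_code, n, timestamp):
        # Keep the first observation; repeated screen updates carrying the
        # same start code must not move the start timestamp.
        self.pending.setdefault(n, (event_code, timestamp))

    def on_end(self, event_code, end_code, timestamp):
        n = SEQUENCE_SUM - end_code
        started = self.pending.get(n)
        if started is None or started[0] != event_code:
            return None      # no matching start for this end code
        del self.pending[n]
        latency = timestamp - started[1]
        self.latencies.append((event_code, n, latency))
        return latency

timer = ResponseTimer()
timer.on_start(61, 1, timestamp=10.0)          # PowerPoint "Open" begins
print(timer.on_end(61, 999, timestamp=12.5))   # 1 + 999 == 1000, so 2.5
```

Requiring the two codes to sum to 1000 lets the client reject a stale or partially updated watermark instead of recording a bogus end timestamp.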
B. Measurement Plugin Implementation
This section describes the internal details of the measurement plugin. There are common requirements, such as getting rectangle updates, finding pixel color values, sending key events, and using the timer for timestamps. With these four APIs, we can extend the measurement technique to any client device, such as Apple iPads or Android-based tablets. Any mirror driver can be used to get rectangle updates. The function of a mirror driver is to provide access to the display memory and screen updates as they happen. In our implementation, we use the SVGA DevTap interface that we built and implemented as part of the SVGA driver. The software performs approximately 40 scans per second (25 ms granularity) to process incoming display updates to look for encoded watermark events.
On the client side, the plugin runs a state machine. It cycles through four states: sending an event for the next operation, waiting for the start event, waiting for the end event, and finally waiting for the think time. In the “sending event” state, the plugin sends a key event to signal the desktop to start the next operation. The plugin records the time when it sees the end of the event. It continues to iterate through the states of the state machine until the workload “finish” event is sent from the desktop. During the video play operation, the main plugin switches to the video plugin and records frame timings. After the video playback is complete, it switches back to the main measurement plugin to measure the response time of other applications. For timing, the host RDTSC instruction is used to read the timestamp counter, and the difference between two readings is divided by the processor frequency to determine the elapsed time.
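The state machine and the timestamp conversion can be sketched as below. The state names follow the description above, while the input labels (`key_sent`, `start_seen`, and so on) and function names are hypothetical.

```python
from enum import Enum, auto

class State(Enum):
    SEND_EVENT = auto()   # send a key event to start the next operation
    WAIT_START = auto()   # wait for the start watermark
    WAIT_END = auto()     # wait for the end watermark
    THINK = auto()        # wait for the think time
    DONE = auto()         # workload "finish" event received

# (current state, input) -> next state; input labels are illustrative.
TRANSITIONS = {
    (State.SEND_EVENT, "key_sent"): State.WAIT_START,
    (State.WAIT_START, "start_seen"): State.WAIT_END,
    (State.WAIT_END, "end_seen"): State.THINK,
    (State.THINK, "think_elapsed"): State.SEND_EVENT,
}

def step(state, event):
    """Advance the plugin state; unknown inputs leave the state unchanged."""
    if event == "finish":
        return State.DONE
    return TRANSITIONS.get((state, event), state)

def elapsed_seconds(tsc_start, tsc_end, cpu_hz):
    """Convert a timestamp-counter difference into seconds."""
    return (tsc_end - tsc_start) / cpu_hz
```

For example, a counter difference of 3,000,000,000 ticks on a 3 GHz processor corresponds to `elapsed_seconds(0, 3_000_000_000, 3e9)`, that is, one second.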
4. View Planner Architecture
This section provides an overview of the third piece of the VDI simulation framework, the View Planner framework that simulates VDI workloads at scale, and discusses its design and architecture. To run and manage workloads at large scale, we designed an automated controller. The central piece in the View Planner architecture, the automated controller is the harness or appliance virtual machine that controls everything, from the management of participating desktop and client virtual machines, to starting the workload and collecting results, to providing a monitoring interface through a web user interface.
The View Planner appliance is essentially a CentOS Linux-based appliance virtual machine that interacts with many VDI server components. It also runs a web server to present a user-friendly web interface. Figure 4 shows the high-level architecture of View Planner. As shown in the diagram, the appliance interacts with a VMware vCenter View connection server or Virtual Center server to control desktop virtual machines. It also communicates with client virtual machines to initiate remote protocol connections. The appliance is responsible for starting the workload simulation in desktop virtual machines. Upon completion, results are uploaded and stored in a database inside the appliance. Results can be viewed using the web interface or extracted from the database at any time.
The harness controller provides the necessary control logic. It runs as a Linux user-level service in the appliance and interacts with many external components. The control logic implements all needed functionality, such as:
- Keeping state and statistics
- Controlling the run and configurations
- Providing monitoring capabilities
- Interacting with the database and virtual machines
- Collecting and parsing results and reporting scores
The View Planner tool uses a robust and asynchronous remote procedure call (RPC) framework (Python Twisted) to communicate with desktop or client virtual machines. Testing shows successful connection handling for up to 4,000 virtual machines.
Figure 4. View Planner high-level architecture
A. View Planner Operations
This section discusses the View Planner flow chart and how View Planner operates the full run cycle. Once the harness is powered on, the service listens on a TCP port to serve requests from the web interface. Figure 5 shows the operation flow chart for View Planner. In the first phase, View Planner stores all server information and their credentials. Next, desktop virtual machines can be provisioned (an administrative operation) using the web interface. Following this step, the View Planner user defines the workload profile (applications to run), the run profile (the number of users to simulate), and so on. Next, the run profile is executed. After the run completes, results are uploaded to the database and can be analyzed. This process can be repeated with a new workload and run profile.
Figure 5. View Planner operational flow chart
Let’s discuss in detail what happens in the background, starting when a VDI evaluator executes a particular run profile until the final results are uploaded. This is illustrated in Figure 6 for three different modes of View Planner:
- In “local” mode, the workload executes locally without clients connected.
- In “remote” mode, workload execution and measurement are performed with a remote client connected to one desktop (one-to-one).
- In “passive” mode, one client can connect to multiple desktops (one-to-many).
In these modes, View Planner first resets the test for the previous run, and powers off virtual machines. At this stage, the harness is initialized and ready to execute the new run profile. Next, a prefix match stage finds participating virtual machines based on the desktop or client prefix provided in the run profile. View Planner powers on these participating virtual machines at a staggered rate that is controlled by a configurable parameter. VDI administrators needing to investigate bootstorm issues can increase this value to a maximum, causing View Planner to issue as many power on operations as possible every minute.
Once the desktops are powered on, they register their IP addresses. Upon meeting a particular threshold, View Planner declares that a certain number of desktop or client virtual machines are available for the run. At this stage, View Planner waits for the ramp-up time for all virtual machines to settle. Next, it obtains the IP address for each desktop and client virtual machine and uses these IP addresses to initiate the run.
Figure 6. View Planner operations for different modes
In remote and passive mode, View Planner executes an extra phase in which it logs off desktop virtual machines (Figure 6). After the logoff storm, View Planner waits for the virtual machines to settle and CPU usage to return to normal. After this stage, the appliance sends a command to the desktops to execute the workload in local mode. For remote and passive modes, commands are sent to clients to execute logon scripts. This logon storm is similar to what happens when employees arrive at the office in the morning and log into their desktops. The logon storm is an “admin” operation and can be controlled by a configurable parameter. Once the connections are established, they update their status to the harness and View Planner records how many workload runs have started. After the run completes, the desktops upload the results. For remote mode, View Planner finds the matching clients and asks the clients to upload the results. These results are inserted into the database. The View Planner passive mode is good to use when VDI evaluators do not have sufficient hardware to host client virtual machines.
B. Handling Rogue Desktop and Client Virtual Machines
When simulating a large user run, a few desktops may get stuck in the customization state or fail to obtain an IP address. To handle this case, View Planner keeps a registration timer running. After every registration, the timer is reset and the software waits for up to 30 minutes. If no further registrations are seen within 30 minutes and the required threshold is not met, we kick off the logic to find the bad virtual machines and reset them. After obtaining the IP address of each registered desktop, we use the virtual machine name to IP address mapping to find the rogue virtual machines. Once the rogue virtual machines are reset, they register their IP addresses and, on meeting the threshold, the run starts. If the threshold is still not met, the timer is reset a few times and the run is started with the registered number of virtual machines.
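The rogue detection step can be sketched as a comparison between the virtual machines that were powered on and those present in the name to IP address mapping. The function names below are hypothetical and the actual reset would go through the virtualization management API, which is not shown; only the 30-minute timeout comes from the text.

```python
from typing import Dict, List

REGISTRATION_TIMEOUT_S = 30 * 60  # 30-minute wait described in the text


def find_rogue_vms(expected_vms: List[str], registered: Dict[str, str]) -> List[str]:
    """Return powered-on VMs that never registered an IP address.

    registered maps virtual machine name -> registered IP address.
    """
    return sorted(name for name in expected_vms if name not in registered)


def should_reset_rogues(
    seconds_since_last_registration: float, registered_count: int, threshold: int
) -> bool:
    """True when the timer has expired and the threshold is still unmet."""
    timed_out = seconds_since_last_registration >= REGISTRATION_TIMEOUT_S
    return timed_out and registered_count < threshold
```

Because the timer is restarted on every registration, the timeout only fires when registrations have stalled, not merely because a large run takes a long time to come up.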
C. Scoring Methodology
An invocation of the View Planner workload in a single virtual machine provides hundreds of latency events. As a result, when scaling to thousands of desktop virtual machines using View Planner, the number of latency measurements for different operations grows very large. Hence, a robust and efficient way to understand and analyze the different operational times collected from the numerous desktop virtual machines is required. To better analyze these operations, we divided the important operations into two buckets: Group A consists of interactive operations, and Group B consists of I/O intensive operations. Group A and Group B operations are used to define the quality of service (QoS), while the remaining operations are used to generate additional load. For a benchmark run to be valid, both Group A and Group B need to meet their QoS requirements. QoS is defined in terms of a threshold time limit, Th. For each group, 95 percent of the operations must complete within Th. Limits are set based on extensive experimental results spanning numerous hardware and software configurations. The View Planner benchmark “score” is the number of users (virtual machines) used in a run that pass QoS.
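The QoS rule can be illustrated with a short sketch. It assumes latencies have already been collected per group in seconds; the function names and the dictionary layout are hypothetical, and the per-group thresholds are inputs rather than values taken from the tool.

```python
from typing import Dict, Sequence


def group_passes_qos(
    latencies_s: Sequence[float], threshold_s: float, required_fraction: float = 0.95
) -> bool:
    """Check that at least 95 percent of operations completed within Th."""
    if not latencies_s:
        raise ValueError("no latency samples for this group")
    within = sum(1 for t in latencies_s if t <= threshold_s)
    return within >= required_fraction * len(latencies_s)


def run_is_valid(
    group_latencies: Dict[str, Sequence[float]], thresholds_s: Dict[str, float]
) -> bool:
    """A run is valid only if every QoS group (A and B) meets its own Th."""
    return all(
        group_passes_qos(group_latencies[group], th)
        for group, th in thresholds_s.items()
    )
```

Counting operations within the limit avoids choosing a percentile interpolation method, while expressing the same 95 percent requirement.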
5. Results and Case Studies
This section presents workload characterization results, describes several View Planner use cases, and presents associated results.
For most of the experiments presented in subsequent sections, all of the applications supported in View Planner ran with 20 seconds of think time. Think time is used to simulate the random pause when a VDI user is browsing a page or performing another task. For the 95 percent Group A threshold (Th), we selected a response time of 1.5 seconds based on user response time and statistical analysis.
A. Workload Characterization
We first characterized the workload based on how many operations View Planner executed and how randomly the operations were distributed across different iterations in different desktop virtual machines.
Figure 7 shows the average number of times each operation is executed per virtual machine. The *-Open and *-Close operations are singleton operations and occur at low frequency, while interactive operations (AdobeReader-Browse, IE_ApacheDoc-Browse, Word-Modify, and so on) typically are executed more than 10 times. This is very close to real-world user behavior: the document is opened and closed once, with many browse and edit operations performed while the document is open.
Figure 7. Average event count for each application operation
As discussed in Section III.A, the proposed watermarking technique makes it possible to measure application response time from the client side. The response time chart in Figure 8 shows the application latency seen for different operations with the RDP protocol in LAN conditions. As seen from the graph, most of the Open and Save operations take more than two seconds, as they are CPU and I/O intensive operations, while most interactive operations, such as Browse and Maximize operations, take less than a second.
Figure 8. Average response time for each application operation measured from the client side
Figure 9 shows virtual machine execution of operations over each of the seven iterations. The y-axis corresponds to the ID of a particular virtual machine and the x-axis is time. The colors represent different iterations. The data plotted is from a 104 virtual machine run on an 8-core machine. Due to heavy load on the system, there is a skew between the iteration start and stop times across virtual machines, resulting in iteration overlap in the system. We can see that different virtual machines start and finish at different times due to randomized load distribution on the physical host.
Figure 9. Iteration overlap and application execution in each virtual machine in a 104 virtual machine run
B. Finding the Score (Number of Supported Users)
One of the most important use cases of View Planner is determining how many users can be supported on a particular platform. We used View Planner on a single host with the VMware® vSphere® 5 platform and Fibre Channel (FC) storage to determine the maximum number of supported users. Detailed results are shown in Table 2. The simulation started with 96 users. We observed that the Group A 95th percentile was 0.82 seconds, which was less than the threshold value of 1.5 seconds. We systematically increased the number of simulated desktops and looked at the QoS score. When the number of users was increased to 124, we could no longer satisfy the latency QoS threshold. Consequently, this particular configuration can support only approximately 120 users.
| Total # VMs | Group A 95% (sec) | QoS status |
Table 2: QoS score for different numbers of users
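The search for the score can be sketched as a loop that raises the user count until the QoS threshold is violated. In this illustration, `run_p95_s` is a hypothetical callable standing in for a complete View Planner run that returns the Group A 95th percentile latency in seconds; the step size and upper bound are assumptions, not tool settings.

```python
from typing import Callable, Optional


def max_supported_users(
    run_p95_s: Callable[[int], float],
    start_users: int,
    step: int,
    threshold_s: float = 1.5,
    max_users: int = 10_000,
) -> Optional[int]:
    """Return the largest tested user count that still meets the threshold.

    Returns None if even the starting user count fails QoS.
    """
    if step < 1:
        raise ValueError("step must be at least 1")
    best = None
    users = start_users
    while users <= max_users:
        if run_p95_s(users) > threshold_s:
            break  # QoS violated; the previous passing count is the score
        best = users
        users += step
    return best
```

Each call to `run_p95_s` corresponds to a full benchmark run, so in practice a coarse step followed by a finer one near the failure point keeps the number of runs manageable.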
C. Comparing Different Hardware Configurations
View Planner can be used to compare the performance of different platforms:
- Storage protocols, such as the Network File System (NFS), Fibre Channel, and Internet SCSI (iSCSI) protocols
- Processor architectures, such as Intel Nehalem, Intel Westmere, and so on
- Hardware platform configuration settings, such as CPU frequency settings, memory settings, and so on
To demonstrate one such use case, we evaluated the memory over-commitment feature in VMware vSphere. Table 3 shows the 95th percentile latency measured by View Planner with different percentages of memory over-commitment. The results show that even with 200 percent memory over-commitment, the system passed the QoS threshold of 1.5 seconds.
| Virtual Machines/Host | Logical Mem | Physical Mem | %MemOvercommit | 95th Percentile Latency |
Table 3: QoS with different memory over-commitment settings
D. Comparing Different Display Protocols
To compare different display protocols, we need to precisely characterize the response time of application operations. This is a capability of our watermarking technique. We simulated different network conditions—using LAN, WAN, and extreme WAN (very low bandwidth, high latency)—to see how the response time increased for different display protocols. Figure 10 shows the normalized response time chart comparing the PC-over-IP (PCoIP), PortICA, and RDP display protocols for three network conditions. These results show that PCoIP provides much better response time in all network conditions compared to other protocols. View Planner enables this kind of study and provides a platform to characterize the “true” end-user experience.
Figure 10. The normalized View Planner response time for different display protocols under different network conditions
E. Performance Characterization
View Planner can be used for performance characterization. To illustrate one study, we investigated the differing numbers of users a given platform could support using different versions of VMware View™. Figure 11 shows the 95th percentile response time for VMware View 4.5 and 5.0. We set a threshold of 1.5 seconds and required the 95th percentile response time to fall below this threshold. For VMware View 4.5, the response time crossed this threshold at 12 virtual machines (or 12 users) per physical CPU core. Hence, we can support between 11 and 12 users per core on VMware View 4.5. Looking at the VMware View 5 result, we see it can easily support 14.5 Windows 7 virtual machines per core. Using View Planner, we were able to characterize the number of users that can be supported on a physical CPU core, and compare two versions of a product to analyze performance improvements (30 percent better consolidation in VMware View 5 compared to VMware View 4.5).
Figure 11. 95th percentile response time for VMware View 4.5 and VMware View 5 as the number of virtual machines per CPU core was increased.
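The consolidation comparison reduces to a ratio of per-core densities. The sketch below is illustrative arithmetic only; the function name is hypothetical, and the choice of baseline within the 11 to 12 users per core range for VMware View 4.5 is an assumption, since the text does not state which value was used.

```python
def consolidation_gain_percent(
    baseline_users_per_core: float, improved_users_per_core: float
) -> float:
    """Relative improvement in users per core, as a percentage."""
    if baseline_users_per_core <= 0:
        raise ValueError("baseline_users_per_core must be positive")
    gain = improved_users_per_core - baseline_users_per_core
    return gain / baseline_users_per_core * 100.0
```

With 14.5 users per core for VMware View 5, a baseline of 11 gives a gain of about 32 percent and a baseline of 12 gives about 21 percent, so the reported figure of roughly 30 percent corresponds to the lower end of the VMware View 4.5 range.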
F. Performance Optimizations
View Planner can help users find bottlenecks in the virtualization layer during a scalability study and apply performance optimizations. Using this tool, we can pinpoint the performance bottleneck at a particular component and try to fix the problem. For example, in a particular run, we found that application latencies (user experience) were poor. Upon further analysis, we traced the problem to the storage array, where disk latency was quite high and available I/O bandwidth was fully saturated. We can also study the performance of many protocol features and understand their impact on overall end-user experience. In addition, we identified many performance issues in the CPU and network (downlink bandwidth) usage of various applications during our protocol performance analysis, highlighting the significant potential of this workload in VDI environments.
6. Related Work
There are a number of companies providing VDI test solutions. Some, such as View Planner, focus on the entire VDI deployment [16, 17], while others offer limited scope and focus on a specific aspect of a VDI solution, such as storage. At first glance, the functionality provided by these solutions might appear similar to View Planner. However, these solutions do not leverage watermarking techniques to accurately gauge the operation latency that is experienced by an end user. Instead, they rely on “out-of-band” techniques to estimate remote response. For instance, out-of-band techniques include explicitly communicating event start and stop events to the client using separate artificially created events. In this situation, the start and stop events, unlike our watermarking technique, do not piggyback on the remote display channel and may not accurately reflect the operation latency observed by a user. Other approaches involve network layer taps to attempt to detect the initiation and completion of operations. Not only are these approaches potentially inaccurate, they introduce significant complexities that limit portability between operating systems and VDI solutions.
The out-of-band signaling exploited by other VDI test solutions can lead to significant inaccuracies in the results, which in turn can lead to misleading conclusions about permissible consolidation ratios and protocol comparisons, and to invalid analysis of VDI deployments. Other approaches include analyzing screen updates and attempting to automatically detect pertinent events (typically used for comparative performance analysis), and inferring remote latencies by network monitoring and slow-motion benchmarking [20, 21]. While these approaches work for a single VDI user on an unloaded system, they can significantly perturb (and are perturbed by) the behavior of VDI protocols under load, making them unsuitable for the robust analysis of realistic VDI workloads at scale.
Finally, a variety of other techniques have been developed for latency analysis and to look for pauses due to events such as garbage collection. These approaches assume (and depend on the fact) that the user is running on a local desktop.
This paper presented View Planner, a next-generation workload generator and performance characterization tool for large-scale VDI deployments. It supports both types of operations (user and administrative operations) and is designed to be configurable to allow users to accurately represent their particular VDI deployment. A watermarking measurement technique was described that can be used in a novel manner to precisely characterize application response time from the client side. For this technique, watermarks are sent with the display screen as notifications and are piggybacked on the existing display. The detailed architecture of View Planner was described, as well as the challenges in building a representative VDI workload and the scalability features. The results and case studies illustrated how View Planner can be used to perform important analyses, such as finding the number of supported users on a given platform, evaluating memory over-commitment, and identifying performance bottlenecks. Using View Planner, IT administrators can easily perform platform characterization, determine user consolidation, perform necessary capacity planning, undertake performance optimizations, and carry out a variety of other important analyses. We believe View Planner can help VDI administrators to perform scalability studies of nearly real-world VDI deployments and gather useful information about their deployments.
We would like to thank Ken Barr for his comments and feedback on the early drafts of this paper, as well as other reviewers. Finally, we thank everyone in the VDI performance team for their direct or indirect contributions to the View Planner tool.
Further information and details about the View Planner tool can be obtained by sending an email to email@example.com
[1] B. Agrawal, L. Spracklen, S. Satnur, R. Bidarkar, “VMware View 5.0 Performance and Best Practices”, VMware White Paper, 2011.
[2] L. Spracklen, B. Agrawal, R. Bidarkar, H. Sivaraman, “Comprehensive User Experience Monitoring”, VMware Tech Journal, March 2012.
[3] VMware Inc., “Timekeeping in VMware Virtual Machines”, http://www.vmware.com/vmtn/resources/238
[4] Python Twisted Framework, http://twistedmatrix.com/trac/
[5] Google Web Toolkit (GWT), http://code.google.com/webtoolkit/
[6] C. Sturdevant, “VMware View 4.5 is a VDI pacesetter”, eWeek, vol. 27, no. 20, pp. 16-18, 15 Nov 2010.
[7] VMware View: Desktop Virtualization and Desktop Management, www.vmware.com/products/view/
[8] Citrix Inc., Citrix XenDesktop 5.0, http://www.citrix.com/English/ps2/products/feature.asp?contentID=2300341
[9] J. Nieh, S. J. Yang, and N. Novik, “Measuring Thin-Client Performance Using Slow-Motion Benchmarking”, ACM Trans. Comp. Sys., 21:87–115, Feb. 2003.
[10] S. Yang, J. Nieh, M. Selsky, N. Tiwari, “The Performance of Remote Display Mechanisms for Thin-Client Computing”, Proc. of the USENIX Annual Technical Conference, 2002.
[11] VI SDK, https://www.vmware.com/support/developer/vc-sdk/
[12] AutoIT documentation, http://www.autoitscript.com/autoit3/docs/
[13] Hwanju Kim, Hyeontaek Lim, Jinkyu Jeong, Heeseung Jo, Joonwon Lee, “Task-aware virtual machine scheduling for I/O performance”, Proceedings of the 2009 ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, March 11-13, 2009, Washington, DC, USA.
[14] Micah Dowty, Jeremy Sugerman, “GPU virtualization on VMware’s hosted I/O architecture”, ACM SIGOPS Operating Systems Review, v.43 n.3, July 2009.
[15] VMware White Paper, “Understanding Memory Resource Management in VMware® ESX™ Server”, 2010.
[16] LoginVSI - Admin Guide of Login Consultants “Virtual Session Indexer” 3.0, http://www.loginvsi.com/en/admin-guide
[17] Scapa Test and Performance Platform, http://www.scapatech.com/wpcontent/uploads/2011/04/ScapaTPP_VDI_0411_web.pdf
[18] VDI-IOmark, http://vdi-iomark.org/content/resources
[19] N. Zeldovich and R. Chandra, “Interactive performance measurement with VNCplay”, Proceedings of the USENIX Annual Technical Conference, 2005.
[20] A. Lai, “On the Performance of Wide-Area Thin-Client Computing”, ACM Transactions on Computer Systems, May 2006.
[21] J. Nieh, S. J. Yang, N. Novik, “Measuring Thin-Client Performance Using Slow-Motion Benchmarking”, ACM Transactions on Computer Systems, February 2003.
[22] A. Adamoli, M. Jovic and M. Hauswirth, “LagAlyzer: A latency profile analysis and visualization tool”, International Symposium on Performance Analysis of Systems and Software, 2010.