Scaling View Management Operations: Analysis and Optimizations

Oswald Chen
VMware Inc.
ochen@vmware.com

Michael Pate
VMware Inc.
mpate@vmware.com

Banit Agrawal
VMware Inc.
banit@vmware.com

Michael Spradlin
VMware Inc.
mspradlin@vmware.com

Soumya Mishra
VMware Inc.
soumyamishra@vmware.com

Kenny To*
VMware Inc.

Dhiraj Parashar
VMware Inc.
dparashar@vmware.com

Abstract

Scaling VMware View is not an easy task because it involves many parts and dependencies. These include many View Clients connecting over a LAN or WAN to a View Security Server to set up a secure tunnel with the View Connection Server that manages desktop virtual machines with the View Agent installed. All of these components, most importantly, the desktop virtual machines, run on VMware vSphere infrastructure which in turn depends on hardware, storage, and network capabilities. Any of these components could be a bottleneck when scaling to thousands of users.

This paper describes the methodical process we used to learn about the system and make improvements. We designed test cases with metrics relevant to the customer by looking at user stories, added features to existing test tools for more realistic simulations, implemented profiling and reporting infrastructure to aid analysis, ran tests on hardware representative of real world deployments, analyzed the resulting data, and finally made improvements in both code and documentation as part of the VMware Horizon View 5.2 release.

1. Introduction

VMware View is an enterprise solution for replacing traditional physical desktops with virtual Windows desktops. The virtual desktops are deployed in the data center and managed as a service, which enables companies to increase security, ease desktop management, and decrease operational costs while end users are able to access their desktops using tablets, smartphones, thin clients and personal computers from the office, at home, or while traveling.

There are two categories of people dealing with View: the administrators managing View and the end users interacting with their desktops. Administrators use the View Connection Server, a connection and management server, to provision desktops as virtual machines. The View Connection Server itself uses VMware vSphere infrastructure to provision these virtual machines from a master image. End users authenticate to the View Connection server to get access details for their virtual desktop and interact with it over a remote display protocol. Over time, administrators also use the View Connection Server to perform management operations such as recomposing desktop images to apply system updates, provisioning additional desktops to accommodate new users, and migrating desktops between data stores.

As customers adopt View, they are expecting to support higher and higher numbers of seats. To achieve this, the entire solution must perform well and maintain stability, reliability, and serviceability while scaling the deployment size. End users expect to log on and access their desktops quickly, and use applications on them with responsiveness similar to their experience on physical desktops. Administrators need to manage the desktop virtual machines and infrastructure backing them such as desktop images, resource pools, and data stores, all while staying within a certain maintenance window and easily troubleshooting issues when they occur. In this paper, we show how we ensure these needs are met by running actual and simulated View deployments at scale and making product or documentation improvements where necessary.

A deployment of the View solution consists of several View components, vSphere components, storage, networking, and other infrastructure components required for the desktops. Hence, making the solution scale and verifying the result is a challenge, as any one component can fail or become a bottleneck. For example, when the end user tries to connect to their desktop, the View Client first goes through a LAN, WAN, or mobile network to the View Security Server, which resides in a DMZ and establishes secure, authenticated channels with the rest of the solution. Then the View Client creates such a channel to the View Connection Server to request desktop access details. To satisfy a desktop connection request, the desktop virtual machine may need to be started on demand, which is sensitive to current storage I/O load: if the desktop is powered off, it is powered on. As the desktop starts, it acquires an IP address through DHCP, synchronizes with Active Directory, and starts the View Agent, a service running on the desktop that assists with management. Once the View Agent contacts the View Connection Server, the agent starts a Windows session. Then the View Client can finally connect, over another secure channel through the View Security Server, to the desktop.
* Kenny To was a member of the View Large Scale group while he was with VMware, Inc.

We approached this scalability challenge methodically by going through repeated iterations of measurement, analysis, and improvement. We needed to find the appropriate measurements in large scale deployments to ensure that they’re applicable to our customers in the real world. Therefore we used user stories to drive test case selection and test on real hardware. For example, we determined that at the start of a work day or shift, users typically all log on at nearly the same time except for a small percentage who are early or late. This is modeled well as a normal distribution over a time window, so we built tools to simulate the varying frequency of log on attempts when driving test clients.

VMware View Planner [1] is a View capacity planning tool including a test harness which we adapted for our tests. To enable automatic collection of measurements, we started a profiling framework for logging the amount of time spent in sensitive code paths. Then we added tools to View Planner for collecting logs and other artifacts generated during a test run. After collecting everything, we fed it into reporting infrastructure we built which consumes profiling data from the logs to compute basic aggregate statistics, generate plots, and present them as reports through a web interface. The reports contain the data we need to identify potential product, deployment, or configuration issues that cause performance degradation or decreased reliability. Once we identified the issues, we started making improvements.
Through our efforts, we have accomplished the following:

  • Improved provisioning speed by a factor of 2.
  • Improved data store rebalancing speed by a factor of 4-10.
  • Added product support for larger desktop pools by allowing them to span multiple networks.
  • Added a product feature to perform mass maintenance operations in a rolling fashion, allowing users to continue using their desktops.
  • Added reusable features to View Planner for automatic log collection and simulating normal distributions.
  • Started an extensible profiling framework for View, along with infrastructure to generate reports from its output.

In this paper we describe the test cases we chose, the tools and test beds we created, our findings from testing, the improvements we made to View, and future work.

2. Analysis and Scope Optimization

2.1 Resource Dedication and Team Empowerment
One of the challenges that View faced in scaling up was securing dedicated resources, both human and hardware. Traditionally, View test beds and engineers were dedicated to a small set of specific test cases during the product release cycle. When the target release shipped, these resources moved on to other areas. The View Scale team was founded to answer that challenge. It consisted of dedicated human and hardware resources that persisted beyond regular release cadences and were responsible for scalability challenges in general.

The View Scale team had dedicated storage, compute, and network resources capable of scaling up to 10K View desktops. Its members included a product manager, a View architect, a system architect, and software engineers from the View R&D, View System Test, and Performance teams. This composition gave the team a wide range of domain expertise and authority, which in effect empowered it to make effective decisions and enabled a high level of agility.

2.2 User story Driven Test Cases and Scalability Targets

The team's first responsibility was to work with product management to develop a set of real world user stories that were highly critical for our customers. These user stories targeted both View administrators and end users, including:

  • Management operations for View administrators: provisioning, refresh, recompose, and rebalance at scale.
  • End user activities: log-on and log-off storms, as well as workload runs after log on and power/maintenance policies after log off.

As part of user story definition, the team also defined a set of real world criteria against which the scale tests were measured, including:

  • An 8 hour maintenance window for management operations.
  • A maximum acceptable error rate in conjunction with automated error recovery.
  • Log-on/off storms that modeled a normal distribution over a 60 minute window for various desktop power policies.
  • A maximum acceptable end user log-on time during log-on storms.
  • Guest OS workload definitions for power, knowledge, and task workers.

2.3 Test Tools Development
Once the test cases and their performance and scale criteria were defined, the team needed to establish ways to drive test execution, quantify the test output, and enable effective troubleshooting and analysis. The test frameworks chosen to fulfill those requirements were the View Load Simulator and VMware View Planner.
The View Load Simulator (VLS) was used for simulated log on tests. VLS consisted of a set of JMeter clients (which simulated View Clients) capable of modeling normally distributed logons, and a set of simulated ESX hosts able to mimic View Agent behaviors. For VLS driven logon tests, the View components under test were the View Security Servers and Connection Servers. Use of VLS allowed execution of a scaled logon storm at relatively small hardware cost. To conduct real end-to-end tests with real hardware and virtual machines, View Planner was used. View Planner 2.1 [1] already had the ability to work with View deployments to perform uniformly distributed log-on storms and guest OS workload generation. The additional enhancements added to View and View Planner to enable View scale tests included:

  • View Profile logging framework to enable extractable profile log entries.
  • Automated and centralized DCT collection process.
  • Normally distributed log-on and log-off storm simulation.
  • Centralized profile entry parsing and uploading to a report database.
  • Graphical JSON/OpenFlash based reporting framework.

With these tooling frameworks and their additional features for scalability testing, the team was able to execute real world test scenarios and quantify the results, as well as perform very granular troubleshooting and analysis to zero in on potential performance issues at scale.

2.4 Incremental Scale Up, Exploration, Findings, Improvement, and Validation
We adopted an incremental approach to scaling up to 10K. Initially, tests were done at lower scale so that we could iron out any test bed design and tooling issues and establish a performance baseline. The same tests were later re-run at higher scale levels (in 2K-desktop increments). At each scale level the following aspects were examined:

  • Exploring and tuning system configurations (for example, increasing VirtualCenter and View Composer concurrency limits during VM provisioning) to see whether they could yield better performance.
  • Generating and analyzing profiling reports to identify scalability and performance bottlenecks in the system.
  • Fixing and reviewing root-caused scalability issues, then validating the fixes in the next iteration and quantifying and documenting the performance improvement.
  • Evaluating, implementing, and validating wider scope product improvements at each scale level. Product improvements that yielded substantial benefit were communicated to the release management team for inclusion in an upcoming release.

The ultimate goal of the View Scale effort was to provide our customers with best practices and reference architectures for scaling View, based on the team's findings and recommendations. The documentation channels included Knowledge Base articles, the View Architecture Planning guide, Tech Marketing reference architectures, and technical blogs.

3. Experimental Setup

In this section, we describe the experimental setup and configurations used for scaling to 10k real desktop VMs using one instance of vCenter. We also describe various large scale design aspects that were taken into consideration while designing the test bed. We followed the best practice of keeping the replica on an SSD datastore, which absorbs most of the read IOPS and avoids high IOPS requirements on the spinning drives.
The experimental setup for virtual desktops and infrastructure VMs is shown in the figure below:

Figure 1. Experimental setup for virtual desktops and Infrastructure VMs.


As shown in the test bed diagram in Figure 1, we use five clusters with several hosts (6 to 12) in each cluster to scale to 10k desktops. In each cluster, we deploy a pool of 2k desktops and the replica is kept on the SSD drive as shown in Figure 2.

Figure 2. Showing a pictorial representation of the replica kept on SSD used to make the linked clones.


To provision one such 2k desktop pool, we used 360 15k RPM hard drives of 300GB each. Since one hard drive provides about 200 IOPS, with 360 hard drives we were getting about 72,000 IOPS, which translates to about 36 IOPS per desktop VM and is sufficient even for power users. We also used the same storage array to host the client VMs (these are used to connect to the desktop VMs with the PCoIP display protocol); however, the amount of hardware and storage required for clients was significantly less because we only needed 250 clients to connect to 2k desktops, i.e., one client was used to connect to 8 desktops. On the infrastructure side, we used five View Connection Servers, five View Security Servers for tunneled connection tests, and one instance of vCenter Server for hosting all 10k desktops.

3.1 View Infrastructure and Configuration
Now that we have described the hardware test bed, we present the configurations used in the software layers for the 10k desktop VM tests. We optimized the desktop image as per the Windows 7 optimizations and best practices guide [8], where we disabled some group policy settings, disabled some services, etc. The configuration used for the View Connection Server is shown in the table below:

[Table: View Connection Server configuration used for the 10k desktop tests.]

3.2 Logon/Logoff storm
To support the log-on/log-off scenario, new features were added to View Planner, such as normal distribution simulation, which was used to mimic a real world scenario where users log on to their desktops at varied times following a bell curve pattern. The following were a few of these changes:

  • Normal distribution: To perform a normally distributed log-on/off test instead of the usual uniform distribution, we devised a mechanism where each user sleeps for a random time drawn from a normal distribution. The log-on/off period is defined as a +/- 3 standard deviation window; for example, a 60 minute log-on window has a 10 minute SD. This translates to the following log-on/off rates (a minimal simulation sketch appears after this list):
    • 95% of log-on/off attempts happen within the +/- 2 standard deviation window.
    • 68% of log-on/off attempts happen within the +/- 1 standard deviation window.
    • Peak log-on/off rate = 0.4 * (X / T), where X is the total number of log-on/off attempts and T is the time for a 1 SD window.
  • View Client logoff: We run the log-off test very similarly to what an end user would do in real life, closing the View Client window. In View Planner, we added the functionality to perform logoffs after the workload is completed. First, clients ignore random sleeps and logon attempt limits; this ensures all desktops get connected. Then desktops wait until all are connected and sleep for random times depending on the ramp-up time. After that, desktops complete their workloads as normal. After a desktop completes, the harness informs the client of the corresponding user, and the client closes the View Client window to trigger a View logoff.
  • Powered off and suspended desktop interoperability: We mixed in a certain percentage of powered-off and suspended desktops in our log on storm tests to mimic real world scenarios.
  • Limiting View re-connection attempts: A set number of View Client connection attempts can be configured, after which the client gives up.
  • Logging connection time information: At the end of the test, the results file is appended with information about how many attempts each user made and how long these took. This occurs for both View Planner remote and passive modes.
  • Adding an upper bound to non-randomized ramp-up sleeps: We add an upper bound on the ramp-up time sleep to make the total run cycle faster.
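To make the scheduling rule above concrete, the following minimal Python sketch (ours, not View Planner code; the function and parameter names are illustrative) draws log-on offsets from a normal distribution whose +/-3 SD span covers a 60 minute window and checks the observed peak rate against the 0.4 * (X / T) rule of thumb.

```python
import random

def schedule_logons(num_users, window_min=60.0, seed=42):
    """Draw a log-on offset (in minutes) for each simulated user from a normal
    distribution whose +/-3 standard deviations span the log-on window."""
    rng = random.Random(seed)
    mean = window_min / 2.0           # center of the log-on window
    sigma = window_min / 6.0          # +/-3 SD covers the whole window
    offsets = [min(max(rng.gauss(mean, sigma), 0.0), window_min)  # clamp outliers
               for _ in range(num_users)]
    return sorted(offsets)

users, window = 2000, 60.0
offsets = schedule_logons(users, window)

# Bucket the offsets per minute to observe the bell-shaped log-on rate.
per_minute = [0] * int(window)
for t in offsets:
    per_minute[min(int(t), int(window) - 1)] += 1

# Rule of thumb from the list above: peak rate ~= 0.4 * X / T, with T = one SD.
print("observed peak log-ons/minute:", max(per_minute))
print("0.4 * X / T                 :", 0.4 * users / (window / 6.0))
```

For 2000 users over a 60 minute window (SD of 10 minutes), both the simulation and the rule of thumb give a peak of roughly 80 log-ons per minute.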

3.3 View Profiling framework and Reporting infrastructure
The View profiling framework uses well defined prefixes and syntax (operation name, start/end time, and object ID) to create log entries in the View Connection Server logs for highly performance critical operations. These log entries are included in the DCT collection process and can be parsed and uploaded to a MySQL database for performance analysis.
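As an illustration of how such entries can be consumed, the sketch below pairs start/end profile entries by operation name and object ID and computes durations. The log line format shown is hypothetical; the actual prefix and syntax of the View profiling entries are not reproduced here, only the fields described above.

```python
import re
from collections import defaultdict

# Hypothetical log line format carrying the fields described above:
# an operation name, a start or end marker with a timestamp, and an object id.
PROFILE_RE = re.compile(
    r"PROFILE\s+op=(?P<op>\S+)\s+phase=(?P<phase>START|END)\s+"
    r"ts=(?P<ts>\d+)\s+id=(?P<id>\S+)"
)

def parse_durations(lines):
    """Pair START/END entries per (operation, object id) and return durations (ms)."""
    starts = {}
    durations = defaultdict(list)
    for line in lines:
        m = PROFILE_RE.search(line)
        if not m:
            continue
        key = (m.group("op"), m.group("id"))
        ts = int(m.group("ts"))
        if m.group("phase") == "START":
            starts[key] = ts
        elif key in starts:
            durations[m.group("op")].append(ts - starts.pop(key))
    return durations

sample = [
    "PROFILE op=cloneVm phase=START ts=1000 id=vm-17",
    "PROFILE op=cloneVm phase=END ts=73000 id=vm-17",
]
print(dict(parse_durations(sample)))   # {'cloneVm': [72000]}
```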

The reporting infrastructure was an integral part of the entire process, as it provided great insight into the details of all the operations, including provisioning, recompose, refresh, rebalance, and log-on/off. The reporting mechanism consisted of three steps: gathering logs, parsing them and extracting the required information into a relational MySQL database, and using that information to generate meaningful reports. Before starting any test, it was made sure that log collection was set up using View Planner. The log collection step gathered ESXTOP data, View Planner logs, and product logs, which included the View Connection Server, VC, View Composer server, Security Server, Agent desktops, and Client machines.

ESXTOP logs contained information regarding the hosts used for a particular test: CPU usage, memory usage, IOPS, I/O latency, network throughput, and also guest level stats. View Planner logs held data regarding client registration with the View Planner harness, connection to the desktop, initiation and termination of the workload, and logoff or disconnect. The View Planner log and the Agent/Client logs were useful for log-on/off test report generation. For collection of Agent/Client logs, parameters were added to View Planner to control the percentage of DCTs (Agent/Client logs) to collect from the entire setup and the delay between triggering the collection command on each desktop.

All of the above mentioned logs were collected on a separate harness set up for log storage. From there, the logs were consumed by scripts that parsed them and extracted the useful information into database tables. Report generation could be triggered from both the View Planner harness UI and the console.

Figure 3. Screenshot of a sample report used for analysis of the management operation under test.


Figure 3 gives an idea of what a report looks like, with all the parameters it measures, such as clone creation, linked clone creation/deletion, refit operations, etc. Around 96 such parameters are displayed in the report for the standard broker and around 35 for replica brokers. The “chart”, “histogram”, and “concurrency” links open different plots of the data; significant examples of these appear later in this paper.

4. Findings

4.1 Certificate verification and log on time
Initial results from the end to end log on tests of 2000 users showed that each user took about 30 seconds to complete their log on. Most of the remote XML API calls with the connection server completed in less than 1 second, with the exception of the combined + calls and the call. Since performs the primary work in setting up the tunnel and brokering the desktop connection, it was not surprising that it would take some time. However, and are trivial operations, and profiling data from the connection server confirmed that they were executing in under 1 second. Yet, profiling data from the clients indicated a total time of over 16 seconds. Armed with this knowledge, we did some debugging and found that before actually executing the calls, in the process of verifying the connection server’s certificate, the client was attempting to contact the Windows Update server [2]. Since our test bed was in an isolated lab with no external internet access, this attempt was timing out and adding an additional 15 seconds to the operation time. While this deployment scenario is not typical, some customers with special security considerations may still face the same issue, so we published a KB article describing the problem and suggested solutions [3].

4.2 I/O workloads and power operations
While the speeds of the remaining XML API calls were satisfactory, the call was still the largest contributor to log on time, so we investigated it more to determine whether anything could be improved. We found that the bulk of the call was spent waiting for the desktop agent to finish processing a “start session” command. The “start session” command is responsible for allocating a Windows and display protocol session for the user. Since much of this is handled by 3rd party code, there wasn’t anything we could change directly. However, we observed a dramatic difference in the “start session” times depending on whether the user was running any desktop workload after logging on. With no workload, “start session” consistently completed in 2-3 seconds:

Figure 4. Graph showing the “start session” completion in a short duration of time for NO workload.


With a medium sized workload, “start session” operations completed in 2-3 seconds at first, but towards the peak of the log on window they started varying dramatically in length anywhere between 3 seconds and the 60 second timeout:

Figure 5. Graph showing variation in the “start session” completion time for medium-sized workloads.

We analyzed the ESXTOP data from the test run and found a correlation between the longer “start session” times and the times of peak IOPS activity. This suggests, as we have long suspected, that IOPS capacity of data stores is a significant factor in View desktop performance [4].

As we moved on to larger scale tests and different parameters, we uncovered some bugs and interesting behaviors at scale. One that stood out is the impact of the power operations concurrency limit [5] on log on storms where the connection server powers on some VMs on demand. The primary symptom was that at scales above 4000 users, some log on attempts would fail, and the connection server logs would indicate that it had exceeded operation limits. After collecting a few data points, we realized that this was the concurrency limits feature of the product working as intended, but with a default setting that was too low for the given scale. We worked out some exact equations describing the relationships between total scale, log on window, and the amount of time it takes to fully power on a VM. These equations turned out to be relatively complicated due to involving shifts of the cumulative distribution function for the normal distribution. By focusing only on peak power on rates, which is the worst case in the equations, we arrived at a much simpler, though approximate, equation:

concurrentPowerOperations = desktopPowerOnTime * peakPowerOnRate

We published this equation and its application to a common set of parameters to a KB article [6]. The default power operations concurrency limit of 50 should support a peak desktop power on rate of about 16 desktops per minute, which corresponds to 2000 users logging on over 60 minutes with 20% of their desktops powered off.
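The following small sketch applies the approximation to the scenario quoted above. The roughly three minute desktop power-on time is an assumption for illustration (it is what a limit of 50 and a peak rate of 16 per minute imply), not a measured value.

```python
def required_concurrency(total_users, window_min, powered_off_fraction,
                         desktop_power_on_time_min):
    """Approximate concurrent power operations needed at the peak of a
    normally distributed log-on storm (window = +/-3 SD)."""
    power_ons = total_users * powered_off_fraction
    sigma = window_min / 6.0                       # one SD of the storm
    peak_rate = 0.4 * power_ons / sigma            # peak power-ons per minute
    return desktop_power_on_time_min * peak_rate

# Scenario from the KB example: 2000 users over 60 minutes, 20% powered off.
# Peak rate = 0.4 * 400 / 10 = 16 power-ons/min; with a ~3 minute power-on time
# this needs roughly 16 * 3 = 48 concurrent operations, so the default limit
# of 50 is just sufficient.
print(required_concurrency(2000, 60, 0.20, 3.0))   # ~48
```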

4.3 Provisioning operation
For pool management operations such as provisioning desktops, we focused on the end to end time for completing all operations as that would ultimately determine the maintenance window. We were especially interested in experimenting with different concurrency limits because we suspected that the default limit was overly conservative and that we could easily decrease the end to end time by increasing the concurrency limit. The following table shows the results sorted by the time per 512 desktops provisioned.

[Table: End to end provisioning times at various concurrency limit settings, sorted by time per 512 desktops provisioned.]

This was quite unexpected. Even with dramatically higher concurrency limit settings, the overall throughput was virtually the same as at the conservative default setting of 8. Increasing the setting to 100 noticeably reduced the performance compared to all previous settings. We drilled down into more detailed profiling information from all the runs and found that the provisioning time for many desktops was dominated not by the clone operation itself, but rather the preparation step of creating a cloning specification. Unfortunately, our instrumented profiling wasn't yet detailed enough to determine the root cause, but through code inspection we found suspicious usage of a lock. In particular, the lock was using a strict LIFO policy, which led to a distinct pattern of earlier operations taking longer to finish.
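The effect of the lock's queueing policy can be illustrated with a toy simulation (ours, not the connection server's code): operations arrive steadily and contend for a single lock, and the policy decides which waiter is granted the lock next. Under LIFO the earliest arrivals are starved, mirroring the pattern shown in Figure 6.

```python
def finish_times(n_ops, arrival_gap, hold_time, policy):
    """Toy simulation: n_ops operations arrive arrival_gap seconds apart and each
    must hold a single shared lock for hold_time seconds. Returns finish times
    indexed by arrival order."""
    arrivals = [i * arrival_gap for i in range(n_ops)]
    finish = [0.0] * n_ops
    waiting = []                 # indices of arrived-but-unserved operations
    t = 0.0
    next_arrival = 0
    done = 0
    while done < n_ops:
        while next_arrival < n_ops and arrivals[next_arrival] <= t:
            waiting.append(next_arrival)
            next_arrival += 1
        if not waiting:
            t = arrivals[next_arrival]       # idle until the next operation arrives
            continue
        # The queueing policy decides which waiter gets the lock next.
        op = waiting.pop(0) if policy == "FIFO" else waiting.pop()   # LIFO takes the newest
        t += hold_time
        finish[op] = t
        done += 1
    return finish

for policy in ("FIFO", "LIFO"):
    f = finish_times(64, arrival_gap=1.0, hold_time=10.0, policy=policy)
    print(policy,
          "wait of 2nd arrival:", f[1] - 1.0,
          "wait of last arrival:", f[63] - 63.0)
# Under LIFO the earliest waiters are starved until the very end (like the
# starved provisioning operations in Figure 6), while FIFO serves them in order.
```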

Figure 6. Graph showing Provision operation taking longer to finish due to LIFO policy used for scheduling the cloning operations.


The operations getting starved the most were taking up to 3500 seconds to complete. With a trivial change to use a fairer queue policy on the lock, operation length became much more consistent at 800-1000 seconds each:

Figure 7. Improvements in the Provision time due to usage of a fairer queue policy on the lock for scheduling the cloning operations.


This was noticeably better, but 1000 seconds is still a tremendous overhead compared to the primary work of cloning the VM which typically only took 60-120 seconds. That was unacceptable, so we instrumented more profiling to confirm that lock contention was the root cause, then designed and developed an improved preparation step that quickly releases the lock. The improvement has since been shipped in Horizon View 5.2 and is described in more detail further down in this paper.

After making that improvement, we discovered a couple more bottlenecks:

  • Provisioning speed decreased with increasing pool size (Figure 8; a short illustration of the ordering issue appears after Figure 9). This was caused by inefficiencies in allocating an appropriate name for the next VM to be created, which resulted from the fact that the VMs in a pool are ordered lexicographically while the code expected numerical ordering.
Figure 8. Graph showing the decreasing provisioning concurrency with increasing pool size.


  • At very large scale (8K → 10K) the provisioning speed decreased significantly. This was primarily caused by decrypting all VMs’ private keys to determine whether anything had changed. It was optimized by checking just the public keys; the before and after charts are shown in Figure 9:
Figure 9. Difference in the concurrency fluctuation of 2K provisioning at 8K scale before and after code change.

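To illustrate the name-ordering mismatch behind the first issue above, the short snippet below uses hypothetical VM names; lexicographic sorting places "vm-10" before "vm-2", which breaks code that assumes numeric order.

```python
# Hypothetical View-style VM names.
names = ["vm-1", "vm-2", "vm-10", "vm-11", "vm-3"]
print(sorted(names))                                      # ['vm-1', 'vm-10', 'vm-11', 'vm-2', 'vm-3']
print(sorted(names, key=lambda n: int(n.split("-")[1])))  # numeric order: ['vm-1', 'vm-2', 'vm-3', 'vm-10', 'vm-11']
```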


4.4 Maintenance operations

The end to end time for recompose operations did show some improvement by increasing the concurrency limit from its default setting of 12:

[Table: End to end recompose times at various concurrency limit settings.]

The throughput improved rapidly, but we started getting diminishing returns beyond a setting of 40. Additionally, we found that even higher concurrency settings put us at risk of overloading and eventually crashing the vCenter server. With a few more test runs on different test beds, we determined that the ideal concurrency setting for maximizing throughput without risking instability is highly dependent on the capacity of the infrastructure. However, deriving an equation to express this relation is an intractable problem since it involves so many variables: capacities of storage, network, CPU, and memory; vCenter and ESX versions; and configurations of vSphere clusters and concurrency limits, to name a few. Instead of documenting any equation, we need to make the system automatically discover capabilities and tune itself accordingly.

5. Improvements

5.1 Provisioning and Rebalance Throughput / Datastore Selection Improvement
5.1.1 Problem
Provisioning and rebalance operations require selection of an appropriate datastore for placement of each individual VM. This datastore selection happens after obtaining a lock, and the work done for datastore selection takes a nontrivial amount of time. This causes the following two problems.

  • Slowness: Each cloning/rebalance operation spends a high percentage of its time waiting to acquire the lock. As a result, no throughput improvements are observed at higher concurrency levels.
  • Correctness: In spite of synchronizing the datastore selection, the VMs do not get distributed correctly across the various datastores. This stems from the fact that after datastore selection we do not reserve any space for the allocation we just made, and the clone creation takes some time to create the disks.

5.1.2 Key Concepts
The following strategies were used to solve this problem.

  • Caching: One of the key reasons for the slowness of each individual selection is the time spent making the VC call to get the latest datastore and VM information. This information is easily approximated using cached values, so VCCache is leveraged as much as possible.
  • Reservation: Using the latest value of a datastore's freeSpace impacts correctness, as VC is not yet aware of the space allocations we have made. Caching this value and adjusting it with the reserved space improves both the correctness and speed of the solution.

5.1.3 Throwaway Cache Reservation Algorithm
A simple caching and reservation strategy is a little problematic: in the short term our internal cached data has a more accurate view of the datastores than VCCache (or VC itself), but the internal maps diverge from the real disk usage as time progresses. This can be due to increases in disk size, power-ons, power-offs, VM deletions, or external operations.

Throwaway: To avoid this, we follow a strategy of throwing away our internal cache (on cache expiry) and recalculating it from VCCache. This does mean that we probably throw away the reservation information of at least some in-flight provisioning operations (which are not yet reflected in VCCache), so the decisions we make right after a cache refresh might be suboptimal. However, the following points help the overall approach stay on track (a minimal code sketch of the selection strategy appears after the list below).

  • Assuming that we were distributing VMs across the datastores before the refresh, the VCCache data shouldn’t be skewed for or against a particular DS by too much.
  • Inaccuracies that might creep in after DSCache is recalculated from VCCache should cancel out over multiple cycles of refresh.
  • Since linked clones start out very small, if we were to begin with datastores with widely varying utilization, we would end up over-packing many more VMs onto the datastore with the least density. To overcome this, a penalty factor is used to discourage consecutive/frequent selection of the same datastore. The penalty is calculated as (Penalty Factor * Steady State Size of the VM).
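A minimal sketch of the throwaway cache reservation idea is shown below. It is not the View Composer implementation; the cache TTL, penalty factor, and VCCache stub are illustrative assumptions based on the description above.

```python
import time

PENALTY_FACTOR = 2.0            # illustrative value; the real factor is not given here
CACHE_TTL_SEC = 300             # illustrative "throwaway" interval

class DatastoreCache:
    """Caches per-datastore free space, adjusts it with local reservations,
    and periodically throws the whole cache away and rebuilds it from VCCache."""

    def __init__(self, fetch_from_vccache):
        self._fetch = fetch_from_vccache      # callable -> {ds_name: free_space}
        self._free = {}
        self._built_at = 0.0

    def _refresh_if_expired(self):
        if not self._free or time.time() - self._built_at > CACHE_TTL_SEC:
            # Throwaway: drop local reservations and rebuild from VCCache.
            self._free = dict(self._fetch())
            self._built_at = time.time()

    def select(self, steady_state_vm_size):
        self._refresh_if_expired()
        # Pick the datastore with the most adjusted free space.
        ds = max(self._free, key=self._free.get)
        # Reserve the eventual (steady state) VM size plus a penalty that
        # discourages packing consecutive linked clones onto the same datastore.
        self._free[ds] -= (1 + PENALTY_FACTOR) * steady_state_vm_size
        return ds

# Usage with a stubbed VCCache snapshot (free space in GB for readability).
cache = DatastoreCache(lambda: {"ds-01": 900, "ds-02": 850, "ds-03": 910})
print([cache.select(steady_state_vm_size=20) for _ in range(6)])
```

With the reservation and penalty applied, consecutive selections rotate across the datastores instead of repeatedly landing on the one that momentarily reports the most free space.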

5.2 Large Pool Support Problem
We identified two artificial, system-imposed constraints as impediments to reaching 10K scale. Each of these dealt with how large an individual pool of desktop machines View could support. Though View allows an arbitrary number of desktop pools to exist, it is not reasonable to expect customers reaching 10K scale to create many small pools in order to reach 10K total machines across those pools. This is especially true because customers normally organize pools around desktops with similar business purposes, not around characteristics needed to work around limitations of the View system itself. As such, we found it desirable to remove any constraints that would prevent a desktop pool from reaching its recommended maximum size of 2000 machines. With this allowance in place, our testing could be performed with as few as five desktop pools.

5.2.1 Eight Host Limit
View defines a desktop pool on a single cluster within Virtual Center. Therefore, the cluster capacity represents an upper bound on the number of machines that can be provisioned within a desktop pool. In previous versions of Virtual Center, because of concurrency constraints when utilizing shared disks, VMFS supported only up to 8 hosts per cluster. To protect against this, View internally prevented desktop pools backed by VMFS from being created on clusters containing more than 8 hosts (this constraint was not present for NFS datastores). However, starting in vSphere 5.1 and VMFS version 5, this limit was increased to 32 hosts, and View lagged behind in modifying its own limit. Once we removed the 8 host limit in supported environments, 32 hosts were more than sufficient to support up to 2000 desktops in a single pool.

5.2.2 Automatic Network Label Assignment
Conventional wisdom and industry best practice peg the recommended size of a VLAN at a /24 subnet, or about 254 hosts, especially in virtual environments. When provisioning desktops within a pool, newly created virtual machines take on the network label characteristics of the pool's single target parent virtual machine. This network label defines the VLAN tag for the machine, which in turn defines the subnet size and, more indirectly, the DHCP address range available to that machine. Therefore, when an administrator creates desktop pools with a parent virtual machine using recommended standards, newly provisioned machines all share the same DHCP address range. This means that any more than about 254 machines using this configuration in a desktop pool will oversubscribe the number of IP addresses available to them. This places an artificial limit on the size of the pool without awkward workarounds to later reassign those child machines to new network labels and VLANs.

In order to address this problem, we implemented a new feature to deal with the fact that newly provisioned desktop machines always inherit their pool's parent machine's network label. Administrators are now able to specify a set of available network labels (and, indirectly, VLANs and DHCP IP address ranges) that newly provisioned machines can be assigned. An additional step was added to View's provisioning to automatically assign the next non-exhausted network label to new machines in the pool instead of the parent's label. In this way, as long as the administrator supplies enough network label capacity, large desktop pools can be created while still following the best practice of individual /24 VLAN subnets. As an additional challenge, we found that re-provisioning operations to refresh or recompose existing machines to some old or new parent state would wipe away these network label assignments; we added logic to prevent this case. With these features, IP range constraints are no longer an impediment to overall pool size in View.
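The sketch below illustrates the "next non-exhausted network label" idea; the label names and per-label capacities are hypothetical.

```python
def assign_network_labels(num_new_vms, labels):
    """labels: list of (label_name, remaining_ip_capacity). Assigns each new VM
    the first label that still has addresses left, moving on once one is exhausted."""
    assignments = []
    remaining = [[name, capacity] for name, capacity in labels]
    idx = 0
    for _ in range(num_new_vms):
        while idx < len(remaining) and remaining[idx][1] == 0:
            idx += 1                              # current label exhausted; try the next one
        if idx == len(remaining):
            raise RuntimeError("network label capacity exhausted")
        assignments.append(remaining[idx][0])
        remaining[idx][1] -= 1
    return assignments

# Four /24-backed labels (~250 usable addresses each) cover a 1000-VM pool.
labels = [("vlan-101", 250), ("vlan-102", 250), ("vlan-103", 250), ("vlan-104", 250)]
result = assign_network_labels(1000, labels)
print(result[0], result[249], result[250], result[999])  # vlan-101 vlan-101 vlan-102 vlan-104
```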

5.3 Rolling maintenance support
While we improved management operations' scale and throughput, it is also important to be mindful of different variants of scale use cases. For customers with mission critical View Composer pools, maintaining desktop availability during refit operations is as important as the operation throughput itself. This is particularly true for health care customers, where physicians need to be able to access critical patient data even during a maintenance window. Rolling maintenance support was achieved by introducing a new configuration parameter, RollingRefitMinReadyVM, which specifies the minimum number of desktops that must remain available for logon during recompose/refresh/rebalance operations on a View Composer pool.

The algorithm run during a View Composer pool maintenance operation is as follows.

  • Look up the RollingRefitMinReadyVM setting for the pool.
  • Query the number of ready to use desktops in this pool.
  • Check the maximum concurrent operations allowed for refit operations.
  • Effectively, the maximum concurrent refit operations for this pool = Min[(NumReadyVMsInPool – RollingRefitMinReadyVM), MaxConcurrentRefitOperationsAllowed].

With rolling maintenance support, maintenance operations and system down time are no longer synonymous, and maintenance operation throughput can be pushed higher without sacrificing critical user accessibility.
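The computation in the last bullet above can be written directly as a small function; the parameter names follow the text, and the example values are illustrative.

```python
def rolling_refit_concurrency(num_ready_vms_in_pool,
                              rolling_refit_min_ready_vm,
                              max_concurrent_refit_allowed):
    """Maximum concurrent refit (refresh/recompose/rebalance) operations for a pool,
    never dropping the number of ready desktops below the configured floor."""
    headroom = num_ready_vms_in_pool - rolling_refit_min_ready_vm
    return max(0, min(headroom, max_concurrent_refit_allowed))

# Example: 2000-desktop pool, keep at least 1900 desktops available, limit of 40.
print(rolling_refit_concurrency(2000, 1900, 40))   # 40
print(rolling_refit_concurrency(1920, 1900, 40))   # 20
```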

5.4 Auto-tuning concurrency limits
Our findings above showed us that choosing optimal concurrency limit settings is a tricky and expensive task even for a dedicated scalability team with internal engineering knowledge, so it would be completely unreasonable for customers to attempt it. Therefore we started pursuing an auto-tuning feature for concurrency limits. Instead of taking manually configured static settings, the product should automatically detect the current capacity of its underlying infrastructure and dynamically adjust the concurrency limit up and down as appropriate to increase management operation throughput without risking instability.

We started prototyping this idea in the form of an external tool hooked into the connection server’s logs to detect when adjustments would be appropriate and then automatically make the adjustments through View’s PowerCLI administration interface. Our strategy was inspired by TCP’s congestion control system and its use of feedback as a signal to back off: we would gradually ramp up the concurrency setting until we observed either the throughput dropping or error conditions, at which point we would reverse course and lower the setting. Since the system capacity may continue to change, we needed to keep this strategy active and essentially implement a hill climbing algorithm targeting the throughput.

We tested the prototype on the same test beds we’d been using, and the results were encouraging. The recompose operation throughput noticeably increased compared to test runs using the default concurrency settings, and the peak that the prototype found was close to what we had found in our manually driven static tests.

6. Related Work

To adaptively tune the concurrency limit for management operations, the work done by Bhatt et al. [7] is closest to our approach. Their vATM system [7] implements a feedback-based throughput maximizer for vSphere operations, and it can potentially be used for multiple applications built on vSphere rather than just one. Since this solution is more general and more developed, there is a possibility of leveraging some of its functionality in the future. There have also been studies on understanding management overhead and building scalable datacenters [9][10]; however, some of these techniques are not directly applicable to the specific needs of View at scale.
Regarding visualization and profiling tools for logs, several tools are available, such as VMware vCenter Log Insight [11], that we could use. However, to get custom and meaningful insights, we added some simple enhancements to the View Planner tool to get the precise data and charts that helped us find the bottlenecks.

7. Conclusion

Our View Scale testing and engineering effort helped solve the challenging View scalability problem. It uncovered several aspects of the product that only became apparent during scale testing and also provided concrete information about reproducing such scenarios. The following table gives a very high level comparison of the time taken for the management operations before and after our View Scale effort. These new comparison tests were performed on a different configuration of underlying hardware, but they still showed an enormous reduction in the end to end time required for each operation.
[Table: End to end times for management operations before and after the View Scale effort.]

We published many useful findings from this initiative in the View Connection server’s help guide.

8. Future Work

Potential future projects for the View Scale team include: (1) as previously mentioned, looking for integration opportunities with vATM to maximize concurrent operation limits; (2) designing and implementing a job framework capable of supporting scheduling and workflow for provisioning and maintenance operations; and (3) integrating View Planner, VLS, and other test frameworks with the VMODL based View API to enable complete end to end automation of scale tests, including even higher scale use cases like multi-datacenter View.

References

1. Banit Agrawal et al. VMware View® Planner: Measuring True Virtual Desktop Experience at Scale. VMware Technical Journal, December 2012.
2. Windows Update Server.
3. View users inside the firewall might experience a 15-second delay when connecting to View Connection Server, while Windows attempts to reach Windows Update Server. (2020988) http://kb.vmware.com/kb/2020988
4. B. Agrawal et al. “VMware® Horizon View™ 5.2 Performance and Best Practices”, VMware Performance White Paper, March 2013.
5. VMware Horizon View 5.2 Administration Guide
6. Setting a concurrent power operations rate to support View desktop logon storms http://kb.vmware.com/kb/2015557
7. Chirag Bhatt et al. vATM: VMware vSphere Adaptive Task Management. VMware Technical Journal, Summer 2013.
8. VMware® Horizon View™ Optimization Guide for Windows 7 and Windows 8, White Paper.
9. Vijayaraghavan Soundararajan, Kinshuk Govil. Challenges in building scalable virtualized datacenter management. ACM SIGOPS Operating Systems Review, v.44 n.4, December 2010.
10. Vijayaraghavan Soundararajan, Jennifer M. Anderson. The impact of management operations on the virtualized datacenter. ISCA 2010: 326-337.
11. www.vmware.com/products/vcenter-log-insight