Automatic Discovery of Configuration Policies

Lalit P. Jain
Cloud Management Intern, VMware, Inc.
ljain2@uccs.edu

Greg Frascadore
VMware, Inc.
gfrascadore@vmware.com

Abstract

Cloud computing data centers contain thousands of host servers and millions of virtual machines, each with its own configuration. Automation can enforce standards and keep configurations synchronized, but defining the desired state (the policies) is still a manual process. We describe a method that automatically discovers configuration policies by monitoring configuration changes and clustering resource properties into policies according to their correlation, as measured by mutual information.
This work is a step toward the automatic discovery and generation of configuration assessment rules.
General Terms: algorithms, management, measurement
Keywords: clustering, configuration management, configuration assessment

1. Introduction

Cloud computing data centers contain physical and virtual forms of computing servers, network switches, data stores, storage arrays, application servers, and numerous other resources that have software-defined configuration. In a typical data center there can be thousands of physical devices and millions of virtual resources. Automation is the key to managing the complexity. Automation can enforce configuration standards, detect configuration drift, and measure the degree to which a data center is in compliance with its desired state. However, although automation can enforce configuration standards, defining the desired state is still a manual process.
Together, the identification of configuration patterns and automation are the key to managing the many devices in the data center.

Configuration patterns stem from configuration goals for resources. These goals are policies. Just as there is more than one desired state in the data center, there are also many policies. A single resource can be subject to multiple policies, and a single policy can apply to multiple resources. Configuration policies arise from practicalities, from best practices, or by fiat.

2. Overview

In this paper we describe a process for automatically discovering policies for resources and for discovering the asset classes that such policies induce. Our approach is to monitor the configuration changes made by an administrator over time. We leverage several assumptions:

  • In the modern data center, the trend toward the software-defined data center and DevOps practices is making configuration changes monitorable and trackable, because they take place in virtual resources or through management applications that create logs.
  • The purpose of a series of configuration changes is to place the target resource into a desired state.
  • Relevant properties are ones that influence a key performance indicator (KPI). These are the subjects of configuration change. Unchanging properties are irrelevant.

3. Background

A configuration policy is a rule or guideline that constrains the state of a resource by limiting, or disciplining, certain property values of the resource. For example, the VMware vSphere® Hardening Guide [1] is a policy that includes constraints for properties like those shown in Table 1.

[Table 1: example property constraints from the vSphere Hardening Guide]

Although some systems, such as SCAP/OVAL [2] [3] and Puppet [4] [5] [6], support the automatic testing and enforcement of configuration policies, most policies are uncodified or intangible. Some policies arise as local best practices that administrators implement manually with ad-hoc changes. Other policies arrive in the form of Security Technical Implementation Guides (STIGs) from the federal government [7] or as regulations from the Payment Card Industry (PCI) [8]. Policies from these sources are nonoperational blueprints and checklists that the administrator interprets and implements manually. This situation is undesirable because intangible policies are hard to evolve, change, and understand.

Complicating the issue is that the policies are not the only intangibles. An asset class is a subset of resources related by mission, location, or affiliation. Examples are production servers, accounting desktops, and West Coast hosts. Historically, asset-class definitions also arise informally. An administrator takes inventory of the known resources and categorizes them using local expertise. The administrator then interprets and applies policies differently to the resources according to their asset class. For example, resources that process payments might be disciplined more frequently than documentation portals. The entire process is error-prone and nonreproducible.

Policies induce asset classes. The domain of every policy partitions the resources of the data center into one group that is subject to the policy, and another that is not. The former is an asset class. The intersections and unions of the directly induced asset classes create even more subdivisions. Asset classes enable us to think of resources as groups differentiated by the policies that apply. For this partitioning to be practical, the policies from which the classes arise must be codified.

The DevOps philosophy for data center operations is gaining momentum [9]. DevOps calls for reproducible configuration changes driven by codified policies (i.e., a tangible desired state). The idea is to treat infrastructure like code. Unfortunately, writing executable policies resembles software development. Policy interpreters require the use of languages such as OVAL or XQuery. Property tests written in these languages require domain expertise regarding the guideline being codified, as well as technical expertise in programming the rule language of the assessment system. The limiting factor of the current approaches is this necessity for manual declaration. The effect is that many configuration policies never become codified.

4. Details

We propose to remove the necessity for manual declaration of policy rules. We can use the logs of configuration changes made by users, along with semisupervised clustering techniques, to discover configuration policy. Effectively, we are codifying policies by observing the changes that users perform while enforcing intangible policies. This becomes possible because the modern data center is software-defined, making configuration change trackable. Configuration changes are made for a purpose. An administrator makes a change in order to bring a resource into a desired state. The goal state can be informal and uncodified, or it can be motivated by regulations the administrator is trying to follow. Configuration change can also be the result of tailoring externally imported guidelines. No matter the cause, the effect is that the data center is being made to conform to the desired state of the administrator’s organization. The changes are an expression of policy and desired state.

In the software-defined data center, the configuration of a resource is trackable because the physical data center components are commoditized and customization takes place on the virtual replacements: virtual compute, network, and storage. Administrators provision and control resources and implement policy through management and workflow applications instead of adjusting physical components.

Configuration change is a software-controlled state change made through applications that create logs. Management applications such as the VMware® vCenter™ inventory service report state changes in a change-event log or Atom feed. By retrieving the new state of a resource and comparing it with the previous state, we create a change log that records a row entry, like those in Table 2, for every changed property of every resource.

[Table 2: example change-log entries]
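The diffing step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not pyReplay itself: it assumes each resource state has already been parsed from its XML description into a flat dictionary of property names and values, and the names `ChangeEntry` and `diff_states` are hypothetical. Each entry carries the fields that the prototype's change tuples record: resource ID, timestamp, property ID, old value, and new value.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class ChangeEntry:
    """One change-log row (hypothetical layout)."""
    resource_id: str
    timestamp: int
    property_id: str
    old_value: Optional[Any]
    new_value: Optional[Any]


def diff_states(resource_id: str, timestamp: int,
                previous: Dict[str, Any],
                current: Dict[str, Any]) -> List[ChangeEntry]:
    """Compare two property snapshots of one resource; emit one entry per changed property."""
    entries = []
    # Union of both key sets, so properties that appear or disappear are logged too.
    for prop in sorted(previous.keys() | current.keys()):
        old, new = previous.get(prop), current.get(prop)
        if old != new:
            entries.append(ChangeEntry(resource_id, timestamp, prop, old, new))
    return entries


# Two successive snapshots of a hypothetical VM; only usb.present changed.
before = {"usb.present": "TRUE", "config-ntp": "pool.ntp.org"}
after = {"usb.present": "FALSE", "config-ntp": "pool.ntp.org"}
log = diff_states("vm01", 1000, before, after)
```

Taking the union of both key sets means a property that appears or disappears between snapshots is also logged, with `None` standing in for the missing side.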

There are many resource types: virtual machines, hosts, storage, and networks. Applications such as Web servers and database systems are resources, as are instances of application stacks (e.g., LAMP stacks). Each resource type supports a collection of mutable properties. Guidelines and policies dictate the desired state and constraints on a mutable property.

We associate a baseline value with every resource property. The initial value is usually the baseline when a property has a predictable initial value. The specific value of the baseline is unimportant as long as a comparison with it indicates when a property is set to a nonbaseline condition. Examples of properties and respective baseline values are shown in Table 3.

[Table 3: example properties and their baseline values]

In combination with the configuration change log of Table 2, the knowledge of baseline property values provides an insight necessary for discovering policies. When a property is consistently set to a nonbaseline condition, some policy is disciplining that property. If the policy is unknown, we can discover it by correlating the changes. To do this, we look back over an attention window—a past region of the change log. From the log entries we create two tables: a change indicator table and a final-value table.

4.1 Change Indicator Table

The change indicator table is an array of binary-valued feature vectors. Each row represents the final state of a resource at the end of the attention window; during the window, properties of the resource might have changed value. The bits within a row indicate whether the respective property landed in a nonbaseline value at the end of the attention window. In other words, for each resource $r_i$ a vector $\mathbf{r}_i = \{X_{ij}\}$ indicates whether the $j$th property $P_j$ was set to a nonbaseline value. $X_{ij} = 0$ if the $j$th property of $r_i$ has its baseline value at the end of the attention window (i.e., it was never set to a nonbaseline value, or it was set but eventually reset to the baseline value). Otherwise, $X_{ij} = 1$. Figure 1 shows an example row of an indicator table.
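Both tables can be built from the change log in one pass. This is a minimal Python sketch, assuming log rows are `(resource_id, timestamp, property_id, old_value, new_value)` tuples, a known baseline for every property, and an attention window given as an inclusive timestamp range; `build_tables` is a hypothetical name. Replaying the window in time order means a property that is changed and later reset to its baseline ends with a 0 bit, matching the definition of $X_{ij}$.

```python
from typing import Any, Dict, Iterable, Tuple

# A change-log row: (resource_id, timestamp, property_id, old_value, new_value).
Change = Tuple[str, int, str, Any, Any]


def build_tables(log: Iterable[Change], baseline: Dict[str, Any],
                 window: Tuple[int, int]):
    """Build the change indicator and final-value tables for one attention window.

    indicator[r][p] is 1 if property p of resource r ended the window in a
    nonbaseline value, else 0; final_values[r][p] holds that landing value.
    Properties without a known baseline are ignored.
    """
    start, end = window
    last: Dict[str, Dict[str, Any]] = {}
    # Replay changes inside the window in time order; later values overwrite earlier ones.
    for rid, ts, prop, _old, new in sorted(log, key=lambda c: c[1]):
        if start <= ts <= end:
            last.setdefault(rid, {})[prop] = new
    properties = sorted(baseline)
    indicator: Dict[str, Dict[str, int]] = {}
    final_values: Dict[str, Dict[str, Any]] = {}
    for rid, props in last.items():
        # Untouched or inapplicable properties count as baseline (bit 0), which also
        # gives the padding behavior for properties a resource type lacks.
        indicator[rid] = {p: int(p in props and props[p] != baseline[p])
                          for p in properties}
        final_values[rid] = {p: props[p] for p in properties if indicator[rid][p]}
    return indicator, final_values


baseline = {"P1": 0, "P2": "FALSE"}
log = [
    ("vm01", 1, "P1", 0, 2),
    ("vm01", 2, "P2", "FALSE", "TRUE"),
    ("vm01", 3, "P2", "TRUE", "FALSE"),  # reset to baseline, so P2's bit stays 0
    ("vm02", 9, "P1", 0, 1),             # outside the window used below
]
indicator, final_values = build_tables(log, baseline, window=(0, 5))
# indicator == {"vm01": {"P1": 1, "P2": 0}}; final_values == {"vm01": {"P1": 2}}
```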

Figure 1. For each resource like vm01 we associate a vector of bits that indicate whether the respective property of the resource has changed value. At the end of the attention window, if the property value lands in a nonbaseline condition, the corresponding bit is set to 1. Otherwise, the property was unchanged or reset to the baseline, and the bit value is 0.

We must pad the indicator table for certain resource and property combinations. Virtual machines and VMware® ESX™ hosts share some properties, such as IP address and config-ntp. In these cases the rows for virtual-machine resources and hosts might all have the bit set in the column for the shared property (e.g., IP address). However, some resource types have unique properties: virtual machines have usb.present, but hosts do not. When a property is not applicable to a particular resource type, the indicator-table entry for that resource and property contains 0, as if the nonexistent property were fixed in its baseline condition. With this padding, the indicator table for an attention window resembles Figure 2.

The change indicator table is not necessarily large. A row appears only if the respective resource has undergone at least one change during the attention window. A property column appears only if at least one resource has had that property value land in a nonbaseline condition. If a property does not change for any resource during the attention window, no column appears. Other omissions are possible. Some properties are immutable or read-only. Other properties, such as MAC addresses, are a priori irrelevant to policies even if they do change value. By omitting columns for unchanging and irrelevant properties, we further reduce the number of properties $P_1, \ldots, P_k$ and the size of the indicator table. (Relevant properties are ones that affect a key performance indicator (KPI) or are tested by an extant policy such as an imported PCI benchmark, hardening guide, or STIG.)
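The column pruning described above can be sketched as follows. This is an illustrative Python sketch, assuming the indicator table is a dictionary of rows keyed by resource ID and that the administrator supplies the a priori irrelevant properties as a list; `prune_columns`, `restrict`, and the property name `mac_address` are hypothetical.

```python
from typing import Dict, Iterable, List


def prune_columns(indicator: Dict[str, Dict[str, int]],
                  irrelevant: Iterable[str] = ()) -> List[str]:
    """Return the property columns worth keeping in the indicator table."""
    skip = set(irrelevant)
    columns = {p for row in indicator.values() for p in row}
    # Keep a column only if some resource landed in a nonbaseline value
    # and the property is not on the a priori irrelevant list.
    return sorted(p for p in columns
                  if p not in skip and any(row.get(p, 0) for row in indicator.values()))


def restrict(indicator: Dict[str, Dict[str, int]],
             keep: List[str]) -> Dict[str, Dict[str, int]]:
    """Project every row onto the kept columns; missing entries pad to 0."""
    return {r: {p: row.get(p, 0) for p in keep} for r, row in indicator.items()}


table = {
    "vm01": {"P1": 1, "P2": 0, "mac_address": 1},
    "vm02": {"P1": 0, "P2": 0, "mac_address": 1},
}
keep = prune_columns(table, irrelevant=["mac_address"])  # P2 never changed; MAC is irrelevant
pruned = restrict(table, keep)
```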

4.2 Final-Value Table

Figure 2a. Change Indicators for Resources. A value 1 in row i, column j indicates that resource i’s property Pj landed in a nonbaseline value at the end of the attention window. A 0 designates that the value did not change, changed back to the baseline by the end of the window, or was inapplicable for the resource type. Properties such as usb.present are applicable only to virtual machines and must be zero (MBZ) for resources of other types. Figure 2b. For each resource property, the final-value table records where that property value landed at the end of the attention window.

The final-value table contains the final property value corresponding to each change (each 1) in the change indicator table. See Figure 2b. The indicator and final-value tables provide two more insights necessary for discovering policies. First, resources subject to a policy have 1s in the columns corresponding to properties disciplined by the policy. Second, the respective values in the final-value table are the property’s desired state. In this way, the indicator and final-value tables contain the information needed to discover the policies that were driving changes made during the attention window. In theory we should be able to group resources with identical indicator vectors. Each group represents a policy, each bit in the indicator vector represents a policy condition, and the respective value in the final-value table is the desired state. Finally, the resources in the group constitute an asset class. In practice, however, things are more complicated. Grouping identical indicator vectors fails for several reasons:

  • Noise is present in the form of ad-hoc changes unrelated to policy.
  • People are inconsistent. (A change is made, then undone.)
  • Human actions can be incomplete or lack oversight. (A policy-prescribed change is never made.)
  • Earlier assumptions that identify relevant attributes can be imperfect.
  • Distinct but overlapping policies will discipline some common properties.
  • The number of policies being sought is unknown.
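To see why exact matching is brittle, here is a minimal Python sketch of the naive grouping (the function name and data are hypothetical). A single ad hoc change on vm03 is enough to split what is presumably one asset class into two groups.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_identical(indicator: Dict[str, Dict[str, int]],
                    columns: List[str]) -> Dict[Tuple[int, ...], List[str]]:
    """Naive policy discovery: one group per distinct indicator vector."""
    groups: Dict[Tuple[int, ...], List[str]] = defaultdict(list)
    for rid in sorted(indicator):
        key = tuple(indicator[rid].get(p, 0) for p in columns)
        groups[key].append(rid)
    return dict(groups)


columns = ["P1", "P2", "P3"]
table = {
    "vm01": {"P1": 1, "P2": 0, "P3": 1},
    "vm02": {"P1": 1, "P2": 0, "P3": 1},
    "vm03": {"P1": 1, "P2": 1, "P3": 1},  # one ad hoc change to P2
}
print(group_identical(table, columns))
# {(1, 0, 1): ['vm01', 'vm02'], (1, 1, 1): ['vm03']}
```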

To deal with these difficulties, we use a more sophisticated type of grouping based on clustering techniques from machine learning. Clustering discovers relationships between subjects that are characterized by similar but not identical features. In a straightforward application we would use k-means clustering, take resource rows as subjects, and use the respective indicator vector as a feature vector. We would define a difference metric d(v,w) that measures the distance between two indicator vectors and run k-means over the indicator table. For example, in Figure 2, consider the three rows vm01, vm02, and vm03. Vectors vm01 and vm03 are similar, and their common properties are P1 and Pb. K-means would propose a policy {P1 = 2, Pb = utcnist.colorado.edu} covering the asset class {vm01, vm03}.

Unfortunately, the straightforward application of k-means isn’t appropriate for this problem. The usual difference metrics, such as Euclidean distance, don’t work well for measuring the difference between binary indicator vectors, and real-valued feature vectors are unavailable because resource properties (e.g., in Tables 1–3) usually have Boolean and other non-numeric values. Instead of clustering indicator vectors, we cluster properties. We use the indicator table (i.e., the indicator vectors) as input to a metric of property correlation. The reciprocal of correlation becomes the clustering difference $d(P_x,P_y)$. For the property correlation, we use the mutual information $I(P_x;P_y)$ [10], computed from how often two properties $P_x$ and $P_y$ change from their baseline values according to the indicator table:

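$$
I(P_x;P_y) = \sum_{a\in\{0,1\}} \sum_{b\in\{0,1\}} P(P_x = a, P_y = b)\,\log \frac{P(P_x = a, P_y = b)}{P(P_x = a)\,P(P_y = b)} \tag{1}
$$

Here $P$ denotes probability. $P(P_x = 1)$ is the number of 1s in the $P_x$ column of the indicator table divided by the number of rows, and $P(P_x = a, P_y = b)$ is the fraction of rows in which the $P_x$ and $P_y$ columns hold $a$ and $b$, respectively. The distance $d(P_x,P_y) = 1/I(P_x;P_y)$ between two properties therefore varies inversely with their mutual information.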
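As a concrete illustration, the mutual information between two indicator columns can be computed with the standard definition [10]. This Python sketch assumes base-2 logarithms (the text does not fix a base) and treats zero mutual information as an infinite distance; `mutual_information` and `distance` are hypothetical helper names.

```python
import math
from collections import Counter
from typing import Sequence


def mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """Mutual information, in bits, between two binary indicator columns."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(x)
    joint = Counter(zip(x, y))       # counts of (a, b) cells
    px, py = Counter(x), Counter(y)  # marginal counts
    mi = 0.0
    # Only observed cells are summed; empty cells contribute 0 (0 log 0 = 0).
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def distance(x: Sequence[int], y: Sequence[int]) -> float:
    """Clustering distance d = 1 / I; columns sharing no information are infinitely far apart."""
    mi = mutual_information(x, y)
    return math.inf if mi <= 0.0 else 1.0 / mi


p1 = [1, 1, 0, 0]
p2 = [1, 1, 0, 0]  # always lands in a nonbaseline value together with p1
p3 = [1, 0, 1, 0]  # independent of p1
```

For these columns, p1 and p2 share one full bit (distance 1.0), while p1 and p3 share none. Note that mutual information rewards any consistent co-occurrence pattern, including properties that reliably change in opposite rows.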

Now we apply agglomerative clustering [11]. Initially the process places each property, such as P1, into a singleton cluster {P1}. In each iteration the two closest existing clusters are merged, until only one cluster remains. The distance between two clusters is the distance between their closest members (i.e., single-linkage clustering). Agglomerative clustering returns the final grouping together with all the intermediate groupings it created during the process. This lets us view the result as a dendrogram tree of policy proposals.
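The clustering loop can be sketched in plain Python. This is a naive illustration, not the C++, Python, and Weka code used in the prototype: it recomputes single-linkage distances at every step (O(n^3) overall, acceptable for a few dozen properties) and repeats a compact reciprocal-mutual-information distance so the block is self-contained. All function names and data are hypothetical.

```python
import math
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

Cluster = FrozenSet[str]
Column = Sequence[int]


def mi_distance(x: Column, y: Column) -> float:
    """Reciprocal of the mutual information (in bits) of two binary columns."""
    n = len(x)
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = sum(1 for u, v in zip(x, y) if (u, v) == (a, b)) / n
            if p_ab > 0:
                p_a = sum(1 for u in x if u == a) / n
                p_b = sum(1 for v in y if v == b) / n
                mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return math.inf if mi <= 0.0 else 1.0 / mi


def single_linkage(columns: Dict[str, Column],
                   dist: Callable[[Column, Column], float]
                   ) -> List[Tuple[Cluster, Cluster, float]]:
    """Agglomerate properties bottom-up; return the merge history (the dendrogram)."""
    names = sorted(columns)
    pairwise = {(a, b): dist(columns[a], columns[b])
                for a in names for b in names if a < b}

    def d(a: str, b: str) -> float:
        return pairwise[(a, b)] if a < b else pairwise[(b, a)]

    clusters: List[Cluster] = [frozenset([name]) for name in names]
    merges: List[Tuple[Cluster, Cluster, float]] = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: cluster distance is that of the closest members.
                link = min(d(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or link < best[0]:
                    best = (link, i, j)
        link, i, j = best
        merges.append((clusters[i], clusters[j], link))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


example_columns = {
    "P1": [1, 1, 1, 0, 0, 0],
    "P2": [1, 1, 1, 0, 0, 0],  # changes exactly with P1
    "P3": [1, 0, 1, 0, 1, 0],  # only weakly related to P1 and P2
}
merges = single_linkage(example_columns, mi_distance)
for left, right, link in merges:
    print(sorted(left), sorted(right), round(link, 2))
```

Each merge record is one internal node of the dendrogram, so any subtree of the merge history is a candidate policy.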

4.3 Asset Classes and Varieties
An asset class is a set of resources disciplined by a common policy (i.e., having the same change indications). Within an asset class there are varieties: subclasses of resources with identical change indications that also have the same values in the final-value table. SCAP terminology calls a variety a tailored benchmark [8]. Puppet calls a variety a set of resources sharing a desired state [13]. The vSphere Hardening Guide calls them risk profile levels [1]. Figure 3 shows one asset class having three varieties. (Real asset classes and varieties would have more members than this example.) Resources vm01–vm04 share the same change indications to P1 and P3. This makes vm01–vm04 an asset class, and {P1, P3} would be the policy discovered by clustering. Within the asset class, the subset {vm01, vm02} is a variety, distinguished by the desired state {P1 = 2, P3 = 10.0.1.11}. Resource vm03 forms another variety {vm03}. It also exhibits changes to P1 and P3, but now the desired state is {P1 = 1, P3 = 10.0.1.33}. This is a simplified version of an actual risk profile in the vSphere Hardening Guide, where P1 = 2 is from the virtual-machine hardening policy at risk profile level 3 and P1 = 1 is the same policy at risk profile level 1.
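The split of an asset class into varieties can be sketched in Python. The sketch assumes a resource belongs to the asset class only when its indicator bit is 1 for every property in the discovered policy, a strict reading of "the same change indications"; a noise-tolerant implementation might relax that test. `asset_class`, `varieties`, and the data are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple


def asset_class(indicator: Dict[str, Dict[str, int]],
                policy: Iterable[str]) -> List[str]:
    """Resources whose indicator bit is 1 for every property the policy disciplines."""
    props = list(policy)
    return sorted(r for r, row in indicator.items()
                  if all(row.get(p, 0) for p in props))


def varieties(final_values: Dict[str, Dict[str, Any]], members: List[str],
              policy: Iterable[str]) -> Dict[Tuple[Any, ...], List[str]]:
    """Split an asset class into varieties keyed by the desired state of the policy's properties."""
    props = sorted(policy)
    groups: Dict[Tuple[Any, ...], List[str]] = defaultdict(list)
    for r in members:
        groups[tuple(final_values[r][p] for p in props)].append(r)
    return dict(groups)


# Hypothetical tables for the policy {P1, P3}.
indicator = {
    "vm01": {"P1": 1, "P3": 1},
    "vm02": {"P1": 1, "P3": 1},
    "vm03": {"P1": 1, "P3": 1},
    "vm05": {"P1": 1, "P3": 0},
}
final_values = {
    "vm01": {"P1": 2, "P3": "10.0.1.11"},
    "vm02": {"P1": 2, "P3": "10.0.1.11"},
    "vm03": {"P1": 1, "P3": "10.0.1.33"},
    "vm05": {"P1": 2},
}
members = asset_class(indicator, ["P1", "P3"])
print(varieties(final_values, members, ["P1", "P3"]))
# {(2, '10.0.1.11'): ['vm01', 'vm02'], (1, '10.0.1.33'): ['vm03']}
```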

Figure 3. This final-value table shows that changes to properties P1 and P3 are highly correlated, making them a candidate policy. The respective asset class includes resources vm01–4, but not vm05. The value 0 indicates the baseline value. Non-zeros indicate the specific change. The changes produce three varieties: {vm01, vm02}, {vm03}, {vm04} distinguished by the respective property changes {P1=2, P3=10.0.1.11}, {P1=2, P3=10.0.1.33} and {P1=1, P3=10.0.1.33}.

5. Summary

To discover policies, we begin with a change log (see Table 2) and knowledge of the baseline value of every property (see Table 3). We scan the changes within a bounded attention window of the log. Each log entry in the window records the change to a property value on a specific resource such as vm01 (see Figure 1). From these changes we create two tables: an indicator table (see Figure 2) that records, for each resource, which properties changed to nonbaseline values during the attention window, and a final-value table (see Figure 3) that records the final value of each resource property. Formula (1) defines the mutual information between the indicator columns of two properties, and its reciprocal serves as the difference metric. Using this metric, we apply agglomerative clustering to the properties P1, P2, and so on. The discovered clusters (see Figure 4) are candidate policies. The properties in each cluster are the subjects of the policy’s condition tests, and the respective values (see Figures 2 and 3) in the final-value table are the desired state. Any resource with an indicator vector matching the properties disciplined by a discovered policy is classified as a member of the policy’s asset class. Within the asset class, resources whose properties have the same desired state make up a variety. Varieties correspond to tailorings for SCAP and desired-state manifests for Puppet.

6. Results

We created a prototype to discover configuration guidelines from a log of vCenter changes. We verified that we could generate the change log by using an existing program, pyReplay, which listens to the vCenter inventory service Atom feed. When the feed reports a change to a resource, pyReplay retrieves the XML description of the resource; compares the XML with the previous state; generates a tuple that includes the resource ID, timestamp, property ID, old value, and new value; and stores the XML if there were changes. Using pyReplay output, we created the change log of Table 2. We wrote separate C and Python programs that generate the indicator and final-value tables from the change log. To perform k-means and agglomerative clustering we used C++, Python, and the Weka toolkit [12].
To conduct experiments, we also created a C++ program that peppers the indicator table of Figure 2 with varying amounts of random noise. The noise is parameterized by a probability e: each bit of the table is reversed (i.e., 0→1 or 1→0) with probability e. By increasing e we can study the negative effects of noise on policy discovery. When e = 0.5 the table is entirely noise.
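The prototype's noise generator was a C++ program; the same idea takes only a few lines of Python. This sketch assumes the indicator table is a list of 0/1 rows, and `add_noise` is a hypothetical name. At e = 0.5 each output bit is equally likely to be 0 or 1 regardless of its input, which is why the table is then entirely noise.

```python
import random
from typing import List


def add_noise(table: List[List[int]], e: float, seed: int = 0) -> List[List[int]]:
    """Return a copy of a 0/1 table in which each bit is flipped independently with probability e."""
    if not 0.0 <= e <= 1.0:
        raise ValueError("e must be a probability in [0, 1]")
    rng = random.Random(seed)  # seeded so experiments are repeatable
    # random() is uniform on [0, 1), so a bit flips when the draw falls below e.
    return [[bit ^ 1 if rng.random() < e else bit for bit in row] for row in table]


clean = [[1, 0, 1], [0, 0, 1]]
noisy = add_noise(clean, e=0.3, seed=42)
```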

To build the change log, we generated changes to 1,000 virtual-machine resources, each having 26 properties tested by the vSphere Hardening Guide. We generated the property changes by enforcing policies that ranged in size from 2 to 13 of the 26 properties. Each trial tested one guideline that was repeatedly applied to a decreasing percentage (i.e., a decreasing asset class size) of the 1,000 virtual machines. In addition, we injected noise in the form of increasing values of e.

Because each policy consists of condition tests to an unknown subset of the 26 properties, we calculated the policy discovery error to be the difference between the number of properties of the target policy and the number of properties in the smallest discovered cluster that contains all the properties of the target. Figure 4 illustrates this calculation.
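The error measure can be computed directly from a merge history. This Python sketch assumes the dendrogram is given as a list of merge records whose first two elements are the merged clusters (a trailing distance, if present, is ignored); `discovery_error` and the example dendrogram are hypothetical.

```python
from typing import FrozenSet, Iterable

Cluster = FrozenSet[str]


def discovery_error(merges: Iterable[tuple], target: Iterable[str]) -> int:
    """Size of the smallest dendrogram cluster containing every target property, minus the target size."""
    goal = frozenset(target)
    # Every cluster in the dendrogram: the two inputs of each merge and their union.
    clusters = {c for left, right, *_ in merges for c in (left, right, left | right)}
    covering = [c for c in clusters if goal <= c]
    if not covering:
        raise ValueError("the dendrogram does not contain all target properties")
    return min(len(c) for c in covering) - len(goal)


P = frozenset
# Hypothetical dendrogram: {P1, P2} forms first, then P3 joins, then P4.
merges = [
    (P({"P1"}), P({"P2"})),
    (P({"P3"}), P({"P1", "P2"})),
    (P({"P4"}), P({"P1", "P2", "P3"})),
]
print(discovery_error(merges, {"P1", "P2"}))  # 0: an exact cluster exists
print(discovery_error(merges, {"P1", "P3"}))  # 1: P2 is misclustered in
```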

Figure 4. The policy discovery error is the number of properties misclustered into the smallest subtree containing all the properties of the unknown target policy that generates the test problem. Here the hidden policy is P1–P7, and the error is 8.

Figure 5 plots policy discovery performance versus noise and asset class size. The target (unknown) policy disciplines 7 of 26 properties of three asset classes of size 10, 20, and 40 chosen from 1,000 resources. Recall that the asset class size is the number of resources disciplined by the hidden policy. Along the x-axis the noise is increasing from 0 to 0.3. At 0.3, 30% of the indicator vector bits are randomly reversed to represent non–policy-induced changes in the event log. The y-axis shows the policy discovery performance in terms of the error.

Figure 5. Performance at Detecting a Policy That Is Disciplining 7 of 26 Properties and 10, 20, and 40 of 1,000 Resources.

The results in Figure 5 show that our process accurately discovers an unknown policy disciplining as little as 1% of the resource inventory. The discovery error begins to increase with increasing amounts of noise, but larger asset classes counteract (delay) this increase. A policy with a modest asset class size of 4% of inventory (i.e., 40/1000) is discovered accurately despite 15% noise.

7. Related Work

The National Institute of Standards and Technology (NIST) maintains a large collection of configuration guidelines used to assess configuration compliance. These guidelines are part of the National Checklist Program [7] at the National Vulnerability Database. They are examples of manually created policies, and many are not codified for automatic assessment. Our work aims to automate the discovery and encoding of similar guidelines that are timely and suitable for automatic assessment.

SCAP and OVAL [2] [3] are configuration assessment frameworks. These systems apply policies like the ones we discover. Currently, SCAP and OVAL guidelines (called benchmarks) are manually developed and codified using text editors and tools such as the Benchmark Editor and the Recommendation Tracker [13].

Enterprises sometimes require that their business processes, as well as configuration state, align with rules and policies, and the literature includes work on compliance management for processes. In healthcare, for example, medical guidelines and clinical policies should be followed during patient treatments [16]. In control-flow compliance checking, Petri nets capture compliance rules in the form of patterns that are then used to check the alignment of process behavior recorded in event logs [14]. Activity-oriented clustering determines the dependencies between process models and compliance rules across a large number of business processes [15]. A data-driven approach has coordinated the behavior of business policies and their interactions [16]. Compliance Rule Graphs (CRGs) detect the occurrence of business policies [17]. Our work differs from these process-oriented approaches in that we focus on finding configuration policies expressed by users within logs of user activity.

The Facter project gathers resource, property-value, and system information similar to the vCenter-plus-pyReplay process that we described above [5]. Facter returns a snapshot of a system’s current state. By differencing the state between successive Facter gatherings, pyReplay-style, we could generate the change log of Table 2 from any inventory of Facter-supported resources. This would extend the domain of our guideline discovery process to sources beyond vCenter.

Policies are models of desired state. The idea of using such models for data center automation is implemented by systems like Puppet [4] and CFEngine [6]. These systems require the administrator to first write the manifests and profiles manually. A future direction for our work is to discover and propose such manifests automatically.

8. Conclusions and Future Work

We have described a process for discovering configuration policies from a log of changes kept by vCenter during the operation of a data center. Our process does not require the manual entry of test conditions by an administrator. Instead, we discover policies and desired values by observing the resources that change, along with the affected property, new value, and knowledge of the baseline value. We cluster together properties by agglomerating ones that correlate. Two properties correlate if there is mutual information in the co-occurrence of their change to nonbaseline values. Clustered properties become the discovered policy. Resources disciplined by the policy form an asset class. We subdivide the policy into varieties by grouping resources within the asset class having the same desired values for the properties tested by the policy.
Automatic discovery of configuration policies saves time. Instead of requiring users to edit test conditions, the process lets users’ actions over time propose new policies and asset classes. This approach exploits the software-defined nature of the modern data center and supports a DevOps-style assessment of data center operations. Policy discovery also has the potential to detect new guidelines that were not realized or articulated by the administrator. Our next step is to apply the policy discovery process to a live feed of vCenter changes and automatically propose desired states to assessment and remediation engines such as OVAL and Puppet.

Acknowledgments

We would like to thank Rob Helander for writing the pyReplay code that generates the configuration change events and resource states from the vCenter inventory service Atom feed. Thanks also to Rick Frantz for proposing and sponsoring the topic of automatic policy discovery as a research project.

References

1. vSphere 5.5 Hardening Guide. VMware, Inc. 2013.
2. The Security Content Automation Protocol (SCAP). National Institute of Standards and Technology. Jan 2014.
3. The Open Vulnerability Assessment Language (OVAL). The MITRE Corporation. Mar 2014.
4. Puppet Enterprise. Puppet Labs. Mar 2014.
5. Facter 1.7. Puppet Labs. Mar 2014.
6. CFEngine. CFEngine AS. Mar 2014.
7. National Vulnerability Database. National Institute of Standards and Technology (NIST). 2014.
8. PCI SSC Data Security Standard. Payment Card Industry Security Standards Council. 2014.
9. Huttermann, M. DevOps for Developers. Apress. 2013.
10. Cover, T. M. and Thomas, J. A. Elements of Information Theory. John Wiley & Sons, Inc. 1991.
11. Witten, I. H., Frank, E. and Hall, M. A. Data Mining: Practical Machine Learning Tools and Techniques, 3rd Ed. Morgan Kaufmann. 2011.
12. Weka 3: Data Mining Software in Java. University of Waikato. Mar 2014.