Simplifying Virtualization Management with Graph Databases

Vijayaraghavan Soundararajan
Performance Group, VMware
ravi@vmware.com

Lawrence Spracklen*
Ayasdi
lawrence.spracklen@spracklen.info

Abstract

Facebook graph search allows arbitrary queries that leverage the vast amount of information available in Facebook. With such a treasure trove of information, it is possible to answer a large number of questions like “find all of my friends that ski.” Such queries leverage this specialized Facebook database. Moreover, these answers are unavailable through other conventional search interfaces like Google. In a similar way, the database in a virtualized infrastructure can also be thought of as containing graph information: namely, the relationships between various entities like hosts, VMs, and datacenters. Many questions of interest to an administrator can be answered easily when couched in terms of a graph database. For example, suppose a customer would like to know how many VMs have disks that span multiple storage arrays, or which datastore is most critical (because it has the most VMs and hosts on it). These questions can be expressed simply in terms of graph traversal. However, VMware vSphere® currently has no simple interface to answer these sorts of questions. Why not use such graph search to help with virtualization management?

In this paper, we propose coupling the virtualization management API, graph search, and social media in order to provide an easy-to-use portal for answering a wide variety of graph-oriented management questions in a virtualized environment. We expand upon previous work in social media approaches to virtualization management and use the metaphors of social media to define simplified relationships between entities (for example, follower/followee relationships and group membership). We use the VMware virtualization management API to define the entities (for example, VMs and hosts) and relationships (for example, datastore or network connectivity), and we populate a VMware Socialcast® social media portal with such data. Finally, we import the resulting graph into a graph database and leverage easy-to-use query languages and interfaces to access these graphs and answer common management questions. We show examples from a preliminary prototype to demonstrate the usefulness of such an approach and also describe other use cases that leverage this sort of data organization.

Categories and Subject Descriptors: D.m [Miscellaneous]: Virtual Machines, system management, cloud computing.

General Terms: Performance, Management, Design.

Keywords: Virtual Machine management, cloud computing, datacenter management tools.

1. Introduction

When Facebook [3] introduced its search feature, it marked a watershed moment in web search. The Facebook database is a vast source of information that has deep relevance to its users. While Google has access to a similarly vast and important database of information, this information is not as specific to the users as the personal information in Facebook. One important distinction between search in Facebook and Google is the type of data involved. Facebook is not trying to create a generic search engine for all data; rather, it is capitalizing on the sort of data that is stored, namely, connectivity data in the form of a large, distributed graph database [2]. In the case of Facebook, the type of data stored (connectivity data) is well-matched to the mechanism of storage (a graph database).

Why is Facebook search useful? It is useful because it connects people with the information they really care about, namely, personal data. For example, a question like “Find all of my friends that ski” cannot be answered with a pure Google search, but can be answered using data from a social media database (whether it be Facebook or Google+ [1]). A Google search is extremely useful for purely factual information, but often less useful for queries that are of a more personal nature.

In a similar way, a virtualization infrastructure has a wide variety of information of interest to the virtualization administrator. This data is currently organized using a combination of a relational database (the virtualization management database) and an XML database (an embedded inventory service). However, similar to a Facebook search, many questions of interest to an administrator can be couched simply in terms of the connectivity graph for the virtual inventory. For example, an administrator may wish to know the answer to the question “Is there an administrator with access to both datastore X and VM Y?” This question can be answered via well-structured API calls to the virtualization manager (for example, VMware vSphere [13]), with vSphere performing the appropriate queries to its relational database. However, even a very simple graph database can answer the question more easily, and such a query can be expressed more intuitively. Moreover, the approach can be extended to answer a large number of other queries fairly easily.

In this paper, we propose reorganizing virtualization inventory data into a graph database, combining it with relationships from social media, and using this connectivity structure to address many virtualization administrator pain points. We discuss results from a preliminary prototype, and describe areas of future work leveraging this computing model.

The structure of this paper is as follows. In section 2, we describe how we map virtual inventories into simple connectivity graphs. In section 3, we describe our prototype design for answering common queries using this implementation. In section 4, we provide a number of use cases and some preliminary results from running queries against a virtualization hierarchy. In section 5, we describe related work. Finally, we conclude in section 6.

2. A Virtualized Inventory as a Graph Database

A virtual infrastructure is basically a tree. In VMware vSphere, the root of the tree is the vCenter Server™. The children of vCenter are datacenter folders whose children are datacenters. The children of datacenters are folders for different entity types, such as hosts, datastores, VMs, and networks. Moreover, virtual infrastructure also includes clusters and resource pools for further organizing hosts and VMs.

One characteristic of this tree that makes it potentially confusing is that a given node can have multiple parents. For example, the parent of a VM is a VM folder. However, a VM can also belong to a resource pool. Moreover, a VM runs on a given host. Each of these could legitimately be the parent of the VM, but in the vSphere API [14] (i.e., the VIM API), the formal definition of the parent of a VM is a VM folder (or, more confusingly, a vApp in some cases). The other ‘parents’ (hosts and resource pools) are runtime properties of the VM, and we informally refer to them as alternate parents.

The current technique for storing this inventory data in vCenter is a standard relational database. A flat entity table stores an entity name and its parent name. There are other tables in vCenter that store VM and host information, and the alternate parent information is stored along with the VM or host data in these tables. To traverse such information in the VIM API requires the specification of a starting node (for example, the datacenter where the host is located) plus a specification for what nodes to grab during the traversal (for example, all hosts and VMs). This amounts to a number of database queries to retrieve this information.

In recent vSphere releases, a portion of inventory retrieval requests are satisfied not by the aforementioned relational database, but rather by an inventory service. The basic idea behind the inventory service is that it is backed by a database that is optimized for XML datatypes (in current releases, xDB). Because the data in the VIM API is already structured as XML, storing this data in xDB as XML documents effectively lowers the ‘impedance mismatch’ that occurs when a user queries for data. Essentially, the XML database is optimized for the document format used in vSphere. The query language used by this database is XQuery, which is designed for queries of XML data. This is helpful for accessing lists of VMs under a folder or attributes of a VM, since each is effectively a property of some XML document. For example, to get the parent of each VM, the following query is used:

for $vm in //VirtualMachine return $vm/parent

To get the host alternate parent as well as the folder parent of a VM requires the following query:

for $vm in //VirtualMachine
return ($vm/parent, $vm/runtime/host)

Both the relational database and the XML database are effective means for storing data, but neither of them is particularly well-suited for the types of queries we are considering in this paper, namely, graph traversals. In this paper, one of our goals is to convert a common set of queries with a wide range of uses into graph traversals. Consider, for example, the hierarchy in Figure 1, which consists of 2 clusters, each with 1 host (H1, H2) and 2 VMs (V1-V4). Node VMDK is a base disk shared by linked clones V2 and V3. Suppose an administrator would like to know if there are any VMs that reside in different clusters but share a VMDK. This might be helpful when an administrator is trying to limit the scope of dependencies of a VM to within its own cluster.

To answer this question using the VIM API, where VMDK is a property of a VM, the flow would go something like this:

  1. Find all VMs in cluster 1.
  2. Enumerate the VMDKs of each of these VMs.
  3. Find all VMs in cluster 2.
  4. Enumerate the VMDKs of each of these VMs.
  5. Find out if any VMDKs from step 2 match the VMDKs of step 4.

While this process is fairly simple, it relies on an understanding of the inventory model and knowledge of the location of the list of VMDKs associated with a VM. This list is in the somewhat non-intuitively named ‘VirtualMachine.layoutEx.file[]’ array.
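The five steps above can be sketched in Python against a toy in-memory inventory. The dictionaries below are hypothetical stand-ins for data that the VIM API would return (they are not VIM API calls), and the layout is one plausible reading of Figure 1: we assume V1 and V2 are in cluster 1, V3 and V4 are in cluster 2, and V2 and V3 share a base disk. The file names are invented for illustration.

```python
# Toy stand-in for inventory data that the VIM API would return.
# All names and the cluster layout are illustrative assumptions.
cluster_vms = {
    "cluster1": ["V1", "V2"],
    "cluster2": ["V3", "V4"],
}
vm_vmdks = {
    "V1": ["v1.vmdk"],
    "V2": ["base.vmdk", "v2-delta.vmdk"],
    "V3": ["base.vmdk", "v3-delta.vmdk"],
    "V4": ["v4.vmdk"],
}


def vmdks_in_cluster(cluster):
    """Steps 1-2 (or 3-4): list a cluster's VMs, then collect their VMDKs."""
    disks = set()
    for vm in cluster_vms[cluster]:
        disks.update(vm_vmdks[vm])
    return disks


# Step 5: any VMDK seen in both clusters is shared across clusters.
shared = vmdks_in_cluster("cluster1") & vmdks_in_cluster("cluster2")
print(sorted(shared))
```

Even in this simplified form, the caller has to know where the VM-to-VMDK mapping lives and has to visit every VM in both clusters before any comparison can be made.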

Consider the same problem from the perspective of a graph, with the data stored in a graph database [4]. If we organize the data more closely to its logical representation—making hosts, VMs and VMDKs nodes; and representing datastores and clusters as groups—then answering the question would instead require simply traversing the list of VMDKs, finding ones with multiple parents, and then checking if those parents (which are VMs) are members of different clusters. Matching the data storage representation to the sorts of questions that are likely to be answered can make it easier to frame the questions and potentially faster to find the answers. In this paper, we will give a wide variety of examples to justify the need for organizing data sensibly in order to answer such questions.

Figure 1

Figure 1. Modeling an inventory as a graph. In this example, H1 and H2 are hosts, V1 through V4 are VMs, and VMDK is a base disk. V2 and V3 are linked clones that share the same base disk. With data stored like a graph, a question like “Are there any VMs in different clusters that share a VMDK?” can be solved by a simple graph traversal.
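As a minimal sketch of the traversal described for Figure 1, the following Python treats VMs and VMDKs as nodes, clusters as groups, and a VM as a follower of its VMDKs. The node names and the assignment of VMs to clusters C1 and C2 are assumptions made for illustration; the figure itself is not reproduced here.

```python
from collections import defaultdict

# Clusters are groups; a VM "follows" each of its VMDKs.
# Names and cluster assignment are illustrative assumptions.
cluster_of = {"V1": "C1", "V2": "C1", "V3": "C2", "V4": "C2"}
follows = [("V1", "d1"), ("V2", "base"), ("V3", "base"), ("V4", "d4")]

# Index the follow edges by VMDK so that each disk knows its parent VMs.
parents = defaultdict(set)
for vm, vmdk in follows:
    parents[vmdk].add(vm)


def cross_cluster_vmdks():
    """Return {vmdk: parent VMs} for disks whose parents span clusters."""
    result = {}
    for vmdk, vms in parents.items():
        clusters = {cluster_of[vm] for vm in vms}
        if len(vms) > 1 and len(clusters) > 1:
            result[vmdk] = sorted(vms)
    return result


print(cross_cluster_vmdks())
```

The traversal starts from the VMDKs rather than from the clusters, so only disks with more than one parent require any cluster lookup at all.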

Given that vCenter already has a relational database and stores hierarchy information, why do we need a graph database? First of all, we are not looking for a general solution for vCenter data storage. Instead, we propose a graph database as a secondary storage medium in order to solve specific types of problems. The overall storage required for the data we are considering is on the order of 5MB (as opposed to several hundred MB to store all of the information for such an inventory). By pruning the type of data we care about, we are able to search it extremely fast: most queries complete in tens of milliseconds. Moreover, it is easily extensible to a wide variety of queries.

To some extent, vCenter already allows a user to view the inventory as a graph through the “Maps” tab in the UI. A simple example is shown in Figures 2 and 3. In Figure 2, we show a standard client view of a cluster. In Figure 3, we show the “Maps” tab in the UI. While this representation is available from the UI, there is currently no API to allow the user access to the graph itself.

Figure 2

Figure 2. Standard hierarchical view of a vSphere cluster.

Figure 3

Figure 3. A graph-based view of a portion of a cluster using the Maps tab in the vSphere UI. The Maps tab shows the connections using a graph topology. In this figure, for example, host 10.135.192.8 is managing VM w1-pcloud-v154.

3. Prototype Design

In this section, we validate our ideas by describing a preliminary prototype of this approach to data storage. Our prototype combines Socialcast [7], the VIM API [14], and the Neo4j graph database [4]. Neo4j is an open-source graph database that is architected to be both fast and scalable (up to billions of entities), and provides a powerful, human-readable graph query language (called Cypher) that allows users to efficiently extract key insights from their data. Neo4j stores data in nodes connected by directed, typed relationships and provides the ability to associate properties with both. In our setup, the elements in the datacenter (e.g., vCenter, hosts and VMs) are represented as nodes and the edges between these nodes track their relationship to one another. For example, a host follows a VM, so a ‘follow’ relationship is established between these two nodes. Similarly, a datastore comprises VMs, so a datastore is represented as a node, and the VMs that comprise the datastore are related to the datastore node through a ‘member’ relationship. We assign types to relationships (for example, hosts follow VMs, and VMs are members of networks) primarily for convenience and to make our queries easily understandable. Also for convenience, we introduce an additional node type called ‘user’ for connecting users to their VMs. Our choice of Neo4j is motivated by its wide use and high performance, but other graph databases with similar functionality could have been used instead.

The basic flow of information in our system is as follows:

  1. Using the VIM API, the inventory structure of a virtualization infrastructure is recorded. For example, a datacenter is traversed and hosts and VMs are recorded, as well as their datastores and networks.
  2. For certain entities (hosts or VMs), a user in Socialcast is created. This data is published to Socialcast. Follower relationships are created in this step. This work heavily leverages the social media work presented in [5].
  3. For other entities (e.g., datastores or networks), a group is created in Socialcast, and users are added to those groups.
  4. Neo4j retrieves data from the Socialcast server.
  5. Neo4j converts all Socialcast users and groups into nodes in a graph, and assigns types to the nodes as well as types to the relationships. For example, a network is a group in Socialcast, but a node in Neo4j. A VM is a member of a network group in Socialcast, but is connected to a network via a ‘member’ relationship in Neo4j.
  6. Graph queries sent through Socialcast (by sending a message to the Neo4j server) use the Neo4j backend to compute the result set, with results returned as a message to the user.

Our mapping between virtualization entities and Socialcast entities is based on [5], adding an additional VMDK entity for the disk file of a VM. A VM follows its VMDKs.

Even though we can also publish data directly from the VIM API to Neo4j, we choose to use Socialcast for several reasons. First, the Socialcast UI provides a convenient and intuitive means to visualize an inventory. Second, the Neo4j population code can operate independently of the vCenter server. If vCenter is down, Neo4j can still retrieve data from Socialcast and answer questions. Third, and perhaps most importantly, using the metaphors of social media in the Neo4j pre-population code vastly simplified the Neo4j code. For example, to pre-populate Neo4j, the Neo4j coder merely needed to understand two basic relationships: groups and followers. In addition, the Neo4j coder merely needed to understand that each node has a type (for example, a ‘VM’ type, ‘host’ type, or ‘User’ type). All of these relationships can be trivially retrieved from a Socialcast setup: the XML data for a user consists of ‘following’ fields and ‘group’ fields. Simply by reading the list of users and parsing these fields, the entire Neo4j graph database can be populated. It should be noted that the Neo4j code was written by someone with essentially zero knowledge of the VIM API, potential evidence of the simplicity that the Socialcast model affords.
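To illustrate how little the population code needs to know, the following Python sketch builds typed nodes and typed edges from a list of users with ‘following’ and ‘group’ fields. The XML element names and the sample data are assumptions made for illustration; they are not the actual Socialcast schema, and the sketch keeps the graph in plain Python structures rather than writing to Neo4j.

```python
import xml.etree.ElementTree as ET

# Hypothetical user export; element names are assumptions for illustration
# and are not the actual Socialcast schema.
SAMPLE = """
<users>
  <user><name>H1</name><type>host</type>
    <following>V1</following><following>V2</following>
    <group>Net-A</group></user>
  <user><name>V1</name><type>VM</type><group>Net-A</group><group>DS-1</group></user>
  <user><name>V2</name><type>VM</type><group>Net-A</group></user>
</users>
"""


def build_graph(xml_text):
    """Turn users into typed nodes and their fields into typed edges."""
    nodes, edges = {}, []
    for user in ET.fromstring(xml_text).findall("user"):
        name = user.findtext("name")
        nodes[name] = user.findtext("type")
        for followee in user.findall("following"):
            edges.append((name, "following", followee.text))
        for group in user.findall("group"):
            # Groups become nodes too; this sketch gives them a generic type.
            nodes.setdefault(group.text, "group")
            edges.append((name, "member", group.text))
    return nodes, edges


nodes, edges = build_graph(SAMPLE)
print(len(nodes), len(edges))
```

A fuller version would assign each group a specific type such as ‘Network’ or ‘Datastore’, as the queries in section 4 assume; the generic ‘group’ type here is a simplification.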

Given that Socialcast already maintains a sort of connectivity graph of users, one might ask why we propose to maintain a second, separate graph database in Neo4j. For our initial implementation, we separate these databases primarily for ease of expressiveness and for performance. Socialcast stores its user relationships in a MySQL relational database [16] and does not expose this relational database to the end user for querying. Instead, only the Socialcast API can be used to query its data. As a result, the method of determining relationships is inherently serial: to find the follower of a follower in Socialcast requires using the Socialcast API to retrieve the first entity, getting its followers, and then explicitly querying those followers, again using the API. In this case, multiple round trips from a client to the Socialcast server/DB are required to retrieve connectivity data. In contrast, if we store the data in a separate graph database, we can use the graph database API for determining relationships. For example, a user can retrieve the followers of the followers of a given node with a single round trip to the graph database using only a few simple lines of code, as we show in the next section. From a performance perspective, a general-purpose relational database like MySQL is not specially tuned for these sorts of graph traversals, requiring index lookups followed by data lookups, while graph databases store edge information directly with each entity, making it faster to determine relationships [15]. Our preliminary results suggest that it takes milliseconds to retrieve relationship data from Neo4j, while it takes tens to hundreds of milliseconds to get the same data using the Socialcast API.
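The difference in round trips can be made concrete with a small Python sketch. The class RemoteApi below is a hypothetical stand-in for any per-entity API (it is not the Socialcast API), and the toy graph abstracts away the direction of the follower relationship; both are assumptions made for illustration.

```python
# Toy adjacency list: the entities directly related to each entity.
# Names and shape are illustrative assumptions.
neighbors = {
    "vcenter": ["H1", "H2"],
    "H1": ["V1", "V2"],
    "H2": ["V3"],
    "V1": [], "V2": [], "V3": [],
}


class RemoteApi:
    """Stand-in for a per-entity API; counts client/server round trips."""

    def __init__(self, graph):
        self.graph = graph
        self.round_trips = 0

    def get_neighbors(self, name):
        self.round_trips += 1
        return self.graph[name]


def two_hops_serial(api, start):
    """One request for the start node, then one per intermediate node."""
    result = []
    for mid in api.get_neighbors(start):
        result.extend(api.get_neighbors(mid))
    return result


def two_hops_local(graph, start):
    """The same traversal when one query engine holds the whole graph."""
    return [leaf for mid in graph[start] for leaf in graph[mid]]


api = RemoteApi(neighbors)
assert two_hops_serial(api, "vcenter") == two_hops_local(neighbors, "vcenter")
print(api.round_trips)
```

In the serial version, the number of requests grows with the number of intermediate nodes, whereas the graph database receives a single query regardless of fan-out.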

In terms of scale, Socialcast has been designed with enterprise social media in mind, and typical large enterprises may have up to tens of thousands of users. A simple MySQL database is entirely sufficient for this purpose. Once virtualization entities are included, however, this number may increase by an order of magnitude, potentially necessitating a different, more scalable storage backend. Going forward, it might be interesting to speculate on how to use a Neo4j-like backend for Socialcast with corresponding changes to the Socialcast API.

3.1 Design for scalability

Neo4j is a graph database specially tuned for graph type queries. One of the key ways in which we design for scalability is to keep our Neo4j queries simple by carefully limiting the types of relationships we allow between entities. As mentioned above, our relationship model is based on a social network, so entities can have a follower/followee relationship (for example, a host follows VMs, and VMs are followed by hosts) or a group relationship (e.g., a VM is a member of a datastore). For humans, we define a ‘user’ relationship: a user that is following a VM is said to be ‘using’ it. Strictly speaking, we do not need this extension, but it simplifies certain queries.

Another key way we can achieve scalability is by utilizing the scale-out nature of a graph database. Queries in Neo4j are simple enough to be written by administrators with little knowledge of the VIM API, and the query engine automatically parallelizes all queries, thus resulting in the potential to scale to millions of entities without having to rewrite queries. In contrast, for a user to write scalable queries against vSphere, the user must use the VIM API (since neither the relational DB schema nor the xDB schema is currently public) and must parallelize the queries by hand.

3.2 Querying the graph database

Neo4j supports a SQL-like language called Cypher for querying the graph database. An end user can issue queries to Neo4j interactively through a web interface or programmatically using various language bindings. For our implementation, we utilize the Python py2neo module [24] for accessing Neo4j.

An end user can choose to write sample queries directly to our Neo4j instance. In addition, we also allow access via Socialcast by creating a special Neo4j user. An end user sends a private message containing Cypher queries to the Neo4j user. A special Neo4j server listens for private messages to the Neo4j user and then dispatches these queries to the Neo4j instance. The results are returned in a private message to the requester. An example of this is shown in Figure 4, where user “sprack Human” (a human user with username sprack) is sending a Cypher query to the neo4j user. We give more details about this query and other possible queries in the next section.

Figure 4

Figure 4. Sample Neo4j query and results. We send a Cypher query to Neo4j through Socialcast and return the results in a private message to the requester. In this example, the query returns links to the entities that are the results of the query.

In addition to allowing the user to send Cypher queries to Neo4j, we also create shorthand versions of command queries. For example, a user can type “find VMs in vCenter A” to find all VMs associated with vCenter A. This can be faster than using the standard Socialcast user interface, in which a user would need to first browse to the vCenter user, then browse to each of its ‘host’ followers to find all VMs associated with that vCenter.
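One way such a shorthand could be expanded is sketched below in Python. The command grammar, the Cypher template, and the use of a ‘name’ key in the ‘names’ index are all hypothetical; they are modeled on the query style used in this paper and are not taken from the prototype.

```python
import re

# Hypothetical shorthand grammar and Cypher template, modeled on the
# legacy Cypher style used in this paper; not the prototype's actual code.
PATTERN = re.compile(r"^find VMs in vCenter (?P<name>[\w.-]+)$", re.IGNORECASE)
TEMPLATE = (
    'start a=node:names(name="{name}") '
    "match a-[:following]-h-[:following]-v "
    'where h.type = "host" and v.type = "VM" '
    "return distinct(v)"
)


def expand(command):
    """Return a Cypher string for a recognized shorthand, else None."""
    match = PATTERN.match(command.strip())
    if match is None:
        return None
    return TEMPLATE.format(name=match.group("name"))


print(expand("find VMs in vCenter A"))
```

Restricting the captured name to word characters, dots, and hyphens keeps quotation marks out of the generated query, which matters when the text arrives in a free-form private message.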

4. Use Cases

In this section, we describe two sets of use cases. First, we describe how reorganizing data into a graph database can provide assistance in risk analysis. Second, we describe how a graph database can help in standard day-to-day operations.

4.1 Risk analysis

For risk analysis of a virtualized infrastructure, there are a wide variety of interesting questions one can ask:

  1. Is there a single point of failure?
  2. Is there any single entity with an unusually large number of dependencies?
  3. Are there any datastores with a disproportionately large number of VMDKs?

Our simple relationship model makes it intuitive to express constraints while still yielding answers to questions like these. The queries required to answer the above questions are as follows:

1. Is there a single point of failure?

Because this is a very broad question, let us ask a more specific version of the question. Suppose our datacenter has a policy that each host has multiple NICs, and that each NIC should connect to a different network. Therefore, each host should be connected to multiple networks, and a host that is connected to only one network is a potential single point of failure. The following query finds such hosts:

#1 start a=node(*)
#2 match a-[:member]-c
#3 where a.type = "host" and c.type = "Network"
#4 with a, count(c) as fcount
#5 where fcount = 1
#6 return a

In this query, lines #1 and #2 indicate that we should examine every node ‘a’ which is a member of some group ‘c’. Line #3 adds the constraint that the node ‘a’ is a host and the node ‘c’ is a network. Lines #4 and #5 add the constraint that the number of networks that the host is connected to is 1. Line #6 returns all such hosts.

Note that we could have generalized this query in a number of ways. We could have simply changed the group type to be ‘c.type = "Datastore"’, and we would have all hosts connected to just a single datastore. No semantic understanding of the topology is required: the fact that networks and datastores are different types changes the query only slightly.
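The same generalization can be sketched in Python by making the group type a parameter. The node names and memberships below are invented for illustration, and the function mirrors the logic of the Cypher query rather than calling Neo4j.

```python
from collections import Counter

# Toy graph; names and memberships are illustrative assumptions.
node_type = {"H1": "host", "H2": "host", "V1": "VM",
             "Net-A": "Network", "Net-B": "Network", "DS-1": "Datastore"}
# (member, group) pairs, mirroring the 'member' relationship.
member_of = [("H1", "Net-A"), ("H1", "Net-B"), ("H2", "Net-A"),
             ("H1", "DS-1"), ("H2", "DS-1"), ("V1", "Net-A")]


def hosts_with_single_group(group_type):
    """Hosts that belong to exactly one group of the given type."""
    counts = Counter(
        member for member, group in member_of
        if node_type[member] == "host" and node_type[group] == group_type
    )
    return sorted(host for host, n in counts.items() if n == 1)


print(hosts_with_single_group("Network"))
print(hosts_with_single_group("Datastore"))
```

Like the Cypher query, this sketch only reports hosts that have at least one matching membership; a host with no network at all would need a separate check.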

2. Is there a single entity (host or VM) with an unusually large number of dependencies?

This query is useful in case an administrator wishes to find some weak spots in the infrastructure. In this case, the number of dependencies is essentially equivalent to the number of relationships, so we will simply find the number of relationships per entity and return a sorted list. The Neo4j code is straightforward:

#1 start a=node(*)
#2 match a-->(x)
#3 where a.type = "host" or a.type = "VM"
#4 with a, count(*) as fcount
#5 return a, fcount
#6 order by fcount desc
#7 limit 5

In this query, lines #1 and #2 indicate that we examine every node ‘a’ which has any sort of relationship with any other node (i.e., follower or member of a group). In line #3, we constrain ‘a’ to be either a VM or a host. In line #4, we retain the count of such relationships, and in lines #5, #6, and #7, we show the top 5 such hosts or VMs.

3. Are there any datastores with a disproportionately large number of VMDKs?

Here, we consider datastores with more than 500 VMDKs to see if any datastore is unusually highly loaded relative to others. The code for the query is as follows:

#1 start a=node(*)
#2 match a-[:member]-b
#3 where b.type = "VMDK" and a.type = "Datastore"
#4 with a, count(b) as fcount
#5 where fcount > 500
#6 return a, fcount
#7 order by fcount

In this query, lines #1 and #2 indicate that we examine every node ‘a’ where ‘a’ is a member of ‘b’ or ‘b’ is a member of ‘a’ (‘-’ without an arrowhead matches a relationship in either direction). In line #3, we constrain ‘a’ to be a datastore and ‘b’ to be a VMDK, since VMDKs are members of datastore groups. Finally, lines #4 through #7 return a sorted list of datastores and how many VMDKs they have. As an optimization, since the only group that a VMDK belongs to is a datastore, we actually do not need ‘a.type = "Datastore"’.

One might alternately ask if there are any datastores with a disproportionately large number of VMs. The query is similar.

4.2 Day-to-day operations

Our graph approach can also be used for day-to-day operations. We give a few examples in this section.

4.2.1 Assessing impact of downtime

Suppose an administrator wants to move VMs from one network to another. The administrator would like to know which users would be affected by this. Because of the ‘user’ type in our graph, this is trivial to determine. The query looks like this:

#1 start a=node:names(uuid="X")
#2 match a-[:member]-c-[:following]-d
#3 where c.type = "VM" and d.type = "User"
#4 return d.name

In line #1, we find the node whose UUID is ‘X’, where X is the network that we care about. In line #2, we then look for all nodes ‘c’ that are a member of this network ‘a’ and also have a follower ‘d’. In line #3, we further specify that ‘c’ is a VM and ‘d’ is a user. Line #4 returns the names of such users.
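The two-hop pattern in this query (network to member VMs, then VMs to following users) can be sketched in Python as follows. The names and edges are invented for illustration; note that a host that follows a VM is excluded by the type check, just as in line #3 of the query.

```python
# Toy graph; all names and edges are illustrative assumptions.
node_type = {"Net-X": "Network", "V1": "VM", "V2": "VM", "H1": "host",
             "alice": "User", "bob": "User"}
member = [("V1", "Net-X"), ("V2", "Net-X"), ("H1", "Net-X")]
following = [("alice", "V1"), ("bob", "V2"), ("alice", "V2"), ("H1", "V1")]


def affected_users(network):
    """Users who follow at least one VM that is a member of the network."""
    vms = {m for m, g in member
           if g == network and node_type[m] == "VM"}
    return sorted({f for f, t in following
                   if t in vms and node_type[f] == "User"})


print(affected_users("Net-X"))
```

Collecting the users in a set means a user who follows several VMs on the same network is reported only once.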

4.2.2 Disaster recovery planning

Another example is disaster recovery planning. It would be helpful to quickly assess how many VMs would be affected if a certain datastore/network combination went down (perhaps it is a converged fabric). Again, such questions can be answered using the VIM API, but involve somewhat onerous coding.

Here is one way we can perform such a query using Neo4j:

#1 start
#2 a=node:names(uuid="X"),b=node:names(uuid="Y")
#3 match a-[:member]-c
#4 where b-[:member]-c and c.type = "VM"
#5 return c

Line #1 starts the query. In line #2, we start with nodes ‘a’ and ‘b’ whose UUIDs are X and Y, representing the network and datastore that we care about. In line #3, we then find all nodes ‘c’ with a ‘member’ relationship to ‘a’. These could be hosts or VMs. In line #4, we take these nodes ‘c’ and prune them by determining if they have a member relationship with ‘b’ and are of type ‘VM’. These are the VMs that are members of network X and datastore Y.

The prior example assumes that a VM is a direct member of a datastore group. However, we might imagine instead making VMs follow VMDKs, and making VMDKs the only members of datastore groups. In that case, we could find the same information as in the previous query using the following code:

#1 start
#2 a=node:names(uuid="X"), b=node:names(uuid="Y")
#3 match
#4 a-[:member]-c, c-[:following]-d-[:member]-b
#5 where c.type = "VM" and d.type = "VMDK"
#6 return distinct(c)

Here, we are finding nodes ‘c’ that are following node ‘d’, where node ‘c’ is a VM, node ‘d’ is a VMDK, and node ‘d’ has a member relationship with ‘b’. Because a VM may have multiple VMDKs, we use ‘distinct’ in line #6. 
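To check that the two modeling choices yield the same answer, the following Python sketch encodes the same small inventory both ways. All names are invented for illustration, and the type checks from the Cypher queries are omitted because only VMs are network members in this toy data.

```python
# Model 1: VMs are direct members of both network and datastore groups.
member_1 = {("V1", "Net-X"), ("V2", "Net-X"), ("V3", "Net-Z"),
            ("V1", "DS-Y"), ("V3", "DS-Y")}

# Model 2: VMs follow VMDKs, and only VMDKs are datastore members.
member_2 = {("V1", "Net-X"), ("V2", "Net-X"), ("V3", "Net-Z"),
            ("d1a", "DS-Y"), ("d1b", "DS-Y"), ("d2", "DS-W"), ("d3", "DS-Y")}
following_2 = {("V1", "d1a"), ("V1", "d1b"), ("V2", "d2"), ("V3", "d3")}


def direct(network, datastore):
    """Intersect the members of the network and datastore groups."""
    on_net = {m for m, g in member_1 if g == network}
    on_ds = {m for m, g in member_1 if g == datastore}
    return on_net & on_ds


def via_vmdk(network, datastore):
    """Reach the datastore through the VMDKs that each VM follows."""
    on_net = {m for m, g in member_2 if g == network}
    disks = {m for m, g in member_2 if g == datastore}
    # A set plays the role of 'distinct': V1 appears once despite two VMDKs.
    return {vm for vm, disk in following_2 if disk in disks and vm in on_net}


print(direct("Net-X", "DS-Y"), via_vmdk("Net-X", "DS-Y"))
```

The second model costs one extra hop per query but lets the same graph answer disk-level questions, such as the linked-clone policy in the next section.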

4.2.3 Enforcing policies: Linked clones

A final example involves linked clones. We create a user for each VMDK in an infrastructure, and VMs that use a given VMDK are created as followers of that VMDK. While it may seem a bit non-intuitive to treat VMDKs as users, we can use this to determine whether a VMDK is a base disk shared by linked clones, because such a VMDK would have multiple followers. Suppose for performance reasons an IT administrator would like to enforce a policy in which a base disk should be shared by no more than 8 linked clones. The following query finds base disks that violate this policy:

#1 start
#2 a=node(*)
#3 match a-[:following]-c
#4 where a.type = "VM" and c.type = "VMDK"
#5 with c, count(a) as fcount
#6 where fcount > 8
#7 return c, fcount
#8 order by fcount

5. Related Work

This work heavily leverages the initial ideas presented in [5]. The initial proposal in [5] involved utilizing concepts of social media (including ‘likes’ and ‘follows’) to reduce information overload and provide an intuitive interface for monitoring a virtualization infrastructure. In addition, the initial proposal also made brief mention of using connectivity. In this work, we take a step back and consider the issue of connectivity in more detail. We take a broader look at the kinds of questions that can be answered using connections and use a graph database populated by virtualization entities and relationships as both the storage medium and query engine.

Some of the questions we pose are related to connections between VMs, datastores, and networks. The network and storage fabrics in upcoming releases of the VMware vCloud® Suite 6 [9] will likely need to perform a similar task, requiring complex constraint solvers in order to provision VMs according to storage and network profiles. At present, the questions we address are more related to the connections themselves rather than the capabilities of the components. For example, we are less concerned with the actual IOPS provided by a “Gold” datastore, and are simply concerned with whether two VMs are both connected to the same datastore. We are basically considering the infrastructure itself as information, not the properties themselves. In addition, many of our constraints are simple equalities (e.g., can X talk to Y?).

Management tools like vCenter Operations Manager™ [17] (and indeed other management software [21][22]) contain a wealth of data about the connections between entities. However, to the best of our knowledge, these tools do not explicitly expose an API for determining these connections. In addition, these tools do not currently allow the same type of “what-if” queries that we propose in this paper, though this would be a straightforward extension of the current functionality. Graph database developers have mentioned that graph databases could be applicable to datacenter and network management [23], but we are unaware of any product that explicitly provides this functionality.

While it is not currently exposing its underlying graph database, Socialcast is now conforming [19] to the Open Graph protocol [18] used by Facebook graph search. In this protocol, user data is annotated with metadata that is accessible via a strongly typed API. The default metadata includes an actor, an object, an app, and an action. For example, an actor might be “User XYZ”, the object might be the book “War and Peace”, the app might be the “Goodreads” application [20], and the action might be “finished reading”, with the story being “User XYZ finished reading War and Peace on Goodreads.” As the example illustrates, this protocol is unrelated to the structure of the connectivity, and is instead useful for telling a story. We can, however, speculate about combining our work with the Open Graph protocol. Rather than telling a conventional user story, we could extend this API and associate data with individual virtual machines, and the story could represent the lifecycle of the VM. For example, one story could be “VM X was created by extension Y and deployed on vCenter Z.”

Various companies have used Socialcast as a way to aggregate different message streams [6]. These proposals, however, have not sought to analyze the connectivity graph or to build queries that leverage it.

6. Conclusions and Future Work

A vSphere installation contains a treasure trove of data, but the barrier to accessing and using this data is very high. In this work, we propose extending the social media metaphors proposed in [5] to provide simple, intuitive access to connectivity data in vSphere. Specifically, we propose reorganizing some of the data in a virtualization inventory into a graph database in which relationships are modeled after social media relationships. This simple transformation of the data enables us to solve a number of virtualization administrator pain points. Examples include determining weak spots in an infrastructure (such as single points of failure) and identifying precarious configurations (such as a single VM with many dependencies across multiple geographies). We describe our prototype implementation using the Neo4j graph database and present some preliminary results. We would like to emphasize that although all of the data we collect and store is currently available in the various databases in a virtualized infrastructure, none of these databases is specifically tuned for connectivity questions, even though these are among the more common questions that administrators would like to answer.

So far, we have only skimmed the surface in terms of the types of data we can leverage. For example, we could re-examine the data in Socialcast and ask questions like “which host has had the most comments in the last hour?” If we employ the Neo4j UI, which shows nodes as circles, we could size the circles according to the number of comments. For nodes that are hosts, we could collect VMware vSphere warning messages per ESX host and graphically depict hosts with more messages as larger circles. We could also ask questions like “find me a host with 2GB free and attached to network X.” As mentioned earlier, some of these questions are presumably relevant to the storage and network fabrics in vCloud Suite, although the data and relationship information reside in Neo4j rather than within the current vCloud Suite databases.
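As a rough illustration of these two ideas, the Python sketch below filters hosts by free memory and network attachment, and derives a circle size from a per-host message count. All names here (Host, find_hosts, circle_radius, and the sample hosts) are hypothetical, and the linear sizing rule is simply one possible choice; a real implementation would obtain these values from vSphere and Socialcast.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical host record; the fields are assumptions for illustration."""
    name: str
    free_memory_gb: float
    networks: set = field(default_factory=set)
    message_count: int = 0  # e.g., comments or warnings in the last hour


def find_hosts(hosts, min_free_gb, network):
    """Names of hosts with at least min_free_gb free and attached to network."""
    return [
        h.name for h in hosts
        if h.free_memory_gb >= min_free_gb and network in h.networks
    ]


def circle_radius(host, base=10.0, per_message=2.0):
    """Grow a node's circle linearly with its message count."""
    return base + per_message * host.message_count


hosts = [
    Host("esx-1", 4.0, {"net-x", "net-y"}, message_count=3),
    Host("esx-2", 1.0, {"net-x"}, message_count=0),
    Host("esx-3", 8.0, {"net-y"}, message_count=7),
]
```

With this sample data, find_hosts(hosts, 2.0, "net-x") selects only esx-1, and esx-3 receives the largest circle because it has the most messages.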

In our work, we have leveraged the fact that a social network has a very simple model for connectivity: users follow or are followed, and users are members of groups. Our queries use type information for entities (e.g., VM, host, datastore group, or human) as inputs, but because of the simple model, the queries can easily be generalized. For example, a query that finds all VMs in a datastore can trivially be tailored to find all VMs in a network. Fairly simple joins are also possible: we have seen, for example, that a query for finding all VMs that share a certain network or datastore is easily expressed in our simple graph model. Going forward, we hope to add more dynamic relationships: for example, we could use Application Discovery Manager [11] or another tool to determine which VMs talk to one another. We already have prototype code to map vCenters to other elements of the vCloud Suite like vCloud Director® [8] (VCD) and to map network elements like vShield [12] Edge VMs to their network managers (vShield Manager). We can extend this work to encompass all entities in the Cloud Infrastructure Suite. Moreover, the compactness of our language and the close logical match between the query and the flow of the generated code suggest that a Facebook-graph-search interface may be possible for asking the kinds of infrastructure questions we have posed. The ability to simply ‘Facebook-search’ a question like “Are any of my datastores overloaded with too many VMs?” would be extremely helpful to customers.
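The generality that follows from the simple membership model can be sketched in Python as follows. This in-memory SocialGraph class is a hypothetical stand-in for the graph database, not the prototype's implementation, and it assumes that VMs are modeled as members of datastore and network groups. The same members_of call answers both the datastore and the network question, and crowded_groups approximates the "overloaded datastore" question with an arbitrary threshold.

```python
from collections import defaultdict


class SocialGraph:
    """Toy model: typed entities that are members of groups."""

    def __init__(self):
        self.kind = {}                   # entity name -> type label
        self.members = defaultdict(set)  # group name -> member names

    def add(self, name, kind):
        self.kind[name] = kind

    def join(self, member, group):
        self.members[group].add(member)

    def members_of(self, group, kind):
        """All members of `group` with the given type, sorted by name."""
        return sorted(m for m in self.members[group] if self.kind[m] == kind)

    def crowded_groups(self, group_kind, member_kind, limit):
        """Groups of `group_kind` holding more than `limit` typed members."""
        return sorted(
            g for g, k in self.kind.items()
            if k == group_kind and len(self.members_of(g, member_kind)) > limit
        )


# Illustrative sample data (hypothetical names).
g = SocialGraph()
for name, kind in [("vm-1", "VM"), ("vm-2", "VM"), ("vm-3", "VM"),
                   ("ds-1", "Datastore"), ("ds-2", "Datastore"),
                   ("net-x", "Network")]:
    g.add(name, kind)
for member, group in [("vm-1", "ds-1"), ("vm-2", "ds-1"), ("vm-3", "ds-2"),
                      ("vm-1", "net-x"), ("vm-3", "net-x")]:
    g.join(member, group)
```

Calling g.members_of("ds-1", "VM") and g.members_of("net-x", "VM") differ only in the group name, which mirrors how a datastore query is tailored into a network query. The threshold passed to crowded_groups is an assumption; what counts as "too many VMs" would depend on the deployment.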

References

  1. Google. Google Plus. https://plus.google.com
  2. Graph Databases. http://en.wikipedia.org/wiki/Graph_database
  3. Facebook. Graph Search. http://www.facebook.com/about/graphsearch
  4. Neo4j. http://www.neo4j.org
  5. Soundararajan, R., et al. A Social Media Approach to Virtualization Management. VMware Technical Journal, November 2012.
  6. Socialcast. Making Sharepoint Social: Integrating Socialcast and SharePoint Using Reach and API. http://blog.socialcast.com/making-sharepoint-social-integrating-socialcast-and-sharepoint-using-reach-and-api
  7. Socialcast. Socialcast Developer API. http://www.socialcast.com/resources/api.html
  8. VMware. vCloud Director. http://www.vmware.com/products/vcloud-director/overview.html
  9. VMware. vCloud Suite. http://www.vmware.com/products/datacenter-virtualization/vcloud-suite/overview.html
  10. VMware. vCenter Operations Manager. http://www.vmware.com/products/datacenter-virtualization/vcenter-operations-management/overview.html
  11. VMware. VMware vCenter Application Discovery Manager. http://www.vmware.com/products/application-discovery-manager/overview.html
  12. VMware. VMware vShield. http://www.vmware.com/products/vshield/overview.html
  13. VMware. VMware vSphere. http://www.vmware.com/products/datacenter-virtualization/vsphere/overview.html
  14. VMware. vSphere API Reference Documentation. https://www.vmware.com/support/developer/vc-sdk/visdk41pubs/ApiReference/index.html
  15. Merenyi, R. What are the differences between relational and graph databases? http://www.seguetech.com/blog/2013/02/04/what-are-the-differences-between-relational-and-graph-databases
  16. Socialcast Developer Resources. Deploying Socialcast. https://scdevresource.pbworks.com/w/page/3608421/Deploying%20Socialcast
  17. VMware. vCenter Operations Manager. http://www.vmware.com/products/datacenter-virtualization/vcenter-operations-management/overview.html
  18. Facebook. Open Graph Protocol. https://developers.facebook.com/docs/opengraph
  19. Socialcast is now a consumer of Open Graph Protocol. https://groups.google.com/forum/#!topic/open-graph-protocol/d_NNHb5h1AM
  20. Goodreads. http://www.goodreads.com
  21. IBM. IBM Tivoli Software. http://www-01.ibm.com/software/tivoli
  22. Hewlett-Packard. HP Insight. http://h18013.www1.hp.com/products/servers/management/index.html
  23. Eifrem, E. Top 5 Reasons to Get Your Graph On. http://siliconangle.com/blog/2013/01/22/top-5-reasons-to-get-your-graph-on/
  24. Py2neo. http://py2neo.org