DRS Doctor is a command-line tool for diagnosing DRS behavior in VMware vCenter clusters. When run against a DRS-enabled cluster, it records the state of the cluster, the workload distribution, DRS moves, and related information in an easy-to-read log format.
The goal of DRS Doctor is to give VI admins better insight into DRS and the actions it performs. It is useful for analyzing DRS actions and troubleshooting issues with very little overhead. It also gives support engineers an easy way to examine customer environments without relying on developers to debug DrmDump logs when troubleshooting simple DRS issues.
DRS Doctor connects to the vCenter server and tracks the list of cluster-related tasks and actions. It also tracks the DRS recommendations generated and the reason for each recommendation, information that is currently available only in a hard-to-read format in DrmDump files. At the end of each log, it dumps the host and VM resource consumption data to give a quick overview of cluster state, followed by an operational audit.
Prerequisites for Installation:
- Requires Python 2.7.6 or higher
- Requires Python modules “pyyaml” and “pyvmomi”
Note: The VMware vSphere API Python Bindings can be found here: https://github.com/vmware/pyvmomi
For Python versions earlier than 2.7.9, the pyVmomi version should be 5.5 (pip install pyvmomi==5.5.0.2014.1.1). With Python 2.7.9 or later, pyVmomi 6.0 can be used.
For certificate validation (SSL) support, Python 2.7.9 or above and pyVmomi 6.0 are required.
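The version rules above can be summarized in a small helper. This is only a sketch of the stated requirements, not part of DRS Doctor; the function name `pyvmomi_guidance` is hypothetical, and it assumes any interpreter at or above 2.7.9 falls into the "pyVmomi 6.0" case.

```python
import sys


def pyvmomi_guidance(version_info=None):
    """Return (pyvmomi_series, cert_validation_supported) for a Python version.

    Hypothetical helper that encodes the prerequisites listed above.
    """
    version = tuple((version_info or sys.version_info)[:3])
    if version < (2, 7, 6):
        raise RuntimeError("DRS Doctor requires Python 2.7.6 or higher")
    if version < (2, 7, 9):
        # Older interpreters: stay on pyVmomi 5.5, no certificate validation.
        return ("5.5", False)
    # Python 2.7.9+: pyVmomi 6.0 can be used, which SSL validation also needs.
    return ("6.0", True)


if __name__ == "__main__":
    series, ssl_ok = pyvmomi_guidance()
    print("pyVmomi series: %s, certificate validation: %s" % (series, ssl_ok))
```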
DRS must be in partially automated mode in order for DRS Doctor to work. If your cluster is in fully automated mode, DRS Doctor will automatically change the mode to partially automated and apply the load balancing recommendations based on the configured threshold, so the cluster behaves just as it would in fully automated mode. Note: After you close DRS Doctor, you need to make sure that the DRS automation setting is reverted to fully automated mode.
Try to run DRS Doctor in close network proximity to your vCenter Server.
DRS Doctor is platform independent and can run on any OS with a Python environment.
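One way to revert the automation level afterwards is through the vSphere API. The sketch below is an assumption about how this could be done with pyVmomi and is not taken from DRS Doctor itself; `restore_fully_automated` is a hypothetical name, and the `vim` module is passed in as a parameter so the snippet can be loaded without pyVmomi installed.

```python
def restore_fully_automated(cluster, vim):
    """Switch a cluster's default DRS behavior back to fullyAutomated.

    cluster: a vim.ClusterComputeResource from a live vCenter connection.
    vim:     the pyVmomi `vim` module (from pyVmomi import vim).
    Returns the reconfigure task, or None when no change is needed.
    """
    behavior = vim.cluster.DrsConfigInfo.DrsBehavior
    current = cluster.configurationEx.drsConfig.defaultVmBehavior
    if current == behavior.fullyAutomated:
        return None
    drs_config = vim.cluster.DrsConfigInfo(
        defaultVmBehavior=behavior.fullyAutomated)
    spec = vim.cluster.ConfigSpecEx(drsConfig=drs_config)
    # modify=True applies the spec as an incremental change to the
    # existing cluster configuration instead of replacing it.
    return cluster.ReconfigureComputeResource_Task(spec, modify=True)
```

The same change can also be made by hand in the cluster's DRS settings in the vCenter client.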
Note: For installation steps on CentOS 6.5, please see the linked pdf.
How to Deploy/Use DRS Doctor:
- Install Python 2.7.6 or higher
- Install Python modules “pyyaml” and “pyvmomi”
- Review the ReadMe.txt in the source folder for how to configure and run the tool against your vCenter. The configuration is straightforward.
- If you need to install pyvmomi, you can install it with "pip install pyvmomi"
There is also a supplied document to explain how to install on CentOS 6.5.
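Before running the tool, it can help to confirm that both modules are importable. The following is a minimal sketch, not part of DRS Doctor; `missing_modules` is a hypothetical helper. Note that the pyyaml package is imported as `yaml` and pyvmomi as `pyVmomi`.

```python
import importlib


def missing_modules(names=("yaml", "pyVmomi")):
    """Return the subset of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_modules()
    if absent:
        print("Missing modules: %s" % ", ".join(absent))
    else:
        print("All required modules are installed")
```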
Compatibility with vCenter Versions:
- DRS Doctor is compatible with vCenter versions 5.5 and 6.0
- SSL certificate-based authentication is supported for vCenter 6.0 (see the Prerequisites section)
Frequency and Amount of Data Queried from vCenter:
There are three data collection modules:
- Tasks: Uses vCenter’s push model, where data is pushed to the client when available (very little performance overhead)
- Cluster Properties: Here again DRS Doctor uses the push model, where relevant cluster properties are pushed to the client when available
- VM Performance Statistics: These are collected for all VMs in sync with the default DRS interval (once every 5 minutes)
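The source does not say which API calls implement the push model. In the vSphere API, the usual mechanism is the PropertyCollector's `WaitForUpdatesEx` call, which blocks until a watched property changes. The sketch below assumes that mechanism and assumes a property filter has already been created on the collector; `changed_properties` and `watch` are hypothetical names.

```python
def changed_properties(update_set):
    """Flatten a PropertyCollector UpdateSet into (object, property, value) tuples."""
    changes = []
    for filter_update in (update_set.filterSet or []):
        for object_update in (filter_update.objectSet or []):
            for change in (object_update.changeSet or []):
                changes.append((object_update.obj, change.name, change.val))
    return changes


def watch(property_collector, wait_options, handle, max_rounds=None):
    """Block on WaitForUpdatesEx and pass each batch of changes to `handle`.

    max_rounds limits the number of calls; None means loop forever.
    """
    version = ""  # an empty version requests the full current state first
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        update_set = property_collector.WaitForUpdatesEx(version, wait_options)
        rounds += 1
        if update_set is None:  # the wait timed out with nothing new
            continue
        version = update_set.version
        handle(changed_properties(update_set))
```

Because the client only wakes up when the server reports a change, nothing is polled between updates, which is consistent with the low overhead described above.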
Typical Log Size and Count:
Log sizes depend on the size of the cluster being monitored and are mostly driven by the number of VMs. Expected log sizes are between 300 KB and 10 MB.
Interpreting the Logs:
DRS Doctor maintains an easy-to-read log file format. There is one log file for every 5 minutes (the time interval that corresponds to a DRS run interval). Each log file contains the following information:
- DrmDump data in a readable format, collected in sync with DrmDump logs
- Tasks that ran in vCenter during that interval, and the reason for each task (if initiated by DRS) to help correlate tasks with DRS behavior
- Final audit log summarizing the metrics monitored for the given time interval
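To illustrate the one-file-per-interval layout, the sketch below maps a timestamp to the 5-minute interval that contains it and builds a file name from it. The naming pattern and the `drs_doctor` prefix are hypothetical and are not the tool's actual convention. Since the tool is meant to run on any OS, the name avoids colons, which Windows file names do not allow.

```python
from datetime import datetime

INTERVAL_MINUTES = 5  # matches the DRS run interval described above


def interval_log_name(moment, prefix="drs_doctor"):
    """Return a colon-free log file name for the interval containing `moment`."""
    floored = moment.replace(
        minute=moment.minute - moment.minute % INTERVAL_MINUTES,
        second=0,
        microsecond=0,
    )
    return "%s_%s.log" % (prefix, floored.strftime("%Y-%m-%d_%H-%M"))


if __name__ == "__main__":
    print(interval_log_name(datetime.now()))
```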
How the DRS Doctor Log Output Differs From What We Can See in the vCenter Client UI:
- UI doesn't report the reasons for every DRS move (it only posts the number of migrations initiated by DRS)
- UI doesn't provide a summary of cluster operations in the last DRS interval. The list of operations, along with reasons for DRS moves, can provide useful correlations to help understand DRS moves better
- UI doesn't provide the distribution of host/VM resource entitlement across the cluster. This is needed for estimating VM happiness
- Additionally, the UI does not make it obvious which advanced DRS options are set in a cluster, or which special cluster configurations are enabled. These options and configurations can influence DRS behavior, so we need to be aware of them
Changelog:
- Changed log naming convention to avoid colon (:) in log names
- Fixed a few typos