VMware engineers Alex Depoutovitch, Andrei Warkentin, and Hariharan Subramanian had papers accepted to the Linux symposium. You can read the abstracts here, or download the full documents via the links below.
Abstract: Linux MD software RAID1 is used ubiquitously by end users and corporations, and as a core technology component of other software products and solutions, such as the VMware vSphere Storage Appliance (vSA). MD RAID1 mode provides data persistence and availability in the face of hard drive failures by maintaining two or more copies (mirrors) of the same data. vSA keeps data available even in the event of a failure of other hardware and software components, e.g. a storage adapter, the network, or the entire vSphere server. To recover from a failure, MD has a mechanism for change tracking and mirror synchronization.
However, data synchronization can consume a significant amount of time and resources. In the worst case, when one of the mirrors has to be replaced with a new one, synchronizing the data on a large multi-terabyte disk volume may take up to a few days. During this time, the MD RAID1 volume and the user data it contains are vulnerable to failures, and MD operates below optimal performance. Because disk sizes continue to grow at a much faster pace than disk speeds, this problem will only get worse in the near future.
This paper presents a solution for improving the synchronization of MD RAID1 volumes by leveraging information that file systems already track about disk utilization. We describe and compare three different implementations that tap into the file system and assist the MD RAID1 synchronization algorithm by avoiding copying unused data. With a real-life average disk utilization of 43%, we expect our method to halve the full synchronization time of a typical MD RAID1 volume compared to the existing synchronization mechanism.
Abstract: Conventionally, file systems manage the storage space available to user programs and provide it through the file interface. Information about the physical location of used and unused space is hidden from users. This makes file system free space unavailable to other kernel components in the storage stack, for performance reasons or to avoid layering violations. As a result, file system architects are forced to integrate additional functionality, such as snapshotting and volume management, inside the file system, increasing its complexity.
We propose a simple, easy-to-implement file system interface that allows different software components to efficiently share free storage space with a file system at the block level. We demonstrate the benefits of the new interface by optimizing an existing volume manager to store snapshot data in the file system's free space, instead of requiring that space to be reserved in advance, which makes it unavailable for other uses.
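One way to picture such an interface is a file system that lends blocks from its free pool to another storage-stack component and takes them back later. The class and method names below are illustrative assumptions, not the paper's actual API:

```python
class FreeSpaceProvider:
    """Toy model of a file system exposing its free block pool to
    other storage-stack components, e.g. a volume manager that wants
    to store snapshot data in otherwise-unused blocks."""

    def __init__(self, total_blocks):
        # Initially every block is free.
        self.free = set(range(total_blocks))

    def allocate_blocks(self, count):
        # Hand out free blocks; the file system marks them as in use
        # so it will not reallocate them to files in the meantime.
        if count > len(self.free):
            raise RuntimeError("insufficient free space")
        return [self.free.pop() for _ in range(count)]

    def release_blocks(self, blocks):
        # The borrower returns blocks when the snapshot no longer
        # needs them, making the space available for other uses again.
        self.free.update(blocks)
```

The point of the abstract's design is visible even in this sketch: snapshot space is borrowed on demand rather than reserved up front, so it remains available to files until the moment it is actually needed.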
Abstract: Unix environments have traditionally been multi-user, spanning diverse multi-computer configurations backed by expensive network-attached storage. The recent growth and proliferation of desktop- and single-machine-centric GUI environments, however, has made it very difficult to share a network-mounted home directory across multiple machines. This is especially noticeable in the context of concurrent graphical logins or logins into systems with a different installed software base. The typical offenders are “modern” bits of software such as desktop environments (e.g., GNOME), services (dbus, PulseAudio), and applications (Firefox), all of which abuse dotfiles. Frequent changes to configuration formats mean the same set of configuration files cannot easily be used across even close versions of the same software. And whereas dotfiles historically contained read-once configuration, they are now misused for runtime lock files and writable configuration databases, with no effort to guarantee correctness across concurrent accesses or differently-versioned components. Running such software concurrently, across different machines with a network-mounted home directory, results in corruption, data loss, misbehavior, and deadlock, because the majority of the configuration is system-, machine-, and installation-specific rather than user-specific. This paper explores a simpler alternative to rewriting all existing broken software: implementing separate host-specific profiles via filesystem redirection of dotfile accesses. Several approaches are discussed, and the presented solution, the Host Profile File System, although Linux-centric, can be easily adapted to other similar environments such as OS X, Solaris, and the BSDs.
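The redirection idea can be sketched as a simple path rewrite: accesses to dotfiles in the home directory are diverted to a per-host directory, so each machine sees its own writable copies. The `.hostprofiles` directory name and the function below are assumptions for illustration, not the actual Host Profile File System implementation, which performs this redirection at the file system layer:

```python
import os
import socket

def redirect_dotfile(path, home="/home/user", hostname=None):
    """Map a dotfile path under `home` to a host-specific profile
    directory; leave non-dotfile paths untouched.

    Hypothetical sketch: the real system intercepts accesses in the
    file system rather than rewriting paths in user space.
    """
    hostname = hostname or socket.gethostname()
    rel = os.path.relpath(path, home)
    first = rel.split(os.sep, 1)[0]
    if first.startswith(".") and first not in (".", ".."):
        # Dotfile access: divert into this host's private profile,
        # so concurrent logins on other hosts cannot corrupt it.
        return os.path.join(home, ".hostprofiles", hostname, rel)
    return path
```

Regular user data (documents, media) passes through unchanged, which preserves the benefit of a shared network-mounted home directory while isolating only the machine-specific state.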