msr-safe, libmsr and the Variorum project: Hoisting low-level processor features into userspace

Intel processors have a wealth of features for power and energy measurement and control, cache allocation and performance telemetry. Accessing these features requires "ring 0" access, which in practice means working within the operating system kernel. Performing kernel-level research on production supercomputers tends to be slow at best, with the review process for new kernel code being understandably conservative.

My team at LLNL has addressed this problem by creating the msr-safe Linux kernel module. msr-safe allows users to program individual model-specific registers (which control the majority of the features we're interested in), while allowing system administrators to provide group-level, bitwise whitelist control over which registers and portions of registers are exposed. This approach has allowed us to minimize the amount of code running in the kernel and has moved the security evaluation to the question of whether specific users may be trusted with specific capabilities. By doing so, we have been able to do cutting-edge systems research well before the official Linux kernel provided "approved" device drivers, and this in turn has allowed us to influence the direction Intel is taking with future architectures. libmsr provides a "friendly (well, friendlier)" interface to the most commonly used processor features. The Variorum project, funded in part by a recent TechBase round at LLNL, expands our approach to other architectures (Power9, NVIDIA, ARM, Xilinx) as well as other protocols (particularly IPMI and CSRs/PCI).
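To make the idea of bitwise whitelist control concrete, here is a minimal Python sketch of how a per-register write mask could be applied. It is an illustration only, not msr-safe's actual code: the `WHITELIST` table, the `masked_write` function, the mask values, and the read-modify-write semantics are all assumptions made for this example. The address 0x610 is the Intel package power limit register, but the mask shown for it is invented.

```python
# Hypothetical whitelist: MSR address -> mask of bits a user may write.
# The mask values are invented for illustration; a real whitelist is
# supplied by the system administrator.
WHITELIST = {
    0x610: 0x0000_0000_00FF_FFFF,  # low 24 bits writable (illustrative)
    0x611: 0x0000_0000_0000_0000,  # listed, but no bits are writable
}

MASK_64 = 0xFFFF_FFFF_FFFF_FFFF  # MSRs are 64 bits wide


def masked_write(address, current, requested, whitelist=WHITELIST):
    """Return the value that would be stored after a masked write.

    Bits set in the register's mask are taken from `requested`;
    all other bits keep their `current` value. Registers that are
    not listed at all are rejected.
    """
    if address not in whitelist:
        raise PermissionError(f"MSR {address:#x} is not whitelisted")
    mask = whitelist[address]
    return ((current & ~mask) | (requested & mask)) & MASK_64
```

With this table, a user's write to 0x610 can only change the low 24 bits, a write to 0x611 leaves the register unchanged, and any register missing from the table is refused outright. The point of the design is that the policy lives in a small data table that an administrator can review, rather than in new kernel code.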

The talk will cover a handful of otherwise-inaccessible processor features and how we've leveraged them. I will also touch on practical issues of doing systems research on production machines, security issues, feature documentation issues, and how msr-safe has deepened our vendor collaboration and commitments. Feature requests are welcome, as are patches.

Argonne Leadership Computing Facility

10/18/2018, 6am CT