Introducing On-demand in LCRC: Towards a Convergence of On-demand and Batch Resource Allocation

Event Sponsor: CloudX Seminar
Start Date: Aug 30 2016 - 12:00pm
Building 240/Room 4301
Argonne National Laboratory
Francis Liu

The LCRC Pilot Project explores how to bring together on-demand availability and environment management on one side and batch scheduling on the other. The project seeks to develop methods that combine on-demand access, currently requested by our APS users, with support for batch computing, currently the mode of resource management available in LCRC.

Our proposed architecture dynamically assigns the nodes in the cluster to one of two pools at any given time: an on-demand pool and a batch (on-availability) pool. A balancing mechanism moves nodes from one pool to the other to maximize on-demand availability and resource utilization and to reduce wait time for batch jobs.
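
To make the two-pool idea concrete, here is a minimal sketch of one possible rebalancing step. It is not the project's Balancer: the policy (keep a fixed reserve of idle on-demand nodes, lend any surplus to batch, reclaim idle batch nodes when the reserve runs short) and every name in it (`Cluster`, `rebalance`, `reserve`, `batch_nodes_wanted`) are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical view of the idle nodes in each pool."""
    on_demand_idle: set = field(default_factory=set)
    batch_idle: set = field(default_factory=set)


def rebalance(cluster, reserve, batch_nodes_wanted):
    """Move idle nodes between pools and return the moves made.

    reserve: idle on-demand nodes to keep available (assumed policy knob).
    batch_nodes_wanted: nodes requested by waiting batch jobs (assumed input).
    """
    moves = []

    # Reclaim idle batch nodes while the on-demand reserve is short.
    while len(cluster.on_demand_idle) < reserve and cluster.batch_idle:
        node = min(cluster.batch_idle)  # lowest id first, for determinism
        cluster.batch_idle.remove(node)
        cluster.on_demand_idle.add(node)
        moves.append((node, "batch", "on_demand"))

    # Lend surplus idle on-demand nodes to batch if jobs are waiting.
    surplus = len(cluster.on_demand_idle) - reserve
    shortfall = batch_nodes_wanted - len(cluster.batch_idle)
    for _ in range(max(0, min(surplus, shortfall))):
        node = min(cluster.on_demand_idle)
        cluster.on_demand_idle.remove(node)
        cluster.batch_idle.add(node)
        moves.append((node, "on_demand", "batch"))

    return moves
```

The sketch only moves idle nodes; a real mechanism would also have to decide how to treat nodes that are running batch jobs when on-demand requests arrive, which is not covered here.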

The talk will describe an evaluation of the Balancer architecture developed by the project, based on real APS and LCRC workload traces from the past two years. Results show that our system can maintain high utilization and reduce batch job slowdown by ~50% while still maintaining the SLA for on-demand users.

Miscellaneous Information: 

About CloudX: CloudX is a discussion group devoted to exploring synergies between activities within the lab relating to a group of recent innovations, currently known as cloud computing. You can join the mailing list at: