Scalable High-Throughput Analysis of APS Image Datasets on ALCF Systems

Shilpika Fnu
Seminar

High-throughput image analysis is critical for experimental science facilities: it yields timely insight into experiments and a better understanding of the physical phenomena being imaged. We present the design and evaluation of a bank of filters, the core building blocks of image analysis. We identify an optimal set of filter banks and support switching between different filter types, which can be used for scientific evaluation and analysis in many imaging applications. Toward realizing this vision, we analyze x-ray video images of fuel spray from different combustion engines, imaged under varying fuel quantity, composition, and nozzle pressure. Of particular interest is the morphology of the fuel droplets within the spray: their size and scale vary along and across the spray pattern. Quantitative data extracted from these images can inform computational models used to understand and improve engine performance. We describe our infrastructure, developed with Apache Spark. We scale the analysis workflow to 800 cores of the Cray Urika-GX system and the Cooley system for the combustion engine dataset imaged at the Advanced Photon Source at Argonne National Laboratory, and we observe significant speedups. This scalable infrastructure opens the door to applying a wide range of image processing algorithms and filters to the large-scale datasets being imaged at various light sources.