Project Highlights
Exploring Software-Based Parallel Volume Rendering on the IBM Blue Gene/P
Image (generated by Tom Peterka and Rob Ross of Argonne National Laboratory and Hongfeng Yu and Kwan-Liu Ma of the University of California at Davis) shows the angular momentum at time step 1492 of a simulation of a core-collapse supernova. Dataset courtesy of John Blondin of North Carolina State University and Anthony Mezzacappa of Oak Ridge National Laboratory. This research used resources of the Argonne Leadership Computing Facility at Argonne National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under contract DE-AC02-06CH11357.
As data sizes and supercomputer architectures grow toward petascale and beyond, software-based visualization performed directly on parallel supercomputers offers an attractive alternative to rendering on graphics clusters. Benefits include the elimination of data movement between computation and visualization architectures; the economies of large-scale, tightly coupled parallelism; and the possibility of visualizing a simulation in situ. This direct visualization can be accomplished with volume rendering, a common general-purpose technique for visualizing scientific data sets.
Approach
In collaboration with the U.S. Department of Energy's SciDAC Institute for Ultrascale Visualization, researchers at Argonne National Laboratory are exploring the feasibility of software-based volume rendering at massively parallel scales. For the first time, the researchers implemented a parallel ray-casting volume rendering algorithm on the 557-teraflop IBM Blue Gene/P (BG/P) at the Argonne Leadership Computing Facility (ALCF) and demonstrated its scalability to more than 30,000 cores. In this initiative, they used the ALCF’s Intrepid and Surveyor BG/P systems to analyze astrophysics data from the simulation of a supernova core collapse. The dataset used (30 time steps from a supernova simulation of 200 time steps) was provided by scientists at Oak Ridge National Laboratory (ORNL) and North Carolina State University (NCSU).
The algorithm is written using MPI (Message Passing Interface) for both communication and collective I/O and is based on a sort-last parallelism approach, in which each process renders its own portion of the data and the resulting partial images are then composited into the final image. A set of experiments was run under a range of conditions, varying the dataset size, the number of processors, low- and high-quality rendering, offline storage of results, and streaming of images for remote display. The image size was chosen so that the number of pixels in the image scaled with the number of voxels in the volume.
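The compositing step of a sort-last renderer can be sketched in a few lines. This is a minimal, single-process illustration rather than the researchers' MPI implementation: the function names `over` and `composite`, the premultiplied-RGBA pixel layout, and the use of one depth value per partial image are all assumptions made for the example.

```python
from functools import reduce


def over(front, back):
    """Composite two premultiplied RGBA pixels, front over back."""
    fr, fg, fb, fa = front
    br, bg, bb, ba = back
    t = 1.0 - fa  # fraction of the back pixel that shows through
    return (fr + t * br, fg + t * bg, fb + t * bb, fa + t * ba)


def composite(partials):
    """Combine per-process partial images into one final image.

    partials: list of (depth, image) pairs. Each image is a list of
    premultiplied RGBA tuples (hypothetical layout); a smaller depth
    means the rendered block is nearer the viewer.
    """
    # Sort the partial images front to back, then fold each pixel column.
    ordered = [img for _, img in sorted(partials, key=lambda p: p[0])]
    return [reduce(over, pixels) for pixels in zip(*ordered)]


# One-pixel example: half-opaque red in front of opaque blue.
near = (0.5, 0.0, 0.0, 0.5)
far = (0.0, 0.0, 1.0, 1.0)
result = composite([(2.0, [far]), (1.0, [near])])
```

Here `result` is `[(0.5, 0.0, 0.5, 1.0)]`. In a parallel setting each partial image would live on a different process and the fold would be carried out with message passing, which is why compositing cost grows with the number of cores.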
By examining the relative costs of the I/O, rendering, and compositing portions of the algorithm, the researchers discovered room for improvement in data input, load balancing, memory usage, image compositing, and image output. They addressed the bottlenecks they encountered by making four improvements to the basic algorithm. First, they demonstrated the benefit of an alternative rendering distribution scheme that improves load balance. Second, they showed how to scale memory usage so that large data and image sizes do not overload system memory. Third, to improve compositing, they experimented with a hybrid multithreaded MPI programming model. Fourth, to mitigate the high cost of I/O, they implemented multiple parallel pipelines that partially hide the I/O cost when rendering many time steps.
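To see why the way blocks are distributed affects load balance, consider the sketch below. The text does not describe the alternative scheme, so the round-robin assignment shown here is an assumption chosen for illustration, and the block costs are synthetic numbers, not measurements. The helper names `contiguous`, `round_robin`, and `max_load` are hypothetical.

```python
def contiguous(num_blocks, num_procs):
    """Give each process one consecutive run of blocks."""
    per_proc = -(-num_blocks // num_procs)  # ceiling division
    return [b // per_proc for b in range(num_blocks)]


def round_robin(num_blocks, num_procs):
    """Deal blocks out to processes in turn (assumed scheme)."""
    return [b % num_procs for b in range(num_blocks)]


def max_load(costs, owners, num_procs):
    """Return the largest total cost assigned to any one process."""
    loads = [0.0] * num_procs
    for cost, owner in zip(costs, owners):
        loads[owner] += cost
    return max(loads)


# Synthetic costs: cheap, mostly empty blocks followed by expensive, dense ones.
costs = [1, 1, 1, 1, 9, 9, 9, 9]
procs = 4
worst_contiguous = max_load(costs, contiguous(len(costs), procs), procs)
worst_round_robin = max_load(costs, round_robin(len(costs), procs), procs)
```

With these synthetic costs, the slowest process carries a load of 18.0 under the contiguous assignment and 10.0 under round-robin. Because every process must finish before compositing can complete, the maximum load, not the average, determines the rendering time.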
Results/Accomplishments
The researchers successfully demonstrated strong scaling, large data and image sizes, improved compositing, improved and benchmarked I/O, load balancing, memory scalability, a hybrid programming model, and parallel pipelines.
“We were able to scale up to large problem sizes of over 80 billion voxels per time step and generated images up to 16 megapixels,” notes Tom Peterka, postdoctoral appointee in the Mathematics and Computer Science Division at Argonne.
Measuring the benefits of these techniques at scale reinforces the conclusion that the Blue Gene/P is an effective platform for volume rendering of large datasets and that the researchers’ volume rendering algorithm scales to both large problem and system sizes.
Future Efforts
Researchers are continuing to scale up to larger datasets and more cores. In doing so, they are encountering new challenges, such as the need to rewrite the compositing part of the algorithm. More efficient compositing is a priority for successful operation at tens of thousands of cores.
In addition, they are focusing on improving aggregate I/O bandwidth and I/O scalability, which directly affect the application’s performance. To that end, they are investigating different I/O file formats.
The researchers also are exploring the possibilities of other visualization algorithms, as well as the potential for coupling these large-scale visualizations with interactive desktop environments.
The researchers welcome involvement from scientists who could provide input on their large-scale visualization and analysis work. If interested, please e-mail one of the contacts shown below.
Contacts
Tom Peterka
Rob Ross
Mark Hereld

