Scalable Yet Rigorous Floating-Point Analysis with Application to Error Detection

Arnab Das and Ganesh Gopalakrishnan, University of Utah

As HPC applications have become mainstream in both industry and academia, it is frequently acknowledged that building and programming HPC systems will become more challenging as those systems grow more complex and heterogeneous. The inherent complexity of these applications raises the question of how to suitably specify correctness requirements, given the wide array of ways in which inaccuracies can be introduced. Additionally, HPC systems are primarily data driven, so a significant amount of time is spent either in extensive numerical computation or in data movement. The round-off error introduced by every floating-point operation therefore becomes a critical component of correctness specification at such massive scale. While a slew of research has been directed towards obtaining rigorous round-off error bounds, these approaches have not scaled beyond a few dozen operators, which is too few to apply them effectively to even the smaller kernels of HPC sub-blocks. Furthermore, adoption of system resilience solutions has been severely hampered by performance impacts and high false positive rates, which further aggravate the problem of judiciously identifying error sources.

In this talk, we discuss a scalable yet rigorous technique for analyzing floating-point applications, and we show how this technique can be made effective for synthesizing error detection strategies. In particular, we point out how soft-error detection methods can also help guard against incorrect polyhedral compilations, which may likewise create aberrant values. Our methodology improves over the current state of the art by four orders of magnitude. Next, we introduce a novel synthesis technique that enables analysis of code with branch conditions. This extends our rigorous analysis to the conditional code prevalent in many geometric libraries and control software. Lastly, we show how such analysis techniques can be geared towards application-specific error detector synthesis. To this end, we exploit the floating-point behavior of applications (in our case, stencils) to efficiently synthesize detectors that are optimally placed for detecting logical and soft errors with robust precision guarantees.

Please use this link to attend the virtual seminar.

https://bluejeans.com/128275691