Optimizations on GPU

Abstract:
Accelerator-based, data-parallel scientific computing has led to new application-specific and architecture-specific optimizations. One such proposed optimization is ‘kernel coalesce’, which optimizes concurrent kernel execution on the NVIDIA Fermi GPU. A GPU consists of Streaming Multiprocessors (SMs), and each SM has a fixed amount of resources in terms of thread blocks, threads and registers. Each kernel is defined as a grid, and each grid is executed as a set of thread blocks. If one grid occupies all the resources, another grid cannot execute, which serializes kernel execution. The kernel coalesce optimization is proposed to prevent this serialization caused by a lack of resources. Thread-level coalescing partitions the resources among the kernels by modifying their grid and thread block dimensions to enable concurrent execution. Multi-clock-cycle coalescing allows the kernels to share the resources. Warp-interleaving-based coalescing slices the resources to enable concurrent kernel execution. Further, GPUs are not efficient at accessing indirect addresses. Most real-time applications process data in sparse matrix formats, which use indirect addressing. A new format named Bit Level Shift Indexing (BLSI) is proposed to reduce the memory footprint and the number of memory accesses per FLOP on the GPU. A framework named ‘Sparse Matrix AnalyzeR Tool (SMART)’ is proposed to predict an optimal sparse matrix format for SpMV computations on the GPU, based on the statistics of the input sparse matrix and the given architecture.
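
To make two ideas from the abstract concrete, here is a minimal Python sketch; it is not code from the talk. The first function estimates how many thread blocks of a single kernel can be resident on one SM, given per-SM limits on blocks, threads and registers. The default limits are illustrative Fermi-class values assumed for this example, and shared memory is ignored. The second function performs SpMV in the common CSR format (used here only as a familiar baseline, not BLSI) to show where indirect addressing arises: every multiply reads x through a column-index array. All function and parameter names are hypothetical.

```python
def resident_blocks_per_sm(threads_per_block, regs_per_thread,
                           max_blocks=8, max_threads=1536, max_regs=32768):
    """Upper bound on thread blocks of one kernel that fit on one SM.

    The default limits are assumed, illustrative Fermi-class values;
    shared memory is deliberately ignored in this sketch.
    """
    by_threads = max_threads // threads_per_block
    by_regs = max_regs // (regs_per_thread * threads_per_block)
    # The scarcest resource decides how many blocks can be resident.
    return min(max_blocks, by_threads, by_regs)


def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A * x for a sparse matrix A stored in CSR format."""
    num_rows = len(row_ptr) - 1
    y = [0.0] * num_rows
    for row in range(num_rows):
        # Non-zeros of this row live in values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # Indirect access: the index into x is itself loaded from memory.
            y[row] += values[k] * x[col_idx[k]]
    return y
```

Under these assumed limits, a kernel whose blocks are large or register-heavy can leave no room on an SM for blocks of a second kernel, which is the situation the coalescing variants above aim to avoid. In the CSR loop, each floating-point multiply-add needs one load from values, one from col_idx and one from x, which is why a more compact index encoding can reduce memory accesses per FLOP.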

Speaker Biography: Dr. B. Neelima is Professor and Head of the Department of Information Science and Engineering at NMAM Institute of Technology, Nitte, Karnataka, India. She completed her Ph.D. in the area of high performance computing at the National Institute of Technology Karnataka (NITK), Surathkal, Karnataka, India. She has completed an R&D project from the Department of Science and Technology, Government of India. She has published her work in CCPE, SCPE and JPDC, has presented her work at IPDPSW, SC-W, ISPDC and HiPC-SRS, to name a few, and has around 50 publications in various international and national journals and conferences. She has reviewed papers for the CCPE and CLUSTER journals and has been a member of the program and organizing committees of various conferences such as ISPDC-15, ISPDC-16, DISCOVER-16, IC3-17, HiPC and IPDPS-15. She is the coordinator of the GPU Education Centre at NMAMIT, awarded by NVIDIA, USA (July 2012 to December 2016). She is a senior member of IEEE and ACM, a life member of CSI, ISTE and IE-India, and a member of ACM-W and WIE. She is a member of the Executive Committee of the IEEE Mangalore Subsection and the IEEE Computer Society Bangalore Section. She is a faculty sponsor for the NMAMIT IEEE Student Women Affinity Group and the NMAMIT ACM-W Student Chapter.

Argonne Leadership Computing Facility

03/21/2017, 5:30am CT