Research Experiences for Undergraduates on HPC and Deep Learning (Virtual)
Deep learning (DL) has rapidly evolved into a state-of-the-art technique in many science and technology disciplines, such as scientific exploration, national security, smart environments, and healthcare. Many of these DL applications require high-performance computing (HPC) resources to process large amounts of data. Researchers and scientists, for instance, are running extreme-scale DL applications on HPC infrastructures to classify extreme weather patterns and high-energy particles. In recent years, using Graphics Processing Units (GPUs) to accelerate DL applications has attracted increasing attention. However, the ever-increasing scale of DL applications brings many challenges to today’s GPU-based HPC infrastructures. The key challenge is the huge gap (e.g., one to two orders of magnitude) between the memory these applications require and the memory available on GPUs. This project aims to close this gap by developing a novel framework that uses data compression to reduce the memory demand of extreme-scale DL applications effectively and efficiently. The proposed research will enhance GPU-based HPC infrastructures for the many scientific disciplines that rely on DL technologies. The project will connect the machine learning and HPC communities and increase interactions between them. Educational and engagement activities include developing new curricula related to data compression, mentoring a selected group of high school students in a year-long research project for a regional Science Fair competition, and increasing the community's understanding of how to leverage HPC infrastructures for DL technologies. The project will also encourage student interest in research on DL technologies in HPC environments and promote research collaborations with multiple national laboratories.
Development and Integration of Optimized GPU-based Huffman Coding: Huffman coding is one of the most widely used entropy coding algorithms and is optimal among prefix codes that encode one symbol at a time, which is why it appears as a fundamental step in many modern compression algorithms such as cuSZ (Tian et al. 2020). However, Huffman encoding suffers from low throughput on GPUs, which makes it a significant bottleneck in the overall data-processing pipeline. In our recent work (Tian et al. 2021), we proposed and implemented an efficient Huffman encoding approach for modern GPU architectures (such as NVIDIA V100) using CUDA, but the current code is tightly coupled with the cuSZ lossy compressor. Thus, in this task, an REU student will help develop high-level CUDA C++ Huffman encoder/decoder APIs with lightweight or zero dependencies, starting from the current tightly coupled code, and then package them as a standalone Huffman coding library.
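For applicants who have not worked with Huffman coding before, the following is a minimal CPU-only sketch in Python of the encode/decode round trip that such a library would expose. It is not the GPU approach from Tian et al. 2021 and not the cuSZ API; the function names build_codebook, encode, and decode are hypothetical and chosen only for illustration.

```python
import heapq
from collections import Counter


def build_codebook(symbols):
    """Return {symbol: bitstring} for a Huffman prefix code over `symbols`."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, unique_id, {symbol: partial_code}).
    # The unique id breaks ties so the dicts are never compared.
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees; prefix their codes with 0 and 1.
        w0, _, left = heapq.heappop(heap)
        w1, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w0 + w1, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(symbols, codebook):
    """Concatenate the codeword of every input symbol into one bitstring."""
    return "".join(codebook[s] for s in symbols)


def decode(bits, codebook):
    """Invert `encode`; relies on the code being prefix-free."""
    inverse = {c: s for s, c in codebook.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete codeword")
    return out
```

Calling build_codebook on a sequence, then encode followed by decode, should return the original sequence. Bitstrings are kept as Python strings here purely for readability; a real encoder would pack bits into machine words, which is where most of the GPU-specific difficulty lies.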
Development and Integration of Adaptive Memory-Reduction Solution: In our recent work (S. Jin et al. 2021), we proposed COMET, which uses cuSZ to reduce memory consumption during DNN training and comes with a theoretical analysis of the impact on accuracy. One insight from that work is that for some convolutional layers with small kernels (such as 1x1 kernels), compression introduces a relatively large overhead compared to the low time cost of the convolution itself. For these layers, recomputation could therefore be a better way to reduce memory consumption than compression. To this end, one REU student will help (1) characterize layers in terms of type, kernel size, convolution computation speed, and compression speed; (2) determine the best-fit memory-reduction approach (compression or recomputation) for each class of layers; and (3) integrate the proposed adaptive solution into the TensorFlow-based COMET and provide the necessary APIs for fine-tuning.
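The per-layer choice described above can be pictured with a small Python sketch. It assumes that each layer has already been profiled and that the decision simply compares the time to recompute the layer's forward pass with the time to compress and decompress its activations. The LayerProfile fields, the choose_strategy rule, and the plan helper are all hypothetical; they are not part of COMET or TensorFlow, and the actual criterion developed in the project may weigh other factors such as compression ratio or accuracy impact.

```python
from dataclasses import dataclass


@dataclass
class LayerProfile:
    """Hypothetical profiling record for one layer (all times in milliseconds)."""
    name: str
    kind: str            # e.g. "conv"
    kernel_size: int     # e.g. 1 for a 1x1 kernel
    forward_ms: float    # measured time of the layer's forward computation
    compress_ms: float   # measured time to compress its activations
    decompress_ms: float # measured time to decompress them again


def choose_strategy(layer):
    """Pick the cheaper way to avoid keeping this layer's activations in memory.

    Assumed rule: recomputation costs one extra forward pass, while
    compression costs one compress plus one decompress call.
    """
    recompute_cost = layer.forward_ms
    compression_cost = layer.compress_ms + layer.decompress_ms
    return "recompute" if recompute_cost < compression_cost else "compress"


def plan(layers):
    """Map each layer name to its chosen memory-reduction strategy."""
    return {layer.name: choose_strategy(layer) for layer in layers}
```

With made-up timings in which a 1x1 convolution runs faster than its compression round trip and a 3x3 convolution runs slower, plan would assign "recompute" to the former and "compress" to the latter, matching the intuition stated in the task description.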
Application deadline: June 15, 2021
Tentative program period: June 15 - August 15, 2021
Support for travel.
U.S. citizen or permanent resident.
Must be enrolled in an undergraduate institution (2- or 4-year) and expected to graduate after September 2021.
Sophomore or higher class standing with 3.25 or higher cumulative GPA.
Computer science, mathematics, electrical engineering, or other computational sciences.
Programming experience in C/C++ and Python on Linux. Parallel programming experience (e.g., OpenMP, MPI, CUDA) and/or deep learning programming experience (e.g., TensorFlow, PyTorch) is preferred.
Students from underrepresented minorities and community colleges that do not offer research opportunities for undergraduates are strongly encouraged to apply!
Prof. Dingwen Tao (firstname.lastname@example.org)
These REU activities are supported by NSF Grant OAC-2034169. We thank NSF for its support!