I am a Ph.D. student in Computer Science advised by Dr. Michela Taufer at the University of Tennessee, Knoxville. I received my Bachelor's in Computer Science from UTK in Spring 2024. My research focuses on developing tools for performance analysis in High Performance Computing (HPC) environments.
Oct 2022 - Present
Knoxville, TN
Aug 2024 - Present
Oct 2022 - May 2024
May 2024 - Aug 2024
Livermore, CA
May 2024 - Aug 2024
Jun 2023 - Aug 2023
Los Alamos, NM
Jun 2023 - Aug 2023
University of Tennessee | 2024-Present | PhD in Computer Science (High-Performance Computing Concentration)
University of Tennessee | 2020-2024 | BSc in Computer Science
Thicket is a Python-based toolkit for analyzing ensemble performance data. It is built on top of Hatchet, so it inherits Hatchet's common data representation and analysis capabilities.
Hatchet is a Python library that enables users to analyze performance data generated by different HPC profilers. Its main advantage over other tools is that it ingests data from different profilers into a common representation, allowing users to apply the same analysis code to performance data from different sources.
ANACIN-X is a suite of tools designed for trace-based analysis of non-deterministic behavior in MPI applications, helping developers and scientists identify root sources of non-determinism. It features a framework for characterizing non-determinism through graph similarity, consisting of execution trace collection, event graph construction, kernel analysis, and distance visualization. Additionally, it includes use cases focused on communication patterns to enhance understanding and reproducibility in HPC applications.
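The kernel-analysis step above compares event graphs by similarity. As a minimal illustration of how a graph kernel scores similarity, the sketch below implements a simple vertex-label histogram kernel over MPI event labels; this is one of the simplest graph kernels and is not necessarily the kernel ANACIN-X itself employs, and the event labels are hypothetical.

```python
from collections import Counter

def vertex_histogram_kernel(labels_a, labels_b):
    # Compare two event graphs by their vertex-label histograms:
    # the kernel value is the dot product of the label counts.
    ha, hb = Counter(labels_a), Counter(labels_b)
    return sum(ha[label] * hb[label] for label in ha)

def normalized_similarity(labels_a, labels_b):
    # Cosine-normalize so identical label distributions score 1.0
    # and graphs with no labels in common score 0.0.
    k = vertex_histogram_kernel
    return k(labels_a, labels_b) / (k(labels_a, labels_a) * k(labels_b, labels_b)) ** 0.5

# Hypothetical event labels from two traced runs of an MPI application.
run_1 = ["MPI_Send", "MPI_Recv", "MPI_Send", "MPI_Waitall"]
run_2 = ["MPI_Send", "MPI_Recv", "MPI_Recv", "MPI_Waitall"]
print(normalized_similarity(run_1, run_2))  # close to, but below, 1.0
```

Distances derived from such kernel values (e.g., 1 minus the normalized similarity) are what a kernel-analysis stage would visualize to flag runs whose communication behavior diverges.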
Maintaining performant code in a world of fast-evolving computer architectures and programming models poses a significant challenge to scientists. Typically, benchmark codes are used to model some aspects of a large application code's performance, and are easier to build and run. Such benchmarks can help assess the effects of code or algorithm changes, system updates, and new hardware. However, most performance benchmarks are not written using a wide range of GPU programming models. The RAJA Performance Suite (RAJAPerf) provides a comprehensive set of computational kernels implemented in a variety of programming models. We integrated the performance measurement and analysis tools Caliper and Thicket into RAJAPerf to facilitate performance comparison across kernel implementations and architectures. This paper describes RAJAPerf, the performance metrics that can be collected, and experimental analysis with case studies.
Ensuring reproducibility in high-performance computing (HPC) applications is a significant challenge, particularly when nondeterministic execution can lead to untrustworthy results. Traditional methods that compare final results from multiple runs often fail because they reveal sources of discrepancy only a posteriori and require substantial resources, making them impractical. This paper introduces a method that addresses this issue by scalably capturing and comparing intermediate results across runs. By leveraging intermediate checkpoints and hash-based techniques with user-defined error bounds, our method identifies divergences early in the execution paths. We employ Merkle trees over checkpoint data to reduce the I/O overhead of loading historical data. Our evaluations on the nondeterministic HACC cosmology simulation show that our method effectively captures differences above a predefined error bound and significantly reduces I/O overhead. Our solution provides a robust and scalable method for improving reproducibility, ensuring that scientific applications on HPC systems yield trustworthy and reliable results.
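The core idea of hash-based checkpoint comparison with an error bound can be sketched in a few lines: quantize floating-point values using the user-defined bound, hash fixed-size chunks into Merkle leaves, and compare roots. This is a minimal sketch assuming simple uniform quantization and SHA-256; the paper's actual quantization scheme, chunking, and hash choices may differ.

```python
import hashlib

def quantize(values, error_bound):
    # Map each value to a bucket of width 2 * error_bound, so values that
    # agree within the bound usually fall into the same bucket
    # (assumption: simple uniform quantization for illustration).
    return tuple(round(v / (2 * error_bound)) for v in values)

def leaf_hash(chunk, error_bound):
    # Hash a quantized chunk of checkpoint data into a Merkle leaf.
    return hashlib.sha256(repr(quantize(chunk, error_bound)).encode()).hexdigest()

def merkle_root(data, chunk_size, error_bound):
    # Build leaf hashes over fixed-size chunks of the checkpoint.
    level = [leaf_hash(data[i:i + chunk_size], error_bound)
             for i in range(0, len(data), chunk_size)]
    # Pairwise-combine hashes level by level up to a single root.
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last hash on odd levels
        level = [hashlib.sha256((level[i] + level[i + 1]).encode()).hexdigest()
                 for i in range(0, len(level), 2)]
    return level[0]
```

Two runs whose checkpoint values differ by less than the bound typically yield identical roots (values near a bucket boundary can still hash differently), while a divergence above the bound changes a leaf and therefore the root; comparing subtree hashes then localizes which chunk diverged without reloading all historical data.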
Served as a Student Volunteer at SC24, helping conference sessions run smoothly and handling miscellaneous tasks such as tracking attendance in the sessions I supported.
This work focuses on performance portability and proposes a methodological approach to assessing and explaining how different kernels behave across various hardware architectures using the RAJA Performance Suite (RAJAPerf). Our methodology leverages metrics from the Intel top-down pipeline and clustering techniques to sort the kernels based on performance characteristics. We assess the methodology on 54 of RAJAPerf's computational kernels on Intel Xeon and NVIDIA V100 platforms. Our results confirm the effectiveness of our methodology in automatically characterizing performance differentials and speedups, particularly in memory-bound kernels.
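The clustering step above groups kernels by their metric profiles. As a minimal stand-in, the sketch below runs a small k-means over hypothetical per-kernel metric vectors (e.g., memory-bandwidth utilization and instructions per cycle); the paper's pipeline uses Intel top-down metrics and its own clustering configuration, so the metrics, initialization, and data here are illustrative assumptions only.

```python
def dist2(p, q):
    # Squared Euclidean distance between two metric vectors.
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=10):
    # Deterministic farthest-point initialization (a sketch choice,
    # not necessarily what the paper's pipeline uses).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each kernel's metric vector to its nearest centroid.
        labels = [min(range(k), key=lambda i: dist2(p, centroids[i])) for p in points]
        # Recompute each centroid as the mean of its cluster members.
        for i in range(k):
            members = [p for p, lbl in zip(points, labels) if lbl == i]
            if members:
                centroids[i] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels

# Hypothetical (bandwidth utilization, IPC) vectors for six kernels:
# three memory-bound, three compute-bound.
kernel_metrics = [(0.9, 0.5), (0.85, 0.6), (0.8, 0.55),
                  (0.2, 3.0), (0.15, 2.8), (0.25, 3.1)]
print(kmeans(kernel_metrics, 2))  # memory- and compute-bound kernels split apart
```

Once kernels are grouped this way, comparing a cluster's shared metric signature across architectures is what lets the methodology explain performance differentials (e.g., memory-bound clusters tracking bandwidth differences between platforms).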
Participated in the Graduate Student track of the ACM Student Research Competition, presenting my poster “Cluster-Based Methodology for Characterizing the Performance of Portable Applications.”
Awarded the Graduate Fellowship at the University of Tennessee, Knoxville