Befikir T. Bogale

Graduate Research Assistant at Global Computing Lab

I am a Ph.D. student in Computer Science at the University of Tennessee, Knoxville (UTK), advised by Dr. Michela Taufer. I received my Bachelor's in Computer Science from UTK in Spring 2024. My research focuses on developing tools for performance analysis in High Performance Computing (HPC) environments.

Experiences

University of Tennessee

Oct 2022 - Present

Knoxville, TN

Graduate Research Assistant

Aug 2024 - Present

Responsibilities:
  • Building an LLVM pass plugin to expose compiler remark information to annotation profiling tools like Caliper
  • Conducting performance analysis to evaluate the impact of different compilers and optimization levels on application performance
Graduate Research Assistant

Oct 2022 - May 2024

Responsibilities:
  • Developed containerized images using Singularity/Apptainer to enhance portability and reproducibility of HPC applications
  • Researched and mitigated sources of non-determinism in scientific HPC applications to improve reliability and accuracy
  • Implemented a checkpointing framework for neural networks leveraging deduplication to efficiently store epoch history
  • Collaborated with researchers at Lawrence Livermore National Laboratory and Argonne National Laboratory on HPC projects

Lawrence Livermore National Laboratory

May 2024 - Aug 2024

Livermore, CA

Graduate Computing Summer Intern

May 2024 - Aug 2024

Responsibilities:
  • Developed a cluster-based methodology to characterize the performance of portable HPC applications across diverse architectures
  • Conducted a performance study of different CPUs and GPUs with different types of memory using the RAJA Performance Suite with other members of the Thicket team

Los Alamos National Laboratory

Jun 2023 - Aug 2023

Los Alamos, NM

Parallel Computing Intern

Jun 2023 - Aug 2023

Responsibilities:
  • Parallelized X-ray transport simulations to improve computational efficiency and scalability
  • Leveraged Kokkos for portability across multiple architectures, utilizing vectorization and thread team policies
  • Optimized performance, achieving over 13× speedup in parallelized code compared to the serial implementation

Education

University of Tennessee
2024-Present
PhD in Computer Science (High-Performance Computing Concentration)
University of Tennessee
2020-2024
BSc in Computer Science

Research Projects

Thicket
Developer and Researcher Jan 2024 - Present

Thicket is a Python-based toolkit for analyzing ensemble performance data. It is built on top of Hatchet, so it inherits the benefits that Hatchet provides.

Hatchet
Developer and Researcher Jan 2024 - Present

Hatchet is a Python library that enables users to analyze performance data generated by different HPC profilers. Its main advantage over other tools is that it is capable of ingesting data from different profilers into a common representation, allowing users to use the same code to analyze performance data from different sources.

ANACIN-X
Developer and Researcher Oct 2022 - May 2024

ANACIN-X is a suite of tools designed for trace-based analysis of non-deterministic behavior in MPI applications, helping developers and scientists identify root sources of non-determinism. It features a framework for characterizing non-determinism through graph similarity, consisting of execution trace collection, event graph construction, kernel analysis, and distance visualization. Additionally, it includes use cases focused on communication patterns to enhance understanding and reproducibility in HPC applications.

Publications

Maintaining performant code in a world of fast-evolving computer architectures and programming models poses a significant challenge to scientists. Typically, benchmark codes are used to model some aspects of a large application code’s performance, and are easier to build and run. Such benchmarks can help assess the effects of code or algorithm changes, system updates, and new hardware. However, most performance benchmarks are not written using a wide range of GPU programming models. The RAJA Performance Suite provides a comprehensive set of computational kernels implemented in a variety of programming models. We integrated the performance measurement and analysis tools Caliper and Thicket into the RAJAPerf to facilitate performance comparison across kernel implementations and architectures. This paper describes the RAJAPerf, performance metrics that can be collected, and experimental analysis with case studies.

Towards Affordable Reproducibility Using Scalable Capture and Comparison of Intermediate Multi-Run Results

Ensuring reproducibility in high-performance computing (HPC) applications is a significant challenge, particularly when nondeterministic execution can lead to untrustworthy results. Traditional methods that compare final results from multiple runs often fail because they provide sources of discrepancies only a posteriori and require substantial resources, making them impractical and unfeasible. This paper introduces an innovative method to address this issue by using scalable capture and comparing intermediate multi-run results. By capitalizing on intermediate checkpoints and hash-based techniques with user-defined error bounds, our method identifies divergences early in the execution paths. We employ Merkle trees for checkpoint data to reduce the I/O overhead associated with loading historical data. Our evaluations on the nondeterministic HACC cosmology simulation show that our method effectively captures differences above a predefined error bound and significantly reduces I/O overhead. Our solution provides a robust and scalable method for improving reproducibility, ensuring that scientific applications on HPC systems yield trustworthy and reliable results.
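The idea of comparing runs through hashed, error-bounded checkpoints can be illustrated with a small sketch. This is not the paper's implementation: the function names, the chunk size, and the quantization scheme (rounding each value to a multiple of the error bound before hashing) are assumptions made for illustration. Note that simple rounding can still place two values that differ by less than the bound into different bins when they straddle a bin edge.

```python
import hashlib
import struct


def quantize(value, error_bound):
    # Hypothetical scheme: map a float to an integer bin of width error_bound.
    return round(value / error_bound)


def leaf_hashes(data, error_bound, chunk_size):
    # Hash fixed-size chunks of quantized checkpoint values.
    hashes = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        payload = b"".join(
            struct.pack("<q", quantize(v, error_bound)) for v in chunk
        )
        hashes.append(hashlib.sha256(payload).digest())
    return hashes


def build_tree(leaves):
    # Return the tree as a list of levels: leaves first, root last.
    levels = [leaves]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        nxt = []
        for i in range(0, len(prev), 2):
            # Duplicate the last node when a level has an odd count.
            right = prev[i + 1] if i + 1 < len(prev) else prev[i]
            nxt.append(hashlib.sha256(prev[i] + right).digest())
        levels.append(nxt)
    return levels


def differing_chunks(tree_a, tree_b):
    # Descend from the root, following only subtrees whose hashes differ.
    if len(tree_a) != len(tree_b) or len(tree_a[0]) != len(tree_b[0]):
        raise ValueError("trees must be built from checkpoints of equal size")
    top = len(tree_a) - 1
    if tree_a[top][0] == tree_b[top][0]:
        return []
    candidates = [0]
    for level in range(top, 0, -1):
        below_a, below_b = tree_a[level - 1], tree_b[level - 1]
        nxt = []
        for idx in candidates:
            for child in (2 * idx, 2 * idx + 1):
                if child < len(below_a) and below_a[child] != below_b[child]:
                    nxt.append(child)
        candidates = nxt
    return candidates


# Two toy "checkpoints" from different runs; one value diverges in run_b.
run_a = [0.1 * i for i in range(16)]
run_b = list(run_a)
run_b[9] += 0.5

tree_a = build_tree(leaf_hashes(run_a, error_bound=1e-3, chunk_size=4))
tree_b = build_tree(leaf_hashes(run_b, error_bound=1e-3, chunk_size=4))
diff = differing_chunks(tree_a, tree_b)
print(diff)
```

Only the chunks reported by `differing_chunks` would need to be loaded and compared value by value, which is where a tree of hashes can save I/O relative to reading every historical checkpoint in full.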

Professional Services

Served as a Student Volunteer at SC24. In this role, I helped ensure the sessions of the conference ran smoothly. Additionally, I performed other miscellaneous tasks, such as keeping track of the number of attendees in the sessions at which I was working.

Posters

Cluster-Based Methodology for Characterizing the Performance of Portable Applications

This work focuses on performance portability and proposes a methodological approach to assessing and explaining how different kernels behave across various hardware architectures using the RAJA Performance Suite (RAJAPerf). Our methodology leverages metrics from the Intel top-down pipeline and clustering techniques to sort the kernels based on performance characteristics. We assess the methodology on 54 RAJAPerf computational kernels on Intel Xeon and NVIDIA V100 platforms. Our results confirm the effectiveness of our methodology in automatically characterizing performance differentials and speedups, particularly in memory-bound kernels.
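As a rough illustration of grouping kernels by top-down metrics, the sketch below runs a small k-means over per-kernel feature vectors. The poster does not say which clustering technique was used, so k-means is an assumption here. The kernel names and metric values are made-up placeholders, not measurements. The four features follow the top-level categories of the Intel top-down method (frontend bound, bad speculation, backend bound, retiring).

```python
import math


def kmeans(points, k, iterations=20):
    # Deterministic initialization: use the first k points as centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid (Euclidean distance).
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical fractions: (frontend bound, bad speculation, backend bound, retiring).
kernels = {
    "kernel_a": (0.05, 0.02, 0.80, 0.13),
    "kernel_b": (0.10, 0.05, 0.15, 0.70),
    "kernel_c": (0.04, 0.03, 0.78, 0.15),
    "kernel_d": (0.12, 0.04, 0.20, 0.64),
}

labels = kmeans(list(kernels.values()), k=2)
clusters = dict(zip(kernels, labels))
print(clusters)
```

In this toy input, the kernels dominated by the backend-bound fraction fall into one group and the kernels dominated by the retiring fraction fall into the other. A real study would also need to choose the number of clusters and normalize the metrics.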

Achievements, Honors, and Scholarships

Participated in the Graduate Student track of the ACM Student Research Competition, presenting my poster “Cluster-Based Methodology for Characterizing the Performance of Portable Applications”

Graduate Fellowship

Awarded the Graduate Fellowship at the University of Tennessee, Knoxville