
Befikir T. Bogale

Graduate Research Assistant at Global Computing Lab

I am a Ph.D. student in Computer Science at the University of Tennessee, Knoxville, advised by Dr. Michela Taufer. I received my Bachelor's in Computer Science from UTK in Spring 2024. My research focuses on developing tools for performance analysis in High Performance Computing (HPC) environments.

Experience

University of Tennessee

Oct 2022 - Present

Knoxville, TN

Graduate Research Assistant

Aug 2024 - Present

Responsibilities:
  • Building an LLVM pass plugin to expose compiler remark information to annotation profiling tools like Caliper
  • Conducting performance analysis to evaluate the impact of different compilers and optimization levels on application performance
Undergraduate Research Assistant

Oct 2022 - May 2024

Responsibilities:
  • Developed containerized images using Singularity/Apptainer to enhance portability and reproducibility of HPC applications
  • Researched and mitigated sources of non-determinism in scientific HPC applications to improve reliability and accuracy
  • Implemented a checkpointing framework for neural networks leveraging deduplication to efficiently store epoch history
  • Collaborated with researchers at Lawrence Livermore National Laboratory and Argonne National Laboratory on HPC projects.

Lawrence Livermore National Laboratory

May 2024 - Aug 2025

Livermore, CA

Defense Science and Technology Internship

May 2025 - Aug 2025

Responsibilities:
  • Developed a lightweight, general approach for exposing compiler optimization provenance at runtime that integrates with existing profiling infrastructure such as Caliper and supports programmatic analysis with Thicket
  • Validated this approach on the RAJA Performance Suite across optimization levels, demonstrating how fusing compiler and runtime perspectives enables evidence-guided optimization for performance portability
Graduate Computing Scholar Internship

May 2024 - Aug 2024

Responsibilities:
  • Developed a cluster-based methodology to characterize the performance of portable HPC applications across diverse architectures
  • Conducted a performance study of CPUs and GPUs with different memory types using the RAJA Performance Suite, in collaboration with other members of the Thicket team

Los Alamos National Laboratory

June 2023 - Aug 2023

Los Alamos, NM

Parallel Computing Intern

June 2023 - Aug 2023

Responsibilities:
  • Parallelized X-ray transport simulations to improve computational efficiency and scalability
  • Leveraged Kokkos for portability across multiple architectures, utilizing vectorization and thread team policies
  • Optimized performance, achieving over 13× speedup in parallelized code compared to the serial implementation

Education

University of Tennessee
2024 - Present
PhD in Computer Science (High-Performance Computing Concentration)

University of Tennessee
2020 - 2024
BSc in Computer Science

Research Projects

Correlating Compiler Optimizations with Runtime Performance
Developer and Researcher, Jan 2024 - Present

This project develops a methodology to connect compiler optimization decisions with runtime performance for performance-portability libraries like RAJA. It integrates compiler optimization data directly into runtime profiles so developers can see how specific optimizations affect execution. We demonstrate the approach using kernels from the RAJA Performance Suite.
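A minimal sketch of the remark-ingestion side of this idea, assuming the code is compiled with clang's -fsave-optimization-record flag (which writes one .opt.yaml remark stream per translation unit). The file path and the per-function summary below are illustrative only, not the project's actual tooling:

  import sys
  from collections import Counter

  import yaml  # pip install pyyaml


  def _remark(loader, suffix, node):
      # LLVM tags each remark document as !Passed, !Missed, or !Analysis;
      # keep that tag as a "Kind" field on a plain dict.
      entry = loader.construct_mapping(node, deep=True)
      entry["Kind"] = suffix
      return entry


  class RemarkLoader(yaml.SafeLoader):
      """SafeLoader that accepts LLVM's remark tags instead of rejecting them."""


  RemarkLoader.add_multi_constructor("!", _remark)


  def summarize(opt_yaml_path):
      # Count remarks per (function, pass, kind) in one .opt.yaml file.
      counts = Counter()
      with open(opt_yaml_path) as f:
          for remark in yaml.load_all(f, Loader=RemarkLoader):
              if remark:
                  counts[(remark.get("Function", "?"),
                          remark.get("Pass", "?"),
                          remark["Kind"])] += 1
      return counts


  if __name__ == "__main__":
      for (func, opt_pass, kind), n in summarize(sys.argv[1]).most_common(15):
          print(f"{n:6d}  {kind:10s}  {opt_pass:24s}  {func}")

In the full methodology these counts are not inspected in isolation: the remarks are attached to the corresponding Caliper regions so they can be queried next to runtime metrics.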

Thicket
Developer and Researcher, Jan 2024 - Present

Thicket is a Python-based toolkit for analyzing ensemble performance data. It is built on top of Hatchet and inherits the benefits that Hatchet provides.
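A rough sketch of what that ensemble workflow looks like, assuming a handful of Caliper .cali profiles are on hand (the file names below are placeholders):

  import thicket as th

  # Read several Caliper profiles (e.g., one per compiler or optimization level)
  # into a single Thicket object; the file names are placeholders.
  tk = th.Thicket.from_caliperreader([
      "rajaperf_O2.cali",
      "rajaperf_O3.cali",
  ])

  # Per-node, per-profile metrics live in one DataFrame indexed by call-tree
  # node, and per-run metadata (compiler, flags, machine, ...) sits alongside.
  print(tk.dataframe.head())
  print(tk.metadata.head())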

Hatchet
Developer and Researcher, Jan 2024 - Present

Hatchet is a Python library that enables users to analyze performance data generated by different HPC profilers. Its main advantage over other tools is that it ingests data from different profilers into a common representation, so the same analysis code can be applied to performance data from different sources.
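A small usage sketch, assuming a Caliper-produced profile is available (the path is a placeholder; readers for other profilers follow the same pattern):

  import hatchet as ht

  # Load one profile into a GraphFrame: a call graph plus a pandas DataFrame
  # holding one row of metrics per graph node.
  gf = ht.GraphFrame.from_caliper("profile.json")

  print(gf.tree())            # render the call tree annotated with a metric
  print(gf.dataframe.head())  # metric columns depend on the source profiler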

ANACIN-X
Developer and Researcher, Oct 2022 - May 2024

ANACIN-X is a suite of tools designed for trace-based analysis of non-deterministic behavior in MPI applications, helping developers and scientists identify root sources of non-determinism. It features a framework for characterizing non-determinism through graph similarity, consisting of execution trace collection, event graph construction, kernel analysis, and distance visualization. Additionally, it includes use cases focused on communication patterns to enhance understanding and reproducibility in HPC applications.
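The graph-similarity idea can be illustrated with a toy example: build a small labeled event graph per run and compare runs with a Weisfeiler-Lehman graph hash. This is only a conceptual sketch using networkx, not ANACIN-X's actual trace-collection or kernel-analysis pipeline:

  import networkx as nx

  def event_graph(recv_order):
      # Toy event graph: a chain of receive events labeled by sender rank.
      g = nx.DiGraph()
      for i, sender in enumerate(recv_order):
          g.add_node(i, label=str(sender))
          if i > 0:
              g.add_edge(i - 1, i)
      return g

  # Two runs of a nondeterministic receive loop: message arrival order differs.
  run_a = event_graph([0, 1, 2, 3])
  run_b = event_graph([0, 2, 1, 3])

  # Equal hashes mean the labeled graphs are almost certainly isomorphic;
  # different hashes flag the runs as structurally divergent.
  h_a = nx.weisfeiler_lehman_graph_hash(run_a, node_attr="label")
  h_b = nx.weisfeiler_lehman_graph_hash(run_b, node_attr="label")
  print("runs match" if h_a == h_b else "runs diverge")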

Publications

Maintaining performant code in a world of fast-evolving computer architectures and programming models poses a significant challenge to scientists. Typically, benchmark codes are used to model some aspects of a large application code's performance, and are easier to build and run. Such benchmarks can help assess the effects of code or algorithm changes, system updates, and new hardware. However, most performance benchmarks are not written using a wide range of GPU programming models. The RAJA Performance Suite (RAJAPerf) provides a comprehensive set of computational kernels implemented in a variety of programming models. We integrated the performance measurement and analysis tools Caliper and Thicket into RAJAPerf to facilitate performance comparison across kernel implementations and architectures. This paper describes RAJAPerf, the performance metrics that can be collected, and experimental analysis with case studies.

Towards Affordable Reproducibility Using Scalable Capture and Comparison of Intermediate Multi-Run Results

Ensuring reproducibility in high-performance computing (HPC) applications is a significant challenge, particularly when nondeterministic execution can lead to untrustworthy results. Traditional methods that compare final results from multiple runs often fall short: they reveal sources of discrepancy only a posteriori and require substantial resources, making them impractical. This paper introduces a method that addresses this issue through scalable capture and comparison of intermediate multi-run results. By capitalizing on intermediate checkpoints and hash-based techniques with user-defined error bounds, our method identifies divergences early in the execution paths. We employ Merkle trees over checkpoint data to reduce the I/O overhead associated with loading historical data. Our evaluations on the nondeterministic HACC cosmology simulation show that our method effectively captures differences above a predefined error bound and significantly reduces I/O overhead. Our solution provides a robust and scalable method for improving reproducibility, ensuring that scientific applications on HPC systems yield trustworthy and reliable results.
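A minimal sketch of the core comparison idea: quantize checkpoint values to a user-defined error bound, hash fixed-size chunks into Merkle leaves, and compare roots first so matching runs cost almost nothing. The chunk size, quantization rule, and flat leaf scan on mismatch are illustrative simplifications, not the paper's exact scheme (which descends the tree level by level):

  import hashlib
  import numpy as np

  CHUNK = 1024  # values per leaf; illustrative choice

  def leaf_hashes(data, error_bound):
      # Quantize to the error bound so values within tolerance hash identically.
      q = np.round(np.asarray(data, dtype=np.float64) / error_bound).astype(np.int64)
      return [hashlib.sha256(q[i:i + CHUNK].tobytes()).digest()
              for i in range(0, len(q), CHUNK)]

  def merkle_root(leaves):
      # Pairwise-hash leaves upward until a single root remains.
      level = list(leaves)
      while len(level) > 1:
          level = [hashlib.sha256(level[i] + (level[i + 1] if i + 1 < len(level) else b"")).digest()
                   for i in range(0, len(level), 2)]
      return level[0]

  def diverging_chunks(a, b, error_bound=1e-6):
      # Compare two runs' checkpoints; report which chunks differ beyond the bound.
      la, lb = leaf_hashes(a, error_bound), leaf_hashes(b, error_bound)
      if merkle_root(la) == merkle_root(lb):
          return []  # roots match: no divergence above the bound
      return [i for i, (x, y) in enumerate(zip(la, lb)) if x != y]

  rng = np.random.default_rng(0)
  ckpt_a = rng.random(10_000)
  ckpt_b = ckpt_a.copy()
  ckpt_b[4321] += 1e-3                    # inject a divergence into one chunk
  print(diverging_chunks(ckpt_a, ckpt_b))  # -> [4]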

Professional Services

Served as the Lead Student Volunteer for workshops at SC25

Served as a Student Volunteer at SC24. In this role, I helped ensure the sessions of the conference ran smoothly. Additionally, I performed other miscellaneous tasks, such as keeping track of the number of attendees in the sessions at which I was working.

Posters

An Approach for Correlating Compiler Optimizations with Runtime Performance

This work builds a framework to understand how compiler optimizations influence the performance of performance-portability libraries such as RAJA. By combining compiler optimization remarks with runtime profiles, it creates a unified view that links compiler decisions to their execution impact. A case study on the RAJA Performance Suite demonstrates how this approach reveals optimization requirements and performance drivers across architectures.

Cluster-Based Methodology for Characterizing the Performance of Portable Applications

This work focuses on performance portability and proposes a methodological approach to assessing and explaining how different kernels behave across various hardware architectures using the RAJA Performance Suite (RAJAPerf). Our methodology leverages metrics from the Intel top-down pipeline and clustering techniques to sort the kernels based on performance characteristics. We assess the methodology on 54 RAJAPerf’s computational kernels on Intel Xeon and NVIDIA V100 platforms. Our results confirm the effectiveness of our methodology in automatically characterizing performance differentials and speedups, particularly in memory-bound kernels.
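A compact sketch of the clustering step, assuming a table of per-kernel metrics has already been collected; the kernel names are real RAJAPerf kernels, but the metric values and column names here are made up for illustration:

  import pandas as pd
  from sklearn.cluster import KMeans
  from sklearn.preprocessing import StandardScaler

  # Hypothetical per-kernel metrics; real inputs would come from hardware
  # counters (e.g., top-down categories) gathered while running RAJAPerf.
  metrics = pd.DataFrame(
      {
          "kernel": ["DAXPY", "MULADDSUB", "LTIMES", "PRESSURE"],
          "memory_bound": [0.72, 0.65, 0.31, 0.58],
          "core_bound":   [0.08, 0.12, 0.40, 0.15],
          "gpu_speedup":  [1.1,  1.2,  3.4,  1.3],
      }
  ).set_index("kernel")

  # Standardize so no single metric dominates the distance, then cluster.
  X = StandardScaler().fit_transform(metrics)
  labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
  print(metrics.assign(cluster=labels))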

Achievements, Honors, and Scholarships

Participated in the Graduate Student track of the ACM Student Research Competition, presenting my poster “Cluster-Based Methodology for Characterizing the Performance of Portable Applications”

Graduate Fellowship

Awarded the Graduate Fellowship at the University of Tennessee, Knoxville