Scalable LU Factorizations
February 2020 - Present
Experimenting with approaches to improve the performance of distributed, GPU-accelerated LU factorizations. Such factorizations require partial pivoting for numerical stability; however, searching for and applying the pivots introduces significant overhead. The primary branch of this work to date uses Randomized Butterfly Transforms to shuffle the matrix in such a way that pivoting is unnecessary. Other efforts have included optimizing the pivoted and non-pivoted implementations of LU in SLATE, a dense linear algebra library for distributed, heterogeneous systems.
- Replacing Pivoting in Distributed Gaussian Elimination with Randomized Techniques - paper at the 11th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems
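As a rough illustration of the idea (not the SLATE implementation), the sketch below applies a depth-1 random butterfly on each side of a small matrix whose zero leading entry breaks unpivoted elimination, then solves the transformed system without pivoting. The helper names (`butterfly`, `solve_nopiv`, `rbt_solve`) and the choice of random diagonal entries exp(r/10) with r in [-0.5, 0.5] are assumptions made for this example; practical codes use deeper, recursive butterflies stored in packed form rather than as dense matrices.

```python
import math
import random

def butterfly(n, rng):
    """Dense depth-1 butterfly B = 1/sqrt(2) * [[R0, R1], [R0, -R1]] of even order n."""
    h = n // 2
    # Assumed distribution for the random diagonals R0 and R1.
    r0 = [math.exp(rng.uniform(-0.5, 0.5) / 10) for _ in range(h)]
    r1 = [math.exp(rng.uniform(-0.5, 0.5) / 10) for _ in range(h)]
    s = 1 / math.sqrt(2)
    B = [[0.0] * n for _ in range(n)]
    for i in range(h):
        B[i][i] = s * r0[i]
        B[i][h + i] = s * r1[i]
        B[h + i][i] = s * r0[i]
        B[h + i][h + i] = -s * r1[i]
    return B

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def solve_nopiv(A, b):
    """Gaussian elimination WITHOUT pivoting; raises ZeroDivisionError on a zero pivot."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for k in range(n):
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def rbt_solve(A, b, rng):
    """Solve A x = b via A_r = U^T A V, A_r y = U^T b, x = V y, with no pivoting."""
    n = len(A)
    U, V = butterfly(n, rng), butterfly(n, rng)
    Ut = transpose(U)
    Ar = matmul(matmul(Ut, A), V)
    y = solve_nopiv(Ar, matvec(Ut, b))
    return matvec(V, y)

if __name__ == "__main__":
    A = [[0.0, 1.0], [1.0, 1.0]]  # zero leading pivot
    b = [1.0, 2.0]                # exact solution is x = [1, 1]
    try:
        solve_nopiv(A, b)
    except ZeroDivisionError:
        print("unpivoted elimination broke down on A")
    print(rbt_solve(A, b, random.Random(0)))
```

The transform only makes a pivot breakdown unlikely rather than impossible, so solvers built on it are typically paired with a few steps of iterative refinement.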
Atom for Common Lisp
April 2019 - Present
I maintain two packages for developing Lisp (especially Common Lisp) in the Atom text editor. The first is SLIMA, which provides interactive Common Lisp development, based on the Emacs plugin SLIME. It is a fork of Steve Levine’s Atom-Slime. The second is Lisp-Paredit, which provides commands for editing any S-expression-based language and was originally developed by Jon Spalding.
- SLIMA package page
- Swank-Client - The NPM package that handles the underlying swank remote calls for SLIMA.
- Lisp-Paredit package page
Scalable Interpolation for Thermal-Fluids Applications
May 2021 - September 2021
Ported interpolation routines to the OCCA runtime system in NekRS, a GPU-accelerated, spectral-element code for simulating fluids and their temperature. These routines were used to implement both particle tracking and multiple, overlapped meshes.
Mixed precision GMRES
August 2019 - June 2021
Exploration of how using different precisions for different parts of the solver affects the performance and convergence of GMRES. The work primarily focused on achieving the accuracy of a double-precision GMRES implementation while selectively using lower precision to reduce data movement costs.
- Accelerating Restarted GMRES with Mixed Precision Arithmetic - paper in the IEEE Transactions on Parallel and Distributed Systems
- Improving the Performance of the GMRES method using Mixed-Precision Techniques - paper at the 2020 Smoky Mountains Conference
Reducing Memory Access Costs using Data Compression in Conjugate Gradient
May 2017 - April 2019
Exploration into whether the performance of sparse linear solvers (specifically Conjugate Gradient) can be improved by reducing data movement using compression.
JuliaPetra
July 2017 - May 2019
An implementation of Trilinos’s Petra Object Model in Julia. The project aimed to evaluate how well Julia works for distributed, high-performance computing.
- JuliaPetra.jl - The implementation
- Obtaining Performance from a Julia-Implementation of Trilinos Data Libraries - presented at the 2019 SIAM Conference on Computational Science and Engineering
- TypeStability.jl - A Julia package to automate type stability checks