A Distributed Framework for Causal Modeling of Performance Variability in GPU Traces
Published in arXiv preprint arXiv:2510.18300, 2025
Recommended citation: Ankur Lahiry, Ayush Pokharel, Banooqa Banday, Seth Ockerman, Amal Gueroudji, Mohammad Zaeed, Tanzima Z Islam, and Line Pouchard. (2025). "A Distributed Framework for Causal Modeling of Performance Variability in GPU Traces." arXiv preprint arXiv:2510.18300. https://arxiv.org/abs/2510.18300
This paper presents a distributed framework for causal modeling of performance variability in GPU traces. By applying causal inference to large-scale GPU trace data, the framework helps identify the root causes of performance anomalies in HPC workloads.
Authors: Ankur Lahiry, Ayush Pokharel, Banooqa Banday, Seth Ockerman, Amal Gueroudji, Mohammad Zaeed, Tanzima Z Islam, and Line Pouchard
