publications
2024
- NeurIPS 2024 ATTRIB WS. Quanda: An Interpretability Toolkit for Training Data Attribution Evaluation and Beyond. In Second NeurIPS Workshop on Attributing Model Behavior at Scale, Dec 2024
- ICML 2024 Mechanistic Interpretability WS. Manipulating Feature Visualizations with Gradient Slingshots. In ICML 2024 Workshop on Mechanistic Interpretability, Jul 2024
- Finding the Right XAI Method—A Guide for the Evaluation and Ranking of Explainable AI Methods in Climate Science. Artificial Intelligence for the Earth Systems, Jul 2024
- CVPR 2024 SAIAD WS. Reactive Model Correction: Mitigating Harm to Task-Relevant Features via Conditional Bias Suppression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Jun 2024
2023
- Quantus: An Explainable AI Toolkit for Responsible Evaluation of Neural Network Explanations and Beyond. Journal of Machine Learning Research, Jun 2023