IG2: Integrated Gradient on Iterative Gradient Path for eXplainable AI
Feature attribution explains Artificial Intelligence (AI) at the instance level by providing importance scores that quantify each input feature's contribution to the model prediction. Integrated Gradients (IG) is the prevalent path attribution method for deep neural networks, which integrates gradients along a path between the explained input (explicand) and a counterfactual instance called the baseline. However, existing IG-based methods consider only the gradient of the explicand's output, whereas we find that the gradient of the counterfactual output also has a significant effect on feature attribution. To exploit this, we propose \underline{I}terative \underline{G}radient path \underline{I}ntegrated \underline{G}radients (IG2), which considers both gradients. IG2 incorporates the counterfactual gradient iteratively into the integration path, consequently obtaining a novel path (\emph{GradPath}) and a novel baseline (\emph{GradCF}). These two novel IG components substantially mitigate the problems of attribution noise and arbitrary baseline choice in previous IG methods. As a path method, IG2 satisfies many desirable axioms, which are theoretically justified in the paper. The experiments are built on a synthetic tabular XAI benchmark and multiple real-world datasets, including classification tasks on ImageNet, TREC questions, and wafer map failure patterns. The qualitative and quantitative results validate that IG2 provides superior feature attributions to previous attribution techniques.
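The abstract builds on the standard Integrated Gradients formulation, which attributes a prediction by integrating gradients along a straight-line path from a baseline to the explicand. Below is a minimal NumPy sketch of that classic IG computation for a toy differentiable model (a logistic unit); the model `f`, its weights, and the midpoint-rule integration are illustrative assumptions, not the paper's IG2 method, whose GradPath and GradCF constructions are described in the full text.

```python
import numpy as np

def f(x, w):
    """Toy differentiable model: logistic unit over a linear score."""
    return 1.0 / (1.0 + np.exp(-w @ x))

def grad_f(x, w):
    """Analytic gradient of f with respect to the input x."""
    s = f(x, w)
    return s * (1.0 - s) * w

def integrated_gradients(x, baseline, w, steps=200):
    """Classic IG: integrate gradients along the straight line
    from the baseline to the explicand x (midpoint rule)."""
    alphas = (np.arange(steps) + 0.5) / steps   # interpolation coefficients in (0, 1)
    total = np.zeros_like(x)
    for a in alphas:
        point = baseline + a * (x - baseline)   # point on the straight-line path
        total += grad_f(point, w)
    # Scale the averaged gradient by the input-baseline difference.
    return (x - baseline) * total / steps
```

A quick check of the completeness axiom mentioned in the abstract: the attributions should sum to `f(x) - f(baseline)` up to numerical integration error.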
Submitting author: zhuoy1995@zju.edu.cn, Zhejiang University, China