IG2: Integrated Gradient on Iterative Gradient Path for Feature Attribution
  • Yue Zhuo,
  • Zhiqiang Ge
Yue Zhuo
Zhejiang University

Corresponding Author: [email protected]

Zhiqiang Ge

Abstract

Feature attribution explains Artificial Intelligence (AI) models at the instance level by assigning each input feature an importance score that reflects its contribution to the model prediction. Integrated Gradients (IG) is the prevalent path attribution method for deep neural networks; it integrates gradients along a path from the explained input (explicand) to a counterfactual instance (baseline). However, current IG variants consider only the gradient of the explicand’s output, whereas we find that the gradient of the counterfactual output also exerts a substantial influence on feature attribution. To account for this, we propose Iterative Gradient path Integrated Gradients (IG2), which considers both gradients. IG2 incorporates the counterfactual gradient iteratively into the integration path, thereby obtaining a novel path (GradPath) and a novel baseline (GradCF). These two novel IG components effectively address the issues of attribution noise and arbitrary baseline choice in earlier IG methods. As a path method, IG2 satisfies many desirable axioms, which are theoretically justified in the paper. Experimental results on an XAI benchmark, ImageNet, MNIST, TREC question answering, wafer-map failure patterns, and CelebA face attributes validate that IG2 delivers superior feature attributions compared to state-of-the-art techniques.
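
To make the path-integration idea concrete, the following is a minimal sketch of standard Integrated Gradients on a straight-line path, not of IG2 itself: the abstract does not specify how GradPath or GradCF are constructed, so they are not reproduced here. The toy model, its weights, the zero baseline, and all function names are assumptions chosen purely for illustration.

```python
import math

# Hypothetical toy model (not from the paper): f(x) = tanh(w . x)
WEIGHTS = [0.5, -1.0, 2.0]

def model(x):
    return math.tanh(sum(w * xi for w, xi in zip(WEIGHTS, x)))

def model_grad(x):
    # Analytic gradient of tanh(w . x) with respect to each input feature.
    slope = 1.0 - model(x) ** 2
    return [slope * w for w in WEIGHTS]

def integrated_gradients(x, baseline, grad_fn, steps=200):
    """Midpoint Riemann-sum approximation of IG on the straight-line path."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        # Point on the straight line from the baseline to the explicand.
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grad = grad_fn(point)
        total = [t + g for t, g in zip(total, grad)]
    # Scale the averaged gradient by the input difference, feature by feature.
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]

explicand = [1.0, 0.5, 0.25]
baseline = [0.0, 0.0, 0.0]  # assumed all-zero baseline, an arbitrary choice
attributions = integrated_gradients(explicand, baseline, model_grad)

# Completeness axiom: attributions should sum to f(explicand) - f(baseline).
completeness_gap = abs(sum(attributions) - (model(explicand) - model(baseline)))
```

The completeness check at the end illustrates one of the axioms that path methods are expected to satisfy. The hand-picked zero baseline is an example of the arbitrary baseline choice that the abstract says IG2 is designed to avoid.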