As explained in [@lundbergUnifiedApproachInterpreting], additive feature attribution methods are those methods that use an explanation model which is a linear function of binary variables, i.e.

$$g(z') = \phi_0 + \sum_{i=1}^{M} \phi_i z'_i$$

where $M$ is the number of simplified input features, $z'_i \in \{0, 1\}$, and $\phi_i \in \mathbb{R}$ models the effect that each feature has on the final outcome. The summation of all these effects approximates the output of the original opaque model.
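As a minimal sketch of the formula above, the snippet below evaluates an additive explanation model $g$ for hypothetical attribution values $\phi_i$ (the numbers are illustrative, not taken from any real model):

```python
import numpy as np

# Hypothetical attribution values for three simplified features.
phi_0 = 0.5                       # base value: expected model output
phi = np.array([0.2, -0.1, 0.3])  # per-feature effects (illustrative)

def g(z):
    """Additive explanation model: g(z') = phi_0 + sum_i phi_i * z'_i."""
    return phi_0 + phi @ z

# With all simplified features "present" (z'_i = 1), g approximates
# the opaque model's output; with none present, it returns phi_0.
print(g(np.ones(3)))
print(g(np.zeros(3)))
```

Toggling entries of `z` on and off shows how each $\phi_i$ contributes to, or subtracts from, the base value.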
In other words, unlike counterfactual methods, whose explanation is given by a counterexample, additive feature attribution methods explain the decision by assigning a score to each feature, showing which features contributed the most to the final prediction.
Note that simplified inputs $x'$ are related to the original input $x$ through a mapping $x = h_x(x')$ ($h_x$ being a mapping function specific to the current input $x$). This approach is shared by several methods.
For example, LIME [@ribeiroWhyShouldTrust2016] is an additive feature attribution method, since it locally approximates the prediction of the model around a given instance using a local linear explanation model.
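The LIME idea can be sketched without the `lime` library itself: perturb the instance, query the black box, weight samples by proximity, and fit a weighted linear surrogate whose coefficients act as attributions. Everything below (the black-box function, kernel width, sample count) is an illustrative assumption, not LIME's exact implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical opaque model standing in for any black box.
def black_box(X):
    return X[:, 0] * 2.0 + np.sin(X[:, 1]) + 0.5 * X[:, 0] * X[:, 1]

x = np.array([1.0, 0.5])  # instance whose prediction we want to explain

# 1. Sample perturbations in the neighborhood of x.
Z = x + rng.normal(scale=0.1, size=(500, 2))
y = black_box(Z)

# 2. Weight samples by proximity to x (Gaussian kernel, assumed width).
w = np.exp(-np.sum((Z - x) ** 2, axis=1) / 0.25)

# 3. Fit a weighted linear surrogate g(z) = phi_0 + sum_i phi_i * z_i
#    via least squares (sqrt-weight trick for the weighting).
A = np.hstack([np.ones((len(Z), 1)), Z]) * np.sqrt(w)[:, None]
b = y * np.sqrt(w)
coef, *_ = np.linalg.lstsq(A, b, rcond=None)
phi_0, phi = coef[0], coef[1:]

print("local feature attributions:", phi)
```

The fitted coefficients approximate the local slopes of the black box around `x`, which is exactly the kind of local linear explanation model LIME produces.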
tags: ai-explainability