In the context of machine learning, an adversarial attack is defined as finding the minimum noise that, when added to the original input, changes the output of the classifier.

This concept is similar to Counterfactual Explanation: in adversarial attacks we search for the minimum amount of noise, while in counterfactual explanations we search for the minimum change in the input that changes the output. One problem can be reduced to the other, except that counterfactual explanations must also satisfy a set of properties to be considered good counterfactuals. These properties differ from those of adversarial attacks, since the purpose is different.
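The minimal-noise idea can be sketched concretely. Below is an illustrative (not definitive) FGSM-style example on a toy linear classifier: for a linear model the gradient of the score with respect to the input is just the weight vector, so the smallest L-infinity perturbation that flips the decision can be computed in closed form. All weights and inputs are assumptions made up for the example.

```python
import numpy as np

# Toy linear classifier: class 1 if w @ x + b > 0, else class 0.
# Weights and input are illustrative assumptions.
w = np.array([1.0, -2.0])
b = 0.0
x = np.array([2.0, 0.5])  # originally classified as class 1

def predict(x):
    return 1 if w @ x + b > 0 else 0

# FGSM-style step: perturb against the sign of the gradient of the
# score w.r.t. the input. For a linear model that gradient is w, and
# the smallest L-infinity perturbation that crosses the decision
# boundary has magnitude |w @ x + b| / ||w||_1.
eps = abs(w @ x + b) / np.abs(w).sum() + 1e-6  # just past the boundary
noise = -eps * np.sign(w)
x_adv = x + noise

print(predict(x), predict(x_adv))  # prediction flips: 1 -> 0
```

The same machinery, pointed at a different objective (e.g. "what is the smallest realistic change to this loan application that gets it approved?"), is essentially a counterfactual explanation.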


tags: machine-learning