Plausible Counterfactuals: Auditing Deep Learning Classifiers with Realistic Adversarial Examples

Barredo-Arrieta, Alejandro; Del Ser, Javier

dc.contributor.author	Barredo-Arrieta, Alejandro
dc.contributor.author	Del Ser, Javier
dc.contributor.institution	Tecnalia Research & Innovation
dc.contributor.institution	IA
dc.date.issued	2020-07
dc.description	Publisher Copyright: © 2020 IEEE.
dc.description.abstract	The last decade has witnessed the proliferation of Deep Learning models in many applications, achieving unrivaled levels of predictive performance. Unfortunately, the black-box nature of Deep Learning models has left open questions about what they learn from data. Certain application scenarios have highlighted the importance of assessing the bounds under which Deep Learning models operate, a problem addressed by assorted approaches aimed at audiences from different domains. However, as applications increasingly target non-expert users, it becomes mandatory to give those users the means to trust the model, just as a human gets familiar with a system or process: by understanding the hypothetical circumstances under which it fails. This is indeed the cornerstone of this research work: to undertake an adversarial analysis of a Deep Learning model. The proposed framework constructs counterfactual examples while ensuring their plausibility, i.e., there is a reasonable probability that a human could generate them without resorting to a computer program. Therefore, this work must be regarded as a valuable auditing exercise of the usable bounds within which a given model is constrained, thereby allowing for a much greater understanding of the capabilities and pitfalls of a model used in a real application. To this end, a Generative Adversarial Network (GAN) and multi-objective heuristics are used to furnish a plausible attack on the audited model, efficiently balancing the confusion of this model with the intensity and plausibility of the generated counterfactual. Its utility is showcased within a human face classification task, unveiling the enormous potential of the proposed framework.	en
dc.description.sponsorship	The authors would like to thank the Basque Government for its support through the EMAITEK and ELKARTEK funding programs. Javier Del Ser also receives support from the Consolidated Research Group MATHMODE (IT1294-19) granted by the Department of Education of the Basque Government.
dc.description.status	Peer reviewed
dc.identifier.citation	Barredo-Arrieta, A & Del Ser, J 2020, Plausible Counterfactuals: Auditing Deep Learning Classifiers with Realistic Adversarial Examples. in 2020 International Joint Conference on Neural Networks, IJCNN 2020 - Proceedings., 9206728, Proceedings of the International Joint Conference on Neural Networks, Institute of Electrical and Electronics Engineers Inc., 2020 International Joint Conference on Neural Networks, IJCNN 2020, Virtual, Glasgow, United Kingdom, 19/07/20. https://doi.org/10.1109/IJCNN48605.2020.9206728
dc.identifier.citation	conference
dc.identifier.doi	10.1109/IJCNN48605.2020.9206728
dc.identifier.isbn	9781728169262
dc.identifier.url	http://www.scopus.com/inward/record.url?scp=85093831442&partnerID=8YFLogxK
dc.language.iso	eng
dc.publisher	Institute of Electrical and Electronics Engineers Inc.
dc.relation.ispartof	2020 International Joint Conference on Neural Networks, IJCNN 2020 - Proceedings
dc.relation.ispartofseries	Proceedings of the International Joint Conference on Neural Networks
dc.relation.projectID	Department of Education of the Basque Government
dc.relation.projectID	Eusko Jaurlaritza, IT1294-19
dc.rights	info:eu-repo/semantics/openAccess
dc.subject.keywords	Counterfactuals
dc.subject.keywords	Deep Learning
dc.subject.keywords	Explainable Artificial Intelligence
dc.subject.keywords	Generative Adversarial Networks
dc.subject.keywords	Meta-heuristics
dc.subject.keywords	Multiobjective Optimization
dc.subject.keywords	Software
dc.subject.keywords	Artificial Intelligence
dc.title	Plausible Counterfactuals: Auditing Deep Learning Classifiers with Realistic Adversarial Examples	en
dc.type	conference output
