
XFERa: Xplainable Emotion Recognition for improving transparency and trust

De Carolis B.; Loglisci C.; Losavio V. N.; Miccoli M. G.; Palestra G.
2025-01-01

Abstract

With increasing computing power and the availability of large datasets, CNN-based deep learning models achieve excellent performance in facial expression recognition tasks. However, when such a model makes a prediction, it is often difficult to understand the basis for that prediction and which facial features contribute to the classification. This paper introduces a pipeline for explainable facial expression analysis, combining Grad-CAM heatmaps, OpenFace Action Unit (AU) detection, and GPT-4 for natural language explanations. The process aligns saliency maps with facial landmarks and uses a weighted approach to merge AU intensities with activation regions. The resulting explanations describe the facial movements driving the classification and are tailored for non-expert audiences. The system enhances transparency and fosters user trust, as validated through user studies. Future work aims to reduce the computational cost by integrating image captioning with large language models for streamlined explanations.
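As an illustration of the weighted fusion described in the abstract, the sketch below shows one plausible way to combine Grad-CAM activations with OpenFace AU intensities over landmark-defined facial regions. The AU-to-landmark mapping, the window radius, and the weighting factor alpha are assumptions for illustration only, not the paper's actual parameters.

    import numpy as np

    # Hypothetical mapping from a few OpenFace AUs to 68-point landmark indices;
    # the paper's actual region definitions are not specified in the abstract.
    AU_LANDMARKS = {
        "AU04": [19, 20, 21, 22, 23, 24],   # brow lowerer -> eyebrow points
        "AU06": [1, 2, 14, 15, 41, 46],     # cheek raiser -> cheek / eye corners
        "AU12": [48, 54, 60, 64],           # lip corner puller -> mouth corners
    }

    def au_region_scores(heatmap, landmarks, au_intensities, radius=12, alpha=0.5):
        """Combine Grad-CAM activation with AU intensities per facial region.

        heatmap:        HxW saliency map in [0, 1], aligned to the face crop
        landmarks:      array of (x, y) facial landmark coordinates
        au_intensities: dict AU name -> OpenFace intensity in [0, 5]
        alpha:          assumed weight balancing saliency vs. AU intensity
        """
        h, w = heatmap.shape
        scores = {}
        for au, idxs in AU_LANDMARKS.items():
            # Average saliency in small windows around the AU's landmarks.
            acts = []
            for x, y in landmarks[idxs]:
                x0, x1 = max(0, int(x) - radius), min(w, int(x) + radius)
                y0, y1 = max(0, int(y) - radius), min(h, int(y) + radius)
                acts.append(heatmap[y0:y1, x0:x1].mean())
            saliency = float(np.mean(acts))
            intensity = au_intensities.get(au, 0.0) / 5.0  # normalize to [0, 1]
            scores[au] = alpha * saliency + (1 - alpha) * intensity
        return scores

In such a pipeline, the highest-scoring AUs (with their standard verbal descriptions) would then be inserted into a prompt for GPT-4 to generate the natural language explanation of the predicted expression.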

Use this identifier to cite or link to this document: https://hdl.handle.net/11586/542946