XFERa: Xplainable Emotion Recognition for improving transparency and trust
De Carolis B.; Loglisci C.; Losavio V. N.; Miccoli M. G.; Palestra G.
2025-01-01
Abstract
With the growth of computing power and the availability of large datasets, deep learning models based on CNNs achieve excellent performance in facial expression recognition tasks. However, when such a model makes a prediction, it is difficult to understand on what basis the prediction was made and which facial features contribute to the classification. This paper introduces a pipeline for explainable facial expression analysis that combines Grad-CAM heatmaps, OpenFace Action Unit (AU) detection, and GPT-4 for natural language explanations. The process aligns saliency maps with facial landmarks and uses a weighted approach to merge AU intensities with activation regions. The resulting explanations describe the facial movements driving the classification and are tailored for non-expert audiences. The system enhances transparency and fosters user trust, as validated through user studies. Future work aims to reduce the computational cost by integrating image captioning with large language models for streamlined explanations.
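
To make the weighted merging of AU intensities with Grad-CAM activation regions more concrete, the following is a minimal, illustrative sketch in Python. It is not the authors' implementation: the AU-to-landmark mapping, the 68-point landmark scheme, the window radius, and the simple intensity-times-activation score are assumptions made for the example.

# Illustrative sketch (not the paper's code): weight OpenFace AU intensities by
# the mean Grad-CAM activation over the facial region each AU affects, so that
# AUs coinciding with salient regions rank highest.
import numpy as np

# Hypothetical mapping from a few Action Units to 68-point landmark indices
# (dlib-style scheme: 17-26 eyebrows, 36-47 eyes, 48-67 mouth).
AU_TO_LANDMARKS = {
    "AU01": list(range(17, 27)),   # inner brow raiser -> eyebrow points
    "AU06": list(range(36, 48)),   # cheek raiser -> eye-region points
    "AU12": list(range(48, 68)),   # lip corner puller -> mouth points
}

def region_activation(heatmap, landmarks, indices, radius=6):
    """Mean Grad-CAM activation in a square window around each landmark."""
    h, w = heatmap.shape
    values = []
    for i in indices:
        x, y = landmarks[i]
        x0, x1 = max(0, int(x) - radius), min(w, int(x) + radius + 1)
        y0, y1 = max(0, int(y) - radius), min(h, int(y) + radius + 1)
        values.append(heatmap[y0:y1, x0:x1].mean())
    return float(np.mean(values))

def weighted_au_scores(heatmap, landmarks, au_intensities):
    """Merge AU intensities with activation regions: score = intensity * activation."""
    scores = {}
    for au, idx in AU_TO_LANDMARKS.items():
        if au in au_intensities:
            scores[au] = au_intensities[au] * region_activation(heatmap, landmarks, idx)
    # Rank AUs by how strongly they coincide with the model's salient regions.
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    heatmap = rng.random((224, 224))                          # stand-in Grad-CAM map in [0, 1]
    landmarks = rng.uniform(0, 224, size=(68, 2))             # stand-in 68-point landmarks
    au_intensities = {"AU01": 1.2, "AU06": 3.4, "AU12": 2.8}  # OpenFace-style 0-5 intensities
    print(weighted_au_scores(heatmap, landmarks, au_intensities))

Ranking AUs this way favours facial movements that both fire strongly in OpenFace and coincide with the regions the CNN attended to; that ranked evidence is the kind of input a GPT-4 prompt could then verbalise for a non-expert reader.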


