Recognizing the Style, Genre, and Emotion of a Work of Art Through Visual and Knowledge Graph Embeddings
Castellano, Giovanna; Scaringi, Raffaele; Vessio, Gennaro
2023-01-01
Abstract
Recognizing attributes of unknown artworks relies on more than visual information: prior knowledge and emotional context can play a crucial role. Building an AI system that mimics this perception requires a multi-modal model integrating computer vision and contextual factors. In this paper, we propose a new model that uses vision transformers and graph attention networks to learn the visual and contextual features of new artworks and predict their style, genre, and emotion. Contextual features are acquired from an extended version of our ArtGraph knowledge graph, enriched with emotion information from the ArtEmis dataset. Our inductive end-to-end multi-task architecture enables real-time execution and resilience to changes in the graph as it evolves. Combining computer vision and knowledge graphs could facilitate a deeper understanding of the fine arts, bridging the gap between computer science and the humanities. (The new version of the graph is available at https://doi.org/10.5281/zenodo.8172374, while the code is available at https://github.com/CILAB-ArtGraph/multi-modal-end-to-end-art-classifier)