Assessing and Improving the Multilingual Visual Word Sense Disambiguation Ability of Vision-Language Models

Lucia Siciliani; Pierpaolo Basile; Giovanni Semeraro
2025-01-01

Abstract

Vision-Language Models (VLMs) have demonstrated remarkable multimodal understanding. Owing to their extensive training, they excel at tasks such as visual question answering and image retrieval, and their generalization ability enables them to address novel and complex challenges. In this study, we evaluate the capability of VLMs on the Visual Word Sense Disambiguation (VWSD) task. Specifically, we examine their ability to select the correct image from a set of candidates for a given lemma based on minimal contextual information (a few additional words). We also evaluate the ability of VLMs to solve this task across multiple languages and analyze the performance of multimodal encoder-based and generative VLMs.

Use this identifier to cite or link to this document: https://hdl.handle.net/11586/556689