Assessing and Improving the Multilingual Visual Word Sense Disambiguation Ability of Vision-Language Models
Lucia Siciliani; Pierpaolo Basile; Giovanni Semeraro
2025-01-01
Abstract
Vision-Language Models (VLMs) have demonstrated remarkable multimodal understanding. Due to their extensive training, they excel in tasks such as visual question answering and image retrieval. Their impressive generalization ability enables them to address novel and complex challenges. In this study, we evaluate the capability of VLMs for the Visual Word Sense Disambiguation (VWSD) task. Specifically, we examine their ability to select the correct image from a set of candidates for a given lemma based on minimal contextual information (a few additional words). Additionally, we evaluate the ability of VLMs to solve this task across multiple languages and analyze the performance of multimodal encoder-based and generative VLMs.


