Generative AI across modalities: insights from our research on domain-aware content generation

Giovanna Castellano;Emanuele Colonna;Nicola Fanelli;Lucrezia Laraspata;Ivan Rinaldi;Alberto Gaetano Valerio;Gennaro Vessio
2025-01-01

Abstract

Generative AI is rapidly advancing, enabling the creation of diverse and high-quality content across multiple modalities, including text, images, audio, and video. However, much of this progress remains confined to general-purpose tasks, often neglecting the unique complexities and domain-specific requirements of real-world applications. In this paper, we present a unified perspective on our research at the Computational Intelligence Laboratory (CILab), University of Bari Aldo Moro, aimed at advancing domain-aware generative AI across five core tasks: text-to-text, text-to-image, text-to-video, image-to-text, and image-to-audio. Our work introduces methodological innovations such as text-guided multi-mask inpainting with localized attention, explainable image-to-text pipelines for medical reporting, and generative sign language video synthesis using diffusion models. We further contribute new datasets and propose novel evaluation frameworks tailored to domain-specific constraints, including creativity, factual alignment, inclusivity, and human-centered assessment. Through applications in healthcare, cultural heritage, accessibility, and human capital management, we show how generative AI can be used for content creation and as a driver of creativity, inclusivity, and knowledge transfer. This paper argues for a shift toward more grounded, explainable, and domain-specific generative models. We aim to pave the way for impactful AI applications in complex, real-world scenarios by addressing key gaps in current practices.
Files for this item:
There are no files associated with this item.

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11586/560683
Warning: the data displayed have not been validated by the university.
