Generative AI across modalities: insights from our research on domain-aware content generation
Giovanna Castellano; Emanuele Colonna; Nicola Fanelli; Lucrezia Laraspata; Ivan Rinaldi; Alberto Gaetano Valerio; Gennaro Vessio
2025-01-01
Abstract
Generative AI is rapidly advancing, enabling the creation of diverse and high-quality content across multiple modalities, including text, images, audio, and video. However, much of this progress remains confined to general-purpose tasks, often neglecting the unique complexities and domain-specific requirements of real-world applications. In this paper, we present a unified perspective on our research at the Computational Intelligence Laboratory (CILab), University of Bari Aldo Moro, aimed at advancing domain-aware generative AI across five core tasks: text-to-text, text-to-image, text-to-video, image-to-text, and image-to-audio. Our work introduces methodological innovations such as text-guided multi-mask inpainting with localized attention, explainable image-to-text pipelines for medical reporting, and generative sign language video synthesis using diffusion models. We further contribute new datasets and propose novel evaluation frameworks tailored to domain-specific constraints, including creativity, factual alignment, inclusivity, and human-centered assessment. Through applications in healthcare, cultural heritage, accessibility, and human capital management, we show how generative AI can be used not only for content creation but also as a driver of creativity, inclusivity, and knowledge transfer. This paper argues for a shift toward more grounded, explainable, and domain-specific generative models. By addressing key gaps in current practices, we aim to pave the way for impactful AI applications in complex, real-world scenarios.


