Human Description in the Wild: Description of the Scene with Ensembles of AI Models
Vincenzo Dentamaro;Vincenzo Gattulli;Paolo Giglio;Donato Impedovo;Giuseppe Pirlo
2022-01-01
Abstract
Describing an image scene in natural language is a very complex task for a machine. Many researchers have approached it with Natural Language Processing techniques. In this paper, Machine Learning and Computer Vision models are presented with the purpose of describing a picture in the wild. Action Recognition, Face Recognition with gender and age estimation, and Clothing Recognition models are combined to generate a natural-language sentence describing the scene in the picture. The proposed technique can be applied to multiple domains and is particularly useful for preventing cyberbullying. In addition, an attempt is made to exceed the current state of the art (SoA) for each model.
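The abstract does not specify how the outputs of the individual models are merged into a sentence. As an illustration only, the sketch below assumes the simplest possible fusion: each recognizer has already produced a label for a detected person, and those labels are slotted into a fixed sentence template. The `PersonDetections` class, its fields, and the `describe` and `join_items` functions are hypothetical names introduced here; this is not the authors' method.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PersonDetections:
    """Hypothetical per-person outputs of the three recognizers (assumed format)."""
    gender: str          # label from a face model, e.g. "man" or "woman" (assumed label set)
    age: int             # estimated age in years from a face model
    clothing: List[str]  # garment phrases from a clothing recognizer, e.g. "a red jacket"
    action: str          # verb phrase from an action recognizer, e.g. "running"


def join_items(items: List[str]) -> str:
    """Join phrases as 'a', 'a and b', or 'a, b and c'."""
    if not items:
        return ""
    if len(items) == 1:
        return items[0]
    return ", ".join(items[:-1]) + " and " + items[-1]


def describe(person: PersonDetections) -> str:
    """Fill a fixed sentence template with the models' outputs (hypothetical fusion)."""
    sentence = f"A {person.gender} aged about {person.age}"
    clothes = join_items(person.clothing)
    if clothes:
        # Clothing is optional: the recognizer may return nothing for a person.
        sentence += f" wearing {clothes}"
    return sentence + f" is {person.action}."


if __name__ == "__main__":
    p = PersonDetections(gender="woman", age=30,
                         clothing=["a red jacket", "jeans"], action="running")
    print(describe(p))
```

Run as a script, the example prints "A woman aged about 30 wearing a red jacket and jeans is running." A fixed template keeps each model's contribution easy to trace, at the cost of the fluency a learned language model could provide.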