Human Description in the Wild: Description of the Scene with Ensembles of AI Models

Dentamaro, Vincenzo; Gattulli, Vincenzo; Giglio, Paolo; Impedovo, Donato; Pirlo, Giuseppe

doi:10.1007/978-3-031-23028-8_32

Describing an image scene in natural language is a complex task for a machine, and many researchers have approached it with Natural Language Processing techniques. This paper presents an ensemble of Machine Learning and Computer Vision models for describing a picture in the wild. Action Recognition, Face Recognition with gender and age estimation, and Clothing Recognition are combined to generate a natural-language sentence describing the scene in the picture. The proposed technique can target multiple domains and is especially useful for preventing cyberbullying situations. In addition, an attempt is made to exceed the current state of the art (SoA) for each model.
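
The combination step described above can be illustrated with a minimal sketch. It assumes that each recognition model has already produced a label for a detected person and shows one simple, template-based way to merge those labels into a sentence. The `PersonAttributes` container, the `describe_person` function, and the sentence template are hypothetical names introduced for illustration only; the abstract does not specify how the paper generates its sentences.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class PersonAttributes:
    """Hypothetical container for the per-person outputs of the three models."""
    gender: Optional[str] = None      # face model output, e.g. "man" or "woman"
    age: Optional[int] = None         # face model output, estimated age in years
    clothing: Tuple[str, ...] = ()    # clothing model output, e.g. ("a red jacket",)
    action: Optional[str] = None      # action model output, e.g. "running"


def describe_person(p: PersonAttributes) -> str:
    """Merge the available labels into one sentence (assumed template)."""
    # Fall back to a neutral subject when no gender label is available.
    subject = p.gender if p.gender else "person"
    article = "An" if subject[0].lower() in "aeiou" else "A"
    parts = [f"{article} {subject}"]
    if p.age is not None:
        parts.append(f"of about {p.age} years")
    if p.clothing:
        parts.append("wearing " + " and ".join(p.clothing))
    # Use a generic predicate when no action was recognized.
    parts.append(f"is {p.action}" if p.action else "is in the scene")
    return " ".join(parts) + "."


example = PersonAttributes(
    gender="man", age=30, clothing=("a red jacket", "jeans"), action="running"
)
print(describe_person(example))
```

Each field is optional, so a missing prediction from one model only shortens the sentence; it does not prevent a description from being produced.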