Action Quality Assessment (AQA) plays an important role in evaluating human performance in different domains, including fitness, sports, and healthcare. This work introduces a novel AQA approach by fine-Tuning large multimodal models (LMMs) for personalized activity evaluation. We used the Fitness-AQA Dataset, which provides detailed annotations of exercise errors under realistic conditions, and we adapt the LLaVA-Video model, a state-of-The-Art LMM comprising the Qwen2 large language model and the SigLIP vision encoder. We have implemented a customized data preparation pipeline that transforms video-based exercise annotations into a conversational format specific for fine-Tuning. To our knowledge, this study is among the first to fine-Tune LMMs for AQA tasks and the very first to explore activity evaluation in this context. The experimental evaluation shows that our model achieves results slightly lower than the baseline, even though it is able to generalize across multiple exercises. The full-reproducible code is available on GitHub https://github.com/GaetanoDibenedetto/UMAP25.
Fine-Tuning Large Multimodal Models for Fitness Action Quality Assessment
Dibenedetto, Gaetano
Conceptualization
;Polignano, Marco
Conceptualization
;Lops, PasqualeSupervision
2025-01-01
Abstract
Action Quality Assessment (AQA) plays an important role in evaluating human performance in different domains, including fitness, sports, and healthcare. This work introduces a novel AQA approach by fine-Tuning large multimodal models (LMMs) for personalized activity evaluation. We used the Fitness-AQA Dataset, which provides detailed annotations of exercise errors under realistic conditions, and we adapt the LLaVA-Video model, a state-of-The-Art LMM comprising the Qwen2 large language model and the SigLIP vision encoder. We have implemented a customized data preparation pipeline that transforms video-based exercise annotations into a conversational format specific for fine-Tuning. To our knowledge, this study is among the first to fine-Tune LMMs for AQA tasks and the very first to explore activity evaluation in this context. The experimental evaluation shows that our model achieves results slightly lower than the baseline, even though it is able to generalize across multiple exercises. The full-reproducible code is available on GitHub https://github.com/GaetanoDibenedetto/UMAP25.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.


