Fine-Tuning Large Multimodal Models for Fitness Action Quality Assessment

IRIS

Action Quality Assessment (AQA) plays an important role in evaluating human performance in different domains, including fitness, sports, and healthcare. This work introduces a novel AQA approach by fine-Tuning large multimodal models (LMMs) for personalized activity evaluation. We used the Fitness-AQA Dataset, which provides detailed annotations of exercise errors under realistic conditions, and we adapt the LLaVA-Video model, a state-of-The-Art LMM comprising the Qwen2 large language model and the SigLIP vision encoder. We have implemented a customized data preparation pipeline that transforms video-based exercise annotations into a conversational format specific for fine-Tuning. To our knowledge, this study is among the first to fine-Tune LMMs for AQA tasks and the very first to explore activity evaluation in this context. The experimental evaluation shows that our model achieves results slightly lower than the baseline, even though it is able to generalize across multiple exercises. The full-reproducible code is available on GitHub https://github.com/GaetanoDibenedetto/UMAP25.

Fine-Tuning Large Multimodal Models for Fitness Action Quality Assessment

Dibenedetto, Gaetano^{Conceptualization};Polignano, Marco^{Conceptualization};Lops, Pasquale^Supervision

2025-01-01

Abstract

Action Quality Assessment (AQA) plays an important role in evaluating human performance in different domains, including fitness, sports, and healthcare. This work introduces a novel AQA approach by fine-Tuning large multimodal models (LMMs) for personalized activity evaluation. We used the Fitness-AQA Dataset, which provides detailed annotations of exercise errors under realistic conditions, and we adapt the LLaVA-Video model, a state-of-The-Art LMM comprising the Qwen2 large language model and the SigLIP vision encoder. We have implemented a customized data preparation pipeline that transforms video-based exercise annotations into a conversational format specific for fine-Tuning. To our knowledge, this study is among the first to fine-Tune LMMs for AQA tasks and the very first to explore activity evaluation in this context. The experimental evaluation shows that our model achieves results slightly lower than the baseline, even though it is able to generalize across multiple exercises. The full-reproducible code is available on GitHub https://github.com/GaetanoDibenedetto/UMAP25.

Scheda breve

Scheda completa

Scheda completa (DC)

Anno

2025

File in questo prodotto:

Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11586/550749

Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni

ND

2

1

social impact