This study proposes a novel, scalable framework for the automated classification and synthesis of survey literature by integrating state-of-the-art Large Language Models (LLMs) with robust ensemble voting techniques. The framework consolidates predictions from three independent models—GPT-4, LLaMA 3.3, and Claude 3—to generate consensus-based classifications, thereby enhancing reliability and mitigating individual model biases. We demonstrate the generalizability of our approach through comprehensive evaluation on two distinct domains: Question Answering (QA) systems and Computer Vision (CV) survey literature, using a dataset of 1154 real papers extracted from arXiv. Comprehensive visual evaluation tools, including distribution charts, heatmaps, confusion matrices, and statistical validation metrics, are employed to rigorously assess model performance and inter-model agreement. The framework incorporates advanced statistical measures, including k-fold cross-validation, Fleiss’ kappa for inter-rater reliability, and chi-square tests for independence to validate classification robustness. Extensive experimental evaluations demonstrate that this ensemble approach achieves superior performance compared to individual models, with accuracy improvements of 10.0% over the best single model on QA literature and 10.9% on CV literature. Furthermore, comprehensive cost–benefit analysis reveals that our automated approach reduces manual literature synthesis time by 95% while maintaining high classification accuracy (F1-score: 0.89 for QA, 0.87 for CV), making it a practical solution for large-scale literature analysis. The methodology effectively uncovers emerging research trends and persistent challenges across domains, providing researchers with powerful tools for continuous literature monitoring and informed decision-making in rapidly evolving scientific fields.

Integrated Survey Classification and Trend Analysis via LLMs: An Ensemble Approach for Robust Literature Synthesis

Bernasconi E.;Redavid D.;Ferilli S.
2025-01-01

Abstract

This study proposes a novel, scalable framework for the automated classification and synthesis of survey literature by integrating state-of-the-art Large Language Models (LLMs) with robust ensemble voting techniques. The framework consolidates predictions from three independent models—GPT-4, LLaMA 3.3, and Claude 3—to generate consensus-based classifications, thereby enhancing reliability and mitigating individual model biases. We demonstrate the generalizability of our approach through comprehensive evaluation on two distinct domains: Question Answering (QA) systems and Computer Vision (CV) survey literature, using a dataset of 1154 real papers extracted from arXiv. Comprehensive visual evaluation tools, including distribution charts, heatmaps, confusion matrices, and statistical validation metrics, are employed to rigorously assess model performance and inter-model agreement. The framework incorporates advanced statistical measures, including k-fold cross-validation, Fleiss’ kappa for inter-rater reliability, and chi-square tests for independence to validate classification robustness. Extensive experimental evaluations demonstrate that this ensemble approach achieves superior performance compared to individual models, with accuracy improvements of 10.0% over the best single model on QA literature and 10.9% on CV literature. Furthermore, comprehensive cost–benefit analysis reveals that our automated approach reduces manual literature synthesis time by 95% while maintaining high classification accuracy (F1-score: 0.89 for QA, 0.87 for CV), making it a practical solution for large-scale literature analysis. The methodology effectively uncovers emerging research trends and persistent challenges across domains, providing researchers with powerful tools for continuous literature monitoring and informed decision-making in rapidly evolving scientific fields.
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11586/582105
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 2
  • ???jsp.display-item.citation.isi??? 2
social impact