Optimizing Sentiment Classification in Non-structured Text Data Using Ensemble Method

Firza Najada; Viola Domenico
2025-01-01

Abstract

This research applies advanced computational methods to unstructured text drawn from post-purchase product reviews on online platforms. Because such text is linguistically variable and noisy, the study proposes a rigorous pre-processing pipeline built on text normalization, feature extraction, and data balancing, with the aim of mitigating class imbalance and producing high-quality input for modeling. It introduces a novel theoretical and computational framework based on high-performance Machine Learning (ML) algorithms, with a specific focus on ensemble learning, in particular tree-based ensemble methods, which derive their effectiveness from combining multiple decision trees. The study seeks to identify and refine the most efficient ensemble method for this task. It also stresses the importance of performance metrics for minority classes and compares model outcomes on balanced and imbalanced datasets, highlighting how pre-processing choices affect predictive accuracy and fairness.
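The abstract does not name a specific balancing technique or metric, so the following is only a minimal sketch under stated assumptions: it uses random oversampling as a stand-in balancing step and per-class precision, recall, and F1 as the minority-class metrics. The function names `random_oversample` and `per_class_metrics` and the toy data are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen samples of smaller classes until every
    class has as many samples as the largest class (assumed technique)."""
    rng = random.Random(seed)
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)
    target = max(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)}, computed one class at a time."""
    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[label] = (precision, recall, f1)
    return metrics


# Hypothetical imbalanced toy data: 8 positive reviews, 2 negative ones.
labels = ["pos"] * 8 + ["neg"] * 2
texts = [f"review {i}" for i in range(len(labels))]

balanced_texts, balanced_labels = random_oversample(texts, labels)
print(Counter(balanced_labels))

# A trivial model that always predicts the majority class is right on
# 8 of 10 samples, yet its recall on the minority class is zero.
always_pos = ["pos"] * len(labels)
print(per_class_metrics(labels, always_pos))
```

In practice, a balancing step like this would be applied to the training split only, so that evaluation still reflects the original class distribution.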
2025
ISBN: 978-3-031-95995-0
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11586/542512