This research investigates the application of advanced computational methodologies to the analysis of unstructured textual data derived from post-purchase product reviews on online platforms. Recognizing the inherent linguistic variability and noise within this data modality, a rigorous pre-processing pipeline is proposed. This pipeline emphasizes critical steps such as text normalization, feature extraction, and data balancing techniques to mitigate class imbalance and ensure the generation of high-quality input for subsequent modeling. The study introduces a novel theoretical and computational framework leveraging high-performance Machine Learning (ML) algorithms, with a specific focus on Ensemble Learning paradigms. The research primarily investigates Ensemble Machine Learning Methods, which derive their effectiveness from combining multiple decision trees. The study focuses on identifying and refining the most efficient ensemble method for this task. Furthermore, the research emphasizes the importance of evaluating performance metrics pertinent to minority classes and presents a comparative analysis of model outcomes achieved with both balanced and imbalanced datasets, highlighting the impact of pre-processing strategies on predictive accuracy and fairness.
Optimizing Sentiment Classification in Non-structured Text Data Using Ensemble Method
Firza Najada
;Viola Domenico;
2025-01-01
Abstract
This research investigates the application of advanced computational methodologies to the analysis of unstructured textual data derived from post-purchase product reviews on online platforms. Recognizing the inherent linguistic variability and noise within this data modality, a rigorous pre-processing pipeline is proposed. This pipeline emphasizes critical steps such as text normalization, feature extraction, and data balancing techniques to mitigate class imbalance and ensure the generation of high-quality input for subsequent modeling. The study introduces a novel theoretical and computational framework leveraging high-performance Machine Learning (ML) algorithms, with a specific focus on Ensemble Learning paradigms. The research primarily investigates Ensemble Machine Learning Methods, which derive their effectiveness from combining multiple decision trees. The study focuses on identifying and refining the most efficient ensemble method for this task. Furthermore, the research emphasizes the importance of evaluating performance metrics pertinent to minority classes and presents a comparative analysis of model outcomes achieved with both balanced and imbalanced datasets, highlighting the impact of pre-processing strategies on predictive accuracy and fairness.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.


