A dynamic system for the automatic classification of complex resources in digital libraries: design and preliminary evaluation

Barbuti, Nicola; Caldarola, Tommaso

doi:10.1108/dlp-05-2025-0063

Purpose – Automatic classification of digital images has become a critical field in machine learning. With the exponential growth in available digital data and the increasing complexity of applications, accurate and efficient automatic classification models are urgently needed across multiple sectors, where they help improve operational efficiency. This calls for new approaches that overcome these challenges and further improve the accuracy and efficiency of classification techniques. This paper presents ongoing research on developing and testing an automatic image classification model for digital libraries, based on convolutional neural networks (CNNs).

Design/methodology/approach – Despite the significant advances brought by deep learning, automatic classification of digital resources still faces challenges in generalization, model interpretability and dependence on large training data sets. After outlining the state of the art in automatic classification of digital resources, the paper describes the research and design of the model and the pilot of the implemented application workflow. Finally, preliminary results are assessed, bearing in mind that experimentation is still ongoing to evaluate whether integrating AI tools could enhance model performance.

Findings – The research addresses the challenge of developing a model that effectively classifies digital resources across the large and varied contexts of digital libraries, including resources representative of manuscripts, early printed books and modern books. The process followed to develop the first pilot of the automatic classification system is clearly outlined.
The workflow and the generation of a CNN specific to classifying digital library resources are detailed through examples, figures and tables that show each step of the process and describe the methods, techniques and technologies used.

Research limitations/implications – There are no research limitations/implications.

Practical implications – There are no practical implications.

Social implications – There are no social implications.

Originality/value – The experimental results give an encouraging overall picture of the developed models’ performance, highlighting their potential for analyzing and classifying the structures of the materials considered. An extensive series of tests was conducted on a diverse data set to assess their effectiveness, followed by a rigorous validation procedure on an even larger sample. The pilot model achieves an average correct classification rate of 78% across the three analyzed types and over the full validation data set. The learning curve showed good convergence, suggesting that further optimization could improve overall precision, particularly by fine-tuning the hyperparameters that regulate the training process and by refining the topology of the machine learning model. Such improvements would increase accuracy and reduce uncertainty for the classes that showed greater variability.
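
To make the reported metric concrete, the following minimal sketch shows one way an average correct classification rate over three classes could be computed from true and predicted labels. It is an illustration under stated assumptions, not the authors' evaluation code: the abstract does not say whether the 78% figure is an unweighted mean of per-class rates or an overall rate, so the sketch assumes the unweighted (macro) mean. The function names, class labels and toy data are hypothetical.

```python
from collections import defaultdict


def per_class_rate(y_true, y_pred):
    """Return {class: fraction of that class's samples classified correctly}."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return {cls: correct[cls] / total[cls] for cls in total}


def average_rate(y_true, y_pred):
    """Unweighted mean of the per-class rates (assumed macro average)."""
    rates = per_class_rate(y_true, y_pred)
    return sum(rates.values()) / len(rates)


# Toy labels for illustration only; these are not data from the study.
y_true = ["manuscript", "manuscript", "early_printed",
          "early_printed", "modern_book", "modern_book"]
y_pred = ["manuscript", "early_printed", "early_printed",
          "early_printed", "modern_book", "manuscript"]

rates = per_class_rate(y_true, y_pred)
avg = average_rate(y_true, y_pred)
```

Reporting the per-class rates alongside the average would also show which of the three types contributes most to the variability mentioned above.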