LLMs to Detect Cyber Child Abuse in Textual Conversations
Baldassarre M. T.; Barletta V. S.; Caivano D.; Lippolis A.; Piccinno A.
2025-01-01
Abstract
In contemporary online interactions, identifying inappropriate language and safeguarding minors from harmful communication is a critical challenge. This study explores the use of Large Language Models (LLMs) to analyze text, detecting patterns indicative of age-specific language and the presence of sexual or pornographic references. The LLaMAntino model was fine-tuned on a dataset of synthetically generated sentences designed to replicate real-world scenarios. The fine-tuned model demonstrated enhanced performance compared to its baseline (LLaMAntino 3 ANITA 8B), providing detailed and context-sensitive explanations for its classifications. The results highlight the potential of LLMs to address sensitive linguistic phenomena with precision, offering a foundation for detecting indirect combinations of sexual references in conversations involving minors. Future work can focus on incorporating real conversational data and involving subject-matter experts to refine the model's interpretability and reliability. Additionally, the exploration of advanced architectures and fine-tuning techniques will be considered to further balance model complexity and processing efficiency.


