Large Language Models (LLMs) offer numerous benefits, but they also pose threats, such as cybercriminals creating fake, convincing content such as phishing emails. LLMs are more convenient for criminals than handcrafting, making phishing campaigns more likely and more widespread in the future. To combat these attacks, detecting whether an email is generated by LLMs is critical. However, previous attempts have resulted in solutions that are uninterpretable and resource-intensive due to their complexity. This results in warning dialogs that do not adequately protect users. This work aims to address this problem using traditional, lightweight machine learning models that are easy to interpret and require fewer computational resources. This approach allows users to understand why an email is AI-generated, improving their decision-making in the case of phishing emails. This study has shown that logistic regression can achieve excellent performance in detecting emails generated by LLMs, while still providing the transparency needed to provide useful explanations to users.

David versus Goliath: Can Machine Learning Detect LLM-Generated Text? A Case Study in the Detection of Phishing Emails

Francesco Greco
;
Giuseppe Desolda;Andrea Esposito;
2024-01-01

Abstract

Large Language Models (LLMs) offer numerous benefits, but they also pose threats, such as cybercriminals creating fake, convincing content such as phishing emails. LLMs are more convenient for criminals than handcrafting, making phishing campaigns more likely and more widespread in the future. To combat these attacks, detecting whether an email is generated by LLMs is critical. However, previous attempts have resulted in solutions that are uninterpretable and resource-intensive due to their complexity. This results in warning dialogs that do not adequately protect users. This work aims to address this problem using traditional, lightweight machine learning models that are easy to interpret and require fewer computational resources. This approach allows users to understand why an email is AI-generated, improving their decision-making in the case of phishing emails. This study has shown that logistic regression can achieve excellent performance in detecting emails generated by LLMs, while still providing the transparency needed to provide useful explanations to users.
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11586/489120
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact