Enhancing patient-centered care with AI: a study of responses to neuroendocrine neoplasms queries

Cives, Mauro;
2025-01-01

Abstract

Introduction: Large Language Models (LLMs) are increasingly used in oncology, but their application in neuroendocrine neoplasms (NENs) is still unexplored.

Aim: To evaluate the accuracy, clarity, and completeness of LLM responses to clinically relevant NEN management questions.

Material and methods: From October to December 2024, a team of experts posed nine key NEN management questions to three LLMs: ChatGPT Plus, Microsoft Copilot, and Perplexity. Responses were assessed by 22 expert physicians across specialties using a 5-point Likert scale for scientific accuracy, clarity, and completeness, and were additionally evaluated by 24 NEN patients for clarity and relevance. Primary outcomes were LLM performance across the evaluation criteria and the factors influencing ratings, analyzed using a linear mixed-effects model.

Results: ChatGPT Plus scored highest (M = 3.72, SE = 0.19), followed by Copilot (M = 3.54, SE = 0.19) and Perplexity (M = 3.22, SE = 0.19); clarity was the highest-rated criterion. The chemotherapy indications question received the lowest scores, underscoring LLM challenges in handling complex clinical decisions.

Discussion: This study highlights LLMs' potential in NEN management as informative tools with clear but variably accurate responses. Continuous improvement and clinician oversight are essential for their successful integration into patient communication.

Files in this item:
File: s12020-025-04294-9.pdf (not available)
Description: Article
Type: Publisher's version
License: NOT PUBLIC - Private/restricted access
Size: 1.03 MB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11586/543301