Issue Classification with LLMs: an Empirical Study of the NASA Flight Software Systems
Giuseppe Colavito; Filippo Lanubile; Nicole Novielli
2026-01-01
Abstract
NASA collects vast amounts of problem data for space projects, including not only defect descriptions but also enhancements and other issue reports. The growing complexity of Flight Software has led to an increase in the volume of problem reports, presenting both opportunities and challenges for data analysis. This paper explores AI-based solutions for classifying software issue reports in NASA's spacecraft control systems. In particular, we aim to develop an accurate classifier for identifying bug tickets, building on previous research in automated issue labeling. We conduct a benchmark study comparing various language models and provide insights into their performance and deployment costs, with the goal of improving issue classification for NASA's growing software complexity. Based on our results, we provide empirically driven guidelines on how to address the tradeoff between the need for manual labeling of training data and the computational costs associated with the on-premise deployment of LLMs that could be used in a zero-shot setting.
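The zero-shot setting mentioned above can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the prompt wording is an assumption, and `mock_llm` is a keyword-based stand-in for a call to a real (e.g., on-premise) LLM.

```python
# Sketch of zero-shot issue classification: the model receives only a task
# description and the issue text, with no labeled training examples.

PROMPT_TEMPLATE = (
    "Classify the following software issue report as BUG or NON-BUG.\n"
    "Answer with a single word.\n\n"
    "Issue: {text}\n"
    "Label:"
)

def mock_llm(prompt: str) -> str:
    # Stand-in for a real LLM call, used here only so the sketch runs:
    # a crude keyword heuristic over the prompt text.
    bug_keywords = ("crash", "error", "fail", "exception", "incorrect")
    lowered = prompt.lower()
    return "BUG" if any(k in lowered for k in bug_keywords) else "NON-BUG"

def classify_issue(text: str, llm=mock_llm) -> str:
    """Return 'BUG' or 'NON-BUG' for one issue report, zero-shot."""
    return llm(PROMPT_TEMPLATE.format(text=text))

print(classify_issue("Telemetry parser crashes on malformed packet"))  # BUG
print(classify_issue("Add support for a new downlink data format"))    # NON-BUG
```

In a real deployment, `mock_llm` would be replaced by an actual model invocation; the benchmark's tradeoff is then between this labeling-free setup and fine-tuned classifiers that require manually labeled training data.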


