Issue Classification with LLMs: an Empirical Study of the NASA Flight Software Systems
Giuseppe Colavito; Filippo Lanubile; Nicole Novielli
2026-01-01
Abstract
NASA collects vast amounts of problem data for space projects, including not only defect descriptions but also enhancements and other issue reports. The growing complexity of Flight Software has led to an increase in the volume of problem reports, presenting both opportunities and challenges for data analysis. This paper explores AI-based solutions for classifying software issue reports in NASA's spacecraft control systems. In particular, we aim to develop an accurate classifier for identifying bug tickets, building on previous research in automated issue labeling. We conduct a benchmark study comparing various language models and provide insights into their performance and deployment costs, with the goal of improving issue classification for NASA's growing software complexity. Based on our results, we provide empirically driven guidelines on how to address the tradeoff between the need for manual labeling of training data and the computational costs associated with the on-premise deployment of LLMs that could be used in a zero-shot setting.
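The zero-shot setting mentioned above can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the prompt wording is an assumption, and `mock_llm` is a keyword-based stand-in for a call to a real (e.g., on-premise) LLM.

```python
# Sketch of zero-shot issue classification: the model receives only a task
# description and the issue text, with no labeled training examples.

PROMPT_TEMPLATE = (
    "Classify the following software issue report as BUG or NON-BUG.\n"
    "Answer with a single word.\n\n"
    "Issue: {text}\n"
    "Label:"
)

def mock_llm(prompt: str) -> str:
    # Stand-in for a real LLM call, used here only so the sketch runs:
    # a crude keyword heuristic over the prompt text.
    bug_keywords = ("crash", "error", "fail", "exception", "incorrect")
    lowered = prompt.lower()
    return "BUG" if any(k in lowered for k in bug_keywords) else "NON-BUG"

def classify_issue(text: str, llm=mock_llm) -> str:
    """Return 'BUG' or 'NON-BUG' for one issue report, zero-shot."""
    return llm(PROMPT_TEMPLATE.format(text=text))

print(classify_issue("Telemetry parser crashes on malformed packet"))  # BUG
print(classify_issue("Add support for a new downlink data format"))    # NON-BUG
```

In a real deployment, `mock_llm` would be replaced by an actual model invocation; the benchmark's tradeoff is then between this labeling-free setup and fine-tuned classifiers that require manually labeled training data.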


