Conversational Recommender Systems (CRSs) facilitate item discovery through multi-turn dialogues that elicit user preferences via natural language interaction. This field has gained significant attention following advancements in Natural Language Processing (NLP) enabled by Large Language Models (LLMs). However, current CRS research remains constrained by datasets with fundamental limitations. Human-generated datasets suffer from inconsistent dialogue quality, limited domain expertise, and insufficient scale for real-world application, while synthetic datasets created with proprietary LLMs ignore the diversity of real-world user behavior and present significant barriers to accessibility and reproducibility. The development of effective CRSs depends critically on addressing these deficiencies.To this end, we present DistillRecDial, a novel conversational recommendation dataset generated through a knowledge distillation pipeline that leverages smaller, more accessible open LLMs. Crucially, DistillRecDial simulates a range of user types with varying intentions, preference expression styles, and initiative levels, capturing behavioral diversity that is largely absent from prior work. Human evaluation demonstrates that our dataset significantly outperforms widely adopted CRS datasets in dialogue coherence and domain-specific expertise, indicating its potential to advance the development of more realistic and effective conversational recommender systems.
DistillRecDial: A Knowledge-Distilled Dataset Capturing User Diversity in Conversational Recommendation
Martina, Alessandro Francesco Maria;Petruzzelli, Alessandro;Musto, Cataldo;de Gemmis, Marco;Lops, Pasquale;Semeraro, Giovanni
2025-01-01
Abstract
Conversational Recommender Systems (CRSs) facilitate item discovery through multi-turn dialogues that elicit user preferences via natural language interaction. This field has gained significant attention following advancements in Natural Language Processing (NLP) enabled by Large Language Models (LLMs). However, current CRS research remains constrained by datasets with fundamental limitations. Human-generated datasets suffer from inconsistent dialogue quality, limited domain expertise, and insufficient scale for real-world application, while synthetic datasets created with proprietary LLMs ignore the diversity of real-world user behavior and present significant barriers to accessibility and reproducibility. The development of effective CRSs depends critically on addressing these deficiencies.To this end, we present DistillRecDial, a novel conversational recommendation dataset generated through a knowledge distillation pipeline that leverages smaller, more accessible open LLMs. Crucially, DistillRecDial simulates a range of user types with varying intentions, preference expression styles, and initiative levels, capturing behavioral diversity that is largely absent from prior work. Human evaluation demonstrates that our dataset significantly outperforms widely adopted CRS datasets in dialogue coherence and domain-specific expertise, indicating its potential to advance the development of more realistic and effective conversational recommender systems.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.


