See the Movie, Hear the Song, Read the Book: Extending MovieLens-1M, Last.fm-2K, and DBbook with Multimodal Data
Giuseppe Spillo, Cataldo Musto, Marco de Gemmis, Pasquale Lops, Giovanni Semeraro
2025-01-01
Abstract
The last few years have seen growing interest from the RecSys community in multimodal recommendation, as shown by the numerous contributions proposed in the literature. Our paper falls within this research line: we release a multimodal extension of three well-established datasets (MovieLens-1M, DBbook, and Last.fm-2K) in the movie, book, and music recommendation domains, respectively. Although these datasets have been widely adopted for classical recommendation tasks (e.g., collaborative filtering), their use in multimodal recommendation has been hindered by the absence of multimodal information. To fill this gap, we manually collected raw multimodal item files covering different modalities (text, images, audio, and video, when available) for each dataset. Specifically, for MovieLens-1M we collected movie plots (text), movie posters (images), and movie trailers (audio and video); for Last.fm-2K we collected, for each artist, the tags provided by users (text), the most popular album covers (images), and the most popular songs (audio); finally, for DBbook we collected book abstracts (text) and book covers (images). We encoded all this information using state-of-the-art feature encoders and released the extended datasets, which include mappings to the raw multimodal information as well as the encoded features. Finally, we conducted a benchmark analysis of several recommendation models using MMRec as a multimodal recommendation framework. Our results show that multimodal information can further enhance recommendation quality in these domains compared to collaborative filtering alone. We release the multimodal versions of these datasets to foster this research line, including links to download the raw multimodal files and the encoded item features.
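
The abstract does not specify which feature encoders were used. As a minimal illustrative sketch only, the Python snippet below shows one plausible way to encode item texts (plots, abstracts, tags) and item images (posters, covers) into dense feature matrices with off-the-shelf pretrained models (Sentence-Transformers for text, CLIP for images). The encoder choices, file paths, and output file name are assumptions for illustration, not the paper's actual pipeline.

    # Hedged sketch: encoding multimodal item data into dense feature vectors.
    # The encoders (MiniLM, CLIP), paths, and file names below are illustrative
    # assumptions; the paper does not state which encoders it actually used.
    import numpy as np
    import torch
    from PIL import Image
    from sentence_transformers import SentenceTransformer
    from transformers import CLIPModel, CLIPProcessor

    # Text encoder: maps a plot/abstract/tag string to a 384-dim embedding.
    text_encoder = SentenceTransformer("all-MiniLM-L6-v2")

    # Image encoder: CLIP vision tower, producing 512-dim embeddings.
    clip_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    clip_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    def encode_texts(texts: list[str]) -> np.ndarray:
        """Encode item texts (e.g., movie plots) into an (N, 384) matrix."""
        return text_encoder.encode(texts, convert_to_numpy=True)

    def encode_images(paths: list[str]) -> np.ndarray:
        """Encode item images (e.g., posters or covers) into an (N, 512) matrix."""
        images = [Image.open(p).convert("RGB") for p in paths]
        inputs = clip_processor(images=images, return_tensors="pt")
        with torch.no_grad():
            feats = clip_model.get_image_features(**inputs)
        return feats.cpu().numpy()

    # Illustrative usage (paths and file names are hypothetical):
    # text_feats = encode_texts(["A hobbit inherits a mysterious ring..."])
    # img_feats = encode_images(["posters/0001.jpg"])
    # np.save("image_feat.npy", img_feats)  # feature file consumable by MMRec (assumed name)

Frameworks such as MMRec typically consume per-modality feature matrices of this kind, aligned row-by-row with the item index; the mappings released with the extended datasets would serve exactly that alignment role.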