Adapting a Large Language Model to the Legal Domain: A Case Study in Italian
Basile P.; de Gemmis M.
2024-01-01
Abstract
This work presents a methodology for adapting an open Large Language Model (LLM) to the Italian legal domain. We build a corpus of legal documents from the Normattiva website, using a custom scraper to ensure high-quality text extraction. The corpus is then used to adapt the Llama-3.1-8b model through continuous pre-training and Low-Rank Adaptation (LoRA). We evaluate the adapted model on its ability to complete sentences coherently within the new domain. The adapted model outperforms the original model on all metrics, across various prompt lengths and training-corpus sizes.
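As a rough illustration of the LoRA technique named in the abstract, the sketch below shows the general idea: the pretrained weight matrix W stays frozen, and only a low-rank pair A, B is trained, with the update scaled by alpha / r. This is a generic, pure-Python sketch and not the authors' training code; the dimensions, the rank r, the scaling alpha, and the helper names (`matmul`, `lora_forward`, `lora_param_count`) are all assumptions chosen for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_forward(x, W, A, B, alpha, r):
    """Compute x @ W + (alpha / r) * (x @ A @ B).

    W is the frozen pretrained weight (d_in x d_out); A (d_in x r) and
    B (r x d_out) are the only trainable matrices.
    """
    base = matmul(x, W)
    update = matmul(matmul(x, A), B)
    scale = alpha / r
    return [
        [b + scale * u for b, u in zip(base_row, update_row)]
        for base_row, update_row in zip(base, update)
    ]


def lora_param_count(d_in, d_out, r):
    """Number of trainable parameters added by one LoRA adapter."""
    return d_in * r + r * d_out


# Toy sizes, assumed only for this illustration.
random.seed(0)
d_in, d_out, r, alpha = 4, 3, 2, 4
W = [[random.gauss(0, 1) for _ in range(d_out)] for _ in range(d_in)]
A = [[random.gauss(0, 0.01) for _ in range(r)] for _ in range(d_in)]
B = [[0.0] * d_out for _ in range(r)]  # zero init: the adapter starts as a no-op
x = [[1.0, 2.0, 3.0, 4.0]]

y = lora_forward(x, W, A, B, alpha, r)
```

Because B is initialised to zero, the adapted layer initially reproduces the frozen model exactly, and training moves it away from that starting point only through A and B. For a hypothetical 4096 x 4096 projection with r = 8, `lora_param_count(4096, 4096, 8)` gives 65,536 trainable parameters, against 16,777,216 in the full matrix.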


