An Italian Patent Multi-Label Classification System to Support the Innovation Demand and Supply Matching
Amoroso, Nicola;Pantaleo, Ester;Tangaro, Sabina;Monaco, Alfonso;Bellotti, Roberto
2025-01-01
Abstract
Matching innovation demand and supply requires an accurate, time-consuming analysis of patents and the identification of their technological domains. Because these tasks can be particularly challenging, recent studies have evaluated Artificial Intelligence approaches based on Natural Language Processing (NLP) techniques. Here, we present an automated workflow for patent analysis and classification dedicated to the Italian patent scenario. We investigated high-quality data from the online platform KnowledgeShare (KS), the first patent management platform on the Italian innovation scene. An equally important goal was to determine which words most influenced patent classification, thus characterizing the corresponding research areas. Several models were compared to ensure the workflow's robustness; Logistic Regression (LR) was the best-performing model, and its performance compared well with the state of the art. For each technological domain in the KS database, we evaluated and discussed its characteristic words. A further analysis focused on explaining why some domains, such as "Packaging" and "Environment," were particularly confounding. This last aspect is essential for identifying cross-contamination effects among research areas.
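As an illustration of the kind of workflow the abstract describes, the following is a minimal sketch of multi-label text classification with one binary logistic regression per domain, followed by a ranking of the words with the largest positive weights. It is not the authors' implementation. The toy corpus, the bag-of-words features, the plain stochastic gradient descent training loop, and all function names (`train_binary_lr`, `predict`, `top_words`) are assumptions made for this example, and the texts are invented rather than taken from KnowledgeShare.

```python
import math
from collections import defaultdict

# Invented toy corpus: (text, set of domain labels). Not KnowledgeShare data.
DOCS = [
    ("biodegradable film for food packaging", {"Packaging", "Environment"}),
    ("recyclable cardboard box packaging", {"Packaging"}),
    ("wastewater treatment filter membrane", {"Environment"}),
    ("air pollution sensor monitoring", {"Environment"}),
    ("sealed container packaging lid", {"Packaging"}),
    ("soil pollution remediation process", {"Environment"}),
]


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real pipelines do more)."""
    return text.lower().split()


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_binary_lr(docs, label, epochs=200, lr=0.5):
    """Fit one binary logistic regression on word counts with plain SGD."""
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for text, labels in docs:
            tokens = tokenize(text)
            p = sigmoid(bias + sum(weights[t] for t in tokens))
            err = (1.0 if label in labels else 0.0) - p
            bias += lr * err
            for t in tokens:
                weights[t] += lr * err
    return dict(weights), bias


def predict(models, text, threshold=0.5):
    """Return every label whose binary model scores the text above threshold."""
    tokens = tokenize(text)
    predicted = set()
    for label, (weights, bias) in models.items():
        z = bias + sum(weights.get(t, 0.0) for t in tokens)
        if sigmoid(z) >= threshold:
            predicted.add(label)
    return predicted


def top_words(weights, k=3):
    """Words with the largest positive weights, i.e. most indicative of a label."""
    return sorted(weights, key=weights.get, reverse=True)[:k]


# One independent binary classifier per domain (one-vs-rest).
LABELS = sorted({label for _, labels in DOCS for label in labels})
MODELS = {label: train_binary_lr(DOCS, label) for label in LABELS}

if __name__ == "__main__":
    for label in LABELS:
        print(label, top_words(MODELS[label][0]))
    print(predict(MODELS, "recyclable cardboard box packaging"))
```

Training one binary model per domain lets a patent receive several labels at once, and the per-word weights of a linear model can be read directly as indicators of which terms push a text toward a domain. A word that receives large weights in more than one domain model is a simple hint of the overlap between domains that the abstract refers to as confounding.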


