An Italian Patent Multi-Label Classification System to Support the Innovation Demand and Supply Matching

Amoroso, Nicola; Pantaleo, Ester; Tangaro, Sabina; Monaco, Alfonso; Bellotti, Roberto
2025-01-01

Abstract

Matching innovation demand with supply requires an accurate, time-consuming analysis of patents and the identification of their technological domains. Because these tasks can be particularly challenging, recent studies have evaluated the adoption of Artificial Intelligence based on Natural Language Processing (NLP) techniques. Here, we present an automated workflow for patent analysis and classification tailored to the Italian patent landscape. We investigated high-quality data from the online platform KnowledgeShare (KS), the first patent management platform on the Italian innovation scene. An equally important goal was to determine which words most influenced patent classification, thereby characterizing the corresponding research areas. Several models were compared to ensure the workflow’s robustness; Logistic Regression (LR) was the best-performing model, and its performance compared well with the state of the art. For each technological domain in the KS database, we evaluated and discussed its characteristic words. A further analysis focused on explaining why some domains, such as “Packaging” and “Environment,” were particularly confounding. This last aspect is of paramount importance for identifying cross-contamination effects among research areas.
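
The abstract does not state how the text was vectorized, which library was used, or how the multi-label problem was decomposed, so the following is only a minimal sketch under stated assumptions: binary bag-of-words features, one independent Logistic Regression per domain (a one-vs-rest decomposition), plain batch gradient descent, and the learned coefficients read as a rough proxy for each domain's characteristic words. The toy corpus, its labels, and every function name and hyperparameter below are hypothetical and are not taken from KnowledgeShare or from the paper.

```python
import math

# Hypothetical toy corpus: (text, set of domain labels). Not KnowledgeShare data.
DOCS = [
    ("biodegradable film for food packaging", {"Packaging", "Environment"}),
    ("recyclable cardboard box packaging", {"Packaging", "Environment"}),
    ("sealed tray packaging for food", {"Packaging"}),
    ("filter for wastewater treatment", {"Environment"}),
    ("sensor for air pollution monitoring", {"Environment"}),
    ("plastic bottle cap packaging", {"Packaging"}),
]


def tokenize(text):
    return text.lower().split()


def vectorize(text, index):
    """Binary bag-of-words vector (an assumed, deliberately simple feature choice)."""
    vec = [0.0] * len(index)
    for tok in tokenize(text):
        if tok in index:
            vec[index[tok]] = 1.0
    return vec


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(w, b, x):
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_binary_lr(X, y, epochs=500, lr=0.5, l2=0.01):
    """Batch gradient descent for one L2-regularised binary logistic regression."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(score(w, b, xi)) - yi  # predicted probability minus target
            grad_b += err
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n  # the bias is left unregularised
    return w, b


def train_multilabel(docs):
    """Fit one independent binary model per label (one-vs-rest)."""
    vocab = sorted({tok for text, _ in docs for tok in tokenize(text)})
    index = {tok: i for i, tok in enumerate(vocab)}
    X = [vectorize(text, index) for text, _ in docs]
    labels = sorted({lab for _, labs in docs for lab in labs})
    models = {}
    for lab in labels:
        y = [1.0 if lab in labs else 0.0 for _, labs in docs]
        models[lab] = train_binary_lr(X, y)
    return vocab, index, models


def predict(text, index, models, threshold=0.5):
    """Return every label whose predicted probability reaches the threshold."""
    x = vectorize(text, index)
    return {lab for lab, (w, b) in models.items()
            if sigmoid(score(w, b, x)) >= threshold}


def top_words(vocab, models, label, k=3):
    """Words with the largest positive coefficients for a label."""
    w, _ = models[label]
    ranked = sorted(range(len(w)), key=lambda i: w[i], reverse=True)
    return [vocab[i] for i in ranked[:k]]


vocab, index, models = train_multilabel(DOCS)
for label in models:
    print(label, top_words(vocab, models, label))
print(predict("cardboard packaging tray", index, models))
```

In this toy setup, a word shared by documents of both domains (here "packaging", which also appears in texts labelled "Environment") is the kind of overlap that could make two domains hard to separate; inspecting the per-label coefficients is one simple way to look for such effects, though it is not necessarily the analysis the authors performed.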

Use this identifier to cite or link to this document: https://hdl.handle.net/11586/588280