The need to deal with the inherent uncertainty in real-world relational or networked data leads to the proposal of new probabilistic models, such as probabilistic graphs. Every edge in a probabilistic graph is associated with a probability whose value represents the likelihood of its existence, or the strength of the relation between the entities it connects. The aim of this paper is to propose two machine learning techniques for the link classification problem in relational data exploiting the probabilistic graph representation. Both the proposed methods will exploit a language-constrained reachability method to infer the probability of possible hidden relationships that may exists between two nodes in a probabilistic graph. Each hidden relationships between two nodes may be viewed as a feature (or a factor), and its corresponding probability as its weight, while an observed relationship is considered as a positive instance for its corresponding link label. Given a training set of observed links, the first learning approach is to use a propositionalization technique adopting a L2-regularized Logistic Regression to learn a model able to predict unobserved link labels. Since in some cases the edges’ probability may be not known in advance or they could not be precisely defined for a classification task, the second xposed approach is to exploit the inference method and to use a mean squared technique to learn the edges’ probabilities. Both the proposed methods have been evaluated on real world data sets and the corresponding results proved their validity.
Link classification with probabilistic graphs
DI MAURO, NICOLA;TARANTO, CLAUDIO;ESPOSITO, Floriana
2014-01-01
Abstract
The need to deal with the inherent uncertainty in real-world relational or networked data leads to the proposal of new probabilistic models, such as probabilistic graphs. Every edge in a probabilistic graph is associated with a probability whose value represents the likelihood of its existence, or the strength of the relation between the entities it connects. The aim of this paper is to propose two machine learning techniques for the link classification problem in relational data exploiting the probabilistic graph representation. Both the proposed methods will exploit a language-constrained reachability method to infer the probability of possible hidden relationships that may exists between two nodes in a probabilistic graph. Each hidden relationships between two nodes may be viewed as a feature (or a factor), and its corresponding probability as its weight, while an observed relationship is considered as a positive instance for its corresponding link label. Given a training set of observed links, the first learning approach is to use a propositionalization technique adopting a L2-regularized Logistic Regression to learn a model able to predict unobserved link labels. Since in some cases the edges’ probability may be not known in advance or they could not be precisely defined for a classification task, the second xposed approach is to exploit the inference method and to use a mean squared technique to learn the edges’ probabilities. Both the proposed methods have been evaluated on real world data sets and the corresponding results proved their validity.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.