Semi-supervised learning has gained great interest because of its ability to combine unlabeled data with (potentially few) labeled observations in a training process. However, in some application contexts, one can question whether all available labels are equally valid. For example, in remote monitoring of bipolar disorder (BD), a common practice is to extrapolate the psychiatrist's assessment onto a fixed time window surrounding the visit, the so-called ground truth period. Consequently, all data from this period are labeled with the same category. Such an approach may result in misguided supervision that degrades the model's performance. In this paper, we consider the problem of label uncertainty, assuming that the labels are crisp but may be assigned to particular observations with varying confidence. We propose a novel method called Confidence Path Regularization (CPR) that incorporates this uncertainty into semi-supervised fuzzy c-means learning. CPR handles label uncertainty automatically and in a data-driven way by estimating a confidence factor for each labeled observation. In addition, CPR allows for the exploration of potential class-specific patterns in the adjusted confidence. The proposed method is illustrated with experiments on partially labeled data on speech characteristics collected from a smartphone application for BD monitoring. In this applied scenario, we also use additional contextual data to improve the construction of confidence paths. The experiments show that CPR reflects the varying confidence in labels, in contrast to the nominal approach, which assigns the majority of observations to the same class associated with the relevant ground truth period.
Confidence path regularization for handling label uncertainty in semi-supervised learning: use case in bipolar disorder monitoring
Gabriella Casalino; Giovanna Castellano
2022-01-01