We study a matrix completion problem where both the ground truth matrix R and the unknown sampling distribution P over observed entries are low-rank matrices that share a common subspace. We assume that a large amount M of unlabeled data drawn from the sampling distribution P is available, together with a small amount N of "labeled" data drawn from the same distribution along with noisy estimates of the corresponding ground truth entries. This setting is inspired by recommender systems scenarios where the unlabeled data corresponds to "implicit feedback" (consisting of interactions such as purchases, clicks, etc.) and the labeled data corresponds to "explicit feedback", consisting of interactions where the user has given an explicit rating to the item. Leveraging powerful results from the theory of low-rank subspace recovery, together with classic generalization bounds for matrix completion models, we derive error bounds consisting of a sum of two error terms corresponding to sample complexities of nd and dr respectively (ignoring log factors), where d is the rank of P and r is the rank of R. In synthetic experiments, we confirm that the true generalization error naturally splits into independent error terms corresponding to the estimation of P and of the ground truth matrix R, respectively. In real-life experiments on Douban and MovieLens with most explicit ratings removed, we show that the method can outperform baselines relying only on the explicit ratings, demonstrating that our assumptions provide a valid toy theoretical setting to study the interaction between explicit and implicit feedback in recommender systems.
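To make the setting concrete, here is a minimal synthetic sketch in the spirit of the abstract. It is not the authors' algorithm or code: the two-stage estimator (an SVD of the implicit-feedback counts to recover the shared subspace, followed by per-column least squares on the explicit ratings), the noise level, and all variable names besides P, R, d, r, M, and N are our own illustrative assumptions.

```python
import numpy as np

# Illustrative synthetic setup (our own sketch, not the paper's code).
rng = np.random.default_rng(0)
n = 200          # matrix dimension (assumed square here for simplicity)
d, r = 5, 3      # d = rank of P, r = rank of R, with a shared factor U

# Nonnegative rank-d sampling distribution P and rank-r ground truth R
# built from a common factor U, so their column spaces overlap.
U = np.abs(rng.standard_normal((n, d)))
P = U @ np.abs(rng.standard_normal((d, n)))
P /= P.sum()
R = U[:, :r] @ rng.standard_normal((r, n))

# M unlabeled ("implicit") draws and N labeled ("explicit") draws from P.
M, N = 50_000, 2_000
idx = rng.choice(n * n, size=M + N, p=P.ravel())
rows, cols = np.unravel_index(idx, (n, n))

# Step 1: recover the shared subspace from the implicit counts
# (the count matrix concentrates around M * P, whose top-d left
# singular vectors approximately span the same subspace as U).
counts = np.zeros((n, n))
np.add.at(counts, (rows[:M], cols[:M]), 1.0)
U_hat = np.linalg.svd(counts, full_matrices=False)[0][:, :d]

# Step 2: regress the noisy explicit ratings onto that subspace,
# column by column, to estimate R.
lr, lc = rows[M:], cols[M:]
y = R[lr, lc] + 0.1 * rng.standard_normal(N)
W = np.zeros((d, n))
for j in range(n):
    mask = lc == j
    if mask.any():
        W[:, j] = np.linalg.lstsq(U_hat[lr[mask]], y[mask], rcond=None)[0]
R_hat = U_hat @ W

print("relative error:", np.linalg.norm(R_hat - R) / np.linalg.norm(R))
```

Under these assumptions, the two stages loosely mirror the two error terms discussed in the abstract: the quality of the recovered subspace is driven by the M implicit samples, while the regression on that subspace is driven by the N explicit ratings.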
Published in: Proceedings of the AAAI Conference on Artificial Intelligence
Volume 40, Issue 27, pp. 22769-22777