
Publication

# Scalable Collaborative Bayesian Preference Learning

Abstract

Learning about users' utilities from preference, discrete choice or implicit feedback data is of integral importance in e-commerce, targeted advertising and web search. Due to the sparsity and diffuse nature of the data, Bayesian approaches hold much promise, yet most prior work does not scale up to realistic data sizes. We shed light on why inference for such settings is computationally difficult for standard machine learning methods, most of which focus on predicting explicit ratings only. To address this difficulty, we present a novel expectation maximization algorithm, driven by expectation propagation approximate inference, which scales to very large datasets without requiring strong factorization assumptions. Our utility model uses both latent bilinear collaborative filtering and non-parametric Gaussian process (GP) regression. In experiments on large real-world datasets, our method gives substantially better results than either matrix factorization or GPs in isolation, and converges significantly faster.
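The utility model named in the abstract (a low-rank bilinear collaborative-filtering term plus a GP over item features) can be sketched generically. The symbols and the link function below are illustrative assumptions, since this page does not give the paper's exact parameterization:

```latex
% Latent utility of user $i$ for item $j$: a low-rank bilinear term
% plus a nonparametric GP function of item features $x_j$ (assumed form).
u_{ij} = \mathbf{u}_i^\top \mathbf{v}_j + g(\mathbf{x}_j),
\qquad g \sim \mathcal{GP}(0, k)

% A pairwise observation "$j$ preferred over $j'$ by user $i$" then enters
% through a link $\sigma$ (e.g. logistic or probit) on the utility difference:
P(j \succ j' \mid i) = \sigma\left(u_{ij} - u_{ij'}\right)
```

The pairwise likelihood couples every pair of compared items through their shared utilities, which is what makes naive inference expensive and motivates the EM/EP scheme described above.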



We develop approximate inference and learning methods that facilitate the use of probabilistic modeling techniques, motivated by applications in two different areas. First, we consider the ill-posed inverse problem of recovering an image from an underdetermined system of linear measurements corrupted by noise. Second, we consider the problem of inferring user preferences for items from counts, pairwise comparisons and user activity logs, all instances of implicit feedback.

Plausible models for images and for the noise incurred when recording them render posterior inference intractable, while the scale of the inference problem makes sampling-based approximations ineffective. We therefore develop deterministic approximate inference algorithms for two augmentations of a typical sparse linear model: first, for the rectified-linear Poisson likelihood, and second, for tree-structured super-Gaussian mixture models.

The rectified-linear Poisson likelihood is an alternative noise model applicable in astronomical and biomedical imaging applications that operate in intensity regimes where quantum effects lead to observations best described by counts of particles arriving at a sensor, as well as in general Poisson regression problems arising in various fields. In this context we show that the model-specific computations for Expectation Propagation can be robustly solved by a simple dynamic program.

Next, we develop a scalable approximate inference algorithm for structured mixture models that uses a discrete graphical model to represent dependencies between the latent mixture components of a collection of mixture models. Specifically, we use tree-structured mixtures of super-Gaussians to model the persistence across scales of large coefficients of the Wavelet transform of an image for improved reconstruction.

In the second part, on models of user preference, we consider two settings: the global static setting and the contextual dynamic setting.
In the global static setting, we represent user-item preferences by a latent low-rank matrix. Instead of using numeric ratings, we develop methods to infer this latent representation from two types of implicit feedback: aggregate counts of users interacting with a service, and the binary outcomes of pairwise comparisons. We model count data using a latent Gaussian bilinear model with Poisson likelihoods. For this model, we show that the variational Gaussian approximation can be relaxed to a closed-form bound by adding constraints on the posterior, leading to an efficient inference algorithm. In the second implicit-feedback scenario, we infer the latent preference matrix from pairwise preference statements. We combine a low-rank bilinear model with non-parametric item-feature regression and develop a novel approximate variational Expectation Maximization algorithm that mitigates the computational challenges caused by the latent couplings induced by the pairwise comparisons.

Finally, in the contextual dynamic setting, we model sequences of user activity at the granularity of single interaction events instead of aggregate counts. Routinely gathered in the background at large scale in many applications, such sequences can reveal temporal and contextual aspects of user behavior through recurrent patterns. To describe such data, we propose a generic collaborative sequence model based on recurrent neural networks that combines ideas from collaborative filtering and language modeling.
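As a minimal illustration of the global static setting, the sketch below builds a low-rank preference matrix from hypothetical user and item factors and scores a pairwise comparison with a logistic link on utility differences. The names, dimensions, and the choice of logistic (rather than, say, probit) link are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, rank = 5, 8, 2

# Hypothetical low-rank factors standing in for the inferred
# latent preference matrix.
U = rng.normal(size=(n_users, rank))   # user factors
V = rng.normal(size=(n_items, rank))   # item factors
utility = U @ V.T                      # utility[u, i]: user u's utility for item i

def prefer_prob(u, i, j):
    """P(user u prefers item i over item j) under a logistic link on
    the utility difference (one common choice for pairwise models)."""
    return 1.0 / (1.0 + np.exp(-(utility[u, i] - utility[u, j])))
```

Each observed comparison ties two entries of the utility matrix together, which is the latent coupling the variational EM algorithm described above has to handle at scale.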

Mohammad Emtiyaz Khan, Young Jun Ko

Bilinear models of count data with a Poisson distribution are popular in applications such as matrix factorization for recommender systems, modeling of receptive fields of sensory neurons, and modeling of neural spike trains. Bayesian inference in such models remains challenging due to the product of two Gaussian random vectors. In this paper, we propose new algorithms for such models based on variational Gaussian (VG) inference. We make two contributions. First, we show that the VG lower bound for these models, previously thought to be intractable, is available in closed form under certain non-trivial constraints on the form of the posterior. Second, we show that the lower bound is biconcave and can be efficiently optimized for mean-field approximations. We also show that biconcavity generalizes to the larger family of log-concave likelihoods, which subsumes the Poisson distribution. We present new inference algorithms based on these results and demonstrate better performance on real-world problems at the cost of a modest increase in computation. Our contributions therefore provide more choices for Bayesian inference in terms of a speed-versus-accuracy tradeoff.
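A minimal sketch of the bilinear Poisson count model discussed in this abstract, evaluating only the Poisson likelihood term for a count matrix under the rate exp(u_n . v_m). The paper additionally places Gaussian priors on the factors and optimizes a variational lower bound; the names here are illustrative:

```python
import numpy as np
from math import lgamma

def bilinear_poisson_loglik(Y, U, V):
    """Poisson log-likelihood of a count matrix Y under a bilinear
    model with rate exp(u_n . v_m).

    Only the likelihood term is evaluated here; the full model also
    has Gaussian priors on U and V and a variational Gaussian bound.
    """
    log_rate = U @ V.T                              # eta[n, m] = u_n . v_m
    ll = Y * log_rate - np.exp(log_rate)            # y * eta - exp(eta)
    ll -= np.vectorize(lambda y: lgamma(y + 1))(Y)  # - log(y!)
    return float(ll.sum())
```

The troublesome term is `exp(U @ V.T)`: the rate depends on a product of two Gaussian-distributed factors, which is exactly why the VG bound was thought intractable before the constraints introduced in this paper.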

2014

Musical grammar describes a set of principles used to understand and interpret the structure of a piece according to a musical style.
The main topic of this study is grammar induction for harmony --- the process of learning structural principles from the observation of chord sequences.
The question of how grammars can be learned by induction from sequential data is an instance of the more general question of how abstract knowledge can be induced from the observation of data, a central question of cognitive science.
Under the assumption that human learning approximately follows the principles of rational reasoning, Bayesian models of cognition can be used to simulate learning processes.
This study investigates what prior knowledge makes it possible to learn musical grammar inductively from Jazz chord sequences using Bayesian models and computational simulations.
The theoretical part of the thesis presents how questions about learnability can be studied in a unified framework involving music analysis, cognitive modeling, Bayesian statistics, and computational simulations.
A new grammar formalism, called Probabilistic Abstract Context-Free Grammar (PACFG), is proposed that allows for flexible probability models which facilitate the grammar-induction experiments of this study. PACFG can jointly model multiple musical dimensions such as harmony and rhythm, and can use coordinate ascent variational inference for grammar learning.
The empirical part of the thesis reports supervised and unsupervised grammar-learning experiments.
To train and evaluate grammar models, a ground-truth dataset of hierarchical analyses of complete Jazz standards, called the Jazz Harmony Treebank (JHT), was created.
The supervised grammar-learning experiments, in which grammars for Jazz harmony are learned from the JHT analyses, show that jointly modeling harmony and rhythm significantly improves the grammar models' prediction of the ground truth.
The performance and robustness of the grammars are further improved by a transpositionally invariant parameterization of rule probabilities.
Following the supervised grammar learning, unsupervised grammar learning was performed by inducing harmony grammars merely from Jazz chord sequences, without the observation of the JHT trees.
The results show that the best induced grammar performs comparably to the best supervised grammar.
In particular, the goal-directedness of functional harmony need not be assumed a priori, but can be learned without the use of music-specific prior knowledge.
The findings of this thesis show that general prior knowledge enables an ideal learner to acquire abstract musical principles by statistical learning.
In conclusion, it is plausible that many aspects of musical grammar have been learned by Jazz musicians and listeners, rather than being innate predispositions or explicitly taught concepts.
This thesis is moreover embedded into the context of empirical music research and digital humanities.
Current studies either describe complex musical structures qualitatively or investigate simpler aspects quantitatively.
The computational models developed in this thesis demonstrate that deep insights into music and statistical analyses are not mutually exclusive.
They enable a new kind of data-driven music theory and musicology, for instance through comparative analyses of musical grammar for different styles such as Jazz, Rock, and Western classical music.
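To make the grammar setting concrete, here is a toy probabilistic context-free grammar over chord symbols together with the inside (CKY) recursion used to score a sequence under a grammar. The rules, symbols, and probabilities are invented for illustration and are far simpler than the PACFG formalism described above (which also models rhythm and supports variational learning):

```python
from collections import defaultdict

# Toy grammar loosely in the spirit of a harmony grammar; everything
# here is invented for illustration. Binary rules are in Chomsky
# normal form: A -> B C with probability p.
binary = {
    "S": [(("T", "D"), 0.6), (("S", "S"), 0.4)],
    "D": [(("SD", "D"), 0.3)],
}
# Unary (preterminal) rules: A -> chord symbol with probability p.
unary = {
    "T":  [("C", 1.0)],   # tonic
    "D":  [("G7", 0.7)],  # dominant (remaining 0.3 mass: binary rule)
    "SD": [("F", 1.0)],   # subdominant
}

def inside(chords, start="S"):
    """Inside (CKY) recursion: chart[i][j][A] = P(A =>* chords[i:j]).

    Returns the total probability that the start symbol derives the
    whole chord sequence under the toy grammar above.
    """
    n = len(chords)
    chart = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, chord in enumerate(chords):          # width-1 spans
        for lhs, rules in unary.items():
            for word, p in rules:
                if word == chord:
                    chart[i][i + 1][lhs] += p
    for width in range(2, n + 1):               # wider spans, bottom-up
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):           # split point
                for lhs, rules in binary.items():
                    for (left, right), p in rules:
                        chart[i][j][lhs] += p * chart[i][k][left] * chart[k][j][right]
    return chart[0][n][start]
```

For the sequence `["C", "G7"]` the only derivation applies S -> T D (0.6) over the unary rules for C (1.0) and G7 (0.7). Inside probabilities like these are the basic quantities that expectation-based grammar-induction procedures accumulate from unannotated sequences.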