**Are you an EPFL student looking for a semester project?**

Work with us on data science and visualisation projects, and deploy your project as an app on top of GraphSearch.

Publication# Applications of Approximate Learning and Inference for Probabilistic Models

Abstract

We develop approximate inference and learning methods for facilitating the use of probabilistic modeling techniques motivated by applications in two different areas. First, we consider the ill-posed inverse problem of recovering an image from an underdetermined system of linear measurements corrupted by noise. Second, we consider the problem of inferring user preferences for items from counts, pairwise comparisons and user activity logs, instances of implicit feedback. Plausible models for images and the noise, incurred when recording them, render posterior inference intractable, while the scale of the inference problem makes sampling based approximations ineffective. Therefore, we develop deterministic approximate inference algorithms for two different augmentations of a typical sparse linear model: first, for the rectified-linear Poisson likelihood, and second, for tree-structured super-Gaussian mixture models. The rectified-linear Poisson likelihood is an alternative noise model, applicable in astronomical and biomedical imaging applications, that operate in intensity regimes in which quantum effects lead to observations that are best described by counts of particles arriving at a sensor, as well as in general Poisson regression problems arising in various fields. In this context we show, that the model-specific computations for Expectation Propagation can be robustly solved by a simple dynamic program. Next, we develop a scalable approximate inference algorithm for structured mixture models, that uses a discrete graphical model to represent dependencies between the latent mixture components of a collection of mixture models. Specifically, we use tree-structured mixtures of super-Gaussians to model the persistence across scales of large coefficients of the Wavelet transform of an image for improved reconstruction. In the second part on models of user preference, we consider two settings: the global static and the contextual dynamic setting. In the global static setting, we represent user-item preferences by a latent low-rank matrix. Instead of using numeric ratings we develop methods to infer this latent representation for two types of implicit feedback: aggregate counts of users interacting with a service and the binary outcomes of pairwise comparisons. We model count data using a latent Gaussian bilinear model with Poisson likelihoods. For this model, we show that the Variational Gaussian approximation can be further relaxed to be available in closed-form by adding additional constraints, leading to an efficient inference algorithm. In the second implicit feedback scenario, we infer the latent preference matrix from pairwise preference statements. We combine a low-rank bilinear model with non-parameteric item- feature regression and develop a novel approximate variational Expectation Maximization algorithm that mitigates the computational challenges due to latent couplings induced by the pairwise comparisons. Finally, in the contextual dynamic setting, we model sequences of user activity at the granularity of single interaction events instead of aggregate counts. Routinely gathered in the background at a large scale in many applications, such sequences can reveal temporal and contextual aspects of user behavior through recurrent patterns. To describe such data, we propose a generic collaborative sequence model based on recurrent neural networks, that combines ideas from collaborative filtering and language modeling.

Official source

This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.

Related concepts

Loading

Related publications

Loading

Related concepts (39)

Inference

Inferences are steps in reasoning, moving from premises to logical consequences; etymologically, the word infer means to "carry forward". Inference is theoretically traditionally divided into deductio

Collaborative filtering

Collaborative filtering (CF) is a technique used by recommender systems. Collaborative filtering has two senses, a narrow one and a more general one.
In the newer, narrower sense, collaborative fil

Statistical inference

Statistical inference is the process of using data analysis to infer properties of an underlying distribution of probability. Inferential statistical analysis infers properties of a population, for ex

Related publications (119)

Loading

Loading

Loading

Mohammad Emtiyaz Khan, Young Jun Ko

Bilinear models of count data with Poisson distribution are popular in applications such as matrix factorization for recommendation systems, modeling of receptive fields of sensory neurons, and modeling of neural-spike trains. Bayesian inference in such models remains challenging due to the product term of two Gaussian random vectors. In this paper, we propose new algorithms for such models based on variational Gaussian (VG) inference. We make two contributions. First, we show that the VG lower bound for these models, previously known to be intractable, is available in closed form under certain non-trivial constraints on the form of the posterior. Second, we show that the lower bound is biconcave and can be efficiently optimized for mean-field approximations. We also show that bi-concavity generalizes to the larger family of log-concave likelihoods, that subsume the Poisson distribution. We present new inference algorithms based on these results and demonstrate better performance on real-world problems at the cost of a modest increase in computation. Our contributions in this paper, therefore, provide more choices for Bayesian inference in terms of a speed-vs-accuracy tradeoff.

2014Mohammad Emtiyaz Khan, Young Jun Ko, Matthias Seeger

Learning about users’ utilities from preference, discrete choice or implicit feedback data is of integral importance in e-commerce, targeted advertising and web search. Due to the sparsity and diffuse nature of data, Bayesian approaches hold much promise, yet most prior work does not scale up to realistic data sizes. We shed light on why inference for such settings is computationally difficult for standard machine learning methods, most of which focus on predicting explicit ratings only. To simplify the difficulty, we present a novel expectation maximization algorithm, driven by expectation propagation approximate inference, which scales to very large datasets without requiring strong factorization assumptions. Our utility model uses both latent bilinear collaborative filtering and non-parametric Gaussian process (GP) regression. In experiments on large real-world datasets, our method gives substantially better results than either matrix factorization or GPs in isolation, and converges significantly faster.

2014We are living in the era of "Big Data", an era characterized by a voluminous amount of available data. Such amount is mainly due to the continuing advances in the computational capabilities for capturing, storing, transmitting and processing data. However, it is not always the volume of data that matters, but rather the "relevant" information that resides in it.
Exactly 70 years ago, Claude Shannon, the father of information theory, was able to quantify the amount of information in a communication scenario based on a probabilistic model of the data. It turns out that Shannon's theory can be adapted to various probability-based information processing fields, ranging from coding theory to machine learning. The computation of some information theoretic quantities, such as the mutual information, can help in setting fundamental limits and devising more efficient algorithms for many inference problems.
This thesis deals with two different, yet intimately related, inference problems in the fields of coding theory and machine learning. We use Bayesian probabilistic formulations for both problems, and we analyse them in the asymptotic high-dimensional regime. The goal of our analysis is to assess the algorithmic performance on the first hand and to predict the Bayes-optimal performance on the second hand, using an information theoretic approach. To this end, we employ powerful analytical tools from statistical physics.
The first problem is a recent forward-error-correction code called sparse superposition code. We consider the extension of such code to a large class of noisy channels by exploiting the similarity with the compressed sensing paradigm. Moreover, we show the amenability of sparse superposition codes to perform
joint distribution matching and channel coding.
In the second problem, we study symmetric rank-one matrix factorization, a prominent model in machine learning and statistics with many applications ranging from community detection to sparse principal component analysis. We provide an explicit expression for the normalized mutual information and the minimum mean-square error of this model in the asymptotic limit. This allows us to prove the optimality of a certain iterative algorithm on a large set of parameters.
A common feature of the two problems stems from the fact that both of them are represented on dense graphical models. Hence, similar message-passing algorithms and analysis tools can be adopted. Furthermore, spatial coupling, a new technique introduced in the context of low-density parity-check (LDPC) codes, can be applied to both problems. Spatial coupling is used in this thesis as a "construction technique" to boost the algorithmic performance and as a "proof technique" to compute some information theoretic quantities.
Moreover, both of our problems retain close connections with spin glass models studied in statistical mechanics of disordered systems. This allows us to use sophisticated techniques developed in statistical physics. In this thesis, we use the potential function predicted by the replica method in order to prove the threshold saturation phenomenon associated with spatially coupled models. Moreover, one of the main contributions of this thesis is proving that the predictions given by the "heuristic" replica method are exact. Hence, our results could be of great interest for the statistical physics community as well, as they help to set a rigorous mathematical foundation of the replica predictions.