
Publication: On multivariate calibration with unlabeled data

Abstract

In principal component regression (PCR) and partial least-squares regression (PLSR), the use of unlabeled data, in addition to labeled data, helps stabilize the latent subspaces in the calibration step, typically leading to a lower prediction error. A non-sequential approach based on optimal filtering (OF) has been proposed in the literature to use unlabeled data with PLSR. In this work, a sequential version of the OF-based PLSR and a PCA-based PLSR (PLSR applied to PCA-preprocessed data) are proposed. It is shown analytically that the sequential version of the OF-based PLSR is equivalent to PCA-based PLSR, which leads to a new interpretation of OF. Simulated and experimental data sets are used to point out the usefulness and pitfalls of using unlabeled data. Unlabeled data can replace labeled data to some extent, thereby leading to an economic benefit. However, in the presence of drift, the use of unlabeled data can result in an increase in prediction error compared to that obtained with a model based on labeled data alone.


Related publications (33)

Related concepts (27)


Partial least squares regression

Partial least squares regression (PLS regression) is a statistical method that bears some relation to principal components regression; instead of finding hyperplanes of maximum variance between the response and independent variables, it finds a linear regression model by projecting the predicted variables and the observable variables to a new space. Because both the X and Y data are projected to new spaces, the PLS family of methods is known as bilinear factor models.

Mean squared prediction error

In statistics, the mean squared prediction error (MSPE), also known as the mean squared error of the predictions, of a smoothing, curve-fitting, or regression procedure is the expected value of the squared prediction errors (PE), the squared differences between the fitted values implied by the predictive function and the (unobservable) true values g. It is an inverse measure of the explanatory power of the predictive function and can be used in the process of cross-validation of an estimated model.
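In practice the MSPE is estimated by averaging squared prediction errors on held-out data; a minimal sketch with simulated data (the true function and noise level are assumptions for illustration):

```python
# Estimating the MSPE of a fitted line on held-out data.
import numpy as np

rng = np.random.default_rng(2)

# Training data from an assumed true function g(x) = 2x with noise sd 0.1.
x_train = rng.uniform(0, 1, size=50)
y_train = 2.0 * x_train + rng.normal(scale=0.1, size=50)
slope, intercept = np.polyfit(x_train, y_train, deg=1)

# Held-out test data from the same process.
x_test = rng.uniform(0, 1, size=1000)
y_test = 2.0 * x_test + rng.normal(scale=0.1, size=1000)
pred = slope * x_test + intercept

# Mean of the squared prediction errors; close to the noise variance (0.01)
# when the fitted line is accurate.
mspe = float(np.mean((y_test - pred) ** 2))
```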

Linear regression

In statistics, linear regression is a linear approach for modelling the relationship between a scalar response and one or more explanatory variables (also known as dependent and independent variables). The case of one explanatory variable is called simple linear regression; for more than one, the process is called multiple linear regression. This term is distinct from multivariate linear regression, where multiple correlated dependent variables are predicted, rather than a single scalar variable.
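The multiple-linear-regression case above (one scalar response, several explanatory variables) can be sketched with ordinary least squares; the coefficients and data here are illustrative assumptions:

```python
# Multiple linear regression via ordinary least squares: one scalar
# response y, two explanatory variables x1, x2, plus an intercept.
import numpy as np

rng = np.random.default_rng(3)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 3.0 * x2 + 0.01 * rng.normal(size=n)

# Design matrix with an intercept column; lstsq solves min ||X b - y||^2.
X = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
```

With only `x1` in the design matrix this would be simple linear regression; multivariate linear regression, by contrast, would stack several correlated responses as columns of `y`.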

Nikolaos Geroliminis, Emmanouil Barmpounakis

The work proposes a multi-modal regional mean speed regression analysis for the city network of Athens, Greece. The dataset from the pNEUMA experiment is used in the present context. Accumulations and mean speeds of different modes are estimated and compared t ...

2021

A key challenge across many disciplines is to extract meaningful information from data which is often obscured by noise. These datasets are typically represented as large matrices. Given the current trend of ever-increasing data volumes, with datasets grow ...

We study the problem of estimating an unknown function from noisy data using shallow ReLU neural networks. The estimators we study minimize the sum of squared data-fitting errors plus a regularization term proportional to the squared Euclidean norm of the ...