Productivity, quality, safety, and environmental concerns have driven major advances in the development of process analyzers. Analyzers generate measurement data that characterize product and process attributes (key variables), which supports automatic control and optimization. However, these objectives may be severely compromised when key variables are determined at low sampling rates through off-line analysis. It is sometimes possible to relate more easily available secondary measurements (predictors) to key variables (predictands) using data-driven soft sensors or calibration models. These models can then deliver information about key variables at a higher sampling rate and/or at lower cost.

This work studies multivariate calibration for spectroscopic measurements (such as near-infrared, mid-infrared, ultraviolet, Raman, or nuclear magnetic resonance spectra) that are linked to the concentrations of one or more analytes through an inverse regression model based on principal component regression (PCR) or partial least-squares regression (PLSR). Spectroscopic measurements are typically corrupted by both random zero-mean measurement errors (noise) and systematic variations (drift) caused by instrumental, operational, and process changes. The prediction error can be decomposed into two parts: (i) the error due to noise in the calibration data together with the bias resulting from truncation in PCR/PLSR, and (ii) the error due to drift and noise in the prediction data. To correct for these errors, this work proposes three subspace correction methods that use new information in addition to the calibration data. First, latent subspace correction using unlabeled data (secondary measurements for which the key variables are unknown) helps reduce the error due to noise in the calibration data and to truncation. Second, drift subspace correction follows a two-step procedure: in the first step, the drift subspace is estimated using slave data with drift and master data with no drift; in the second step, the original calibration data are corrected for the estimated drift subspace using shrinkage or orthogonal projection. Third, data reconciliation adjusts the predicted key variables to obtain estimates that are consistent with balance equations. The various methodologies are illustrated using both simulated and experimental data.
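As an illustration of the two-step drift subspace correction described above, the following Python sketch estimates a drift subspace from paired master/slave spectra and removes it from the calibration data by orthogonal projection before fitting a PCR model. It is a minimal sketch under stated assumptions, not the method as implemented in this work: it assumes that the slave and master spectra are measured on the same samples (so their difference isolates the drift), that the drift rank `k` is known, and that the drift subspace can be taken from the leading right singular vectors of the difference spectra. All function names (`estimate_drift_subspace`, `project_out`, `pcr_fit`, `pcr_predict`, `demo`) and the simulated data are hypothetical.

```python
import numpy as np


def estimate_drift_subspace(X_master, X_slave, k):
    """Step 1 (assumed form): orthonormal basis of the drift subspace.

    Assumes row i of X_slave is the drifted measurement of the same sample
    as row i of X_master, so the difference spectra contain drift plus noise.
    Returns an (n_channels, k) matrix with orthonormal columns.
    """
    D = X_slave - X_master
    _, _, Vt = np.linalg.svd(D, full_matrices=False)
    return Vt[:k].T


def project_out(X, P):
    """Step 2: orthogonal projection of the rows of X onto the complement of span(P)."""
    return X - (X @ P) @ P.T


def pcr_fit(X, y, n_components):
    """Principal component regression: regress y on the leading PCA scores of X."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T                      # loadings
    T = Xc @ V                                   # scores
    q, *_ = np.linalg.lstsq(T, y - y_mean, rcond=None)
    return V @ q, x_mean, y_mean                 # regression vector and centering terms


def pcr_predict(X, model):
    b, x_mean, y_mean = model
    return (X - x_mean) @ b + y_mean


def demo(seed=0):
    """Simulated example (hypothetical data): one analyte, one drift direction."""
    rng = np.random.default_rng(seed)
    channels = np.arange(50)
    pure = np.exp(-0.5 * ((channels - 25) / 4.0) ** 2)   # analyte spectrum
    drift = np.linspace(0.0, 1.0, 50)                    # sloping baseline drift
    noise = lambda n: 0.01 * rng.standard_normal((n, 50))

    # Calibration data measured without drift.
    c_cal = rng.uniform(0, 1, 40)
    X_cal = np.outer(c_cal, pure) + noise(40)

    # Paired master (no drift) and slave (with drift) spectra.
    X_master = np.outer(rng.uniform(0, 1, 10), pure) + noise(10)
    X_slave = X_master + np.outer(rng.uniform(0.5, 1.5, 10), drift) + noise(10)

    # Prediction data affected by drift.
    c_test = rng.uniform(0, 1, 30)
    X_test = (np.outer(c_test, pure)
              + np.outer(rng.uniform(0.5, 1.5, 30), drift) + noise(30))

    rmse = lambda c_hat: float(np.sqrt(np.mean((c_hat - c_test) ** 2)))

    raw_model = pcr_fit(X_cal, c_cal, n_components=1)
    P = estimate_drift_subspace(X_master, X_slave, k=1)
    corrected_model = pcr_fit(project_out(X_cal, P), c_cal, n_components=1)

    return rmse(pcr_predict(X_test, raw_model)), rmse(pcr_predict(X_test, corrected_model))


if __name__ == "__main__":
    rmse_raw, rmse_corrected = demo()
    print(f"RMSE without correction: {rmse_raw:.3f}")
    print(f"RMSE with drift projection: {rmse_corrected:.3f}")
```

Because the corrected calibration spectra are orthogonal to the estimated drift subspace, the resulting regression vector is also orthogonal to it, so the drifted prediction spectra can be passed to the model without a separate correction. The shrinkage variant mentioned above would down-weight the drift directions rather than remove them entirely; it is not shown here.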