A classification methodology is proposed for the automatic detection of start- and endpoints of chemical and biotechnological reaction systems from spectral reaction data. In the calibration phase, several batch experiments must be conducted that cover the expected operational variability (e.g., initial concentrations, dosage amount and time). The start or endpoint of the reaction of interest (e.g., the start of a side reaction or the stop of the main reaction) is determined by Evolving Factor Analysis (EFA) and cross-checked against common process variables (temperature, mass, heat flow) or, if available, chromatographic measurements. A Partial Least-Squares Discriminant Analysis (PLS-DA) model is built on the calibration batches and then used to detect on-line the start or stop of the reaction of interest from the spectral data of a new batch. The methodology is illustrated for (i) simulated spectra of a fed-batch reactor exhibiting two consecutive reactions, with a limiting initial reactant in the first reaction, and (ii) measured infrared spectra of an aldol reaction exhibiting a side reaction. A PLS-DA model is built to detect the stop of the main reaction on-line. The effects of data pre-treatment methods and of the number of latent variables on several classification performance indices (efficiency, false positive rate and false negative rate) are evaluated. For the simulated data, the best model is obtained with four latent variables and mean-centered data; the classification efficiency for a validation set of five experiments is 99.0%, and the endpoint is detected with a delay of about 1 min, which corresponds to the sampling time. For the measured data, the best model is obtained with five latent variables and standard normal variate pre-treatment; the classification efficiency for a validation set of two experiments is 95.9%, and the endpoint detection delay is about 8 min for a typical batch duration of 250 min.
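The calibration-then-detection workflow can be illustrated with a minimal sketch. It assumes several things not stated in the abstract. The spectra are toy two-band simulations sampled once per minute, and the band positions, widths, rate constants and noise level are hypothetical. Labels come directly from the known simulated endpoint rather than from EFA. PLS-DA is fitted here as NIPALS PLS1 on 0/1 class labels with a 0.5 decision threshold, which is one common choice and not necessarily the authors' implementation. "Efficiency" is taken as the fraction of correctly classified samples, although the original work may define it differently. The helper names (`snv`, `pls1_fit`, `classification_indices`, `detection_delay`, `simulate_batch`) are hypothetical.

```python
import numpy as np


def snv(X):
    """Standard normal variate: centre and scale each spectrum (row) by its own mean and std."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)


def pls1_fit(X, y, n_lv):
    """NIPALS PLS1 on column-mean-centred X; returns a function predicting y for new spectra."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()   # mean-centring
    E, f = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_lv):
        w = E.T @ f
        w /= np.linalg.norm(w)                  # weight vector
        t = E @ w                               # scores
        tt = t @ t
        p = E.T @ t / tt                        # X loadings
        q = (f @ t) / tt                        # y loading
        E = E - np.outer(t, p)                  # deflate X
        f = f - q * t                           # deflate y
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    B = W @ np.linalg.solve(P.T @ W, Q)         # regression vector in original X space
    return lambda Xnew: (np.asarray(Xnew, dtype=float) - x_mean) @ B + y_mean


def classification_indices(y_true, y_pred):
    """Efficiency (assumed here to be accuracy), false positive rate and false negative rate."""
    y_true, y_pred = np.asarray(y_true, dtype=bool), np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    tn = np.sum(~y_true & ~y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    return {"efficiency": (tp + tn) / y_true.size,
            "fpr": fp / (fp + tn),
            "fnr": fn / (fn + tp)}


def detection_delay(y_pred, times, t_end):
    """Time from the true endpoint to the first sample classified as 'after the endpoint'."""
    hits = np.flatnonzero(y_pred)
    return None if hits.size == 0 else times[hits[0]] - t_end


WL = np.linspace(0.0, 1.0, 50)                  # hypothetical wavelength axis
S_MAIN = np.exp(-((WL - 0.3) / 0.05) ** 2 / 2)  # hypothetical main-product band
S_SIDE = np.exp(-((WL - 0.7) / 0.05) ** 2 / 2)  # hypothetical side-product band


def simulate_batch(t_end, rng, n=100, noise=0.01):
    """Toy batch sampled every minute: main product grows until t_end, then a side product appears."""
    times = np.arange(n, dtype=float)
    c_main = np.minimum(times, t_end) / t_end
    c_side = np.where(times > t_end, 1.0 - np.exp(-(times - t_end) / 3.0), 0.0)
    X = (np.outer(c_main, S_MAIN) + np.outer(c_side, S_SIDE)
         + noise * rng.standard_normal((n, WL.size)))
    return times, X, times > t_end              # label True = main reaction has stopped


# Calibration phase: several batches covering different (hypothetical) endpoint times.
rng = np.random.default_rng(0)
calib = [simulate_batch(t_end, rng) for t_end in (40, 50, 60)]
X_cal = np.vstack([X for _, X, _ in calib])
y_cal = np.concatenate([y for _, _, y in calib]).astype(float)
predict = pls1_fit(X_cal, y_cal, n_lv=2)

# Detection phase on a new batch: classify each spectrum, then score the classifier.
times, X_new, y_new = simulate_batch(45, rng)
y_hat = predict(X_new) > 0.5
indices = classification_indices(y_new, y_hat)
delay = detection_delay(y_hat, times, 45)
print(indices, delay)
```

Two latent variables suffice in this toy case because the simulated spectra contain only two independent absorbing species. On real data, the number of latent variables and the pre-treatment (mean-centring alone, or `snv` followed by mean-centring) would be chosen by comparing the performance indices on validation batches, as the abstract describes. In this sketch the detection delay is dominated by how quickly the side-product band rises above the noise relative to the 1-min sampling interval.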
Nicola Marzari, Giovanni Pizzi, Marco Gibertini
Urs von Gunten, Minju Lee, Peter Rudolf Tentscher