Despite substantial investment in data-based models as Industry 4.0 expands, too little effort has gone into maintaining those models. In a data-streaming environment, data-based models are subject to concept drift. A concept drift is a change in the data distribution that will, at some point, decrease the accuracy of the model. Various frameworks that address this problem are presented in the literature, but there is no optimal methodology for implementing them. This paper presents a methodology for implementing a complete, problem-oriented solution that ensures the maintenance of an industrial data-based model. The final drift-handling solution is composed of a sampling decision system and an update system. The methodology begins with a concept-drift identification phase. Solutions are then pre-selected based on the identified concept drifts. Next, an optimization problem is designed to select the solution that optimizes the costs while respecting the constraints. To better link concept-drift characteristics to drift-handling solutions, a causal concept-drift classification system is proposed. The industrial implementation of such a solution is discussed, and several open questions are raised. The methodology is original and detailed, and it shows encouraging results for the model-maintenance challenge; however, concept-drift identification, and the links between concept-drift characteristics and drift detection, require further research.
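The pre-selection and optimization steps described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the candidate names, drift labels ("gradual", "sudden"), cost figures, and the single error constraint are all hypothetical, and the cost model (sampling cost plus update cost) is only an assumption based on the two subsystems the abstract names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A hypothetical drift-handling solution (sampling decision + update system)."""
    name: str
    handles: frozenset      # drift types this solution can cope with (assumed labels)
    sampling_cost: float    # assumed cost of collecting labelled samples
    update_cost: float      # assumed cost of updating the model
    expected_error: float   # assumed model error once the solution is in place


def select_solution(candidates, identified_drifts, max_error):
    """Pre-select candidates that cover the identified drifts, then return the
    cheapest one that respects the error constraint (None if none qualifies)."""
    drifts = frozenset(identified_drifts)
    # Pre-selection: keep only solutions that handle every identified drift type.
    preselected = [c for c in candidates if drifts <= c.handles]
    # Constraint: expected error must not exceed the allowed maximum.
    feasible = [c for c in preselected if c.expected_error <= max_error]
    if not feasible:
        return None
    # Objective: minimise total cost (sampling + update).
    return min(feasible, key=lambda c: c.sampling_cost + c.update_cost)


# Illustrative catalogue; every value here is invented for the example.
CANDIDATES = [
    Candidate("periodic_retrain", frozenset({"gradual", "sudden"}), 5.0, 8.0, 0.04),
    Candidate("detector_triggered_retrain", frozenset({"sudden"}), 2.0, 6.0, 0.05),
    Candidate("sliding_window_update", frozenset({"gradual"}), 3.0, 2.0, 0.08),
    Candidate("ensemble", frozenset({"gradual", "sudden"}), 4.0, 12.0, 0.03),
]

best = select_solution(CANDIDATES, {"gradual"}, max_error=0.05)
print(best.name if best else "no feasible solution")
```

With these invented figures, the cheapest candidate for gradual drift ("sliding_window_update") is rejected because it violates the error constraint, which shows why the constraint check has to come before the cost minimisation.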
Pascal Frossard, Roberto Gerson De Albuquerque Azevedo, Chaofan He