Water quality prediction in spatially heterogeneous environments is challenging because the importance of water quality parameters (WQPs) and the performance of prediction models may vary across space. Thus, this study proposed spatially adaptive machine learning models to predict water quality status in Hong Kong. First, spatial clusters with relatively homogeneous water quality were adaptively detected using dynamically constrained agglomerative clustering and partitioning. Then, the optimal prediction model was constructed for each cluster by locally evaluating the prediction performance of six standalone machine learning models, namely multi-layer perceptron neural network (MLPNN), support vector machine (SVM), random forest (RF), extremely randomized trees (ET), eXtreme gradient boosting (XGBoost) and categorical gradient boosting (CatBoost), as well as four novel hybrid models (MLPNN-SVM, ET-CatBoost, MLPNN-CatBoost and XGBoost-CatBoost). Finally, a sensitivity analysis based on the locally optimal prediction models was conducted to identify the minimum sets of indicative WQPs needed for more cost-efficient water quality prediction. The results revealed that water quality in the study area was spatially heterogeneous, and four spatially contiguous clusters were identified. MLPNN-SVM, ET-CatBoost, MLPNN-CatBoost and CatBoost performed best in Cluster 1 to Cluster 4, with R² values of 0.917, 0.906, 0.901 and 0.937 and RMSE values of 1.978, 0.843, 2.020 and 1.572, respectively. The sensitivity analysis indicated that acceptable local prediction results can be obtained using fewer WQPs. This is conducive to issuing timely water quality warnings and gaining more time for water pollution remediation.
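The local model selection step described above can be illustrated with a minimal sketch. It is not the study's implementation: the two candidate models (a mean baseline and a one-variable linear fit) are simple stand-ins for the six machine learning models and four hybrids, the toy data are invented for illustration, and all function and variable names are hypothetical. It only shows the idea of scoring every candidate inside each cluster with R² and RMSE and keeping the locally best one.

```python
import math

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

# Stand-in candidate models (assumption: NOT the models used in the study).
def fit_mean(x, y):
    """Always predicts the training mean."""
    m = sum(y) / len(y)
    return lambda xs: [m for _ in xs]

def fit_linear(x, y):
    """Ordinary least squares on a single predictor."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxx = sum((a - x_bar) ** 2 for a in x)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda xs: [intercept + slope * a for a in xs]

def select_local_models(clusters, candidates):
    """For each cluster, fit every candidate and keep the one with highest test R²."""
    best = {}
    for cluster_id, (train, test) in clusters.items():
        x_tr, y_tr = train
        x_te, y_te = test
        scores = {}
        for name, fit in candidates.items():
            predict = fit(x_tr, y_tr)
            y_hat = predict(x_te)
            scores[name] = (r2_score(y_te, y_hat), rmse(y_te, y_hat))
        winner = max(scores, key=lambda n: scores[n][0])
        best[cluster_id] = (winner, scores[winner][0], scores[winner][1])
    return best

# Invented toy data: ((x_train, y_train), (x_test, y_test)) per cluster.
clusters = {
    "cluster_1": (([1, 2, 3, 4], [2, 4, 6, 8]), ([5, 6], [10, 12])),
    "cluster_2": (([1, 2, 3, 4], [5, 3, 5, 3]), ([5, 6], [5, 3])),
}
candidates = {"mean_baseline": fit_mean, "linear": fit_linear}

best = select_local_models(clusters, candidates)
for cluster_id, (name, r2, err) in best.items():
    print(f"{cluster_id}: best={name}, R2={r2:.3f}, RMSE={err:.3f}")
```

On this toy data the preferred model differs between the two clusters, which is the behaviour that motivates fitting a separate model per cluster rather than one global model.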