Aims: Understanding the correlations between underlying medical and personal characteristics of a patient with cancer and the risk of lung metastasis may improve clinical management and outcomes. We used machine learning methodologies to predict the risk of lung metastasis using readily available predictors.

Materials and methods: We retrospectively analysed a cohort of 11 164 oncological patients, with clinical records gathered between 2000 and 2020. The input data consisted of 94 parameters, including age, body mass index (BMI), sex, social history, 81 primary cancer types, underlying lung disease and diabetes mellitus. The strongest underlying predictors were discovered with the analysis of the highest performing method among four distinct machine learning methods.

Results: Lung metastasis was present in 958 of 11 164 oncological patients. The median age and BMI of the study population were 63 (±19) and 25.12 (±5.66), respectively. The random forest method had the most robust performance among the machine learning methods. Feature importance analysis revealed high BMI as the strongest predictor. Advanced age, smoking, male gender, alcohol dependence, chronic obstructive pulmonary disease and diabetes were also strongly associated with lung metastasis. Among primary cancers, melanoma and renal cancer had the strongest correlation.

Conclusions: Using a machine learning-based approach, we revealed new correlations between personal and medical characteristics of patients with cancer and lung metastasis. This study highlights the previously unknown impact of predictors such as obesity, advanced age and underlying lung disease on the occurrence of lung metastasis. This prediction model can assist physicians with preventive risk factor control and treatment strategies.
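The cohort counts reported in the abstract (958 cases among 11 164 patients) imply a fairly imbalanced outcome. The short sketch below works through that arithmetic. It is an illustration only, not part of the study's analysis: the function names `prevalence` and `majority_baseline_accuracy` are hypothetical, and the "always predict the more common class" baseline is an assumption added here for context, not a method the authors describe.

```python
# Counts reported in the abstract.
n_patients = 11_164
n_metastasis = 958


def prevalence(positives: int, total: int) -> float:
    """Fraction of the cohort with the outcome (hypothetical helper)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return positives / total


def majority_baseline_accuracy(positives: int, total: int) -> float:
    """Accuracy of a trivial model that always predicts the more common class.

    Illustrative assumption only; the abstract does not report this baseline.
    """
    p = prevalence(positives, total)
    return max(p, 1.0 - p)


p = prevalence(n_metastasis, n_patients)
baseline = majority_baseline_accuracy(n_metastasis, n_patients)

print(f"prevalence of lung metastasis: {p:.2%}")        # 8.58%
print(f"majority-class baseline accuracy: {baseline:.2%}")  # 91.42%
```

With roughly 8.6% positive cases, a model that never predicts metastasis would already be right about 91% of the time, so plain accuracy alone says little about how well a classifier separates the two groups on data like this.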
Lijing Xin, Yan Li, Yubo Zhao, Yan Lin, Wei Ye