Publication

Self-Supervised Learning for Patient Stratification and Survival Analysis in Computational Pathology: An Application to Colorectal Cancer

Christian Robert Abbet
2023
EPFL thesis
Abstract

Over the years, clinical institutes have accumulated large numbers of digital slides from resected tissue specimens. These digital images, called whole slide images (WSIs), are high-resolution tissue snapshots that depict the complex interaction of cells at the microscopic level. WSIs are critical to pathologists, who use them to identify disease status and select appropriate patient treatments. However, the abundance of WSIs comes with one main drawback: annotations are absent or scarce. Access to labeled data is usually limited to critical information such as the patient's clinical reports. The reason is that generating additional annotations is tedious and time-consuming for pathologists and, hence, should be avoided. Unfortunately, traditional supervised machine learning relies on fully labeled data for training, which is unavailable in this context. As a result, a significant part of the data ends up being discarded.

Among the various approaches developed to tackle the inherent problem of label scarcity, self-supervised learning (SSL) appears to be a viable solution. SSL derives its supervision from the data itself. In other words, it uses the structure of the data as a pretext task to learn feature representations. As a result, self-supervised approaches can take advantage of the large available clinical cohorts to train robust tissue descriptors without prior knowledge of data labels. SSL models are mainly used as initialization for downstream tasks such as classification, segmentation, or survival analysis. Downstream tasks that are initialized with pre-trained models generally require little labeled data for training, thus reducing the impact of label scarcity.

Unfortunately, learning tissue representations from pathological data itself is challenging. WSIs include various structural and visual biases that can hinder the performance of pre-trained models. For example, data acquired from different institutes might show visual differences in staining intensity. This discrepancy appears as a strong domain shift in the learned feature space, which makes pre-trained models less effective for inter-clinical applications. Another critical aspect is the inherent complexity and heterogeneity of the data, which is not reflected in publicly available cohorts. These are often composed of curated data that represent homogeneous tissue structures. This asymmetry can harm the quality of tissue segmentation in downstream tasks as well as the assessment of clinical metrics.

In this thesis, we address the aforementioned issues of computational pathology and label availability. We propose novel approaches that take advantage of SSL to learn and build complex tissue descriptors without requiring access to labeled data. More specifically, we first present a simple way to exploit WSI staining information to learn robust feature spaces using SSL. Second, we tackle the problem of domain shift and data heterogeneity by allowing the use of multi-source data to strengthen the quality of feature representations. Next, we investigate the limitations of SSL when applied to tissue segmentation and propose an alternative based on coarsely annotated data. Finally, we conclude this work by building clinically relevant metrics based on our previously designed architectures. By doing so, we aim to demonstrate the applicability of our research by creating a bridge between theory and practice.
