Publication

Self-Supervised Learning for Patient Stratification and Survival Analysis in Computational Pathology: An Application to Colorectal Cancer

Christian Robert Abbet
2023
EPFL thesis
Abstract

Over the years, clinical institutes have accumulated large amounts of digital slides from resected tissue specimens. These digital images, called whole slide images (WSIs), are high-resolution tissue snapshots that depict the complex interaction of cells at the microscopic level. WSIs are critical to pathologists, as they are used to identify disease status and target appropriate patient treatments. However, the abundance of WSIs comes with one main drawback: the absence or scarcity of annotations. Access to labeled data is usually limited to critical information such as the patient's clinical report, because generating additional annotations is tedious and time-consuming for pathologists and should therefore be avoided. Unfortunately, traditional supervised machine learning requires fully labeled data for training, which is unavailable in this context. As a result, a significant part of the data ends up being discarded.

Among the various approaches developed to tackle the inherent problem of label scarcity, self-supervised learning (SSL) appears as a viable solution. SSL is based on supervision derived from the data itself; in other words, it uses the structure of the data as a pretext task to learn feature representations. As a result, self-supervised approaches can take advantage of the largely available clinical cohorts to train robust tissue descriptors without prior knowledge of data labels. SSL models are mainly used as initialization for downstream tasks such as classification, segmentation, or survival analysis. Downstream tasks initialized with pre-trained models generally require little labeled data to be trained, thus reducing the impact of label scarcity.

Unfortunately, learning tissue representations from pathological data itself is challenging. WSIs include various structural and visual biases that can hinder the performance of pre-trained models. For example, data acquired from different institutes might show visual differences in staining intensity. This discrepancy appears as a strong domain shift in the learned feature space, which makes pre-trained models less efficient for inter-clinical applications. Another critical aspect is the inherent complexity and heterogeneity of the data, which is not reflected in publicly available cohorts; these are often composed of curated data representing homogeneous tissue structures. This asymmetry can harm the quality of tissue segmentation in downstream tasks as well as the assessment of clinical metrics.

In this thesis, we address the aforementioned issues of computational pathology and label availability. We propose novel approaches that take advantage of SSL to learn and build complex tissue descriptors while avoiding the need for labeled data. More specifically, we first present a simple way to exploit the staining information of WSIs to learn robust feature spaces using SSL. Secondly, we tackle the problem of domain shift and data heterogeneity by allowing the use of multi-source data to strengthen the quality of the feature representation. Next, we investigate the limitations of SSL when applied to tissue segmentation and propose an alternative based on coarsely annotated data. Finally, we conclude this work by building clinically relevant metrics on top of our previously designed architectures. In doing so, we aim to demonstrate the applicability of our research by creating a bridge between theory and practice.
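To make the pretext-task idea concrete, the sketch below shows a minimal contrastive self-supervised objective on tissue patches in the spirit of SimCLR: two augmented views of the same patch are pulled together in feature space while views of different patches are pushed apart. It is an illustration under assumed choices (ResNet-18 backbone, NT-Xent loss, generic color augmentations), not the architecture developed in the thesis, where stain information would drive the augmentation or pairing strategy.

# Minimal sketch of a contrastive SSL pretext task on WSI patches.
# All class names and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models, transforms

class PatchEncoder(nn.Module):
    """ResNet-18 backbone plus a small projection head producing patch embeddings."""
    def __init__(self, dim=128):
        super().__init__()
        backbone = models.resnet18(weights=None)
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()  # keep only the feature extractor
        self.backbone = backbone
        self.head = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(),
                                  nn.Linear(256, dim))

    def forward(self, x):
        # L2-normalized embeddings so dot products are cosine similarities
        return F.normalize(self.head(self.backbone(x)), dim=1)

def nt_xent(z1, z2, temperature=0.5):
    """NT-Xent contrastive loss between two batches of embeddings of the same patches."""
    n = z1.size(0)
    z = torch.cat([z1, z2], dim=0)                 # (2N, dim)
    sim = z @ z.t() / temperature                  # pairwise similarities
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float('-inf'))     # ignore self-similarity
    # positive of sample i is its other augmented view (i + N, or i - N)
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)

# Generic augmentations used to create the two views of each raw patch;
# a stain-aware strategy would replace or extend the color jitter below.
augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.5, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.4, 0.4, 0.4, 0.1),
    transforms.ToTensor(),
])

if __name__ == "__main__":
    encoder = PatchEncoder()
    x1 = torch.randn(8, 3, 224, 224)   # stand-ins for two augmented views of 8 patches
    x2 = torch.randn(8, 3, 224, 224)
    loss = nt_xent(encoder(x1), encoder(x2))
    loss.backward()
    print(f"contrastive loss: {loss.item():.3f}")

After such pretraining, the backbone would typically be reused as initialization for the downstream classification, segmentation, or survival models mentioned above, with only a small amount of labeled data.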

Related concepts (39)
Representation learning
In machine learning, feature learning or representation learning is a set of techniques that allows a system to automatically discover the representations needed for feature detection or classification from raw data. This replaces manual feature engineering and allows a machine both to learn the features and to use them to perform a specific task.
Self-supervised learning
Self-supervised learning (SSL) is a machine learning method that learns from unlabeled data samples. It can be regarded as an intermediate form between supervised and unsupervised learning. It is based on an artificial neural network, which learns in two stages. First, the task is solved on the basis of pseudo-labels, which help to initialize the network's weights.
Weak supervision
Weak supervision, also called semi-supervised learning, is a machine learning paradigm whose relevance and notability increased with the advent of large language models, due to the large amount of data required to train them. It is characterized by combining a small amount of human-labeled data (used exclusively in the more expensive and time-consuming supervised learning paradigm) with a large amount of unlabeled data (used exclusively in the unsupervised learning paradigm).
Related publications (74)

Few-shot Learning for Efficient and Effective Machine Learning Model Adaptation

Arnout Jan J Devos

Machine learning (ML) enables artificially intelligent (AI) agents to learn autonomously from data obtained from their environment in order to perform tasks. Modern ML systems have proven to be extremely effective, reaching or even exceeding human intelligence. Althou ...
EPFL, 2024

Robust machine learning for neuroscientific inference

Steffen Schneider

Modern neuroscience research is generating increasingly large datasets, from recording thousands of neurons over long timescales to behavioral recordings of animals spanning weeks, months, or even years. Despite a great variety in recording setups and expe ...
EPFL, 2024

Learning Informative Health Indicators Through Unsupervised Contrastive Learning

Olga Fink

Monitoring the health of complex industrial assets is crucial for safe and efficient operations. Health indicators that provide quantitative real-time insights into the health status of industrial assets over time serve as valuable tools for, e.g., fault d ...
IEEE-Inst Electrical Electronics Engineers Inc, 2024
Related MOOCs (8)
Neuronal Dynamics - Computational Neuroscience of Single Neurons
The activity of neurons in the brain and the code used by these neurons is described by mathematical neuron models at different levels of detail.
