Publication

# CryoGAN: A New Reconstruction Paradigm for Single-Particle Cryo-EM Via Deep Adversarial Learning

Laurène Donati, Harshit Gupta, Michael Thompson McCann, Michaël Unser

*IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC*, 2021

Article

Abstract

We present CryoGAN, a new paradigm for single-particle cryo-electron microscopy (cryo-EM) reconstruction based on unsupervised deep adversarial learning. In single-particle cryo-EM, the structure of a biomolecule needs to be reconstructed from a large set of noisy tomographic projections with unknown orientations. Current reconstruction techniques are based on a marginalized maximum-likelihood formulation that requires calculations over the set of all possible poses for each projection image, a computationally demanding procedure. Our approach is to seek a 3D structure that has simulated projections that match the real data in a distributional sense, thereby sidestepping pose estimation or marginalization. We prove that, in an idealized mathematical model of cryo-EM, this approach results in recovery of the correct structure. Motivated by distribution matching, we propose CryoGAN, a specialized GAN that consists of a 3D structure, a cryo-EM physics simulator, and a discriminator neural network. During reconstruction, the 3D structure is optimized so that its projections obtained through the simulator resemble real data (to the discriminator). Simultaneously, the discriminator is trained to distinguish real projections from simulated projections. CryoGAN takes as input only real projection images and the distribution of the cryo-EM imaging parameters. It involves neither prior training nor an initial estimation of the 3D structure. CryoGAN currently achieves a 10.8 angstrom resolution on a realistic synthetic dataset. Preliminary results on experimental beta-galactosidase and 80S ribosome data demonstrate the ability of CryoGAN to exploit data statistics under standard experimental imaging conditions. We believe that this paradigm opens the door to a family of novel likelihood-free algorithms for cryo-EM reconstruction.
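The distribution-matching idea in the abstract can be illustrated with a deliberately small sketch. Everything in it is an assumption made for illustration and is not the CryoGAN implementation: the "structure" is a handful of weighted 2D points rather than a 3D volume, the simulator (`project`, `simulate`) has no CTF or translation model, and a fixed hand-written statistic (`summary`) stands in for the trained discriminator network. It only shows that two sets of noisy projections can be compared without ever estimating a pose.

```python
import math
import random


def project(points, theta, n_bins=16, half_width=2.0):
    """Project weighted 2D points onto a 1D detector at angle theta."""
    bins = [0.0] * n_bins
    for x, y, w in points:
        t = x * math.cos(theta) + y * math.sin(theta)  # detector coordinate
        idx = int((t + half_width) / (2.0 * half_width) * n_bins)
        if 0 <= idx < n_bins:
            bins[idx] += w
    return bins


def simulate(points, n_proj, noise_std, rng):
    """Toy simulator: random orientations (never stored) plus Gaussian noise."""
    data = []
    for _ in range(n_proj):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        clean = project(points, theta)
        data.append([v + rng.gauss(0.0, noise_std) for v in clean])
    return data


def summary(data):
    """Pose-free dataset statistic: mean of the sorted intensities per rank.

    Hypothetical stand-in for the learned discriminator; it is not
    sufficient to identify a structure in general.
    """
    sorted_rows = [sorted(row) for row in data]
    n = len(sorted_rows)
    width = len(sorted_rows[0])
    return [sum(row[i] for row in sorted_rows) / n for i in range(width)]


def mismatch(data_a, data_b):
    """Squared distance between the summary statistics of two datasets."""
    return sum((a - b) ** 2 for a, b in zip(summary(data_a), summary(data_b)))


rng = random.Random(0)
true_structure = [(1.0, 0.0, 1.0), (-0.5, 0.5, 1.0), (0.0, -1.0, 1.0)]
wrong_structure = [(0.0, 0.0, 3.0)]  # same total mass, different shape

real = simulate(true_structure, 500, 0.1, rng)
good = mismatch(real, simulate(true_structure, 500, 0.1, rng))
bad = mismatch(real, simulate(wrong_structure, 500, 0.1, rng))
print("candidate equal to truth:", good)
print("wrong candidate:", bad)
```

In this toy setting the candidate that equals the true structure yields a much smaller mismatch than the wrong one, even though no orientation is ever recovered. In CryoGAN, according to the abstract, the comparison is instead performed by a discriminator trained jointly with the optimization of the 3D structure.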

Related concepts (18)

Tomographic reconstruction

Tomographic reconstruction is a type of multidimensional inverse problem where the challenge is to yield an estimate of a specific system from a finite number of projections. The mathematical basis …

Structure

A structure is an arrangement and organization of interrelated elements in a material object or system, or the object or system so organized. Material structures include man-made objects such as …

Statistics

Statistics is the discipline that studies phenomena through the collection of data, their processing, their analysis, the interpretation of the results, and their presentation, in order to make these data …

Related publications

The topic of this thesis is the development of new reconstruction methods for cryo-electron microscopy (cryo-EM). Cryo-EM has revolutionized the field of structural biology over the last decade and now permits the regular discovery of biostructures. Yet, the technical challenges associated with cryo-EM are still numerous, and the measurements remain notoriously difficult to process. This calls for fast and robust algorithms that can reliably handle the challenging reconstruction task at hand.
In this thesis, we investigated two reconstruction paradigms: model-based and data-driven. Model-based methods formulate the reconstruction task as an inverse problem and rely on a faithful model of the acquisition physics. By contrast, the central philosophy of data-driven approaches is to let the reconstruction algorithm be guided by the measured data through some learning procedure. Both paradigms share a tight link in all our works: their reliance on a rigorous mathematical formulation of the cryo-EM imaging model.
The first cryo-EM method we considered is scanning transmission electron tomography (STET), a modality whose primary concern is to reduce the electron dose required for accurate imaging. To this end, we developed a tailored acquisition-reconstruction STET framework that relies on the principles of compressed sensing. This scheme permits high-quality reconstruction from a reduced number of measurements, thereby better preserving the sample.
We then designed several reconstruction algorithms for single-particle analysis (SPA), a popular cryo-EM method that enables the determination of structures at near-atomic resolution. A key challenge for the deployment of robust, iterative reconstruction methods in SPA is that they usually come with a prohibitive computational cost if not carefully engineered. To circumvent this problem, we developed a regularized reconstruction scheme whose cost-dominant operation is recast as a discrete convolution, which makes the use of our robust scheme feasible in SPA. Building on this development, we devised a joint optimization framework that efficiently alternates between the reconstruction and the estimation of the unknown orientations.
We then explored a learning-based method to estimate the unknown orientations in SPA directly from the acquired dataset of projections. Capitalizing on our ability to model the cryo-EM procedure, we generated large synthetic SPA datasets to train a function, parametrized as a neural network, to predict the relative orientation between two projections based on their similarity. The framework relies on the postulate that it is possible to recover, from these estimated orientation distances, the orientations themselves through an appropriate minimization scheme, as supported by preliminary tests.
Finally, we developed a completely new paradigm for SPA reconstruction that leverages the remarkable capability of deep neural networks to capture data distributions. The proposed algorithm uses a generative adversarial network to learn the 3D structure whose simulated projections most closely match the real data in a distributional sense. By doing so, it can resolve a 3D structure in a single algorithmic run using only the dataset of projections and CTF estimates as inputs. Hence, it bypasses many processing steps that are necessary in the usual cryo-EM reconstruction pipeline, which opens new perspectives for reconstruction in SPA.

Detection of curvilinear structures has long been of interest due to its wide range of applications. Large amounts of imaging data could be readily used in many fields, but analyzing them manually is impractical, hence the need for automated delineation approaches. In recent years, Computer Vision has witnessed a paradigm shift from mathematical modelling to data-driven methods based on Machine Learning. This has led to improvements in the performance and robustness of detection algorithms. Nonetheless, most Machine Learning methods are general-purpose and do not exploit the specificity of the delineation problem. In this thesis, we present learning methods suited for this task and apply them to various kinds of microscopic and natural images, demonstrating the general applicability of the presented solutions.
First, we introduce a topology loss, a new training loss term that captures higher-level features of curvilinear networks such as smoothness, connectivity and continuity. This is in contrast to most Deep Learning segmentation methods, which do not take into account the geometry of the resulting prediction. To compute the new loss term, we extract topology features of the prediction and the ground truth using a pre-trained network whose filters are activated by structures at different scales and orientations. We show that this approach yields better results in terms of conventional segmentation metrics and the overall topology of the resulting delineation.
Although segmentation of curvilinear structures provides useful information, it is not always sufficient. In many cases, such as neuroscience and cartography, it is crucial to estimate the network connectivity. In order to find the graph representation of the structure depicted in the image, we propose an approach for joint segmentation and connection classification. Apart from pixel probabilities, this approach also returns the likelihood of a proposed path being a part of the reconstructed network. We show that segmentation and path classification are closely related tasks and can benefit from the synergy.
The aforementioned methods rely on Machine Learning, which requires significant amounts of annotated ground-truth data to train models. The labelling process often requires expertise and is costly and tiresome. To alleviate this problem, we introduce an Active Learning method that significantly decreases the time spent on annotating images. It queries the annotator only about the most informative examples, in this case the hypothetical paths belonging to the structure of interest. Contrary to conventional Active Learning methods, our approach exploits the local consistency of linear paths to pick the ones that stand out from their neighborhood.
Our final contribution is a method suited for both Active Learning and proofreading the result, which often requires more time than the automated delineation itself. It investigates edges of the delineation graph and determines the ones that are especially significant for the global reconstruction by perturbing their weights. Our Active Learning and proofreading strategies are combined with a new efficient formulation of an optimal subgraph computation and reduce the annotation effort by up to 80%.

In this thesis, we propose new algorithms to solve inverse problems in the context of biomedical images. Due to ill-posedness, solving these problems requires some prior knowledge of the statistics of the underlying images. Traditional algorithms in the field assume prior knowledge related to the smoothness or sparsity of these images. Recently, they have been outperformed by second-generation algorithms, which harness the power of neural networks to learn the required statistics from training data. Even more recently, latest-generation deep-learning-based methods have emerged that require neither training nor training data. This thesis devises algorithms that progress through these generations. It extends these generations to novel formulations and applications while bringing more robustness. In parallel, it also progresses in terms of complexity, from algorithms for problems with 1D data and an exactly known forward model to ones with 4D data and an unknown parametric forward model. We introduce five main contributions. The last three propose deep-learning-based latest-generation algorithms that require no prior training.
1) We develop algorithms to solve the continuous-domain formulation of inverse problems with both classical Tikhonov and total-variation regularizations. We formalize the problems, characterize the solution set, and devise numerical approaches to find the solutions.
2) We propose an algorithm that improves upon end-to-end neural-network-based second-generation algorithms. In our method, a neural network is first trained as a projector on a training set and is then plugged in as the projector inside projected gradient descent (PGD). Since the problem is nonconvex, we relax the PGD to ensure convergence to a local minimum under some constraints. This method outperforms all previous-generation algorithms for computed tomography (CT).
3) We develop a novel time-dependent deep-image-prior algorithm for modalities that involve a temporal sequence of images. We parameterize the images as the output of an untrained neural network fed with a sequence of latent variables. To impose temporal directionality, the latent variables are assumed to lie on a 1D manifold. The network is then tuned to minimize the data-fidelity loss. We obtain state-of-the-art results in dynamic magnetic resonance imaging (MRI) and even recover intra-frame images.
4) We propose a novel reconstruction paradigm for cryo-electron microscopy (CryoEM) called CryoGAN. Motivated by generative adversarial networks (GANs), we reconstruct a biomolecule's 3D structure such that its CryoEM measurements resemble the acquired data in a distributional sense. The algorithm requires neither pose nor likelihood estimation, needs no ab initio initialization, and comes with a theoretical guarantee of recovering the true structure.
5) We extend CryoGAN to reconstruct continuously varying conformations of a structure from heterogeneous data. We parameterize the conformations as the output of a neural network fed with latent variables on a low-dimensional manifold. The method is shown to recover continuous protein conformations and their energy landscape.
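The projected-gradient-descent scheme of contribution 2 can be sketched as follows, under loudly stated assumptions: the projector here is a simple clip to the box [0, 1] (`box_projector`), a hypothetical stand-in for the trained neural network, the forward model `H` is a tiny dense matrix, and the damping rule that shrinks `alpha` is an illustrative choice, not necessarily the relaxation used in the thesis.

```python
import math


def matvec(H, x):
    """Compute H x for a dense matrix stored as a list of rows."""
    return [sum(h * v for h, v in zip(row, x)) for row in H]


def rmatvec(H, r):
    """Compute H^T r."""
    return [sum(H[i][j] * r[i] for i in range(len(H))) for j in range(len(H[0]))]


def box_projector(x, lo=0.0, hi=1.0):
    """Stand-in projector (assumption): clip each entry to [lo, hi]."""
    return [min(max(v, lo), hi) for v in x]


def relaxed_pgd(H, y, projector, step, n_iter=200, shrink=0.99):
    """Minimize ||H x - y||^2 over the set defined by `projector`.

    Each iteration takes a gradient step, projects, and then moves a
    fraction alpha toward the projected point. alpha is reduced whenever
    the update size fails to decrease (illustrative damping rule).
    """
    x = [0.0] * len(H[0])
    alpha = 1.0
    prev_move = None
    for _ in range(n_iter):
        residual = [a - b for a, b in zip(matvec(H, x), y)]
        grad = rmatvec(H, residual)  # gradient of 0.5 * ||H x - y||^2
        z = projector([xi - step * g for xi, g in zip(x, grad)])
        move = math.sqrt(sum((zi - xi) ** 2 for zi, xi in zip(z, x)))
        if prev_move is not None and move > shrink * prev_move:
            alpha *= shrink
        x = [(1.0 - alpha) * xi + alpha * zi for xi, zi in zip(x, z)]
        prev_move = move
    return x


# Measurements generated from [1.2, 0.5], which lies outside the box,
# so the constraint is active at the solution.
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.2, 0.5, 1.7]
x_hat = relaxed_pgd(H, y, box_projector, step=0.3)
print(x_hat)
```

With a convex projector such as this box clip, the damping is rarely triggered and the loop behaves like standard PGD; for this small problem the constrained minimizer can be checked by hand to be approximately (1.0, 0.6). The relaxation matters when the projector is a neural network, where, as the abstract notes, the problem is nonconvex.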