Multienzyme deep learning models improve peptide de novo sequencing by mass spectrometry proteomics

Author summary

In recent years, deep learning has been a breakthrough for mass spectrometry (MS): it has improved the assignment of the correct amino acid sequence from observed MS spectra without prior knowledge, a task known as de novo MS-based peptide sequencing. However, like other modern neural networks, these models do not generalize well enough: they perform poorly on test sets of peptides with highly varied N- and C-termini. To mitigate this generalizability problem, we conducted a systematic investigation to uncover the requirements for building generalized models and boosting performance on the MS-based de novo peptide sequencing task. Several experiments confirmed that the peptide diversity of the training set directly impacts the generalizability of the resulting model. The best models were the multienzyme models (MEMs), i.e., models trained on a compendium of highly diverse peptides, such as one generated by digesting samples from a broad range of species with a group of proteases. The applicability of these MEMs was then established by fully de novo sequencing eight of the ten polypeptide chains of five commercial antibodies and extracting over 10,000 proving peptides.

