Parallel and Scalable Precise Clustering

James Richard Larus, Stuart Anthony Byma
2020

Abstract

This paper describes a new technique for parallelizing protein clustering, an important bioinformatics computation for the analysis of protein sequences. Protein clustering identifies groups of proteins that are similar because they share long sequences of similar amino acids. Given a collection of protein sequences, clustering can significantly reduce the computational effort required to identify all similar sequences by avoiding many negative comparisons. The challenge, however, is to build a clustering that misses as few similar sequences (or elements, more generally) as possible.
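To make the idea of avoiding negative comparisons concrete, here is a minimal sketch of a simple greedy, representative-based clustering. It is not the technique the paper describes. The k-mer Jaccard similarity, the 0.5 threshold, the function names, and the toy sequences are all illustrative assumptions. Each new sequence is compared only against one representative per cluster, so the number of comparisons falls below the all-pairs count. This shortcut can miss similar pairs, because similarity is not transitive.

```python
from itertools import combinations


def kmers(seq, k=3):
    """Return the set of length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similar(a, b, threshold=0.5, k=3):
    """Toy similarity test (assumption): Jaccard index over k-mer sets."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return False
    return len(ka & kb) / len(ka | kb) >= threshold


def greedy_cluster(seqs, threshold=0.5):
    """Assign each sequence to the first cluster whose representative
    (its first member) is similar; otherwise start a new cluster.
    Returns the clusters and the number of comparisons performed."""
    clusters = []
    comparisons = 0
    for s in seqs:
        for cluster in clusters:
            comparisons += 1
            if similar(s, cluster[0], threshold):
                cluster.append(s)
                break
        else:
            # No representative matched, so s founds a new cluster.
            clusters.append([s])
    return clusters, comparisons


# Hypothetical toy sequences, chosen only for illustration.
seqs = ["MKTAYIAKQR", "MKTAYIAKQK", "GGGSSSGGGS", "GGGSSSGGGA", "MKTAYIAKQQ"]
clusters, comparisons = greedy_cluster(seqs)
all_pairs = len(list(combinations(seqs, 2)))
print(f"{len(clusters)} clusters, {comparisons} comparisons vs {all_pairs} all-pairs")
```

Once clusters exist, a search for all similar pairs only needs to compare sequences within the same cluster, which is where most of the savings come from. The cost is that two similar sequences placed in different clusters are never compared.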

Official source

https://infoscience.epfl.ch/entities/publication/308bb5ed-461e-4f59-9830-f7c32128c61e
