Pfam is a database of protein families that includes their annotations and multiple sequence alignments generated using hidden Markov models. The most recent version, Pfam 35.0, was released in November 2021 and contains 19,632 families.
The general purpose of the Pfam database is to provide a complete and accurate classification of protein families and domains. Originally, the rationale behind creating the database was to have a semi-automated method of curating information on known protein families to improve the efficiency of annotating genomes. The Pfam classification has been widely adopted by biologists because of its broad coverage of proteins and sensible naming conventions. It is used by experimental biologists researching specific proteins, by structural biologists to identify new targets for structure determination, by computational biologists to organise sequences, and by evolutionary biologists tracing the origins of proteins. Early genome projects, such as those for human and fly, used Pfam extensively for functional annotation of genomic data.
The Pfam website allows users to submit protein or DNA sequences to search for matches to families in the database. If DNA is submitted, a six-frame translation is performed and each frame is then searched. Rather than performing a typical BLAST search, Pfam uses profile hidden Markov models, which give greater weight to matches at conserved sites. This allows better remote homology detection and makes them more suitable for annotating genomes of organisms with no well-annotated close relatives. Pfam has also been used in the creation of other resources such as iPfam, which catalogs domain-domain interactions within and between proteins, based on information in structure databases and the mapping of Pfam domains onto these structures.
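The six-frame translation mentioned above can be illustrated with a minimal Python sketch. It assumes the standard genetic code and plain A/C/G/T input. The function names (`reverse_complement`, `translate`, `six_frame_translation`) and the frame labels `+1` to `-3` are hypothetical choices made for this illustration, not Pfam's own implementation. Each DNA strand is read at three offsets, giving six candidate protein sequences that could then be searched against the family models.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons; unknown codons become 'X', stops '*'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna: str) -> dict[str, str]:
    """Translate both strands at offsets 0, 1 and 2 (labels are illustrative)."""
    dna = dna.upper()
    frames = {}
    for label, strand in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{label}{offset + 1}"] = translate(strand[offset:])
    return frames


frames = six_frame_translation("ATGGCCTAA")
print(frames["+1"])  # MA*
print(frames["-1"])  # LGH
```

In this sketch a trailing partial codon is simply dropped and stop codons are kept as `*`. A real pipeline might instead split each frame at stop codons before searching.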
For each family in Pfam, one can: view a description of the family; look at multiple alignments; view protein domain architectures; examine species distribution; follow links to other databases; and view known protein structures. Entries can be of several types: family, domain, repeat or motif.