Concept

Pfam

Pfam is a database of protein families that includes their annotations and multiple sequence alignments generated using hidden Markov models. The most recent version, Pfam 35.0, was released in November 2021 and contains 19,632 families. The general purpose of the Pfam database is to provide a complete and accurate classification of protein families and domains. Originally, the rationale behind creating the database was to have a semi-automated method of curating information on known protein families to improve the efficiency of annotating genomes. The Pfam classification of protein families has been widely adopted by biologists because of its wide coverage of proteins and sensible naming conventions. It is used by experimental biologists researching specific proteins, by structural biologists to identify new targets for structure determination, by computational biologists to organise sequences and by evolutionary biologists tracing the origins of proteins. Early genome projects, such as human and fly used Pfam extensively for functional annotation of genomic data. The Pfam website allows users to submit protein or DNA sequences to search for matches to families in the database. If DNA is submitted, a six-frame translation is performed, then each frame is searched. Rather than performing a typical BLAST search, Pfam uses profile hidden Markov models, which give greater weight to matches at conserved sites, allowing better remote homology detection, making them more suitable for annotating genomes of organisms with no well-annotated close relatives. Pfam has also been used in the creation of other resources such as iPfam, which catalogs domain-domain interactions within and between proteins, based on information in structure databases and mapping of Pfam domains onto these structures. For each family in Pfam one can: View a description of the family Look at multiple alignments View protein domain architectures Examine species distribution Follow links to other databases View known protein structures Entries can be of several types: family, domain, repeat or motif.

About this result
This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.
Related lectures (4)
Proteins: Central Dogma and Translation
Explains protein synthesis, translation, and the central dogma of molecular biology.
Protein Mass Spectrometry and Proteomics: Randomization Overview
Explores the significance of randomization in protein mass spectrometry and proteomics, highlighting its role in minimizing bias and ensuring research validity.
Protein Mass Spectrometry and Proteomics: Techniques and Applications
Explores protein mass spectrometry techniques, including labeling methods, quantitation, and biomarker discovery.
Show more
Related publications (19)

fGOT: Graph Distances Based on Filters and Optimal Transport

Pascal Frossard, Mireille El Gheche, Hermina Petric Maretic, Giovanni Chierchia

Graph comparison deals with identifying similarities and dissimilarities between graphs. A major obstacle is the unknown alignment of graphs, as well as the lack of accurate and inexpensive comparison metrics. In this work we introduce the filter graph dis ...
ASSOC ADVANCEMENT ARTIFICIAL INTELLIGENCE2022

Automatic Content Curation of Visual Heritage

Mathieu Salzmann, Frédéric Kaplan, Delphine Ribes Lemay, Nicolas Henchoz, Valentine Bernasconi

Digitization and preservation of large heritage induce high maintenance costs to keep up with the technical standards and ensure sustainable access. Creating impactful usage is instrumental to justify the resources for long-term preservation. The Museum fü ...
2021

Insights gained from a comprehensive all-against-all transcription factor binding motif benchmarking study

Giovanna Ambrosini, Philipp Bucher, Romain Fernand Pietro Groux

BackgroundPositional weight matrix (PWM) is a de facto standard model to describe transcription factor (TF) DNA binding specificities. PWMs inferred from in vivo or in vitro data are stored in many databases and used in a plethora of biological application ...
2020
Show more
Related concepts (5)
Protein domain
In molecular biology, a protein domain is a region of a protein's polypeptide chain that is self-stabilizing and that folds independently from the rest. Each domain forms a compact folded three-dimensional structure. Many proteins consist of several domains, and a domain may appear in a variety of different proteins. Molecular evolution uses domains as building blocks and these may be recombined in different arrangements to create proteins with different functions.
Multiple sequence alignment
Multiple sequence alignment (MSA) may refer to the process or the result of sequence alignment of three or more biological sequences, generally protein, DNA, or RNA. In many cases, the input set of query sequences are assumed to have an evolutionary relationship by which they share a linkage and are descended from a common ancestor. From the resulting MSA, sequence homology can be inferred and phylogenetic analysis can be conducted to assess the sequences' shared evolutionary origins.
Sequence motif
In biology, a sequence motif is a nucleotide or amino-acid sequence pattern that is widespread and usually assumed to be related to biological function of the macromolecule. For example, an N-glycosylation site motif can be defined as Asn, followed by anything but Pro, followed by either Ser or Thr, followed by anything but Pro residue. When a sequence motif appears in the exon of a gene, it may encode the "structural motif" of a protein; that is a stereotypical element of the overall structure of the protein.
Show more

Graph Chatbot

Chat with Graph Search

Ask any question about EPFL courses, lectures, exercises, research, news, etc. or try the example questions below.

DISCLAIMER: The Graph Chatbot is not programmed to provide explicit or categorical answers to your questions. Rather, it transforms your questions into API requests that are distributed across the various IT services officially administered by EPFL. Its purpose is solely to collect and recommend relevant references to content that you can explore to help you answer your questions.