Cancer is the second leading cause of death worldwide. Cancer develops through multiple hallmark functions, including apoptosis evasion, unlimited replicative potential, metastasis, and immune avoidance. Over the past few decades, researchers have reported a substantial amount of information about the role of gene expression dysregulation in cancer, whereas transposable elements (TEs) have been overlooked even though they constitute nearly half of the human genome. TEs are repetitive DNA sequences that have spread across the genomes of living organisms throughout evolution, contributing to genomic diversity and shaping the epigenomic landscape. The vast majority of TEs appear to be transcriptionally silent in adult tissues, thus avoiding deleterious mutational insertions and recombination events in somatic tissues. Due to the global epigenetic dysregulation that occurs during tumorigenesis, some TEs lose repressive marks and become transcriptionally active in cancer cells. Consequently, TE insertions into oncogenes and tumor suppressor genes may occur, thereby contributing to cancer development and disease progression. It is also widely accepted that some TEs harbor transcription factor recognition sites and have regulatory functions on gene expression. Some transcriptionally active TEs can give rise to alternative TE-derived chimeric gene products. Given these facts, integrative transcriptome analysis of TEs and genes may contribute to a better understanding of complex traits of cancer and help to better classify cancer subtypes, with clinical implications for patients.

To gain insights into the underlying mechanisms of cancer pathogenesis, we constructed a co-expression network based on the similarity in expression levels of TEs and genes. We focused on colorectal cancer (CRC), a heterogeneous disease whose diverse genetic and molecular backgrounds influence patient outcomes and response to therapy.
We investigated the coordinated activity of TEs and genes in view of recurrent CRC chromosomal aberrations, genome organization, and functional significance. We further examined changes in co-expression network structure between cancerous and matched healthy colonic mucosal tissues. We found that the cancer network was associated with a dramatic decrease in the number of gene interactions, as opposed to an increase in TE interactions. Physical chromosomal distance affects the degree of co-expression: the proximal-distance effect seen for TEs contrasts with the distant effect seen for genes. Our study sheds light on the complex interplay between TEs and genes and how they coordinately shape cancer and normal co-expression networks. Furthermore, we integrated gene and TE expression to develop a new consensus molecular subtype (CMS) classifier derived from multiple layers of TE and gene signatures. We demonstrated that our new approach led to better identification of CRC patients of the former CMS3 subtype, the most heterogeneous subtype with mixed biological characteristics, and to a global reduction of the expected misclassification of other CMS subtypes. We further optimized our classifier for potential clinical use and obtained a set of 50 genes and 20 TE integrants with the best ability to identify CMS subtypes. Our integrative model could be a major add-on for the development of future clinical trials by improving patient stratification over current gene-restricted molecular classifications.
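As a rough illustration of what building a co-expression network from expression similarity can look like, the sketch below links two features (genes or TEs) whenever the absolute Pearson correlation of their expression profiles across samples reaches a cutoff, then counts edges by type. The abstract does not state which similarity measure or cutoff was used, so the Pearson correlation, the 0.9 threshold, the feature names, and the expression values are all assumptions made for this toy example, not the study's method or data.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def coexpression_edges(expr, threshold):
    """Return feature pairs whose |Pearson r| is at least `threshold`.

    expr maps a feature name to its expression values across the same samples.
    """
    return [
        (u, v)
        for u, v in combinations(expr, 2)
        if abs(pearson(expr[u], expr[v])) >= threshold
    ]


def count_edges_by_type(edges, te_names):
    """Split edges into gene-gene, TE-TE, and mixed gene-TE interactions."""
    counts = {"gene-gene": 0, "TE-TE": 0, "gene-TE": 0}
    for u, v in edges:
        n_te = (u in te_names) + (v in te_names)
        key = {0: "gene-gene", 1: "gene-TE", 2: "TE-TE"}[n_te]
        counts[key] += 1
    return counts


# Hypothetical toy data: 2 genes and 2 TEs measured in 5 samples.
expr = {
    "GENE_A": [1, 2, 3, 4, 5],
    "GENE_B": [2, 4, 6, 8, 10],
    "TE_X": [5, 3, 4, 1, 2],
    "TE_Y": [10, 6, 8, 2, 4],
}
te_names = {"TE_X", "TE_Y"}

edges = coexpression_edges(expr, threshold=0.9)  # assumed cutoff
counts = count_edges_by_type(edges, te_names)
print(counts)  # → {'gene-gene': 1, 'TE-TE': 1, 'gene-TE': 0}
```

Running the same two functions separately on tumor and matched normal expression tables, and comparing the per-type edge counts, would be one simple way to express the kind of contrast between cancer and normal networks described above.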
Didier Trono, Evaristo Jose Planet Letschert, Nikolaos Lykoskoufis
Didier Trono, Evaristo Jose Planet Letschert, Wayo Matsushima