Publication

Development of an interactive genome browser to visualize and analyse large scale genomic data

Lucas Sinclair
2010
Student project
Abstract

Genomic bioinformatics is a growing and developing field. Indeed, data analysis is becoming an integrative and essential part of any quantitative biological experiment as the technologies evolve and the wet lab methods used generate larger and larger quantities of data. Yet few standards have emerged and a plethora of analytical tools exist, none of which are established as a standard. The difficulties arise early on, even before processing any genomic data, as one first needs to visualize it. Several visualization methods exist, such as the UCSC genome browser, IGB or Argo, but none offer a satisfying interface or set of tools. Stemming from a pre-existing project at the bioinformatics and biostatistics core facility, this study presents a new solution to the multiple difficulties that at present beleaguer the field. A novel genome visualization tool is proposed where the user interface remains simple and incorporates a set of common statistical analysis functions. The software produced, entitled gFeatMiner, is capable of processing large scale genomic datasets for computing descriptive statistics and manipulate them in several ways. The program makes use of modern technologies and infrastructure paving the way for its development into an advanced data mining tool. In the second part of this study, a practical application is worked out. Examining the genes coding for ribosomal proteins in the model organism yeast (Saccharomyces cerevisiae) and using several available sets of data including multiple transcription factor binding profiles in vivo and in vitro, RNA polymerase activity and nucleosome enrichment, we attempt to better understand and reveal cellular mechanisms by clustering the numerous genes together using different criteria and machine learning strategies

About this result
This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.
Related concepts (39)
Data analysis
Data analysis is the process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, and is used in different business, science, and social science domains. In today's business world, data analysis plays a role in making decisions more scientific and helping businesses operate more effectively.
Gene expression
Gene expression is the process by which information from a gene is used in the synthesis of a functional gene product that enables it to produce end products, proteins or non-coding RNA, and ultimately affect a phenotype. These products are often proteins, but in non-protein-coding genes such as transfer RNA (tRNA) and small nuclear RNA (snRNA), the product is a functional non-coding RNA.
Big data
Big data primarily refers to data sets that are too large or complex to be dealt with by traditional data-processing application software. Data with many entries (rows) offer greater statistical power, while data with higher complexity (more attributes or columns) may lead to a higher false discovery rate. Though used sometimes loosely partly because of a lack of formal definition, the interpretation that seems to best describe big data is the one associated with a large body of information that we could not comprehend when used only in smaller amounts.
Show more
Related publications (80)

Data and scripts for the RaFSIP scheme

Athanasios Nenes, Paraskevi Georgakaki

This repository contains microphysics routines, scripts, and processed data from the Weather Research and Forecasting (WRF) model simulations presented in the paper "RaFSIP: Parameterizing ice multiplication in models using a machine learning approach", by ...
Zenodo2024

Robust machine learning for neuroscientific inference

Steffen Schneider

Modern neuroscience research is generating increasingly large datasets, from recording thousands of neurons over long timescales to behavioral recordings of animals spanning weeks, months, or even years. Despite a great variety in recording setups and expe ...
EPFL2024

Validating functional redundancy with mixed generative adversarial networks

Thanh Trung Huynh, Quoc Viet Hung Nguyen, Thành Tâm Nguyên, Trung-Dung Hoang

Data redundancy has been one of the most important problems in data-intensive applications such as data mining and machine learning. Removing data redundancy brings many benefits in efficient data updating, effective data storage, and error-free query proc ...
ELSEVIER2023
Show more
Related MOOCs (28)
Neuroscience Reconstructed: Cell Biology
This course will provide the fundamental knowledge in neuroscience required to understand how the brain is organised and how function at multiple scales is integrated to give rise to cognition and beh
Neuroscience Reconstructed: Cell Biology
This course will provide the fundamental knowledge in neuroscience required to understand how the brain is organised and how function at multiple scales is integrated to give rise to cognition and beh
Neuroscience Reconstructed: Genetics and Brain Development
This course will provide the fundamental knowledge in neuroscience required to understand how the brain is organised and how function at multiple scales is integrated to give rise to cognition and beh
Show more

Graph Chatbot

Chat with Graph Search

Ask any question about EPFL courses, lectures, exercises, research, news, etc. or try the example questions below.

DISCLAIMER: The Graph Chatbot is not programmed to provide explicit or categorical answers to your questions. Rather, it transforms your questions into API requests that are distributed across the various IT services officially administered by EPFL. Its purpose is solely to collect and recommend relevant references to content that you can explore to help you answer your questions.