Publication

Subsampling as an economic consequence of using whole genome sequence data in landscape genomics: how to maximize environmental information from a reduced number of locations?

Abstract

The recent availability of whole genome sequence (WGS) data calls for a reconsideration of sampling strategies in landscape genomics for economic reasons. Whereas ten years ago we had many individuals and few genetic markers, we now face the opposite: the high cost of WGS limits the number of samples that can be sequenced. In other words, molecular resolution has become excellent, but it is achieved at the expense of spatial representativeness and statistical robustness. Therefore, when starting from a standard sampling design, sub-sampling strategies must be applied to retain most of the environmental information. To study local adaptation of goat and sheep breeds in Morocco, we used a sampling design based on a regular grid overlaid on the territory. In each grid cell, three individuals were sampled from three different farms. The final subset destined for sequencing then had to meet two criteria to ensure a regular coverage of both environmental and physical space. The first criterion was met by applying stratified sampling over a range of climatic variables, previously filtered using a PCA; the second by minimising a clustering index to ensure spatial spread. The sub-sampling procedure, based on hierarchical clustering, yielded two datasets of 162 goats selected out of 1,283 and 162 sheep out of 1,412, using variables such as temperature, precipitation, and solar radiation. By maximising the environmental information retained, we selected the individuals most relevant for studying adaptation.
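The procedure described in the abstract (reduce climatic variables with a PCA, stratify along the resulting environmental gradient, then spread selections in physical space) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the use of only the first principal component, and the greedy farthest-point rule standing in for the paper's clustering-index minimisation are all assumptions for the sake of the example.

```python
import numpy as np

def subsample_environmental(env, coords, n_target, n_strata=5, seed=0):
    """Illustrative sketch of environmentally stratified, spatially
    spread subsampling.

    env     : (n, p) array of climatic variables per individual
    coords  : (n, 2) array of sampling locations
    n_target: number of individuals to keep
    """
    rng = np.random.default_rng(seed)

    # Standardise variables, then PCA via SVD; keep the first
    # component as the dominant environmental gradient (the paper
    # filters variables with a PCA before stratifying).
    X = (env - env.mean(axis=0)) / env.std(axis=0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    pc1 = X @ Vt[0]

    # Stratified sampling: equal-count bins along the gradient.
    edges = np.quantile(pc1, np.linspace(0, 1, n_strata + 1))
    per_stratum = n_target // n_strata
    chosen = []
    for i in range(n_strata):
        if i == n_strata - 1:  # last bin is closed on the right
            idx = np.where((pc1 >= edges[i]) & (pc1 <= edges[i + 1]))[0]
        else:
            idx = np.where((pc1 >= edges[i]) & (pc1 < edges[i + 1]))[0]
        # Greedy farthest-point selection within the stratum, a simple
        # stand-in for minimising a spatial clustering index.
        pick = [idx[rng.integers(len(idx))]]
        while len(pick) < min(per_stratum, len(idx)):
            d = np.linalg.norm(
                coords[idx][:, None] - coords[pick][None], axis=-1
            ).min(axis=1)
            pick.append(idx[np.argmax(d)])
        chosen.extend(pick)
    return np.array(chosen)
```

With equal-count strata, each environmental band contributes the same number of individuals, and the farthest-point rule keeps the selections within each band from clumping at a few nearby farms.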

Related concepts (33)
Sampling (statistics)
In statistics, quality assurance, and survey methodology, sampling is the selection of a subset or a statistical sample (termed sample for short) of individuals from within a statistical population to estimate characteristics of the whole population. Statisticians attempt to collect samples that are representative of the population. Sampling has lower costs and faster data collection compared to recording data from the entire population, and thus, it can provide insights in cases where it is infeasible to measure an entire population.
Survey sampling
In statistics, survey sampling describes the process of selecting a sample of elements from a target population to conduct a survey. The term "survey" may refer to many different types or techniques of observation; in survey sampling it most often involves a questionnaire used to measure the characteristics and/or attitudes of people. The different ways of contacting members of a sample once they have been selected are the subject of survey data collection.
Stratified sampling
In statistics, stratified sampling is a method of sampling from a population which can be partitioned into subpopulations. In statistical surveys, when subpopulations within an overall population vary, it could be advantageous to sample each subpopulation (stratum) independently. Stratification is the process of dividing members of the population into homogeneous subgroups before sampling. The strata should define a partition of the population.
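Stratified sampling as defined above can be illustrated with a short sketch using proportional allocation, where each stratum contributes in proportion to its size. The function name and the stratum key are illustrative, not from any specific library.

```python
import random

def stratified_sample(population, strata_key, fraction, seed=0):
    """Draw the same fraction from each stratum (proportional
    allocation): partition the population by strata_key, then sample
    within each homogeneous subgroup independently."""
    random.seed(seed)
    strata = {}
    for item in population:
        strata.setdefault(strata_key(item), []).append(item)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(random.sample(members, k))
    return sample
```

Because every subgroup is sampled independently, a small stratum is guaranteed representation that a simple random sample of the whole population might miss.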
Related publications (48)

DiffAirfoil: An Efficient Novel Airfoil Sampler Based on Latent Space Diffusion Model for Aerodynamic Shape Optimization

Pascal Fua, Zhen Wei

Surrogate-based optimization is widely used for aerodynamic shape optimization, and its effectiveness depends on representative sampling of the design space. However, traditional sampling methods are hard-pressed to effectively sample high-dimensional desi ...
2024

Disentangle genus microdiversity within a complex microbial community by using a multi-distance long-read binning method: example of Candidatus Accumulibacter

Christof Holliger, Julien Maillard, Aline Sondra Adler, Marco Pagni, Simon Marius Jean Poirier

Complete genomes can be recovered from metagenomes by assembling and binning DNA sequences into metagenome assembled genomes (MAGs). Yet, the presence of microdiversity can hamper the assembly and binning processes, possibly yielding chimeric, highly fragm ...
Wiley, 2022

Sampling-Based AQP in Modern Analytical Engines

Anastasia Ailamaki, Viktor Sanca

As the data volume grows, reducing the query execution times remains an elusive goal. While approximate query processing (AQP) techniques present a principled method to trade off accuracy for faster queries in analytics, the sample creation is often consid ...
ACM, 2022
Related MOOCs (14)
Digital Signal Processing I
Basic signal processing concepts, Fourier analysis and filters. This module can be used as a starting point or a basic refresher in elementary DSP
Digital Signal Processing II
Adaptive signal processing, A/D and D/A. This module provides the basic tools for adaptive filtering and a solid mathematical framework for sampling and quantization
Digital Signal Processing III
Advanced topics: this module covers real-time audio processing (with examples on a hardware board), image processing and communication system design.