This thesis takes an interdisciplinary approach to detecting environmentally driven selection pressure, examining the contribution of Remote Sensing and Spatial Analysis to the field of Landscape Genetics. Several studies have attempted to link genetic and environmental information in order to identify genes shaped by natural selection under various interacting environmental factors, but promising parameters derived from remote sensing may not yet have been extensively exploited. This project aims to fill part of this gap by analysing whether Remote Sensing data can reveal significant gene-environment associations. A heterogeneous set of quantitative and qualitative data, drawn from a wide variety of sources with different data structures, was collected and tested for potential associations between allelic frequencies at marker loci and environmental parameters, in order to identify signatures of natural selection within the genomes of North American grey wolves (Canis lupus). Particular emphasis was placed on the Normalized Difference Vegetation Index (NDVI) as a novel candidate predictor of the evolutionary divergence of the sampled populations.

The dataset finally analysed consisted of microsatellite genetic samples and two types of environmental data, climatic and remotely sensed (NDVI, altitude), collected as monthly variables where available in order to scan for a possible effect of seasonality on the genetic data. The processing was carried out with the Spatial Analysis Method (SAM) on 22 environmental and 523 genetic parameters. SAM requires georeferenced genetic data for the study population, so that information characterizing each sampling location can be retrieved and genetic parameters can be correlated with one or more environmental parameters. The research was carried out in three phases.
The first phase derives the required information from the corresponding data using a Geographic Information System. The second phase encodes the acquired data and compiles a combination matrix holding the values of the environmental parameters and the binary (presence/absence) information of the genetic ones. The third and final phase runs multiple univariate logistic regressions and computes the degree of association between the parameters, in order to formulate hypotheses about the selective force each parameter might exert.

Comparing the two groups of environmental parameters, those derived from remote sensing and those derived from climatic data, the study concludes that climatic variables exert a selection pressure that could lead to genetic diversity, whereas the vegetation index and altitude drop out of the significant associations beyond the two lowest confidence levels. The vegetation index appears to exert weaker selective power for the study area and population in question. This is not a general conclusion, however: the results suggest that future studies using a dataset of higher resolution and more varied content could reach a less ambiguous outcome. One explanation for why NDVI did not emerge as a strong candidate driver of natural selection is that the computed NDVI values proved sensitive to several perturbing factors, including clouds and cloud shadows, which are not rare given the prevailing climatic conditions of the study area. Furthermore, missing values in the initial genetic dataset prevented a G test from being performed; with a complete dataset and additional alleles, a larger number and wider range of environmental parameters, NDVI included, might have been found to be under natural selection.
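Phases two and three can be illustrated with the following minimal Python sketch. It is not the SAM software. It assumes genotypes are given as pairs of microsatellite allele sizes, drops samples with a missing value separately for each model, fits each univariate logistic regression by Newton-Raphson, and judges significance with a Wald test on the slope against a Bonferroni-corrected threshold. The Wald test and the Bonferroni correction are assumptions chosen for illustration (the text only states that missing values ruled out a G test), and all function names are hypothetical. Phase one, extracting environmental values at the georeferenced sampling points in a GIS, is not shown.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical input format: diploid microsatellite call as two allele sizes, None if missing.
Genotype = Optional[Tuple[int, int]]


def encode_alleles(samples: List[Dict[str, Genotype]]) -> Dict[str, List[Optional[int]]]:
    """Phase 2: one binary column per allele (1 = present, 0 = absent, None = missing)."""
    alleles = sorted({(locus, a) for s in samples for locus, g in s.items()
                      if g is not None for a in g})
    matrix: Dict[str, List[Optional[int]]] = {}
    for locus, allele in alleles:
        column: List[Optional[int]] = []
        for s in samples:
            g = s.get(locus)
            column.append(None if g is None else int(allele in g))
        matrix[f"{locus}_{allele}"] = column
    return matrix


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function (never calls math.exp on a large positive value).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _information(x: Sequence[float], b0: float, b1: float) -> Tuple[float, float, float]:
    # Entries (h00, h01, h11) of the 2x2 Fisher information matrix X^T W X.
    h00 = h01 = h11 = 0.0
    for xi in x:
        p = _sigmoid(b0 + b1 * xi)
        w = p * (1.0 - p)
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return h00, h01, h11


def fit_logistic(x: Sequence[float], y: Sequence[int], max_iter: int = 50) -> Tuple[float, float]:
    """Univariate logistic regression P(y = 1) = sigmoid(b0 + b1 * x), fitted by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        h00, h01, h11 = _information(x, b0, b1)
        det = h00 * h11 - h01 * h01
        if det <= 1e-10 * h00 * h11:  # singular: constant predictor or separated data
            break
        r = [yi - _sigmoid(b0 + b1 * xi) for xi, yi in zip(x, y)]  # residuals y - p
        g0 = sum(r)
        g1 = sum(ri * xi for ri, xi in zip(r, x))
        d0 = (h11 * g0 - h01 * g1) / det  # Newton step = inverse(information) @ score
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < 1e-10:
            break
    return b0, b1


def wald_test(x: Sequence[float], y: Sequence[int]) -> Tuple[float, float]:
    """Wald z statistic for the slope b1 and its two-sided p-value."""
    b0, b1 = fit_logistic(x, y)
    h00, h01, h11 = _information(x, b0, b1)
    det = h00 * h11 - h01 * h01
    if det <= 1e-10 * h00 * h11:
        return 0.0, 1.0  # slope not estimable: treat as no association
    z = b1 / math.sqrt(h00 / det)  # Var(b1) = [information^-1]_11 = h00 / det
    return z, math.erfc(abs(z) / math.sqrt(2.0))


def sam_scan(env: Dict[str, List[Optional[float]]],
             alleles: Dict[str, List[Optional[int]]],
             alpha: float = 0.05) -> List[Tuple[str, str, float, float]]:
    """Phase 3: one univariate model per (allele, environmental variable) pair.

    A pair is reported when its p-value is below the Bonferroni threshold alpha / n_models.
    Samples missing either value are dropped for that model only.
    """
    n_models = len(env) * len(alleles)
    hits = []
    for a_name, a_col in alleles.items():
        for e_name, e_col in env.items():
            pairs = [(xi, yi) for xi, yi in zip(e_col, a_col)
                     if xi is not None and yi is not None]
            ys = [yi for _, yi in pairs]
            if len(set(ys)) < 2:  # allele never (or always) present: nothing to model
                continue
            z, p = wald_test([xi for xi, _ in pairs], ys)
            if p < alpha / n_models:
                hits.append((a_name, e_name, z, p))
    return hits
```

For example, `encode_alleles([{"L1": (152, 156)}, {"L1": (152, 152)}])` yields the columns `L1_152 = [1, 1]` and `L1_156 = [1, 0]`, which can then be passed to `sam_scan` together with environmental columns aligned on the same samples. With the 22 environmental and 523 genetic parameters mentioned above there would be 11,506 models, so under this assumed Bonferroni correction the threshold at alpha = 0.05 would be roughly 4.3e-6.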
Regarding the genetic data, the spatial distribution of alleles should be analysed further to obtain information about their local effects and about any spatial patterns that could reveal an environmentally driven link. In conclusion, this thesis was developed from a geographical-information point of view; a biologically oriented interpretation and analysis will be carried out in a future publication together with a specialized molecular biologist.
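The abstract attributes part of NDVI's weak performance to clouds and cloud shadows. Purely as an illustration, the sketch below computes NDVI with the standard formula (NIR - Red) / (NIR + Red) and averages it per month for one sampling location, skipping observations marked by a hypothetical cloud/shadow flag. The function names and the flag are assumptions, not the processing chain actually used in the thesis.

```python
from statistics import mean
from typing import Optional, Sequence


def ndvi(nir: float, red: float) -> Optional[float]:
    """Standard NDVI = (NIR - Red) / (NIR + Red); None when both reflectances are zero."""
    total = nir + red
    if total == 0:
        return None
    return (nir - red) / total


def monthly_ndvi(nir: Sequence[float], red: Sequence[float],
                 cloudy: Sequence[bool]) -> Optional[float]:
    """Mean NDVI of one sampling location over one month's observations.

    `cloudy` is a hypothetical per-observation cloud/cloud-shadow flag; flagged
    observations are skipped because clouds pull NDVI towards zero or below.
    Returns None when no clear observation is left (a missing monthly value).
    """
    values = [ndvi(n, r) for n, r, c in zip(nir, red, cloudy) if not c]
    clear = [v for v in values if v is not None]
    return mean(clear) if clear else None
```

For example, `monthly_ndvi([0.5, 0.4, 0.2], [0.1, 0.1, 0.15], [False, False, True])` averages only the two clear observations (NDVI of about 0.667 and 0.6). Returning None instead of a value computed from cloudy pixels keeps a contaminated month as an explicit missing value.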