
Publication

Geographic concentration of economic activities: on the validation of a distance-based mathematical index to identify optimal locations

Abstract

The present study proposes a validation of a mathematical index Q able to identify optimal geographic locations for economic activities, based solely on the location variable. This line of research has its roots in the 1970s, in the statistical analysis of spatial patterns, or point process analysis, whose main goal is to determine whether an observed spatial distribution of points is due to chance. Point objects are indeed commonplace (towns in regions, plants in the landscape, galaxies in space, shops in towns), and the development of specific mathematical tools is useful for understanding their location processes. Deviations of spatial point patterns from purely random configurations may be analyzed either by quadrat methods or by distance methods. An interesting method of the second category, the cumulative function M, was developed recently to evaluate the relative geographic concentration and co-location of industries in a nonhomogeneous spatial framework. On this basis, and having quantified retail store interactions, the French physicist Pablo Jensen elaborated the Q-index to automatically detect promising locations. To test the relevance of this quality index, Jensen used location data from 2003 and 2005 for bakeries in the city of Lyon and found that, between these two years, the shops that had closed were located on sites of significantly lower quality. Here, using bankruptcy data provided by the Registrar of Companies of the State of Valais in Switzerland and by the City Council of Glasgow in Scotland, we implemented a method based on univariate logistic regressions to systematically test the relevance of the Q-index on the many commercial categories available. We show that the Q-index is reliable, although the significance tests did not reach stringent levels. Access to trustworthy bankruptcy data remains difficult.
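The validation procedure described above, a univariate logistic regression of shop closure on the Q-index, can be sketched as follows. The data, the Newton-Raphson fitting routine, and all variable names are illustrative assumptions, not the authors' actual implementation:

```python
import numpy as np
from numpy.linalg import solve

def fit_univariate_logistic(q, closed, n_iter=25):
    """Fit P(closed = 1) = sigmoid(b0 + b1*q) by Newton-Raphson (IRLS)."""
    X = np.column_stack([np.ones_like(q), q])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))       # current fitted probabilities
        W = p * (1.0 - p)                          # IRLS weights
        grad = X.T @ (closed - p)                  # score vector
        hess = X.T @ (X * W[:, None])              # observed information
        beta = beta + solve(hess, grad)            # Newton update
    return beta

# Hypothetical data: 200 shops with a Q-index in [0, 1]; shops on
# lower-Q sites are made more likely to close (illustration only).
rng = np.random.default_rng(0)
q = rng.uniform(0, 1, 200)
closed = (rng.uniform(size=200) < 1.0 / (1.0 + np.exp(3 * (q - 0.3)))).astype(float)
b0, b1 = fit_univariate_logistic(q, closed)
# A negative slope b1 means a higher Q-index is associated with a
# lower probability of closure, which is the direction the study tests.
```

In practice one would report the Wald or likelihood-ratio p-value of b1 per commercial category, which is what "significance tests did not reach stringent levels" refers to.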



Related concepts (21)

Mathematics

Mathematics is an area of knowledge that includes the topics of numbers, formulas and related structures, shapes and the spaces in which they are contained, and quantities and their changes.

Statistics

Statistics (from German: Statistik, "description of a state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data.

Analysis

Analysis (plural: analyses) is the process of breaking a complex topic or substance into smaller parts in order to gain a better understanding of it. The technique has been applied in the study of mathematics.

Related publications (19)



2012

A functional time series is a temporally ordered sequence of not necessarily independent random curves. While the statistical analysis of such data has traditionally been carried out under the assumption of completely observed functional data, it may well happen that the statistician only has access to a relatively small number of sparse measurements for each random curve. These discrete measurements may moreover be irregularly scattered in each curve's domain, missing altogether for some curves, and contaminated by measurement noise. This sparse sampling protocol escapes the reach of established estimators in functional time series analysis and therefore requires the development of a novel methodology.
The core objective of this thesis is the development of a non-parametric statistical toolbox for the analysis of sparsely observed functional time series data. Assuming smoothness of the latent curves, we construct a local-polynomial-smoother-based estimator of the spectral density operator, producing a consistent estimator of the complete second-order structure of the data. Moreover, the spectral-domain recovery approach allows for prediction of the latent curve at a given time by borrowing strength from the estimated dynamic correlations across the entire time series. Beyond predicting the latent curves from their noisy point samples, the method fills in gaps in the sequence (curves nowhere sampled), denoises the data, and serves as a basis for forecasting.
A classical non-parametric apparatus for encoding the dependence between a pair of, or among multiple, functional time series, whether sparsely or fully observed, is the functional lagged regression model. It consists of a linear filter between the regressor time series and the response. We show how to tailor the smoother-based estimators to the estimation of the cross-spectral density operators and the cross-covariance operators and, by means of spectral truncation and Tikhonov regularisation techniques, how to estimate the lagged regression filter and predict the response process.
The simulation studies revealed the following findings: (i) if one is free to design a sampling scheme with a fixed number of measurements, it is advantageous to distribute these measurements sparsely over a longer time horizon rather than concentrating them densely over a shorter one, in order to diminish the spectral density estimation error; (ii) the developed functional recovery predictor surpasses the static predictor, which does not exploit the temporal dependence; (iii) neither of the two considered regularisation techniques can, in general, dominate the other for estimation in functional lagged regression models. The new methodologies are illustrated by applications to real data: meteorological data on fair-weather atmospheric electricity measured in Tashkent, Uzbekistan, and at Wank mountain, Germany; and a case study analysing the dependence of the US Treasury yield curve on macroeconomic variables.
As a secondary contribution, we present a novel simulation method for general stationary functional time series defined through their spectral properties. A simulation study shows the universality of this approach and the superiority of spectral-domain simulation over temporal-domain simulation in some situations.
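The recovery of smooth latent curves from sparse, irregular, noisy measurements rests on local polynomial smoothing. A minimal local-linear smoother for a single curve might look as follows (a sketch under simplifying assumptions: one curve, a Gaussian kernel, a hand-picked bandwidth; this is not the thesis's spectral-domain estimator, which pools information across the whole series):

```python
import numpy as np

def local_linear_smooth(t_obs, y_obs, t_grid, h):
    """Local-linear kernel smoother: at each grid point, fit a weighted
    straight line to the noisy observations (Gaussian kernel, bandwidth h)
    and return the intercept as the curve estimate at that point."""
    fitted = np.empty_like(t_grid, dtype=float)
    for i, t0 in enumerate(t_grid):
        d = t_obs - t0
        w = np.exp(-0.5 * (d / h) ** 2)            # kernel weights
        X = np.column_stack([np.ones_like(d), d])  # local linear design
        A = X.T @ (X * w[:, None])
        b = X.T @ (w * y_obs)
        fitted[i] = np.linalg.solve(A, b)[0]       # intercept = estimate at t0
    return fitted

# Sparse, irregular, noisy samples of one latent curve sin(2*pi*t)
rng = np.random.default_rng(1)
t_obs = np.sort(rng.uniform(0, 1, 15))             # few scattered design points
y_obs = np.sin(2 * np.pi * t_obs) + rng.normal(0, 0.1, 15)
grid = np.linspace(0, 1, 50)
curve_hat = local_linear_smooth(t_obs, y_obs, grid, h=0.1)
```

The bandwidth h trades bias against variance; in the sparse setting it would typically be chosen by cross-validation rather than fixed by hand as here.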

Anaerobic digestion of organic waste to methane gas (CH4) is an attractive method for producing renewable energy. Owing to the trend towards reusing waste and away from fossil energy sources, this technology has seen worldwide development in recent years. A fundamental parameter in this domain is the Biochemical Methane Potential (BMP), which defines the amount of CH4 that can be produced from a given organic substrate. Such information is essential for planning and optimising anaerobic digestion plants, and for evaluating the reactor feed with a new substrate or a new substrate combination (co-digestion). BMPs are assessed in batch tests, which can be regarded as simulations of a full-scale biogas plant. Two experiments are required: the test, in which the test substrate is digested on the inoculum, and the blank, in which the inoculum is digested alone. The digestible part of the inoculum is called the endogenous substrate. Since the test experiment produces CH4 not only from the test substrate but also from the endogenous substrate, the CH4 production from the test substrate alone is obtained by subtracting the blank production from the test production. A BMP is generally expressed in litres of CH4 per gram of volatile solids (VS) of test substrate [L/gVS]. Despite the importance and wide application of this parameter, its exact determination remains challenging, and results are often inconsistent both between (inter-) and within (intra-) laboratories. Existing experimental protocols do not lead to satisfactory consistency of BMP tests either. The aim of this study was to identify parameters affecting the outcome of BMP tests, based on the investigation of two data sets:
an inter-laboratory study of BMP tests providing the final BMP values of 327 experiments and information about 40 related experimental parameters, and a second data set containing the complete CH4 production curves of 136 BMP experiments provided by a single laboratory. The method consisted of graphical and statistical analysis (mixed-effects modelling), using the [R] programming language and software environment. This study found that a significant part of the inter-laboratory BMP inconsistency can be explained by an imprecise assessment of the VS, which was not expected. As the VS of the substrate enter directly into the computation of the BMP, the impact was significant, and the BMPs were therefore corrected for the VS imprecision. With these data, up to 70% of the inconsistency of BMPs could be explained by inter-laboratory effects. The statistical analysis led to the conclusion that the concentration of the endogenous substrate and the moisture content of the digestate are the principal factors affecting the outcome of BMP tests. An effort was made to identify and correct errors contained in the CH4 production curves, which turned out to be delicate in certain cases. Indications regarding the precision and reliability of BMPs were also formulated. Further, the investigation of the CH4 production curves led to the development of a new method for computing the BMP result. This method is based on the observation that a given inoculum always reaches the same slope towards the end of the experiment, no matter what was digested before. The advantage of this method would be that the experiment end-point could be set clearly and that the concentration of the endogenous substrate would not affect the outcome of a BMP test.
According to the findings of this report, it was proposed to add the following requirements to experimental protocols for BMP tests:

- Tests, blanks and the analysis of VS should be carried out in triplicate, and the standard deviation should be indicated for each of them, together with the BMP result.
- If triplicates contain set-up errors, these experiments must be repeated, including the corresponding blank/test.
- In the annex: an experimental protocol for the TS and VS analyses; an indication of the required moisture content of the digestate (mechanism to be investigated in detail first); an indication of the range of required VS concentrations or a maximal production rate for blank tests under certain experimental conditions (mechanism to be investigated in detail first).

This report identified several parameters which contribute to the inter-laboratory inconsistency of BMPs. These findings should be investigated further in order to prove and quantify their impact. An effort should be made to demonstrate the newly developed BMP computation method, which could eventually lead to more consistent results. A limitation of this study was that a relatively small amount of data was available compared with the number of their characteristics. Consequently, the findings could only be shown on a few examples and should therefore be regarded only as indications. This also entailed a risk of overfitting, since a relatively high number of parameters had to be included in the statistical model.
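The subtraction rule described above (test production minus blank production, normalised by the VS of the test substrate) amounts to a one-line computation; the numbers below are purely hypothetical:

```python
def bmp(v_ch4_test, v_ch4_blank, m_vs_substrate):
    """Biochemical Methane Potential in L CH4 per g VS of test substrate.

    v_ch4_test     : CH4 volume from the test bottle (inoculum + substrate) [L]
    v_ch4_blank    : CH4 volume from the blank bottle (inoculum alone)      [L]
    m_vs_substrate : volatile solids of the test substrate added            [g VS]
    """
    return (v_ch4_test - v_ch4_blank) / m_vs_substrate

# Hypothetical example: 1.9 L CH4 from the test bottle, 0.4 L from the
# blank, 3.0 g VS of substrate added -> (1.9 - 0.4) / 3.0 = 0.5 L/gVS
result = bmp(1.9, 0.4, 3.0)
```

This also makes the study's point about VS imprecision concrete: any relative error in m_vs_substrate propagates directly into the BMP, since it appears as the denominator.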

2012