
Regression analysis

Summary

In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the 'outcome' or 'response' variable, or a 'label' in machine learning parlance) and one or more independent variables (often called 'predictors', 'covariates', 'explanatory variables' or 'features'). The most common form of regression analysis is linear regression, in which one finds the line (or a more complex linear combination) that most closely fits the data according to a specific mathematical criterion. For example, the method of ordinary least squares computes the unique line (or hyperplane) that minimizes the sum of squared differences between the true data and that line (or hyperplane). For specific mathematical reasons (see linear regression), this allows the researcher to estimate the conditional expectation (or population average value) of the dependent variable when the independent variables take on a given set of values.
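As a concrete illustration of the least-squares criterion described above, here is a minimal sketch with NumPy; the simulated data and coefficient values are invented for illustration, not taken from this page:

```python
# Minimal ordinary-least-squares sketch (illustrative data; the true
# intercept 2.0 and slope 3.0 are hypothetical choices).
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)            # independent variable ("feature")
y = 2.0 + 3.0 * x + rng.normal(0, 1, 50)   # dependent variable ("response")

# Design matrix with an intercept column; lstsq minimizes ||X b - y||^2,
# i.e. the sum of squared differences between the data and the fitted line.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = beta
```

The fitted line estimates the conditional expectation of `y` given `x`; with more columns in `X`, the same call fits a hyperplane.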



Related courses (163)


MGT-581: Introduction to econometrics

The course provides an introduction to econometrics. The objective is to learn how to make valid (i.e., causal) inferences from economic data. It explains the main estimators and presents methods to deal with endogeneity issues.

MATH-408: Regression methods

A general graduate course on regression methods.

FIN-403: Econometrics

The course covers basic econometric models and methods that are routinely applied to obtain inference results in economic and financial applications.


Related concepts (131)

Statistics

Statistics (from German: Statistik, "description of a state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data.

Linear regression

In statistics, linear regression is a linear approach for modelling the relationship between a scalar response and one or more explanatory variables (also known as dependent and independent variables).

Ordinary least squares

In statistics, ordinary least squares (OLS) is a type of linear least squares method for choosing the unknown parameters in a linear regression model (with fixed level-one effects of a linear function of a set of explanatory variables) by the principle of least squares.
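The least-squares principle mentioned here has a standard closed form; this is a textbook identity stated for context rather than something taken from this page:

```latex
\hat{\beta} \;=\; \arg\min_{\beta} \,\lVert y - X\beta \rVert_2^{2}
\;=\; (X^{\top} X)^{-1} X^{\top} y ,
```

where $X$ is the design matrix of explanatory variables and $y$ the vector of responses, assuming $X^{\top}X$ is invertible.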

Related publications (100)

The increased accessibility of data that are geographically referenced and correlated increases the demand for techniques of spatial data analysis. The subset of such data composed of discrete counts exhibits particular difficulties, and the challenges further increase when a large proportion (typically 50% or more) of the counts are zero-valued. Such scenarios arise in many applications in numerous fields of research, and it is often desirable to infer on subtleties of the process, despite the lack of substantive information obscuring the underlying stochastic mechanism generating the data. An ecological example provides the impetus for the research in this thesis: when observations for a species are recorded over a spatial region, and many of the counts are zero-valued, are the abundant zeros due to bad luck, or are aspects of the region making it unsuitable for the survival of the species? In the framework of generalized linear models, we first develop a zero-inflated Poisson generalized linear regression model, which explains the variability of the responses given a set of measured covariates, and additionally allows for the distinction of two kinds of zeros: sampling ("bad luck" zeros), and structural (zeros that provide insight into the data-generating process). We then adapt this model to the spatial setting by incorporating dependence within the model via a general, leniently-defined quasi-likelihood strategy, which provides consistent, efficient and asymptotically normal estimators, even under erroneous assumptions of the covariance structure. In addition to this advantage of robustness to dependence misspecification, our quasi-likelihood model overcomes the need for the complete specification of a probability model, thus rendering it very general and relevant to many settings. To complement the developed regression model, we further propose methods for the simulation of zero-inflated spatial stochastic processes.
This is done by deconstructing the entire process into a mixed, marked spatial point process: we augment existing algorithms for the simulation of spatial marked point processes to comprise a stochastic mechanism to generate zero-abundant marks (counts) at each location. We propose several such mechanisms, and consider interaction and dependence processes for random locations as well as over a lattice.
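The two kinds of zeros described in this abstract can be illustrated with a toy, non-spatial simulation; the mixing probability `pi` and Poisson rate `lam` below are hypothetical choices, and this sketch is not the thesis's spatial quasi-likelihood model:

```python
# Toy zero-inflated Poisson simulation: each count is a structural zero
# ("unsuitable region") with probability pi, otherwise Poisson(lam), so
# sampling ("bad luck") zeros also occur whenever the Poisson draw is 0.
import numpy as np

rng = np.random.default_rng(42)
n, pi, lam = 10_000, 0.4, 3.0

structural = rng.random(n) < pi                     # structural-zero indicator
counts = np.where(structural, 0, rng.poisson(lam, n))

# Zero inflation shows up as an excess of zeros over a plain Poisson(lam):
# the zero fraction is roughly pi + (1 - pi) * exp(-lam).
zero_frac = (counts == 0).mean()
```

Fitting such a model means recovering `pi` and `lam` (possibly as functions of covariates) from `counts` alone, without observing `structural`.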


We present a framework for performing regression when both covariate and response are probability distributions on a compact interval. Our regression model is based on the theory of optimal transportation, and links the conditional Fréchet mean of the response to the covariate via an optimal transport map. We define a Fréchet-least-squares estimator of this regression map, and establish its consistency and rate of convergence to the true map, under both full and partial observations of the regression pairs. Computation of the estimator is shown to reduce to a standard convex optimization problem, and thus our regression model can be implemented with ease. We illustrate our methodology using real and simulated data.
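For one-dimensional distributions, the monotone optimal transport map from a distribution F to a distribution G is the composition G⁻¹ ∘ F of the quantile function of G with the distribution function of F. A rough empirical sketch of that map follows; the sample sizes and distributions are illustrative, and this is not the paper's Fréchet-least-squares estimator:

```python
# Empirical sketch of the 1-D optimal transport map T = G^{-1}(F(t)),
# using samples from two illustrative distributions on [0, 1].
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 5000)   # sample from the "covariate" distribution F
y = rng.beta(2.0, 5.0, 5000)      # sample from the "response" distribution G

def transport_map(t, src, tgt):
    """Push a point t through T = G^{-1}(F(t)) via empirical CDF/quantiles."""
    u = (src <= t).mean()         # empirical CDF F(t)
    return np.quantile(tgt, u)    # empirical quantile G^{-1}(u)

# Pushing F-samples through T yields (approximately) G-distributed values.
pushed = np.array([transport_map(t, x, y) for t in x[:1000]])
```

Because the map is monotone, it preserves the ordering of points, which is what makes the one-dimensional optimal transport problem tractable in closed form.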

Multiple generalized additive models are a class of statistical regression models wherein parameters of probability distributions incorporate information through additive smooth functions of predictors. The functions are represented by basis function expansions, whose coefficients are the regression parameters. The smoothness is induced by a quadratic roughness penalty on the functions' curvature, which is equivalent to a weighted $L_2$ regularization controlled by smoothing parameters. Regression fitting relies on maximum penalized likelihood estimation for the regression coefficients, and smoothness selection relies on maximum marginal likelihood estimation for the smoothing parameters.
Owing to their nonlinearity, flexibility and interpretability, generalized additive models are widely used in statistical modeling, but despite recent advances, reliable and fast methods for automatic smoothing in massive datasets are unavailable. Existing approaches are either reliable, complex and slow, or unreliable, simpler and fast, so a compromise must be made. A bridge between these categories is needed to extend use of multiple generalized additive models to settings beyond those possible in existing software. This thesis is one step in this direction. We adopt the marginal likelihood approach to develop approximate expectation-maximization methods for automatic smoothing, which avoid evaluation of expensive and unstable terms. This results in simpler algorithms that do not sacrifice reliability and achieve state-of-the-art accuracy and computational efficiency.
We extend the proposed approach to big-data settings and produce the first reliable, high-performance and distributed-memory algorithm for fitting massive multiple generalized additive models. Furthermore, we develop the underlying generic software libraries and make them accessible to the open-source community.
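The "basis expansion plus quadratic roughness penalty" recipe summarized in this abstract can be sketched as a small penalized least-squares fit. The polynomial basis, difference penalty, and smoothing parameter below are illustrative stand-ins (a toy version of the P-spline idea), not the methods developed in the thesis:

```python
# Penalized least squares: basis expansion + quadratic penalty, solved as
# a ridge-type linear system (B'B + lam * P) coef = B'y.
import numpy as np

rng = np.random.default_rng(7)
x = np.linspace(0, 1, 200)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, x.size)   # noisy smooth signal

degree = 8
B = np.vander(x, degree + 1, increasing=True)    # polynomial basis (stand-in
                                                 # for a B-spline basis)
D = np.diff(np.eye(degree + 1), n=2, axis=0)     # second-difference operator
P = D.T @ D                                      # quadratic roughness penalty

lam = 1e-6                                       # smoothing parameter; larger
                                                 # values give a smoother,
                                                 # more biased fit
coef = np.linalg.solve(B.T @ B + lam * P, B.T @ y)
fitted = B @ coef
```

Automatic smoothing, the subject of the thesis, amounts to choosing `lam` from the data (e.g. by marginal likelihood) instead of fixing it by hand.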
