# Linear model

Summary

In statistics, the term linear model is used in different ways according to the context. The most common occurrence is in connection with regression models and the term is often taken as synonymous with linear regression model. However, the term is also used in time series analysis with a different meaning. In each case, the designation "linear" is used to identify a subclass of models for which substantial reduction in the complexity of the related statistical theory is possible.
Linear regression models
For the regression case, the statistical model is as follows. Given a (random) sample (Y_i, X_{i1}, \ldots, X_{ip}), i = 1, \ldots, n, the relation between the observations Y_i and the independent variables X_{ij} is formulated as
Y_i = \beta_0 + \beta_1 \phi_1(X_{i1}) + \cdots + \beta_p \phi_p(X_{ip}) + \varepsilon_i, \qquad i = 1, \ldots, n,
where \phi_1, \ldots, \phi_p may be no
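
As a rough illustration of the model above, the following sketch builds the design row (1, \phi_1(X_{i1}), \ldots, \phi_p(X_{ip})) for each observation and estimates the coefficients \beta_0, \ldots, \beta_p. The text does not name an estimation method; ordinary least squares via the normal equations is assumed here. The data, the choice of \phi_1 = log and \phi_2 = square, the noise-free responses (\varepsilon_i = 0), and all function names are hypothetical and chosen only for the example.

```python
import math

def design_row(x, transforms):
    """Return [1, phi_1(x_1), ..., phi_p(x_p)] for one observation x."""
    return [1.0] + [phi(xj) for phi, xj in zip(transforms, x)]

def solve_linear_system(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z

def fit_linear_model(X, y, transforms):
    """Least squares estimate of (beta_0, ..., beta_p) (assumed method)."""
    D = [design_row(x, transforms) for x in X]
    k = len(D[0])
    # Normal equations: (D^T D) beta = D^T y
    DtD = [[sum(row[a] * row[b] for row in D) for b in range(k)] for a in range(k)]
    Dty = [sum(row[a] * yi for row, yi in zip(D, y)) for a in range(k)]
    return solve_linear_system(DtD, Dty)

# Hypothetical example with p = 2: phi_1 = log, phi_2 = square.
transforms = [math.log, lambda v: v * v]
X = [(1.0, 0.0), (2.0, 1.0), (3.0, -1.0), (4.0, 2.0), (5.0, 0.5)]
true_beta = [1.0, 2.0, -0.5]
# Responses generated without noise, so the fit should recover true_beta
# up to floating-point error.
y = [sum(b * d for b, d in zip(true_beta, design_row(x, transforms))) for x in X]

beta_hat = fit_linear_model(X, y, transforms)
print([round(b, 6) for b in beta_hat])
```

Even though log and square are nonlinear in the X_{ij}, the model stays linear in the coefficients, which is why a single linear solve is enough in this sketch.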

Related courses (44)

MATH-341: Linear models

Regression modelling is a fundamental tool of statistics, because it describes how the law of a random variable of interest may depend on other variables. This course aims to familiarize students with linear models and some of their extensions, which lie at the basis of more general regression model

CS-233(a): Introduction to machine learning (BA3)

Machine learning and data analysis are becoming increasingly central in many sciences and applications. In this course, fundamental principles and methods of machine learning will be introduced, analyzed and practically implemented.

FIN-403: Econometrics

The course covers basic econometric models and methods that are routinely applied to obtain inference results in economic and financial applications.

Related concepts (12)

Statistics

Statistics (from German: Statistik, "description of a state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and present

Linear regression

In statistics, linear regression is a linear approach for modelling the relationship between a scalar response and one or more explanatory variables (also known as dependent and independent variable

Dependent and independent variables

Dependent and independent variables are variables in mathematical modeling, statistical modeling and experimental sciences. Dependent variables are studied under the supposition or demand that they de

Related publications (65)

This paper is concerned with frequency domain theory for functional time series, which are temporally dependent sequences of functions in a Hilbert space. We consider a variance decomposition, which is more suitable for such a data structure than the variance decomposition based on the Karhunen-Loeve expansion. The decomposition we study uses eigenvalues of spectral density operators, which are functional analogs of the spectral density of a stationary scalar time series. We propose estimators of the variance components and derive convergence rates for their mean square error as well as their asymptotic normality. The latter is derived from a frequency domain invariance principle for the estimators of the spectral density operators. This principle is established for a broad class of linear time series models. It is a main contribution of the paper.

The increased accessibility of data that are geographically referenced and correlated increases the demand for techniques of spatial data analysis. The subset of such data consisting of discrete counts exhibits particular difficulties, and the challenges increase further when a large proportion (typically 50% or more) of the counts are zero-valued. Such scenarios arise in many applications in numerous fields of research, and it is often desirable to draw inferences about subtleties of the process, despite the lack of substantive information obscuring the underlying stochastic mechanism generating the data. An ecological example provides the impetus for the research in this thesis: when observations for a species are recorded over a spatial region, and many of the counts are zero-valued, are the abundant zeros due to bad luck, or are aspects of the region making it unsuitable for the survival of the species? In the framework of generalized linear models, we first develop a zero-inflated Poisson generalized linear regression model, which explains the variability of the responses given a set of measured covariates, and additionally allows for the distinction of two kinds of zeros: sampling ("bad luck" zeros), and structural (zeros that provide insight into the data-generating process). We then adapt this model to the spatial setting by incorporating dependence within the model via a general, leniently-defined quasi-likelihood strategy, which provides consistent, efficient and asymptotically normal estimators, even under erroneous assumptions of the covariance structure. In addition to this advantage of robustness to dependence misspecification, our quasi-likelihood model overcomes the need for the complete specification of a probability model, thus rendering it very general and relevant to many settings. To complement the developed regression model, we further propose methods for the simulation of zero-inflated spatial stochastic processes. This is done by deconstructing the entire process into a mixed, marked spatial point process: we augment existing algorithms for the simulation of spatial marked point processes to comprise a stochastic mechanism to generate zero-abundant marks (counts) at each location. We propose several such mechanisms, and consider interaction and dependence processes for random locations as well as over a lattice.

To appreciate how neural circuits in the brain control behaviors, we must identify how the neurons comprising the circuit are connected. Neuronal connectivity is difficult to determine experimentally, whereas neuronal activity can often be readily measured. I describe a statistical framework to estimate circuit connectivity directly from measured activity patterns. Because we usually only have access to a small subset of neurons of a circuit, the estimated connectivity reflects an effective coupling, that is, how spiking activity in one neuron effectively modulates the activity of other neurons. For small circuits, like the nervous system of the crab that controls gut muscle activity, we could show that it is possible to derive the actual physiological connectivity from observing neural activity alone. This was achieved with a regression model adapted to the spike train structure of the data (Generalized Linear Model, GLM). This is the first successful demonstration of a network inference algorithm on a physiological circuit for which the connections are known. For larger networks, like cortical networks, the concept of effective connectivity, though not equivalent to structural connectivity, is useful to characterize the functional properties of the network. For example, we may assess whether networks have small-world or scale-free properties that are important for information processing. We find that cortical networks show a small, but significant small-world structure by applying our estimation framework to multi-electrode recordings from the visual system of the awake monkey. Finally, we study how well spike dynamics and network topology can be inferred from noisy calcium imaging data. We applied our framework to simulated data to explore how uncertainties in spike inference due to experimental parameters affect estimates of network connectivity and their topological features. We find that considerable information about the connectivity can be extracted from the neural activity, but only if spikes are reconstructed with high temporal precision. We then study how errors in the network reconstruction affect the estimation of a number of graph-theoretic measures. Our findings provide a benchmark for future experiments that aim to reliably infer neuronal network properties.