Publication: Sampling of alternatives in migration aspiration models

Abstract

Discrete choice models (DCMs) are commonly used to investigate migration aspirations with respect to destination choice. However, given the complex substitution patterns between destinations, more advanced model specifications are required than the multinomial logit (MNL) and nested logit (NL) models commonly found in the literature. The cross-nested logit (CNL) model allows for a more sophisticated representation of the stochastic structure of destination choices through overlapping nests, while also addressing deviations from the independence of irrelevant alternatives (IIA) property. However, the shift towards CNL comes at a cost: these models can be computationally expensive to estimate, especially as the number of observations increases. The estimation cost can be mitigated by sampling of alternatives, i.e. reducing the number of alternatives in the model specification. This method has previously been used mostly in the context of residential location choice. In the current work, we implement sampling of alternatives on migration aspiration choices using the Gallup World Poll data. We examine the impact of stratification and of the number of sampled alternatives on the CNL model estimates. Moreover, we consider additional MNL and NL specifications to further understand the implications of sampling for DCMs used to model migration aspirations.
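
As a rough illustration of what "sampling of alternatives" means in practice, the sketch below builds a reduced choice set for one respondent: the chosen destination is always kept, and the remaining slots are filled by a uniform random draw from the non-chosen destinations. This is not the sampling protocol of the publication (which studies stratification and is not described in detail here); the function name, the destination labels, and the set size are all hypothetical.

```python
import random

def sample_choice_set(all_alternatives, chosen, n_sampled, rng):
    """Return a reduced choice set of size n_sampled.

    The chosen alternative is always included; the other n_sampled - 1
    alternatives are drawn uniformly without replacement from the rest.
    (Illustrative assumption: uniform sampling, no stratification.)
    """
    others = [a for a in all_alternatives if a != chosen]
    drawn = rng.sample(others, n_sampled - 1)
    return [chosen] + drawn

# Hypothetical universe of 150 destination labels
rng = random.Random(0)
destinations = [f"country_{i}" for i in range(150)]
choice_set = sample_choice_set(destinations, "country_42", 10, rng)
```

With uniform sampling of this kind, the sampling correction term in an MNL model is identical for every alternative in the reduced set and therefore cancels out of the choice probabilities. Non-uniform (for example stratified) protocols, and models with nests such as NL and CNL, generally need explicit correction terms, which this sketch does not attempt.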

Discrete choice

In economics, discrete choice models, or qualitative choice models, describe, explain, and predict choices between two or more discrete alternatives, such as entering or not entering the labor market, or choosing between modes of transport. Such choices contrast with standard consumption models in which the quantity of each good consumed is assumed to be a continuous variable. In the continuous case, calculus methods (e.g. first-order conditions) can be used to determine the optimum amount chosen, and demand can be modeled empirically using regression analysis.

Multinomial logistic regression

In statistics, multinomial logistic regression is a classification method that generalizes logistic regression to multiclass problems, i.e. with more than two possible discrete outcomes. That is, it is a model that is used to predict the probabilities of the different possible outcomes of a categorically distributed dependent variable, given a set of independent variables (which may be real-valued, binary-valued, categorical-valued, etc.).
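
A minimal sketch of how multinomial logit probabilities are obtained from the systematic utilities of the alternatives may help here. The utility values are made up for illustration, and the function name is an assumption rather than part of any particular library.

```python
import math

def mnl_probabilities(utilities):
    """Multinomial logit probabilities: P_i = exp(V_i) / sum_j exp(V_j)."""
    v_max = max(utilities)  # subtract the maximum for numerical stability
    exps = [math.exp(v - v_max) for v in utilities]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical systematic utilities of three alternatives
probs = mnl_probabilities([1.0, 0.5, -0.2])

# Ratio of two probabilities depends only on those two utilities,
# which is the independence of irrelevant alternatives (IIA) property.
ratio_all = probs[0] / probs[1]
probs_two = mnl_probabilities([1.0, 0.5])
ratio_two = probs_two[0] / probs_two[1]
```

The two ratios coincide (both equal exp(1.0 - 0.5)), which shows why removing or adding a third alternative leaves the relative odds of the first two unchanged in an MNL model.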

Logit

In statistics, the logit (/ˈloʊdʒɪt/) function is the quantile function associated with the standard logistic distribution. It has many uses in data analysis and machine learning, especially in data transformations. Mathematically, the logit is the inverse of the standard logistic function $\sigma(x) = 1/(1+e^{-x})$, so the logit is defined as $\operatorname{logit} p = \sigma^{-1}(p) = \ln\frac{p}{1-p}$ for $p \in (0,1)$. Because of this, the logit is also called the log-odds, since it is equal to the logarithm of the odds $\frac{p}{1-p}$, where $p$ is a probability. Thus, the logit is a type of function that maps probability values from $(0,1)$ to real numbers in $(-\infty,+\infty)$, akin to the probit function.
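
The relationship between the logit and the standard logistic function can be checked numerically. The short sketch below takes the logit to be ln(p / (1 - p)) for p strictly between 0 and 1 and the logistic function to be 1 / (1 + exp(-x)); the function names are illustrative choices.

```python
import math

def logistic(x):
    """Standard logistic function: maps a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    """Log-odds of a probability p, defined only for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))

even_odds = logit(0.5)               # odds of 1, so the log-odds are 0
round_trip = logistic(logit(0.8))    # applying the inverse recovers p
```

Probabilities above 0.5 give positive log-odds and probabilities below 0.5 give negative log-odds, with the value growing without bound as p approaches 0 or 1.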

In discrete choice modeling (DCM), model misspecifications may lead to limited predictability and biased parameter estimates. In this paper, we propose a new approach for estimating choice models in which we divide the systematic part of the utility specif ...

