Prevalence
In epidemiology, prevalence is the proportion of a particular population found to be affected by a medical condition (typically a disease or a risk factor such as smoking or seatbelt use) at a specific time. It is derived by comparing the number of people found to have the condition with the total number of people studied, and is usually expressed as a fraction, a percentage, or the number of cases per 10,000 or 100,000 people. Prevalence is most often estimated through questionnaire studies.
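A minimal sketch of the calculation in Python; the survey counts are hypothetical:

```python
# Point prevalence from survey counts, scaled to "cases per N people".
def prevalence(cases: int, population: int, per: int = 100_000) -> float:
    """Proportion of the population with the condition, scaled per 'per' people."""
    return cases / population * per

# e.g. 42 affected respondents out of 1,200 surveyed
print(prevalence(42, 1_200, per=100))       # 3.5 cases per 100 people (3.5%)
print(prevalence(42, 1_200, per=100_000))   # 3500 cases per 100,000 people
```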
Heteroskedasticity-consistent standard errors
The topic of heteroskedasticity-consistent (HC) standard errors arises in statistics and econometrics in the context of linear regression and time series analysis. These are also known as heteroskedasticity-robust standard errors (or simply robust standard errors), or as Eicker–Huber–White standard errors (also Huber–White or White standard errors), recognizing the contributions of Friedhelm Eicker, Peter J. Huber, and Halbert White.
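A minimal sketch of the Eicker–Huber–White (HC0) sandwich estimator on simulated heteroskedastic data; the data-generating numbers are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0, 10, n)
X = np.column_stack([np.ones(n), x])       # design matrix with intercept
y = 1.0 + 2.0 * x + rng.normal(0, x, n)    # heteroskedastic: error sd grows with x

beta = np.linalg.solve(X.T @ X, X.T @ y)   # OLS coefficients
resid = y - X @ beta

XtX_inv = np.linalg.inv(X.T @ X)
meat = X.T @ (resid[:, None] ** 2 * X)     # X' diag(e_i^2) X
hc0 = XtX_inv @ meat @ XtX_inv             # sandwich: (X'X)^-1 meat (X'X)^-1
print("robust SEs:", np.sqrt(np.diag(hc0)))
```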
Optimal design
In the design of experiments, optimal designs (or optimum designs) are a class of experimental designs that are optimal with respect to some statistical criterion. The creation of this field of statistics has been credited to Danish statistician Kirstine Smith. In the design of experiments for estimating statistical models, optimal designs allow parameters to be estimated without bias and with minimum variance. A non-optimal design requires a greater number of experimental runs to estimate the parameters with the same precision as an optimal design.
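As an illustration of one such criterion, the following sketch compares the D-optimality objective det(X'X) for two hypothetical four-run designs of a straight-line fit on [-1, 1]; a larger determinant of the information matrix means smaller parameter variance, and here the D-optimal design pushes all runs to the endpoints:

```python
import numpy as np

# D-criterion for the model y = b0 + b1*x: det of the information matrix X'X.
def d_criterion(points):
    X = np.column_stack([np.ones(len(points)), points])  # intercept + slope
    return np.linalg.det(X.T @ X)

spread = [-1.0, -1.0, 1.0, 1.0]    # runs at the extremes (D-optimal here)
evenly = [-1.0, -1/3, 1/3, 1.0]    # equally spaced runs

print(d_criterion(spread))   # 16.0
print(d_criterion(evenly))   # ~8.9: same runs, less precision
```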
Geometric mean
In mathematics, the geometric mean is a mean or average which indicates a central tendency of a finite set of real numbers by using the product of their values (as opposed to the arithmetic mean which uses their sum). The geometric mean is defined as the nth root of the product of n numbers, i.e., for a set of numbers $a_1, a_2, \ldots, a_n$, the geometric mean is defined as

$$\left(\prod_{i=1}^{n} a_i\right)^{1/n} = \sqrt[n]{a_1 a_2 \cdots a_n}$$

or, equivalently, as the arithmetic mean in logscale:

$$\exp\left(\frac{1}{n}\sum_{i=1}^{n} \ln a_i\right)$$

Most commonly the numbers are restricted to being non-negative, to avoid complications related to negative numbers not having real roots, and frequently they are restricted to being positive, to enable the use of logarithms.
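A minimal sketch of the log-scale computation, which avoids overflow for long products:

```python
import math

def geometric_mean(values):
    """Geometric mean via exp(mean(ln a_i)); requires positive inputs."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean requires positive numbers")
    return math.exp(sum(math.log(v) for v in values) / len(values))

print(geometric_mean([2, 8]))        # 4.0, since sqrt(2 * 8) = 4
print(geometric_mean([4, 1, 1/32]))  # 0.5, since (4 * 1 * 1/32)**(1/3) = 0.5
```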
Imputation (statistics)
In statistics, imputation is the process of replacing missing data with substituted values. When substituting for a data point, it is known as "unit imputation"; when substituting for a component of a data point, it is known as "item imputation". Missing data causes three main problems: it can introduce a substantial amount of bias, make the handling and analysis of the data more arduous, and reduce efficiency.
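A minimal sketch of item imputation by column means, one simple strategy among many; the data are hypothetical:

```python
import numpy as np

# Each missing entry (NaN) is replaced with the mean of its column.
data = np.array([[1.0,    2.0],
                 [np.nan, 4.0],
                 [5.0,    np.nan]])

col_means = np.nanmean(data, axis=0)              # per-column mean, ignoring NaNs
filled = np.where(np.isnan(data), col_means, data)
print(filled)   # [[1. 2.] [3. 4.] [5. 3.]]
```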
Data dredging
Data dredging (also known as data snooping or p-hacking) is the misuse of data analysis to find patterns in data that can be presented as statistically significant, thus dramatically increasing the risk of false positives while understating it. This is done by performing many statistical tests on the data and only reporting those that come back with significant results.
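A minimal simulation of the effect, assuming SciPy is available: testing pure noise twenty times per experiment makes at least one spurious "significant" result far more likely than the nominal 5% per test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_experiments, n_tests = 1_000, 20

false_hits = 0
for _ in range(n_experiments):
    # 20 independent t-tests on data with no real effect anywhere
    pvals = [stats.ttest_ind(rng.normal(size=30), rng.normal(size=30)).pvalue
             for _ in range(n_tests)]
    if min(pvals) < 0.05:   # report only the "significant" one
        false_hits += 1

# Roughly 1 - 0.95**20 ~ 0.64, far above the nominal 5%
print(false_hits / n_experiments)
```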
Generalized mean
In mathematics, generalized means (or power mean or Hölder mean from Otto Hölder) are a family of functions for aggregating sets of numbers. These include as special cases the Pythagorean means (arithmetic, geometric, and harmonic means). If $p$ is a non-zero real number, and $x_1, x_2, \ldots, x_n$ are positive real numbers, then the generalized mean or power mean with exponent $p$ of these positive real numbers is

$$M_p(x_1, \ldots, x_n) = \left(\frac{1}{n}\sum_{i=1}^{n} x_i^p\right)^{1/p}$$

(see p-norm).
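A minimal sketch of the power mean for non-zero $p$, showing the Pythagorean special cases:

```python
# p = 1 gives the arithmetic mean, p = -1 the harmonic mean, p = 2 the
# quadratic mean (RMS); p = 0 is the geometric-mean limit, not covered here.
def power_mean(values, p):
    if p == 0:
        raise ValueError("p = 0 is the geometric-mean limit, handled separately")
    return (sum(v ** p for v in values) / len(values)) ** (1 / p)

xs = [1.0, 4.0, 4.0]
print(power_mean(xs, 1))    # 3.0   (arithmetic)
print(power_mean(xs, -1))   # 2.0   (harmonic)
print(power_mean(xs, 2))    # ~3.32 (quadratic / RMS)
```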
Histogram
A histogram is an approximate representation of the distribution of numerical data. The term was first introduced by Karl Pearson. To construct a histogram, the first step is to "bin" (or "bucket") the range of values—that is, divide the entire range of values into a series of intervals—and then count how many values fall into each interval. The bins are usually specified as consecutive, non-overlapping intervals of a variable. The bins (intervals) must be adjacent and are often (but not required to be) of equal size.
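A minimal sketch of the binning-and-counting step on hypothetical data, using equal-width bins:

```python
values = [2.1, 3.4, 3.9, 5.0, 5.5, 6.2, 7.8, 8.0, 8.3, 9.9]
n_bins = 4
lo, hi = min(values), max(values)
width = (hi - lo) / n_bins          # equal-width intervals over the full range

counts = [0] * n_bins
for v in values:
    # clamp so the maximum value lands in the last bin, not one past it
    i = min(int((v - lo) / width), n_bins - 1)
    counts[i] += 1

for i, c in enumerate(counts):      # crude text rendering of the bars
    print(f"[{lo + i*width:.2f}, {lo + (i+1)*width:.2f}): {'#' * c}")
```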
Decision model
A decision model in decision theory is the starting point for a decision method within a formal (axiomatic) system. Decision models contain at least one action axiom. An action is in the form "IF <condition> is true, THEN do <action>". An action axiom tests a condition (antecedent) and, if the condition has been met, then (consequent) it suggests (mandates) an action: from knowledge to action. A decision model may also be a network of connected decisions, information and knowledge that represents a decision-making approach that can be used repeatedly (such as one developed using the Decision Model and Notation standard).
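A minimal sketch of an action axiom as an antecedent/consequent pair; the stock-control rule is hypothetical:

```python
from typing import Callable, Optional

class ActionAxiom:
    """An action axiom: IF condition(state) is true, THEN do action(state)."""
    def __init__(self, condition: Callable[[dict], bool],
                 action: Callable[[dict], str]):
        self.condition = condition   # antecedent: test on the current state
        self.action = action         # consequent: what to do if it holds

    def apply(self, state: dict) -> Optional[str]:
        return self.action(state) if self.condition(state) else None

# Hypothetical rule: reorder stock when inventory drops below 10 units
rule = ActionAxiom(lambda s: s["inventory"] < 10,
                   lambda s: f"reorder {50 - s['inventory']} units")
print(rule.apply({"inventory": 4}))   # reorder 46 units
print(rule.apply({"inventory": 30}))  # None (condition not met)
```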
Marginal model
In statistics, marginal models (Heagerty & Zeger, 2000) are a technique for obtaining regression estimates in multilevel modeling, also called hierarchical linear models. People often want to know the effect of a predictor/explanatory variable X on a response variable Y. One way to get an estimate for such effects is through regression analysis. In a typical multilevel model, there are level 1 and level 2 residuals (R and U variables). The two variables form a joint distribution for the response variable ($Y_{ij}$).
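A minimal simulation of the setup, with illustrative parameter values: level-2 residuals U and level-1 residuals R jointly generate Y, and a pooled regression recovers the population-averaged (marginal) slope:

```python
import numpy as np

rng = np.random.default_rng(2)
n_groups, n_per = 100, 20
beta0, beta1 = 1.0, 2.0

U = rng.normal(0, 1.5, n_groups)              # level-2 residuals (one per group)
x = rng.uniform(0, 5, (n_groups, n_per))
R = rng.normal(0, 1.0, (n_groups, n_per))     # level-1 residuals (one per unit)
y = beta0 + beta1 * x + U[:, None] + R        # U and R jointly determine Y_ij

# Pooled regression over all units targets the marginal slope, averaging
# over groups (standard errors would still need a cluster adjustment).
X = np.column_stack([np.ones(y.size), x.ravel()])
b = np.linalg.lstsq(X, y.ravel(), rcond=None)[0]
print(b)   # approximately [1.0, 2.0]
```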