The sample complexity of a machine learning algorithm represents the number of training samples that it needs in order to successfully learn a target function. More precisely, the sample complexity is the number of training samples that we need to supply to the algorithm, so that the function returned by the algorithm is within an arbitrarily small error of the best possible function, with probability arbitrarily close to 1.

There are two variants of sample complexity: the weak variant fixes a particular input-output distribution, while the strong variant takes the worst-case sample complexity over all input-output distributions. The no free lunch theorem, discussed below, proves that, in general, the strong sample complexity is infinite, i.e., that there is no algorithm that can learn the globally optimal target function using a finite number of training samples. However, if we are only interested in a particular class of target functions (e.g., only linear functions), then the sample complexity is finite, and it depends linearly on the VC dimension of the class of target functions.

Let $X$ be a space which we call the input space, and $Y$ be a space which we call the output space, and let $Z$ denote the product $X \times Y$. For example, in the setting of binary classification, $X$ is typically a finite-dimensional vector space and $Y$ is the set $\{-1, 1\}$.

Fix a hypothesis space $\mathcal{H}$ of functions $h \colon X \to Y$. A learning algorithm over $\mathcal{H}$ is a computable map from $Z^*$ to $\mathcal{H}$. In other words, it is an algorithm that takes as input a finite sequence of training samples and outputs a function from $X$ to $Y$. Typical learning algorithms include empirical risk minimization, with or without Tikhonov regularization.

Fix a loss function $\mathcal{L} \colon Y \times Y \to \mathbb{R}_{\ge 0}$, for example, the square loss $\mathcal{L}(y, y') = (y - y')^2$, where $y' = h(x)$. For a given distribution $\rho$ on $X \times Y$, the expected risk of a hypothesis $h$ (a function) is
$$\mathcal{E}(h) := \mathbb{E}_\rho[\mathcal{L}(h(x), y)] = \int_{X \times Y} \mathcal{L}(h(x), y) \, d\rho(x, y).$$

In our setting, we have $h = \mathcal{A}(S_n)$, where $\mathcal{A}$ is a learning algorithm and $S_n = ((x_1, y_1), \ldots, (x_n, y_n)) \sim \rho^n$ is a sequence of vectors which are all drawn independently from $\rho$. Define the optimal risk
$$\mathcal{E}^*_{\mathcal{H}} = \inf_{h \in \mathcal{H}} \mathcal{E}(h).$$
Set $h_n = \mathcal{A}(S_n)$ for each $n$. Note that $h_n$ is a random variable and depends on the random variable $S_n$, which is drawn from the distribution $\rho^n$.
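To make these definitions concrete, the following is a minimal Python sketch of empirical risk minimization over a small finite hypothesis class with the square loss. Everything specific here is an illustrative assumption rather than part of the article: the toy distribution $\rho$ on $[0,1] \times \{-1,1\}$, the class of threshold classifiers standing in for $\mathcal{H}$, and the Monte Carlo estimate of $\mathcal{E}(h_n)$. The sketch only shows how the learned hypothesis $h_n = \mathcal{A}(S_n)$ depends on the random sample $S_n \sim \rho^n$, and how its excess risk $\mathcal{E}(h_n) - \mathcal{E}^*_{\mathcal{H}}$ tends to shrink as $n$ grows.

```python
import random

# Assumed toy setup (not from the article): X = [0, 1], Y = {-1, 1};
# rho draws x uniformly and labels it by sign(x - 0.3), flipping the
# label with probability 0.1 (label noise).
def draw_sample(rng):
    x = rng.random()
    y = 1 if x > 0.3 else -1
    if rng.random() < 0.1:  # label noise makes the optimal risk nonzero
        y = -y
    return x, y

# Finite hypothesis space H: threshold classifiers h_t(x) = sign(x - t).
THRESHOLDS = [i / 100 for i in range(101)]

def predict(t, x):
    return 1 if x > t else -1

def square_loss(y_pred, y):
    return (y_pred - y) ** 2

# The learning algorithm A: empirical risk minimization over H.
def erm(samples):
    def empirical_risk(t):
        return sum(square_loss(predict(t, x), y) for x, y in samples) / len(samples)
    return min(THRESHOLDS, key=empirical_risk)

# Monte Carlo estimate of the expected risk E(h_t) = E_rho[L(h_t(x), y)].
def expected_risk(t, rng, trials=200_000):
    total = 0.0
    for _ in range(trials):
        x, y = draw_sample(rng)
        total += square_loss(predict(t, x), y)
    return total / trials

rng = random.Random(0)
# Under this toy rho, the best threshold in H is t = 0.3, so the optimal
# risk is E*_H = 0.1 * (1 - (-1))**2 = 0.4 (the irreducible label noise).
optimal_risk = 0.4
for n in (10, 100, 1000, 10_000):
    S_n = [draw_sample(rng) for _ in range(n)]  # S_n drawn i.i.d. from rho
    t_n = erm(S_n)                              # h_n = A(S_n)
    excess = expected_risk(t_n, rng) - optimal_risk
    print(f"n = {n:6d}   learned threshold = {t_n:.2f}   excess risk ~ {excess:.4f}")
```

The printed excess risk typically decreases toward 0 as $n$ grows (up to Monte Carlo noise in the risk estimate); the sample complexity is precisely the $n$ needed to drive this excess below a given $\epsilon$ with a given probability.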