Are you an EPFL student looking for a semester project?
Work with us on data science and visualisation projects, and deploy your project as an app on top of Graph Search.
In the field of machine learning, the goal of statistical classification is to use an object's characteristics to identify which class (or group) it belongs to. A linear classifier achieves this by making a classification decision based on the value of a linear combination of the characteristics. An object's characteristics are also known as feature values and are typically presented to the machine in a vector called a feature vector. Such classifiers work well for practical problems such as document classification, and more generally for problems with many variables (features), reaching accuracy levels comparable to non-linear classifiers while taking less time to train and use. If the input feature vector to the classifier is a real vector , then the output score is where is a real vector of weights and f is a function that converts the dot product of the two vectors into the desired output. (In other words, is a one-form or linear functional mapping onto R.) The weight vector is learned from a set of labeled training samples. Often f is a threshold function, which maps all values of above a certain threshold to the first class and all other values to the second class; e.g., The superscript T indicates the transpose and is a scalar threshold. A more complex f might give the probability that an item belongs to a certain class. For a two-class classification problem, one can visualize the operation of a linear classifier as splitting a high-dimensional input space with a hyperplane: all points on one side of the hyperplane are classified as "yes", while the others are classified as "no". A linear classifier is often used in situations where the speed of classification is an issue, since it is often the fastest classifier, especially when is sparse. Also, linear classifiers often work very well when the number of dimensions in is large, as in document classification, where each element in is typically the number of occurrences of a word in a document (see document-term matrix). In such cases, the classifier should be well-regularized.
Florent Gérard Krzakala, Lenka Zdeborová, Hugo Chao Cui, Bruno Loureiro
Alexandre Massoud Alahi, Mohamed Ossama Ahmed Abdelfattah, Mariam Ahmed Mahmoud Hegazy Hassan
Sabine Süsstrunk, Tong Zhang, Yufan Ren