
# Lecture: Gaussian Naive Bayes & K-NN

Description

This lecture covers Gaussian Naive Bayes and K-nearest neighbors (K-NN). The instructor starts by discussing student feedback and the importance of data-driven improvements. The lecture then turns to Naive Bayes as a probabilistic classification technique, explaining prior probabilities and generative models. The conditional independence assumption is introduced, along with the implementation of Gaussian Naive Bayes. The instructor also explains the K-NN algorithm, emphasizing the importance of choosing the right value of K and the distance metric. The lecture concludes with insights on the challenges of high-dimensional data and considerations for hyperparameter tuning.
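As a concrete illustration of the K-NN algorithm discussed in the lecture, here is a minimal Python sketch (the function name and toy data are invented for illustration, not taken from the lecture): classify a query point by majority vote among its k nearest training points under the Euclidean distance.

```python
import math
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    # Sort training points by Euclidean distance to the query point
    neighbours = sorted(zip(X_train, y_train), key=lambda p: math.dist(p[0], x))
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]

# Two well-separated toy classes (invented for illustration)
X = [(0, 0), (0, 1), (1, 0), (5, 5), (6, 5), (5, 6)]
y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(X, y, (0.5, 0.5), k=3))  # → a
```

Both the value of k and the distance metric are hyperparameters, which ties into the lecture's points on choosing K and on tuning.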



This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.

In course

ME-390: Foundations of artificial intelligence

This course provides students with the basic theory to understand the machine learning approach, and the tools to use the approach for problems arising in engineering applications.

Related concepts (279)

Generative model

In statistical classification, two main approaches are called the generative approach and the discriminative approach. These compute classifiers in different ways, differing in the degree of statistical modelling. Terminology is inconsistent, but three major types can be distinguished: a generative model is a statistical model of the joint probability distribution on a given observable variable X and target variable Y; a discriminative model is a model of the conditional probability of the target Y given an observation x; and classifiers computed without using a probability model are also referred to loosely as "discriminative".
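To make the distinction concrete, here is a toy Python sketch (the dataset and names are invented for illustration): a generative model estimates the joint distribution P(X, Y) from counts, and the discriminative quantity P(Y | X) can then be derived from it.

```python
from collections import Counter

# Toy dataset: pairs (x, y) of an observed feature and a class label
data = [("red", "apple"), ("red", "apple"), ("green", "apple"),
        ("green", "pear"), ("green", "pear"), ("red", "pear")]

n = len(data)
joint = {pair: c / n for pair, c in Counter(data).items()}        # generative: P(X, Y)
p_x = {x: c / n for x, c in Counter(x for x, _ in data).items()}  # marginal P(X)

def conditional(y, x):
    """Discriminative quantity P(Y = y | X = x), derived from the joint model."""
    return joint.get((x, y), 0.0) / p_x[x]
```

A purely discriminative method would estimate `conditional` directly without ever modelling the joint distribution.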

Probability distribution

In probability theory and statistics, a probability distribution is the mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment. It is a mathematical description of a random phenomenon in terms of its sample space and the probabilities of events (subsets of the sample space). For instance, if X is used to denote the outcome of a fair coin toss ("the experiment"), then the probability distribution of X would take the value 0.5 (1 in 2 or 1/2) for X = heads, and 0.5 for X = tails.
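The coin-toss distribution above, and a slightly richer example (the sum of two fair dice), can be written down directly from their sample spaces. A small Python sketch with exact rational probabilities:

```python
from collections import Counter
from fractions import Fraction

# A fair coin: each outcome in the sample space {heads, tails} has probability 1/2
coin = {"heads": Fraction(1, 2), "tails": Fraction(1, 2)}

# Sum of two fair dice: derive the distribution by counting sample-space outcomes
counts = Counter(a + b for a in range(1, 7) for b in range(1, 7))
two_dice = {s: Fraction(c, 36) for s, c in counts.items()}

print(two_dice[7])  # → 1/6, the most likely sum
```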

Naive Bayes classifier

In statistics, naive Bayes classifiers are a family of simple "probabilistic classifiers" based on applying Bayes' theorem with strong (naive) independence assumptions between the features (see Bayes classifier). They are among the simplest Bayesian network models, but coupled with kernel density estimation, they can achieve high accuracy levels. Naive Bayes classifiers are highly scalable, requiring a number of parameters linear in the number of variables (features/predictors) in a learning problem.
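Gaussian Naive Bayes instantiates this idea for continuous features: each feature is modelled as a class-conditional Gaussian, and the conditional independence assumption lets the likelihood factor across features. A minimal Python sketch (function names, toy data, and the variance-smoothing constant are illustrative):

```python
import math
from collections import defaultdict

def fit_gnb(X, y):
    """Estimate a class prior and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    model = {}
    for c, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # Small constant keeps variances strictly positive (illustrative smoothing)
        varis = [sum((v - mu) ** 2 for v in col) / n + 1e-9
                 for col, mu in zip(zip(*rows), means)]
        model[c] = (n / len(X), means, varis)
    return model

def predict_gnb(model, x):
    """Pick the class maximizing log prior plus summed Gaussian log densities."""
    def log_posterior(c):
        prior, means, varis = model[c]
        loglik = sum(-0.5 * math.log(2 * math.pi * v) - (xi - mu) ** 2 / (2 * v)
                     for xi, mu, v in zip(x, means, varis))
        return math.log(prior) + loglik
    return max(model, key=log_posterior)
```

Note that the per-feature sum of log densities is exactly where the naive independence assumption enters: the joint likelihood factors into a product, which becomes a sum in log space.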

Random forest

A random forest (or random decision forest) is an ensemble learning method for classification, regression and other tasks that operates by constructing a multitude of decision trees at training time. For classification tasks, the output of the random forest is the class selected by the most trees. For regression tasks, the mean prediction of the individual trees is returned. Random decision forests correct for decision trees' habit of overfitting to their training set.
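A minimal Python sketch of the bagging-plus-voting idea, using one-level trees (decision stumps) instead of full decision trees for brevity, and omitting the per-split feature subsampling that real random forests also use: fit each tree on a bootstrap resample and predict by majority vote (all names and data are illustrative).

```python
import random
from collections import Counter

def fit_stump(X, y):
    """Pick the (feature, threshold) split minimizing misclassification count."""
    best, best_err = None, float("inf")
    for f in range(len(X[0])):
        for t in {xi[f] for xi in X}:
            left = [yi for xi, yi in zip(X, y) if xi[f] <= t]
            right = [yi for xi, yi in zip(X, y) if xi[f] > t]
            if not left or not right:
                continue
            pl = Counter(left).most_common(1)[0][0]
            pr = Counter(right).most_common(1)[0][0]
            err = sum(v != pl for v in left) + sum(v != pr for v in right)
            if err < best_err:
                best, best_err = (f, t, pl, pr), err
    return best

def fit_forest(X, y, n_trees=25, seed=0):
    """Fit each stump on a bootstrap resample of the training data."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        stump = fit_stump([X[i] for i in idx], [y[i] for i in idx])
        if stump:  # skip degenerate resamples with no valid split
            forest.append(stump)
    return forest

def predict_forest(forest, x):
    """Majority vote over the individual trees' predictions."""
    votes = Counter(pl if x[f] <= t else pr for f, t, pl, pr in forest)
    return votes.most_common(1)[0][0]
```

The bootstrap resampling decorrelates the trees, so the majority vote is more robust than any single overfit tree.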

K-means clustering

k-means clustering is a method of vector quantization, originally from signal processing, that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean (the cluster center or centroid), which serves as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells. k-means clustering minimizes within-cluster variances (squared Euclidean distances), but not regular Euclidean distances, which would be the more difficult Weber problem: the mean optimizes squared errors, whereas only the geometric median minimizes Euclidean distances.
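A minimal Python sketch of Lloyd's algorithm, the standard heuristic for this objective (the function name and defaults are illustrative): alternate assigning each point to its nearest center with updating each center to the mean of its cluster.

```python
import math
import random

def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm: alternate assignment and mean-update steps."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: each point joins the cluster with the nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: math.dist(p, centers[j]))
            clusters[i].append(p)
        # Update step: each center moves to the mean of its cluster
        centers = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return centers, clusters
```

Each iteration can only decrease the within-cluster sum of squared distances, so the procedure converges, though possibly to a local optimum that depends on the random initialization.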

Related lectures (753)

Logistic Regression: Probabilistic Interpretation

Covers logistic regression's probabilistic interpretation, multinomial regression, KNN, hyperparameters, and curse of dimensionality.

Unsupervised Learning: PCA & K-means

Covers unsupervised learning with PCA and K-means for dimensionality reduction and data clustering.

Document Analysis: Topic Modeling

Explores document analysis, topic modeling, and generative models for data generation in machine learning.

Probabilistic Linear Regression

Explores probabilistic linear regression, covering joint and conditional probability, ridge regression, and overfitting mitigation.

Gaussian Mixture Models: Data Classification

Explores denoising signals with Gaussian mixture models and EM algorithm, EMG signal analysis, and image segmentation using Markovian models.