
Lecture: Unsupervised Learning: Clustering & Dimensionality Reduction

Description

This lecture covers the fundamentals of unsupervised learning, focusing on clustering with K-means and dimensionality reduction using Principal Component Analysis (PCA). It explains how unsupervised learning identifies patterns in data without predefined labels. The instructor discusses the K-means algorithm for grouping data points based on proximity and PCA for finding a new set of features that best represent the data. The lecture also introduces autoencoders as neural networks for dimensionality reduction. Practical examples and applications are provided to illustrate the concepts.
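As a rough illustration of the two methods named in the description, here is a minimal pure-Python sketch. It is not the lecture's own code: the function names, the farthest-first initialisation for K-means, the use of power iteration to find the first principal component, and the toy data are all assumptions made for this example.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100):
    """Lloyd's algorithm: alternate between assigning each point to its
    nearest centroid and moving each centroid to the mean of its points."""
    # Assumption: deterministic farthest-first initialisation, chosen here
    # for reproducibility; random initialisation is equally common.
    centroids = [tuple(points[0])]
    while len(centroids) < k:
        centroids.append(
            tuple(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
        )
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for every point.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        # Update step: mean of each cluster (keep the old centroid if empty).
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])
        if new_centroids == centroids:  # no centroid moved: converged
            break
        centroids = new_centroids
    return centroids, labels


def first_principal_component(points, iters=200):
    """Unit vector along the direction of largest variance, found by power
    iteration on the sample covariance matrix of the mean-centred data."""
    n, d = len(points), len(points[0])
    mean = [sum(p[i] for p in points) / n for i in range(d)]
    centred = [[p[i] - mean[i] for i in range(d)] for p in points]
    cov = [
        [sum(row[i] * row[j] for row in centred) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Assumption: the all-ones start vector is not orthogonal to the
    # leading eigenvector; if it is, the loop stops without converging.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return mean, v


# Toy data (hypothetical): two well-separated groups lying roughly along y = x.
points = [(1.0, 1.1), (1.2, 0.9), (0.8, 1.0), (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]

centroids, labels = kmeans(points, k=2)

mean, direction = first_principal_component(points)
# Dimensionality reduction: replace each 2-D point by its 1-D coordinate
# along the first principal component.
scores = [sum((p[i] - mean[i]) * direction[i] for i in range(len(mean))) for p in points]

print("cluster labels:", labels)
print("centroids:", centroids)
print("first principal direction:", direction)
print("1-D projections:", scores)
```

On this toy data the two steps agree with each other: K-means separates the two groups, and the sign of the one-dimensional PCA coordinate separates them as well. For real data sets, a tested library implementation would normally be preferred over a sketch like this.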


In course

ME-390: Foundations of artificial intelligence

This course provides the students with basic theory to understand the machine learning approach, and the tools to use the approach for problems arising in engineering applications.

Related concepts (209)

Data

In common usage and statistics, data (US: /ˈdætə/; UK: /ˈdeɪtə/) is a collection of discrete or continuous values that convey information, describing the quantity, quality, fact, statistics, other basic units of meaning, or simply sequences of symbols that may be further interpreted formally. A datum is an individual value in a collection of data. Data is usually organized into structures such as tables that provide additional context and meaning, and which may themselves be used as data in larger structures.

Big data

Big data primarily refers to data sets that are too large or complex to be dealt with by traditional data-processing application software. Data with many entries (rows) offer greater statistical power, while data with higher complexity (more attributes or columns) may lead to a higher false discovery rate. Though the term is sometimes used loosely, partly because it lacks a formal definition, the interpretation that seems to best describe big data is the one associated with a large body of information that could not be comprehended when used only in smaller amounts.

Data quality

Data quality refers to the state of qualitative or quantitative pieces of information. There are many definitions of data quality, but data is generally considered high quality if it is "fit for [its] intended uses in operations, decision making and planning". Moreover, data is deemed of high quality if it correctly represents the real-world construct to which it refers. Furthermore, apart from these definitions, as the number of data sources increases, the question of internal data consistency becomes significant, regardless of fitness for use for any particular external purpose.

Data science

Data science is an interdisciplinary academic field that uses statistics, scientific computing, scientific methods, processes, algorithms and systems to extract or extrapolate knowledge and insights from noisy, structured, and unstructured data. Data science also integrates domain knowledge from the underlying application domain (e.g., natural sciences, information technology, and medicine). Data science is multifaceted and can be described as a science, a research paradigm, a research method, a discipline, a workflow, and a profession.

Data mining

Data mining is the process of extracting and discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems. Data mining is an interdisciplinary subfield of computer science and statistics with an overall goal of extracting information (with intelligent methods) from a data set and transforming the information into a comprehensible structure for further use. Data mining is the analysis step of the "knowledge discovery in databases" process, or KDD.

Related lectures (1,000)

Document Analysis: Topic Modeling (DH-406: Machine learning for DH)

Explores document analysis, topic modeling, and generative models for data generation in machine learning.

Unsupervised Learning: Dimensionality Reduction and Clustering (ME-390: Foundations of artificial intelligence)

Covers unsupervised learning, focusing on dimensionality reduction and clustering, explaining how it helps find patterns in data without labels.

Machine Learning Fundamentals (DH-406: Machine learning for DH)

Introduces fundamental machine learning concepts, covering regression, classification, dimensionality reduction, and deep generative models.

Neural Networks Recap: Activation Functions (DH-406: Machine learning for DH)

Covers the basics of neural networks, activation functions, training, image processing, CNNs, regularization, and dimensionality reduction methods.

Unsupervised Learning: PCA & K-means (ME-390: Foundations of artificial intelligence)

Covers unsupervised learning with PCA and K-means for dimensionality reduction and data clustering.