Publication

Probabilistic methods for neural combinatorial optimization

Nikolaos Karalias
2023
EPFL thesis
Abstract

The monumental progress in the development of machine learning models has led to a plethora of applications with transformative effects in engineering and science. This has also turned the attention of the research community towards constructing artificial intelligence (AI) models with general reasoning capabilities. Yet, despite the staggering success of artificial neural networks in tasks involving language and image generation or object detection and recognition, tasks that involve discrete and combinatorial problem-solving remain a fundamental blind spot of those models and a longstanding obstacle on the road to general-purpose AI systems. Combinatorial optimization problems are prominent representatives of that category, as they present challenges that are hard to tackle within the standard machine learning paradigm. Two fundamental obstacles in this pursuit are i) the difficulty of navigating exponentially large discrete configuration spaces with continuous gradient-based optimization, and ii) our inability to procure large amounts of labeled data due to the high computational budget that this requires. The subject of this thesis is to develop a coherent approach to combinatorial optimization with neural networks that directly tackles those challenges. In the first half of the thesis, we present our proposal for neural combinatorial optimization without supervision. We demonstrate how to design continuous loss functions for constrained optimization problems in a way that enables training without access to labeled data. We leverage the celebrated probabilistic method from the field of combinatorics to argue about the existence of high-quality solutions within the learned representations of a neural network trained with our approach. We also show how to deterministically recover those solutions using derandomization techniques from the literature.
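The core idea behind the probabilistic method and derandomization can be illustrated on a toy problem. The sketch below (illustrative only, not the thesis's actual pipeline) treats maximum cut: a product distribution over vertex assignments induces a differentiable expected cut size, and the method of conditional expectations then fixes one variable at a time without letting the conditional expectation drop, so the final integral assignment achieves at least the expected value.

```python
def expected_cut(edges, p):
    """Expected cut size when each vertex v joins side 1 independently
    with probability p[v] (a product distribution over assignments)."""
    return sum(p[u] * (1 - p[v]) + p[v] * (1 - p[u]) for u, v in edges)

def derandomize(edges, p):
    """Method of conditional expectations: fix vertices one at a time to
    whichever value does not decrease the conditional expectation.
    The resulting cut is at least expected_cut(edges, p)."""
    q = dict(p)
    for v in q:
        q[v] = 1.0
        e1 = expected_cut(edges, q)
        q[v] = 0.0
        e0 = expected_cut(edges, q)
        q[v] = 1.0 if e1 >= e0 else 0.0
    return q

# Triangle graph: with uniform probabilities the expected cut is 1.5,
# so a cut of size >= 2 must exist, and derandomization finds one.
edges = [(0, 1), (1, 2), (0, 2)]
p = {0: 0.5, 1: 0.5, 2: 0.5}
print(expected_cut(edges, p))                # 1.5
assignment = derandomize(edges, p)
print(expected_cut(edges, assignment))       # 2.0 (an actual cut)
```

In the thesis setting the probabilities would come from a trained neural network rather than being fixed by hand; the existence argument and the greedy recovery step are the same.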
In the second half, we expand the scope of our inquiry and design a general framework for continuous extensions of set functions. This approach enables training neural networks for discrete problems even when the objective and the constraints are given as a black box. We develop extensions on domains such as the hypercube, as well as higher-dimensional domains such as the cone of positive semidefinite matrices. The framework lets us efficiently incorporate problem-specific priors into the pipeline, which leads to improved empirical results. Finally, we show that the versatility of this approach extends beyond combinatorial optimization, as it can be used to define a novel continuous surrogate of the discrete training error for classification problems. Overall, our proposed methods advance the state of the art in neural combinatorial optimization through principled loss function design. Furthermore, by enabling the use of discrete functions in end-to-end differentiable models, they pave the way for improved combinatorial and reasoning capabilities in machine learning algorithms.
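A classical instance of a continuous extension on the hypercube (one of the domains mentioned above; the thesis's framework is more general) is the multilinear extension F(p) = E_{S~p}[f(S)], where each element is included independently with probability p[i]. The sketch below evaluates it by brute-force enumeration, which is exponential in n and only meant to make the definition concrete; the key property is that F agrees with f on the hypercube's vertices while being differentiable in between.

```python
from itertools import combinations

def multilinear_extension(f, n, p):
    """Multilinear (expectation-based) extension of a set function f on
    ground set {0, ..., n-1}: F(p) = E_{S ~ p}[f(S)], with element i
    included in S independently with probability p[i]. Evaluated by
    brute-force enumeration over all 2^n subsets."""
    total = 0.0
    for k in range(n + 1):
        for S in combinations(range(n), k):
            prob = 1.0
            for i in range(n):
                prob *= p[i] if i in S else 1 - p[i]
            total += prob * f(set(S))
    return total

# Example: a coverage-style set function, f(S) = min(|S|, 2).
f = lambda S: min(len(S), 2)
print(multilinear_extension(f, 3, [0.5, 0.5, 0.5]))  # 1.375
# At a hypercube vertex the extension recovers f exactly:
print(multilinear_extension(f, 3, [1, 0, 1]))        # f({0, 2}) = 2.0
```

Because f is only queried on discrete sets, such an extension applies even when the objective is a black box, which is what makes this family of constructions suitable for end-to-end differentiable pipelines.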

Related concepts (36)
Artificial neural network
Artificial neural networks (ANNs, also shortened to neural networks (NNs) or neural nets) are a branch of machine learning models that are built using principles of neuronal organization discovered by connectionism in the biological neural networks constituting animal brains. An ANN is based on a collection of connected units or nodes called artificial neurons, which loosely model the neurons in a biological brain. Each connection, like the synapses in a biological brain, can transmit a signal to other neurons.
Recurrent neural network
A recurrent neural network (RNN) is one of the two broad types of artificial neural network, characterized by the direction of the flow of information between its layers. In contrast to the uni-directional feedforward neural network, it is a bi-directional artificial neural network, meaning that it allows the output from some nodes to affect subsequent input to the same nodes. Their ability to use internal state (memory) to process arbitrary sequences of inputs makes them applicable to tasks such as unsegmented, connected handwriting recognition or speech recognition.
Mathematical optimization
Mathematical optimization (alternatively spelled optimisation) or mathematical programming is the selection of a best element, with regard to some criterion, from some set of available alternatives. It is generally divided into two subfields: discrete optimization and continuous optimization. Optimization problems arise in all quantitative disciplines from computer science and engineering to operations research and economics, and the development of solution methods has been of interest in mathematics for centuries.
Related publications (145)

Infusing structured knowledge priors in neural models for sample-efficient symbolic reasoning

Mattia Atzeni

The ability to reason, plan and solve highly abstract problems is a hallmark of human intelligence. Recent advancements in artificial intelligence, propelled by deep neural networks, have revolutionized disciplines like computer vision and natural language ...
EPFL2024

Topics in statistical physics of high-dimensional machine learning

Hugo Chao Cui

In the past few years, Machine Learning (ML) techniques have ushered in a paradigm shift, allowing the harnessing of ever more abundant sources of data to automate complex tasks. The technical workhorse behind these important breakthroughs arguably lies in ...
EPFL2024

Deep Learning Theory Through the Lens of Diagonal Linear Networks

Scott William Pesme

In this PhD manuscript, we explore optimisation phenomena which occur in complex neural networks through the lens of 2-layer diagonal linear networks. This rudimentary architecture, which consists of a two-layer feedforward linear network with a diagonal ...
EPFL2024