**Are you an EPFL student looking for a semester project?**

Work with us on data science and visualisation projects, and deploy your project as an app on top of GraphSearch.

Publication# Algorithms For Clustering Problems

Abstract

Clustering is a classic topic in combinatorial optimization and plays a central role in many areas, including data science and machine learning. In this thesis, we first focus on the dynamic facility location problem (i.e., the facility location problem in evolving metrics). We present a new LP-rounding algorithm for facility location problems, which yields the first constant factor approximation algorithm for the dynamic facility location problem. Our algorithm installs competing exponential clocks on clients and facilities, and connects every client by the path that repeatedly follows the smallest clock in the neighborhood. The use of exponential clocks gives rise to several properties that distinguish our approach from previous LP-roundings for facility location problems. In particular, we use \emph{no clustering} and we enable clients to connect through paths of \emph{arbitrary lengths}. In fact, the clustering-free nature of our algorithm is crucial for applying our LP-rounding approach to the dynamic problem.

Furthermore, we present both empirical and theoretical aspects of the $k$-means problem. The best known algorithm for $k$-means with a provable guarantee is a simple local-search heuristic that yields an approximation guarantee of $9+\epsilon$, a ratio that is known to be tight with respect to such methods. We overcome this barrier by presenting a new primal-dual approach that enables us (1) to exploit the geometric structure of $k$-means and (2) to satisfy the hard constraint that at most $k$ clusters are selected without deteriorating the approximation guarantee. Our main result is a $6.357$-approximation algorithm with respect to the standard LP relaxation. Our techniques are quite general and we also show improved guarantees for the general version of $k$-means where the underlying metric is not required to be Euclidean and for $k$-median in Euclidean metrics.

We also improve the running time of our algorithm to almost linear running time and still maintain a provable guarantee. We compare our algorithm with {\sc K-Means++} (a widely studied algorithm) and show that we obtain better accuracy with comparable and even better running time.

Official source

This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.

Related concepts

Loading

Related publications

Loading

Related concepts (17)

Related publications (18)

Algorithm

In mathematics and computer science, an algorithm (ˈælɡərɪðəm) is a finite sequence of rigorous instructions, typically used to solve a class of specific problems or to perform a computation. Algo

Approximation algorithm

In computer science and operations research, approximation algorithms are efficient algorithms that find approximate solutions to optimization problems (in particular NP-hard problems) with provable

Combinatorial optimization

Combinatorial optimization is a subfield of mathematical optimization that consists of finding an optimal object from a finite set of objects, where the set of feasible solutions is discrete or ca

Loading

Loading

Loading

Hyung Chan An, Ashkan Norouzi Fard, Ola Nils Anders Svensson

The dynamic facility location problem is a generalization of the classic facility location problem proposed by Eisenstat, Mathieu, and Schabanel to model the dynamics of evolving social/infrastructure networks. The generalization lies in that the distance metric between clients and facilities changes over time. This leads to a trade-off between optimizing the classic objective function and the "stability" of the solution: There is a switching cost charged every time a client changes the facility to which it is connected. While the standard linear program (LP) relaxation for the classic problem naturally extends to this problem, traditional LP-rounding techniques do not, as they are often sensitive to small changes in the metric resulting in frequent switches. We present a new LP-rounding algorithm for facility location problems, which yields the first constant approximation algorithm for the dynamic facility location problem. Our algorithm installs competing exponential clocks on the clients and facilities and connects every client by the path that repeatedly follows the smallest clock in the neighborhood. The use of exponential clocks gives rise to several properties that distinguish our approach from previous LP roundings for facility location problems. In particular, we use no clustering and we allow clients to connect through paths of arbitrary lengths. In fact, the clustering-free nature of our algorithm is crucial for applying our LP-rounding approach to the dynamic problem.

Approximation algorithms are a commonly used tool for designing efficient algorithmic solutions for intractable problems, at the expense of the quality of the output solution. A prominent technique for designing such algorithms is the use of Linear Programming (LP) relaxations. An optimal solution to such a relaxation provides a bound on the objective value of the optimal integral solution, to which we compare the integral solution we return. In this context, when studying a specific problem, two natural questions often arise: What is a strong LP relaxation for this problem, and how can we exploit it? Over the course of the past few decades, a significant amount of effort has been expended by the research community in order to answer these questions for a variety of interesting intractable problems. Although there exist multiple problems for which we have designed LP relaxations that achieve best-possible guarantees, there still exist numerous problems for which we either have no strong LP relaxations, or do not know how to use them. The main focus of this thesis is extending our understanding of such strong relaxations. We focus on designing good approximation algorithms for certain allocation problems, by employing a class of strong LP relaxations, called configuration-LPs. For many such allocation problems, the best-known results are derived by using simple and natural LP relaxations, whereas configuration-LPs have been used successfully on several occasions in order to break pre-existing barriers set by weaker relaxations. However, our understanding of configuration-LPs is far from complete for many problems. Therefore, understanding and using these relaxations to the farthest extent possible is a quite intriguing question. Answering this question could result in improved approximation algorithms for a wide variety of allocation problems. The first problem we address in this thesis is the restricted max-min fair allocation problem. Prior to our work, the best known result provided an $\Omega(1)$-approximation that ran in polynomial time. Also, it was known how to estimate the value of an optimal solution to the problem within a factor of $1/(4+c)$, for any $c>0$, by solving the corresponding configuration-LP. Our first contribution in this thesis is the design of a $1/13$-approximation algorithm for the problem, using the configuration-LP. Specifically, although our algorithm is fully combinatorial, it consists of a local-search procedure that is guaranteed to succeed only when the configuration-LP is feasible. In order to establish the correctness and running time of the algorithm, it is crucial to use the configuration-LP in our analysis. The second problem we study is the scheduling of jobs on unrelated machines in order to minimize the sum of weighted completion times. For this problem, the best known approximation algorithm achieves a ratio of $3/2-r$, for some small $r>0$. Our second contribution in this thesis is the improvement of this ratio to $(1+\sqrt{2})/2+c$, for any $c>0$, for the special case of the problem where the jobs have uniform Smith ratios. To achieve this ratio, we design a randomized rounding algorithm that rounds solutions to the corresponding configuration-LP. Through a careful examination of the distribution this randomized algorithm outputs, we identify the one that maximizes the approximation ratio, and we then upper bound the ratio this worst-case distribution exhibits by $(1+\sqrt{2})/2+c$.

Hyung Chan An, Ola Nils Anders Svensson

Linear programming has played a key role in the study of algorithms for combinatorial optimization problems. In the field of approximation algorithms, this is well illustrated by the uncapacitated facility location problem. A variety of algorithmic methodologies, such as LP-rounding and primal-dual method, have been applied to and evolved from algorithms for this problem. Unfortunately, this collection of powerful algorithmic techniques had not yet been applicable to the more general capacitated facility location problem. In fact, all of the known algorithms with good performance guarantees were based on a single technique, local search, and no linear programming relaxation was known to efficiently approximate the problem. In this paper, we present a linear programming relaxation with constant integrality gap for capacitated facility location. We demonstrate that the fundamental theories of multi-commodity flows and matchings provide key insights that lead to the strong relaxation. Our algorithmic proof of integrality gap is obtained by finally accessing the rich toolbox of LP-based methodologies: we present a constant factor approximation algorithm based on LP-rounding. © 2014 IEEE.