**Are you an EPFL student looking for a semester project?**

Work with us on data science and visualisation projects, and deploy your project as an app on top of GraphSearch.

Person# Lingxiao Huang

This person is no longer with EPFL

This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.

Related publications (2)

We study the problem of constructing epsilon-coresets for the (k, z)-clustering problem in a doubling metric M(X, d). An epsilon-coreset is a weighted subset S subset of X with weight function w : S -> R->= 0, such that for any k-subset C is an element of X, it holds that Sigma(x is an element of S) w(x).d(z) (x, C) is an element of (1 +/- epsilon) . Sigma(x is an element of X) d(z) (x, C). We present an efficient algorithm that constructs an epsilon-coreset for the (k, z)-clustering problem in M(X, d), where the size of the coreset only depends on the parameters k, z, epsilon and the doubling dimension ddim(M). To the best of our knowledge, this is the first efficient epsilon-coreset construction of size independent of vertical bar X vertical bar for general clustering problems in doubling metrics. To this end, we establish the first relation between the doubling dimension of M(X, d) and the shattering dimension (or VC-dimension) of the range space induced by the distance d. Such a relation is not known before, since one can easily construct instances in which neither one can be bounded by (some function of) the other. Surprisingly, we show that if we allow a small (1 +/- epsilon)-distortion of the distance function d (the distorted distance is called the smoothed distance function), the shattering dimension can be upper bounded by O(epsilon(-O(ddim(M)))). For the purpose of coreset construction, the above bound does not suffice as it only works for unweighted spaces. Therefore, we introduce the notion of tau-error probabilistic shattering dimension, and prove a (drastically better) upper bound of O(ddim(M).log(1/epsilon)+log log 1/tau) for the probabilistic shattering dimension for weighted doubling metrics. As it turns out, an upper bound for the probabilistic shattering dimension is enough for constructing a small coreset. We believe the new relation between doubling and shattering dimensions is of independent interest and may find other applications. Furthermore, we study robust coresets for (k, z)-clustering with outliers in a doubling metric. We show an improved connection between alpha-approximation and robust coresets. This also leads to improvement upon the previous best known bound of the size of robust coreset for Euclidean space [Feldman and Langberg, STOC 11]. The new bound entails a few new results in clustering and property testing. As another application, we show constant-sized (epsilon, k, z)-centroid sets in doubling metrics can be constructed by extending our coreset construction. Prior to our result, constant-sized centroid sets for general clustering problems were only known for Euclidean spaces. We can apply our centroid set to accelerate the local search algorithm (studied in [Friggstad et al., FOCS 2016]) for the (k, z)-clustering problem in doubling metrics.

Nisheeth Vishnoi, Laura Elisa Celis, Vijay Keswani, Lingxiao Huang

Developing classification algorithms that are fair with respect to sensitive attributes of the data is an important problem due to the increased deployment of classification algorithms in societal contexts. Several recent works have focused on studying classification with respect to specific fairness metrics, modeled the corresponding fair classification problem as constrained optimization problems, and developed tailored algorithms to solve them. Despite this, there still remain important metrics for which there are no fair classifiers with theoretical guarantees; primarily because the resulting optimization problem is non-convex. The main contribution of this paper is a meta-algorithm for classification that can take as input a general class of fairness constraints with respect to multiple non disjoint and multi-valued sensitive attributes, and which comes with provable guarantees. In particular, our algorithm can handle non-convex "linear fractional" constraints (which includes fairness constraints such as predictive parity) for which no prior algorithm was known. Key to our results is an algorithm for a family of classification problems with convex constraints along with a reduction from classification problems with linear fractional constraints to this family. Empirically, we observe that our algorithm is fast, can achieve near-perfect fairness with respect to various fairness metrics, and the loss in accuracy due to the imposed fairness constraints is often small.