We are interested in multilayer graph clustering, which aims at dividing the graph nodes into categories or communities. To do so, we propose to learn a clustering-friendly embedding of the graph nodes by solving an optimization problem that involves a fidelity term to the layers of a given multilayer graph, and a regularization on the (single-layer) graph induced by the embedding. The fidelity term uses the contrastive loss to properly aggregate the observed layers into a representative embedding. The regularization pushes for a sparse and community-aware graph, and it is based on a measure of graph sparsification called "effective resistance", coupled with a penalization of the first few eigenvalues of the representative graph Laplacian matrix to favor the formation of communities. The proposed optimization problem is non-convex but fully differentiable, and thus can be solved via the descent gradient method. Experiments show that our method leads to a significant improvement w.r.t. state-of-the-art multilayer graph clustering algorithms.