The motivation for this work is to improve the performance of deep neural networks through the optimization of the individual activation functions. Since the latter results in an infinite-dimensional optimization problem, we resolve the ambiguity by searching for the sparsest and most regular solution in the sense of Lipschitz. To that end, we first introduce a bound that relates the properties of the pointwise nonlinearities to the global Lipschitz constant of the network. By using the proposed bound as a regularizer, we then derive a representer theorem which shows that the optimal configuration is achieved by a deep spline network. This is a variant of a conventional deep ReLU network in which each activation function is a piecewise-linear spline with adaptive knots. The practical interest is that the underlying spline activations can be expressed as linear combinations of ReLU units and optimized using ℓ1-minimization techniques.
Michaël Unser, Alexis Marie Frederic Goujon
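For readers who want a concrete picture of the parameterization described in the abstract above, here is a minimal sketch (not the authors' code): it expresses a single spline activation as an affine term plus a linear combination of shifted ReLU units, and fits the ReLU weights with a proximal-gradient loop that applies an ℓ1 (soft-thresholding) penalty to encourage sparse knots. The knot grid, step size, penalty weight, and toy data are all illustrative assumptions.

    import numpy as np

    def spline_activation(x, b0, b1, a, knots):
        # Piecewise-linear spline written as an affine term plus a
        # linear combination of shifted ReLU units:
        #   sigma(x) = b0 + b1*x + sum_k a[k] * max(x - knots[k], 0)
        relus = np.maximum(x[:, None] - knots[None, :], 0.0)  # shape (N, K)
        return b0 + b1 * x + relus @ a

    def fit_spline_l1(x, y, knots, lam=1e-2, lr=1e-2, steps=2000):
        # Fit the coefficients by proximal gradient descent on a squared
        # loss, with an l1 penalty on the ReLU weights `a` applied through
        # soft-thresholding; the penalty promotes a sparse set of knots.
        b0, b1, a = 0.0, 0.0, np.zeros(len(knots))
        relus = np.maximum(x[:, None] - knots[None, :], 0.0)
        for _ in range(steps):
            r = b0 + b1 * x + relus @ a - y   # residual
            # gradients of 0.5 * mean squared error
            b0 -= lr * r.mean()
            b1 -= lr * (r * x).mean()
            a -= lr * (relus.T @ r) / len(x)
            # proximal step: soft-threshold the ReLU weights (l1 penalty)
            a = np.sign(a) * np.maximum(np.abs(a) - lr * lam, 0.0)
        return b0, b1, a

    # Toy usage: recover a sparse piecewise-linear function from noisy samples.
    rng = np.random.default_rng(0)
    x = rng.uniform(-3, 3, 400)
    y_true = 0.5 * x + 1.5 * np.maximum(x - 1.0, 0) - 2.0 * np.maximum(x + 0.5, 0)
    y = y_true + 0.05 * rng.standard_normal(x.shape)
    knots = np.linspace(-2.5, 2.5, 21)        # candidate knot grid (assumed)
    b0, b1, a = fit_spline_l1(x, y, knots)
    print("active knots:", np.flatnonzero(np.abs(a) > 1e-3).size)

The same construction applies per neuron in a deep spline network; the sketch only isolates one activation so the ReLU expansion and the ℓ1 fitting step are easy to follow.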
The capabilities of deep learning systems have advanced much faster than our ability to understand them. Whilst the gains from deep neural networks (DNNs) are significant, they are accompanied by a growing risk, and gravity, of bad outcomes. This is tr ...
Volkan Cevher, Grigorios Chrysos, Fanghui Liu, Zhenyu Zhu