The goal of this paper is to characterize the function distributions that general neural networks trained by descent algorithms (GD/SGD) can or cannot learn in poly-time. The results are: (1) The paradigm of general neural networks trained by SGD is poly-time ...