Web pages tagged «understanding deep learning»

Published Aug. 10, 2023, 21:23

In this project, we address the fundamental question of “How much should we overparameterize a neural network?”, with a focus on generalization and on common deep learning practice such as SGD, nonsmooth activations, and implicit/explicit regularization. For smooth activations and gradient descent, we established the current best scaling of the number of parameters for fully trained shallow neural networks under standard initialization schemes [1].