Web pages tagged with «generalization»
Published Aug. 10, 2023 21:23
In this project, we address the fundamental question of "How much should we overparameterize a NN?", with a focus on generalization and common practices in DL such as SGD, nonsmooth activations, and implicit/explicit regularization. For smooth activations and gradient descent, we established the current best scaling of the number of parameters for fully-trained shallow NNs under standard initialization schemes [1].