Scalable Deep Learning across Domains

Deep learning (DL), characterized by massive datasets, the inductive bias of large and complex architectures, and nonconvex empirical risk minimization via stochastic gradient descent (SGD) on multi-GPU systems, has impacted almost every scientific and engineering field. The volume of data available for learning in healthcare, Internet of Things (IoT), web, and financial domains grows continuously. Meanwhile, the size of neural networks grows exponentially; for example, the number of parameters in recent deep language models surpasses hundreds of billions.

Several methods have been proposed to accelerate training for supervised learning, such as gradient (or model update) compression, gradient sparsification, weight quantization/sparsification, and reducing the frequency of communication through local updates [1-3]. Unbiased quantization is particularly interesting because it enjoys strong theoretical guarantees while providing communication efficiency on the fly, i.e., it converges under the same hyperparameters tuned for the uncompressed variant while yielding substantial savings in communication cost. Our recent research has demonstrated that we can significantly accelerate supervised learning on multi-GPU systems, e.g., training deep ResNets on ImageNet, without sacrificing accuracy [1,2].
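To make the idea of unbiased quantization concrete, here is a minimal sketch (in PyTorch) of a QSGD-style [3] stochastic quantizer whose output equals the input gradient in expectation. The function name, the number of levels, and the sanity check are illustrative assumptions, not the adaptive schemes of [1,2].

```python
import torch

def quantize_unbiased(g: torch.Tensor, num_levels: int = 4) -> torch.Tensor:
    """QSGD-style unbiased stochastic quantization (illustrative sketch).

    Each coordinate of g is stochastically rounded to one of num_levels
    uniformly spaced levels in [0, ||g||], with probabilities chosen so
    that E[quantize_unbiased(g)] = g.
    """
    norm = g.norm()
    if norm == 0:
        return g.clone()
    scaled = g.abs() / norm * num_levels        # position on the quantization grid
    lower = scaled.floor()
    prob_up = scaled - lower                    # round up with this probability
    rounded = lower + torch.bernoulli(prob_up)  # stochastic rounding keeps the quantizer unbiased
    return torch.sign(g) * norm * rounded / num_levels

# Sanity check of unbiasedness: the empirical mean of many quantized copies
# should approach the original gradient as the number of copies grows.
g = torch.randn(1000)
avg = torch.stack([quantize_unbiased(g) for _ in range(5000)]).mean(dim=0)
print(f"relative error of the empirical mean: {(avg - g).norm().item() / g.norm().item():.4f}")
```

Because the compression error has zero mean, averaging quantized gradients across workers still yields an unbiased stochastic gradient, which is what allows the step sizes and other hyperparameters of the uncompressed variant to carry over.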

In this project, we modify adaptive variants of unbiased quantization schemes and tailor them to general variational inequality (VI) problems, including those with convex-like structures, e.g., convex minimization, saddle-point problems, and games [4-6], with applications such as auction theory [7], multi-agent and robust reinforcement learning (RL) [8], adversarially robust learning [9], and generative adversarial networks. In particular, our goal is to design novel adaptive and layer-wise compression schemes tailored to tasks beyond supervised learning, building on our recent work on state-of-the-art compression schemes for deep learning [1,10].
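As a toy illustration of the VI setting targeted by the project (and not the adaptive, layer-wise scheme to be developed), the sketch below runs the extragradient method on a small regularized bilinear saddle-point problem, a simple strongly monotone VI, while compressing every operator evaluation with the same kind of unbiased quantizer as above. The problem size, step size, and names are illustrative assumptions.

```python
import torch

torch.manual_seed(0)

def quantize_unbiased(g: torch.Tensor, num_levels: int = 8) -> torch.Tensor:
    """Same unbiased stochastic quantizer as in the previous sketch."""
    norm = g.norm()
    if norm == 0:
        return g.clone()
    scaled = g.abs() / norm * num_levels
    rounded = scaled.floor() + torch.bernoulli(scaled - scaled.floor())
    return torch.sign(g) * norm * rounded / num_levels

def F(z: torch.Tensor, A: torch.Tensor, mu: float) -> torch.Tensor:
    """Monotone operator of min_x max_y (mu/2)||x||^2 + x^T A y - (mu/2)||y||^2,
    i.e., F(x, y) = (mu*x + A y, mu*y - A^T x); its unique zero is z* = 0."""
    n = A.shape[0]
    x, y = z[:n], z[n:]
    return torch.cat([mu * x + A @ y, mu * y - A.T @ x])

n, mu = 10, 1.0
A = torch.randn(n, n) / n**0.5
z = torch.randn(2 * n)
step = 0.05  # small step size; the compression noise is unbiased but not free

for _ in range(1000):
    # Extragradient with compressed operator evaluations: in a distributed
    # setting, each worker would transmit quantize_unbiased(F(.)) instead of
    # the full-precision vector.
    z_half = z - step * quantize_unbiased(F(z, A, mu))
    z = z - step * quantize_unbiased(F(z_half, A, mu))

print(f"distance to the solution after 1000 steps: {z.norm().item():.2e}")
```

Setting mu = 0 recovers a pure bilinear game, the classical example where plain simultaneous gradient descent-ascent fails while extragradient-type methods still work, which is why VI-specific algorithms, and compression schemes designed for them, are needed beyond supervised learning.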

This project is available for a master's student with a strong background in machine learning. If interested, please contact Associate Professor Ali Ramezani-Kebrya for details. Students should be familiar with reinforcement learning, PyTorch/JAX, MPI, and CUDA; familiarity with distributed optimization is a plus.

[1] Fartash Faghri, Iman Tabrizian, Ilia Markov, Dan Alistarh, Daniel M. Roy, and Ali Ramezani-Kebrya. Adaptive gradient quantization for data-parallel SGD. In Advances in Neural Information Processing Systems (NeurIPS), 2020.

[2] Ali Ramezani-Kebrya, Fartash Faghri, Ilya Markov, Vitalii Aksenov, Dan Alistarh, and Daniel M. Roy. NUQSGD: Provably communication-efficient data-parallel SGD via nonuniform quantization. Journal of Machine Learning Research (JMLR), 22(114):1–43, 2021.

[3] Dan Alistarh, Demjan Grubic, Jerry Li, Ryota Tomioka, and Milan Vojnovic. QSGD: Communication-efficient SGD via gradient quantization and encoding. In Advances in Neural Information Processing Systems (NeurIPS), 2017.

[4] Francisco Facchinei and Jong-Shi Pang. Finite-dimensional variational inequalities and complementarity problems. Springer, 2003.

[5] Heinz H. Bauschke and Patrick L. Combettes. Convex analysis and monotone operator theory in Hilbert spaces. Springer, 2011.

[6] Kimon Antonakopoulos, Thomas Pethick, Ali Kavis, Panayotis Mertikopoulos, and Volkan Cevher. Sifting through the noise: Universal first-order methods for stochastic variational inequalities. In Advances in Neural Information Processing Systems (NeurIPS), 2021.

[7] Vasilis Syrgkanis, Alekh Agarwal, Haipeng Luo, and Robert E. Schapire. Fast convergence of regularized learning in games. In Advances in Neural Information Processing Systems (NeurIPS), 2015.

[8] Lerrel Pinto, James Davidson, Rahul Sukthankar, and Abhinav Gupta. Robust adversarial reinforcement learning. In International Conference on Machine Learning (ICML), 2017.

[9] Ludwig Schmidt, Shibani Santurkar, Dimitris Tsipras, Kunal Talwar, and Aleksander Madry. Adversarially robust generalization requires more data. In Advances in Neural Information Processing Systems (NeurIPS), 2018.

[10] Mohammadreza Alimohammadi, Ilia Markov, Elias Frantar, and Dan Alistarh. L-GreCo: An efficient and general framework for layerwise-adaptive gradient compression. arXiv preprint arXiv:2210.17357, 2022.


Keywords: Deep learning, Optimization, High-performance computing

Supervisor(s)

Ali Ramezani-Kebrya

Scope (credits)

60