Fast Multi-GPU Communication over PCI Express
NCCL (pronounced "Nickel") is a stand-alone library of standard collective communication routines for GPUs, such as all-reduce, all-gather, reduce-scatter, broadcast, and reduce.
NVIDIA Collective Communications Library
Description of the topic
NCCL is used to communicate between multiple GPUs, and between multiple machines with GPUs, during distributed deep learning training. Between machines, NCCL can communicate over TCP/IP sockets. NCCL supports an arbitrary number of GPUs installed in a single node or across multiple nodes, and it can be used in either single-process or multi-process (e.g., MPI) applications.
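To make "collective communication routine" concrete, here is a minimal CPU-only sketch of a sum all-reduce using the ring algorithm (a reduce-scatter phase followed by an all-gather phase), one of the algorithms NCCL implements. This is an illustrative simulation under stated assumptions, not NCCL code: the ranks are plain arrays in one process, each "send" is a memory copy standing in for a GPU-to-GPU transfer, and `ring_allreduce_sum`, `transfer_chunk`, and `chunk_index` are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical helper: (a mod n), kept non-negative, used as a chunk index. */
static size_t chunk_index(long a, size_t n) {
    long m = a % (long)n;
    return (size_t)(m < 0 ? m + (long)n : m);
}

/* Hypothetical helper: add (accumulate != 0) or copy chunk c of src into dst.
 * Chunk c covers elements [c*count/nranks, (c+1)*count/nranks), so any count works. */
static void transfer_chunk(const float *src, float *dst, size_t c,
                           size_t nranks, size_t count, int accumulate) {
    size_t lo = c * count / nranks;
    size_t hi = (c + 1) * count / nranks;
    for (size_t i = lo; i < hi; i++)
        dst[i] = accumulate ? dst[i] + src[i] : src[i];
}

/* Simulated sum all-reduce over `nranks` buffers of `count` floats.
 * buf[r] is rank r's buffer; on return every buffer holds the element-wise sum.
 * Within one step, each rank writes a different chunk from the one it sends,
 * so visiting the ranks one after another matches simultaneous transfers. */
void ring_allreduce_sum(float **buf, size_t nranks, size_t count) {
    /* Phase 1, reduce-scatter: at step s, rank r passes chunk (r - s) to rank r + 1,
     * which adds it to its own copy. After nranks - 1 steps, rank r holds the
     * fully reduced chunk (r + 1) mod nranks. */
    for (size_t s = 0; s + 1 < nranks; s++)
        for (size_t r = 0; r < nranks; r++)
            transfer_chunk(buf[r], buf[(r + 1) % nranks],
                           chunk_index((long)r - (long)s, nranks), nranks, count, 1);

    /* Phase 2, all-gather: at step s, rank r forwards the reduced chunk (r + 1 - s)
     * to rank r + 1, which overwrites its stale copy. */
    for (size_t s = 0; s + 1 < nranks; s++)
        for (size_t r = 0; r < nranks; r++)
            transfer_chunk(buf[r], buf[(r + 1) % nranks],
                           chunk_index((long)r + 1 - (long)s, nranks), nranks, count, 0);
}
```

With three ranks holding {1, 2, 3, 4}, {10, 20, 30, 40}, and {100, 200, 300, 400}, every buffer ends up as {111, 222, 333, 444}. Each rank sends roughly 2(n-1)/n of its buffer in total and only ever talks to its ring neighbour, which is why the ring pattern uses link bandwidth efficiently for large messages.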
The tasks for the master's project are to:
- Benchmark and analyze the existing NCCL implementation with TCP/IP
- Run TCP/IP over PCIe to establish baseline performance.
- Write an optimized PCIe transport for NCCL
- Contribute code back to the open-source NCCL project.
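For the benchmarking tasks, a common convention (used by NVIDIA's nccl-tests suite) is to report two numbers: algorithm bandwidth, the message size divided by the elapsed time, and bus bandwidth, which for all-reduce scales algorithm bandwidth by 2(n-1)/n for n ranks so that results can be compared with the peak bandwidth of the underlying links (PCIe, TCP/IP) independently of the rank count. The sketch below computes both; the helper names are hypothetical, and GB is taken as 10^9 bytes.

```c
/* Hypothetical helper: algorithm bandwidth in GB/s (1 GB = 1e9 bytes). */
double alg_bandwidth_gbs(double bytes, double seconds) {
    return bytes / seconds / 1e9;
}

/* Hypothetical helper: all-reduce bus bandwidth in GB/s. The 2(n-1)/n factor
 * is the fraction of the buffer each rank must send and receive in a
 * bandwidth-optimal all-reduce, so the result reflects per-link utilisation. */
double allreduce_bus_bandwidth_gbs(double bytes, double seconds, int nranks) {
    return alg_bandwidth_gbs(bytes, seconds) * 2.0 * (nranks - 1) / nranks;
}
```

For example, a 1 GB all-reduce over 4 ranks that completes in 0.5 s gives an algorithm bandwidth of 2 GB/s and a bus bandwidth of 3 GB/s. Comparing bus bandwidth for the TCP/IP baseline and the PCIe transport keeps the comparison meaningful as the number of GPUs changes.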
Implement a PCIe transport in the NVIDIA Collective Communications Library (NCCL) and benchmark the implementation using deep learning training workloads.
In-depth knowledge of how to distribute workloads over multiple machines connected in a PCIe network. The student will also gain detailed insight into working with, modifying, and contributing code to an existing open-source library.
Good understanding of C and/or C++ programming. INF3151 or equivalent is recommended.