Understanding NVIDIA NVLink
NVLink is NVIDIA’s high-bandwidth, low-latency GPU-to-GPU interconnect with built-in resiliency features, available on Scaleway’s H100-SXM Instances. It is designed to significantly improve performance and efficiency when connecting GPUs, CPUs, and other components within the same node. It provides much higher bandwidth (up to 900 GB/s of total NVLink bandwidth per GPU in an 8-GPU configuration) and lower latency than traditional PCIe Gen 4 (around 32 GB/s per direction for an x16 link), so more data can be transferred between GPUs in less time.
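To see what these figures mean in practice, the sketch below (CUDA, assuming a node with at least two peer-capable GPUs, such as an H100-SXM Instance) times a direct GPU-to-GPU copy and prints the effective bandwidth. The 1 GiB payload and device IDs are illustrative, and a single one-directional copy will not reach the aggregate 900 GB/s figure, which counts both directions across all links.

```cuda
// Minimal sketch (assumes at least two peer-capable GPUs on the node):
// time a 1 GiB direct GPU 0 -> GPU 1 copy and print the effective bandwidth.
// Payload size and device IDs are illustrative.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 1ULL << 30;  // 1 GiB payload

    // One buffer per GPU.
    void *src = nullptr, *dst = nullptr;
    cudaSetDevice(0);
    cudaMalloc(&src, bytes);
    cudaSetDevice(1);
    cudaMalloc(&dst, bytes);

    // Enable direct peer access so the copy takes the GPU-to-GPU path
    // instead of being staged through host memory.
    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);
    cudaSetDevice(1);
    cudaDeviceEnablePeerAccess(0, 0);

    // Time the copy with CUDA events.
    cudaSetDevice(0);
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    cudaMemcpyPeer(dst, 1, src, 0, bytes);   // direct device-to-device copy
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("GPU 0 -> GPU 1: %.1f GB/s\n", (bytes / 1e9) / (ms / 1e3));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(src);
    cudaSetDevice(1);
    cudaFree(dst);
    return 0;
}
```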
This high bandwidth and low latency make NVLink ideal for applications that require real-time data synchronization and processing, such as AI and HPC workloads. NVLink provides up to 900 GB/s of total bandwidth for multi-GPU I/O and shared memory accesses, roughly 7x the bandwidth of PCIe Gen 5. It also allows direct GPU-to-GPU communication, improving data transfer efficiency and reducing the need for CPU intervention, which can introduce bottlenecks.
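This direct GPU-to-GPU path is exposed to applications as CUDA peer-to-peer (P2P) access. The sketch below is a small probe, written against the standard CUDA runtime API, that reports for each GPU pair whether P2P access (and native atomics over the link) is available; it assumes nothing beyond whatever GPUs the node exposes.

```cuda
// Minimal sketch: probe every GPU pair on the node and report whether direct
// peer-to-peer (P2P) access is available, i.e. whether one GPU can read and
// write another GPU's memory without going through the CPU.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);

    for (int src = 0; src < count; ++src) {
        for (int dst = 0; dst < count; ++dst) {
            if (src == dst) continue;

            int access = 0, atomics = 0;
            cudaDeviceCanAccessPeer(&access, src, dst);
            cudaDeviceGetP2PAttribute(&atomics,
                                      cudaDevP2PAttrNativeAtomicSupported,
                                      src, dst);

            printf("GPU %d -> GPU %d: P2P %s, native atomics %s\n",
                   src, dst,
                   access  ? "supported" : "unsupported",
                   atomics ? "supported" : "unsupported");
        }
    }
    return 0;
}
```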
NVLink supports connecting multiple GPUs, enabling powerful multi-GPU systems capable of handling more complex and demanding workloads. Unified memory access allows GPUs to access each other’s memory directly without CPU mediation, which is particularly beneficial for large-scale AI and HPC workloads.
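To illustrate this unified addressing, the following sketch (again assuming two peer-capable GPUs) launches a kernel on GPU 1 that operates directly on a buffer allocated on GPU 0 once peer access is enabled. The `scale` kernel and the buffer size are hypothetical placeholders.

```cuda
// Minimal sketch (assumes two peer-capable GPUs): after peer access is
// enabled, a kernel running on GPU 1 can dereference a pointer that was
// allocated on GPU 0, with no staging through host memory.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;   // 'data' may live on another GPU
}

int main() {
    const int n = 1 << 20;

    // Allocate and initialize a buffer on GPU 0.
    float *buf0 = nullptr;
    cudaSetDevice(0);
    cudaMalloc(&buf0, n * sizeof(float));
    cudaMemset(buf0, 0, n * sizeof(float));

    // Let GPU 1 map GPU 0's memory into its address space.
    cudaSetDevice(1);
    cudaDeviceEnablePeerAccess(0, 0);

    // Launch on GPU 1, operating directly on GPU 0's buffer.
    scale<<<(n + 255) / 256, 256>>>(buf0, 2.0f, n);
    cudaDeviceSynchronize();

    printf("kernel on GPU 1 finished: %s\n",
           cudaGetErrorString(cudaGetLastError()));

    cudaSetDevice(0);
    cudaFree(buf0);
    return 0;
}
```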
Comparison: NVLink vs. PCIe
NVLink and PCI Express (PCIe) are both used for GPU communication, but NVLink is specifically designed to address the bandwidth and latency bottlenecks of PCIe in multi-GPU setups.
| Feature | NVLink 4.0 (H100-SXM) | PCIe 5.0 |
|---|---|---|
| Use case | High-performance computing, deep learning | General-purpose computing, graphics |
| Bandwidth | Up to 900 GB/s per GPU (bidirectional) | 128 GB/s (x16, bidirectional) |
| Latency | Lower than PCIe (sub-microsecond) | Higher than NVLink |
| Communication | Direct GPU-to-GPU | Through CPU or PCIe switch |
| Memory sharing | Unified memory space across GPUs | Requires CPU intervention (higher overhead) |
| Scalability | Multi-GPU direct connection via NVSwitch | Limited by PCIe lanes |
| Efficiency | Optimized for GPU workloads | More general-purpose |
In summary, NVLink, available on H100-SXM Instances, is superior for multi-GPU AI and HPC workloads due to its higher bandwidth, lower latency, and memory-sharing capabilities, while PCIe remains essential for broader system connectivity and general-purpose computing.