Understanding NVIDIA NVLink
NVLink is NVIDIA’s high-bandwidth, low-latency GPU-to-GPU interconnect with built-in resiliency features, available on Scaleway’s H100-SXM Instances. It is designed to significantly improve performance and efficiency when connecting GPUs, CPUs, and other components within the same node. It provides much higher bandwidth (up to 900 GB/s of total NVLink bandwidth per GPU in an 8-GPU configuration) and lower latency than traditional PCIe Gen 4 (around 32 GB/s per direction for an x16 link), so more data can be transferred between GPUs in less time.
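To see what these figures mean in practice, the sketch below (CUDA, assuming a node with at least two peer-capable GPUs, such as an H100-SXM Instance) times a direct GPU-to-GPU copy and prints the effective bandwidth. The 1 GiB payload and device IDs are illustrative, and a single one-directional copy will not reach the aggregate 900 GB/s figure, which counts both directions across all links.

```cuda
// Minimal sketch (assumes at least two peer-capable GPUs on the node):
// time a 1 GiB direct GPU 0 -> GPU 1 copy and print the effective bandwidth.
// Payload size and device IDs are illustrative.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 1ULL << 30;  // 1 GiB payload

    // One buffer per GPU.
    void *src = nullptr, *dst = nullptr;
    cudaSetDevice(0);
    cudaMalloc(&src, bytes);
    cudaSetDevice(1);
    cudaMalloc(&dst, bytes);

    // Enable direct peer access so the copy takes the GPU-to-GPU path
    // instead of being staged through host memory.
    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);
    cudaSetDevice(1);
    cudaDeviceEnablePeerAccess(0, 0);

    // Time the copy with CUDA events.
    cudaSetDevice(0);
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    cudaMemcpyPeer(dst, 1, src, 0, bytes);   // direct device-to-device copy
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("GPU 0 -> GPU 1: %.1f GB/s\n", (bytes / 1e9) / (ms / 1e3));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(src);
    cudaSetDevice(1);
    cudaFree(dst);
    return 0;
}
```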
This high bandwidth and low latency make NVLink ideal for applications that require real-time data synchronization and processing, such as AI and HPC workloads. NVLink provides up to 900 GB/s of total bandwidth for multi-GPU I/O and shared memory accesses, roughly 7x the bandwidth of PCIe Gen 5. It also allows direct GPU-to-GPU communication, improving data transfer efficiency and reducing the need for CPU intervention, which can introduce bottlenecks.
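This direct GPU-to-GPU path is exposed to applications as CUDA peer-to-peer (P2P) access. The sketch below is a small probe, written against the standard CUDA runtime API, that reports for each GPU pair whether P2P access (and native atomics over the link) is available; it assumes nothing beyond whatever GPUs the node exposes.

```cuda
// Minimal sketch: probe every GPU pair on the node and report whether direct
// peer-to-peer (P2P) access is available, i.e. whether one GPU can read and
// write another GPU's memory without going through the CPU.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);

    for (int src = 0; src < count; ++src) {
        for (int dst = 0; dst < count; ++dst) {
            if (src == dst) continue;

            int access = 0, atomics = 0;
            cudaDeviceCanAccessPeer(&access, src, dst);
            cudaDeviceGetP2PAttribute(&atomics,
                                      cudaDevP2PAttrNativeAtomicSupported,
                                      src, dst);

            printf("GPU %d -> GPU %d: P2P %s, native atomics %s\n",
                   src, dst,
                   access  ? "supported" : "unsupported",
                   atomics ? "supported" : "unsupported");
        }
    }
    return 0;
}
```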
NVLink supports connecting multiple GPUs, enabling powerful multi-GPU systems capable of handling more complex and demanding workloads. Unified memory access allows GPUs to access each other’s memory directly without CPU mediation, which is particularly beneficial for large-scale AI and HPC workloads.
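To illustrate this unified addressing, the following sketch (again assuming two peer-capable GPUs) launches a kernel on GPU 1 that operates directly on a buffer allocated on GPU 0 once peer access is enabled. The `scale` kernel and the buffer size are hypothetical placeholders.

```cuda
// Minimal sketch (assumes two peer-capable GPUs): after peer access is
// enabled, a kernel running on GPU 1 can dereference a pointer that was
// allocated on GPU 0, with no staging through host memory.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;   // 'data' may live on another GPU
}

int main() {
    const int n = 1 << 20;

    // Allocate and initialize a buffer on GPU 0.
    float *buf0 = nullptr;
    cudaSetDevice(0);
    cudaMalloc(&buf0, n * sizeof(float));
    cudaMemset(buf0, 0, n * sizeof(float));

    // Let GPU 1 map GPU 0's memory into its address space.
    cudaSetDevice(1);
    cudaDeviceEnablePeerAccess(0, 0);

    // Launch on GPU 1, operating directly on GPU 0's buffer.
    scale<<<(n + 255) / 256, 256>>>(buf0, 2.0f, n);
    cudaDeviceSynchronize();

    printf("kernel on GPU 1 finished: %s\n",
           cudaGetErrorString(cudaGetLastError()));

    cudaSetDevice(0);
    cudaFree(buf0);
    return 0;
}
```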
Comparison: NVLink vs. PCIe
NVLink and PCI Express (PCIe) are both used for GPU communication, but NVLink is specifically designed to address the bandwidth and latency bottlenecks of PCIe in multi-GPU setups.
| Feature | NVLink 4.0 (H100-SXM) | PCIe 5.0 |
|---|---|---|
| Use case | High-performance computing, deep learning | General-purpose computing, graphics |
| Bandwidth | Up to 900 GB/s per GPU (bidirectional) | 128 GB/s (x16, bidirectional) |
| Latency | Lower than PCIe (sub-microsecond) | Higher than NVLink |
| Communication | Direct GPU-to-GPU | Through CPU or PCIe switch |
| Memory sharing | Unified memory space across GPUs | Requires CPU intervention (higher overhead) |
| Scalability | Multi-GPU direct connection via NVSwitch | Limited by PCIe lanes |
| Efficiency | Optimized for GPU workloads | More general-purpose |
In summary, NVLink, available on H100-SXM Instances, is superior for multi-GPU AI and HPC workloads due to its higher bandwidth, lower latency, and memory-sharing capabilities, while PCIe remains essential for broader system connectivity and general-purpose computing.