NVIDIA has added three new technologies to its HGX AI supercomputing platform to help speed the new era of industrial AI and HPC: the NVIDIA A100 80GB PCIe GPU, NVIDIA NDR 400G InfiniBand networking, and NVIDIA Magnum IO GPUDirect Storage software. Used together, the three additions are intended to provide the high performance required for industrial HPC innovation.
The A100 80GB PCIe is powered by the NVIDIA Ampere architecture, which features Multi-Instance GPU (MIG) technology to deliver acceleration for smaller workloads such as AI inference. MIG allows HPC systems to scale compute and memory down with guaranteed quality of service. In addition to the PCIe form factor, the A100 is offered in four- and eight-way NVIDIA HGX A100 configurations.
Because of the A100 80GB PCIe’s large memory capacity and high memory bandwidth, more data and larger neural networks can be held in memory, reducing inter-node communication and energy consumption. This combination of capacity and bandwidth would help researchers achieve higher throughput and faster results, increasing the value of their IT investments.
NVIDIA Quantum-2 Modular Switches
NVIDIA InfiniBand, an offloadable in-network computing interconnect, accelerates HPC server systems that demand very high data throughput. NDR InfiniBand increases performance in industrial and scientific HPC systems to tackle massive challenges. The NVIDIA Quantum-2 fixed-configuration switch systems would provide 64 ports of NDR 400Gb/s InfiniBand (or 128 ports of NDR200), 3x more port density than HDR InfiniBand.
The NVIDIA Quantum-2 modular switches would offer scalable port configurations up to 2,048 NDR 400Gb/s InfiniBand ports (or 4,096 NDR200 ports) with a total bidirectional throughput of 1.64 petabits per second, which according to NVIDIA is “5 times faster” than the previous generation. With a DragonFly+ network architecture, the 2,048-port switch delivers 6.5x more scalability than the previous generation, with the capacity to link more than a million nodes in just three hops.
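The throughput figure quoted above can be checked with simple arithmetic. The following is a minimal sketch, not an NVIDIA formula: it assumes "bidirectional" means every port sends and receives at line rate at the same time, that NDR200 runs at 200 Gb/s per port, and that a petabit is 10^6 gigabits. The function name is hypothetical.

```python
# Sanity check of the quoted aggregate throughput.
# Assumptions (not stated in the article): each port carries traffic at
# line rate in both directions, NDR200 means 200 Gb/s per port, and
# decimal units are used (1 Pb/s = 1,000,000 Gb/s).

GBPS_PER_PBPS = 1_000_000


def aggregate_bidirectional_gbps(ports: int, gbps_per_port: int) -> int:
    """Return total bidirectional throughput in Gb/s for a switch."""
    return ports * gbps_per_port * 2  # x2: send and receive


# Modular switch at its maximum NDR configuration: 2,048 ports at 400 Gb/s.
ndr_total = aggregate_bidirectional_gbps(2048, 400)

# Same switch split into NDR200 ports: 4,096 ports at 200 Gb/s.
ndr200_total = aggregate_bidirectional_gbps(4096, 200)

print(f"NDR:    {ndr_total / GBPS_PER_PBPS:.2f} Pb/s")
print(f"NDR200: {ndr200_total / GBPS_PER_PBPS:.2f} Pb/s")
```

Under these assumptions both port configurations come to 1,638,400 Gb/s, which rounds to the 1.64 petabits per second cited for the 2,048-port switch.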
The NVIDIA Quantum-2 switches, which are scheduled to be available by the end of the year, are backward- and forward-compatible, allowing for easy migration and expansion of existing systems and software.
Magnum IO GPUDirect Storage, which is available now, provides direct memory access between GPU memory and storage, a path NVIDIA says benefits complex, data-intensive workloads. The direct path would let applications take advantage of lower I/O latency and use the full bandwidth of the network adapters while reducing CPU load and managing the impact of growing data consumption.
“The HPC revolution started in academia and is rapidly extending across a broad range of industries,” said Jensen Huang, founder and CEO of NVIDIA. “Key dynamics are driving super-exponential, super-Moore’s law advances that have made HPC a useful tool for industries. NVIDIA’s HGX platform gives researchers unparalleled high performance computing acceleration to tackle the toughest problems industries face.”
Hundreds of global partners, including Atos, Dell Technologies, Hewlett Packard Enterprise (HPE), Lenovo, Microsoft Azure, and NetApp, use the NVIDIA HGX AI supercomputing platform for their systems and applications.