NVIDIA Unveils Arm Neoverse-based Grace CPU Superchip for AI and HPC

NVIDIA has announced its first Arm Neoverse-based discrete data center CPU, aimed at AI infrastructure and high-performance computing (HPC). According to NVIDIA, it would deliver twice the memory bandwidth and energy efficiency of today’s leading server CPUs.

The NVIDIA Grace CPU Superchip comprises two CPU chips connected over NVLink-C2C, a high-speed, low-latency chip-to-chip interconnect.

The Grace CPU Superchip complements NVIDIA’s first CPU-GPU integrated module, the Grace Hopper Superchip, which was announced last year and is designed to run large-scale HPC and AI workloads alongside an NVIDIA Hopper architecture-based GPU. Both superchips share the same underlying CPU design and the same NVLink-C2C interconnect.

“A new type of data center has emerged – AI factories that process and refine mountains of data to produce intelligence,” said Jensen Huang, Founder and Chief Executive Officer (CEO) of NVIDIA. “The Grace CPU Superchip offers the highest performance, memory bandwidth and NVIDIA software platforms in one chip and will shine as the CPU of the world’s AI infrastructure.”

Introducing NVIDIA’s CPU Platform

The Grace CPU Superchip packs 144 Arm cores into a single socket, with an estimated score of 740 on the SPECrate 2017_int_base benchmark. According to estimates from NVIDIA’s labs using the same class of compilers, this would be more than 1.5x higher than the dual-CPU configuration shipping with the DGX A100 today.
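To put the benchmark claim in perspective, the short Python sketch below works out what the quoted numbers imply. It only rearranges figures from the article (an estimated score of 740, a claimed advantage of more than 1.5x, 144 cores); the function names are illustrative, and no measured DGX A100 score is assumed.

```python
"""Back-of-envelope check of the SPECrate figures quoted above."""

# Values taken from the article; all are NVIDIA estimates, not measurements.
GRACE_ESTIMATED_SCORE = 740.0  # estimated SPECrate 2017_int_base
CLAIMED_MIN_SPEEDUP = 1.5      # "more than 1.5x" the DGX A100 dual-CPU setup
GRACE_CORES = 144


def implied_baseline_ceiling(score: float, min_speedup: float) -> float:
    """Return the value the baseline must stay below if score > min_speedup * baseline."""
    if min_speedup <= 0:
        raise ValueError("min_speedup must be positive")
    return score / min_speedup


def score_per_core(score: float, cores: int) -> float:
    """Return the average score attributable to each core."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return score / cores


if __name__ == "__main__":
    ceiling = implied_baseline_ceiling(GRACE_ESTIMATED_SCORE, CLAIMED_MIN_SPEEDUP)
    per_core = score_per_core(GRACE_ESTIMATED_SCORE, GRACE_CORES)
    print(f"Implied DGX A100 dual-CPU score: below {ceiling:.1f}")
    print(f"Estimated score per Grace core: {per_core:.2f}")
```

In other words, if the claim holds, the dual-CPU configuration in the DGX A100 would score somewhat under 500 on the same benchmark.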

The Grace CPU Superchip’s “revolutionary” memory subsystem uses LPDDR5x memory with Error Correction Code (ECC) to balance performance and power consumption, and would also deliver the highest energy efficiency and memory bandwidth. At 1 terabyte per second, the LPDDR5x subsystem would provide double the bandwidth of standard DDR5 designs while drawing significantly less power, with the entire CPU and memory consuming only 500 watts.
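The memory figures can be broken down in the same way. The sketch below is a rough calculation from the article's numbers only (1 TB/s, 500 watts, 144 cores, double the bandwidth of DDR5 designs). It assumes 1 TB/s equals 1,000 GB/s, and since the 500 watts covers the CPU and memory together, the per-watt value describes the whole module rather than the memory alone. The function names are illustrative.

```python
"""Rough breakdown of the memory bandwidth and power figures quoted above."""

# Values taken from the article; all are NVIDIA claims, not measurements.
BANDWIDTH_GB_S = 1000.0   # 1 TB/s, assuming 1 TB = 1000 GB
MODULE_POWER_W = 500.0    # CPU and memory combined
GRACE_CORES = 144
DDR5_BANDWIDTH_RATIO = 2.0  # "double the bandwidth of standard DDR5 designs"


def bandwidth_per_core(bandwidth_gb_s: float, cores: int) -> float:
    """Return the memory bandwidth available per core, in GB/s."""
    return bandwidth_gb_s / cores


def bandwidth_per_watt(bandwidth_gb_s: float, power_w: float) -> float:
    """Return the memory bandwidth per watt of module power, in GB/s per W."""
    return bandwidth_gb_s / power_w


def implied_ddr5_bandwidth(bandwidth_gb_s: float, ratio: float) -> float:
    """Return the DDR5 bandwidth implied by the claimed ratio, in GB/s."""
    return bandwidth_gb_s / ratio


if __name__ == "__main__":
    print(f"Per core: {bandwidth_per_core(BANDWIDTH_GB_S, GRACE_CORES):.1f} GB/s")
    print(f"Per watt: {bandwidth_per_watt(BANDWIDTH_GB_S, MODULE_POWER_W):.1f} GB/s/W")
    print(f"Implied DDR5 design: {implied_ddr5_bandwidth(BANDWIDTH_GB_S, DDR5_BANDWIDTH_RATIO):.0f} GB/s")
```

Under these assumptions, each core would have roughly 6.9 GB/s of memory bandwidth, and the comparison DDR5 design would sit at about 500 GB/s.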

The Grace CPU Superchip is built on Armv9, the most recent Arm architecture for the data center. According to NVIDIA, it combines the highest single-threaded core performance with support for Arm’s new generation of vector extensions, which would bring immediate benefits to a wide range of applications.

NVIDIA’s computing software stacks, including NVIDIA RTX, NVIDIA HPC, NVIDIA AI, and Omniverse, will all run on the Grace CPU Superchip. Customers can configure servers built on the Grace CPU Superchip and NVIDIA ConnectX-7 NICs as standalone CPU-only systems or as GPU-accelerated servers with one, two, four, or eight Hopper-based GPUs, letting them optimize performance for their specific workloads while maintaining a single software stack.

Designed for AI, HPC, Cloud and Hyperscale Applications

According to NVIDIA, the Grace CPU Superchip’s performance, memory bandwidth, energy efficiency, and configurability would let it excel in the most demanding AI, HPC, data analytics, scientific computing, and hyperscale cloud computing applications.

The Grace CPU Superchip’s 144 cores and 1 TB/s of memory bandwidth would provide unprecedented performance for CPU-based high-performance computing applications. HPC applications are compute-intensive and demand the highest-performing cores, the highest memory bandwidth, and the right memory capacity per core to produce results faster.

NVIDIA is working with leading HPC, supercomputing, hyperscale, and cloud customers on the Grace CPU Superchip. Both it and the Grace Hopper Superchip are expected to be available in the first half of 2023.