QCT's Next Leap in Accelerated Computing
NVIDIA GB200 Grace™ Blackwell Superchip

Accelerated computing performance for AI and HPC applications

QCT NVIDIA Grace™ Blackwell Products

NVIDIA MGX™​

Bring Accelerated Computing to AI and HPC Users with a Modular Design​
Diagram: a single NVIDIA MGX™ architecture (GPU, CPU, DPU) enables 100+ configurations for diverse applications. QCT's MGX™ architecture systems target AI, HPC + data analytics, digital twins, cloud services, cloud gaming, and 5G.

NVIDIA MGX™ is a modular architecture offering a variety of server configurations for a wide range of use cases, tailored for AI, HPC, and NVIDIA Omniverse™ applications.

NVIDIA MGX™ supports a range of GPUs, CPUs (both x86 and Arm), data processing units (DPUs), and network adapters.

NVIDIA GB200 Grace™ Blackwell Superchip

The NVIDIA GB200 Grace™ Blackwell Superchip arrives as the flagship of the Blackwell architecture, catapulting generative AI to trillion-parameter scale and delivering 30X faster real-time large language model (LLM) inference, 25X lower TCO, and 25X lower energy consumption. It combines two NVIDIA Blackwell GPUs and an NVIDIA Grace CPU, and it can scale up to the GB200 NVL72, a 72-GPU system connected by fifth-generation NVIDIA® NVLink® that acts as a single massive GPU.

Source: NVIDIA

NVIDIA GB200 Grace™ Blackwell Superchip​
Rack-scale Architecture (GB200 NVL72)​

First Architecture with Rack-level NVIDIA® NVLink®​

This new class of rack-scale architecture interconnects 72 NVIDIA Blackwell GPUs via NVIDIA® NVLink®, offering linear scalability, a massive shared memory space across GPUs, and exceptional power efficiency with liquid-cooled NVIDIA GB200 Grace™ Blackwell Superchips. The rack acts as a single massive GPU and delivers 30X faster real-time trillion-parameter LLM inference.

NVIDIA® NVLink® significantly enhances real-time inference performance for a 1.8 trillion-parameter mixture-of-experts (MoE) model, which requires 32 GPUs to reach a 50ms token-to-token latency. Interconnected by NVLink, this 32-GPU setup achieves 8X higher token-per-second throughput than a single eight-B200 server, while the same number of B200 GPUs connected over InfiniBand achieves only a 4X increase. NVLink thus maximizes at-scale performance and GPU communication across the 32 GPUs; the arithmetic sketch below makes the comparison concrete.
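To see why the NVLink figure is notable, note that 32 GPUs is only 4X the hardware of an eight-GPU server, so an 8X throughput gain implies a 2X improvement in per-GPU efficiency at the fixed 50ms latency target. The back-of-envelope sketch below checks this arithmetic; it is a hypothetical illustration, and the 8X and 4X speedups are NVIDIA's published comparison figures, not measurements.

    // throughput_scaling.cu: back-of-envelope check of the NVLink vs. InfiniBand
    // comparison above. Hypothetical sketch; the speedup figures are NVIDIA's
    // published comparison numbers, not measurements.
    // Build with: nvcc throughput_scaling.cu
    #include <cstdio>

    int main() {
        const double baseline_gpus = 8.0;   // single eight-B200 server
        const double scaled_gpus   = 32.0;  // 32-GPU NVLink or InfiniBand setup
        const double nvlink_speedup     = 8.0;  // tokens/s vs. baseline
        const double infiniband_speedup = 4.0;  // tokens/s vs. baseline

        const double hw_ratio = scaled_gpus / baseline_gpus;  // 4X the hardware
        printf("Hardware ratio:                %.1fX\n", hw_ratio);
        printf("NVLink per-GPU efficiency:     %.1fX baseline\n",
               nvlink_speedup / hw_ratio);
        printf("InfiniBand per-GPU efficiency: %.1fX baseline\n",
               infiniband_speedup / hw_ratio);
        return 0;
    }

The output shows InfiniBand scaling linearly (1.0X per-GPU efficiency) while NVLink doubles per-GPU throughput (2.0X), consistent with each GPU spending less time stalled on communication at the same latency target.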

True Heterogeneous AI/HPC System
  • Future-compatible with NVIDIA CPUs, GPUs, and DPUs
  • Easy to deploy, with full support for the NVIDIA software stack
Accelerated Computing Workloads
  • Machine Learning and Inference workloads

    e.g., NLP, DLRM

  • Database workloads

    e.g., Hash Join

  • HPC workloads

    e.g., OpenFOAM, GROMACS

Breakthrough Performance
  • Memory-size-intensive workloads
  • Workloads with intensive CPU-to-GPU interaction

To accelerate performance for multitrillion-parameter and mixture-of-experts AI models, the latest iteration of NVIDIA® NVLink® delivers a groundbreaking 1.8TB/s of bidirectional throughput per GPU, ensuring seamless high-speed communication among up to 576 GPUs for the most complex LLMs.

(Source: NVIDIA®)
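As a rough illustration of what 1.8TB/s per GPU enables, the sketch below estimates a bandwidth-bound lower bound on the time to all-reduce the gradients of a large model across one NVL72 NVLink domain. Everything except the per-GPU NVLink bandwidth quoted above is an assumption: the 1.8T-parameter model size, bf16 gradients, and a classic ring all-reduce cost model (the NVL72's NVSwitch fabric is not literally a ring, but the ring bound is the standard bandwidth-optimal estimate).

    // nvlink_allreduce_estimate.cu: rough lower-bound estimate of gradient
    // all-reduce time over a fifth-generation NVLink domain. Hypothetical
    // sketch; only the 1.8TB/s bidirectional per-GPU figure comes from the
    // text above. Build with: nvcc nvlink_allreduce_estimate.cu
    #include <cstdio>

    int main() {
        const double params        = 1.8e12;  // 1.8T-parameter model (assumption)
        const double bytes_per_val = 2.0;     // bf16 gradients (assumption)
        const double gpus          = 72.0;    // one GB200 NVL72 NVLink domain
        const double bw_per_dir    = 0.9e12;  // 900 GB/s per direction, i.e.
                                              // half of 1.8TB/s bidirectional

        // A ring all-reduce moves ~2*(N-1)/N of the payload through each
        // GPU's links, which is the bandwidth-optimal lower bound.
        const double bytes_moved =
            2.0 * (gpus - 1.0) / gpus * params * bytes_per_val;
        printf("Ideal all-reduce time: ~%.1f s\n", bytes_moved / bw_per_dir);
        return 0;
    }

Under these assumptions, roughly 3.6TB of gradient traffic completes in about 8 seconds at link speed, which is why a single large NVLink domain, rather than a slower scale-out fabric, matters for trillion-parameter models.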

Liquid-cooled GB200 NVL72 racks reduce a data center’s carbon footprint and energy consumption. Liquid cooling increases compute density, reduces the amount of floor space used, and facilitates high-bandwidth, low-latency GPU communication with large NVLink domain architectures. Compared to NVIDIA H100 air-cooled infrastructure, GB200 delivers 25X more performance at the same power while reducing water consumption.

(Source: NVIDIA®)

QCT NVIDIA Grace™ Blackwell & NVIDIA GB200 NVL72 Products

NVIDIA​ MGX™ Architecture

The NVIDIA GB200 Grace™ Blackwell Superchip supercharges next-generation AI and accelerated computing. Acting as the heart of a much larger system, it can scale up to the NVIDIA GB200 NVL72, the first architecture with rack-level, fifth-generation NVIDIA® NVLink®, connecting 72 high-performance NVIDIA B200 Tensor Core GPUs and 36 NVIDIA Grace CPUs; within each superchip, NVIDIA® NVLink®-C2C provides 900 GB/s of bidirectional CPU-to-GPU bandwidth.

With NVIDIA® NVLink® Chip-to-Chip (C2C), applications have coherent access to a unified memory space, eliminating complexity and speeding up deployment. This simplifies programming and supports the larger memory needs of trillion-parameter LLMs, transformer models for multimodal tasks, models for large-scale simulations, and generative models for 3D data.
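The sketch below illustrates what a coherent, unified memory space means for a programmer: on a Grace-based system with hardware coherence over NVLink-C2C, a CUDA kernel can dereference a pointer returned by ordinary host malloc, with no explicit staging copies. This is a minimal hypothetical example; it assumes a platform where system-allocated memory is GPU-accessible (as on Grace superchips), and on typical discrete-GPU machines the same code would instead need managed or device allocations.

    // coherent_memory.cu: minimal sketch of unified, coherent CPU-GPU memory.
    // Assumes a Grace-class system where the GPU can access system-allocated
    // memory over NVLink-C2C; not portable to typical discrete-GPU machines.
    #include <cstdio>
    #include <cstdlib>
    #include <cuda_runtime.h>

    __global__ void scale(float* data, int n, float factor) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= factor;  // GPU writes directly into malloc'd memory
    }

    int main() {
        const int n = 1 << 20;
        // Plain host allocation: no cudaMalloc, no cudaMemcpy staging.
        float* data = (float*)malloc(n * sizeof(float));
        for (int i = 0; i < n; ++i) data[i] = 1.0f;

        scale<<<(n + 255) / 256, 256>>>(data, n, 2.0f);
        cudaDeviceSynchronize();  // let the GPU finish before the CPU reads

        printf("data[0] = %.1f (expected 2.0)\n", data[0]);
        free(data);
        return 0;
    }

Because the CPU and GPU share one coherent address space, the same pointer is valid on both sides, which is what lets workloads spill beyond GPU memory into the Grace CPU's memory without code changes.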

Additionally, the NVIDIA GB200 NVL72 uses NVIDIA® NVLink® and cold-plate-based liquid cooling to create a single massive 72-GPU rack that can overcome thermal challenges, increase compute density, and facilitate high-bandwidth, low-latency GPU communication.

Book Sample Now

