NVIDIA Blackwell B200 — Flagship GPU for the Generative-AI Era

Name: NVIDIA Blackwell B200 — Flagship GPU for the Generative-AI Era
Start: 2024-03-18

On 18 March 2024, in San Jose, California, NVIDIA CEO Jensen Huang used the GTC 2024 keynote to unveil Blackwell, the next-generation GPU architecture named after the statistician David Harold Blackwell, together with its first product, the B200.

The launch landed sixteen months after ChatGPT's debut, in the middle of a generative-AI boom that was rewriting the data-centre market. "We are creating a new industrial revolution," Huang declared.

B200 Specifications

The B200 packs 208 billion transistors. It is a chiplet design: two dies fabricated on TSMC's custom 4 nm (4NP) process, joined by a 10 TB/s "NV-HBI" interconnect and presented as a single logical GPU.

Headline numbers:

FP4 inference: 20 PFLOPS (about 5× H100)
FP8 training: 10 PFLOPS (about 2.5× H100)
Memory: 192 GB HBM3e, 8 TB/s bandwidth
TDP: 1000 W (liquid cooling assumed)
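The "about 5×" and "about 2.5×" multipliers can be reproduced with simple division. This sketch assumes that both B200 figures are peak Tensor Core throughput with sparsity, and that the H100 baseline is the H100 SXM's FP8 peak with sparsity (about 3.958 PFLOPS). H100 has no native FP4, so FP8 is its lowest-precision comparison point. The function name `speedup_vs_h100` is hypothetical.

```python
# Sanity check of the B200-vs-H100 multipliers in the specification list.
# ASSUMPTION: the baseline is H100 SXM FP8 Tensor Core peak with sparsity
# (~3.958 PFLOPS); H100 lacks FP4, so both B200 figures are compared to it.
H100_FP8_PFLOPS = 3.958

B200_PFLOPS = {
    "FP4 inference": 20.0,
    "FP8 training": 10.0,
}


def speedup_vs_h100(b200_pflops: float, baseline: float = H100_FP8_PFLOPS) -> float:
    """Ratio of a B200 peak figure to the assumed H100 FP8 baseline."""
    return b200_pflops / baseline


if __name__ == "__main__":
    for label, pflops in B200_PFLOPS.items():
        print(f"{label}: {pflops} PFLOPS -> {speedup_vs_h100(pflops):.2f}x H100 FP8")
```

Under these assumptions the ratios come out at roughly 5.05× and 2.53×, consistent with the rounded figures quoted above.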

Blackwell's second-generation Transformer Engine added support for FP4 (4-bit floating point), a format introduced to extract maximum throughput from low-precision inference workloads and aimed squarely at cutting the per-token cost of LLM serving.
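To make the idea concrete, here is a simplified sketch of block-scaled FP4 quantization. It assumes the E2M1 element layout (1 sign bit, 2 exponent bits, 1 mantissa bit), whose only magnitudes are 0, 0.5, 1, 1.5, 2, 3, 4 and 6, and a single full-precision scale per block. Production formats such as MXFP4 or NVFP4 also store the scale itself in a narrow format, which this sketch leaves out. All function names are hypothetical.

```python
# Simplified block-scaled FP4 (E2M1) quantization sketch.
# ASSUMPTIONS: E2M1 elements; one full-precision scale per block.
# Real hardware formats also quantize the scale, which is omitted here.
from typing import List, Tuple

# Every non-negative value representable in E2M1; the sign bit mirrors them.
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_MAX = E2M1_MAGNITUDES[-1]


def round_to_e2m1(x: float) -> float:
    """Round x to the nearest E2M1 value; ties go to the smaller magnitude,
    and anything beyond 6.0 saturates to +/-6.0."""
    mag = min(E2M1_MAGNITUDES, key=lambda v: abs(abs(x) - v))
    return -mag if x < 0 else mag


def quantize_block(block: List[float]) -> Tuple[float, List[float]]:
    """Return (scale, codes) such that block[i] is approximately scale * codes[i]."""
    amax = max(abs(v) for v in block)
    # Map the largest magnitude in the block onto the largest FP4 value.
    scale = amax / E2M1_MAX if amax > 0 else 1.0
    return scale, [round_to_e2m1(v / scale) for v in block]


def dequantize_block(scale: float, codes: List[float]) -> List[float]:
    """Recover approximate full-precision values from the scale and codes."""
    return [scale * c for c in codes]


if __name__ == "__main__":
    weights = [0.12, -0.62, 0.33, 1.20, -0.07, 0.90, -1.10, 0.45]
    scale, codes = quantize_block(weights)
    print("scale:", scale)
    print("fp4 codes:", codes)
    print("reconstructed:", [round(v, 3) for v in dequantize_block(scale, codes)])
```

Each element needs only 4 bits plus a share of the block scale, half the storage of FP8, at the cost of coarse rounding: in the example, 0.90 comes back as 0.80 and -1.10 as -1.20.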

GB200 NVL72 — The Rack as a Computer

Blackwell's true protagonist is not the single GPU but the rack-scale system GB200 NVL72.

A single cabinet pairs 36 Grace CPUs with 72 B200 GPUs, linked by fifth-generation NVLink into one large shared-memory domain. The rack delivers 1.4 exaflops in FP4, carries 13.5 TB of HBM3e, and is reportedly priced at roughly US$2–3 million.
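The rack totals follow from the per-GPU specifications. The arithmetic below assumes the launch figures of 20 PFLOPS FP4 and 192 GB HBM3e per B200, and that each GB200 superchip pairs one Grace CPU with two B200 GPUs. It also suggests that the 13.5 TB memory figure is counted in binary units: 72 × 192 GB is 13,824 GB, which is exactly 13.5 × 1,024. The variable names are illustrative.

```python
# Back-of-envelope check of the GB200 NVL72 rack totals.
# ASSUMPTIONS: per-GPU launch specs (20 PFLOPS FP4, 192 GB HBM3e) and a
# GB200 superchip layout of 1 Grace CPU : 2 B200 GPUs.
GPUS_PER_RACK = 72
GPUS_PER_GRACE_CPU = 2
FP4_PFLOPS_PER_GPU = 20
HBM_GB_PER_GPU = 192

grace_cpus = GPUS_PER_RACK // GPUS_PER_GRACE_CPU              # 72 / 2 = 36 CPUs
rack_fp4_exaflops = GPUS_PER_RACK * FP4_PFLOPS_PER_GPU / 1000  # 1,440 PFLOPS = 1.44 EFLOPS
rack_hbm_gb = GPUS_PER_RACK * HBM_GB_PER_GPU                   # 13,824 GB
rack_hbm_binary_tb = rack_hbm_gb / 1024                        # 13.5 if 1 TB is taken as 1,024 GB

print(f"Grace CPUs: {grace_cpus}")
print(f"FP4 compute: {rack_fp4_exaflops:.2f} EFLOPS")
print(f"HBM3e: {rack_hbm_gb:,} GB (= {rack_hbm_binary_tb} TB in 1,024-GB units)")
```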

The product symbolised a shift from "buying GPUs" to "buying an AI factory by the rack".

Delay — Mask Defect and Thermals

Behind the headline event, Blackwell hit a serious manufacturing wall.

In August 2024, The Information reported a design flaw in the B200. NVIDIA confirmed that it had changed the Blackwell GPU mask to improve production yields; the underlying problem, as reported, was a mismatch in thermal-expansion coefficients between the GPU dies, LSI bridges, RDL interposer and substrate inside the CoWoS-L package, which caused the package to warp. A re-spin of the top metal layers and bump structure was required, pushing the planned Q3 2024 ramp out by several months.
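The warpage mechanism follows from the basic thermal-mismatch relation. As a generic illustration (the symbols are chosen here, and no NVIDIA or TSMC material data is implied), take two bonded layers a and b of length L with expansion coefficients α_a and α_b, heated through ΔT. Each layer would expand freely by ΔL_i = α_i L ΔT, so the bond has to absorb the difference:

```latex
\Delta L_a - \Delta L_b = (\alpha_a - \alpha_b)\, L\, \Delta T,
\qquad
\varepsilon_{\mathrm{mismatch}} = \frac{\Delta L_a - \Delta L_b}{L} = (\alpha_a - \alpha_b)\, \Delta T
```

Because bonded layers cannot slide past each other, this strain turns into stress; in a package as large as CoWoS-L, that stress is relieved by bending, which is the warpage described above.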

GB200 NVL72 racks then ran into overheating issues; some customers had to redesign their liquid-cooling loops.

Mass production finally began in December 2024, with large-scale shipment ramping through Q1 2025. Even then demand outstripped supply: lead times on new orders ran 6–12 months.

Customers and TAM

Named GB200 customers include Microsoft Azure, Amazon AWS, Google Cloud, Oracle Cloud, Meta, Tesla, xAI, OpenAI and CoreWeave, with Dell among the server makers building GB200 systems. Microsoft, Alphabet, Amazon and Meta alone spent more than US$200 billion on capital expenditure in 2024, much of it on AI data centres and a large share of that on NVIDIA hardware.

At GTC, Huang put the data-centre TAM at "over US$200 billion". A year later he would lift the figure to US$1 trillion.

Competition — AMD MI300X and Google TPU v5p

NVIDIA's runaway was not entirely uncontested.

AMD MI300X: 153 billion transistors on the CDNA 3 architecture, with 192 GB of HBM3. Microsoft and Meta announced deployments. NVIDIA's CUDA ecosystem moat, however, kept AMD's software stack consistently behind on maturity and readiness.

Google TPU v5p: powered Google's own frontier training. Not sold externally—available only via Google Cloud.

AWS Trainium2, Microsoft Maia 100: each major cloud provider accelerated its in-house silicon programme, driven by the strategic question "how long do we keep paying NVIDIA?"

Even so, by most estimates NVIDIA's share of the AI-chip market in 2024–2025 remained above 90%. Rivals were still positioned as "an alternative that complements NVIDIA", not "a replacement".

Bubble Concerns

Beneath the euphoria, sceptical voices were growing. Could hyperscalers ever recoup their AI capex? Would falling inference costs and the rise of Chinese open-weights models such as DeepSeek eventually shrink demand for NVIDIA's flagship GPUs? Those questions hit the share price directly in January 2025 with the DeepSeek-R1 shock: on 27 January, NVIDIA's stock fell about 17%, erasing close to US$600 billion of market value in a single day.

Blackwell B200 thus stands as both the apex product of the generative-AI compute build-out and a symbol of NVIDIA's single-vendor dominance at its peak. From this summit, the story moves into an era in which the economics of AI compute are themselves called into question.
