Nvidia may have to postpone the mass production ramp-up of its next-generation AI servers based on the Blackwell B200 and GB200 platforms, according to a new report from TrendForce. The delay is reportedly due to challenges with overheating, power consumption, and the need for optimized interconnections.
Mass Production Delayed Until Mid-2025
TrendForce suggests that mass production and peak shipments of Blackwell-based machines will likely occur in mid-2025, a delay of nearly half a year. While limited quantities of these servers are expected to ship in 2024, the bulk of production has been pushed back, mainly because of low initial yields of the B200 GPUs.
Overheating and Power Consumption Challenges
The primary issues are the B200 GPUs overheating and drawing excessive power in high-density server racks. A 72-GPU NVL72 rack, initially reported to consume 120 kW, now reportedly requires 140 kW, which exceeds the per-rack power delivery that many typical data centers are built to supply. This increased power demand, coupled with the cooling it requires, has forced Nvidia to repeatedly revise its server rack designs.
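For a rough sense of scale, here is a back-of-the-envelope sketch: the 140 kW rack figure is the reported one, while the 15 kW budget for a conventional air-cooled rack is an illustrative assumption, not a number from the report.

```python
# Back-of-the-envelope comparison: NVL72 rack power vs. a typical air-cooled rack.
# RACK_POWER_KW is the reported figure; TYPICAL_RACK_BUDGET_KW is an assumed,
# illustrative budget for a conventional data center rack.

GPUS_PER_RACK = 72
RACK_POWER_KW = 140
TYPICAL_RACK_BUDGET_KW = 15  # assumption for comparison only

power_per_gpu_slot = RACK_POWER_KW / GPUS_PER_RACK
print(f"Power per GPU slot: ~{power_per_gpu_slot:.1f} kW")                      # ~1.9 kW
print(f"Vs. assumed typical rack: ~{RACK_POWER_KW / TYPICAL_RACK_BUDGET_KW:.0f}x the budget")
```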
Liquid Cooling and Interconnect Optimization
Liquid cooling is essential for Blackwell servers. However, current coolant distribution units (CDUs) can only handle 60-80 kW of heat, so they will need to be upgraded to cool the new server racks. Cooling system providers are working to optimize cold plate designs and increase CDU capacity.
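A minimal sketch of that shortfall, assuming the reported figures of 140 kW per rack and 60-80 kW per CDU (the calculation itself is illustrative, not from TrendForce):

```python
import math

# Illustrative check: how many current-generation CDUs a 140 kW rack would need,
# treating CDU capacity as a simple per-rack divisor (a simplifying assumption).

RACK_HEAT_KW = 140
CDU_CAPACITY_RANGE_KW = (60, 80)  # reported capacity range of current CDUs

for cdu_kw in CDU_CAPACITY_RANGE_KW:
    cdus_needed = math.ceil(RACK_HEAT_KW / cdu_kw)
    print(f"With {cdu_kw} kW CDUs: at least {cdus_needed} per rack, "
          f"or a single CDU rated for >= {RACK_HEAT_KW} kW")
```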
TrendForce also points out that Nvidia needs to optimize its interconnections, though it does not specify which interconnects require attention. These combined challenges have led to the reported production delay.
Future Blackwell GPUs
The delay also raises questions about the launch of the B200A (a simplified version of the B200) and the future B300/GB300 series, which promise even greater performance but are likely to consume even more power, requiring more advanced rack designs and cooling solutions.