Google Cloud will enhance AI cloud infrastructure with new TPUs and NVIDIA GPUs, the cloud division announced on Oct. 30 at the App Day & Infrastructure Summit.
Now in preview for cloud customers, the sixth-generation TPU, Trillium, powers many of Google Cloud’s most popular services, including Search and Maps.
“Through these advancements in AI infrastructure, Google Cloud empowers businesses and researchers to redefine the boundaries of AI innovation,” Mark Lohmeyer, VP and GM of Compute and AI Infrastructure at Google Cloud, wrote in a press release. “We are looking forward to the transformative new AI applications that will emerge from this powerful foundation.”
Trillium TPU speeds up generative AI processes
As large language models grow, so must the silicon to support them.
Trillium, the sixth-generation TPU, supports training, inference, and serving of large language model applications at 91 exaflops in one TPU cluster. Google Cloud reports that it offers a 4.7-times increase in peak compute performance per chip compared to the fifth generation. It also doubles both the High Bandwidth Memory capacity and the Interchip Interconnect bandwidth.
Trillium meets the high compute demands of large-scale diffusion models like Stable Diffusion XL. At its peak, Trillium infrastructure can link tens of thousands of chips, creating what Google Cloud describes as “a building-scale supercomputer.”
Enterprise customers have been asking for more cost-effective AI acceleration and increased inference performance, said Mohan Pichika, group product manager of AI infrastructure at Google Cloud, in an email to TechRepublic.
In the press release, Google Cloud customer Deniz Tuna, head of development at mobile app development company HubX, noted: “We used Trillium TPU for text-to-image creation with MaxDiffusion & FLUX.1 and the results are amazing! We were able to generate four images in 7 seconds — that’s a 35% improvement in response latency and ~45% reduction in cost/image against our current system!”
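The figures in the HubX quote can be unpacked with some quick arithmetic. The Python sketch below is illustrative only: it assumes that a "35% improvement in response latency" means the new latency is 65% of the old one, which the quote does not spell out, and it treats the approximate 45% cost figure as exact.

```python
# Figures taken from the HubX quote. How "improvement" should be read
# is an assumption, not something HubX or Google Cloud stated.
IMAGES = 4
NEW_BATCH_SECONDS = 7.0
LATENCY_IMPROVEMENT = 0.35   # assumed: new latency = old latency * (1 - 0.35)
COST_REDUCTION = 0.45        # quoted as approximate ("~45%")

# Average time per image on Trillium.
new_per_image = NEW_BATCH_SECONDS / IMAGES

# Implied time for the same four images on the previous system.
implied_old_batch = NEW_BATCH_SECONDS / (1 - LATENCY_IMPROVEMENT)

# Cost per image relative to the previous system (1.0 = unchanged).
relative_cost = 1 - COST_REDUCTION

print(f"Per image on Trillium: {new_per_image:.2f} s")
print(f"Implied previous time for four images: {implied_old_batch:.1f} s")
print(f"Cost per image relative to before: {relative_cost:.2f}x")
```

Under a different reading of "improvement" (for example, a throughput gain rather than a latency cut), the implied previous time would differ.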
New Virtual Machines anticipate NVIDIA Blackwell chip delivery
In November, Google will add A3 Ultra VMs powered by NVIDIA H200 Tensor Core GPUs to its cloud services. The A3 Ultra VMs run AI or high-performance computing workloads on Google Cloud’s data center-wide network at 3.2 Tbps of GPU-to-GPU traffic. They also offer customers:
- Integration with NVIDIA ConnectX-7 hardware.
- 2x the GPU-to-GPU networking bandwidth compared to the previous benchmark, A3 Mega.
- Up to 2x higher LLM inferencing performance.
- Nearly double the memory capacity.
- 1.4x more memory bandwidth.
The new VMs will be available through Google Cloud or Google Kubernetes Engine.
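To put the bandwidth figures in concrete terms, the following Python sketch converts the quoted 3.2 Tbps into gigabytes per second and estimates an idealized transfer time. The A3 Mega rate is inferred from the "2x" comparison in the list above, and the 100 GB payload is a hypothetical size chosen for illustration. Real transfers would be slower because of protocol overhead and contention.

```python
# Back-of-the-envelope conversion of the quoted GPU-to-GPU bandwidth.
# Assumptions (not from Google Cloud): decimal units (1 Tbps = 1e12 bits/s),
# a hypothetical 100 GB payload, and zero protocol overhead.

A3_ULTRA_TBPS = 3.2                 # figure quoted for A3 Ultra
A3_MEGA_TBPS = A3_ULTRA_TBPS / 2    # inferred from the "2x" claim

def tbps_to_gbytes_per_s(tbps: float) -> float:
    """Convert terabits per second to gigabytes per second."""
    return tbps * 1000 / 8

def ideal_transfer_seconds(payload_gb: float, tbps: float) -> float:
    """Idealized time to move payload_gb gigabytes at the given rate."""
    return payload_gb / tbps_to_gbytes_per_s(tbps)

PAYLOAD_GB = 100.0  # hypothetical payload size

for name, rate in [("A3 Mega", A3_MEGA_TBPS), ("A3 Ultra", A3_ULTRA_TBPS)]:
    print(f"{name}: {tbps_to_gbytes_per_s(rate):.0f} GB/s, "
          f"{ideal_transfer_seconds(PAYLOAD_GB, rate):.2f} s for {PAYLOAD_GB:.0f} GB")
```

Under these assumptions, doubling the link rate halves the idealized transfer time.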
SEE: Blackwell GPUs are sold out for the next year, Nvidia CEO Jensen Huang said at an investors’ meeting in October.
Additional Google Cloud infrastructure updates support the growing enterprise LLM industry
Naturally, Google Cloud’s infrastructure offerings interoperate. For example, the A3 Mega is supported by the Jupiter data center network, which will soon see its own AI-workload-focused enhancement.
With its new network adapter, Titanium’s host offload capability now adapts more effectively to the diverse demands of AI workloads. The Titanium ML network adapter uses NVIDIA ConnectX-7 hardware and Google Cloud’s data-center-wide 4-way rail-aligned network to deliver 3.2 Tbps of GPU-to-GPU traffic. The benefits of this combination flow up to Jupiter, Google Cloud’s optical circuit switching network fabric.
Another key element of Google Cloud’s AI infrastructure is the processing power required for AI training and inference. Hypercompute Cluster, which contains A3 Ultra VMs, brings large numbers of AI accelerators together. It can be configured via an API call, leverages reference libraries like JAX or PyTorch, and supports open models like Gemma 2 and Llama 3 for benchmarking.
Google Cloud customers can access Hypercompute Cluster with A3 Ultra VMs and Titanium ML network adapters in November.
These products address enterprise customer requests for optimized GPU utilization and simplified access to high-performance AI infrastructure, said Pichika.
“Hypercompute Cluster provides an easy-to-use solution for enterprises to leverage the power of AI Hypercomputer for large-scale AI training and inference,” he said by email.
Google Cloud is also preparing racks for NVIDIA’s upcoming Blackwell GB200 NVL72 GPUs, anticipated for adoption by hyperscalers in early 2025. Once available, these GPUs will connect to Google’s Axion-processor-based VM series, leveraging Google’s custom Arm processors.
Pichika declined to directly address whether the timing of Hypercompute Cluster or Titanium ML was connected to delays in the delivery of Blackwell GPUs: “We’re excited to continue our work together to bring customers the best of both technologies.”
Two more services, the Hyperdisk ML AI/ML-focused block storage service and the Parallelstore AI/HPC-focused parallel file system, are now generally available.
Google Cloud services can be reached across numerous international regions.
Competitors to Google Cloud for AI hosting
Google Cloud competes primarily with Amazon Web Services and Microsoft Azure in cloud hosting of large language models. Alibaba, IBM, Oracle, VMware, and others offer similar stables of large language model resources, although not always at the same scale.
According to Statista, Google Cloud held 10% of the cloud infrastructure services market worldwide in Q1 2024. AWS held 34% and Microsoft Azure held 25%.
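As a quick check on what those shares imply, the short Python sketch below adds them up. It uses only the three percentages quoted above; the "all other providers" figure is simply the remainder and is not a number reported by Statista here.

```python
# Market-share figures as quoted above (Statista, Q1 2024), in percent.
shares = {"Amazon Web Services": 34, "Microsoft Azure": 25, "Google Cloud": 10}

top_three = sum(shares.values())    # combined share of the three providers
everyone_else = 100 - top_three     # remainder held by all other providers
aws_to_google = shares["Amazon Web Services"] / shares["Google Cloud"]

print(f"Top three combined: {top_three}%")
print(f"All other providers: {everyone_else}%")
print(f"AWS share relative to Google Cloud: {aws_to_google:.1f}x")
```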
 
                                    