Tomahawk 5 Now Available from Broadcom to Speed Up AI/ML Workloads

The StrataXGS Tomahawk 5 switch series from Broadcom (NASDAQ: AVGO) offers 51.2 Terabits/sec of Ethernet switching capacity in a single, monolithic device. According to the networking solutions vendor, that is twice the bandwidth of any other switch silicon currently on the market.

Broadcom is now offering the Tomahawk 5 family to support the development of the next generation of unified networks. With capabilities such as single-pass VxLAN routing and bridging, Tomahawk 5 supports virtualization of AI/ML workloads, which can be essential for making effective use of the massively shared infrastructure in large data centers. Tomahawk 5 also includes Broadcom Cognitive Routing, enhanced shared packet buffering, configurable in-band telemetry, and hardware-based link failover, all of which Broadcom says are crucial to lowering job completion time (JCT) for AI/ML workloads.

The Broadcom Tomahawk 5, BCM78900, is currently available to users all around the world.

“Delivering the world’s first 51.2 Tbps switch two years after we released Tomahawk 4, the industry’s first 25 Tbps switch, is a testament to the outstanding execution and innovation by the Broadcom team,” said Ram Velaga, Senior Vice President and General Manager, Core Switching Group, Broadcom. “Since the introduction of Tomahawk 1 in 2014, Broadcom has consistently executed on doubling the bandwidth approximately every two years. With today’s introduction of the fifth generation Tomahawk family, we are proud to say that a single Tomahawk 5 replaces forty-eight Tomahawk 1 switches in the network, resulting in over 95 percent reduction in power requirements. We applaud our customers, partners, and engineers for making this possible.”
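One plausible reading of the forty-eight figure is the number of older chips needed to build a two-tier, non-blocking leaf-spine (Clos) fabric with the same capacity. The sketch below works through that arithmetic. It assumes Tomahawk 1 is a 3.2 Tbps device with 32 ports of 100GbE, a figure not stated in this article, and the function name is illustrative only.

```python
def two_tier_clos(radix: int) -> tuple[int, int]:
    """Return (switch_count, host_ports) for a non-blocking two-tier Clos.

    Each leaf uses half of its ports for hosts and half for uplinks,
    and each spine connects once to every leaf.
    """
    leaves = radix                 # a spine with `radix` ports reaches `radix` leaves
    spines = radix // 2            # one spine per leaf uplink
    host_ports = leaves * (radix // 2)
    return leaves + spines, host_ports


# Assumption: Tomahawk 1 = 32 ports x 100GbE (3.2 Tbps); not stated in the article.
TH1_RADIX = 32
PORT_GBPS = 100

switches, ports = two_tier_clos(TH1_RADIX)
fabric_tbps = ports * PORT_GBPS / 1000
print(f"{switches} switches, {ports} x {PORT_GBPS}G ports, {fabric_tbps} Tbps")
```

Under these assumptions, 32 leaf switches plus 16 spine switches expose 512 host-facing 100G ports, which matches the 51.2 Tbps of a single Tomahawk 5.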

Network Traffic

The Cognitive Routing feature of Tomahawk 5 is designed to increase network link utilization by automatically and dynamically choosing the least-loaded links in the system for each flow that passes through the switch. This can be crucial for AI/ML workloads, which frequently combine long-lived, low-entropy elephant flows with short-lived, low-bandwidth mice flows. The real-time dynamic load balancing in Tomahawk 5 tracks how all links are being used, both upstream and downstream in the network, to choose the best route for each flow. It also monitors the health of the hardware links and automatically diverts traffic away from unhealthy ones. According to Broadcom, these features significantly increase network utilization, decrease congestion, and shorten JCT.
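The idea of steering each new flow to the least-loaded healthy link can be illustrated with a small sketch. This is not Broadcom's algorithm: it only considers local link load (the real feature is described as also tracking downstream links), and every name and number in it is hypothetical.

```python
def least_loaded_link(load_gbps: dict[str, float], healthy: dict[str, bool]) -> str:
    """Return the healthy link that currently carries the least traffic."""
    candidates = [link for link in load_gbps if healthy[link]]
    if not candidates:
        raise RuntimeError("no healthy links available")
    return min(candidates, key=lambda link: load_gbps[link])


def place_flows(flows, links, healthy):
    """Assign each (flow_id, gbps) pair to the least-loaded healthy link."""
    load = {link: 0.0 for link in links}
    placement = {}
    for flow_id, gbps in flows:
        link = least_loaded_link(load, healthy)
        placement[flow_id] = link
        load[link] += gbps
    return placement, load


# Hypothetical example: four uplinks, one of them marked unhealthy.
links = ["up0", "up1", "up2", "up3"]
healthy = {"up0": True, "up1": True, "up2": False, "up3": True}
flows = [
    ("elephant-a", 100.0), ("elephant-b", 100.0), ("elephant-c", 100.0),
    ("mouse-a", 1.0), ("mouse-b", 1.0), ("mouse-c", 1.0),
]

placement, load = place_flows(flows, links, healthy)
print(placement)
print(load)
```

In this toy run the three elephant flows land on three different healthy links and the unhealthy link carries nothing, which is the behavior the paragraph above describes at a much larger scale.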

Regulating the rate at which each source injects traffic into the network is essential for minimizing congestion and improving JCT. Because network operators use a variety of congestion control strategies at their endpoints (such as merchant or custom NICs), Tomahawk 5 offers extensive configurable in-band telemetry on both real traffic and network probes.

Real-time information can be inserted into traffic at line rate as it travels over the network, gathering telemetry on queue size, packet latency, switch utilization, and a range of other customer-selectable parameters. This metadata can then be used for precise end-to-end network congestion control.
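A rough sketch of how per-hop metadata might feed an endpoint's rate decision is shown below. The record fields, the `forward` helper, and the rate rule are all hypothetical; they do not represent Broadcom's telemetry format or any specific congestion control algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class HopRecord:
    switch_id: str
    queue_depth_bytes: int
    hop_latency_ns: int
    link_utilization: float  # 0.0 (idle) to 1.0 (saturated)


@dataclass
class Packet:
    payload: bytes
    telemetry: list[HopRecord] = field(default_factory=list)


def forward(packet: Packet, record: HopRecord) -> Packet:
    """Model a switch stamping its own telemetry onto a passing packet."""
    packet.telemetry.append(record)
    return packet


def adjust_rate(current_gbps: float, packet: Packet,
                util_target: float = 0.9, step_gbps: float = 1.0) -> float:
    """Toy sender rule: back off in proportion to the busiest hop, else probe up."""
    if not packet.telemetry:
        return current_gbps
    worst = max(hop.link_utilization for hop in packet.telemetry)
    if worst > util_target:
        return current_gbps * util_target / worst
    return current_gbps + step_gbps


# Hypothetical three-hop path with a congested middle hop.
pkt = Packet(payload=b"data")
forward(pkt, HopRecord("leaf-1", 2_000, 600, 0.40))
forward(pkt, HopRecord("spine-3", 90_000, 4_500, 1.00))
forward(pkt, HopRecord("leaf-7", 5_000, 700, 0.60))

new_rate = adjust_rate(100.0, pkt)
print(len(pkt.telemetry), new_rate)
```

The design point is that the sender reacts to the most congested hop on the path rather than to an end-to-end signal such as packet loss, which is what per-hop metadata makes possible.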

To provide the lowest power and lowest cost for physical connectivity, Broadcom Tomahawk 5 offers a direct 100G PAM4 interface to direct attach copper (DAC), front-panel pluggable optics, and co-packaged optics. The flexible, long-reach Tomahawk 5 SerDes enables DAC connectivity to all devices within a rack, and even between racks, without re-timers or other active components. It can also connect directly to a large ecosystem of industry-standard front-panel pluggable optical modules.

“Broadcom’s impressive achievement of doubling the bandwidth for data center platforms on a single chip with Tomahawk 5 allows Juniper Networks to keep providing an extensive routing and switching portfolio with unmatched power efficiency,” said Michael Bushong, Group Vice President, Juniper Networks. “Combining best-of-breed silicon technology with Junos OS, the industry’s most advanced network operating system, addresses the leading-edge requirements of our customers.”

Silicon Photonics

Additionally, Tomahawk 5 will be made available with co-packaged optics built on Broadcom’s Silicon Photonics Chiplets in Package (SCIP) platform, which draws on the company’s silicon photonics and packaging technologies. Broadcom says this results in a more than 50 percent reduction in the power required for optical connectivity. Because the same switch hardware offers all of these I/O options, customers can select the best I/O for each part of their intra-cluster, inter-cluster, and inter-DC networks without the need for software porting.

To sum up, key features of the StrataXGS Tomahawk 5 Series for AI/ML include:

  • 256 ports of 200GbE on a single chip, which Broadcom describes as the highest radix in the world, enabling flat, low-latency AI/ML clusters
  • A shared-buffer architecture that Broadcom calls the most advanced on the market, offering the best performance and lowest tail latency for RoCEv2 and other modern RDMA protocols
  • Broadcom Cognitive Routing, dynamic load balancing, and support for end-to-end congestion control, purpose-built to manage the large, low-entropy flows typical of AI/ML workloads
  • Support for both Clos and non-Clos topologies, including torus, Dragonfly, Dragonfly+, and Megafly
  • Hardware-based link failover for lowered JCT and increased network resilience

Key benefits of the StrataXGS Tomahawk 5 Series include:

  • Enables 64 ports of 800GbE switching and routing to support the next generation of unified data center infrastructure
  • General computing and AI/ML workload virtualization using single-pass VxLAN routing and bridging
  • Physical I/O choices using 512 instances of the 100G PAM4 SerDes with “the best performance, greatest adaptability, and longest range available in the market”
  • High-precision time synchronization using SyncE and PTP
  • Six on-chip ARM processors for high-bandwidth, fully programmable streaming telemetry, and sophisticated embedded applications such as on-chip statistics summarization
  • Unmatched power efficiency, implemented as a monolithic 5nm die
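The headline numbers in the two lists above are consistent with one another, as the short calculation below shows. It assumes that a port of a given speed is built from speed/100 SerDes lanes, which the article implies but does not state; the constant and function names are illustrative.

```python
SERDES_LANES = 512   # 100G PAM4 SerDes instances, per the list above
LANE_GBPS = 100


def port_count(port_speed_gbps: int) -> int:
    """Number of ports available if each port bundles speed/100 lanes (assumption)."""
    lanes_per_port, remainder = divmod(port_speed_gbps, LANE_GBPS)
    if lanes_per_port == 0 or remainder:
        raise ValueError("port speed must be a positive multiple of the lane rate")
    return SERDES_LANES // lanes_per_port


total_tbps = SERDES_LANES * LANE_GBPS / 1000
ports_800g = port_count(800)
ports_200g = port_count(200)
print(total_tbps, ports_800g, ports_200g)
```

With 512 lanes at 100 Gbps each, the total comes to 51.2 Tbps, which divides into 64 ports of 800GbE or 256 ports of 200GbE.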

“Microsoft Azure offers best-in-class infrastructure for HPC and AI/ML workloads of all sizes. Broadcom’s Tomahawk 5 provides important benefits for next generation HPC and AI/ML requirements,” said Steve Scott, Technical Fellow and Corporate Vice President, Microsoft Azure Hardware Architecture. “Tomahawk 5 advances the scale, telemetry, and advanced features needed to support HPC and AI/ML network requirements, within the framework of the open and innovative Ethernet ecosystem.”