Enfabrica Launches Compute Fabric Devices for AI in the Cloud

Enfabrica Corporation, a networking chip startup, has emerged from stealth mode to announce a new class of chips called Accelerated Compute Fabric (ACF) devices. These devices are designed to tackle scalability and performance issues associated with accelerated computing and Artificial Intelligence (AI) workloads.

The company aims to meet the rapidly evolving demands of AI-driven services in the cloud. According to Enfabrica, its ACF solution gives enterprises the highest scalability and performance, and the lowest total cost of ownership (TCO), for distributed AI, machine learning, extended reality, high-performance computing, and in-memory database architectures.

Enfabrica was established in 2020 by a team of Silicon Valley veterans who previously worked at industry-leading companies such as Broadcom, Google, Cisco, AWS, and Intel. The startup, founded by Rochan Sankar, Shrijeet Mukherjee, and other engineers, is backed by Sutter Hill Ventures.

The ACF devices offer scalable, streaming, multi-terabit-per-second data movement across GPUs, CPUs, accelerators, memory, and networking devices. Built entirely on standards-based hardware and software interfaces, they collapse latency tiers to remove the interface bottlenecks found in today's top-of-rack network switches, server NICs, PCIe switches, and CPU-controlled DRAM. The devices can also compose scalable AI fabrics of memory, compute, and network resources, spanning anything from a single system to tens of thousands of nodes.

Scalable High-Bandwidth Data Flow

The company says the ACF devices deliver their largest performance gains on large language model (LLM) and deep learning recommendation model (DLRM) inference. Enfabrica's flagship ACF switch silicon can cut customers' GPU compute costs by roughly 50% for LLM inference and 75% for DLRM inference at the same performance point, according to the company.
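
To make those percentages concrete, the short Python sketch below works out what a cost reduction "at the same performance point" implies for GPU fleet spend. The baseline dollar figure is purely hypothetical, chosen for illustration; it is not a number from Enfabrica's announcement.

```python
# Illustrative arithmetic only: the baseline figure is hypothetical,
# not an Enfabrica benchmark. It shows what a claimed cost reduction
# "at the same performance point" implies for GPU spend.

def reduced_cost(baseline_gpu_cost: float, reduction: float) -> float:
    """GPU spend after applying the claimed reduction (e.g. 0.50 for 50%)."""
    return baseline_gpu_cost * (1.0 - reduction)

baseline = 10_000_000.0  # assumed $10M GPU spend for a fixed throughput target

print(f"LLM inference:  ${reduced_cost(baseline, 0.50):,.0f}")   # $5,000,000
print(f"DLRM inference: ${reduced_cost(baseline, 0.75):,.0f}")   # $2,500,000
```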

Enfabrica claims its ACF solution is the industry's first data center silicon product to incorporate Compute Express Link (CXL) memory bridging. The technology gives a single GPU rack direct, low-latency, uncontended access to local CXL.mem DDR5 DRAM with "more than 50 times" the memory capacity of GPU-native high-bandwidth memory (HBM).
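
For a rough sense of scale, the sketch below works the "more than 50 times" claim through with assumed numbers: 80 GB of HBM per GPU (as on an NVIDIA H100) and an eight-GPU rack. Both are illustrative assumptions, not figures from the announcement.

```python
# Back-of-the-envelope capacity comparison. The per-GPU HBM figure
# (80 GB, as on an NVIDIA H100) and the rack shape (8 GPUs) are
# assumptions for illustration; Enfabrica has not published them.

HBM_PER_GPU_GB = 80          # assumed GPU-native HBM capacity
GPUS_PER_RACK = 8            # assumed rack configuration
CLAIMED_MULTIPLIER = 50      # "more than 50 times", per the announcement

native_hbm_gb = HBM_PER_GPU_GB * GPUS_PER_RACK
cxl_pool_gb = native_hbm_gb * CLAIMED_MULTIPLIER

print(f"Native HBM in rack:   {native_hbm_gb} GB")                        # 640 GB
print(f"Implied CXL.mem pool: {cxl_pool_gb:,} GB (~{cxl_pool_gb/1024:.0f} TB)")  # ~31 TB
```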

Enfabrica's ACF-S devices, the company's first silicon, provide scalable, composable, high-bandwidth data flow between any combination of GPUs, CPUs, accelerator ASICs, memory, flash storage, and networking parts. They reduce the number of devices, I/O latency hops, and power consumed in AI clusters by top-of-rack network switches, RDMA-over-Ethernet NICs, InfiniBand HCAs, PCIe/CXL switches, and CPU-attached DRAM.

Enfabrica's ACF-S would lower the compute cost of large AI recommendation engines, and its memory tiering would cut the number of GPUs and CPUs required for a typical hyperscale DLRM inference workload "by 75%," delivering significant TCO and power benefits.

“Scaling memory bandwidth and capacity is a critical need for accelerated computing in the cloud,” said Bob Wheeler, principal analyst at Wheeler’s Network. “In this light, we see CXL and RDMA as complementary technologies, with hyperscalers having already deployed high-bandwidth RDMA networks for GPUs. Enfabrica’s unique blending of CXL switching and RDMA networking functions in a single Accelerated Compute Fabric device promises a disruptive way to build scalable memory hierarchies for AI, and importantly the solution doesn’t have to rely on advanced CXL 3.x capabilities that are years away from being implemented or proven at scale.”