AWS has released its Amazon Elastic Compute Cloud (Amazon EC2) P4d instances, the next generation of its GPU-powered instances. According to AWS, they deliver “3x faster performance, up to 60% lower cost, and 2.5x more GPU memory” than the previous-generation P3 instances. The instances are aimed at machine learning training and high-performance computing (HPC) workloads.
P4d instances feature eight NVIDIA A100 Tensor Core GPUs and 400 Gbps of network bandwidth (16x more than P3 instances). By combining P4d instances with AWS’s Elastic Fabric Adapter (EFA) and NVIDIA GPUDirect RDMA (remote direct memory access), users can deploy them in EC2 UltraClusters.
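As a rough illustration of what attaching EFA to a P4d instance involves, the sketch below assembles a request for the EC2 `RunInstances` API and sends it with boto3. It is a minimal sketch under stated assumptions, not an AWS-provided recipe: the AMI, subnet, security group and placement group names are hypothetical placeholders, only a single EFA interface on the primary network card is configured, and the optional cluster placement group is an assumption about how nodes would be grouped, not something the announcement specifies.

```python
"""Hypothetical sketch: request parameters for launching a P4d instance with EFA."""
from typing import Any, Dict, List, Optional


def build_p4d_launch_params(
    ami_id: str,
    subnet_id: str,
    security_group_ids: List[str],
    count: int = 1,
    placement_group: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for the EC2 RunInstances API.

    All IDs are placeholders supplied by the caller; this function only
    assembles the request and makes no AWS calls.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    params: Dict[str, Any] = {
        "ImageId": ami_id,
        "InstanceType": "p4d.24xlarge",  # the P4d size with eight A100 GPUs
        "MinCount": count,
        "MaxCount": count,
        # One Elastic Fabric Adapter on the primary interface only.
        # Attaching EFA interfaces to additional network cards is omitted.
        "NetworkInterfaces": [
            {
                "DeviceIndex": 0,
                "SubnetId": subnet_id,
                "Groups": security_group_ids,
                "InterfaceType": "efa",
            }
        ],
    }
    if placement_group is not None:
        # Assumption: a pre-created cluster placement group keeps nodes close.
        params["Placement"] = {"GroupName": placement_group}
    return params


def launch(params: Dict[str, Any], region: str = "us-east-1") -> List[str]:
    """Send the request with boto3 and return the new instance IDs."""
    import boto3  # imported lazily so the builder works without boto3 installed

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.run_instances(**params)
    return [inst["InstanceId"] for inst in response["Instances"]]
```

Keeping request construction separate from the boto3 call makes the parameters easy to inspect before anything is launched; calling `launch(build_p4d_launch_params("ami-EXAMPLE", "subnet-EXAMPLE", ["sg-EXAMPLE"], count=2))` would then submit the request in US East (N. Virginia), one of the two launch regions.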
With EC2 UltraClusters, users can scale P4d instances to more than 4,000 A100 GPUs, using AWS-designed, non-blocking, petabit-scale networking infrastructure integrated with Amazon FSx for Lustre high-performance storage. According to AWS, this offers on-demand access to supercomputing-class performance for accelerating machine learning training and HPC.
“The pace at which our customers have used AWS services to build, train, and deploy machine learning applications has been extraordinary. At the same time, we have heard from those customers that they want an even lower cost way to train their massive machine learning models,” said Dave Brown, Vice President, EC2, AWS. “Now, with EC2 UltraClusters of P4d cloud instances powered by NVIDIA’s latest A100 GPUs and petabit-scale networking, we’re making supercomputing-class performance available to virtually everyone, while reducing the time to train machine learning models by 3x, and lowering the cost to train by up to 60% compared to previous generation cloud instances.”
Users can run containerized applications on P4d instances using AWS Deep Learning Containers, with libraries for Amazon Elastic Kubernetes Service (Amazon EKS) or Amazon Elastic Container Service (Amazon ECS). For a more fully managed experience, customers can use P4d instances through Amazon SageMaker, which lets developers and data scientists build, train, and deploy machine learning models quickly. HPC customers can use AWS Batch and AWS ParallelCluster with P4d instances to orchestrate jobs and clusters efficiently.
P4d cloud instances support all major machine learning frameworks, including TensorFlow, PyTorch, and Apache MXNet, giving customers the flexibility to choose the framework that works best for their applications. P4d instances are available in US East (N. Virginia) and US West (Oregon), with availability planned for additional regions soon.