NVIDIA is now making it possible for businesses around the world to build and deploy large language models (LLMs), enabling them to create domain-specific chatbots, personal assistants, and other AI applications that comprehend language with new levels of depth and subtlety. NVIDIA has also unveiled NVIDIA Quantum-2, the next generation of its InfiniBand networking infrastructure.
This is the result of NVIDIA launching several new artificial intelligence technologies: the NVIDIA NeMo Megatron framework for training language models with trillions of parameters; the Megatron 530B customizable LLM that can be trained for new domains and languages; and NVIDIA Triton Inference Server with multi-GPU, multi-node distributed inference capabilities.
Combined with NVIDIA DGX systems, these tools would offer a production-ready, enterprise-grade solution to simplify the development and deployment of large language models.
“Large language models have proven to be flexible and capable, able to answer deep domain questions, translate languages, comprehend and summarize documents, write stories and compute programs, all without specialized training or supervision,” said Bryan Catanzaro, vice president of Applied Deep Learning Research at NVIDIA. “Building large language models for new languages and domains is likely the largest supercomputing application yet, and now these capabilities are within reach for the world’s enterprises.”
NVIDIA NeMo Megatron
NVIDIA NeMo Megatron builds on Megatron, an open-source project led by NVIDIA researchers studying the efficient training of massive transformer language models at scale. Megatron 530B is the world’s largest customizable language model.
Enterprises can use the NeMo Megatron framework to overcome the challenges of training sophisticated natural language processing models. It is optimized to scale out across the large-scale accelerated computing infrastructure of NVIDIA DGX SuperPOD.
NeMo Megatron automates the complexity of LLM training with data processing libraries that ingest, curate, organize, and clean data. Using advanced technologies for data, tensor, and pipeline parallelism, it enables the training of huge language models to be distributed efficiently across thousands of GPUs. Businesses can use the framework to train LLMs for their own domains and languages.
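To make the parallelism idea concrete, here is a minimal, hypothetical sketch of tensor parallelism in PyTorch: a linear layer’s weight matrix is split column-wise across ranks, each rank computes a partial result, and an all-gather reassembles the full output. This is a conceptual illustration only, not NeMo Megatron’s actual implementation.

```python
import torch
import torch.distributed as dist

def column_parallel_linear(x, weight_shard):
    # x: [batch, d_in]; weight_shard: this rank's [d_in, d_out // world_size] slice
    partial = x @ weight_shard                                # local slice of the output
    shards = [torch.empty_like(partial) for _ in range(dist.get_world_size())]
    dist.all_gather(shards, partial)                          # collect every rank's slice
    return torch.cat(shards, dim=-1)                          # full [batch, d_out] output

if __name__ == "__main__":
    # Single-process demo over the gloo backend; real training would launch one
    # process per GPU (e.g. with torchrun) and use the nccl backend instead.
    dist.init_process_group("gloo", init_method="tcp://127.0.0.1:29500",
                            rank=0, world_size=1)
    x = torch.randn(4, 8)                                     # [batch, d_in]
    w = torch.randn(8, 16)                                    # this rank's weight shard
    print(column_parallel_linear(x, w).shape)                 # torch.Size([4, 16])
    dist.destroy_process_group()
```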
NVIDIA Triton Inference Server
Thanks to new multi-GPU, multi-node features in the latest NVIDIA Triton Inference Server, also announced last week, LLM inference workloads can now scale across multiple GPUs and nodes with real-time performance. These models demand more memory than a single GPU, or even a large server with several GPUs, can provide, and inference must run fast to be useful in applications.
With Triton Inference Server, Megatron 530B can run on two NVIDIA DGX systems, cutting processing time from more than a minute on a CPU server to half a second and making it possible to deploy LLMs for real-time applications.
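For a sense of what serving a model through Triton looks like from the client side, here is a minimal sketch using the tritonclient Python package. The server address, model name, tensor names, and shapes below are hypothetical placeholders, not the interface of an actual Megatron 530B deployment.

```python
import numpy as np
import tritonclient.http as httpclient  # pip install tritonclient[http]

# Connect to a running Triton server; 8000 is Triton's default HTTP port.
client = httpclient.InferenceServerClient(url="localhost:8000")

# Hypothetical input: a batch of token IDs for a text-generation model.
token_ids = np.zeros((1, 32), dtype=np.int32)
infer_input = httpclient.InferInput("input_ids", list(token_ids.shape), "INT32")
infer_input.set_data_from_numpy(token_ids)

# Request a (hypothetical) output tensor and run inference.
requested = httpclient.InferRequestedOutput("output_ids")
result = client.infer(model_name="my_llm", inputs=[infer_input],
                      outputs=[requested])
print(result.as_numpy("output_ids").shape)
```

Triton handles batching and scheduling behind this same client interface, and with the new release can also spread a single model’s execution across GPUs and nodes.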
Availability
- Enterprises can experience developing and deploying large language models at no charge in curated labs with NVIDIA LaunchPad, also announced last week.
- Organizations can apply to join the early access program for the NVIDIA NeMo Megatron accelerated framework for training large language models.
- NVIDIA Triton is available from the NVIDIA NGC catalog, a hub for GPU-optimized AI software that includes frameworks, toolkits, pretrained models and Jupyter Notebooks, and as open source code from the Triton GitHub repository.
- Triton is also included in the NVIDIA AI Enterprise software suite, which is optimized, certified and supported by NVIDIA. Enterprises can use the software suite to run language model inference on mainstream accelerated servers in on-prem data centers and private clouds.
- NVIDIA DGX SuperPOD and NVIDIA DGX systems are available from NVIDIA’s global resellers, which can provide pricing to qualified customers upon request.
The Launch of NVIDIA Quantum-2
NVIDIA has also introduced NVIDIA Quantum-2, the next iteration of its InfiniBand networking infrastructure, which provides cloud computing providers and supercomputing centers with exceptional performance, broad accessibility, and solid security.
According to the company, NVIDIA Quantum-2 is the most advanced end-to-end networking platform ever built, consisting of the NVIDIA Quantum-2 switch, the ConnectX-7 network adapter, the BlueField-3 data processing unit (DPU), and all the software that supports the new architecture.
NVIDIA Quantum-2 arrives at a time when supercomputing centers are gradually opening up their facilities to growing numbers of users, many of them from outside the organizations that run those centers. At the same time, the world’s cloud service providers are beginning to offer their millions of customers more supercomputing capabilities.
NVIDIA Quantum-2 comes with the key features that demanding workloads in either arena require: high performance, with 400 gigabits per second of throughput, and strong multi-tenancy to support many users, enabled by cloud-native technology.
“The requirements of today’s supercomputing centers and public clouds are converging,” said Gilad Shainer, senior vice president of Networking at NVIDIA. “They must provide the greatest performance possible for next-generation HPC, AI and data analytics challenges, while also securely isolating workloads and responding to varying demands of user traffic. This vision of the modern data center is now real with NVIDIA Quantum-2 InfiniBand.”
Quantum-2 InfiniBand Switch
The new Quantum-2 InfiniBand switch lies at the heart of the Quantum-2 platform. Built on 7-nanometer silicon, it packs 57 billion transistors, slightly more than the 54 billion of the NVIDIA A100 GPU.
It features 64 ports of 400Gbps or 128 ports of 200Gbps, and will be available in a range of switch systems scaling up to 2,048 ports of 400Gbps (or 4,096 ports of 200Gbps), more than 5x the switching capacity of the previous generation, Quantum-1.
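As a quick back-of-the-envelope check on that 5x figure, the sketch below compares aggregate capacity; the assumption that the largest previous-generation Quantum-1 switch system topped out at 800 ports of 200Gbps is ours, not stated in the article.

```python
# Aggregate one-direction switching capacity in terabits per second.
quantum2_tbps = 2_048 * 400 / 1_000   # 819.2 Tb/s (figures from the article)
quantum1_tbps = 800 * 200 / 1_000     # 160.0 Tb/s (assumed Quantum-1 maximum)
print(f"{quantum2_tbps / quantum1_tbps:.1f}x")  # 5.1x, consistent with "more than 5x"
```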
The combined networking speed, switching capacity, and scalability would be ideal for building the next generation of giant HPC systems.
Atos, DataDirect Networks (DDN), Dell Technologies, Excelero, GIGABYTE, HPE, IBM, Inspur, Lenovo, NEC, Penguin Computing, QCT, Supermicro, VAST Data, and WekaIO are among the top infrastructure and system manufacturers offering the NVIDIA Quantum-2 switch.
Quantum-2, ConnectX-7 and BlueField-3
The NVIDIA Quantum-2 platform offers two options for networking endpoints: the NVIDIA ConnectX-7 NIC and the NVIDIA BlueField-3 InfiniBand DPU.
With 8 billion transistors in a 7-nanometer architecture, ConnectX-7 delivers twice the data throughput of the NVIDIA ConnectX-6, the world’s current leading HPC networking chip. It also improves performance for RDMA, GPUDirect Storage, GPUDirect RDMA, and In-Network Computing. ConnectX-7 is expected to sample in January.
With 22 billion transistors in a 7-nanometer architecture, the BlueField-3 InfiniBand DPU provides sixteen 64-bit Arm CPU cores to offload and isolate the data center infrastructure stack. BlueField-3 is expected to sample in May.