Introduction
The HPC AI Converged Intelligent Computing Center at our school combines high computational capacity with stable operation. It scales to multi-node, multi-GPU configurations and supports heterogeneous resource management, placing it among the leading intelligent computing centers in the industry. The center currently comprises four intelligent computing clusters: Phase I, Phase II, Phase III (ACD), and EDA.
The HPC Phase I Intelligent Computing Cluster began official operation in April 2022. It provides 0.246 Pflops@FP64 of high-performance computing power and 5.280 Pflops@FP16 of intelligent computing power. The cluster includes 12 Intel CPU compute nodes and 4 NVIDIA A30 GPU compute nodes, connected by a 100Gb/s InfiniBand high-performance network, with a parallel file system storage capacity of 701GB.
The HPC Phase II Intelligent Computing Cluster began official operation in September 2023 and comprises two major parts: the HPC AI platform and the domestic AI platform. The HPC AI platform provides 6.454 Pflops@FP64 of high-performance computing power and 180.204 Pflops@FP16 of intelligent computing power. It includes 146 Intel CPU compute nodes, 20 AMD CPU compute nodes, 65 NVIDIA A800 GPU compute nodes, and 15 NVIDIA A40 GPU compute nodes. The domestic AI platform provides 19.040 Pflops@FP16 of intelligent computing power from 8 Atlas 300T Pro training nodes and 2 Atlas 300V Pro inference nodes. The Phase II cluster uses a 200Gb/s InfiniBand high-performance network, with an accompanying storage capacity of 2.3PB, containing 309TB SSD and 3.9PB HDD.
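The FP16 figure for the HPC AI platform can be reproduced with simple arithmetic. The sketch below is a plausibility check only, not an official derivation: it assumes 8 GPUs per node and per-GPU dense FP16 Tensor Core peaks of 312 TFLOPS (A800) and 149.7 TFLOPS (A40), none of which are stated in the text above. The names `gpu_partitions` and `aggregate_pflops` are hypothetical.

```python
# Plausibility check for the Phase II HPC AI platform FP16 figure.
# ASSUMPTIONS (not stated in the text): 8 GPUs per node, and per-GPU
# dense FP16 Tensor Core peaks taken from vendor datasheets.
GPUS_PER_NODE = 8  # assumed

gpu_partitions = {
    "A800": {"nodes": 65, "tflops_fp16": 312.0},  # node count from text, peak assumed
    "A40": {"nodes": 15, "tflops_fp16": 149.7},   # node count from text, peak assumed
}


def aggregate_pflops(partitions, gpus_per_node=GPUS_PER_NODE):
    """Sum nodes * GPUs per node * per-GPU peak, converted from TFLOPS to PFLOPS."""
    total_tflops = sum(
        p["nodes"] * gpus_per_node * p["tflops_fp16"] for p in partitions.values()
    )
    return total_tflops / 1000.0


phase2_fp16 = aggregate_pflops(gpu_partitions)
print(f"{phase2_fp16:.3f} Pflops@FP16")
```

Under these assumptions the result is 180.204 Pflops@FP16, which matches the figure quoted for the HPC AI platform.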
The HPC Phase III (ACD) Intelligent Computing Cluster is planned to enter operation by 2025. The cluster will provide 1051.344 Pflops@FP16 of intelligent computing power from 68 ACD (Advanced Computing Devices) GPU compute nodes. Its high-performance network will use 400Gb/s RoCE v2, with a storage capacity of 17PB.
The HPC EDA Intelligent Computing Cluster began official operation in July 2023. It provides 0.267 Pflops@FP64 of high-performance computing power and 5.280 Pflops@FP16 of intelligent computing power. The cluster includes 20 Intel CPU compute nodes and 4 NVIDIA A30 GPU compute nodes, connected by a 200Gb/s InfiniBand high-performance network, with a parallel file system storage capacity of 1.2PB.
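To see the center's capacity at a glance, the per-cluster figures quoted above can be tallied. The following is a minimal sketch that uses only the numbers stated in this introduction. The dictionary layout and the names `clusters` and `total` are illustrative, and `None` marks a precision for which the text gives no figure.

```python
# Peak figures copied from the cluster descriptions above, in Pflops.
# None means the text gives no figure for that precision.
clusters = {
    "Phase I": {"fp64": 0.246, "fp16": 5.280},
    "Phase II (HPC AI platform)": {"fp64": 6.454, "fp16": 180.204},
    "Phase II (domestic AI platform)": {"fp64": None, "fp16": 19.040},
    "Phase III (ACD, planned)": {"fp64": None, "fp16": 1051.344},
    "EDA": {"fp64": 0.267, "fp16": 5.280},
}


def total(precision):
    """Sum the stated peak for one precision, skipping clusters without a figure."""
    return sum(
        c[precision] for c in clusters.values() if c[precision] is not None
    )


total_fp64 = total("fp64")
total_fp16 = total("fp16")
print(f"Total: {total_fp64:.3f} Pflops@FP64, {total_fp16:.3f} Pflops@FP16")
```

This gives 6.967 Pflops@FP64 and 1261.148 Pflops@FP16. The FP16 total includes the planned Phase III cluster, which accounts for most of it.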