Introduction
Our Smart Computing Centre offers powerful computing capacity and excellent stability. It is highly scalable, supports multi-node and multi-card jobs, and manages heterogeneous resources, making it a leading Smart Computing Centre in the industry. Its core computing power comes from two major parts, the general HPC AI platform and the domestic AI platform, and comprises the following:
The second phase of the general HPC AI high-performance computing platform, officially commissioned in September 2023, has a theoretical peak of 6.454 PFLOPS at FP64 precision. It comprises 14,656 Intel CPU cores, 2,560 AMD CPU cores, 520 NVIDIA A800 GPU cards, and 120 NVIDIA A40 GPU cards, with 2.3 PB of supporting memory. Data storage consists of 309 TB of SSD and 3.9 PB of HDD.
The ARM AI platform began official trial operation in September 2023. Its total AI computing power at FP16 precision is 19.04 PFLOPS (1 PFLOPS is 10^15 floating-point operations per second). The training cluster has 64 Atlas 300T Pro cards and the inference cluster has 16 Atlas 300V Pro cards. The cluster's high-performance network uses 100 Gb/s RoCE, the high-speed Ethernet network runs at 25 Gb/s, and the parallel file system has a storage capacity of 514 TB.
The EDA HPC platform, officially commissioned in July 2023, has a theoretical peak of 0.267 PFLOPS at FP64 precision. Its computing power comes from 1,504 Intel CPU cores and 32 NVIDIA A30 GPU cards, with 34.8 PB of supporting memory. The cluster's high-performance network uses 200 Gb/s InfiniBand, the high-speed Ethernet network runs at 25 Gb/s, and the parallel file system has a storage capacity of 1.2 PB.
The first phase of the HPC AI high-performance computing platform, put into operation in April 2022, has a total FP64 computing power of 0.246 PFLOPS. Its 36 computing nodes use dual-socket Intel Xeon Gold 6348 CPUs (28 cores each, 2.6 GHz base frequency), and its GPUs are NVIDIA A30, 32 cards in total. The cluster's high-performance network uses Mellanox HDR 100G InfiniBand, the high-speed Ethernet network runs at 10 Gb/s, and the BeeGFS parallel file system has a storage capacity of 701 GB.
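To make the units concrete, the short Python sketch below tabulates the FP64 figures quoted above and converts them from PFLOPS to floating-point operations per second (1 PFLOPS is 10^15). The variable and function names are illustrative only and do not come from any centre tooling. The ARM AI platform is left out of the sum because its 19.04 PFLOPS figure is quoted at FP16 precision and is not directly comparable. The core count for the first phase assumes that each of the 36 computing nodes carries two 28-core CPUs.

```python
# Illustrative sketch: all figures are copied from the platform descriptions
# above; the names used here are hypothetical.

PFLOPS = 10**15  # 1 PFLOPS = 10^15 floating-point operations per second

# Stated FP64 theoretical peak per platform, in PFLOPS.
fp64_pflops = {
    "HPC AI phase 2": 6.454,
    "EDA HPC": 0.267,
    "HPC AI phase 1": 0.246,
}


def to_flops(pflops: float) -> float:
    """Convert a figure in PFLOPS to floating-point operations per second."""
    return pflops * PFLOPS


# Combined FP64 peak of the three platforms, rounded to the precision
# of the quoted figures.
total_fp64_pflops = round(sum(fp64_pflops.values()), 3)

# Assumption: every one of the 36 phase 1 nodes has two 28-core CPUs.
phase1_cpu_cores = 36 * 2 * 28

if __name__ == "__main__":
    for name, value in fp64_pflops.items():
        print(f"{name}: {value} PFLOPS = {to_flops(value):.3e} FLOP/s")
    print(f"Combined FP64 peak: {total_fp64_pflops} PFLOPS")
    print(f"Phase 1 CPU cores (assumed layout): {phase1_cpu_cores}")
```

Under these assumptions, the three FP64 platforms add up to 6.967 PFLOPS and the first phase provides 2,016 CPU cores. Theoretical peaks are upper bounds, so measured application performance will be lower.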