One of Nvidia’s advantages in the data center space is that it not only offers cutting-edge GPUs for AI and HPC, but also uses its own hardware and software to scale the number of processors across a data center effectively. How can you compete with Nvidia if your GPUs are slower and your software stack is not as pervasive as Nvidia’s CUDA? One answer is to improve your own scale-out capabilities. That’s exactly what Chinese GPU maker Moore Threads has done, according to a report in the South China Morning Post.
Moore Threads has upgraded its KUAE data center servers for AI, enabling the connection of up to 10,000 GPUs in a single cluster. Each KUAE server integrates eight MTT S4000 GPUs interconnected using the company’s proprietary MTLink technology, and the servers are designed specifically for training and running large language models (LLMs). These GPUs are based on the MUSA architecture and feature 128 tensor cores and 48 GB of GDDR6 memory with 768 GB/s of bandwidth. A 10,000-GPU cluster would therefore have 1,280,000 tensor cores, although its actual performance is unknown because performance scaling depends on a variety of factors.
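The cluster-level totals follow directly from the per-GPU figures quoted above. The short Python sketch below reproduces that arithmetic; the constant and function names are illustrative, the memory total uses decimal units (1 TB = 1,000 GB), and the results are simple sums of paper specifications, not a performance estimate.

```python
# Per-GPU and per-server figures as quoted in the article.
GPUS_PER_SERVER = 8          # MTT S4000 GPUs in one KUAE server
TENSOR_CORES_PER_GPU = 128
MEMORY_GB_PER_GPU = 48       # GDDR6


def cluster_totals(num_gpus: int) -> dict:
    """Sum up paper specifications for a cluster of `num_gpus` GPUs."""
    # Ceiling division: a partially filled server still counts as a server.
    servers = -(-num_gpus // GPUS_PER_SERVER)
    return {
        "servers": servers,
        "tensor_cores": num_gpus * TENSOR_CORES_PER_GPU,
        "memory_tb": num_gpus * MEMORY_GB_PER_GPU / 1000,  # decimal TB
    }


if __name__ == "__main__":
    totals = cluster_totals(10_000)
    for name, value in totals.items():
        print(f"{name}: {value:,}")
```

For the full 10,000-GPU configuration this works out to 1,250 eight-GPU servers, 1,280,000 tensor cores, and 480 TB of GDDR6 in aggregate. None of these totals says anything about how efficiently the cluster scales in practice.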
The move highlights Moore Threads’ efforts to expand AI capabilities in data centers despite its presence on the U.S. Department of Commerce Entity List. Moore Threads’ products still lag behind Nvidia’s GPUs in performance. Even Nvidia’s A100 80GB GPU, introduced in 2020, offers far higher compute throughput than the MTT S4000 (624 INT8 TOPS dense, or 1,248 INT8 TOPS with sparsity, vs. 200 INT8 TOPS). Still, some argue the MTT S4000 is competitive against unnamed Nvidia GPUs.
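To put the quoted INT8 numbers in proportion, the sketch below computes the peak-throughput ratio between the two parts. It treats the two A100 figures as Nvidia’s dense and structured-sparsity ratings and takes the 200 TOPS figure for the MTT S4000 at face value. The names are illustrative, and peak TOPS ratios are only a rough proxy for real training or inference speed.

```python
# Peak INT8 throughput figures quoted in the article, in TOPS.
A100_INT8_TOPS_DENSE = 624
A100_INT8_TOPS_SPARSE = 1248   # with structured sparsity
S4000_INT8_TOPS = 200


def peak_ratio(reference_tops: float, other_tops: float) -> float:
    """Return how many times higher the reference peak is than the other."""
    return reference_tops / other_tops


if __name__ == "__main__":
    dense = peak_ratio(A100_INT8_TOPS_DENSE, S4000_INT8_TOPS)
    sparse = peak_ratio(A100_INT8_TOPS_SPARSE, S4000_INT8_TOPS)
    print(f"A100 vs. S4000, dense:  {dense:.2f}x")
    print(f"A100 vs. S4000, sparse: {sparse:.2f}x")
```

On these paper figures, a single A100 has roughly 3.1 times the peak INT8 throughput of an MTT S4000, or about 6.2 times when sparsity is counted.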
Founded in 2020 by a former Nvidia China executive, Moore Threads has been blacklisted by the Biden administration and therefore lacks access to cutting-edge process technology because of U.S. export restrictions. Nevertheless, the company is developing new GPUs for gaming (though these graphics cards aren’t on our list of the best graphics cards) and making headway in AI.
To date, Moore Threads has formed strategic partnerships with major state-owned telecommunications operators such as China Mobile and China Unicom, as well as with China Energy Engineering Corporation and Guilin Huajue Big Data Technology. The partnerships aim to develop three new computing cluster projects and further improve China’s AI capabilities.
Moore Threads recently completed a funding round, raising 2.5 billion yuan (approximately US$343.7 million). This influx of capital is expected to support the company’s ambitious expansion plans and technological advancements. However, without access to the advanced process technologies offered by TSMC, Intel Foundry, and Samsung Foundry, the company faces significant challenges in developing its next-generation GPUs.