Nabka News

Alibaba Cloud ditches Nvidia interconnects in favor of Ethernet — the tech giant will use its own high-performance network to connect 15,000 GPUs in its data centers

By i2wtc, June 29, 2024


Ennan Zhai, an engineer and researcher at Alibaba Cloud, shared a research paper on GitHub, revealing the cloud provider’s design of its data center for LLM training. Titled “Alibaba HPN: Data Center Network for Large-Scale Language Model Training,” the PDF document outlines how Alibaba will use Ethernet to enable 15,000 GPUs to communicate with each other.

Typical cloud computing generates consistent but small data flows at speeds below 10 Gbps. LLM training, on the other hand, generates periodic data bursts that can reach up to 400 Gbps. According to the paper, “This characteristic of LLM training makes Equal-Cost Multi-Path (ECMP), a load balancing scheme commonly used in traditional data centers, prone to hash bias, resulting in uneven distribution of traffic and other issues.”
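
The imbalance described above can be illustrated with a small simulation. The sketch below is not taken from the paper: the SHA-256 hash, the flow tuples, the eight-path count, and the traffic mix are all assumptions chosen for illustration. It hashes each flow's 5-tuple onto one of eight equal-cost paths and compares many small flows against four 400 Gbps bursts that carry the same total load.

```python
import hashlib

def ecmp_path(five_tuple, num_paths):
    """Pick a path index by hashing the flow's 5-tuple (conceptual ECMP)."""
    key = "|".join(str(field) for field in five_tuple).encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % num_paths

def path_loads(flows, num_paths):
    """Sum the offered load in Gbps that lands on each path."""
    loads = [0.0] * num_paths
    for five_tuple, gbps in flows:
        loads[ecmp_path(five_tuple, num_paths)] += gbps
    return loads

def imbalance(loads):
    """Busiest path divided by the mean path load (1.0 is perfectly even)."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean

NUM_PATHS = 8  # assumed number of equal-cost paths

# Hypothetical traffic: 3,200 small flows at 0.5 Gbps each (1,600 Gbps total).
mice = [(("10.0.0.1", "10.0.1.1", 10000 + i, 443, "tcp"), 0.5)
        for i in range(3200)]

# Hypothetical traffic: 4 bursts at 400 Gbps each (also 1,600 Gbps total).
elephants = [((f"10.0.0.{i}", f"10.0.1.{i}", 4791, 4791, "udp"), 400.0)
             for i in range(1, 5)]

mice_loads = path_loads(mice, NUM_PATHS)
elephant_loads = path_loads(elephants, NUM_PATHS)

print("small flows imbalance:", round(imbalance(mice_loads), 2))
print("elephant flows imbalance:", round(imbalance(elephant_loads), 2))
```

With only four flows for eight paths, at least four paths carry nothing and the busiest path carries at least twice the mean load, however good the hash is. Any collision between two bursts makes the skew worse.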

To get around this, Zhai and his team developed the High Performance Network (HPN), which uses a “two-tier, dual-plane architecture” that reduces the number of ECMP occurrences while allowing the system to “precisely select network paths that can preserve elephant flows.” HPN also uses dual top-of-rack (ToR) switches that back each other up. ToR switches are the most common single point of failure in LLM training, which requires GPUs to stay synchronized to complete each iteration.

8 GPUs per host, 1,875 hosts per datacenter

Alibaba Cloud organizes its data centers into hosts of eight GPUs each. Each GPU has a network interface card (NIC) with two ports, and each GPU-NIC pairing is called a “rail.” The host also has an additional NIC to connect to the back-end network. Each rail is connected to two different ToR switches, so that the failure of one switch does not affect the entire host.
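
The host layout can be modeled in a few lines. This is a minimal sketch under stated assumptions, not Alibaba's implementation: the ToR names are hypothetical placeholders, and giving each rail its own ToR pair is an assumption, since the article only says that each rail reaches two different switches. The sketch derives the host count from the 15,000-GPU figure and checks what a switch failure does to a host.

```python
from dataclasses import dataclass

GPUS_PER_HOST = 8
TOTAL_GPUS = 15_000

@dataclass(frozen=True)
class Rail:
    gpu: int      # GPU index within the host
    tors: tuple   # the two ToR switches reached by the NIC's two ports

def build_host():
    """Return the eight rails of one host.

    ToR names are hypothetical, and one ToR pair per rail is an assumption.
    """
    return [Rail(gpu=g, tors=(f"tor-{g}-a", f"tor-{g}-b"))
            for g in range(GPUS_PER_HOST)]

def connected_rails(rails, failed_tors):
    """List GPU indices whose rail still reaches at least one working ToR."""
    return [r.gpu for r in rails
            if any(t not in failed_tors for t in r.tors)]

hosts = TOTAL_GPUS // GPUS_PER_HOST
rails = build_host()

# One ToR down: the affected rail falls back to its second port.
after_one_failure = connected_rails(rails, {"tor-3-a"})

# Both ToRs of the same rail down: only that rail is cut off.
after_pair_failure = connected_rails(rails, {"tor-3-a", "tor-3-b"})

print(hosts, len(after_one_failure), len(after_pair_failure))
```

Dividing 15,000 GPUs by eight gives the 1,875 hosts in the subheading. In this model a single ToR failure leaves all eight rails connected, which is the point of the dual-ToR design.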

Although Alibaba Cloud has dropped NVLink for host-to-host communication, it still uses Nvidia’s proprietary technology for intra-host networking, because communication between GPUs within a host requires more bandwidth. Rail-to-rail communication is much slower, so the “dedicated 400Gbps RDMA network throughput, for a total of 3.2Tbps of bandwidth” per host is more than enough to maximize the bandwidth of PCIe Gen5x16 graphics cards.
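
The bandwidth figures can be checked with simple arithmetic. The sketch below assumes the quoted 400 Gbps is per GPU, which is what makes the 3.2 Tbps host total add up. The PCIe numbers are not from the article: they are the standard PCIe 5.0 link rate of 32 GT/s per lane with 128b/130b encoding, and they ignore protocol overhead.

```python
GPUS_PER_HOST = 8
NIC_GBPS_PER_GPU = 400   # from the article, assumed to be per GPU

# PCIe 5.0 figures (not from the article): 32 GT/s per lane, 128b/130b encoding.
PCIE5_GT_PER_LANE = 32
PCIE_LANES = 16
PCIE_ENCODING = 128 / 130

# Aggregate RDMA bandwidth of one host.
host_gbps = GPUS_PER_HOST * NIC_GBPS_PER_GPU
host_tbps = host_gbps / 1000

# Raw one-direction ceiling of a Gen5 x16 link after line encoding.
pcie_gbps = PCIE5_GT_PER_LANE * PCIE_LANES * PCIE_ENCODING

print(f"host total: {host_tbps} Tbps")
print(f"PCIe Gen5 x16 ceiling: {pcie_gbps:.1f} Gbps per direction")
print(f"NIC share of that ceiling: {NIC_GBPS_PER_GPU / pcie_gbps:.0%}")
```

Under these assumptions, a 400 Gbps NIC sits within what a Gen5 x16 link can move in one direction, so the network rate and the PCIe link are roughly matched.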

Alibaba Cloud also uses 51.2 Tbps single-chip Ethernet ToR switches, because multi-chip solutions are more prone to instability and have a failure rate four times higher than single-chip switches. However, these switches run so hot that off-the-shelf heat sinks cannot keep them from shutting down due to overheating. The company therefore designed its own solution: a vapor chamber heat sink with an extra column in the middle to carry thermal energy away more efficiently.

Ennan Zhai and his team will present their findings at the SIGCOMM (Special Interest Group on Data Communication) conference in Sydney, Australia this August. Many companies, including AMD, Intel, Google, and Microsoft, are likely to take an interest in the project, mainly because they have joined forces to create Ultra Accelerator Link, an open-standard interconnect meant to rival NVLink. Of particular interest to them is that Alibaba Cloud has been using HPN for over eight months, so the technology is already tried and tested.


However, HPN still has some drawbacks, the biggest of which is its complex cabling structure. With nine NICs on each host, and each NIC connected to two different ToR switches, it is easy to mix up which jack goes to which port. That said, the technology is probably cheaper than NVLink, so any institution setting up a data center can save a lot on setup costs (and also avoid Nvidia technology, especially if it is one of the companies sanctioned by the US in the ongoing chip war with China).
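
The scale of the cabling problem, and one way to catch wiring mistakes, can be sketched as follows. This is an illustration under the article's figures (nine NICs per host, two ports per NIC, 1,875 hosts), not a tool described in the paper. The `miswired_nics` helper, the cabling map format, and the ToR names are all hypothetical.

```python
NICS_PER_HOST = 9
PORTS_PER_NIC = 2
HOSTS = 1_875

def links_per_host():
    """Number of cables leaving one host toward ToR switches."""
    return NICS_PER_HOST * PORTS_PER_NIC

def miswired_nics(cabling):
    """Return NICs whose two ports do not land on two distinct ToR switches.

    `cabling` maps (nic_index, port_index) to the ToR name recorded on site.
    A missing entry counts as a wiring error.
    """
    bad = []
    for nic in range(NICS_PER_HOST):
        tors = {cabling.get((nic, port)) for port in range(PORTS_PER_NIC)}
        if None in tors or len(tors) != PORTS_PER_NIC:
            bad.append(nic)
    return bad

# Hypothetical correct plan: port 0 goes to ToR "a", port 1 to ToR "b".
plan = {(nic, port): f"tor-{nic}-{'ab'[port]}"
        for nic in range(NICS_PER_HOST)
        for port in range(PORTS_PER_NIC)}

# Simulated mistake: both ports of NIC 3 plugged into the same switch.
as_built = dict(plan)
as_built[(3, 1)] = "tor-3-a"

total_links = links_per_host() * HOSTS
print(links_per_host(), total_links, miswired_nics(plan), miswired_nics(as_built))
```

Under these figures each host has 18 host-to-ToR links, or 33,750 across 1,875 hosts, which gives a sense of why a mix-up is easy to make and worth checking automatically.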


