Nabka News
Alibaba Cloud ditches Nvidia interconnects in favor of Ethernet — the tech giant will use its own high-performance network to connect 15,000 GPUs in its data centers

By i2wtc, June 29, 2024


Ennan Zhai, an engineer and researcher at Alibaba Cloud, shared a research paper on GitHub that reveals the cloud provider’s data center network design for large language model (LLM) training. Titled “Alibaba HPN: Data Center Network for Large-Scale Language Model Training,” the PDF document outlines how Alibaba will use Ethernet to enable 15,000 GPUs to communicate with each other.

Typical cloud computing generates consistent but small data flows at speeds below 10 Gbps. LLM training, on the other hand, generates periodic data bursts that can reach up to 400 Gbps. According to the paper, “This characteristic of LLM training makes Equal-Cost Multi-Path (ECMP), a load balancing scheme commonly used in traditional data centers, prone to hash bias, resulting in uneven distribution of traffic and other issues.”
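
The hash-bias problem can be illustrated with a small simulation. This is a minimal sketch, not the paper's method: it assumes 16 equal-cost paths, uses MD5 as a stand-in for a switch's hash function (real switches use their own hardware hashes), and invents the flow tuples and per-flow rates so that both workloads total 3,200 Gbps. Eight elephant flows can occupy at most eight of the 16 paths, so the busiest link carries at least twice the average load, and any collision makes it worse.

```python
import hashlib

NUM_PATHS = 16  # assumed number of equal-cost uplinks (not from the article)


def ecmp_path(flow, num_paths=NUM_PATHS):
    """Pick a path by hashing the flow's 5-tuple, roughly as ECMP does.

    MD5 is only a stand-in for a switch's hardware hash function.
    """
    key = "|".join(map(str, flow)).encode()
    return int.from_bytes(hashlib.md5(key).digest()[:4], "big") % num_paths


def link_loads(flows, num_paths=NUM_PATHS):
    """Sum the offered load (Gbps) that lands on each path."""
    loads = [0.0] * num_paths
    for five_tuple, gbps in flows:
        loads[ecmp_path(five_tuple, num_paths)] += gbps
    return loads


def imbalance(loads):
    """Ratio of the busiest link to the mean link load (1.0 is perfect)."""
    return max(loads) / (sum(loads) / len(loads))


# Hypothetical workloads with the same 3,200 Gbps total:
# many small flows (cloud-style) versus a few 400 Gbps bursts (LLM-style).
cloud_flows = [(("10.0.0.1", "10.0.1.1", "tcp", 1024 + i, 443), 0.32)
               for i in range(10_000)]
llm_flows = [((f"10.0.0.{i}", f"10.0.1.{i}", "udp", 49152, 4791), 400.0)
             for i in range(8)]

for name, flows in (("cloud", cloud_flows), ("llm", llm_flows)):
    loads = link_loads(flows)
    print(f"{name}: busiest link = {max(loads):.1f} Gbps, "
          f"imbalance = {imbalance(loads):.2f}")
```

With thousands of small flows the hash averages out; with a handful of large ones it cannot, which is why reducing the number of ECMP decisions matters for this workload.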

To get around this, Zhai and his team developed the High Performance Network (HPN), which uses a “two-tier, dual-plane architecture” that reduces the number of ECMP occurrences while allowing the system to “precisely select network paths that can preserve elephant flows.” HPN also pairs up top-of-rack (ToR) switches so that each backs up the other. A ToR switch is the most common single point of failure in LLM training, where GPUs must synchronize to complete each iteration, so losing one switch can stall the whole job.

8 GPUs per host, 1,875 hosts per data center

Alibaba Cloud organizes its data centers into hosts of eight GPUs each. Each GPU has a network interface card (NIC) with two ports, and each GPU-NIC system is called a “rail.” The host also has an additional NIC to connect to the back-end network. Each rail is connected to two different ToR switches to ensure that the failure of one switch does not affect the entire host.
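
The dual-ToR wiring described above can be modeled in a few lines. This is a sketch under stated assumptions: the switch names (`tor-rail0-a` and so on) and the idea that each rail has its own pair of switches are hypothetical, chosen only to show that every rail keeps one working uplink when any single ToR switch fails. The ninth NIC is left out.

```python
GPUS_PER_HOST = 8  # from the article: eight GPUs, one dual-port NIC each


def wire_host():
    """Return {rail: [(nic_port, tor_name), ...]} for one host.

    The ToR naming scheme is hypothetical; the article only says that
    each rail connects to two different ToR switches.
    """
    return {
        rail: [(0, f"tor-rail{rail}-a"), (1, f"tor-rail{rail}-b")]
        for rail in range(GPUS_PER_HOST)
    }


def reachable_rails(wiring, failed_tors):
    """Rails that still have at least one uplink to a healthy ToR switch."""
    return {
        rail
        for rail, links in wiring.items()
        if any(tor not in failed_tors for _, tor in links)
    }


wiring = wire_host()
all_tors = sorted({tor for links in wiring.values() for _, tor in links})

# Fail each ToR switch in turn and count the rails that stay connected.
for tor in all_tors:
    alive = reachable_rails(wiring, {tor})
    print(f"{tor} down: {len(alive)}/{GPUS_PER_HOST} rails reachable")
```

Only the loss of both switches serving the same rail disconnects that rail, which is the failure mode the paired design is meant to make unlikely.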

Although Alibaba Cloud has dropped NVLink for host-to-host communication, it still uses Nvidia’s proprietary technology for intra-host networking, because communication between GPUs within a host requires more bandwidth. Rail-to-rail communication is much slower by comparison, so the “dedicated 400Gbps RDMA network throughput, for a total of 3.2Tbps of bandwidth” per host is enough to make full use of the PCIe Gen5 x16 connection of each graphics card.
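
The figures quoted so far fit together with simple arithmetic. The sketch below recomputes them from the article's numbers; the PCIe figures (32 GT/s per lane with 128b/130b encoding for PCIe 5.0) come from the PCIe specification rather than from the article, and the result is a one-direction line rate that ignores protocol overhead.

```python
# Figures taken from the article.
TOTAL_GPUS = 15_000
GPUS_PER_HOST = 8
NIC_GBPS_PER_GPU = 400  # dedicated RDMA throughput per GPU

hosts = TOTAL_GPUS // GPUS_PER_HOST                   # hosts needed for all GPUs
host_tbps = GPUS_PER_HOST * NIC_GBPS_PER_GPU / 1000   # network bandwidth per host

# PCIe 5.0 figures (from the PCIe spec, not the article):
# 32 GT/s per lane, 16 lanes, 128b/130b line encoding.
PCIE5_GT_PER_LANE = 32
PCIE_LANES = 16
pcie_gbps = PCIE5_GT_PER_LANE * PCIE_LANES * 128 / 130

print(f"hosts: {hosts}")
print(f"per-host network bandwidth: {host_tbps} Tbps")
print(f"PCIe Gen5 x16 line rate per direction: {pcie_gbps:.1f} Gbps")
print(f"NIC share of the PCIe link: {NIC_GBPS_PER_GPU / pcie_gbps:.0%}")
```

The 400 Gbps per GPU sits just under what a Gen5 x16 slot can carry in one direction, so a faster NIC would gain little on this link.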

Alibaba Cloud also uses 51.2 Tb/s Ethernet single-chip ToR switches because multi-chip solutions are more prone to instability and have a four times higher failure rate than single-chip switches. However, these switches run so hot that heat sinks readily available on the market are not enough to prevent them from shutting down due to overheating. So the company came up with a novel solution: creating a vapor chamber heat sink with an extra column in the middle to carry thermal energy more efficiently.

Ennan Zhai and his team will present their findings at the SIGCOMM (Special Interest Group on Data Communication) conference in Sydney, Australia this August. Many companies, including AMD, Intel, Google, and Microsoft, will be interested in this project, mainly because they are joining forces to create Ultra Accelerator Link, an open-standard interconnect meant to rival NVLink. Of particular interest to them is that Alibaba Cloud has been using HPN for over eight months, so the technology is already tried and tested.

However, HPN still has some drawbacks, the biggest of which is its complex cabling. With nine NICs on each host, and each NIC connected to two different ToR switches, it is easy to mix up which cable goes to which port. That said, the technology is probably cheaper than NVLink, so any institution setting up a data center can save a lot on setup costs (and also avoid Nvidia technology, especially if it is one of the companies sanctioned by the US in the ongoing chip war with China).
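
To give a sense of the cabling problem, the sketch below counts the links per host and shows one way a wiring mistake could be caught by comparing a cabling plan with what is actually connected. It is an illustration only: the host and switch names, the plan layout, and the idea of an "observed" map (for example, collected from link-layer neighbor discovery) are all assumptions, not something the article or paper describes.

```python
NICS_PER_HOST = 9   # from the article
PORTS_PER_NIC = 2   # each NIC connects to two different ToR switches


def expected_plan(host):
    """Hypothetical cabling plan: {(host, nic, port): tor_name}."""
    return {
        (host, nic, port): f"tor-nic{nic}-{'ab'[port]}"
        for nic in range(NICS_PER_HOST)
        for port in range(PORTS_PER_NIC)
    }


def find_miswired(plan, observed):
    """Return the cable ends whose observed switch differs from the plan."""
    return sorted(end for end, tor in plan.items() if observed.get(end) != tor)


plan = expected_plan("host-0001")
print(f"cables per host: {len(plan)}")

# Simulate a technician swapping two cables on NIC 4.
observed = dict(plan)
a, b = ("host-0001", 4, 0), ("host-0001", 4, 1)
observed[a], observed[b] = observed[b], observed[a]

for end in find_miswired(plan, observed):
    print(f"miswired: {end} expected {plan[end]}, found {observed[end]}")
```

At 18 cables per host across 1,875 hosts, that is 33,750 host-side links, so an automated check of this kind is more practical than visual inspection.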


