Network Innovation in the AI Era: A Comprehensive Upgrade from Communication Media to Data Center Clusters


The Importance of Networks in the AI Era and Directions for Innovation

The arrival of the large-model era has made the network a key component of AI infrastructure. As the gap between model size and the computing power of a single accelerator card widens, multi-server clusters have become the main way to train models, and this is the basis for the network's elevated status in the AI era. In the past, networks were used mainly for general data transmission. Today they are increasingly used to synchronize model parameters between GPUs, which places higher demands on network density and capacity.
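To make "synchronizing model parameters between GPUs" concrete, here is a minimal sketch of ring all-reduce, one widely used scheme for summing gradients across cards in data-parallel training. It is a single-process simulation, not a real collective-communication library: the function name `ring_allreduce` and the list-of-lists buffers are illustrative assumptions. It also counts how many elements each worker sends, which is 2(n-1)/n times the buffer length and so approaches twice the buffer size as the worker count n grows. This is one reason synchronization traffic scales with model size.

```python
from typing import List, Tuple


def ring_allreduce(buffers: List[List[float]]) -> Tuple[List[List[float]], int]:
    """Sum-all-reduce `buffers` (one list per worker) using the ring algorithm.

    Returns the per-worker results and the number of elements each worker sent.
    Sketch assumption: all buffers share a length divisible by the worker count.
    """
    n = len(buffers)
    length = len(buffers[0])
    if any(len(b) != length for b in buffers) or length % n:
        raise ValueError("buffers must share a length divisible by the worker count")
    size = length // n
    data = [list(b) for b in buffers]  # work on copies; inputs stay untouched
    sent_per_worker = 0

    def span(c: int) -> range:
        """Indices covered by chunk c."""
        return range(c * size, (c + 1) * size)

    # Phase 1, reduce-scatter: at step s, worker r sends chunk (r - s) % n to
    # worker (r + 1) % n, which adds it into its own copy of that chunk.
    for s in range(n - 1):
        msgs = [(r, (r - s) % n) for r in range(n)]
        # Snapshot every payload first so all sends in a step happen "at once".
        payloads = [[data[r][j] for j in span(c)] for r, c in msgs]
        for (r, c), payload in zip(msgs, payloads):
            dst = (r + 1) % n
            for j, v in zip(span(c), payload):
                data[dst][j] += v
        sent_per_worker += size

    # Phase 2, all-gather: worker r now holds the fully reduced chunk (r + 1) % n.
    # At step s it forwards chunk (r + 1 - s) % n, and the receiver overwrites.
    for s in range(n - 1):
        msgs = [(r, (r + 1 - s) % n) for r in range(n)]
        payloads = [[data[r][j] for j in span(c)] for r, c in msgs]
        for (r, c), payload in zip(msgs, payloads):
            dst = (r + 1) % n
            for j, v in zip(span(c), payload):
                data[dst][j] = v
        sent_per_worker += size

    return data, sent_per_worker


if __name__ == "__main__":
    # Four hypothetical workers, each holding an 8-element gradient buffer.
    grads = [[float(10 * w + j) for j in range(8)] for w in range(4)]
    result, sent = ring_allreduce(grads)
    print(result[0], "elements sent per worker:", sent)
```

Each of the 2(n-1) steps moves only 1/n of the buffer over each link, so all links stay busy simultaneously; the trade-off is a step count that grows with n, which is why collective libraries also offer tree-based and hierarchical variants.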

Demand on the network comes mainly from three sources:

  1. Growing model size. Larger models take longer to train, so computational efficiency must improve to keep training times manageable. Because the performance of a single device can only improve so far, overall computing power can be increased only by adding devices and improving parallel efficiency.

  2. Complex multi-card synchronization. In large-model training, the cards must synchronize with one another (for example, by exchanging gradients) after each computation step, which places higher demands on network transmission and switching.

  3. High cost of failures. Training a large model can take months, and an interruption can cause significant losses. Because a failure at any point in the network can interrupt training, the requirements for network stability are extremely high.
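The third point can be quantified with a back-of-the-envelope model. The sketch below assumes that each network component (a link, an optical module, a switch port) fails independently with an exponentially distributed lifetime and a given MTBF (mean time between failures); both that model and the example numbers are hypothetical, not measured data. Under this assumption the cluster's combined failure rate grows linearly with the number of components, so the probability of an uninterrupted multi-month run shrinks exponentially.

```python
import math


def expected_interruptions(num_components: int, mtbf_hours: float, job_hours: float) -> float:
    """Expected number of component failures during the job.

    Assumes independent components with exponential lifetimes, so failures
    form a Poisson process with rate num_components / mtbf_hours.
    """
    return num_components * job_hours / mtbf_hours


def job_survival_probability(num_components: int, mtbf_hours: float, job_hours: float) -> float:
    """Probability that no component fails during the job (same assumptions)."""
    return math.exp(-expected_interruptions(num_components, mtbf_hours, job_hours))


if __name__ == "__main__":
    # Hypothetical example values, not measurements.
    links, mtbf, job = 10_000, 1_000_000.0, 720.0  # 720 hours = 30 days
    print(f"expected interruptions: {expected_interruptions(links, mtbf, job):.1f}")
    print(f"P(no interruption): {job_survival_probability(links, mtbf, job):.4%}")
    print(f"mean hours between interruptions: {mtbf / links:.0f}")
```

With a hypothetical 10,000 links, each with an MTBF of 1,000,000 hours, a 30-day job would expect about 7.2 interruptions, roughly one every 100 hours, even though each individual link is very reliable. This illustrates why the stability requirements described above are so strict.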

In response to these demands, network innovation mainly focuses on the following directions:

  1. Iteration of communication media. The three major media (optical, copper, and silicon) each have their own advantages and compete in different scenarios. Optical modules are pushing toward higher speeds while lowering costs through approaches such as LPO (linear-drive pluggable optics) and silicon photonics. Copper cables dominate in-rack connections thanks to their cost-performance advantage. New technologies such as chiplets and wafer-scale integration are exploring the limits of silicon-based interconnection.

  2. Competition among network protocols. Intra-node communication protocols, such as NVLink and Infinity Fabric, are tightly bound to their vendors' GPUs. Between nodes, the main competition is between InfiniBand (IB) and Ethernet.

  3. Changes in network architecture. The current mainstream leaf-spine architecture shows limitations in ultra-large clusters, and new architectures such as Dragonfly and Rail-only are expected to become the evolutionary direction for the next generation of ultra-large clusters.
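The scale limit of leaf-spine can be illustrated with the standard port-count arithmetic for idealized, non-blocking Clos topologies built from identical k-port switches. This sketch assumes one switch port per endpoint and no oversubscription, so real deployments will differ; the function names are illustrative, not taken from any tool.

```python
def two_tier_leaf_spine_hosts(k: int) -> int:
    """Max endpoints of a non-blocking two-tier leaf-spine built from k-port switches."""
    if k < 2 or k % 2:
        raise ValueError("k must be an even port count >= 2")
    # Each leaf splits its ports evenly: k/2 down to hosts, k/2 up (one per spine).
    # Each spine has k ports, so it can connect at most k leaves.
    leaves = k
    return leaves * (k // 2)  # = k^2 / 2


def three_tier_fat_tree_hosts(k: int) -> int:
    """Max endpoints of a k-ary three-tier fat tree built from k-port switches."""
    if k < 2 or k % 2:
        raise ValueError("k must be an even port count >= 2")
    # k pods, each with k/2 edge switches that serve k/2 hosts apiece.
    return k * (k // 2) * (k // 2)  # = k^3 / 4


if __name__ == "__main__":
    for k in (32, 64, 128):
        print(f"k={k}: two-tier {two_tier_leaf_spine_hosts(k):,}, "
              f"three-tier {three_tier_fat_tree_hosts(k):,}")
```

For example, with 64-port switches a two-tier leaf-spine tops out at 2,048 endpoints, while a three-tier fat tree reaches 65,536 at the cost of an extra switching tier (more hops, more switches, more optics). This growth in tiers and cost at extreme scale is part of what motivates alternatives such as Dragonfly and Rail-only.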

  4. Switch innovation. Optical switches are gaining attention for their low latency and low power consumption, while electrical switches continue to innovate at the chip level.

  5. Data center cluster innovation. As individual data centers approach their capacity limits, efficient interconnection between data centers has become a new research direction.

Overall, network innovation in the AI era is evolving along three lines: cost reduction, openness, and scalability. Because communication systems are complex, systems-level engineering projects, they require continuous innovation at every stage. Investors should watch core component suppliers while also tracking the industrial opportunities created by new technologies.

This page may contain third-party content, which is provided for information purposes only (not representations/warranties) and should not be considered as an endorsement of its views by Gate, nor as financial or professional advice. See Disclaimer for details.