From AI chip interconnects to the CPO positioning battle: What are NVIDIA and Broadcom competing over? (Part 2)

With the previous article's overview of optical communication and the three data center scaling architectures in mind, it becomes clearer why the market is watching not only the moves of rival AMD, but also the competition between AI chip leader NVIDIA and global communications chip giant Broadcom.

In fact, competition in the AI industry is expanding beyond chips to system-level solutions.

The first point of overlap between Broadcom and NVIDIA is custom AI chips (ASICs). Because NVIDIA GPUs are expensive, cloud service providers (CSPs) including Google, Meta, Amazon, and Microsoft are all developing their own AI chips, and Broadcom, with its ASIC design capabilities, has become the primary partner for these companies.

Beyond self-developed chips, another and more critical technology is network interconnect, which is the second point of overlap between Broadcom and NVIDIA.

First, NVIDIA is protected by the twin moats of NVLink and CUDA. After working at this for a long time, Broadcom this year finally launched its latest network switch chip, Tomahawk Ultra, which gives it a chance to enter the Scale-Up market and aims to challenge the leading position of NVIDIA's NVLink.

Tomahawk Ultra is part of the Scale-Up Ethernet (SUE) project that Broadcom has been promoting, and the product is also regarded as an alternative to NVSwitch. Broadcom said that the number of chips Tomahawk Ultra can link together at once is four times that of the NVLink Switch, and that it will be manufactured on TSMC's 5-nanometer process.

It is worth noting that although Broadcom is a member of the UALink consortium, it also actively promotes the Ethernet-based SUE architecture. The market is therefore watching closely both how Broadcom and UALink compete with each other and how they jointly take on their strong rival, NVLink.

To counter Broadcom's push, NVIDIA launched its NVLink Fusion solution this year, opening the NVLink ecosystem to partners such as MediaTek, Marvell, and Astera Labs so that they can jointly develop customized AI chips. Observers see this as a semi-open form of cooperation that consolidates the ecosystem while giving partners some room and opportunity for customization.

The Scale-Out segment is led mainly by Broadcom, which has long been deeply rooted in Ethernet. Its most recently released products include Tomahawk 6 and Jericho4, which target Scale-Out and longer-distance opportunities.

NVIDIA, for its part, has launched a number of Quantum InfiniBand switch products and the Spectrum Ethernet switch platform to round out its Scale-Out lineup. Although InfiniBand is an open architecture, its product ecosystem is dominated by Mellanox, which NVIDIA acquired, and this limits customers' flexibility.

▲ According to Broadcom's diagram, the three products each span two different server scaling architectures. (Source: Technology News)

For Scale-Across, which extends connectivity across data centers over longer distances, it is still uncertain whether Broadcom or NVIDIA will take the lead. NVIDIA, however, was the first to introduce a product for this concept, Spectrum-XGS. The solution uses new network algorithms to move data efficiently between sites that are farther apart, and it can also complement existing Scale-Up and Scale-Out architectures.

Broadcom's Jericho4 also fits the Scale-Across concept. Broadcom notes that Tomahawk-series chips can connect equipment within a single data center at distances of no more than one kilometer (about 0.6 miles), while Jericho4 equipment can handle connections between data centers over distances exceeding 100 kilometers while maintaining lossless RoCE transmission, with about four times the data-handling capacity of the previous generation.
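The distance figures Broadcom quotes can be read as a simple rule of thumb. The following is a minimal sketch under that reading, not a Broadcom tool: the function name, the constant names, and the decision to treat every link longer than one kilometer as a Jericho4 case are assumptions made for illustration, since the article does not say how distances between 1 and 100 kilometers are handled.

```python
# Illustrative only: the 1 km threshold is the figure quoted in the article.
# All names here are hypothetical and not part of any Broadcom product or API.

INTRA_DATA_CENTER_MAX_KM = 1.0  # Tomahawk-class reach quoted by Broadcom
KM_PER_MILE = 1.609344


def km_to_miles(distance_km: float) -> float:
    """Convert kilometers to miles."""
    return distance_km / KM_PER_MILE


def suggest_broadcom_family(distance_km: float) -> str:
    """Return the switch family the article associates with a link distance."""
    if distance_km < 0:
        raise ValueError("distance must be non-negative")
    if distance_km <= INTRA_DATA_CENTER_MAX_KM:
        return "Tomahawk"
    # Assumption: any link beyond the intra-data-center reach is treated
    # as a cross-data-center (Scale-Across) case served by Jericho4.
    return "Jericho4"


if __name__ == "__main__":
    for d in (0.3, 1.0, 120.0):
        print(f"{d:6.1f} km ({km_to_miles(d):.1f} mi): {suggest_broadcom_family(d)}")
```

The one-kilometer boundary also matches the article's "about 0.6 miles" conversion, which `km_to_miles` reproduces.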

So what are NVIDIA's and Broadcom's CPO solutions?

As the network transmission war continues, competition in optical networking is expected to become more intense. NVIDIA and Broadcom are both pursuing new solutions for co-packaged optics (CPO) and optoelectronic integration, and TSMC and GlobalFoundries are also actively developing CPO-related processes and solutions.

NVIDIA's strategy starts from the system architecture and treats optical interconnects as part of the SoC rather than as an external module. At this year's GTC it officially unveiled the Quantum-X Photonics InfiniBand switch and the Spectrum-X Photonics Ethernet switch. The former is due to launch at the end of the year and the latter in 2026.

Both platforms use TSMC's COUPE platform, integrating a 65-nanometer photonic integrated circuit (PIC) with an electronic integrated circuit (EIC) through SoIC-X packaging technology. The focus of this strategy is to tighten the integration of NVIDIA's own platform and to improve overall efficiency and scalability.

Broadcom's strategy centers on providing comprehensive solutions and operating its supply chain at scale, offering complete modular solutions that help third-party customers deploy their applications. Broadcom also attributes its success in the CPO field to its deep capability in integrating semiconductor and optical technologies.

Broadcom has now brought its third-generation CPO solution, at 200G per lane, to market. Broadcom also said that its CPO products use a 3D chip stacking structure, with the PIC likewise on a 65-nanometer process and the EIC on a 7-nanometer process.

From the figure below, we can see that an optical transceiver module consists of several key components, such as a laser light source (laser diode), an optical modulator, and a photodetector. The laser light source generates the light, and the optical modulator converts electrical signals into optical signals. Because it performs the electro-optical conversion, the modulator can be said to determine the transmission speed of a single channel.

For this key optical modulator, NVIDIA chose the MRM (micro-ring modulator). The MRM is smaller, but for that reason it is also prone to errors and sensitive to temperature, which is one of the challenges in adopting it.

Broadcom, for its part, chose the more mature MZM (Mach-Zehnder modulator) while also investing in MRM technology. It has already run tests on a 3-nanometer process and continues to lead in CPO progress with its chip-stacking approach.

(Source: Redefine Innovation)

As AI inference continues to expand, the market's focus has gradually shifted from "competing on compute power" to "data transfer speed". Whether it is Broadcom with its networking and switching technology or NVIDIA with its end-to-end solutions, whoever first breaks through the limits on transmission efficiency and latency will have the chance to get ahead in the next wave of AI competition.

Further reading: Cross-data-center transmission and optical communication become the next battleground! After Scale-Up and Scale-Out, what is the new Scale-Across that NVIDIA is calling for? (Part 1)
