This technology tycoon is nothing if not headstrong: if you can't buy enough NVIDIA GPUs, build your own!

  Technology is productivity.

  Innovation is vital to any company, and the most powerful motivation.

  [Global Storage Watch | Hot Spots] According to foreign media reports, Elon Musk wanted to buy up all of NVIDIA's AI chips, but he couldn't. So Musk set out on the road of self-reliance, investing in GPUs of his own.

  Musk is therefore willing to invest more than $1 billion to develop the Dojo supercomputer for Tesla, simply because he can't get enough NVIDIA chips. In Musk's own words: "If they could provide us with enough GPUs, we might not need Dojo. But they can't, because they have too many customers."

  Later, replying to a Twitter account, Musk went further: "Unfortunately, they can't even provide us with a small fraction of the compute we need!"

  The D1 supercomputing chip is the foundation of the Dojo system. The chip is built on a 7 nm process and delivers up to 362 teraFLOPS of compute. Twenty-five D1 chips mounted together form a Training Tile, the basic cell of the Dojo supercomputer.

  At Tesla AI Day 2022, Tesla unveiled the Dojo ExaPod, also known as the Dojo Cluster. An ExaPod contains 120 Training Tiles and 3,000 D1 chips, carries 1.3 TB of static random-access memory (SRAM) and 13 TB of high-bandwidth memory, and delivers up to 1.1 exaFLOPS of compute.
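  The ExaPod figures above can be sanity-checked with some back-of-envelope arithmetic. A minimal sketch, using only the numbers quoted in this article (the per-tile 9 PFLOPS figure appears later in the text):

```python
# Back-of-envelope check of the Dojo ExaPod figures quoted in the article.
CHIPS_PER_TILE = 25      # D1 chips per Training Tile
TILES_PER_EXAPOD = 120   # Training Tiles per ExaPod
TILE_PFLOPS = 9          # peak BF16/CFP8 PFLOPS per tile (quoted later in the article)

total_chips = CHIPS_PER_TILE * TILES_PER_EXAPOD
total_eflops = TILES_PER_EXAPOD * TILE_PFLOPS / 1000  # PFLOPS -> EFLOPS

print(total_chips)   # 3000, matching the 3,000 D1 chips quoted
print(total_eflops)  # 1.08, consistent with the quoted ~1.1 EFLOPS peak
```

  The small gap between 1.08 and the quoted 1.1 EFLOPS comes from rounding the per-tile peak to a whole number.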

  Recently, Musk said: "We will spend well over $1 billion on the Dojo project before the end of next year, and we have a staggering amount of video data to train on." Musk also mentioned that Tesla expects to use two supercomputing systems: NVIDIA GPUs and Dojo.

  Dojo is a multi-chip modular supercomputer developed by Tesla, officially unveiled at Tesla AI Day 2021. Initially, Dojo mainly served data labeling and training for the autonomous driving system; later it was also applied to robot research and development. The "brain" of the Optimus robot is equipped with Dojo's D1 supercomputing chip.

  Tesla has a staggering amount of image data: the cumulative mileage of the Full Self-Driving (FSD) beta has reached 300 million miles. Dojo will be used to process the massive data needed to develop autonomous driving software, which will help Tesla shed its dependence on NVIDIA GPUs. In addition, Musk said Tesla is considering licensing its FSD hardware and software to other automakers.

  To train better, cut training costs, and improve training results, the Dojo supercomputer is planned to reach 100 exaFLOPS of compute in the future.

  Of course, to make the most of FSD training, Musk also built his own GPU supercomputer: a cluster of 5,760 Nvidia A100 GPUs. But NVIDIA's GPU supply falls far short of Tesla's FSD training requirements, so Dojo supercomputing built on the D1 chip is also Musk's move of last resort.

  Musk has promised to spend $1 billion between now and the end of 2024 to help Tesla develop its self-driving software.

  Some industry insiders believe there are two reasons Musk is accelerating his own Dojo supercomputer. One is that NVIDIA GPU prices keep soaring, and Tesla needs a bargaining chip. The other is the publicly stated fact that NVIDIA's GPU supply cannot meet Tesla's large-scale FSD demand: public cloud vendors such as Amazon AWS, Microsoft Azure, Google Cloud, and Oracle Cloud are snapping up NVIDIA's GPUs, as are other enterprises across the industry, so Tesla is increasingly anxious about securing its own supply. The implication is that Tesla's own D1 chip is not as good as NVIDIA's GPUs, but Tesla will have to make do with it.

  Of course, Musk has always been full of praise for NVIDIA's GPUs.

  It turns out that Tesla's new Dojo supercomputing center will still use a large number of Nvidia GPUs after all. Tesla needs to process a huge volume of real-world footage recorded by its cars on the road to train an FSD algorithm that relies entirely on cameras, rather than the hybrid camera-plus-other-sensors approach adopted by other carmakers. Commenting on this, Musk clarified that Dojo will use a fused architecture of Nvidia GPUs and Tesla's custom D1 chips.

  Of course, Musk has always praised Nvidia's founders and employees. Tesla has used a great deal of Nvidia hardware in the past and will continue to do so. Frankly, if NVIDIA could supply Tesla with enough GPUs, Tesla might not need Dojo. But NVIDIA can't; it has too many customers worldwide. Even so, NVIDIA has been friendly enough to prioritize some of Tesla's GPU orders. In any case, Tesla is not short of money.

  Looking back at Tesla's GPU history: a few years ago, Tesla officially announced its supercomputer Dojo, a large data center for training the Full Self-Driving (FSD) software for its cars. Along with the announcement, it emerged that the electric vehicle leader had developed its own chip, the D1, to train the AI algorithms FSD needs. This was a landmark event for the automotive industry, because Tesla had long relied on Nvidia GPUs for this AI training.

  This follows the same logic as Tesla's belief in designing its own hardware and software in-house, whether to reduce costs and supplier dependence, or to hold its own technology in reserve when the market offers nothing that fits. As early as 2018, Tesla announced that it had designed its own chip for the onboard computer in its cars, dubbed the "Tesla" GPU, abandoning Nvidia's system designed for electric vehicle makers.

  Interestingly, NVIDIA has plenty of competitors trying to replicate its success in AI chip design, but few have pulled it off as convincingly as Tesla has in real, shipping silicon.

  Tesla's decision to design its own D1 chip recalls Apple with its A-series chips for iPhone and iPad and its M-series chips for Mac computers and the new iPad Pro line. Tesla's continued commitment to vertical integration lets the company rely on its own GPU technology to meet growing demand and reduce dependence on external suppliers, which ultimately benefits its FSD testing and development.

  The D1, the core chip of the Dojo supercomputer, has already been covered extensively in the industry, so here is a brief rundown.

  Industry watchers will have seen some impressive performance figures for the D1 chip. Tesla says it can deliver about 22.6 teraFLOPS of single-precision FP32 compute, or up to 362 teraFLOPS at BF16/CFP8 precision. Clearly, Tesla has optimized for the FP16-class data types, and it even managed to beat the current compute leader, NVIDIA, whose A100 GPU can "only" produce 312 teraFLOPS on FP16 workloads.

  Tesla built the D1 as a network of functional units (FUs), interconnected to form one huge chip.

  Each FU contains a 64-bit CPU with a custom ISA designed specifically for transpose, gather, broadcast, and link operations. The CPU itself is a superscalar design with a 4-wide scalar and a 2-wide vector pipeline. Each FU has its own 1.25 MB of scratchpad SRAM, can execute one teraFLOP of BF16 or CFP8 or 64 gigaFLOPS of FP32, and achieves 512 GB/s of bandwidth in every direction across the network. That means lower latency and higher performance.
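  Scaling these per-FU numbers up to a full D1 chip (354 training nodes per die, per the specs later in this article) roughly reproduces the chip-level figures. A minimal sketch of the arithmetic:

```python
# Aggregate the per-functional-unit (FU) specs up to the D1 chip level.
NODES_PER_CHIP = 354   # training nodes (FUs) per D1 die
FU_TFLOPS = 1.0        # ~1 teraFLOP of BF16/CFP8 per FU
FU_SRAM_MB = 1.25      # scratchpad SRAM per FU, in MB

chip_tflops = NODES_PER_CHIP * FU_TFLOPS   # aggregate BF16/CFP8 compute
chip_sram_mb = NODES_PER_CHIP * FU_SRAM_MB  # aggregate on-die SRAM

print(chip_tflops)   # 354.0 -- in line with the ~362 TFLOPS quoted for the D1
print(chip_sram_mb)  # 442.5 MB of on-die scratchpad SRAM
```

  The ~2% gap between 354 and the quoted 362 TFLOPS suggests the per-FU figure of "one teraFLOP" is itself rounded.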

  With the D1 chip, Tesla has the supercomputer it needs for advanced AI training. Twenty-five D1 chips form a training tile with 36 TB/s of bandwidth and a BF16/CFP8 peak of 9 PFLOPS. Deploying 120 training tiles (3,000 D1 chips) across several cabinets forms the ExaPOD supercomputing cluster, with more than one million training nodes and a BF16/CFP8 peak of 1.1 exaFLOPS. Compared with a current NVIDIA-based supercomputer at the same cost, Tesla claims four times the performance, 1.3 times the performance per watt, and one fifth the floor space.

  The D1 is manufactured by TSMC on a 7-nanometer process, packs more than 50 billion transistors, and delivers 362 trillion floating-point operations per second. Its die measures 645 mm², smaller than NVIDIA's A100 (826 mm²) and AMD's Arcturus (750 mm²). It carries 354 training nodes and supports the data formats needed for AI training, including FP32, BF16, CFP8, INT32, INT16, and INT8.

  The D1 chip is applied to training models on video data collected from Tesla vehicles.

  Ganesh Venkataramanan, the leader of D1 and Dojo development, came from AMD, where he spent nearly 15 years, including a long stint as an engineering director; he has been at Tesla for about seven years.

  Artificial intelligence (AI) has been widely adopted in recent years. As everyone knows, Tesla is a company built on electric cars and self-driving, and AI is valuable to every aspect of its work. To accelerate its AI software workloads, Tesla had to introduce the D1 chip and the Dojo supercomputer for AI training.

  The Dojo supercomputer takes the training tile of 25 D1 chips as its main computing unit and integrates CPU, storage, communication interfaces, networking, power supply, and other modules to build a machine with state-of-the-art performance.

  Many companies now build ASICs for AI workloads, from countless startups to giants such as Amazon, Baidu, Intel, and Nvidia. But not everyone can apply ASICs correctly and extract their full value, and no single chip fits every workload perfectly. This is likely another important reason Tesla chose to develop its own ASIC for AI training: to get truly ideal performance out of a chip, you have to master the technology and optimize it for your actual application.

  According to several media reports, Tesla has put the Dojo supercomputing platform on the production agenda: Dojo was slated to enter production in July 2023, and Tesla expects its computing power to rank among the world's top five by around January 2024.

  The industry expects that once the Dojo supercomputer is delivered, Tesla's FSD (Full Self-Driving) software may see faster iteration and improvement.

  On Tesla's first-quarter 2023 earnings call, CEO Elon Musk said the company's Dojo supercomputer had "great potential." Musk said Tesla is "putting a lot of effort" into Dojo and believes it "may bring an order-of-magnitude improvement in training costs."

  Amin of Global Storage Watch believes that the truly tough players, like Musk, make their own chips, build their own supercomputers, build their own models, do their own training, build their own applications, and commercialize it all themselves. Tesla has pulled off this "one-stop" road of technological innovation, from chip to full self-driving, from hardware to software to application, touching every level of innovation in the electric vehicle industry. With innovation muscle like that, Tesla remains well worth betting on.

  You're welcome to leave a comment at the end of the article!

  [Global Storage Watch | Global Cloud Watch | Amin Watch | Technology Statement] Focused on analysis of technology companies, letting the data speak, and helping you understand technology. This article and the author's replies represent personal views only and do not constitute investment advice.