Tesla unveils its new supercomputer (5th most powerful in the world) to train self-driving AI




Tesla has unveiled its new supercomputer, which is already the fifth most powerful in the world and will serve as the forerunner to Tesla’s upcoming Dojo supercomputer.


It is being used to train the neural nets powering Tesla’s Autopilot and forthcoming self-driving AI.


Over the last few years, Tesla has had a clear focus on computing power both inside and outside its vehicles.

Inside, it needs computers powerful enough to run its self-driving software, and outside, it needs supercomputers to train its self-driving software powered by neural nets that are fed an insane amount of data coming from the fleet.


CEO Elon Musk has been touting Tesla’s Dojo project, which is said to consist of a supercomputer capable of one exaFLOPS, that is, one quintillion (10^18) floating-point operations per second, or 1,000 petaFLOPS, which would make it one of the most powerful computers in the world.


Tesla has been working on Dojo for the last few years, and Musk has been indicating that it should be ready by the end of this year.


But the company has built other supercomputers on its way to Dojo, and now Andrej Karpathy, Tesla’s head of AI, has unveiled the latest one during a presentation at the 2021 Conference on Computer Vision and Pattern Recognition.


During the presentation, Karpathy gave a shoutout to Tesla’s supercomputing team and showed off its latest effort, Tesla’s third supercomputer cluster.


Tesla is claiming some fairly extraordinary specs for this new cluster, which should make it roughly the fifth most powerful computer in the world:

  • 720 nodes of 8x A100 80GB (5,760 GPUs total)
  • 1.8 EFLOPS (720 nodes * 8 GPUs/node * 312 TFLOPS FP16 per A100)
  • 10 PB of “hot tier” NVMe storage @ 1.6 TBps
  • 640 Tbps of total switching capacity
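
The GPU count and the 1.8 EFLOPS figure follow directly from the numbers in the list, and the arithmetic is easy to check. The short Python sketch below is only an illustration: the function and constant names are made up for this example, and the 312 TFLOPS per GPU is the peak FP16 figure used in the formula above, so the result is a theoretical peak rather than a measured benchmark score.

```python
# Figures quoted in the spec list above.
NODES = 720
GPUS_PER_NODE = 8
# Peak FP16 throughput per A100 in TFLOPS, as used in the formula above.
TFLOPS_FP16_PER_GPU = 312

# Unit conversion: 1 EFLOPS = 1,000,000 TFLOPS.
TFLOPS_PER_EFLOPS = 1_000_000


def total_gpus(nodes=NODES, gpus_per_node=GPUS_PER_NODE):
    """Return the number of GPUs in the cluster."""
    return nodes * gpus_per_node


def peak_eflops(nodes=NODES, gpus_per_node=GPUS_PER_NODE,
                tflops_per_gpu=TFLOPS_FP16_PER_GPU):
    """Return the theoretical peak of the cluster in EFLOPS."""
    total_tflops = total_gpus(nodes, gpus_per_node) * tflops_per_gpu
    return total_tflops / TFLOPS_PER_EFLOPS


if __name__ == "__main__":
    print(total_gpus())             # 5760
    print(round(peak_eflops(), 2))  # 1.8
```

The unrounded result is 1.79712 EFLOPS, which the spec list rounds to 1.8.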

Karpathy commented on the effort:

“We have a neural net architecture network and we have a data set, a 1.5 petabytes data set that requires a huge amount of computing. So I wanted to give a plug to this insane supercomputer that we are building and using now. For us, computer vision is the bread and butter of what we do and what enables Autopilot. And for that to work really well, we need to master the data from the fleet, and train massive neural nets and experiment a lot. So we invested a lot into the compute. In this case, we have a cluster that we built with 720 nodes of 8x A100 of the 80GB version. So this is a massive supercomputer. I actually think that in terms of flops, it’s roughly the number 5 supercomputer in the world.”


The Tesla engineer didn’t want to elaborate on project Dojo, but he did note that it will be an even better supercomputer, optimized for neural net training, than Tesla’s current cluster.


Musk has also previously indicated that Tesla aims to someday make its supercomputers available to other companies so that they can train their own neural nets on them.
