Pretty sure "TOPS" refers to "tensor operations per second"; a single tensor operation is a multiplication and an addition. The T is not for "tera". You can have gigatops, teratops, petatops, etc.
Sorry, yes. You should see the outtakes on that little bit. And in the end I got it wrong anyway. Thanks for correcting me!
@@DrKnowitallKnows No you were correct, Tesla's HW3 is said to have 37 TOPS per NPU core. I don't think they'd be very good at self-driving doing just 37 operations per second.
It's trillion, dumb
@@tristanmanchester yeah but they're texas sized operations
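For anyone sanity-checking the thread above, reading the T as tera (10^12) gives figures like this. A quick sketch; the 37 TOPS-per-NPU figure is the one quoted in the thread, not an official spec I'm vouching for:

```python
# Reading the "T" in TOPS as tera (10^12 operations per second),
# using the 37 TOPS-per-NPU figure quoted above for Tesla's HW3.
tops_per_npu = 37
ops_per_second = tops_per_npu * 10**12

print(f"{ops_per_second:,} operations per second")  # 37,000,000,000,000
```

Thirty-seven trillion per second, not thirty-seven, which is why the "trillion" reading has to be right.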
Would love to see more about NNs in general, maybe an overview of the different kinds?! Or a general overview 😁
Yes, please we need a deeper dive!
Watching this 3 years later as I learn a bit more about Machine Learning... very well explained doctor!
Tf are you doing here bruh?😂👋🏾👋🏾
I was really looking for info on DPUs, but I liked your video... I mean really, I clicked the thumbs-up button, and I also like the video because you have very good information. I am working on developing LLMs, so this is relevant info. You have a lot of good info here, and I'm going to be looking out for the future videos you mentioned in this one, and I'll be checking out what else you have. Thanks
This channel is underrated
Enjoyed the presentation. ❤❤❤
Thanks for a very high level overview yet describing enough info to explore in detail each individual topics. Subscribed.
would like more about NN. Thanks for the video! Great content
Yes, please talk about convolutional NNs. When I did my M.S. 20 years ago, I did a minor in AI, and I do remember back-prop NNs, as I wrote one from scratch in C++ for a class homework, but nowadays there are so many types that it's hard to keep track, especially if you don't use them in your typical day-to-day.
Thanks for the information 😀
Fantastic. A really good introduction.
Yes! Would love a series on NN.
Wow thank you for explaining! I truly learned something new today!
Great video, lot of info in this one. Do the other episodes would love to see your break down on the topics you stated
Thanks for sharing your knowledge! Subscribed.
Love your lecture type video style, can’t wait until your Tesla is delivered and you begin the self driving journey 😉
Great video
Your number of subs is growing fast! You absolutely deserve it
Great explanation. Thanks for sharing this
Dr. Know-it-all: where were you when I was in high school? I'm envious of your ability to convey information. Damn, you're smart... I'm having a bit of a man-crush right now. More Dojo, pleeeeaaaase.
lol, thank you. I think my students all think I'm a goof, so glad to know someone thinks otherwise!
I love the explanation; it's clear, deep, and understandable
Thanks for this video! Will be following the whole A.I. series for sure!
This channel has grown a LOT. Just a month ago it had less than 500 subscribers. Keep it up!
Right? It’s been a ride these past couple of weeks. I’m really humbled people are enjoying my videos. Thank you, everyone!
Good video on hardware knowledge. Just do more detailed videos on NPUs
New subscriber. Watched many of your vids and plan to watch them all. Keep up the great work!
Thank you!
This was informative.
Perfect video as always
You should improve the sound; it would be even more perfect
Fascinating ! Subbed.
I appreciate your deep understanding of your field and of the YouTube algorithm and engagement. I'll make sure to like your videos. Love your content. Thank you.
Thanks for the lecture, Dr!
The photo you show for the GPU is actually a TPU. It's the Volta board from NVIDIA.
Back in the day, GPUs were limited by their ability to transform vertices, which led to rendering rates being bottlenecked at the triangle level. But that's rarely the case today. Modern GPUs are measured by their ability to process fragments (pixel samples), which leads to games being bottlenecked by fill rate.
This content is great, thanks Dr Know-it-all
Thank you very much sir ❤❤❤
Great explanation
Thanks!
I would definitely like to learn what "deconvolution" is.
Just got my Coral edge TPU. I would love to see an update on this and what "regular people" can expect to achieve as advancements continue to be made in this space.
Love the value-add, original content vs other channels that just regurgitate online articles for some views
Would love to see a video or 2 on neural nets
Google does sell TPUs, albeit only the Edge TPUs, which you can use with TensorFlow Lite models; these are missing certain types of layers.
I recently saw a video about DOJO and I believe it was a Karpathy and Musk tag team, but I watch so much content that I may be wrong. However, if I recall the numbers correctly, the main point that knocked me off my feet was that Tesla’s current processor farm requires 3 days to run through a training cycle. The DOJO farm will do 3 cycles a day. If I can find that video, I will forward the link.
Was that a Romulan warbird in the intro sequence? It is rare to see the Romulans so far from their home territory ;)
4:10 I don't think there is a relationship between a screen being 2D and matrices being 2D arrays.
The matrix is 2D because affine transformations are binary relations.
The dimensions of the screen correspond to the size of the matrix minus one (for 2D graphics you have a 3x3 matrix).
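To make the comment above concrete, here is a minimal sketch of a 2D affine transform as a 3x3 matrix acting on homogeneous coordinates. The particular rotation angle and translation are arbitrary illustrative choices:

```python
import math

def apply_affine(m, x, y):
    # Treat the point (x, y) as the homogeneous column vector (x, y, 1)
    # and multiply by the 3x3 matrix; the bottom row stays (0, 0, 1).
    xt = m[0][0] * x + m[0][1] * y + m[0][2]
    yt = m[1][0] * x + m[1][1] * y + m[1][2]
    return xt, yt

# Rotate 90 degrees about the origin, then translate by (5, 0).
t = math.pi / 2
M = [
    [math.cos(t), -math.sin(t), 5.0],
    [math.sin(t),  math.cos(t), 0.0],
    [0.0,          0.0,         1.0],  # this extra row is why 2D needs a 3x3 matrix
]

print(apply_affine(M, 1.0, 0.0))  # (1, 0) ends up at roughly (5, 1)
```

The third row/column exists purely so translation can be folded into the same matrix multiply as rotation and scaling, which is the "plus one dimension" being described.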
AI chip vendors and the industry have muddled the NPU name. Traditionally it always referred to Network Processing Units, which are used by network vendors, similar to DSPs for signal processing, GPUs for graphics processing, APUs for application processing, etc.
Yes, NNs please. Can you cover how it works using Tesla's NPU as an example? I.e., when a video frame comes through, what processing steps does it go through?
Cool. I like that way of making something a bit theoretical more practical. Thanks for the suggestion!
Would love a series on neural networks etc. focusing on self-driving and/or generalized intelligence. Don't know if you know, but Elon Musk saw and commented on an article that referenced your 4D video
I saw that. I think I floated around the entire rest of the day :D If you want to see my "at the moment" reaction, check this out: th-cam.com/video/VRUgydmMq20/w-d-xo.html. Talk about star struck! (and PS I'll definitely work on some NN videos. I have a plan for at least 3 parts)
It would be great if you could dive deeper into neural nets. What I find interesting regarding Tesla's FSD Beta is that, with as much improvement as they've achieved, they have much less data to work through. The fewer driver interventions you have, the fewer flaws you have to iron out, especially since neural net training consumes a lot of time.
Can we use TPUs and NPUs for rendering video graphics for gaming? 😅
Great Video !
If people want to learn more about neural nets, Brandon Rohrer has some of the best explanations of deep neural nets.
I don't believe the statement "GPUs aren't built for ML" applies to the latest Nvidia chips. For example, the RT cores could be used to calculate the risk of collision with other moving objects. Calculating rays or emulating surround sound is part of a general class of problems that involve bounding volume hierarchy calculations. You can do the same calculations on CPU or CUDA cores, but the RT cores are optimized to do it faster and more efficiently.
When Hinton, Bengio, and LeCun get System 2 working, we're going to need different hardware that's closer to an FPGA or has much higher memory bandwidth. RNN models like LSTMs are much more calculation-intensive and require more memory. The proposals from Bengio add memory to the network. Hinton wants to add smart routing with capsule networks.
The most significant single difference between a CPU and the others is branching. CPUs evaluate conditionals and go on to execute different code pathways. So while a branch is technically possible on a GPU, typically both paths are evaluated and the wrong-path result is multiplied by zero.
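A toy sketch of that idea in scalar Python. Real GPUs do this with per-lane predication masks rather than literal multiplies, but the blend-by-zero trick the comment describes looks like this:

```python
def with_branch(x):
    # CPU style: evaluate the condition, then execute only one code path.
    if x > 0.0:
        return x * 2.0
    return -x

def predicated(x):
    # GPU style (simplified): compute BOTH paths unconditionally,
    # then keep the right one by multiplying the wrong one by zero.
    mask = 1.0 if x > 0.0 else 0.0
    path_a = x * 2.0   # result of the "taken" path
    path_b = -x        # result of the "not taken" path
    return mask * path_a + (1.0 - mask) * path_b

for v in (-3.0, 0.0, 4.0):
    assert with_branch(v) == predicated(v)
print("both approaches agree")
```

Note the predicated version always pays for both paths, which is exactly why divergent branches are expensive on a GPU.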
Great video! Love to hear more about NN, Dojo and artificial intelligence techniques in general.
Check out the Dojo one that's up already. AI and such will be coming as soon as I can make them!
3:13 - Isn't VRAM "Video RAM" and not "Visual RAM"? Or is it considered the same thing?
Would love to hear your opinion on the new Apple M1 chips
So many people have asked that of me today. Guess I better look into it and do an episode! Thanks.
Excellent look into this aspect of the Tesla secret sauce!
Thank you for your great videos on subjects not available any where else!
you're welcome!
If one were to make a Turing machine with Kleenex as the instruction tape,
would this be a TPU ;-)?
Can you make a video comparing Tesla to Intel Mobileye in the race to full self driving?
Cool video! (I think ASICs *can't* run something like an OS because they don't have a Turing-complete instruction set.)
Yes, I believe that's correct. My understanding is that you could _somehow_ manage to jury rig something together that could work as an OS--but of course it would run like crap. I definitely wasn't suggesting anyone try!
5:56 "The more TOPS you can do, the better" 💀
Hi, I am from India. Really nice video
NPUs consume power like GPUs? I was under the impression they were much more efficient? Thanks for any clarification! Love your channel.
You should do a video summarizing your background and professional interests.
I am surprised you didn't mention that GPUs are highly parallel processors which is why they are so fast.
Most interesting AI info. Even with lots of Musk money they do make mistakes as any learning process. The rest of the world has a few ideas. 😎 Thank you. The OPi5 is AI fun. 🥰
Please make materials about ML
hahahahaha your intro is great
This gentleman needs to see the RTX 3090
Feeding the algorithm!
lol thanks :)
Just curious if you are planning to trade in your old vehicle with Tesla? If you are, are they giving you a good deal?
So how does Nvidia's DPU, the BlueField-2, compare to the Google TPU? It seems that BlueField's strength is its low-latency bandwidth at 200 Gbps.
A DPU is a Data Processing Unit. It's like a really, really premium version of a network card or Wi-Fi card: it does work the CPU would otherwise be doing, but does it on the DPU because it has a dedicated processor for networking. It basically takes work off the CPU.
Did you see the NPU built into the new M1 MacBook architecture? I don't think many details were provided, but is that a first for Macs, or did I miss it previously?
Woah. Totally missed that. I need to look. I did notice the latest version of Photoshop has "neural filters" in it. I haven't had a chance to play with them yet, but NNs are invading the world, clearly ;)
@@DrKnowitallKnows more details blog.tensorflow.org/2020/11/accelerating-tensorflow-performance-on-mac.html
@@spleck615 Thanks! I'm definitely planning to do an episode on this soon as I have time to really look into it... maybe Thanksgiving break will be just the opportunity :)
But you can buy TPUs, can't you? There is that Google dev board for AI training.
for what it's worth, I thought the kleenex analogy was very good
Can you talk about the new apple chip which has a cpu, gpu, and a neural engine built in! I’d really like to know your thoughts.
Only just finding out about this. I will certainly check though. What a fascinating chip to release in a consumer focused product!
"They do not think" : so how do you explain some FSD wideo where FSD fail doing something (Uturn, Avoid a cone) and then, 2 minutes after, FSD succeed on second attempt and third attempt ...
Different lighting conditions could easily change how the FSD computer perceives the scene and reacts. It could easily succeed on the first, and fail on the second or third, but the driver doesn't chalk that up to un-learning, but to it being in "beta." So throw in some confirmation bias.
@@SodaPopin5ki Perhaps, but the different tries were just one after another (one or two minutes between tries), and so far nobody has reported a success on the first try and a failure on the second...? Puzzling...
@@fredt7518 Another possibility is the car retains the drivable space area within the immediate area. After going through it the first time, it could retain the info if the car hadn't left the area before the 2nd attempt.
@@SodaPopin5ki Yes, that makes sense. So it's like the car "learns" from its first try (like a child) :-)
I remember that video from another TH-camr. It seemed that the car was learning how to do the U-turn. So, if the car does not learn by itself, which makes sense, how is it possible that after the second time it did it perfectly six times in a row? It was definitely learning from the corrections the driver was making... I wonder if cars are transmitting the data in real time and being corrected, which does not make sense, but what if... I don't have an explanation. It would be great if DrKnow could give his opinion on that video. Thanks for the channel, really interesting :)
Bring on the NN content :)
GPU memory is not much faster than CPU memory overall. It is just faster in GPU-oriented scenarios, and it is slow for random access.
A GPU does not do individual computations very fast; it just does many of them in parallel.
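A back-of-the-envelope illustration of that point. The core counts and per-core rates below are made-up numbers for illustration, not the specs of any real chip:

```python
# Throughput = cores x ops-per-core-per-second. A CPU has a few fast
# cores; a GPU has thousands of slower ones, so its aggregate rate wins.
cpu_cores, cpu_ops_each = 8, 5_000_000_000       # hypothetical figures
gpu_cores, gpu_ops_each = 10_000, 1_000_000_000  # hypothetical figures

cpu_throughput = cpu_cores * cpu_ops_each
gpu_throughput = gpu_cores * gpu_ops_each

# Each GPU core is slower here, but there are far more of them.
print(f"CPU: {cpu_throughput:.1e} ops/s, GPU: {gpu_throughput:.1e} ops/s")
```

The catch, as the comment says, is that this only pays off when the workload actually splits into thousands of independent pieces, which matrix math does and branchy serial code does not.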
Everything is good, but please, please change the music and the font.
Awww. Zenlee will be sad. I've already adjusted the font once. Any suggestions on a better one? I went with a sans-serif to try to make it easier to read.
@@DrKnowitallKnows Pardon me for not seeing the new font. Now it’s perfect and simple. But honestly the soundtrack is annoying. If you can change the soundtrack that’ll be great. Awesome content. Appreciate it.
How about quantum computing? Tesla should try to build one
Eek!! Completely insane. Tesla is probably working on it now! ;)
Quantum computing is a very, very different style of computing and presents challenges of its own. Depending on the need, quantum computers can be incredible, but for traditional computing tasks they're dog slow. If Tesla needs the performance, I'm sure they'll do it!
How does Tesla's NPU chip compare to the top-tier Bitcoin miners currently on the market?
Go watch Sandy Munro. He talks about the 3.0 chip in Tesla
Such a strange obsession with Tesla…
Andrew Tate: don't buy the GPUs, they do matrix with your life as their math
Sea of comments? There are 30 here...
What do you think of James Locke's new video that seemingly shows the FSD Beta learning on the fly through repeated use?
th-cam.com/video/QtaWbaWi93Y/w-d-xo.html
01:44: RAM is in the wrong slots: single channel operation only.
03:04: talks about VRAM, zooms on coils.
03:51: talks about triangles, shows pic with “rectangle” mesh.
07:17: Most GPUs don't have memory on the chip. They have it in separate chips, GDDR5(X) or GDDR6(X). On-chip memory is HBM2, which is rare. Of course, I am talking about consumer GPUs, because what good does some Tegra card do me when I don't have it in my PC?
07:55: Lol. That did not age well. :-D But wasn't there a GPU shortage already at the time of filming? I believe so.
~10:40: IDK, but you seem to live in a world where basically everyone has a Tesla. I'm sorry for the disappointment, but this is not true in this world. But if you came through the Mirror: first of all, let me welcome you to our reality. It's kind of shitty. Unlike in yours, regular people don't have access to GPUs with HBM2, for example. I guess where you come from, the economic situation is better, and the tech is probably a lot further along as well. Lucky you…
12:10: The what? Miles? What Miles guy? Dude. SI units!
13:07: Billions of…what? SI units! If you want to be taken seriously.
13:27: No, I did not enjoy this video. I will not leave you a like. Or a dislike. Couldn't be bothered. Just this reaction. And the famous YT algorithm sucks ass when this is what it suggests to me.
And all of this is ONLY what I caught and know. Someone more educated must feel tortured watching this.
There was a lot of waffle here, so I skipped a lot and still don't know the difference between an NPU and a TPU; will try elsewhere
+1 for CNN and others.
Should you not drop the Dr. in solidarity with Jill Biden? At least for one episode?
Typography is hard. Trading an I for an O.
You are not keeping up. Dojo is a complete flop. Musk fired the head module designer for non-performance. He designed the Dojo module with little to no memory; the idea was that it would make the system faster. Turns out AI is a memory pig if it's to work efficiently!!! So at the very least the entire Dojo project will need to be redesigned or scrapped altogether. Or maybe they'll repurpose the chips for a huge data center after adding memory. The good news is that FSD and Optimus are NOT dependent on Dojo. The original supercomputer is still the one they lean on for all FSD and Optimus iterations. The 10,000 Nvidia H100 chips went to the old supercomputer, NOT Dojo.
A bit verbose. Maybe 1/3 fewer words and more concise sentences would be wonderful.