As a structural biologist that uses this kind of hardware fairly often, it's really cool actually contextualising how crazy the hardware we use is hahahaha
man I wish I was half as smart and cool as you guys. I enjoy watching these videos as a casual consumer and it's so cool how you guys basically dance around code and recognise items near instantly and with ease! I mean, seeing the teardowns is always a joy too.
This must be EXTREMELY confidential if you had to go to SUCH an extreme just to prevent your "lender" from being identified. I know doing this is very much a risk for them, so good on you for minimizing that risk for them.
Okay, I perfectly understand, but unfortunately I have one huge question for you, Linus. Preventing your lender from being identified just kind of boggles me. Because, in the words of car enthusiast @Doug DeMuro, in his first video on one of Japanese car manufacturer @Nissan's four very obscure "Pike Cars", the 1989 @Nissan S-Cargo: "Why would you do this?" In this context, I need that question answered. So after you ship it back, Linus, actually reveal who sent you the card, maybe in your next Tech Tips episode, or maybe even on next week's WAN Show.
This is a very interesting video. I liked it a lot more than I thought I would. There isn't much coverage of Nvidia's data center line from tech YouTubers typically so it was fun learning a bit about that aspect of their GPU division 😊👍
I love Linus and Jake's chemistry so damn much and I love how it has evolved. Jake started out as the apprentice, while nowadays he's actually the more mature of the two
6:56 in response to Jake’s alcohol offer, Linus - “I like to go in dry first.” Did you notice Jake’s immediate modified face-palm? We know you laughed, Jake. Gotta ❤️ these guys.
There's a typo at 3:25: the A100 has ~54 billion transistors on it. The 54.2 million listed would put the card firmly between a Pentium III and a Pentium 4 in terms of transistor count, with a curiously big die for the lithographic node.
you guys are the geekiest.....I don't understand a thing you are talking about, but I am fascinated and really enjoying watching and listening to you geek out....I will like and subscribe just to reward your enthusiasm!!
@@spaghettiarmmachine7445 here is the dictionary definition of "geek" "engage in or discuss computer-related tasks obsessively or with great attention to technical detail. "we all geeked out for a bit and exchanged ICQ/MSN/AOL/website information" It was not meant as an insult or derogatory. I do believe they engaged in computer related tasks with great attention to technical detail. Anyway I loved their enthusiasm for their subject and although I did not understand it.... I enjoyed watching their absolute joy discovering the technical intricacies of the product they were reviewing. Sorry if I offended you...
@@spaghettiarmmachine7445 How tf do you watch this video and *NOT* think they are LMFAO. Like bro when you're literally fawning over a piece of computer tech that pretty much no normal consumer will ever own in their life, and spitting nerdfacts and terminology that almost nobody will intricately understand unless you have a very deep grasp of the subject matter...at that point is literally the definition of the word.
Yeah, I've seen AI papers that say they had to train AI models on 512 of these for 10 days straight! The cost of neural networks is immense, both financially and environmentally.
It's pretty amazing how fast we can render stuff with consumer hardware nowadays. Blender's 3.0 version (with Cycles X) renders even faster now. What used to take multiple minutes a few years ago with my GTX 1080 now takes less than 30 seconds with my 3080
Yeah and Linus is still using the 2.x version of Blender here which uses tile rendering. 3.0 uses progressive rendering and is in some cases double the speed. Linus should upgrade blender :)
Switching a 3090 card with an A100 card while keeping the 3090 cooler would be the most expensive and weird gift/prank on your homie. Or if it's not a gift, I mean, it's a pretty weird flex
The near double performance you're getting near the end is because of Nvidia's 1:1 fp16 ratio on consumer GPUs; Nvidia also has a 1:64 fp64 ratio on consumer GPUs. Basically they artificially limited consumer GPUs to force machine learning (wants fp16) and scientific computing (wants fp64) people to buy way more expensive "professional" products. LHR is just a newcomer to the party, not exactly a new concept.
Can you elaborate on what you mean by '1:1 fp16 ratio' and '1:64 fp64 ratio'? (I am familiar with the datatypes themselves and the consequences for performance/vectorization on CPUs)
@@Cyberguy42 performance compared to fp32, 1:1 meaning it's the same speed and 1:64 meaning fuhgeddaboudit. With a tiny bit of extra circuitry you can make fp16 go twice as fast and fp64 half speed compared to fp32. At least Nvidia relented and unlocked fp16, because that used to run at abysmal speed as well, so you can at least save on memory (which is also scarce on consumer cards) and make use of tensor cores for some operations.
Yeah, that's not really accurate. GeForce cards physically don't have many fp64 units, to make them cheaper; also, tensor-core fp16 matmul with a 16-bit accumulator runs at full rate. Only fp16 matmul with an fp32 accumulator is artificially restricted, which can be useful in some workloads, but often accumulation in fp16 does the job. The 1:1 fp16 ratio you talk about is on CUDA cores, which nobody uses for deep learning since tensor cores are much faster.
Watching Linus handle someone else's expensive hardware is like watching a thriller
Now viewing it again whilst Michael Jackson plays in the background!
You just know that the guy who loaned this card to LMG is watching this video and cringing perceptibly every single time Linus does 'a Linus' to his $10,000 card.
Lmao it was cringe as hell, i was traumatized throughout the entire video.
@@BLCKKNIGHT92 ok soy
Cuz this is THRILLERRR
thanks for pointing out Jake, really helped me recognize him.
But who's that other guy with him?
Just what I had in my mind
@@qovro just what i wanted to say ...whos that dude doing all the work
I mean its right there above my coment
Who's the other guy they didn't tag him
Nvidia : "no linus you can't have that"
Linus: "and I took that personally"
Linus always finds a way
@@generalgrievous2726 Well, the way has found him.
Reminds me of Michael Reeves "You lied to me, Boston Dynamics." XD
Similar energy, its just that Linus has a lot more self control and professionality :P
They just did not want him to drop it lol
Just makes no sense to send him this sort of card. The people buying them don't get tech advice from fucking linus lol.
*Linus who has broken something, on everything, in every video created*
Linus: “I don’t know why Nvidia wouldn’t send us the card”
Soon To Be Every Video, jk, ^_-
Linus sex tips
Some guy sends it and essentially says please don't fuck it up
Linus: drops it almost immediately
🤣
nVidia is unlikely to really care that much about that aspect of it, they can have bookkeeping write it off as a promo cost and deduct it from taxes, if they really care. What they DO care about is Linus shitting on the card with stuff that doesn't matter, really, but non-techie customers may think matter. See, the way that you quantify "value" for something like this isn't the most intuitive thing in the world, and has no relation to the shizz Linus is talking about, but the big kahunas of datacenters, and their investors - again don't understand how that stuff works and may misjudge it based on faulty reasoning.
If I build high end workstations for a living, and a fortnite kiddie wants to review one - I will say FU NO! to the kiddie - not because my workstation can't play fortnite but because how well it does that is irrelevant, AND I gain absolutely nothing from that review, while risking a lot - hardware getting broken, bad rep possibly etc etc.
Linus: "We can't just go out and get an A100 because it costs almost $10,000"
Also Linus: Creates a solid gold Xbox controller that's worth more than many people's houses
Most people don't even have houses
That gold can be melted down, allowing you to recover most of its value. Try doing that with a graphics card.
@@monsterhunter445 the comment would obviously apply to people that own houses.
You are the type of people who like to quote out of context. but it's okay
Well, if he spent all his money on the golden gamepad, that could be why he can't afford the A100.
A100s are no joke, no wonder AWS wants to bill me three arms and a leg to spin up instances with them just so I can get a "Sorry, we don't currently have enough capacity for this instance type" screen!
The shortage is in the cloud! (Obviously but it's funny to say)
@@CreativityNull "It's... it's all in the cloud?"
*cocks gun*
"Always has been."
I had to write custom scripts which run endlessly to request the p4d instances (which have 8 of those, but the 400W versions) on AWS, as they are not available in any AZs. Luckily the script managed to get one of those after 2 days in us-west-2
to be fair, p4d.24xlarges have 8 of these in them
the reserved prices are not too bad, considering the hardware
Same for top end azure instances rn
Fun fact, our A100 servers (8 80 GB SXM A100s per server) each have a max power draw of close to 5 KW. And Linus and Jake were right! Even with the 80 gig models, we still wish we had more memory. Never enough memory!
Big Iron.
what are you doing with that hardware?
@@velo1337 Skynet, duh.
@@velo1337 ur mum
@@velo1337 Playing Crysis, probably.
LTT: NVIDIA refuses to send us a super powerful gpu
Also LTT: drops a $10k CPU by accident, breaks it, and attempts to fix it with a vise grip
It's not a LTT video if something expensive doesn't get dropped
@@Immadeus he litterally knocks the thing over not too far in lmao
@@Shadowclaw6612 thats the silver one not the one he got sent
If I were the owner of the GPU my condition would be "You have to return it in working condition, or buy a replacement, but do whatever you want"
@@Shadowclaw6612 that's done for comedic effect.
I replaced one of these cards for a customer who had 3 of them in total in a Dell 7515 server running dual AMD Epyc 7763 64 core processors. I remember thinking this APU is worth more than my car.
At that point it may be worth more than a small apartment.
@@megan00b8 *Cries in Australian*
@@megan00b8 In my country, it's worth more than our life long income
It was always fun working in a customer's cage and you open up the shipment that FedEx delivered and it is beat all to hell and find 6 server GPUs or a line card full of 100Gig Optics and realize that the package is worth more than you make in 5 -10 years.
@@IgoByaGo cage?
Just wanted to point out that TensorFlow by default allocates the whole memory even if it's not using it, so the A100 may benefit from a larger batch size
th-cam.com/video/Fe9zPOZvDxI/w-d-xo.html
Yeah! That's what this GPU is for. You can train really big stuff there!
This is usually used in data centers right? so this might be what we've been sharing in cloud computing
@Jesus is LORD Hey dude, remember when those little kids made fun of a guy for being bald, so God sent a bear to kill and eat them?
@@ChristopherHallett man, the old testament God was way cooler than the new testament one. At least regarding roman-era like entertainment
I was one of the people handling repairs on amazon servers and I’ve seen thousands of them. They are crazy. Of course I can’t test them but just holding it you can tell it’s a beast
Wait, thousands went for repair.....? So they break often? 🤔.
@@EnsignLovell i think he meant more in a metaphor Type of way
@@Sn1ffko definitely meant he had to go to the datacenter itself and saw all the cards there in the racks
That’s what she said
@@GodlyAwesome yeah I’ve repaired thousands and thousands of server racks. And they have sections dedicated for graphics cards and stuff. In a single server it would have anywhere between 2-12 graphics cards.
Amazing how I can understand so little yet be so thoroughly entertained. 10/10.
actually funny
@@xqzyu ye
Linus sex tips
I’ve been a PC guru for over two decades and even i’m outta my league here.
Oh thank God! I thought I was the only one
Nvidia should sell this kind of card to miners instead of selling consumer-grade GPUs in bulk to them.
they would still not buy them even with this
@@Whatismusic123 Despite the high price, they still would because it's like 100% more efficient for hashing. Just like Linus said, the running cost (electricity cost) of a gpu for mining far outweighs the price.
yes
Most miners wouldn't buy this because they're just not eligible to. For the cost of 10 of these cards you could've purchased like 50-60 3090's even at these high prices and gave them proper cooling which would far outhash those enterprise cards. Yes it's cheaper to run those enterprise cards for the long term but you'd be looking at how long ethereum will last rather than how long the card will last
This GPU would be pointless to a miner, because it costs $10,000 and it would take them months or even years for them to justify the cost of it from mining, it doesn’t take extremely powerful cards to mine.
I'm glad Jake said "ah, it has an IHS" because for a split second I thought that was all GPU die and nearly had a stroke
amogus
Yeah, me too. That would've been the most monstrous die I've ever seen.
Same. I couldn’t believe what I was seeing!
fucking exactly
yesss, my exact thoughts
I love getting to see the incredibly expensive equipment that runs data centers, even though I understand about half of what they are used for. The efficiency is just insane
Understanding half of what goes on in a data center isn't too bad, though.
Basically, it has half the GPU cores, but way more AI cores to do AI tasks, at about half the power.
6:58
The nvidia employee watching the chip serial number: 👁️👄👁️
Lol
How did they not think of that? The SECOND they showed the taped serial, I thought "they'll probably fuck this up somehow" @@Zatch_Brago
By default, Tensorflow allocates nearly all the GPU memory for itself regardless of the problem size. So you will see nearly full memory usage even for the smallest model.
cuda_error = cudaMalloc((void **)&x_ptr, all_the_GPU_mem);
As much as I like LTT, they never do benchmark's involving AI/Deep Learning properly.
@@gfeie2 start with USIZE_MAX memory and binary search your way down to an allocation that doesn't fail XD
Oh that explains a lot. I was wondering how they managed to tune it so perfectly, because Pytorch would simply crash if you tried to use more memory than available.
should've used pytorch yeah
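For anyone who wants the fix rather than the joke above: a minimal sketch using the standard TensorFlow 2.x config API (the memory cap number is an arbitrary example), which stops TF from grabbing the whole card at startup so nvidia-smi shows something closer to real usage.
```python
import tensorflow as tf

# Allocate VRAM on demand instead of reserving the entire card at startup.
# Must run before the first GPU op initializes the device.
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)

# Alternative: hard-cap TensorFlow to a fixed slice of the card (8 GiB here, arbitrary).
# tf.config.set_logical_device_configuration(
#     tf.config.list_physical_devices('GPU')[0],
#     [tf.config.LogicalDeviceConfiguration(memory_limit=8192)])
```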
@13:55 What you guys are totally missing is that the A100 has fewer CUDA cores, but they do INT64/FP64 at half the throughput of INT32/FP32. The 3090 is what, 1/16th throughput or something? It's meant for higher precision calculation. The desktop and datacenter cores are different. You need to run a test on 64-bit calculations to compare.
Didn't understand but you sound like you know your shit
Nerd
@@kvncnr8031 It does 64-bit math like 10x faster than the 3090. So it's better where you need high precision. Neural networks in particular can get away with much smaller numbers, like 8-bit values in the network. A bit is basically a 1 or 0 in a binary number, so a number can represent a larger value with more bits. Or if it's a floating point number, it can have more precision (i.e. represent more decimal places). For scientific computing, like modelling the weather or physics simulations, you want higher precision math. That's why the A100 is tailored for 64-bit math, whereas the 3090 is tailored for 32-bit math and below, which is the most common precision used for graphics.
I think the amount of Tensor cores is also different. Not even sure the older graphics cards have Tensor cores
@@Daireishi i like your funny words magic man
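Since the thread above boils down to "run a test on 64-bit calculations", here is a rough PyTorch sketch of such a test (matrix size and iteration count are arbitrary, and proper benchmarking would need warm-up runs): on a GeForce card the fp64 number should crater relative to fp32, while an A100-class part should stay around half of its fp32 rate.
```python
import time
import torch

def matmul_tflops(dtype, n=4096, iters=20):
    a = torch.randn(n, n, device="cuda", dtype=dtype)
    b = torch.randn(n, n, device="cuda", dtype=dtype)
    torch.cuda.synchronize()
    t0 = time.time()
    for _ in range(iters):
        a @ b
    torch.cuda.synchronize()
    # Each n x n matmul is roughly 2*n^3 floating-point operations.
    return 2 * n**3 * iters / (time.time() - t0) / 1e12

for dt in (torch.float16, torch.float32, torch.float64):
    print(dt, f"{matmul_tflops(dt):.1f} TFLOPS")
```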
At this point, the GPU has become the real computer and the CPU is just there to get it going.
cpu is the coworker that got in because their relaitive works there
Cpu handles multi tasking/ software management. Without cpu we wouldn't have multiplayer games.
@@cradlepen5621 Singleplayer is the future
The computer is as fast as its slowest component. For example, if you have a game that uses the GPU for everything but for some reason decides to do the shadow calculations on the CPU... you are limited by the CPU.
As you increase the GPU, you need to increase the CPU.
Then you'll be sitting at loading screens thinking "Why is this taking forever to load? My CPU and GPU are a beast"
But you are running a standard HDD...Ahhhh time to upgrade! NVMe SSD FTW!.
It's all a balance and why building your own PC will always be better (when you know when you are doing) compared to just buying a PC.
This is the weirdest take I've read all week.
I like how youtube has labelled this video as "Exclusive Access" as if Nvidia have allowed this at all lol
Linus sex tips
They have? where?
That looks like sponsor block to me...
that's SponsorBlock (it's blacklisted)
I always love the moments where I realize that the 3090 isn't the peak of its generation.
They probably have the technology for 10x the 3090, but it's not good for business to release it all now
In terms of gaming cards, it is top of the line
@@evanshireman5644 well it's not. the 6900xt is mostly faster at 1080p and even at nvidia, there's a 3090 ti in existence.
@@ProjectPhysX except for those that are memory limited.
@@Ornithopter470 yep, you can never have enough memory... but 80GB is already quite a lot :D
One day our grandkids will call this GPU the "potato/calculator", just like we call all the hardware that launched people into space 50 years ago...
well we did hit the size limit for our logic gates and whatnot, and quantum tech is only used for crunching numbers. So that's unlikely.
Crazy to think this much power could be available in a phone in 10 years.
@@jorge69696 Also no, size constraints
Ah yes the A100..
An outdated historical relic compared to tech in 2077
or the classic we have those in our phones now
PS3 and Xbox360 games still look graphically impressive. We're not advancing as fast as before.
The reason I like Linus videos is that even though I don't understand 90% of the content, I still enjoy watching it without skipping a second. Keep it up dude
A100 vs RTX 3090
The A100, having a similar or slightly higher number of lanes per core, runs at lower power consumption while having almost 2x the compute power.
So the A100 is more efficient at number-crunching workloads, but not so much at graphical loads.
Same thought mid video
@@DmanLucky_98 A100 looks like a big golden chocolate bar
@@DmanLucky_98 A100 go brrr
Bro I'm here for the segways
Nice comparison. You could've rented one of those bad boys on Azure for less than $4/hour for the benchmark. In fact, 8 A100 GPUs connected through NVLink are expected to be about 1.5x faster than stacking 8 A100s connected through the motherboard.
While machine learning can be sped up using more memory, there are things that you literally can not do without more VRAM. For example, increasing the batch size even further will very quickly overwhelm the 3090. Batch size, contrary to popular belief, is not "parallelizing" the task, but actually computing the direction of improvement with higher accuracy. Using a batch size of one for example would not usually even converge on some datasets, and even if it does, it would take ages to do so.
big batch sizes dont converge necessarily either, which is why you might want to start with a big one but lower it eventually as training goes on
Also, it depends a lot on what is used. If you're running inference and your model is big, it will need a lot of VRAM (proportional to the model size) and won't run if it doesn't have enough. You *could* split the model between cards, but then you run into bandwidth and performance problems.
I assume we're talking about neural networks. Using bigger batches just means feeding more data sets into the model before backpropagation. Why does this increase memory usage linearly?
@@nottheengineer4957 Imagine sending 32 images of 512x512 pixels in size with three channels; that's a batch of 32, which would be an fp32 tensor of size 32*512*512*3. A bigger batch size would mean a larger floating point array to be handled by the GPU. So, a batch of 64 would be a tensor of 64*512*512*3. This effectively doubles the total memory required to process the tensor.
I understood like two words
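Putting rough numbers on the example above (assuming fp32 inputs): the input batch really does scale linearly, though in practice it's the saved activations, which also grow with batch size, that eat most of the VRAM.
```python
bytes_per_fp32 = 4
h, w, channels = 512, 512, 3

for batch in (32, 64):
    # One input tensor of shape (batch, 512, 512, 3) in fp32
    mib = batch * h * w * channels * bytes_per_fp32 / 2**20
    print(f"batch {batch}: {mib:.0f} MiB just for the input tensor")
# batch 32:  96 MiB
# batch 64: 192 MiB -> doubling the batch doubles it
```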
Jake is really growing
Good
Jake is how I imagine young gabe newell
@@tabovilla a Mousquetaires Gabe
Jake show I imagine you gabe newall
Good
Linus - “I like to go in dry first.”
Jake- *Please don’t look at me.*
😂
Why does Linus surround himself with fat dudes ???
@@travisash8180 they bring food with them
@@aryanluharuwala6407 I think that Linus is a chubby chaser !!!
That is definitely NOT what she said.
Linus you look great with a beard! Long time sub here from wayyyyy back when you and Luke did those build wars, watching your videos back when you had that old space where you connected each PC daisy chained to a copper water cooled setup. It's awesome to see your sub count and how far you've come since I last watched your videos. Hope you and your family are all well and enjoying the holiday season!
@Teamgeist the beanie suits him!
250W for such a card is excellent. I was expecting more like 400W and up.
7nm TSMC, that's why
@@mihailcirlig8187 i think the A178-9 and the NVIDIA 9050 is way faster. I have it currently.
250 is still a lot bro
nvm
The A100 SXM version does have a 400W draw.
Everyone with any pc building experience: "So graphics cards take pci-e power connectors and attempting to plug an eps connector in instead would be bad right?"
Nvidia: "Well yes but no"
it’s a power connector. it’s like saying nema 5-15p connectors can only be used in the usa.
@@snowyowlll Well yeah i'm just referring to the pinout
(Yes i know they're made so it's impossible or at least a lot harder to put one connector in the wrong spot)
So, funny thing about that. The keying for PCI Express 8-pin and EPS 12 volt is basically compatible. The only difference between the two connectors is PCI Express has a little tab between pins seven and eight. If you were to plug a PCI Express power connector into an EPS 12 volt port you'd basically end up shorting 12V to ground. I may or may not know from experience 🤪
Trying to imagine a world where fans reach out to you to give you a 10k GPU whilst I struggled to obtain a 3060 so much that I bought a whole prebuilt PC just to pull it lol
Influencer live is pretty dank, innit... ahh, the dreams...
lmaooo even i did the same thing recently
I have a 1060
For me, I bought a laptop instead. Lenovo Legion 7 (16" 16:10 version) with an RTX 3060. You would think a laptop with that GPU wouldn't have the same performance as the desktop equivalent, but the laptop is big enough for the heat and everything that it is extremely close. It runs at the same performance as, and sometimes higher than, my friend's RTX 2070 desktop GPU
Oh and it was around £1600, one of the best bang-for-the-buck price/performance picks for a gaming laptop. Beaten only by the Lenovo Legion 5 Pro, which is a bit cheaper but looks quite a bit uglier
When I interned at this machine learning lab I got the opportunity to train my models on a supercomputer node which had 4 of these cards. Even though my code was not optimized at all, it crunched my dataset of 500,000 images for 80 epochs in about 5 hours. For reference, my single RTX 2060 Super card was on track to do it in about 4 days.
I think the main advantage of these cards in machine learning is mainly the crazy amount of memory. My own GPU could handle batches of 64 images while the node could handle at least 512 with memory to spare (I didn't go further as the bigger batch sizes give diminishing returns in training accuracy)
I get what you're getting at, but that comparison seems a bit extreme. If you put your workload on one A100 that costs $10,000, and then on two 3090s that cost you $2,000, you would save a lot of money and get better performance. If you consider the power usage then yes, you would be saving, but to get to $8,000 worth of difference it would take many years. People of course pay for these things because they are made with tons of memory and linkability, and data centers need that, but comparing just processing power these chips aren't better than the more affordable gaming cards. There's a big price hike that Nvidia applies to the pro cards because they can, and the clients can and do pay.
As a tip, nvidia-smi runs on Windows too, it's included in the driver.
I used to use it to lower the power target without needing to install anything.
mine always closes immediately and I can't change settings. Been working to get a Tesla functional on my rig and haven't been able to just yet.
@@tobiwonkanogy2975 Add that directory to the windows path.
Interesting thing with NVIDIA drivers is that they are essentially the same cross platform. That's why NV wont release source.
smi can also be used to overclock and adjust memory timings, that 174MH could be 200+ with tweaks.
Thanks for the tips ill try em out
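A small sketch of the kind of scripting mentioned above, calling nvidia-smi from Python with its standard flags (GPU index 0 and the 250 W target are arbitrary; the binary usually ships with the Windows driver, and changing the power limit needs an elevated prompt):
```python
import subprocess

# Query name, power draw/limit and temperature for GPU 0.
out = subprocess.run(
    ["nvidia-smi", "-i", "0",
     "--query-gpu=name,power.draw,power.limit,temperature.gpu",
     "--format=csv"],
    capture_output=True, text=True)
print(out.stdout)

# Lower the power target to 250 W (run as administrator).
subprocess.run(["nvidia-smi", "-i", "0", "-pl", "250"])
```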
"You can do anything you want with it"
Linus: *drops the card*
Linus: "I've found my gold"
Jake: "what?"
Linus: "Yvonne"
Jake: *dies of cringe*
the way he said "Yvonne" was so endearing tho
@@WyattWinters I mean to be fair, that's how I feel about my wife and when you find the one you just know it
Such a lovely moment! I hope she sees it accidentally and smiles
Jake: *dies of cringe*
Audience: AAAWWWW that's so sweet!
As a married man, I saw this coming from a mile. That's sweet.
Building a new server at work and I’m using one of these. Pretty excited
What sort of things do they even use these for, is it like protein folding models in biotech firms or something?
I haven't messed around with an nVidia Tesla GPU past the Maxwell line but I do remember it is possible to switch them to WDDM mode through nVidia SMI in Windows command prompt which will let you use the Tesla GPU for gaming provided you have an iGPU passthrough. By default, nVidia Tesla GPUs like the A100 will run in compute mode which Task Manager and Windows advanced graphics settings won't recognize as a GPU that you can apply to games and apps. But idk if WDDM has been removed in later nVidia Tesla GPUs like the A100 or not.
You said you did what in the who now 😕😵
I recall reading WDDM not being available by default on some modern Tesla cards because the standard drivers only support TCC mode and specific driver packages from Nvidia are needed to do it. I have no idea how this applies to Ampere but I imagine it's similar.
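For reference, the TCC/WDDM switch mentioned above is also just an nvidia-smi flag on Windows, on cards and drivers that still allow it (sketch only, GPU index 0 assumed, and a reboot is needed for it to apply):
```python
import subprocess

# --driver-model / -dm: 0 = WDDM (usable for display/games), 1 = TCC (compute only)
subprocess.run(["nvidia-smi", "-i", "0", "-dm", "0"])
```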
You need more halo lore vids lol
@@SkullGamingNation fr
so strange when two of my completely unrelated hobbies come together randomly like this
So glad you guys are now including AI benchmarks. Please continue to do so! Some of your viewers are gamers and data scientists!
What is a data scientist In terms an idiot can understand? 😂
SOME of their viewers are gamers?
@@DakanX Some viewers are *BOTH* gamers and data scientists.
It was just good here because of the GPUs involved.
You'd get a significant boost in speed with Blender when you render with a GPU if you set big tile sizes, like 1024 or 2048, under the performance window.
@@Barnaclebeard me? What? why?
256 for 1080p renders and up. that is how you get the fastest speed. if it is 4k, you go with 1024.
I don't get it. I got a 4GB doodoo GPU and Blender automatically sets it to 2048
@@1e1001 modern Blender doesn't use tiles the same way
@@1e1001 but in the video there is older blender 2.9 or 2.8
"No I like to go in dry first" Accurate depiction of how Linus' treats hardware
Linus: "We'll mask the serial so they can't find the person."
**Shows the chip serial instead**
also device id at 14:36
...hmm
@@hrithvikkondalkar7588 The device ID is not unique. Every single card of the same model will have the same Device ID. For example, every 980Ti the same as mine (I can't say which specific model of 980Ti it is, as I bought it second hand with a waterblock fitted) will show 10DE 17C8 - 10DE 1151. You can google that and see for yourself.
That's not the chip serial. That's the model and revision.
Device ID is same across GPU models, it's part of the PCIe spec.
The difference in finishes you see at 6:10 looks like part of the shroud was milled. The matte parts look to be as-is (likely stamped or cast depending on the thickness). The smoother parts with lines going in squares are milled (kind of like a 3D drill to cut away material). This means they were taking higher-volume parts and further customizing them for these cards (milling is done for much smaller production runs than stamping or casting/molding).
Jake’s “It’s just so thick, why would you ever use it?” about the “spiciest” 3090 DID NOT age well now that the 4000 series is out 😂
But the 4000s suk
@@everythingsalright1121 they're great gpu's just the price is out of this world
@@everythingsalright1121 Turns out that was a lie. the 4080 and 4090 are really good cards, they're just horribly overpriced.
For training deep learning AI, machine learning or similar, this one is a beast. For rendering it's also great, because both need lotsss of GPU memory.
Agree, my AI buddy already has his company ordering a few (80 GB model), where 2 of them will go into his high-end workstation. How lucky. But like you said, if you have large scale learning data sets or are doing deep learning, these cards are at the top. Anything else, and these cards are likely not worth it.
you know tbh
this gpu could make a nasa supercomputer
6:15 This might be the point where the case was mounted to a big industrial suction cup. Manufacturers often do that when spray painting a piece of metal. You can see that on a lot of metal stuff that doesn't need to look good from the inside.
Whoever sent that in is literally putting their job on the line and in the hands of a clumsy Linus, watching him take this apart gave me huge anxiety! 😂
Kinda doubt it, if you need that much, why send it? He now can't use it for x amount of days
Probably "borrowed" from his work and hoping his manager doesn't see any identifying marks on the missing card.
Also, if you gonna steal a 10k card you're probably not that bright to begin with.
@@RomboutVersluijs He wanted it back with a cooler and a mining benchmark; he probably doesn't know how to use it. lol
Nope, probably just a miner with extra cash looking to increase efficiency. 70% higher hash rate over a 3090 w/ 25-30% lower power consumption seems good, but at 3-4x the initial cost. It will take 5 years of continuous running to pay for itself, assuming about $5.50 a day profit. A 3090 will pay for itself in 2.5 years... this is all of course assuming crypto remains completely flat, which is highly unlikely.
If I can get some of these at a good discount I will probably pick some up.
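The payback claim above is easy to sanity-check; a quick sketch using the commenter's own figures (the 3090 price is an assumed scalper-era $3,000, and this ignores electricity, difficulty and coin-price swings):
```python
a100_price, a100_daily = 10_000, 5.50          # ~$5.50/day claimed profit
rtx3090_price = 3_000                          # assumed street price at the time
rtx3090_daily = a100_daily / 1.7               # A100 claimed to hash ~70% higher

print(f"A100 payback: {a100_price / a100_daily / 365:.1f} years")        # ~5.0 years
print(f"3090 payback: {rtx3090_price / rtx3090_daily / 365:.1f} years")  # ~2.5 years
```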
"Yo test mining with it", dude prolly jacked the thing from somewhere or got it from the market and threw it a linus before he puts it with the rest of his mining operation to see what he's dealing with.
"No, I like to go in dry first" 6:53
- Linus 2022
"Your butt is nerds butt" Yea these guys super ghaaaaaayyyyyyyyy
Can you imagine the process that guy probably had to go through for sending that card over? Like disclosures for if Linus drops it or Jake misplaces a screw lol
Number one thing I thought of when I saw the title was there done with linus dropping their shit 😂
Well, since it was quasi-legal and trying to keep it on the DL, I'd say he just wrapped it up in bubble wrap and a box and sent it UPS.
Pretty sure if Linus broke it he'd buy a new one
@@filonin2 well, it's 100% legal, he just didn't want to ruin relationship with Nvidia.
I would not trust a shipping company to handle it appropriately during transit...
I hope the guy that is the owner of this card didnt die 14 times from a heart attack...
Also thank you actual owner for making all of us able to watch this tear down and video!
Tensorflow allocates all of the GPU that you give it. That's why the VRAM usage is almost 100% in both cases. 512 batch size on a ResNet50 barely uses any memory, so this benchmark might not actually be pushing the cards to their limit.
22:08 that aged well...
Indeed
When, after admiring Linus and the crew for years and counting, you realise you bought a pair of those babies at work and you have an SSH key to log in and use them, you immediately figure out how far you have come since the first inspiration you got from LTT. Thanks guys, you are a good part of where I've got to!
People encrypt their backup.
You better ALSO encrypt your .ssh/ xD Holy crap. Congrats on your achievements tho!
I like how they always make a separate video for the top-of-the-line HPC/professional Nvidia card of each generation and hype it up like it's a gaming card that just released, instead of one that came out a year (or more) ago. I don't mean that in a negative way.
Note on TensorFlow and VRam utilization: TensorFlow allocates all of the available VRam even though it might not use all of it. Furthermore, in my studies, models ran considerably slower with XLA enabled. Would be interesting to know how the cards perform with XLA off!
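For anyone wanting to rerun that comparison, a minimal sketch of the two usual ways to control XLA in TF 2.x, so it can be benchmarked off and on:
```python
import tensorflow as tf

tf.config.optimizer.set_jit(False)   # turn global XLA auto-clustering off

@tf.function(jit_compile=True)       # ...or opt in per-function only
def step(x):
    return tf.nn.relu(x)
```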
1:59 ✈️ 🏬🏬 bro 💀💀💀
😮
what's wrong with you
@@madbanana22 maybe he got free healthcare that's why he's joking about lol
Bro wtfff 😭😭😭😭
1:48 Yeah.... That's a VERY GOOD SIGN about Linus returning the fan's GPU *"in working condition".*
"You can do anything you want with it"
*Linus chooses to drop the card*
The HBM2 will be saving quite a bit of power vs the GDDR6X on the 3090. It'll also be a huge boost in some workloads. TSMC's 7nm process is no doubt better than Samsung's 8nm; it'll be interesting to see how Lovelace and RDNA3 do on the same 5nm node.
TSMC N7 is better than Samsung's 8 nm for sure, but the reason the A100 is so much more efficient than the 3090 is not because of the die technology.
RDNA3 will use MCM technology. This will possibly allow AMD to win in rasterization performance and be much more power efficient than Lovelace.
People will shit on you if you even dare to mention that RDNA 2.0 is worse than Ampere as an _architecture_ because it has a pretty significant node advantage and still only trades blows with Ampere. But just look at this Ampere on TSMC's 7nm, it's quite darn efficient. It will indeed be interesting to see the Lovelace vs RDNA3 on the same node.
@@PAcifisti well yes, but this A100 card also has a die like 3x the size, so it spreads heat out better than any of the gaming cards
@@PAcifisti To be fair, the A100 has low clocks and has a massive die size at 830mm2.
It's not even fair lmao.
Same thing about current desktop Ampere: 20% larger than the largest RDNA2 die.
I would so buy this whole thing
the promoted service at the beginning seems very tempting
When Linus said "I found my gold" I thought "how sweet, he’s talking about his wife" and jake was just "pshhh please" xD
5:15 Great, now Nvidia can super sample that fingerprint and find the technician that assembled the card, figure out where the technician worked, then where this card was assembled, then track down where it was sold, so they can find who it was sold to. oh no 😂😂😂
I know this is just a joke anyway, but that "technician" is probably some chinese kid assembling hundreds of those a day.
You need to add some stupid detail to narrow it down further. Maybe there are 9/10 prints on the card and this is explained by the technician having a cut on his 10th finger and taping it. But only for 20 minutes for the bleeding to stop, so you can narrow it down to a handful of cards whose owners you can then manually check out.
Yes, I totally should write episodes for Navy CIS.
@@lunakoala5053 I mean imagine working in a factory to be assembling something this expensive
@@lunakoala5053 Collab with Joel Haver maybe? damn ❤️
@@fjjwfp7819 *imagine working in a factory to be assembling something this expensive, that ends up in Linus steady hands 😂😂
I think the "cooling solution" would have worked better if you would've reversed the airflow
ok
ok
ko
ok
so...does it?
Bend a multi-layer PCB and you run the risk of breaking traces, guys, which may "reconnect" temporarily then disconnect randomly under heating/etc.
That fan sending the GPU in for them to do whatever they want to it almost makes up for them being a cryptobro.
Almost.
You are the reason they decided to stay anonymous. You and the whole toxic gaming community.
cringe
@@joeschmo123 what is cringe
@@AR15ORIGINAL you
@@joeschmo123 W
Linus: "It's not ribbed for my pleasure"
They're never ribbed for "Your" pleasure.
Flip it inside out? Lmao
No, Linus said it quite right... ;-)
@@oldguy9051 it means he's the one getting poked
Depends on your configuration.
Hey, we don't know what Linus and Yvonne are into.
"All 40 GB used!"
Well, that's just how TensorFlow works. It reserves all the memory on the card. With batch size 512 and fp16 training, ResNet-50 will use maybe 16 GB? Not sure, I use PyTorch.
Your comment makes me question my education.
I like playing Minecraft and watching YouTube while I eat chickey nuggeys. We are not the same.
Came here for this comment. Although your estimate seems to be off: the memory usage for half-precision ResNet-50 at batch size 512 should be closer to 26 GB, putting it out of reach for training on a 3090.
I am glad this kind of production work finally gets coverage though.
That was probably the rate it was being filled though. When I load GPT-NeoX-20B PyTorch allocates 40GB almost instantly, and then fills it up. That's different to loading a model with HuggingFace transformers, where usage increases relatively gradually like the use case in the video.
@@maaadkat ResNet 50 is a different model though. Comparatively tiny by today's standards. Most of that memory is used by intermediate activations.
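For anyone curious what the model actually needs rather than what the framework reserves, here's a rough PyTorch sketch, using ResNet-50 at batch size 512 in fp16 to mirror the numbers discussed above (whether that actually fits obviously depends on the card), that reports peak allocated memory after one forward/backward pass.

```python
# Minimal sketch, assuming PyTorch with CUDA and torchvision installed.
import torch
import torchvision

device = torch.device("cuda")
model = torchvision.models.resnet50().to(device).half()   # fp16 weights
x = torch.randn(512, 3, 224, 224, device=device, dtype=torch.half)

torch.cuda.reset_peak_memory_stats(device)
out = model(x)          # forward
out.sum().backward()    # backward, so activations and gradients are counted
torch.cuda.synchronize(device)

print(f"peak allocated: {torch.cuda.max_memory_allocated(device) / 2**30:.1f} GiB")
```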
1:48 The fear in that man's face is surreal. Like he had seen a ghost 😂
6:53 Linus out of context:
I'd be interested in seeing how they compare rendering a single tile all at once. That's what got me when I switched from a 2080 Ti to a 3090: I didn't realize right away that it would render 2K tiles without sweating, but rendering out a bunch of smaller tiles, the two were posting basically the same times.
ooooh, unlisted with 5 views? nice
Yup
ok
Why is it unlisted?
@@noahearl nah, I just got to the video originally when it was unlisted with only 5 views
@@Dominatnix me too 😃
"I'd like to go in dry first" -Linus Sebastian 2022
Ohhhh this takes me back to my grad school days when these cards were branded as "Tesla." It was wild how a single (albeit $10k) graphics card could enable me to run numerical simulations faster than I could on my undergrad's Beowulf cluster. This was back in the days when, if you wanted to do GPGPU, you were writing your own low-level routines in CUDA's C library and mostly having to figure out your own automation. I can't even imagine how things have changed now that you can abstract everything to a high-level Python API and just let the thing do its thing.
cool story
Thats pretty darned hardcore :)
No I totally understand what you mean. Every single word. Totally.
It's pretty fucking dope. Source- I worked with a guy doing his PhD in computational neuroscience using a python API machine learning algorithm to search for putative appositions in rat brains.
that sounds like it makes sense.
Check the temperature of the A100. If it's touching 80 °C then it's throttling. I tried a similar setup with a V100; there was a huge performance gain using a 6,000 RPM fan for cooling.
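If you want to watch for that while a job is running, here's a minimal sketch using the NVML Python bindings (pip package nvidia-ml-py; GPU index 0 is an assumption) that logs temperature and SM clock, so you can see the clocks sag once it gets hot.

```python
# Minimal sketch, assuming the NVML Python bindings (pip install nvidia-ml-py)
# and that the card of interest is GPU index 0.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

for _ in range(10):  # sample once a second for ~10 seconds
    temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
    sm_clock = pynvml.nvmlDeviceGetClockInfo(handle, pynvml.NVML_CLOCK_SM)
    print(f"{temp} C, SM clock {sm_clock} MHz")
    time.sleep(1)

pynvml.nvmlShutdown()
```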
9:20 was so unexpectedly sincere and now I’m crying while watching a fracking tech tips video
6:53 Linus: Nah, I like to go in dry first...
great 😄
19:37 The memory usage is a bit misleading. If you are using TensorFlow, it always reserves all the GPU memory when it launches, even though it doesn't actually use that much.
yes
yeah, you need to set the memory growth to true, to actually get an accurate memory reading.
If it was meant for crunching big datasets and machine learning, I would expect it to be optimized for massive mathematical computations (large matrix multiplications), not rendering.
Correct - the FP64 numbers even on the A5000/A6000 completely destroy the RTX 3090. I'm not sure why he tested Blender with it lol.
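A more representative micro-benchmark than a Blender render would be a big dense matmul. Here's a minimal PyTorch sketch (8192x8192 matrices are an arbitrary choice, just large enough to be compute-bound) that times FP32 against FP64 on whatever GPU is installed; the FP64 ratio people mention shows up directly in the two numbers.

```python
# Minimal sketch, assuming PyTorch with CUDA; 8192x8192 is an arbitrary size.
import time
import torch

def time_matmul(dtype, n=8192, iters=10):
    a = torch.randn(n, n, device="cuda", dtype=dtype)
    b = torch.randn(n, n, device="cuda", dtype=dtype)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        a @ b
    torch.cuda.synchronize()
    elapsed = (time.perf_counter() - start) / iters
    tflops = 2 * n ** 3 / elapsed / 1e12   # ~2*n^3 FLOPs per n x n matmul
    print(f"{dtype}: {elapsed * 1e3:.1f} ms/iter, ~{tflops:.1f} TFLOPS")

time_matmul(torch.float32)
time_matmul(torch.float64)
```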
as a structural biologist that uses this kind of hardware fairly often, it's really cool actually contextualising how crazy the hardware we use is hahahaha
No you not
man I wish I was half as smart and cool as you guys. I enjoy watching these videos as a casual consumer and it's so cool you guys basically dance around code and near-instantly recognise items with ease! I mean, seeing the teardowns is always a joy too.
You can fit a whole operating system in the VRAM of that card. Crazy.
Windows 11 is like less than 10 GB, maybe even 5
I know what you mean, but you know, DOS fits on a floppy
Most Linux Distros as well
There are fully featured Linux distros that are less than 100 MB, so that's not really that impressive.
Use lubuntu. That's what, less than 1 GB? It performs pretty decent too.
This must be EXTREMELY confidential if you had to go to SUCH an extreme just to prevent your "lender" from being identified. I know doing this is very much a risk for them, so good on you for minimizing that risk for them.
Okay, I, just, perfectly understand, but, unfortunately, I have one, huge, question for you, Linus. Preventing your lender from being identified, it, just, kind of boggles me. This is, because, in the words of car enthusiast, @Doug DeMuro, in his first video of one of Japanese car manufacturer, @Nissan’s 4, very obscure, “Pike Cars”, The 1989 @Nissan S - Cargo, quote, “Why would you do this?”. I need, in this context, that, question to be answered, and, after you ship it back, Linus, actually, reveal who, exactly, though, sent you the card, or, maybe in your next Tech Tips episode, or, maybe, even, in next week’s WAN Show.
@@avoprim5028 Bruhhh there's a reason why they're unidentified.
@@avoprim5028 Why bring Doug into this?
@@leepicgaymer5464 yeah im confused
What's Nvidia gonna do though, assassinate the lender? Just sounds paranoid to me.
This is a very interesting video. I liked it a lot more than I thought I would. There isn't typically much coverage of Nvidia's data center line from tech YouTubers, so it was fun learning a bit about that aspect of their GPU division 😊👍
The deep learning performance mostly comes from more memory and faster memory.
I love Linus and Jake's chemistry so damn much and I love how it has evolved.
Jake started out as the apprentice, while nowadays he's actually the more mature of the two
I found it the opposite, awkward between the two
Ant's the real brains at LTT
@@fuckingwebsite1 The Mancubus, thankyou very much.
you love their chemistry ? do you want to see them naked "googling" each other ? you do ! don't you !
@@yellowboat8773 must have been one of those negative reality inversions...
6:56 in response to Jake’s alcohol offer, Linus - “I like to go in dry first.” Did you notice Jake’s immediate modified face-palm? We know you laughed, Jake. Gotta ❤️ these guys.
Tensorflow grabs all of the GPU memory unless you have growth enabled.
There's a typo at 3:25
The A100 has ~54 billion transistors on it. The 54.2 million listed would put the card firmly between a Pentium III and a Pentium 4 in terms of transistor count, with a curiously big die for the lithographic node.
Glad to see them represented :p We have two of them in our GPU server
Insane!
what does your server do?
@@suyashsingh9865 he runs a nasa computer
@@suyashsingh9865 we run AI workloads on it
Just wanna say Linus' cheesy but wholesome shoutout to his wife at 9:18 made my day. Classy move, Linus, classy move.
Jake: That's not much alcohol
Linus: No, I like to go in dry first
Linus you monster if you are going to go in dry, at least provide more alcohol
you guys are the geekiest.....I don't understand a thing you are talking about, but I am fascinated and really enjoying watching and listening to you geek out....I will like and subscribe just to reward your enthusiasm!!
how tf r they geeky
@@spaghettiarmmachine7445 here is the dictionary definition of "geek"
"engage in or discuss computer-related tasks obsessively or with great attention to technical detail.
"we all geeked out for a bit and exchanged ICQ/MSN/AOL/website information"
It was not meant as an insult or derogatory. I do believe they engaged in computer related tasks with great attention to technical detail. Anyway I loved their enthusiasm for their subject and although I did not understand it.... I enjoyed watching their absolute joy discovering the technical intricacies of the product they were reviewing. Sorry if I offended you...
@@spaghettiarmmachine7445 How tf do you watch this video and *NOT* think they are LMFAO.
Like bro, when you're literally fawning over a piece of computer tech that pretty much no normal consumer will ever own in their life, and spitting nerd facts and terminology that almost nobody will understand unless they have a very deep grasp of the subject matter... at that point that's literally the definition of the word.
@@cantunerecordsalvinharriso2872 Don't apologize to these spoon brains lol. They all dress up in their mothers undergarments.
@@spaghettiarmmachine7445 get
Yeah, I've seen AI papers that say they had to train AI models on 512 of these for 10 days straight! The cost of neural networks is immense, both financially and environmentally.
lol and yet the human brain runs on about 20W of power... And probably dwarfs those cards in terms of computation power :D
@@TheMightyZwom Yeah, but humans take about 20 years to train at the low end
@@TheMightyZwom Inference vs training :) (I see Tomi97_videos pointed it out!)
@@Tomi97_videos That's an associate's degree.
@@TheMightyZwom Idk, a lot of what you eat goes to your brain, it's not just the watt output
Linus Torvalds: "Nvidia, f*ck you!"
Linus: "Nvidia didn't want to send me this card, but I got it anyways"
Thats the same thing right there
Not even close.
Two tech masters linus and linus
Torvalds' "f you" is genuine, and he has no interest in NVIDIA's/Intel's money to keep quiet about their practices ;-)
"look how crazy efficient nvidia is!" isn't quite the same as "fuck you".
@@lunakoala5053 Linus is still learning.
This thing would eat large batches of training data like a champ. I used it in a research lab, absolutely insane cards.
I LOOVEE HOW HIS INTRO IS STILL IN THE 2015 STYLE
It's pretty amazing how fast we can render stuff with consumer hardware nowadays. Blender's 3.0 version (with Cycles X) renders even faster now. What used to take multiple minutes a few years ago with my GTX 1080 now takes less than 30 seconds with my 3080
Wow, this guy had a 1080. Way to flex on everyone still trying to find something better than a 730 lmao
@@sakaraist Not his problem when some people are this far behind what's usable at the moment.
Yeah, and Linus is still using the 2.x version of Blender here, which uses tile rendering. 3.0 uses progressive rendering and is in some cases double the speed. Linus should upgrade Blender :)
@@sakaraist Lmao I sold it at the start of the gpu shortage when prices were still relatively normal so I didn't get much for it
@@sakaraist My 5600g be like,
The fastest GPU ever video arrived in my recommendations 18 seconds after publishing. That's fast!
So, given the extra memory on the A100, you should be able to have a significantly larger batch size, which means way more samples/second.
Switching a 3090 card with an A100 while keeping the 3090 cooler on it would be the most expensive and weirdest gift/prank on your homie. Or if it's not a gift, I mean, it's a pretty weird flex
The near-double performance you're getting near the end is because of Nvidia's 1:1 fp16 ratio on consumer GPUs; Nvidia also limits consumer GPUs to a 1:64 fp64 ratio.
Basically they artificially limited consumer GPUs to force machine learning (wants fp16) and scientific computing (wants fp64) people to buy way more expensive "professional" products.
LHR is just a newcomer to the party, not exactly a new concept.
Same energy
Can you elaborate on what you mean by '1:1 fp16 ratio' and '1:64 fp64 ratio'? (I am familiar with the datatypes themselves and the consequences for performance/vectorization on CPUs)
@@Cyberguy42 Throughput compared to fp32: 1:1 meaning it's the same speed, and 1:64 meaning fuhgeddaboudit.
With a tiny bit of extra circuitry you can make fp16 go twice as fast and fp64 half speed compared to fp32.
At least Nvidia relented and unlocked fp16, because that used to run at abysmal speed as well, so you can at least save on memory (which is also scarce on consumer cards) and make use of tensor cores for some operations.
BTW AMD has full speed FP16 and 1:16 FP64 on their consumer GPUs.
Yeah, that's not really accurate. GeForce cards physically don't have many fp64 units, to keep them cheaper; also, tensor-core fp16 matmul with a 16-bit accumulator runs at full rate. Only fp16 matmul with an fp32 accumulator is artificially restricted, which can be useful in some workloads, but often accumulation in fp16 does the job. The 1:1 fp16 ratio you talk about is on CUDA cores, which nobody uses for deep learning since tensor cores are much faster.
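Same idea as the FP64 timing sketch further up, just at half precision: a minimal PyTorch sketch comparing fp16 and fp32 matmul throughput. Whether the fp16 path lands on tensor cores, and with what accumulator precision, is up to cuBLAS and the GPU, so treat the ratio as indicative only.

```python
# Minimal sketch, assuming PyTorch with CUDA; same timing loop as the FP64
# sketch above, this time comparing fp16 against fp32.
import time
import torch

def bench(dtype, n=8192, iters=20):
    a = torch.randn(n, n, device="cuda", dtype=dtype)
    b = torch.randn(n, n, device="cuda", dtype=dtype)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        a @ b
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

fp32 = bench(torch.float32)
fp16 = bench(torch.float16)
print(f"fp32: {fp32 * 1e3:.1f} ms, fp16: {fp16 * 1e3:.1f} ms, ratio ~{fp32 / fp16:.1f}x")
```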