As a structural biologist who uses this kind of hardware fairly often, it's really cool actually contextualising how crazy the hardware we use is hahahaha
Check the temperature of the A100. If it's touching 80 °C then there is a performance throttle. Tried a similar setup with a V100; there was a huge performance gain using a 6000 RPM fan for cooling.
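(For anyone who wants to watch for that throttle themselves, here's a minimal sketch using the nvidia-ml-py/pynvml bindings. GPU index 0 and the 80 °C threshold are just assumptions for illustration.)

```python
# Minimal throttle check via NVML (pip install nvidia-ml-py).
# Device index 0 is an assumption; adjust for multi-GPU machines.
from pynvml import (
    nvmlInit, nvmlShutdown, nvmlDeviceGetHandleByIndex,
    nvmlDeviceGetTemperature, nvmlDeviceGetClockInfo,
    NVML_TEMPERATURE_GPU, NVML_CLOCK_SM,
)

nvmlInit()
try:
    handle = nvmlDeviceGetHandleByIndex(0)
    temp_c = nvmlDeviceGetTemperature(handle, NVML_TEMPERATURE_GPU)
    sm_mhz = nvmlDeviceGetClockInfo(handle, NVML_CLOCK_SM)
    print(f"GPU temp: {temp_c} C, SM clock: {sm_mhz} MHz")
    if temp_c >= 80:
        print("Warning: likely thermal throttling; watch the SM clock under load.")
finally:
    nvmlShutdown()
```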
I'd be interested in seeing how they compare rendering a single tile all at once. That's what got me when I switched from a 2080 Ti to a 3090: I didn't realize right away that it could render 2K tiles without breaking a sweat, but when rendering out a bunch of smaller tiles, the two cards were posting basically the same times.
It's pretty amazing how fast we can render stuff with consumer hardware nowadays. Blender's 3.0 version (with Cycles X) renders even faster now. What used to take multiple minutes a few years ago with my GTX 1080 now takes less than 30 seconds with my 3080
Yeah and Linus is still using the 2.x version of Blender here which uses tile rendering. 3.0 uses progressive rendering and is in some cases double the speed. Linus should upgrade blender :)
I love Linus and Jake's chemistry so damn much and I love how it has evolved. Jake started out as the apprentice, while nowadays he's actually the more mature of the two
If it was meant for crunching big datasets and machine learning, I would expect it to be optimized for massive mathematical computations not rendering. (large matrix multiplication)
19:37 The memory usage is a bit misleading. If you are using TensorFlow, it always reserves all the GPU memory when it launches, even though it doesn't actually use that much.
Ohhhh this takes me back to my grad school days when these cards were branded as "Tesla." It was wild how a single (albeit $10k) graphics card could enable me to run numerical simulations faster than I could on my undergrad's Beowulf cluster. This was back in the days where if you wanted to do GPGPU, you were writing your own low-level routines in CUDA's C library and mostly having to figure out your own automation. I can't even imagine how things have changed now that you can abstract everything to a high level python API and just let the thing do its thing.
It's pretty fucking dope. Source- I worked with a guy doing his PhD in computational neuroscience using a python API machine learning algorithm to search for putative appositions in rat brains.
There's a typo at 3:25 The A100 has ~54 Billion transistors on it. The 54.2 million listed would put the card firmly between a Pentium III and a Pentium 4 in terms of transistor count with a curiously big die for the lithographic node.
This must be EXTREMELY confidential if you had to go to SUCH an extreme just to prevent your "lender" from being identified. I know doing this is very much a risk for them, so good on you for minimizing that risk for them.
Okay, I understand perfectly, but I have one huge question for you, Linus. Preventing your lender from being identified just kind of boggles me. In the words of car enthusiast @Doug DeMuro, in his first video on one of @Nissan's four very obscure "Pike Cars", the 1989 Nissan S-Cargo: "Why would you do this?" I need that question answered in this context. After you ship it back, Linus, actually reveal who sent you the card, maybe in your next Tech Tips episode, or even on next week's WAN Show.
The near-double performance you're getting near the end is because of Nvidia's 1:1 fp16 ratio on consumer GPUs; Nvidia also has a 1:64 fp64 ratio on consumer GPUs. Basically they artificially limited consumer GPUs to force machine learning (which wants fp16) and scientific computing (which wants fp64) people to buy way more expensive "professional" products. LHR is just a newcomer to the party, not exactly a new concept.
Can you elaborate on what you mean by '1:1 fp16 ratio' and '1:64 fp64 ratio'? (I am familiar with the datatypes themselves and the consequences for performance/vectorization on CPUs)
Yeah, that's not really accurate. GeForce cards physically don't have many fp64 units, to make them cheaper; also tensor-core fp16 matmul with a 16-bit accumulator runs at full rate. Only fp16 matmul with an fp32 accumulator is artificially restricted, which can be useful in some workloads, but accumulation in fp16 often does the job. The 1:1 fp16 ratio you talk about is on CUDA cores, which nobody uses for deep learning since tensor cores are much faster.
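(For context on what "using fp16 on tensor cores" means in code: a rough PyTorch mixed-precision sketch. The model, sizes and loop count are arbitrary placeholders, not anything from the video.)

```python
# Rough sketch: mixed-precision training in PyTorch, which routes matmuls
# through fp16 tensor cores where available. Model/sizes are arbitrary.
import torch

model = torch.nn.Linear(1024, 1024).cuda()
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(512, 1024, device="cuda")
target = torch.randn(512, 1024, device="cuda")

for _ in range(10):
    opt.zero_grad()
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = torch.nn.functional.mse_loss(model(x), target)
    scaler.scale(loss).backward()   # loss scaling avoids fp16 underflow
    scaler.step(opt)
    scaler.update()
```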
This is a very interesting video. I liked it a lot more than I thought I would. There isn't much coverage of Nvidia's data center line from tech YouTubers typically so it was fun learning a bit about that aspect of their GPU division 😊👍
Man, I wish I was half as smart and cool as you guys. I enjoy watching these videos as a casual consumer and it's so cool that you guys basically dance around code and recognise items near-instantly and with ease! I mean, seeing the teardowns is always a joy too.
I've watched a few Linus videos here and there, and this one I also just happened to click on by chance, but I just want to say I really like that Jake guy! He seems very knowledgeable but also very chill and humorous. He's a great addition to the team!
Very cool video about this amazing card. A good comparison would have been the Tesla T4, which is another popular card for machine learning (since it currently offers the best total cost to performance tradeoff for that workload). Also a small hint: the benchmark doesn't actually use that much VRAM, but TensorFlow allocates almost the entire VRAM by default (even if it doesn't require / use it). The actual VRAM usage of ResNet50 FP16 with batch size 512 should be about 22 GB (since I know it's 22 GB for FP32 at bs 256).
so glad they contextualized the AI performance, wish they would branch into that field more, and even have a tech review integration for AI performance. they *must* know about stuff like Stable Diffusion, so it'd be useful and fun, especially when the 40 series comes out, or testing things like the ARC if they ever get an opportunity. i'm still curious if they ever explored AI performance of ARC.
watching linus handle someone elses expensive hardware is like watching a thriller
Now viewing it again whilst Michael Jackson plays in the background!
You just know that the guy who loaned this card to LMG is watching this video and cringing perceptibly every single time Linus does 'a Linus' to his $10,000 card.
Lmao it was cringe as hell, i was traumatized throughout the entire video.
@@BLCKKNIGHT92 ok soy
Cuz this is THRILLERRR
Nvidia : "no linus you can't have that"
Linus: "and I took that personally"
Linus always finds a way
@@generalgrievous2726 Well, the way has found him.
Reminds me of Michael Reeves "You lied to me, Boston Dynamics." XD
Similar energy, it's just that Linus has a lot more self-control and professionalism :P
They just did not want him to drop it lol
Just makes no sense to send him this sort of card. The people buying them don't get tech advice from fucking linus lol.
thanks for pointing out Jake, really helped me recognize him.
But who's that other guy with him?
Just what I had in my mind
@@qovro just what i wanted to say ...whos that dude doing all the work
I mean its right there above my coment
Who's the other guy they didn't tag him
LTT: NVIDIA refuses to send us a super powerful gpu
Also LTT: drops a $10k CPU by accident, breaks it, and attempts to fix it with a vise grip
It's not a LTT video if something expensive doesn't get dropped
@@Immadeus he litterally knocks the thing over not too far in lmao
@@Shadowclaw6612 thats the silver one not the one he got sent
If I were the owner of the GPU my condition would be "You have to return it in working condition, or buy a replacement, but do whatever you want"
@@Shadowclaw6612 that's done for comedic effect.
A100s are no joke, no wonder AWS wants to bill me three arms and a leg to spin up instances with them just so I can get a "Sorry, we don't currently have enough capacity for this instance type" screen!
The shortage is in the cloud! (Obviously but it's funny to say)
@@CreativityNull "It's... it's all in the cloud?"
*cocks gun*
"Always has been."
I had to write custom scripts which run endlessly to request the p4d instances (which has 8 of those, but the 400W versions) on aws, as they are not available in any AZs. Luckily the script managed to get one of those after 2 days in us-west-2
to be fair, p4d.24xlarges have 8 of these in them
the reserved prices are not too bad, considering the hardware
Same for top end azure instances rn
Linus: "We can't just go out and get an A100 because it costs almost $10,000"
Also Linus: Creates a solid gold Xbox controller that's worth more than many people's houses
Most people don't even have houses
That gold can be melted down, allowing you to recover most of its value. Try doing that with a graphics card.
@@monsterhunter445 the comment would obviously apply to people that own houses.
You are the type of people who like to quote out of context. but it's okay
Well, if he spent all his money on the golden gamepad, that could be why he can't afford the A100.
Just wanted to point out that TensorFlow by default allocates the whole memory even if it's not using it, so the A100 may benefit from a larger batch size
th-cam.com/video/Fe9zPOZvDxI/w-d-xo.html
Yeah! That's what this GPU is for. You can train really big stuff there!
This is usually used in data centers right? so this might be what we've been sharing in cloud computing
@Jesus is LORD Hey dude, remember when those little kids made fun of a guy for being bald, so God sent a bear to kill and eat them?
@@ChristopherHallett man, the old testament God was way cooler than the new testament one. At least regarding roman-era like entertainment
*Linus who has broken something, on everything, in every video created*
Linus: “I don’t know why Nvidia wouldn’t send us the card”
Soon To Be Every Video, jk, ^_-
Linus sex tips
Some guy sends it and essentially says please don't fuck it up
Linus: drops it almost immediately
🤣
nVidia is unlikely to really care that much about that aspect of it, they can have bookkeeping write it off as a promo cost and deduct it from taxes, if they really care. What they DO care about is Linus shitting on the card with stuff that doesn't matter, really, but non-techie customers may think matter. See, the way that you quantify "value" for something like this isn't the most intuitive thing in the world, and has no relation to the shizz Linus is talking about, but the big kahunas of datacenters, and their investors - again don't understand how that stuff works and may misjudge it based on faulty reasoning.
If I build high end workstations for a living, and a fortnite kiddie wants to review one - I will say FU NO! to the kiddie - not because my workstation can't play fortnite but because how well it does that is irrelevant, AND I gain absolutely nothing from that review, while risking a lot - hardware getting broken, bad rep possibly etc etc.
I replaced one of these cards for a customer who had 3 of them in total in a Dell 7515 server running dual AMD Epyc 7763 64-core processors. I remember thinking this GPU is worth more than my car.
At that point it may be worth more than a small apartment.
@@megan00b8 *Cries in Australian*
@@megan00b8 In my country, it's worth more than our life long income
It was always fun working in a customer's cage and you open up the shipment that FedEx delivered and it is beat all to hell and find 6 server GPUs or a line card full of 100Gig Optics and realize that the package is worth more than you make in 5 -10 years.
@@IgoByaGo cage?
Fun fact, our A100 servers (8 80 GB SXM A100s per server) each have a max power draw of close to 5 KW. And Linus and Jake were right! Even with the 80 gig models, we still wish we had more memory. Never enough memory!
Big Iron.
what are you doing with that hardware?
@@velo1337 Skynet, duh.
@@velo1337 ur mum
@@velo1337 Playing Crysis, probably.
I was one of the people handling repairs on amazon servers and I’ve seen thousands of them. They are crazy. Of course I can’t test them but just holding it you can tell it’s a beast
Wait, thousands went for repair.....? So they break often? 🤔.
@@EnsignLovell i think he meant more in a metaphor Type of way
@@Sn1ffko definitely meant he had to go to the datacenter itself and saw all the cards there in the racks
That’s what she said
@@GodlyAwesome yeah I’ve repaired thousands and thousands of server racks. And they have sections dedicated for graphics cards and stuff. In a single server it would have anywhere between 2-12 graphics cards.
I like how YouTube has labelled this video as "Exclusive Access" as if Nvidia allowed this at all lol
Linus sex tips
They have? where?
That looks like sponsor block to me...
that's SponsorBlock (it's blacklisted)
Nvidia should sell this kind of card to miners instead of selling consumer-grade GPUs in bulk to them.
they would still not buy them even with this
@@Whatismusic123 Despite the high price, they still would because it's like 100% more efficient for hashing. Just like Linus said, the running cost (electricity cost) of a gpu for mining far outweighs the price.
yes
Most miners wouldn't buy this because they're just not eligible to. For the cost of 10 of these cards you could've purchased like 50-60 3090's even at these high prices and given them proper cooling, which would far out-hash those enterprise cards. Yes, it's cheaper to run those enterprise cards for the long term, but you'd be looking at how long Ethereum will last rather than how long the card will last.
This GPU would be pointless to a miner, because it costs $10,000 and it would take months or even years for them to justify the cost of it from mining; it doesn't take extremely powerful cards to mine.
I'm glad Jake said "ah, it has an IHS" because for a split second I thought that was all GPU die and nearly had a stroke
amogus
Yeah, me too. That would've been the most monstrous die I've ever seen.
Same. I couldn’t believe what I was seeing!
fucking exactly
yesss, my exact thoughts
By default, Tensorflow allocates nearly all the GPU memory for itself regardless of the problem size. So you will see nearly full memory usage even for the smallest model.
cuda_error = cudaMalloc((void **)&x_ptr, all_the_GPU_mem);
As much as I like LTT, they never do benchmarks involving AI/deep learning properly.
@@gfeie2 start with USIZE_MAX memory and binary search your way down to an allocation that doesn't fail XD
Oh that explains a lot. I was wondering how they managed to tune it so perfectly, because Pytorch would simply crash if you tried to use more memory than available.
should've used pytorch yeah
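(For anyone confused by the "full VRAM" readings in the video: that's TensorFlow's default allocator behaviour, and it can be turned off. A minimal sketch with the TF 2.x config API, as a starting point rather than anything from LTT's benchmark setup:)

```python
# Make TensorFlow allocate VRAM on demand instead of grabbing it all up front.
import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")
for gpu in gpus:
    # Must be called before any ops or tensors touch the GPU.
    tf.config.experimental.set_memory_growth(gpu, True)

print(f"{len(gpus)} GPU(s) configured for on-demand memory growth")
```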
Amazing how I can understand so little yet be so thoroughly entertained. 10/10.
actually funny
@@xqzyu ye
Linus sex tips
I’ve been a PC guru for over two decades and even i’m outta my league here.
Oh thank God! I thought I was the only one
I love getting to see the incredibly expensive equipment that runs data centers, even though I understand about half of what they are used for. The efficiency is just insane
Understanding half of what goes on in a data center isn't too bad, though.
Basically, it has half the GPU cores, but way more AI cores to do AI tasks, at about half the power.
The reason I like Linus videos is that even though I don't understand 90% of the content, I still enjoy watching it without skipping a second. Keep it up dude
A100 vs RTX 3090
The A100, having a similar number of (or slightly more) lanes per core, runs at a lower power consumption while having almost 2x the computing power.
So the A100 is more efficient at number-crunching workloads, but not so much with graphical loads.
Same thought mid video
@@DmanLucky_98 A100 looks like a big golden chocolate bar
@@DmanLucky_98 A100 go brrr
Bro I'm here for the segways
At this point, the GPU has become the real computer and the CPU is just there to get it going.
CPU is the coworker that got in because their relative works there
CPU handles multitasking / software management. Without the CPU we wouldn't have multiplayer games.
@@cradlepen5621 Singleplayer is the future
The computer is as fast as its slowest component. For example, if you have a game that uses the GPU for everything but for some reason decides to do the shadow calculations on the CPU... you are limited by the CPU.
As you upgrade the GPU, you need to upgrade the CPU.
Then you'll be sitting at loading screens thinking "Why is this taking forever to load? My CPU and GPU are a beast"
But you are running a standard HDD... Ahhhh, time to upgrade! NVMe SSD FTW!
It's all a balance, and why building your own PC will always be better (when you know what you are doing) compared to just buying a PC.
This is the weirdest take I've read all week.
6:58
The nvidia employee watching the chip serial number: 👁️👄👁️
Lol
How did they not think of that? The SECOND they showed the taped serial, I thought "they'll probably fuck this up somehow" @@Zatch_Brago
One day our grandkids will call this GPU the "potato/calculator", just like we call all the hardware that launched people into space 50 years ago...
well we did hit the size limit for our logic gates and whatnot, and quantum tech is only used for crunching numbers. So that's unlikely.
Crazy to think this much power could be available in a phone in 10 years.
@@jorge69696 Also no, size constraints
Ah yes the A100..
An outdated historical relic compared to tech in 2077
or the classic we have those in our phones now
PS3 and Xbox360 games still look graphically impressive. We're not advancing as fast as before.
@13:55 What you guys are totally missing is that the A100 has fewer CUDA cores, but they do INT64/FP64 at half the throughput of INT32/FP32. The 3090 is what, 1/16th throughput or something? It's meant for higher precision calculation. The desktop and datacenter cores are different. You need to do a test on 64-bit calculations to compare.
Didn't understand but you sound like you know your shit
Nerd
@@kvncnr8031 It does 64-bit math like 10x faster than the 3090, so it's better where you need high precision. Neural networks in particular can get away with much smaller numbers, like 8-bit values in the network. A bit is basically a 1 or 0 in a binary number, so a number can represent a larger value with more bits. Or if it's a floating point number, it can have more precision (i.e. represent more decimal places). For scientific computing, like modelling the weather or physics simulations, you want higher precision math. That's why the A100 is tailored for 64-bit math, whereas the 3090 is tailored for 32-bit math and below, which is the most common precision used for graphics.
I think the amount of Tensor cores is also different. Not even sure the older graphics cards have Tensor cores
@@Daireishi i like your funny words magic man
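(If you want to see the FP64 gap this thread is describing on your own card, here's a quick-and-dirty PyTorch timing sketch; the matrix size and iteration count are arbitrary, and the result is only a rough throughput estimate.)

```python
# Crude FP32 vs FP64 matmul throughput comparison on the current GPU.
import time
import torch

def time_matmul(dtype, n=4096, iters=20):
    a = torch.randn(n, n, device="cuda", dtype=dtype)
    b = torch.randn(n, n, device="cuda", dtype=dtype)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        a @ b
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    return 2 * n**3 * iters / elapsed / 1e12  # ~2*n^3 FLOPs per matmul, in TFLOPS

fp32 = time_matmul(torch.float32)
fp64 = time_matmul(torch.float64)
print(f"FP32: {fp32:.1f} TFLOPS, FP64: {fp64:.1f} TFLOPS, ratio {fp32 / fp64:.0f}:1")
```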
I always love the moments where I realize that 3090s aren't the peak of their generation.
They probably have the technology for 10x the 3090, but it's not good for business to lay it all out now
In terms of gaming cards, it is top of the line
@@evanshireman5644 well it's not. the 6900xt is mostly faster at 1080p and even at nvidia, there's a 3090 ti in existence.
@@ProjectPhysX except for those that are memory limited.
@@Ornithopter470 yep, you can never have enough memory... but 80GB is already quite a lot :D
Nice comparison. You could've rented one of those bad boys on Azure for less than $4/hour for the benchmark. In fact, 8 A100 GPUs connected through NVLink are expected to be about 1.5x faster than stacking 8 A100s connected through the motherboard.
Everyone with any pc building experience: "So graphics cards take pci-e power connectors and attempting to plug an eps connector in instead would be bad right?"
Nvidia: "Well yes but no"
it’s a power connector. it’s like saying nema 5-15p connectors can only be used in the usa.
@@snowyowlll Well yeah i'm just referring to the pinout
(Yes i know they're made so it's impossible or at least a lot harder to put one connector in the wrong spot)
So, funny thing about that. The keying for PCI Express 8-pin and EPS 12-volt is basically compatible. The only difference between the two connectors is that PCI Express has a little tab between pins seven and eight. If you were to plug a PCI Express power connector into an EPS 12-volt port, you'd basically end up shorting 12 V to ground. I may or may not know from experience 🤪
250W for such a card is excellent. I was expecting more like 400W up.
7nm TSMC, that's why
@@mihailcirlig8187 i think the A178-9 and the NVIDIA 9050 is way faster. I have it currently.
250 is still a lot bro
nvm
The A100 SXM version does have a 400W draw.
I haven't messed around with an Nvidia Tesla GPU past the Maxwell line, but I do remember it is possible to switch them to WDDM mode through nvidia-smi in a Windows command prompt, which will let you use the Tesla GPU for gaming provided you have an iGPU to pass the display through. By default, Nvidia Tesla GPUs like the A100 run in compute mode, which Task Manager and Windows advanced graphics settings won't recognize as a GPU that you can apply to games and apps. But idk if WDDM has been removed on later Tesla-class GPUs like the A100 or not.
You said you did what in the who now 😕😵
I recall reading WDDM not being available by default on some modern Tesla cards because the standard drivers only support TCC mode and specific driver packages from Nvidia are needed to do it. I have no idea how this applies to Ampere but I imagine it's similar.
You need more halo lore vids lol
@@SkullGamingNation fr
so strange when two of my completely unrelated hobbies come together randomly like this
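(If memory serves, the TCC/WDDM switch mentioned above is done with nvidia-smi's driver-model flag on Windows from an admin prompt, with a reboot afterwards. I haven't verified this on Ampere-generation cards, so treat the exact flag values below as assumptions and check nvidia-smi --help / --help-query-gpu first. A sketch via Python's subprocess:)

```python
# Hedged sketch: inspecting/switching the Windows driver model on a Tesla-class GPU.
# Assumes nvidia-smi's "-dm" (driver model) flag, commonly documented as 0 = WDDM,
# 1 = TCC, run elevated, reboot required. Verify against your driver's help output.
import subprocess

GPU_INDEX = "0"  # assumption: the Tesla is GPU 0

# Show the current driver model (field name assumed; see nvidia-smi --help-query-gpu).
subprocess.run(
    ["nvidia-smi", "-i", GPU_INDEX,
     "--query-gpu=name,driver_model.current", "--format=csv"],
    check=True,
)

# Uncomment to actually switch to WDDM (requires admin + reboot):
# subprocess.run(["nvidia-smi", "-g", GPU_INDEX, "-dm", "0"], check=True)
```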
Linus, you look great with a beard! Long time sub here from wayyyyy back when you and Luke did those build wars, and from watching your videos back when you had that old space where you connected each PC daisy-chained to a copper water-cooled setup. It's awesome to see your sub count and how far you've come since I last watched your videos. Hope you and your family are all well and enjoying the holiday season!
@Teamgeist the beanie suits him!
Trying to imagine a world where fans reach out to you to give you a 10k GPU whilst I struggled to obtain a 3060 so much that I bought a whole prebuilt PC just to pull it lol
Influencer live is pretty dank, innit... ahh, the dreams...
lmaooo even i did the same thing recently
I have a 1060
For me, I bought a laptop instead: a Lenovo Legion 7 (16" 16:10 version) with an RTX 3060. You would think a laptop with that GPU wouldn't have the same performance as the desktop equivalent, but the laptop is big enough for the heat and everything that it gets extremely close. It runs the same as, and sometimes higher than, my friend's desktop RTX 2070.
Oh, and it was around £1600, one of the best bang for the buck price/performance-wise for a gaming laptop, beaten only by the Lenovo Legion 5 Pro, which is a bit cheaper but looks quite a bit uglier.
While machine learning can be sped up using more memory, there are things that you literally can not do without more VRAM. For example, increasing the batch size even further will very quickly overwhelm the 3090. Batch size, contrary to popular belief, is not "parallelizing" the task, but actually computing the direction of improvement with higher accuracy. Using a batch size of one for example would not usually even converge on some datasets, and even if it does, it would take ages to do so.
big batch sizes dont converge necessarily either, which is why you might want to start with a big one but lower it eventually as training goes on
Also, it depends a lot on what is being run. If you're running inference and your model is big, it will need a lot of VRAM (proportional to the model size) and won't run if it doesn't have enough. You *could* split the model between cards, but that runs into bandwidth and performance problems.
I assume we're talking about neural networks. Using bigger batches just means feeding more data sets into the model before backpropagation. Why does this increase memory usage linearly?
@@nottheengineer4957 Imagine sending 32 images of 512x512 pixels with three channels; that's a batch of 32, which would be an fp32 tensor of size 32*512*512*3. A bigger batch size means a larger floating-point array to be handled by the GPU. So a batch of 64 would be a tensor of 64*512*512*3. This effectively doubles the total memory required to process the tensor.
I understood like two words
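(To put numbers on that reply: a back-of-the-envelope calculation for the input tensor alone. Activations and gradients add much more on top; the 4 bytes-per-value figure assumes fp32, as in the example above.)

```python
# Rough VRAM footprint of just the input batch described above (fp32 = 4 bytes).
def batch_input_bytes(batch, height=512, width=512, channels=3, bytes_per_val=4):
    return batch * height * width * channels * bytes_per_val

for batch in (32, 64, 512):
    mib = batch_input_bytes(batch) / 2**20
    print(f"batch {batch:>3}: {mib:,.0f} MiB for the input tensor alone")
# Doubling the batch doubles this footprint; intermediate activations scale similarly.
```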
Jake is really growing
Bien
Jake is how I imagine young gabe newell
@@tabovilla a Mousquetaires Gabe
When I interned at this machine learning lab I got the opportunity to train my models on a supercomputer node which had 4 of these cards. Even though my code was not optimized at all, it crunched my dataset of 500,000 images for 80 epochs in about 5 hours. For reference, my single RTX 2060 Super card was on track to do it in about 4 days.
I think the main advantage of these cards in machine learning is mainly the crazy amount of memory. My own GPU could handle batches of 64 images, while the node could handle at least 512 with memory to spare (I didn't go further as bigger batch sizes give diminishing returns in training accuracy)
I get what you're getting at, but that comparison seems a bit extreme. If you put your workload on one A100 that costs $10,000 and then on two 3090s that cost you $2,000, you would save a lot of money and get better performance. If you consider the power usage then yes, you would be saving, but to get to $8,000 worth of difference it would take many years. People of course pay for these things because they are made with tons of memory and linkability, and data centers need that, but comparing just the processing power, these chips aren't better than the more affordable gaming cards. There's a big price hike that Nvidia applies to the pro cards because they can, and the clients can and do pay.
So glad you guys are now including AI benchmarks. Please continue to do so! Some of your viewers are gamers and data scientists!
What is a data scientist In terms an idiot can understand? 😂
SOME of their viewers are gamers?
@@DakanX Some viewers are *BOTH* gamers and data scientists.
It was just good here because of the GPUs involved.
Linus - “I like to go in dry first.”
Jake- *Please don’t look at me.*
😂
Why does Linus surround himself with fat dudes ???
@@travisash8180 they bring food with them
@@aryanluharuwala6407 I think that Linus is a chubby chaser !!!
That is definitely NOT what she said.
Linus: "I've found my gold"
Jake: "what?"
Linus: "Yvonne"
Jake: *dies of cringe*
the way he said "Yvonne" was so endearing tho
@@WyattWinters I mean to be fair, that's how I feel about my wife and when you find the one you just know it
Such a lovely moment! I hope she sees it accidentally and smiles
Jake: *dies of cringe*
Audience: AAAWWWW that's so sweet!
As a married man, I saw this coming from a mile. That's sweet.
Watching them tear apart my card was stressful, not gonna lie, but totally worth it! Great video guys, I'm happy I kind of got to be a part of it!
bro what this video is 11 months old there is no way that was your card lol
@@fireboy2623 well I actually just quit my job today, and that card was used in said job. Since I don't have to worry about being fired anymore I figured I'd finally leave a comment on the video! ^-^
I applaud you! I’d be shitting bricks worrying that Linus would drop something if I was you 😂
"You can do anything you want with it"
Linus: *drops the card*
As a tip, nvidia-smi runs on Windows too; it's included in the driver.
I used to use it to lower the power target without needing to install anything.
mine always closes immediately and I can't change settings. Been working to get a Tesla functional on my rig and haven't been able to just yet.
@@tobiwonkanogy2975 Add that directory to the windows path.
Interesting thing with NVIDIA drivers is that they are essentially the same cross-platform. That's why NV won't release the source.
smi can also be used to overclock and adjust memory timings, that 174MH could be 200+ with tweaks.
Thanks for the tips ill try em out
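(For reference, the power-target trick in this thread looks roughly like the sketch below; power.draw/power.limit and -pl are standard nvidia-smi options, but the 250 W value is just an example and lowering the limit usually needs admin/root rights.)

```python
# Query and (optionally) lower the GPU power limit with nvidia-smi from Python.
import subprocess

# Show current draw and the configured limit for all GPUs.
subprocess.run(
    ["nvidia-smi", "--query-gpu=name,power.draw,power.limit", "--format=csv"],
    check=True,
)

# Example only: cap GPU 0 at 250 W (needs admin/root; valid range depends on the card).
# subprocess.run(["nvidia-smi", "-i", "0", "-pl", "250"], check=True)
```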
Linus: "We'll mask the serial so they can't find the person."
**Shows the chip serial instead**
also device id at 14:36
...hmm
@@hrithvikkondalkar7588 The device ID is not unique. Every single card of the same model will have the same Device ID. For example, every 980Ti the same as mine (I can't say which specific model of 980Ti it is, as I bought it second hand with a waterblock fitted) will show 10DE 17C8 - 10DE 1151. You can google that and see for yourself.
That's not the chip serial. That's the model and revision.
Device ID is same across GPU models, it's part of the PCIe spec.
Jake’s “It’s just so thick, why would you ever use it?” about the “spiciest” 3090 DID NOT age well now that the 4000-series cards are out 😂
But the 4000s suk
@@everythingsalright1121 they're great gpu's just the price is out of this world
@@everythingsalright1121 Turns out that was a lie. the 4080 and 4090 are really good cards, they're just horribly overpriced.
You'd get a significant boost in speed with Blender when you render with a GPU if you set big tile sizes, like 1024 or 2048, under the performance window.
@@Barnaclebeard me? What? why?
256 for 1080p renders and up. that is how you get the fastest speed. if it is 4k, you go with 1024.
I don't get it. I've got a 4 GB doodoo GPU and Blender automatically sets it to 2048
@@sayochikun3288 modern Blender doesn't use tiles the same way
@@1e1001 but in the video it's the older Blender, 2.9 or 2.8
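(If anyone wants to script the tile-size change the top comment suggests, this is roughly how it looked in Blender 2.8x/2.9x's Python API; tile_x/tile_y were removed in 3.0's Cycles X, which moved to progressive rendering. A sketch, not tested against every version.)

```python
# Sketch for Blender 2.8x/2.9x: enable GPU rendering and set large render tiles.
# In Blender 3.0+ (Cycles X) the tile_x/tile_y settings no longer exist.
import bpy

scene = bpy.context.scene
scene.render.engine = 'CYCLES'
scene.cycles.device = 'GPU'
scene.render.tile_x = 1024  # example size from the comment above
scene.render.tile_y = 1024
```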
Can you imagine the process that guy probably had to go through for sending that card over? Like disclosures for if Linus drops it or Jake misplaces a screw lol
Number one thing I thought of when I saw the title was they're done with Linus dropping their shit 😂
Well, since it was quasi-legal and trying to keep it on the DL, I'd say he just wrapped it up in bubble wrap and a box and sent it UPS.
Pretty sure if Linus broke it he'd buy a new one
@@filonin2 well, it's 100% legal, he just didn't want to ruin relationship with Nvidia.
I would not trust a shipping company to handle it appropriately during transit...
1:48 The fear in that man's face is surreal. Like he had seen a ghost 😂
When, after admiring Linus and the crew for years and counting, you realise you've bought a pair of those babies at work and you have an SSH key to log in and use them, you immediately figure out how far you have come since the first inspiration you got from LTT. Thanks guys, you are a good part of where I've got to!
People encrypt their backup.
You better ALSO encrypt your .ssh/ xD Holy crap. Congrats on your achievements tho!
Tensorflow allocates all of the GPU that you give it. That's why the VRAM usage is almost 100% in both cases. 512 batch size on a ResNet50 barely uses any memory, so this benchmark might not actually be pushing the cards to their limit.
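(If you want to see how much of that "allocated" VRAM a model actually touches, recent TF 2.x versions expose a per-device memory counter; a hedged sketch below, the small batch and ResNet50 here are just for illustration, and get_memory_info has lived under tf.config.experimental.)

```python
# Check actual (not just reserved) GPU memory use in TensorFlow 2.x.
import tensorflow as tf

# Optional: stop TF from grabbing all VRAM up front, so the numbers are meaningful.
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)

model = tf.keras.applications.ResNet50(weights=None)
x = tf.random.normal([32, 224, 224, 3])  # small example batch
_ = model(x, training=False)

info = tf.config.experimental.get_memory_info("GPU:0")
print(f"current: {info['current'] / 2**20:.0f} MiB, peak: {info['peak'] / 2**20:.0f} MiB")
```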
For training deep learning AI, machine learning or similar, this one is a beast. For rendering it's also great, because both need lotsss of GPU memory.
Agree, my AI buddy already has his company ordering a few (80 GB model), where 2 of them will go to his high-end workstation. How lucky. But like you said, if you have large-scale learning data sets or are doing deep learning, these cards are at the top. Anything else, and these cards are likely not worth it.
you know tbh
this gpu could make a nasa supercomputer
1:59 ✈️ 🏬🏬 bro 💀💀💀
😮
what's wrong with you
@@madbanana22 maybe he got free healthcare that's why he's joking about lol
Bro wtfff 😭😭😭😭
5:15 Great, now Nvidia can super sample that fingerprint and find the technician that assembled the card, figure out where the technician worked, then where this card was assembled, then track down where it was sold, so they can find who it was sold to. oh no 😂😂😂
I know this is just a joke anyway, but that "technician" is probably some chinese kid assembling hundreds of those a day.
You need to add some stupid detail to narrow it down further. Maybe there are 9/10 prints on the card and this is explained by the technician having a cut on his 10th finger and taping it. But only for 20 minutes for the bleeding to stop, so you can narrow it down to a handful of cards whose owners you can then manually check out.
Yes, I totally should write episodes for Navy CIS.
@@lunakoala5053 I mean imagine working in a factory to be assembling something this expensive
@@lunakoala5053 Collab with Joel Haver maybe? damn ❤️
@@fjjwfp7819 *imagine working in a factory to be assembling something this expensive, that ends up in Linus steady hands 😂😂
The difference in finishes you see at 6:10 looks like part of the shroud was milled. The matte parts look to be as-is (likely stamped or cast depending on the thickness). The smoother parts with lines going in squares are milled (kind of like a 3D drill to cut away material). This means they were taking higher-volume parts and further customizing them for these cards (milling is done for much smaller production runs than stamping or casting/molding).
Whoever sent that in is literally putting their job on the line and in the hands of a clumsy Linus, watching him take this apart gave me huge anxiety! 😂
Kinda doubt it; if you need it that much, why send it? He now can't use it for x amount of days
Probably "borrowed" from his work and hoping his manager doesn't see any identifying marks on the missing card.
Also, if you gonna steal a 10k card you're probably not that bright to begin with.
@@RomboutVersluijs He wanted it back with a cooler and a mining benchmark; he probably doesn't know how to use it. lol
Nope, probably just a miner with extra cash looking to increase efficiency. 70% higher hash rate over a 3090 with 25-30% lower power consumption seems good, but at 3-4x the initial cost. It will take 5 years of continuous running to pay for itself, assuming about $5.50 a day profit. A 3090 will pay for itself in 2.5 years... this is all of course assuming crypto remains completely flat, which is highly unlikely.
If I can get some of these at a good discount I will probably pick some up.
"Yo test mining with it", dude prolly jacked the thing from somewhere or got it from the market and threw it a linus before he puts it with the rest of his mining operation to see what he's dealing with.
Building a new server at work and I’m using one of these. Pretty excited
What sort of things do they even use these for, is it like protein folding models in biotech firms or something?
I like how they always make a separate video for the top of the line HPC/professional nvidia card of each generation and make it so hype like it's a gaming card and it just released instead of year(s) ago. I don't mean that in a negative way.
I think the "cooling solution" would have worked better if you would've reversed the airflow
ok
ok
ko
ok
so...does it?
The HBM2 will be saving quite a bit of power vs the GDDR6X on the 3090. It'll also be a huge boost in some workloads. TSMC's 7nm process is no doubt better than Samsung's 8N; it'll be interesting to see how Lovelace and RDNA3 do on the same 5nm node.
TSMC N7 is better than Samsung's 8 nm for sure, but the reason the A100 is so much more efficient than the 3090 is not because of the die technology.
RNDA3 will use MCM technology. This will possibly allow AMD to win in rasterization performance and be much more power efficient than Lovelace.
People will shit on you if you even dare to mention that RDNA 2.0 is worse than Ampere as an _architecture_ because it has a pretty significant node advantage and still only trades blows with Ampere. But just look at this Ampere on TSMC's 7nm, it's quite darn efficient. It will indeed be interesting to see the Lovelace vs RDNA3 on the same node.
@@PAcifisti well yes but this a100 card also has a die like 3x the size so it spreads heat out better then any of the gaming cards
@@PAcifisti To be fair, the A100 has low clocks and has a massive die size at 830mm2.
It's not even fair lmao.
Same thing about current desktop Ampere: 20% larger than the largest RDNA2 die.
"No I like to go in dry first" Accurate depiction of how Linus' treats hardware
ooooh, unlisted with 5 views? nice
Yup
ok
Why is is unlisted?
@@noahearl nah, I just got to the video originally when it was unlisted with only 5 views
@@Dominatnix me too 😃
I hope the guy that is the owner of this card didn't die 14 times from a heart attack...
Also thank you actual owner for making all of us able to watch this tear down and video!
When Linus said "I found my gold" I thought "how sweet, he’s talking about his wife" and jake was just "pshhh please" xD
6:53 Linus: Nah, I like to go in dry first...
great 😄
6:15 This might be the point where the case was mounted to a big industrial suction cup. Manufacturers often do that when spray painting a piece of metal. You can see that on a lot of metal stuff that doesn't need to look good from the inside.
Linus: "It's not ribbed for my pleasure"
They're never ribbed for "Your" pleasure.
Flip it inside out? Lmao
No, Linus said it quite right... ;-)
@@oldguy9051 it means he's the one getting poked
Depends on your configuration.
Hey, we don't know what Linus and Yvonne are into.
Linus Torvalds: "Nvidia, f*ck you!"
Linus: "Nvidia didn't want to send me this card, but I got it anyways"
Thats the same thing right there
Not even close.
Two tech masters, Linus and Linus
Torvalds' "f you" is genuine and he has no interest in NVIDIA's/Intel's money to keep shut about their practices ;-)
"look how crazy efficient nvidia is!" isn't quite the same as "fuck you".
@@lunakoala5053 Linus is still learning.
"No, I like to go in dry first" 6:53
- Linus 2022
"Your butt is nerds butt" Yea these guys super ghaaaaaayyyyyyyyy
Note on TensorFlow and VRAM utilization: TensorFlow allocates all of the available VRAM even though it might not use all of it. Furthermore, in my studies, models ran considerably slower with XLA enabled. Would be interesting to know how the cards perform with XLA off!
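For anyone who wants to try that comparison, a minimal TensorFlow 2.x sketch (the training loop itself isn't shown, and train_step here is just a placeholder):

import tensorflow as tf

# Turn XLA JIT compilation off for this process; pass True to re-enable it,
# then run the same training loop both ways and compare step times.
tf.config.optimizer.set_jit(False)

# XLA can also be requested (or refused) per-function:
@tf.function(jit_compile=False)
def train_step(x):
    return x * 2.0  # stand-in for the real training step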
That fan sending the GPU in for them to do whatever they want to it almost makes up for them being a cryptobro.
Almost.
You are the reason they decided to stay anonymous. You and the whole toxic gaming community.
cringe
@@joeschmo123 what is cringe
@@AR15ORIGINAL you
@@joeschmo123 W
"You can do anything you want with it"
*Linus chooses to drop the card*
you guys are the geekiest.....I don't understand a thing you are talking about, but I am fascinated and really enjoying watching and listening to you geek out....I will like and subscribe just to reward your enthusiasm!!
how tf r they geeky
@@spaghettiarmmachine7445 Here is the dictionary definition of "geek":
"engage in or discuss computer-related tasks obsessively or with great attention to technical detail: 'we all geeked out for a bit and exchanged ICQ/MSN/AOL/website information'"
It was not meant as an insult or derogatory. I do believe they engaged in computer related tasks with great attention to technical detail. Anyway I loved their enthusiasm for their subject and although I did not understand it.... I enjoyed watching their absolute joy discovering the technical intricacies of the product they were reviewing. Sorry if I offended you...
@@spaghettiarmmachine7445 How tf do you watch this video and *NOT* think they are LMFAO.
Like bro, when you're literally fawning over a piece of computer tech that pretty much no normal consumer will ever own in their life, and spitting nerd facts and terminology that almost nobody will intricately understand unless they have a very deep grasp of the subject matter... at that point it's literally the definition of the word.
@@cantunerecordsalvinharriso2872 Don't apologize to these spoon brains lol. They all dress up in their mothers undergarments.
@@spaghettiarmmachine7445 get
6:56 in response to Jake’s alcohol offer, Linus - “I like to go in dry first.” Did you notice Jake’s immediate modified face-palm? We know you laughed, Jake. Gotta ❤️ these guys.
Nvidia: "You can't have that."
Linus: "Why not?"
Nvidia: (gestures wildly at 1:18, where Linus puts his greasy fingers all over the contact leads for the socket on a LOANER CARD.)
They're gold plated; they can't corrode. A little alcohol will make it as good as new.
@@Telogor that matters less to me. What matters to me is the extremely candid mishandling of equipment that does not belong to you. Linus should know better, and he chooses to do stupid things anyway.
If he's into crypto and got it for a presumably larger mining op, he obviously doesn't give a shit about $10k, as that's what our portfolios move every 30 seconds lol. If it was my card I wouldn't care if he wiped his ass on it, so long as it was cleaned and working when I got it back, like the guy said. You can get off your high horse now, go be upset over nothing somewhere else.
@@TacComControl bro I think he's capable of not breaking a graphics card, it's kind of his job
@@jcartiii apparently not if he doesn't know the most basic of handling instructions. Stop being a fanboy, your boy fucked up.
As a structural biologist that uses this kind of hardware fairly often, it's really cool actually contextualising how crazy the hardware we use is hahahaha
No you not
Bend a multi-layer PCB and you run the risk of breaking traces, guys, which may "reconnect" temporarily and then disconnect randomly under heating etc.
Check the temperature of the A100. If it's touching 80°C then there is performance throttling. I tried a similar setup with a V100; there was a huge performance gain using a 6000 RPM fan for cooling.
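If you want to watch for that while a benchmark runs, here's a small sketch that polls nvidia-smi (assuming it's on the PATH); an SM clock that sags while the temperature sits near the limit is the usual sign of thermal throttling:

import subprocess, time

# Print GPU temperature, SM clock and power draw every 5 seconds.
while True:
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=temperature.gpu,clocks.sm,power.draw",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    print(out)
    time.sleep(5)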
1:48 Yeah.... That's a VERY GOOD SIGN about Linus returning the fan's GPU *"in working condition".*
I'd be interested in seeing how they compare rendering a single tile all at once. That's what got me when I switched from a 2080 Ti to a 3090: I didn't realize right away that it would render 2k tiles without sweating, but when rendering out a bunch of smaller tiles both cards posted basically the same times.
7:59 "Now kith" 🤣 That's so dumb but so funny. 10/10 editor(s)
A100s are nuts. I really want to get one of those cards
well, if you look at the MI250X, those are a lot more NUTS
What would you do with it, out of curiosity?
oh shit. the og meme creator
@@intosanctuarysitechannel3732 It prints money; do you even need to ask?
@@RickMyBalls imagine mining crypto on that would it be worth the price tho?
6:53 Linus out of context:
It's pretty amazing how fast we can render stuff with consumer hardware nowadays. Blender's 3.0 version (with Cycles X) renders even faster now. What used to take multiple minutes a few years ago with my GTX 1080 now takes less than 30 seconds with my 3080
Wow, this guy had a 1080. Way to flex on everyone still trying to find something better than a 730 lmao
@@sakaraist Not his problem that some people are this far behind what's usable at the moment.
Yeah, and Linus is still using the 2.x version of Blender here, which uses tile rendering. 3.0 uses progressive rendering and is in some cases double the speed. Linus should upgrade Blender :)
@@sakaraist Lmao I sold it at the start of the gpu shortage when prices were still relatively normal so I didn't get much for it
@@sakaraist My 5600g be like,
I would so buy this whole thing
the promoted service at the beginning seems very tempting
I love Linus and Jake's chemistry so damn much and I love how it has evolved.
Jake started out as the apprentice, while nowadays he's actually the more mature of the two.
I found it the opposite, awkward between the two
Ant's the real brains at LTT
@@fuckingwebsite1 The Mancubus, thank you very much.
you love their chemistry ? do you want to see them naked "googling" each other ? you do ! don't you !
@@yellowboat8773 must have been one of those negative reality inversions...
If it was meant for crunching big datasets and machine learning, I would expect it to be optimized for massive mathematical computations (large matrix multiplications), not rendering.
Correct - the FP64 numbers even on the A5000 - A6000 completely destroy the RTX 3090. I'm not sure why he tested blender with it lol.
You can fit a whole operating system in the VRAM of that card. Crazy.
Windows 11 is like less than 10 or 5 gbs
I know what you mean, but you know, DOS fits on a floppy
Most Linux Distros as well
There are fully featured Linux distros that are less than 100 MB, so that's not really that impressive.
Use lubuntu. That's what, less than 1 GB? It performs pretty decent too.
@3:40 Who made the recently released Prime TV series "Secret Level" after Netflix debuted "Love, Death & Robots"?
@4:54 GPU/CPU crossover.
19:37 The memory usage is a bit misleading. If you are using TensorFlow, it always reserves all the GPU memory when it launches, even though it doesn't actually use that much.
yes
yeah, you need to set the memory growth to true, to actually get an accurate memory reading.
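Something like this, run before any GPU work starts, should make the reported number reflect real usage (a minimal TensorFlow 2.x sketch):

import tensorflow as tf

# Let TensorFlow grow its VRAM allocation on demand instead of reserving
# the whole card at startup. Must run before the first GPU op.
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)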
9:20 was so unexpectedly sincere and now I’m crying while watching a fracking tech tips video
Ohhhh this takes me back to my grad school days when these cards were branded as "Tesla." It was wild how a single (albeit $10k) graphics card could enable me to run numerical simulations faster than I could on my undergrad's Beowulf cluster. This was back in the days where if you wanted to do GPGPU, you were writing your own low-level routines in CUDA's C library and mostly having to figure out your own automation. I can't even imagine how things have changed now that you can abstract everything to a high level python API and just let the thing do its thing.
cool story
Thats pretty darned hardcore :)
No I totally understand what you mean. Every single word. Totally.
It's pretty fucking dope. Source- I worked with a guy doing his PhD in computational neuroscience using a python API machine learning algorithm to search for putative appositions in rat brains.
that sounds like it makes sense.
There's a typo at 3:25
The A100 has ~54 billion transistors on it. The 54.2 million listed would put the card firmly between a Pentium III and a Pentium 4 in terms of transistor count, with a curiously big die for the lithographic node.
This must be EXTREMELY confidential if you had to go to SUCH an extreme just to prevent your "lender" from being identified. I know doing this is very much a risk for them, so good on you for minimizing that risk for them.
Okay, I understand perfectly, but unfortunately I have one huge question for you, Linus. Preventing your lender from being identified just kind of boggles me, because, in the words of car enthusiast @Doug DeMuro in his first video on one of @Nissan's four very obscure "Pike Cars", the 1989 Nissan S-Cargo: "Why would you do this?" I need that question answered in this context. After you ship it back, Linus, actually reveal who sent you the card, maybe in your next Tech Tips episode, or maybe even on next week's WAN Show.
@@avoprim5028 Bruhhh there's a reason why they're unidentified.
@@avoprim5028 Why bring Doug into this?
@@leepicgaymer5464 yeah im confused
What's Nvidia gonna do though, assassinate the lender? Just sounds paranoid to me.
The near double performance you're getting near the end is because of Nvidia's 1:1 FP16 ratio on consumer GPUs; Nvidia also has a 1:64 FP64 ratio on consumer GPUs.
Basically they artificially limited consumer GPUs to force the machine learning crowd (which wants FP16) and the scientific computing crowd (which wants FP64) to buy the way more expensive "professional" products.
LHR is just a newcomer to the party, not exactly a new concept.
Same energy
Can you elaborate on what you mean by '1:1 fp16 ratio' and '1:64 fp64 ratio'? (I am familiar with the datatypes themselves and the consequences for performance/vectorization on CPUs)
BTW AMD has full speed FP16 and 1:16 FP64 on their consumer GPUs.
Yeah, that's not really accurate. GeForce cards physically don't have many FP64 units, to make them cheaper; also, tensor-core FP16 matmul with a 16-bit accumulator runs at full rate. Only FP16 matmul with an FP32 accumulator is artificially restricted, which can be useful in some workloads, but often accumulation in FP16 does the job. The 1:1 FP16 ratio you talk about is on CUDA cores, which nobody uses for deep learning since tensor cores are much faster.
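If you want to see those ratios on your own card, here's a minimal PyTorch sketch (assuming a CUDA build of PyTorch) that times a big matmul in each dtype; note that FP16 will route through tensor cores on recent GPUs, which is exactly the caveat above:

import time
import torch

def matmul_tflops(dtype, n=4096, iters=20):
    # Rough TFLOPS estimate for an n x n matmul in the given dtype.
    a = torch.randn(n, n, device="cuda", dtype=dtype)
    b = torch.randn(n, n, device="cuda", dtype=dtype)
    a @ b                        # warm-up
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        a @ b
    torch.cuda.synchronize()
    return 2 * n**3 * iters / (time.time() - start) / 1e12

for dt in (torch.float16, torch.float32, torch.float64):
    print(dt, round(matmul_tflops(dt), 2), "TFLOPS")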
This is a very interesting video. I liked it a lot more than I thought I would. There isn't much coverage of Nvidia's data center line from tech YouTubers typically, so it was fun learning a bit about that aspect of their GPU division 😊👍
Man, I wish I was half as smart and cool as you guys. I enjoy watching these videos as a casual consumer, and it's so cool that you guys basically dance around code and near-instantly recognise items with ease! I mean, seeing the teardowns is always a joy too.
The fastest GPU ever video arrived in my recommendations 18 seconds after publishing. That's fast!
I've watched a few Linus videos here and there, and this one I also just happened to click on by chance, but I just want to say I really like that Jake guy! He seems very knowledgeable but also very chill and humorous. He's a great addition to the team!
He's a bottom.
I couldn't agree more!
Just wanna say Linus' cheesy but wholesome shoutout to his wife at 9:18 made my day. Classy move, Linus, classy move.
I love Linus and all of his employees. He's done such an amazing job over time collecting all the right people!
This thing would eat large batches of training data like a champ. I used it in a research lab, absolutely insane cards.
8:46 Jake became Charlie "Yeah, babyyyyyyyyyyy" 😂😂😂
Very cool video about this amazing card. A good comparison would have been the Tesla T4, which is another popular card for machine learning (since it offers the currently best total-cost-to-performance tradeoff for that workload). Also a small hint: the benchmark doesn't actually use that much VRAM, but TensorFlow allocates almost the entire VRAM by default (even if it doesn't require/use it). The actual VRAM usage of ResNet50 FP16 with batch size 512 should be about 22 GB (since I know it's 22 GB for FP32 at batch size 256).
Tesla T4 is primarily meant for inference
22:08 that aged well...
Indeed
So glad they contextualized the AI performance; I wish they would branch into that field more, and even have a tech review integration for AI performance. They *must* know about stuff like Stable Diffusion, so it'd be useful and fun, especially when the 40 series comes out, or testing things like Arc if they ever get an opportunity. I'm still curious if they ever explored the AI performance of Arc.
That crap ain't real art, we don't wanna see it