GPU Performance Benchmarking for Deep Learning - P40 vs P100 vs RTX 3090

  • Published Jun 22, 2024
  • In this video, I benchmark the performance of three of my favorite GPUs for deep learning (DL): the P40, P100, and RTX 3090. Using my custom benchmarking suite, BenchDaddi, I assess the performance of these GPUs across three major DL architectures: CNN, RNN, and Transformers. Whether you're a data scientist, a machine learning engineer, or just an AI enthusiast, this comparison will provide valuable insights into the capabilities of these GPUs.
    In this video, you'll discover:
    Benchmark Tests: Detailed performance benchmarks across various AI/ML/DL workloads.
    Analysis & Insights: In-depth analysis of the results, highlighting strengths and weaknesses.
    Use Case Suitability: Recommendations on which GPU is best suited for different types of AI/ML/DL tasks.
    #DeepLearning #MachineLearning #AI #GPUBenchmark #NVIDIAGPU #RTX3090 #P40 #P100 #TechReview #DataScience #MLPerformance #DLPerformance #GPUShootout #TechComparison #ArtificialIntelligence #NeuralNetworks #CNN #RNN #Transformers #TechBenchmark #AIEnthusiast
    🎥 Other Related Videos:
    AI/ML/DL GPU Buying Guide 2024: Get the Most AI Power for Your Budget
    • AI/ML/DL GPU Buying Gu...
    An Open Source GPU Benchmarking Project: BenchDaddi
    • An Open Source GPU Ben...
    8 GPU Server Setup for AI/ML/DL: Supermicro SuperServer 4028GR-TRT
    • 8 GPU Server Setup for...
    📚 Video Resources:
    Link To Looker Report (Data Visualization From Video)
    lookerstudio.google.com/repor...
    Link to Raw GPU Benchmarking Data in Google Sheets
    docs.google.com/spreadsheets/...
    GitHub Repo For BenchDaddi Benchmarking Suite
    github.com/thedatadaddi/Bench...
    Looker Studio - Free Data Visualization Platform
    lookerstudio.google.com/overview
    ** PLEASE COLLABORATE **
    I cannot possibly buy and test every GPU and hardware configuration, but with your help we can build a library of benchmark data for the betterment of the AI/ML/DL community as a whole. All you need to do to help is pull down the benchmarking suite and run it on your own machines to test the GPUs you have. Then enter the data into the Google Sheets link below under the "new_results" tab. The "og_results" tab contains the original data I used to make this video.
    GitHub Repo For BenchDaddi Benchmarking Suite
    github.com/thedatadaddi/Bench...
    Collaborative Results Google Sheet
    docs.google.com/spreadsheets/...
    HOW TO GET IN CONTACT WITH ME
    🐦 X (Formerly Twitter): @TheDataDaddi
    📧 Email: skingutube22@gmail.com
    💬 Discord: / discord
    Feel free to connect with me on X (Formerly Twitter) or shoot me an email for any inquiries, questions, collaborations, or just to say hello! 👋
    HOW TO SUPPORT MY CHANNEL
    If you found this content useful, please consider buying me a coffee at the link below. This goes a long way in helping me through grad school and allows me to continue making the best content possible.
    Buy Me a Coffee
    www.buymeacoffee.com/TheDataD...
    As a cryptocurrency enthusiast, I warmly welcome donations in crypto. If you're inclined to support my work this way, please feel free to use the following addresses:
    Bitcoin (BTC) Address: bc1q3hh904l4uttmge6p58kjhrw4v9clnc6ec0jns7
    Ethereum (ETH) Address: 0x733471ED0A46a317A10bf5ea71b399151A4bd6BE
    Should you prefer to donate in a cryptocurrency other than Bitcoin or Ethereum, please don't hesitate to reach out, and I'll provide you with the appropriate wallet address.
    Thanks for your support!
  • Science & Technology

Comments • 42

  • @repixelatedmc
    @repixelatedmc 3 days ago +1

    Wow! Easy to follow, no gibberish, and pure information with clear and readable statistics!

    • @TheDataDaddi
      @TheDataDaddi  3 days ago

      Hi there. Thanks so much for the positive feedback. So glad you found it clear and useful!

  • @H0mework
    @H0mework 4 days ago +5

    I'm happy whenever you upload

    • @TheDataDaddi
      @TheDataDaddi  4 days ago

      Hi there. Thank you so much for the kind words! I really, really appreciate it! Makes all the work to make the videos worth it.

  • @Artikel.1
    @Artikel.1 1 day ago +1

    Really great video! I am considering working with ML and developing myself further in the field of AI. But the price of the P40 seems very strange to me. I haven't been able to find a P40 anywhere on the internet that costs less than $200. Maybe it's because I live somewhere else. But more than $200 for a single P40 is a bit much for me. I'm still in school, so the price is a deciding factor.

  • @gorangagrawal
    @gorangagrawal 4 days ago +2

    If possible, please upload NVLink and PCIe extender video. It would be really helpful to understand them.

    • @TheDataDaddi
      @TheDataDaddi  4 days ago

      Hi there. Thanks for your comment!
      Do you mean a general video on how they work? Or specifically with respect to the build I have done on the channel? Either way, here is a video I have that might help you:
      th-cam.com/video/zrcKGF156bA/w-d-xo.html
      Let me know if this helps explain things for you!

    • @gorangagrawal
      @gorangagrawal 4 days ago

      @@TheDataDaddi Thanks for sharing your build video. It covers the information I was looking for.

  • @BaldyMacbeard
    @BaldyMacbeard 4 days ago +1

    Wow... Randomly stumbled upon the video. Thanks, super useful! I wish someone would do a nice dataset like this, but with multi-GPU configs with NVLink vs. PCIe and so on.

    • @TheDataDaddi
      @TheDataDaddi  3 days ago

      Hi there. Thanks so much for the comment!
      Gotcha. So in the video, the RTX 3090s are tested in 1- and 2-GPU configurations with NVLink. I will definitely try to make a video in the future highlighting the performance difference between NVLink and PCIe. I would have done it in this video, but I am unfortunately not able to physically access my GPUs for the next few months. Thanks so much for the suggestion, and please stay tuned for a video on this topic!

  • @ICanDoThatToo2
    @ICanDoThatToo2 4 days ago +1

    Thanks for this! We've been wondering since Craft Computing mentioned it recently. But...
    30:00 I don't follow your math here. First, the 2xP40 bar _says_ 10.74 but lies on the graph at over 15. I believe this bar should stop at 10.74, which would not only show its true value, but the height of the blue bar would then visually show the performance added by the 2nd GPU.
    Second, I can't see where the Throughput per Dollar numbers come from. The 3090 has T=17 and $=820, so it should appear here at T/$=0.02 or $/T=48. Where did the 141 come from?
    Third, if you're going to look at running costs, then electricity is very important. In some locations electricity costs can exceed server costs in well under a year. It's the reason this hardware is so cheap -- companies can't afford to keep it running.

    • @TheDataDaddi
      @TheDataDaddi  3 days ago

      Hi there. Thanks so much for the comment!
      1) Yes, you are absolutely right. This graph does read badly. I stacked them to save space because my laptop screen is small, and I thought it would be a bit more readable for the video. However, I do agree it is misleading the way it is. I will update the report to fix this.
      2) This has to do with the way the average is calculated in this case. The CPU-scaled throughput for each scenario is divided by the price of the GPU(s) for each different scenario (GPU, Number of GPUs, Model, Precision, Task, etc). For more specific details, please take a look at the raw data in the Google Sheet link in the video description. It should make things clearer if there is confusion here.
      After some review, in a fair number of cases, the LSTM did not see much performance benefit over the CPU. This drove the GPU-scaled throughput per dollar way up nominally. These values artificially dragged up the overall average. I tested this by switching the aggregation method to median rather than average, and the values are more in line with what you would expect based on your observation here. You can also see this if you just look at the BERT or RESNET50 model scenarios. These are much more in line with what you would expect.
      In summary, the numbers do appear to be correct even if they are higher than expected globally. The data when training the LSTM was significantly different than the other models. This leads me to believe there may be some issue with the model setup, dataset, or hyperparameters (or some other reason I am missing). My gut tells me I just did not use large enough batch sizes or a large enough dataset. In any case, more digging will be required to understand why this occurred.
      3) Yes. This is an excellent point. I actually wanted to include this in the video; however, I am not able to do so at the moment. I am away from my homelab for the summer, so I am not able to access the devices I have set up to measure the power consumption for these GPUs. Once I get back, I plan on making a specific video just to address this. I apologize for not being able to include it in this video, as I do definitely agree this is a major factor to consider when comparing GPUs, especially if electricity costs are high. If you are interested, I invite you to stay tuned, and I will put out a video on this topic as soon as I am able.
      I really appreciate your feedback here. There is truly no substitute for a second set of eyes. Hope I answered your questions here. Cheers!
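      To make the averaging issue from point 2 concrete, here is a small sketch (the numbers are made up for illustration, not taken from the Google Sheet) of how a couple of outlier scenarios, like the LSTM runs described above, can drag a mean far above the median:

```python
from statistics import mean, median

# Hypothetical throughput-per-dollar values (samples/sec per $) for six
# benchmark scenarios on one GPU. The last two mimic LSTM runs where the
# GPU barely beat the CPU, inflating the CPU-scaled ratio.
per_scenario = [0.020, 0.030, 0.025, 0.020, 5.0, 4.2]

print(f"mean:   {mean(per_scenario):.4f}")    # dominated by the two outliers
print(f"median: {median(per_scenario):.4f}")  # close to the typical scenario
```

      The mean lands orders of magnitude above the median, which matches the behavior seen when switching the report's aggregation from average to median.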

  • @JimCareyMulligan
    @JimCareyMulligan 4 days ago +2

    Thank you for your work. Do you have any plans to test the Tesla V100 16 GB? They go for half the price of the 3090 and support NVLink.

    • @TheDataDaddi
      @TheDataDaddi  4 days ago +1

      Hi there. Thanks so much for your comment!
      So, at the moment, I do not have any plans to test the V100 16GB. Where I am located, they are almost exactly the same price as the RTX 3090 for less VRAM. So I hadn't really considered them at this point because I think there are better options for the price. However, if you can find them at half the cost, please let me know where, and I would be happy to make a video on them.
      I would also be willing to test people's GPUs. For example, if you had a V100 16GB, I would be willing to pay for shipping both ways so I can test it. I don't know if you or anyone would go for this, but it would allow me to test more GPUs without having to incur the full cost of buying them.
      I have been looking at the V100 32GB SXM2 versions, though. These have tons of VRAM and great performance for less than $1,000 on eBay where I am. The only problem is finding a server to put them in. There are not many options that I can find. So if I do make a video, it will likely be with those GPUs, not the standard PCIe 3.0 V100 16GB.

  • @werthersoriginal
    @werthersoriginal 4 days ago +2

    Oh wow, I'm in the market for the 3090 for LLMs but I've been eyeing the P40s because of their price. I saw your videos on the R720s and now I'm wondering if I can put a P40 in my R710.

    • @TheDataDaddi
      @TheDataDaddi  4 days ago

      Hi there. Thanks so much for the question!
      In theory, this should absolutely be possible. I can't vouch for it because I have never tried it personally, but I would be surprised if it didn't work. The only thing to note here is that the R710 has PCIe 2.0, so data transfer might be a bottleneck at some point.
      Really curious about this, so if you end up trying, please do let me know how it turns out! Best of luck!

  • @scentilatingone2148
    @scentilatingone2148 4 days ago +2

    Brilliant bud.

    • @TheDataDaddi
      @TheDataDaddi  4 days ago

      Hi there. Really appreciate the kind words! So glad you enjoyed the content!

  • @jaroman
    @jaroman 4 days ago +1

    Does PCIe 3.0 vs. PCIe 4.0 make a difference in this kind of setup?

    • @TheDataDaddi
      @TheDataDaddi  4 days ago +1

      Hi there. Thanks so much for the question!
      I have not used PCIe 4.0 because none of my servers support it (the ones that do are much more expensive), so I cannot say for sure how much of a difference it makes. What I can say is that while using PCIe 3.0, I have not experienced any major bottlenecks due to data transfer. Even the evidence in this video supports that. Now, if you have better GPUs and are loading extremely large batch sizes, then PCIe 4.0 might make a huge difference. For me personally, so far I would say that I have not really "needed" PCIe 4.0.
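      As a rough back-of-the-envelope sketch of the raw link difference (assuming the theoretical ~0.985 GB/s per lane for PCIe 3.0 after 128b/130b encoding, double that for 4.0, and ignoring protocol and driver overhead):

```python
# Approximate usable bandwidth per lane after line encoding:
# PCIe 3.0 runs at 8 GT/s with 128b/130b encoding -> ~0.985 GB/s per lane;
# PCIe 4.0 doubles the signaling rate.
GB_PER_S_PER_LANE = {"3.0": 0.985, "4.0": 1.969}
LANES = 16  # a full x16 slot

def transfer_time_ms(batch_gb: float, gen: str) -> float:
    """Time to copy one batch host -> GPU over a x16 link, overhead ignored."""
    return batch_gb / (GB_PER_S_PER_LANE[gen] * LANES) * 1000

batch_gb = 0.5  # e.g. a 500 MB training batch
for gen in ("3.0", "4.0"):
    print(f"PCIe {gen} x16: ~{transfer_time_ms(batch_gb, gen):.1f} ms per batch")
```

      The gap is tens of milliseconds per large batch at most; whether it matters depends on whether transfers overlap with compute, and with prefetching in a typical training loop it usually does not.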

  • @VastCNC
    @VastCNC 4 days ago +1

    I'm coming from Power BI as well, and the new company I'm working with is a Google shop, so I'm definitely interested in a Looker Studio comparison if you're game.

    • @TheDataDaddi
      @TheDataDaddi  3 days ago +1

      Hi there! Thanks so much for the feedback.
      Awesome. I am glad to hear there is some interest here. This is definitely one of the better tools I have used, and it's FREE. lol. Stay tuned for a video here. I'll try to make one as soon as I get a free evening!

    • @VastCNC
      @VastCNC 2 days ago

      @@TheDataDaddi the most critical question is though, do they have a dark mode that is decent? Microsoft is weak in the dark mode game for office suite

  • @Horesmi
    @Horesmi 4 days ago +2

    For some reason, 3090s are going out for $500 over here, and there are a lot of them on the market. Crypto crash or something? Anyway, that seems to change the calculations a lot in my case

    • @publicsectordirect982
      @publicsectordirect982 4 days ago +2

      Over where?

    • @Horesmi
      @Horesmi 4 days ago

      @@publicsectordirect982 Ukraine

    • @ericspecullaas2841
      @ericspecullaas2841 4 days ago +2

      @publicsectordirect982 my guess China...

    • @TheDataDaddi
      @TheDataDaddi  4 days ago +1

      Hey there. Thanks so much for sharing!
      Yeah, that does change things quite a bit! If you can get the RTX 3090 for only $500, first of all, I am jealous. lol. Second of all, I think you are going to be hard pressed to find a better GPU for that price. You can also use my spreadsheet and plug in the numbers to see how it compares to the P40 and P100 in your area to make a more data-driven decision. Personally, though, I'd go with the RTX 3090 for that price! Cheers!

  • @drewroyster3046
    @drewroyster3046 3 days ago +2

    Sorry to be this guy but anyone got a TLDR?

    • @TheDataDaddi
      @TheDataDaddi  3 days ago +1

      Hi there. Thanks for the comment!
      Yeah, I apologize for the video length. There was just a lot to discuss. I know a lot of people don't have time for full videos of this length (myself included), so maybe I'll start doing a video summary at the beginning so those who are pressed for time can watch that and move on. For those reading this, please let me know if you think this might be beneficial.
      TLDR
      Anyway, I'll do my best to summarize in a paragraph. The P40 and P100 are still some of the best (if not the best) GPUs you can buy right now for the money. The P40 is overall my recommendation for most people for general AI/ML/DL tasks due to comparable performance to the P100 and more VRAM (P40 5.5x CPU vs. P100 6.5x CPU). The P100 is still the most well-rounded GPU for the money, so if you need various levels of precision for your workloads, this would be a good choice. The RTX 3090 offers the best performance of the three overall (17x CPU), especially if you have use cases where you use or can use mixed precision. For those interested in working with LLMs specifically, this would be my recommendation.
      Hope this helps! Please let me know if you have any specific questions, and I will do my best to answer them succinctly.

    • @drewroyster3046
      @drewroyster3046 2 days ago

      @@TheDataDaddi thank you! Seemed like thorough well thought out content but didn’t have the time for the full deep dive. Thanks!

  • @makerspersona5456
    @makerspersona5456 4 days ago +2

    we can't hear you... :(

    • @makerspersona5456
      @makerspersona5456 4 days ago

      might be good to invest in a good mic and make these videos more concise too :)

    • @makerspersona5456
      @makerspersona5456 4 days ago

      It's informative content but feels like a corporate team meeting. Sure, this would appeal to more people if you made it more YouTube friendly.

    • @TheDataDaddi
      @TheDataDaddi  4 days ago +1

      Hi there. Thank you so much for letting me know.
      Oh no! I am so sorry. I am traveling right now, so I do not have my normal nice mic for recording. I really apologize. I did spot-check the video, and the parts I watched seemed to have at least okay audio. Can you let me know specifically where in the video the sound is bad? I will do my best to fix it.

    • @TheDataDaddi
      @TheDataDaddi  4 days ago

      Agreed. I will try to find a better solution for while I am traveling. Also, I apologize for the length of the video; sometimes it is hard to guess what people will find interesting and what is too much. Hopefully the automatic timestamps and fast-forward feature will allow viewers to skip the parts they find boring. I really do appreciate the feedback here, though. I have gotten this before, and believe it or not, I am working on being more concise in my videos. I will try to continue improving here.

    • @makerspersona5456
      @makerspersona5456 4 days ago +1

      @@TheDataDaddi The video is extremely useful and well made in terms of content. I just have to max out my volume from the start and play at 1.5x. It's really just my opinion, and no one else has said this.