Brother - you went down a serious rabbit hole. A man after my own heart. One thing you didn't mention that is very important to data integrity, and therefore to results, is ECC RAM. The pro Nvidia GPUs have ECC RAM. Most casual users don't realize how many bits get flipped by flaws in silicon and cosmic events (literally). Then some garbage gets written to disk. That's why I would never own a workstation that doesn't have ECC from top to bottom. Better still if the OS / hypervisor is using ZFS (the best file system IMO... and that's with 40 years of building enterprise and global systems). Consumer equipment is fine to test and learn on, especially on a budget. But if you want data and result integrity, at a minimum, buy pro equipment. Like you - I've had great results buying refurbs from eBay. Used electronics prices drop faster than a new car's value when you pull out of the sales lot. Well - at least that used to be true. But it still is for workstations and servers. I recently looked at some Dell PowerEdge R730s (NVMe M.2 bootable) with 128 GB ECC RAM and dual, upper-end, v3 Xeon processors for about $400 - with iDRAC (out-of-band management). You did emphasize the use case that the GPU is for, and that's what I'd emphasize too. If the data you are processing is something that you can't replace from numerous backups, or that can't suffer glitches - go with professional equipment - either new or used. Then use ZFS RAID and not hardware RAID. ZFS controls the whole data stack - from RAM to permanent storage. You'll want to disable hardware RAID so the firmware doesn't fight with ZFS. If you are learning and experimenting, but not relying on the end result - use cheaper consumer products. The learning curve is lower and so is the price tag. If you've never managed a PowerEdge server - that is an entirely different animal because it has to be. The difference between one of those and a consumer PC is like the difference between a flip phone and a Linux workstation. Night and day. 
But if you're a nerd like me - that's what you want as a platform.
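To make the flipped-bit point concrete, here is a minimal Python sketch of why a stored checksum catches silent corruption. It is only an illustration of the idea behind block checksums, not how ZFS or ECC hardware is actually implemented; the sample data, the bit position, and the helper names `flip_bit` and `checksum` are all made up for the example.

```python
import hashlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


def checksum(data: bytes) -> str:
    """Hex digest used as a stand-in for a per-block checksum."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical payload standing in for a block of results on disk.
original = b"model weights or a training result" * 4
stored_checksum = checksum(original)

# Simulate a single flipped bit somewhere between RAM and disk.
corrupted = flip_bit(original, 42)

differing_bytes = sum(a != b for a, b in zip(original, corrupted))
detected = checksum(corrupted) != stored_checksum
print(f"bytes that differ: {differing_bytes}, corruption detected: {detected}")
```

Without a checksum to compare against, the corrupted copy looks just as valid as the original, which is the whole problem with silent bit flips.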
Hey there. Thanks so much for the detailed response! I really appreciate comments like this. Yeah, I definitely went down a rabbit hole with the video. Really did not mean to, but once I started I couldn't stop. lol I will definitely keep that in mind going forward. I have never really had any issues thus far, but what you are saying definitely makes sense. Also, from my research it really does not appear to be too much more expensive. Yeah, I actually have 3 Dell PowerEdge R720s right now in a rack in my basement. Unfortunately, I do not believe any of them have ECC RAM though. I have always wondered about ZFS RAID but never used it myself, so I will have to go that route. The next server I buy I want to be something newer, and I will make sure to buy ECC RAM and use ZFS RAID. Also, that is a great price on the R730. Is it still available? lol
I can attest. I would not consider myself a casual user, but that is something I never really considered until recently. Upon further research, this is indeed quite common and definitely highlights the need for ECC RAM, especially for critical systems or production environments. @@Christopher-lb6rf
@@Christopher-lb6rf When I was a hiring manager for sysadmins and someone listed 'expert' as their level of knowledge of Sun servers, you can bet questions about errors and external errors came up. I found a few people, mostly ones who had worked at Sun, who could answer and knew the area I was interested in :)
@@jeffm4284 This was a very helpful comment. Thank you! I am learning at this stage and building my first NAS/PC, with one of my goals being local AI with fine-tuning + RAG for clinical notetaking. I'm not (currently) paid, nor can I pay myself, for this added work; it's for my learning, development, and the long-term good of my clients. In essence, I'm experimenting and not relying on the end result, but I fully intend to rely on it 3-5 years from now. From a cost-benefit time/money perspective (and if you were a relative noob), is it worth it to start on consumer equipment and then have to spend the additional time and money to upgrade most of your hardware later on? My previous plan was to put a 16GB VRAM 4070 Ti Super in my PC, build a NAS, learn, and add a 24GB 3090 or 4090 in a separate enclosure later on (for dual GPUs on my PC or single GPUs on both PC and NAS). But full ECC sounds essential down the line in my use case, so now I'm not so sure.
00:11 Choose GPU with newer architecture for better performance
02:28 Choose NVIDIA GPUs with active support and sufficient VRAM for future scalability.
06:51 Key considerations for choosing an NVIDIA GPU for deep learning
09:08 Consider driver support for deep learning framework compatibility.
13:10 Factors to consider when choosing an NVIDIA GPU for deep learning
15:12 Understanding the key GPU metrics is crucial for making the right choice.
19:46 Choosing GPU based on performance, memory, and bandwidth criteria.
22:00 GeForce RTX 2060 Super and GeForce RTX 4060 Ti 8 GB are the best bang for your buck GPUs.
26:27 Comparison of NVIDIA GPU models for Deep Learning in 2023
28:45 GeForce RTX 4060 Ti 16GB has the best raw performance
33:18 Choosing NVIDIA GPUs for Deep Learning in 2023
35:36 Best bang for your buck: P100 and P40 GPUs
39:22 P100 and P40 are recommended for deep learning
Crafted by Merlin AI.
You are definitely right. There is a reason most companies use the cloud rather than on-prem hardware. There are definitely arguments in both directions. In my case, however, I cannot afford the cloud-based solution, and I like having full control of my own hardware.
Well, this is actually what I was thinking. Most "best bang for the buck" charts forget that the longer it takes to generate a result, the more that time can also be counted as a loss.
GFLOPS is not calculated as shown in the video at 15:18. Remove the Giga, which we know, and you are left with FLOating Point operations per Second, simply (there is some history behind why this is used). It is somewhat archaic, since a lot of other things are being done too that aren't incorporated in it, but in general most other operations take fewer cycles than a floating point one does, because the comma needs special attention, so to speak. 15*15 and 1,5*1,5 are the same thing except for tracking the comma separately, with the result being 225 or 2,25. What I mean is that the circuit needs additional logic to track commas, or rather fractions, which is why we separate floating point from integer operations - additional hardware is required to track the comma "on top of" the integer type numerical operations. No idea if this makes any sense or is useful. I thought it would be simple to explain until I thought it through and realized I need to type this as opposed to scribbling and showing it on a whiteboard. I'm sure there is a good explanation for it out there; I'm just trying to point to why, since to a person doing math it's not as obvious as it is when designing a circuit to do it.
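The "track the comma separately" idea can be sketched in a few lines of Python. This is a toy decimal model only: real floating-point hardware works in base 2 (IEEE 754) and also has to normalize and round the result. The function names `multiply_scaled` and `to_fraction` are invented for the illustration.

```python
from fractions import Fraction


def multiply_scaled(m1: int, e1: int, m2: int, e2: int) -> tuple[int, int]:
    """Multiply two numbers written as mantissa * 10**exponent.

    The mantissas go through a plain integer multiply; the position of
    the comma is handled separately by adding the exponents.
    """
    return m1 * m2, e1 + e2


def to_fraction(mantissa: int, exponent: int) -> Fraction:
    """Turn mantissa * 10**exponent back into an exact value."""
    return mantissa * Fraction(10) ** exponent


# 15 * 15: no comma to track, exponent stays 0.
int_result = multiply_scaled(15, 0, 15, 0)

# 1,5 * 1,5: same integer multiply, plus bookkeeping for the comma.
float_result = multiply_scaled(15, -1, 15, -1)

print(int_result, to_fraction(*int_result))
print(float_result, to_fraction(*float_result))
```

Both cases share the 15 * 15 integer multiply; the second one needs the extra exponent addition, which is the additional logic the comment describes.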
Hi there. Thanks so much for the comment! This is great information. Thanks so much for sharing. I did know that floating point ops were different fundamentally than other operations though I was not 100% sure why. This makes a lot of sense! Also, if you know the correct formula for calculating FLOPs in general please let me know.
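On the formula question: a common way to estimate theoretical peak FP32 throughput is cores x clock x operations per core per cycle, where a fused multiply-add is counted as two floating point operations. The sketch below assumes that convention; the RTX 3090 figures (10496 CUDA cores, 1.695 GHz boost) are quoted from memory of NVIDIA's published specs and should be checked against the spec sheet. Real workloads rarely reach this peak.

```python
def peak_fp32_tflops(
    cuda_cores: int,
    boost_clock_ghz: float,
    flops_per_core_per_cycle: int = 2,  # assumption: one FMA = 2 FLOPs
) -> float:
    """Theoretical peak FP32 throughput in TFLOPS.

    cores * GHz gives billions of core-cycles per second; multiplying by
    FLOPs per cycle gives GFLOPS, and dividing by 1000 gives TFLOPS.
    """
    gflops = cuda_cores * boost_clock_ghz * flops_per_core_per_cycle
    return gflops / 1000.0


# Assumed spec figures for an RTX 3090; verify before relying on them.
rtx_3090_tflops = peak_fp32_tflops(cuda_cores=10496, boost_clock_ghz=1.695)
print(f"RTX 3090 theoretical peak: {rtx_3090_tflops:.2f} TFLOPS")
```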
This is a fantastic video explaining how to choose a GPU for deep learning/AI/ML. He extended Tim Dettmers' single GPU performance chart into a masterpiece of a spreadsheet and Power BI dashboard. Masterful. I wonder if you factored electricity cost, the cost of removing heat from the room, and total decibel output into the decision. I see in a subsequent video that the server is installed in what looks like a basement. The rack is within a few feet of a gas can. Those Dell machines can run hot, so you might want to move the gas can elsewhere. How noisy is the final product with two P100s?
Hi Prent. Thank you so much for the kind words. This took me quite a while to put together, so I really appreciate the positive feedback. I have not directly calculated the electricity cost for all of the GPUs. However, for the P100s in the Dell PowerEdge R720, I estimated roughly $200 per year based on the 15 ¢/kWh average in GA, where I live, and my anticipated usage. However, that is really just a guess. I need to buy a gauge to actually measure average power consumption over a month or so and extrapolate from that. This seems like a good idea for another video, so stay tuned. lol The heat removal has not been an issue at this point, as the basement is fairly large and naturally stays around 65 F. I will have to see what it is in a week or so, after the servers heat things up. As far as the noise is concerned, as long as it is in the basement or a room that you don't use often, it is fine. They were in my bedroom for a while, but I had to move them because of the noise lol. As far as decibels are concerned, I have not measured. However, if you are interested I can try to check and let you know more precisely. Thankfully the gas cans are empty, but I agree it is probably a good idea to move them. Thanks again for the interest! Glad you found this helpful.
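For anyone wanting to redo this estimate with their own numbers, the arithmetic is average power x hours x rate. The sketch below uses the $200 per year and 15 ¢/kWh figures from the reply and works backwards to the average draw they imply if the server ran around the clock; the around-the-clock assumption is mine, not something stated in the reply.

```python
HOURS_PER_YEAR = 24 * 365


def annual_cost_usd(
    avg_watts: float,
    rate_usd_per_kwh: float,
    hours_per_year: float = HOURS_PER_YEAR,
) -> float:
    """Yearly electricity cost for a given average power draw."""
    return avg_watts / 1000 * hours_per_year * rate_usd_per_kwh


def implied_avg_watts(
    annual_cost: float,
    rate_usd_per_kwh: float,
    hours_per_year: float = HOURS_PER_YEAR,
) -> float:
    """Average draw that would produce a given yearly cost."""
    return annual_cost / rate_usd_per_kwh / hours_per_year * 1000


# Figures from the reply: about $200/year at 15 cents per kWh.
# Assumption: the machine is powered on all year.
watts = implied_avg_watts(annual_cost=200, rate_usd_per_kwh=0.15)
print(f"implied average draw: {watts:.1f} W")
```

A plug-in power meter reading averaged over a month can be passed straight to `annual_cost_usd` to replace the guess.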
Great video, I'm very grateful; it was worth watching in its entirety. Thank you for your effort, greetings from Panama. Your video will greatly assist me in a project that my classmates and I want to undertake at the university. Many thanks for sharing such valuable information
My brother... Thank you so much for this video. I'm pretty new to this and was JUST about to start a similar workbook before I thought to check opinions on YouTube. You saved me at least a day of figuring out weights and priorities. I completely agree with your logic and thought process.
Hey there! Thanks so much for the kind words. I am so glad to know that this was useful for you. I am hoping to have my website up and running in the next couple of months; it will basically have this workbook available in real time with up-to-date pricing. Stay tuned!
I'm watching this as a newbie from a hotel room on my laptop with sub-par speakers. Just 1 request from my unique context would be to amp up the volume on future uploads so it is easier to listen when in similar situations.
@@TheDataDaddi hi. Thank you for the quick reply and I appreciate the willingness to up the volume. Just wanted to add. The audio issue was on my laptop’s end and roughly half way through the video, the volume issue fixed itself. Earlier they sounded like they were at half volume. I forgot to edit my comment earlier. Much love and support ❤️
Hi there. Thanks so much for the positive feedback! I am so glad the video was useful for you and thank you very much for subscribing. I hope you continue to enjoy the content!
It would be nice to add in energy consumption, heating vs. cooling, durability, and a real-life comparison using an actual local LLM system (and which LLM size) with these cards. For instance, the best card for single-use performance vs. multi-use performance with a local LLM system like privateGPT.
Yep this is coming eventually. Things like that just take awhile to do correctly. I am working right now to create a set of comprehensive benchmarks that the community can use to evaluate GPU performance across the major AI/ML/DL areas.
I wish I had watched this video 1 day ago. Great material for beginners in ML. Thank you. I ended up choosing a RTX 4070 12gb. Not the best choice for money, but I guess still very powerful
Hey man! Thanks so much for the feedback. Yeah, unfortunately, most of the 4000 series do not have the greatest price-to-performance ratio. However, they perform better at general tasks like rendering and gaming, so there is definitely something to be said for them if you can afford them. Anyway, that said, I think the 4070 is a fine GPU and will be an absolute workhorse when it comes to smaller machine learning problems. If you remember, let me know how you like it after a few months of experimenting with it. I am very curious about the performance of the 4000 series in general. Thanks again for your feedback!
@@TheDataDaddi Is the MSI RTX 4070 Ti Ventus 3X OC 12GB GDDR6X a good graphics card? Please help me. I am thinking of buying the Black Diamond 2.0 from Dell, which comes with this GPU. Please give me some suggestions.
Hi there. Thanks for the comment. I think this would be a fine GPU overall. It would help to know your use case, but I would say that it is a good GPU for most small to mid-size applications. Since this is a prebuilt machine, do you have any other options for GPUs? @@cattnation6257
I heard that Tensor Cores are most important for AI and ML, and that CUDA cores are second in importance. I don't think I have seen Tensor Cores referenced on your sheet.
Yes, they are definitely a consideration. Admittedly, I should have included those stats as well. I will likely go back and add them. However, in most cases a higher number of CUDA cores also translates into a higher number of Tensor Cores for those GPUs that have them.
Nice work. I think the only big flaw I see in your analysis is that purchase price is not the entire upfront cost. Each card should have an overhead cost based on the fraction of a chassis, mobo+cpu, and PSU it would use. I think you just had a chassis for 2 cards as a sunk cost in your mind so it didn't matter. But for anyone building a full system (or systems) it would have a big impact on their purchasing decision.
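The overhead idea in this comment can be written down as a small formula: each card carries its own price plus its share of the platform it sits in. A minimal Python sketch follows; the function name `cost_per_gpu` and all dollar amounts are hypothetical placeholders, not prices from the video's spreadsheet.

```python
def cost_per_gpu(gpu_price: float, platform_cost: float, gpus_in_system: int) -> float:
    """Up-front cost attributable to one GPU.

    platform_cost covers chassis, motherboard + CPU, and PSU, split
    evenly across the GPUs the system will actually hold.
    """
    if gpus_in_system < 1:
        raise ValueError("gpus_in_system must be at least 1")
    return gpu_price + platform_cost / gpus_in_system


# Hypothetical numbers: a $200 card in a $600 platform with two slots,
# versus a $1600 card that needs a $1200 platform to itself.
cheap_card = cost_per_gpu(gpu_price=200, platform_cost=600, gpus_in_system=2)
big_card = cost_per_gpu(gpu_price=1600, platform_cost=1200, gpus_in_system=1)
print(cheap_card, big_card)
```

Dividing performance by this figure instead of by the bare purchase price is one way to fold the "slot cost" into a price-to-performance ranking.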
Hi there! Thanks for the kind words and the feedback! This is a good point. I was doing all this research to figure out what to put in my Dell PowerEdge R720, so my analysis is a bit biased in that way, I suppose. My assumption was just that people would be able to compare GPUs most easily by looking at purchase price to performance. However, I do agree that for those who are building or using different servers this might shift the total cost significantly. For example, if I wanted to work with RTX 4090s (even in my server), I would have to buy external PSU(s) to power them. That would likely mean I would have to build an entire external rig, so that would definitely increase the overall cost. These are definitely things to consider when thinking about your build. I appreciate you bringing that up, and I will definitely keep it in mind for any future analysis I do. Thanks again!
@@TheDataDaddi It is a helpful comparison but as with everything else, no tool is perfect in and of itself, it may be the perfect tool for one job but not another, or at a certain scale. Mostly for me it reinforces my previous idea which is to aim for 4090 as the most versatile and best fitting for my usecase, also because it isn't a one trick pony and can do other things as well. Some of the other options do come close but do not 'cut the mustard' for one reason or another, primarily that it needs to fit in my one rig which can only take 2 GPU's max physically but then power becomes a limit since 1200w is the biggest PSU that makes sense since above it you run into breakers popping and wiring becoming a factor etc. Every possible factor can't be realistically accounted for or factored in, in a spreadsheet. The 'slot cost' I think would be valuable for many to be able to add to it, since some uses do not require much from the platform itself while others very much do.
@@noth606 You are definitely right. One of the reasons I am trying to focus more on hardware at this stage of my journey is because it is so nuanced. For best results, it really should be thought about on a case-by-case basis. This was just meant as a way to show people generally how to start thinking about GPUs for their particular use case and hopefully lessen the research burden a little bit for anyone interested.
@@TheDataDaddi I'm sure this is helpful to get people to start thinking on more concrete terms when they get to a point at which they want to put together hardware specifically to "crunch numbers" rather than just using a typical general use PC configuration. I would think that the journey in a certain sense can or might be split into 3 "stages" conceptually where stage 1 is a normal PC, stage 2 is a PC built to "crunch numbers" but at a general level and stage 3 is a problem/usecase specific PC or set of hardware designed around the exact specific task they are meant to tackle, where for stage 3 you would include efficiency calculating "work units/sec/price/watt" type factors into it. It's important to factor in time and energy on some level because they are of a slightly different nature than calculations per second are conceptually. What I mean is they are inflexible, you don't have access to unlimited amounts of either regardless of other factors. Saying this not to be patronizing, but as someone who has at times forgotten which parts of this kind of equation are inflexible 🙂 and paid a price for that. Ask my ex wife, she'd have tips and examples of the inflexibility of time and energy for sure, lol.
Hi, I'm a deep learning beginner with a 4070 Super (12GB) looking to potentially add more GPU(s) to the 2 empty slots on my GIGABYTE TRX50 AERO D motherboard. I'd like to have more VRAM on a budget of no more than $1000-$2000. What are some of my options, and do I need to consider power consumption with a 1200W PSU and other compatibility issues? I have a pretty good 32-core, 64-thread Threadripper CPU and 256GB of RAM. What are some benefits of those for ML? I originally bought this PC for 3D simulation use.
So, I think the Threadripper CPU and mobo are wonderful for deep learning applications, just really expensive relatively speaking. They are really the only CPU and mobo combos I know of that go beyond two x16 PCIe slots. This means you can use more than 2 GPUs with full x16 lane bandwidth. For me personally, I believe the best you can get right now in that price range is a pair of RTX 3090s with NVLink. I would scan eBay and wait for a good deal, then grab 2 when they are cheap or on sale. Hope this helps!
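On the 1200W PSU part of the question, a rough power budget is easy to sketch in Python. Everything numeric below is an assumption to be replaced with figures from the actual spec sheets or a power meter: the nominal wattages for the CPU and cards, the 100 W allowance for the rest of the system, and the rule of thumb of keeping sustained load under 80% of the PSU rating.

```python
def psu_headroom_watts(
    psu_watts: float,
    component_watts: dict[str, float],
    max_load_fraction: float = 0.8,  # assumption: rule-of-thumb ceiling
) -> float:
    """Watts left over after all components, under the load ceiling.

    A negative result means the build exceeds the chosen budget.
    """
    budget = psu_watts * max_load_fraction
    return budget - sum(component_watts.values())


# Assumed nominal power figures; check each against its spec sheet.
build = {
    "threadripper_cpu": 350,
    "rtx_4070_super": 220,
    "rtx_3090_a": 350,
    "rtx_3090_b": 350,
    "rest_of_system": 100,
}

headroom = psu_headroom_watts(psu_watts=1200, component_watts=build)
print(f"headroom: {headroom:.0f} W")
```

With these assumed numbers the headroom comes out negative, so power limits on the cards, a larger PSU, or fewer GPUs would be worth considering before buying.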
@@TheDataDaddi You literally changed my life. I could have gone off the deep end and bought a top-tier $1000+ gaming GPU, but something like that is just not ideal for the kind of AI stuff I'm interested in, like SDXL and high-end 70B LLMs. This video is definitely going to be a re-watch to ensure I retain the information you've kindly shared! Thanks again!!
So glad this was helpful to you, man! I am a big proponent of buying what you need for your specific application. Please let me know if I can help in any way. @@goldholder8131
Thank you, brother, for your hard work. You have saved me a lot of time. Your spreadsheet is amazing! We can sort GPUs by the desired category! After viewing your results, I believe the most notable GeForce GPU (especially considering the price) is the 3080 Ti. It is close in CUDA cores to the 3090 (I am aiming for a 3090, but it might be easier and cheaper for me to get a second-hand 3080 Ti).
Hi there. So glad this video has helped you! Yep, I would definitely agree with that statement. It may well be easier and cheaper to get your hands on a 3080. Most people, it seems, have their eye on the 3090s for the extra VRAM. However, if your use case does not require that, I think the 3080 would be an excellent way to go for sure.
Hello! First off, I want to thank you for the well researched and presented explanation regarding why GPUs are important to the topic of machine learning and the comparison of the major line of GPUs that you are familiar with. I'm currently in the middle of doing research for building a creator PC for performing GIS (Cartography/Data science/Computer science/programming/Machine and Deep Learning/etc.) with photo editing and Gaming as a side benefit. I'm looking for a GPU that can handle a wide variety of tasks with a focus on visualization and processing of high resolution RGB imagery, high resolution 4+ band multispectral imagery, Hyperspectral imagery and LiDAR (from Unmanned Aerial Vehicles) as well as machine learning and deep learning tasks. I was hoping to play around with the settings in the tool you provided, however I was not able to get power bi working and I'm too lazy to want to spend time trying to figure out how to set up the program properly. I was already looking at purchasing a 4070 ti for my build, but would you say that a 3080 ti, a 4080 series or 4090, if I can find one for a good price, would be a better choice?
Hi there. Thanks so much for your feedback and the question! Since you also want to do photo editing and gaming as well as AI/ML/DL applications, I think the RTX family of NVIDIA GPUs is definitely the right way to go. I think the 3 main questions here are: 1) What is your budget? 2) How large are your expected datasets? 3) Do you plan on expanding the number of GPUs in the future? Based on your needs and the price points you're considering, my recommendation leans towards the RTX 3090 or 3090 Ti, especially if you can find a compelling deal. These GPUs offer exceptional value around the $1,000 mark for your specific applications. Moreover, it's possible to find them at even lower prices, approximately $800, on platforms like eBay with a bit of patience. Their price-to-performance ratio is among the best in this price range, making them a highly attractive option. A significant advantage of these last-generation NVIDIA RTX GPUs is their support for NVLink, which, in my opinion, offers a notable benefit over the more powerful 4090 for your use case. Starting with a single 3090 or 3090 Ti allows for a robust setup. As your requirements expand and your budget allows, you can further enhance your system by adding another GPU and linking them with NVLink. This approach provides a scalable and highly effective setup for a wide range of tasks.
@@TheDataDaddi Thanks for the response, It's pretty hard to find information about creator PC building, especially if you're not doing video editing, so I really appreciate it. I'll definitely be keeping my eye out for a 3090 or 3090ti with a good deal. The datasets I'm working with right now vary pretty wildly in size depending on the study area and which sensors we're using to collect data (a 10-band multispectral image will be larger than a RGB image and much smaller than a LiDAR point cloud which is in turn smaller than a Hyperspectral profile for example). Anyway, thanks again for the information! I hope you have a great rest of your day!
@@TheDataDaddi I just checked the current retail prices for the 3090 and 4090, and was reminded that the reason I was looking at the 4070 Ti was that good old sticker shock. It's less than half the price of even a 3090 (the cheapest 3090 at retail is about $2100.00 Canadian, compared to the 4070 Ti's $899.00 Canadian at retail) here in Canada, where everything is 1.4x more expensive due to the exchange rate. I guess that's the price we pay for all that extra VRAM. I'll still be keeping my eyes open for a deal, but I may end up buying a less expensive graphics card in the meantime. I can always upgrade in the future, and when I do, the graphics card I buy now can be recycled into part of a home server later.
Of course! Glad you found it helpful. Yeah one of the reasons I focus more on hardware on this channel is because not many other people out there focus on that aspect of machine learning and it is super important. @@dragonmaster1500
Yeah the sticker shock with GPUs its really hard to take sometimes. lol. Check out this one on EBAY. I bought 2 recently. They are a pretty good deal if you are willing to use refurbed equipment (almost all of my equipment is used bought from EBAY or similar). I think it would be about $1200 CAD. www.ebay.com/itm/155867314803?epid=28044609256&hash=item244a6a7a73:g:CnMAAOSwmxNlRStE&amdata=enc%3AAQAIAAAA0GhaLrApc303M8MFhLKXaC1XCZUsnm98lj%2BZeFSruH9oJCFANdXBqU29SOoKs%2BWXGvlPyaIiK5XaubhTqwcQcesmE5FwiNLe0DFWbTLSQ%2FedCQeh%2FYGwxBressF0aNTusfEfh6%2FPh2A%2FG7Uz%2B%2FxEz5CVwvRLABldqDMSoIn%2BM32M3Spzp9f5vb9qFjFE3B7TxotPhewTVPG5AlHyBpu4J07YixG%2FvLiZ2XJDt4nOaaDYjXWNF89%2F8WSbWK8TIBBumuk1germV%2BC3pNIkixMDAGA%3D%7Ctkp%3ABFBMpqHIra5j Yep definitely agree here. That might be the better way to go for now. The best GPU is always the one that fits in your budget. Lol. @@dragonmaster1500
Things have changed, time to step into an Arc A770. Will need a slight adjustment with the tools but now you can get the higher end GPU for a lot less money.
Hi there. Thanks so much for the comment! This is great to hear! What AI/ML/DL applications have you used the ARC A770 for? I would be super curious to know if you have tried it out for anything yet!
Hello, How much additional performance can I expect when running deep learning models if I upgrade from an RTX 3060 Ti with 8 GB to an RTX 4070 Ti Super with 16 GB?
It is hard to say because it is largely use-case dependent, but in general you should see a significant improvement in terms of raw throughput (~75%-100% over the RTX 3060 Ti). However, for me the biggest benefit is the larger VRAM size. This would allow you to load models twice as large, which would be extremely helpful for working with LLMs or diffusion-based models. Eventually, I hope to be able to test these GPUs directly to give a more definitive answer.
Wow! How many eons' worth of time have you put into this? There are not very many people who would go so far out of their way for others like this. This is a great work of art, thank you. So, for LLMs like Mixtral I can just use a P40? Yay
Hi there. Thank you so much for the comment! This video took quite a while for me to put together. I think a couple of weeks, if I remember correctly. I was doing all the research for myself, so I figured that I could save others some time and frustration by sharing my results. I am glad to hear that it is appreciated! Yes! After looking at the specs on Hugging Face, it looks like you can run inference with even the largest quantized Mistral 7B listed there, which has a max memory requirement of 10.20GB. Unfortunately though, if you wanted to use a non-quantized version you may run into memory issues. huggingface.co/TheBloke/Mistral-7B-v0.1-GGUF
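A quick way to sanity check whether a model's weights fit on a card is parameters x bits per parameter, plus some margin. The Python sketch below is a rough estimate under stated assumptions: about 7.24 billion parameters for Mistral 7B, 24 GB of VRAM for a P40, and a 20% allowance for KV cache, activations, and runtime overhead that is a guess rather than a measured figure. The helper names `weights_gb` and `fits_in_vram` are invented here.

```python
def weights_gb(num_params: float, bits_per_param: float) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


def fits_in_vram(
    num_params: float,
    bits_per_param: float,
    vram_gb: float,
    overhead_fraction: float = 0.2,  # assumption: cache + runtime margin
) -> bool:
    """True if weights plus the assumed overhead fit in vram_gb."""
    needed = weights_gb(num_params, bits_per_param) * (1 + overhead_fraction)
    return needed <= vram_gb


MISTRAL_7B_PARAMS = 7.24e9  # approximate parameter count (assumption)
P40_VRAM_GB = 24            # assumed card capacity

for bits in (4, 8, 16, 32):
    size = weights_gb(MISTRAL_7B_PARAMS, bits)
    ok = fits_in_vram(MISTRAL_7B_PARAMS, bits, P40_VRAM_GB)
    print(f"{bits:>2}-bit weights: {size:6.2f} GB, fits in 24 GB: {ok}")
```

Quantized file formats add some per-block metadata, so actual download sizes run a little above these weights-only figures.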
Nice video! I'm planning to do a university project for my computer science degree and I need a GPU for this (budget = 600€ - 800€ approx). The project is about an autonomous driving system based on the CARLA simulator. The main parts are:
* Ryzen 9 7900X
* 32GB RAM 6000MHz/CL30
* 1TB WD_BLACK 850X M.2
I've been researching the RTX 3090 since they have 24GB of VRAM and are sold used in my country for 650-750€, but I'm a bit afraid to buy them on the second-hand market because most of them don't have warranties, they are repaired (stickers and gold solders), and I don't really know if they will end up breaking down in a few months. The other option I've considered is a new RTX 4070 TI, with 16GB of VRAM, which I could get for 700-800€. My questions are:
* Which GPU do you recommend? (It doesn't necessarily have to be one of the ones I've listed.)
* Does it matter so much to have more VRAM instead of power? I mean, isn't 16GB enough? I think my university can give me access to training servers, but I don't know the hardware of these. In this case, I would still need the GPU because I have to run CARLA on my computer. I don't rule out using third-party services, but I don't know if it will be economically possible given that I need large amounts of memory for the datasets. Should I use a more normal GPU to do research by training small models and then do the hard work on those servers? Should I pay more and buy something completely local?
* Would it be possible to develop the entire project locally and without using AI servers?
I am a simple student who is planning to do a master's/PhD next year. Thanks
Hi there! Thank you so much for the kind words and the great question. First of all, that sounds like a really interesting project and a tough research area! Which GPU you should buy depends a lot on how large the models you will be working with are. I am not super familiar with CARLA, but from my research just now, it looks like CARLA itself is a simulation environment, and it's primarily loaded into conventional RAM. It looks like 32 GB should be sufficient here. As for the machine learning/deep learning piece, this is what will be loaded into VRAM. So, the choice of GPU depends a lot on which models you are planning to use. If you can let me know which you plan to use, I can give you a more targeted answer. In general though, my initial thought is that I would probably go with the refurbed RTX 3090. It is one of my favorite GPUs from a cost-to-performance perspective and should be able to handle pretty much whatever you throw at it. That said, I would make sure to buy from a reputable seller. I have never had any issues buying second-hand GPUs, but I have always been really picky about where and who I buy them from. If you have the ability to use eBay, I would recommend that. It gives you at least 30 days of protection to make sure the GPU is working as expected. Some additional thoughts: I would almost never recommend that people buy a cloud-based GPU solution. I have not found any that I really like that are at a reasonable price point. I think you would almost always be better off buying the GPU for the duration of the project and then selling it at the end if needed, rather than renting a cloud-based solution. As for whether you can develop the whole thing locally, I am not sure. I don't have enough information to say for sure. However, from my brief research it looks like that should be possible. Anyway, I hope this helps! Please let me know if you have any more questions, and please keep me updated on your journey!
For anything with image generation, higher vram and cuda cores are pretty much the main priority. I was looking at 3060 with 12gb vram, 3090 with 24gb vram and 4090 with 24gb vram. The 4090 was twice as fast as the 3090 due to the updated architecture and more cuda cores, but considering I couldn't find one second hand, it would have cost me about 4 times the price. For something I'm just testing out, the 3060 and 3090 were the options I was willing to consider. I would have taken a 3060 if it didn't look as though more vram would have been a necessity for a lot of upcoming technology. The 3090 has been really nice to have and is at least 10 times faster, maybe even 20 times faster than running on an M1 MacBook pro with 16gb of unified memory. I just need to look for a feasible cooling option for it for when I'm running large stints.
I completely agree with this comment. I would almost always choose a GPU with higher VRAM over one with better performance these days. The 3090s are my favorite right now. I think they are the sweet spot of price, performance, and VRAM. Unfortunately, cooling them properly, especially in a case (I am assuming this is your situation), can be challenging. You could try adding more/better case fans and writing a program to adjust their speed based on GPU temp, upgrading to a bigger case, and potentially even selling and swapping for a water-cooled variant if possible. They also make aftermarket coolers for these GPUs. I have never explored this, but these could be options.
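The "program to adjust fan speed based on GPU temp" idea could start from something like the Python sketch below. It reads temperatures through the `nvidia-smi` query interface and maps them to a target fan percentage with a simple linear curve. The curve breakpoints (40 C to 80 C, 30% to 100%) are arbitrary assumptions, and actually applying the percentage to case fans is deliberately left out because that step depends on the motherboard and OS.

```python
import subprocess


def fan_percent_for_temp(
    temp_c: float,
    min_temp: float = 40.0,   # assumption: below this, idle fan speed
    max_temp: float = 80.0,   # assumption: at or above this, full speed
    min_fan: float = 30.0,
    max_fan: float = 100.0,
) -> float:
    """Linear fan curve: map a GPU temperature to a fan duty percentage."""
    if temp_c <= min_temp:
        return min_fan
    if temp_c >= max_temp:
        return max_fan
    span = (temp_c - min_temp) / (max_temp - min_temp)
    return min_fan + span * (max_fan - min_fan)


def read_gpu_temps_c() -> list[int]:
    """Return one temperature per GPU, as reported by nvidia-smi."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=temperature.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return [int(value) for value in result.stdout.split()]


# Curve preview without touching any hardware.
for temp in (35, 50, 60, 70, 85):
    print(f"{temp} C -> {fan_percent_for_temp(temp):.0f}% fan")
```

A control loop would call `read_gpu_temps_c()` every few seconds, take the hottest card, and hand the resulting percentage to whatever fan interface the system exposes.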
@TheDataDaddi I was looking into cooling options and there aren't many out there. After I wrote the first comment, I checked out the temperature and it was in the ideal temperature for maximum compute even after running it at that level for a few hours. I'm not sure if that's still a potential issue or not because of how often it's at maximum compute, but it did put my mind at ease somewhat. Due to the age of the GPU, there weren't any cooling options in stock in my country. I may consider selling it and buying a new second hand device with cooling built in, or if I'm able to monetize some of the work, I may just buy a 5090 when they release. Fingers crossed.
Thank you for putting this information together. While reading up on some of the cards mentioned here in the context of deep learning, cooling issues seemed to come up a couple of times. People were talking about the Tesla M40, for example, not having enough native cooling to deal with constant loads from deep learning. Have you had any issues with that in your builds? EDIT: I see you have some videos about heat and cooling just after this. I'll take a look.
Hi there. Thanks so much for the positive feedback! Yep, I was just about to suggest that video. I have not personally had any horrible issues with any of my GPUs overheating, but it certainly can be an issue. If you have any specific questions after the video, feel free to reach out, and I will do my best to help you.
Hey there! Thanks so much for the kind words. I really appreciate it. I am hoping to have my website up and running in the next couple months that will basically have this work book available in real time with up to date pricing. Stay tuned!
Thanks so much for the kind words! So glad this video was helpful. I found it extremely helpful myself in choosing GPUs for my own projects. Mmmm. It depends on cost, as you may be able to find some great deals for Black Friday, but if you want to get the most performance for your money I would go with the 3060. However, realistically, it's not that much more for the 4060. Overall though, I am unimpressed with the 4000 series GPUs compared to the 3000 series. If it were me, I would buy the 3060 (unless you really need the larger VRAM), wait until NVIDIA releases the next series of GPUs, and then consider those options. On a personal note, I use the 3060 in my daily driver PC. I use it as a test bed for models I want to scale up to run on my home lab servers. It works really well for that. I have not had any issues loading and working with fairly decent-sized models.
Hi there. Thanks so much for the comment! I am actually in the process right now of creating my own website: thedatadaddi.com. One of the first things I am going to put on the website is a real-time GPU price-to-performance dashboard. I will certainly add in the ability to make this kind of comparison. Please stay tuned for progress here.
Hi there! Thanks for the comment. For LLMs like GPT-3, the resources required to train such models from scratch are beyond the reach of most individuals and many companies. This is due to the immense computational power and data handling capacity needed. For example, models like GPT-3 were basically trained on most of the internet (Common Crawl, WebText2, etc.), tons of full books, a snapshot of Wikipedia at that time, and more. Training was performed with literally thousands of high-end GPUs across various datacenters. Also, ChatGPT and other similar models are proprietary, so the lack of detailed specifics about their training processes and architecture makes it hard to know exactly what it would take to train something like GPT-3 (or another similar model) from scratch. What is accessible for most people and organizations is running inference using these models. "Inference" refers to the process of using a pre-trained model to make predictions, or to generate text in the case of LLMs. The feasibility of running inference smoothly depends largely on the amount of VRAM available, as larger models require more memory to operate efficiently. For instance, smaller versions of LLMs might run on a single GPU with 12 GB of VRAM, while larger models might require a GPU setup with significantly more memory. For those with more robust computing setups, such as advanced home labs or small to medium-sized enterprises, fine-tuning an LLM might be within reach. Fine-tuning involves adjusting a pre-trained model on a new dataset or for a specific task, which typically requires fewer resources than full-scale training from scratch. This process allows users to tailor the model's responses to better fit particular contexts or industry-specific needs without the prohibitive cost of training a new model from the ground up. 
The following Reddit thread is pretty useful in providing more details here: www.reddit.com/r/MachineLearning/comments/15uzeld/d_estimating_hardware_for_finetuning_llm/ For fine-tuning, a setup with one or more high-end GPUs, such as the NVIDIA A100 or V100 (or RTX 3090 as I advocate for), would generally suffice. This allows for modifications of large LLMs using varied sizes of data, making it a viable option for enhancing model performance on specialized tasks. In summary, while training large-scale LLMs from scratch is out of reach for most, leveraging these models through inference or fine-tuning them for specific applications is quite feasible with the right hardware setup. This opens up opportunities for a wide range of applications, from personalized AI assistants to sophisticated data analysis tools, even for smaller organizations or dedicated individuals with the appropriate resources.
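To put rough numbers on the inference versus fine-tuning difference described above, here is a back-of-the-envelope sketch. It rests on common rules of thumb that are assumptions, not measurements: about 2 bytes per parameter for FP16 inference weights, and about 16 bytes per parameter for full fine-tuning with Adam in mixed precision (FP16 weights and gradients plus FP32 master weights and two optimizer moments). Activations, KV cache, and framework overhead are ignored, so real usage will be higher.

```python
GIB = 1024 ** 3  # bytes in one GiB


def inference_gib(n_params, bytes_per_param=2):
    """Approximate memory for the weights alone at FP16 (assumed 2 bytes each)."""
    return n_params * bytes_per_param / GIB


def full_finetune_gib(n_params, bytes_per_param=16):
    """Approximate memory for full fine-tuning with Adam in mixed precision.

    Assumed breakdown per parameter: 2 (FP16 weight) + 2 (FP16 gradient)
    + 4 (FP32 master weight) + 4 + 4 (Adam moments) = 16 bytes.
    """
    return n_params * bytes_per_param / GIB


# Hypothetical model sizes, chosen only to illustrate the scaling.
for billions in (1, 3, 7, 13):
    n = billions * 1e9
    print(f"{billions:>2}B params: "
          f"inference ~{inference_gib(n):.1f} GiB, "
          f"full fine-tune ~{full_finetune_gib(n):.1f} GiB")
```

Under these assumptions the fine-tuning footprint is about eight times the inference footprint, which is why a model that runs comfortably on one card can still be out of reach for full fine-tuning on the same card.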
Hi there. Thanks so much for the positive feedback! I have not included AMD GPUs here because AMD GPUs, to my knowledge and experience, are not well suited to machine learning. The AMD drivers are really buggy and make using them for machine learning a huge pain. NVIDIA unfortunately is the only manufacturer I trust at this point for GPUs for AI/ML/DL. That said, as I have time in the next few months, I am going to experiment in depth with an older AMD GPU I recently bought to see what the current limitations of the AMD drivers are. I will report my findings in a future video. Also, if the results are good, I will make another video just like this but include AMD GPUs.
Wow! This is actually incredible. Really impressive specs and the price is great! This is my favorite by far of the ones you have suggested. Definitely gonna have to make a video here. Really appreciate the suggestions!
@@TheDataDaddi I look forward to it. I'm also considering buying a Orin NX 16GB from SeeedStudio... so would LOVE a comparisson of these two :D The Fogwise is half the price or less but would be cool to see them head to head. Thanks!
@@mickeymouseman Gotcha. Definitely, think this would be cool. I will see what I can do here! What is your use case for these if you don't mind me asking?
Thanks for the guidance, man. Can you help me on this, 4060ti 16 gb is same at price for me as 4070 Super. Which is the better card for DL, I want to get into AI so I wanted a GPU to get me going
Hi there. So glad the video could help! Congrats on the start of your journey! If it were me, I would go with the 4060 Ti 16GB. I almost always go for the GPUs with more VRAM in the context of AI/ML/DL. I think this should suit you better as you grow to larger and larger models. Good luck! Feel free to reach out if you have any questions along the way.
oh thanks for replying, well in the meantime I bought a much cheaper 12 gb 3060 thats like almost 1/4 the price of a 4060ti. And in the future, I am planning to buy another 12 gb 4070super to make the total vram 24gb, how good is this strategy ?@@TheDataDaddi
Hi there! Thanks for your comment! I completely agree that tensor core FP16 and particularly BF16 performance are crucial for LLMs and many other deep learning tasks. However, I don’t think we can entirely dismiss the relevance of standard CUDA cores or shader cores in AI workloads. While tensor cores are optimized for FP16 and BF16 precision, CUDA and shader cores still play a significant role, especially for FP32 tasks, which remain the default in many deep learning models-particularly in computer vision and audio processing. Moreover, it's important to note that CUDA cores can handle models trained in FP16 or quantized formats. The key difference is in efficiency: tensor cores will perform these tasks faster, but GPUs without tensor cores can still run the models-often at a fraction of the cost. This leads to a trade-off between performance and price. If top-tier performance is the goal, then GPUs with a high number of tensor cores are the way to go. However, for many users, especially those on a budget or with less performance-critical needs, GPUs with fewer tensor cores (or none) can still offer viable solutions. It's all about balancing your requirements with what’s available.
@@TheDataDaddi Yes you are right, CUDA=shader core fp32 is very relevant especially in training because it's easier to use for lazy quick prototyping/dev work or sanity checking your low-precision result, or the rare (training) case where you actually need the range and precision. However, for tensor core GPUs shader fp16 performance has no effect. There shader fp16 is only for stuff like activation functions and perf could be tens of times lower without having any effect. The point was just that for tensor core GPUs you would want to only list tensor core fp16 flops. Grabbing a dirt cheap P100 16GB space heater for fp16 memory-demanging workloads could possibly make sense, here shader fp16 would matter. (Even int8 maybe but I dont't know if common frameworks offer dp4a implementations) Adding int8 and fp8, even fp6 and fp4, to the comparison for inference purposes would be interesting too since LLMs are now often available in those precisions and fp8 effectively doubles memory size and bandwidth too compared to fp16.
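The point that fp8 "effectively doubles memory size" compared to fp16 can be checked with simple arithmetic. The sketch below counts weight storage only (no activations, KV cache, or quantization metadata such as scales), and the 24 GiB figure is just an example card size.

```python
GIB = 1024 ** 3

# Bits used to store one weight at each precision.
BITS = {"fp32": 32, "fp16": 16, "bf16": 16, "fp8": 8, "int8": 8, "fp4": 4}


def max_params_billion(vram_gib, precision):
    """Billions of parameters whose weights alone fit in the given VRAM."""
    bytes_per_param = BITS[precision] / 8
    return vram_gib * GIB / bytes_per_param / 1e9


# Example: how many parameters fit in 24 GiB at each precision.
for precision in ("fp32", "fp16", "fp8", "fp4"):
    print(f"{precision}: ~{max_params_billion(24, precision):.1f}B params")
```

Halving the bits per weight doubles the parameter count that fits, and it also halves the bytes that have to be read per weight, which is where the bandwidth benefit mentioned above comes from.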
Hey there! Thanks so much for the kind words. I am so glad to know that this was useful for you. I am hoping to have my website up and running in the next couple months that will basically have this work book available in real time with up to date pricing. Stay tuned!
Juat picked up a 3060 12gb for under $200 glad to see it stacks up with a p40 just unfortunate that the p40 is about the same price with more ram I just couldn't have those loud small fans on my PC
Hi there. Thanks for the comment! Yeah, the RTX 3060 is a good choice for sure, and it has tensor cores as an added bonus. Definitely agree that having noisy fans on a small PC is not the best way to go.
Would it be possible to link a 4070 ti super with another card? I'm just starting out but, I was thinking 16gb of vram could start me out and I could add a second card in an enclosure to connect to with my pc running dual gpus or my nas in the future. I can't afford much beyond one gpu right now while building out other systems 😕
Hi there. Thanks so much for the question! Absolutely, this is really common. GPUs are really expensive so in many cases people (myself included) will buy GPUs incrementally as they have the money.
@@TheDataDaddi Cool, I appreciate your reply! I feel like I'm witnessing the future watching videos like these and I'm both excited and overwhelmed! Is it fine if a card doesn't have nvlink, or if they're cards from different generations? I'm thinking a 4070 ti super right now for my pc and a 3090 as the "vram/AI" server card (or a 4090 if the price drops enough once the 5090 comes out). Thanks again for your reply and helpful videos!
@@philiphimmelstein9510 Yep! This is one reason I love this area so much. Things change so quickly. It is always exciting. Yep, that should be fine! NVLink is definitely not a requirement. In fact, I have not actually tested the performance gains (or lack thereof) you actually get from it. I need to do that. lol. I have 3 different GPUs in one machine and don't have any issues. Of course! Happy to help.
Amazing video! I wonder if you could advise me on my next setup... I'm a deep learning engineer who typically works in medical computer vision, however, i'm looking to buold something for my home office. My budget is 3-4k for the entire setup. Datasets I use tend to be quite large, and 20gb vram is probably a minimum in terms of model size. I've been looking at a 4090 prebuilt because for that budget, I could get a nice spec with the option to do some occasional gaming. In an ideal world, i'd want more vram! 48gb would be amazing. I wondered about going for a 2x 3090 using nvlink. What do you think about something like this? Thank you in advance!
Hi there! Thanks so much for reaching out. So for your budget, I highly recommend going the 2x 3090 route. This is the best configuration I have found so far that balances price and performance for what I would consider mid to upper end setups in terms of compute resources. I actually just bought 2x 3090s and am going to be making some videos exploring their performance with and without NVLink. This will also be good in your case because you are interested in occasional gaming. Unfortunately, for 48GB of VRAM on a single card at this stage you are going to have to go with the Tesla series GPUs, and those break the bank even used. For right now, unless you absolutely have to have 40+ GB of VRAM, I think 2x 3090s with NVLink is the way to go for mid to upper range computing projects.
@@TheDataDaddi Thanks so much for your response! I'm having a tough time clarifying whether 2x3090's with NVlink allow for larger models to be loaded during training/testing...? I understand data parallelisation and how that is beneficial, but can I actually distribute the model in a way that allows me to experiment with really large models with a 2x3090 setup? I use both tensorflow and pytorch and have seen forum posts going either way with regards to this setup. I would be really interested to hear your thoughts.
So, utilizing a dual RTX 3090 setup with NVLink does expand your capacity to experiment with larger models during training and testing. Typically, in such configurations, the entire model is loaded onto each GPU. The remaining memory is then used for data batches during training, testing, or inference. With each RTX 3090 providing 24GB of VRAM, you should find this setup sufficiently robust for handling fairly large models (pretty much most things up to the mid to large open source LLMs). That said, it really depends on the size of the models you will be loading. I would recommend loading a typical model you might work with onto the CPU and checking the memory utilization before and after. This should give you a good idea of how large the model actually is. Then you can see a) whether the model in its current state will fit into memory on both GPUs and b) what kind of batch sizes you might be able to work with. Smaller batch sizes, while manageable, could become a limiting factor in your workflow, potentially impacting training efficiency and model performance evaluation. From here you can make a decision as to whether or not you really need more VRAM. An alternative strategy (especially relevant with NVLink because you have direct memory access between GPUs) involves distributing different parts of a model across the two GPUs, so that each batch is processed sequentially across both units. This method, albeit more complex, can effectively mitigate VRAM limitations by using the combined memory more efficiently. However, it's worth noting that this approach requires careful implementation and might not be suitable for all models or scenarios. It is also important to note that NVLink does not create a single unified memory space accessible by all GPUs in the traditional sense. Put another way, 24GB + 24GB of VRAM in this case does not behave like 48GB of VRAM on a single GPU. 
Frameworks like TensorFlow and PyTorch do support multi-GPU setups and offer varying degrees of support for model parallelism and data parallelism. For TensorFlow, strategies like tf.distribute.MirroredStrategy can be employed for data parallelism, which synchronizes training across the GPUs for each step. PyTorch users can leverage torch.nn.DataParallel or torch.nn.parallel.DistributedDataParallel for similar purposes. For model parallelism, where the model is split across multiple GPUs, PyTorch provides more explicit support through manual implementation, allowing you to define how different parts of the model reside on separate GPUs. With full transparency, I have not had to use model parallelism thus far so I cannot comment more specifically on how to implement it. It will also vary from use case to use case. I think this would be a great video topic though and certainly something that would be great to know as models continue to get larger and larger. I will try to make a video on this specifically as soon as I get a chance. Anyway, I apologize for the long winded answer, but I hope that this response is useful to you! @@dirtdabest
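As a rough illustration of the model parallelism idea (not a tested PyTorch recipe), the sketch below only plans which contiguous layers would live on which GPU, given a per-GPU memory budget. The layer sizes and the 20 GiB budget are hypothetical; in PyTorch the resulting plan would roughly correspond to moving each group of submodules to its own device with .to("cuda:0") or .to("cuda:1") and moving activations between devices inside forward().

```python
def split_layers(layer_gib, n_gpus, budget_gib):
    """Greedily assign contiguous layers to GPUs under a per-GPU budget.

    layer_gib  -- list of per-layer memory sizes in GiB (assumed known)
    budget_gib -- memory each GPU may spend on weights (leave headroom
                  for activations and batches)
    Returns a list with one list of layer indices per GPU.
    """
    placement = [[] for _ in range(n_gpus)]
    gpu, used = 0, 0.0
    for i, size in enumerate(layer_gib):
        if size > budget_gib:
            # Two 24 GB cards are not one 48 GB pool: a single layer
            # must still fit on one device.
            raise ValueError(f"layer {i} alone exceeds one GPU's budget")
        if used + size > budget_gib:
            gpu += 1        # current GPU is full, move to the next one
            used = 0.0
            if gpu == n_gpus:
                raise ValueError("model does not fit across the GPUs")
        placement[gpu].append(i)
        used += size
    return placement


# Hypothetical model: 40 layers of 1 GiB each, two GPUs, 20 GiB budget each.
plan = split_layers([1.0] * 40, n_gpus=2, budget_gib=20.0)
print([len(group) for group in plan])
```

The greedy split is the simplest possible choice; it does not balance compute between the GPUs, which a real pipeline would also want to consider.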
Ah this is a brilliant response! I am going to go for the dual 3090 setup! The potential freedom of having a larger model is what's most important to me on reflection. Are there any other special considerations for a build like this? Thank you so much for your response, it's really cleared things up! I will be watching out for the video when it comes!
Glad to hear it! I think that setup will suit you well. They also hold their value well, so if it does not end up being what you need you can easily get most of your money back out. The only thing I would say about the build is, if you are going the server route, make sure that you choose one that will fit whichever RTX 3090 you choose. I might consider an external rig with PCIe extenders. I am doing this myself actually, so I will have a video on this soon. You might consider that as well. External rigs are also much easier to keep cool. So glad I could help, and keep me updated throughout your journey! @@dirtdabest
Is there a big difference in performance and speed in AI tasks like stable diffusion etc between RTX 4080 super and RTX 4090?Which one should i buy as I seldom play games or should i wait for 5090 at the end of the year?I am not a video editor or hold any jobs related to designing or editing,just a casual home user.
Hi there! Thanks so much for the question. I would say that at this current moment, for stable diffusion related tasks, I would go with the 4080. 16GB of VRAM should be enough to comfortably handle pretty much all stable diffusion tasks at this point (to my knowledge), and its performance is 60% of what you get with the 4090 for less than half the price. All that said, if you are not pressed for time, I would probably wait to see what happens to the market when the new 5000 series GPUs come on the market. It could bring the prices down for the 3000 and 4000 series GPUs as people dump their older GPUs in favor of the latest and greatest. This approach is always a gamble though, so if you prefer a safe bet I would look for good deals on a 4080 and not worry too much about what will happen down the road.
Yeah, I did not cover it unfortunately. I guess I missed it somehow. Anyway, I would say that it is a great GPU. However, the best price I can find even for a used one is about $489.99 atm. This is still about double what you can get the P100 or P40 for at this point, and the performance gain is not that much greater for the money (11.7 TFLOPS vs 13.8 TFLOPS for single precision) (4.6 TFLOPS vs 6.9 TFLOPS for double precision). So overall I would still recommend the P100 if you have a need for higher double precision performance or the P40 if you care more about single precision and more VRAM. With that said, I think this is a fine GPU and one of the best you can get for the $500 price range. It also has a newer architecture, so it will be relevant for longer. So at the end of the day it really depends on your budget, project needs, and how long you need it to last. Hope this helped! Please let me know how it works for you if you decide to get it!
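The price-to-performance comparison above can be made concrete with a small calculation. This is a sketch under stated assumptions: the lower figure in each pair (11.7 and 4.6 TFLOPS) is read as belonging to the older P40/P100 class cards and the higher figure (13.8 and 6.9 TFLOPS) to the card being asked about, and the older cards are priced at a hypothetical $245, roughly half of $489.99.

```python
def gflops_per_dollar(tflops, price_usd):
    """Throughput per dollar, in GFLOPS per USD."""
    return tflops * 1000 / price_usd


OLDER_PRICE = 245.00   # assumed: about half of the newer card's price
NEWER_PRICE = 489.99   # price quoted in the comment

# (label, older-card TFLOPS, newer-card TFLOPS), figures from the comment
comparisons = [
    ("single precision", 11.7, 13.8),
    ("double precision", 4.6, 6.9),
]

for label, older, newer in comparisons:
    print(f"{label}: older ~{gflops_per_dollar(older, OLDER_PRICE):.1f} "
          f"vs newer ~{gflops_per_dollar(newer, NEWER_PRICE):.1f} GFLOPS per dollar")
```

With these inputs the older cards come out ahead per dollar in both precisions, which matches the recommendation in the comment; the gap narrows or flips if the older cards cost more than assumed here.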
Hi there. Thanks for the question. What is your price range for GPUs? The best price/performance overall is going to be the p40 and/or p100. But if you care more about performance and willing to spend a bit more I could recommend something more performant. Hope this helps!
Hi there. Thanks so much for the kind words and the comment! It really depends on the use case, but overall for most people I would say that the P40 is actually the better choice. The only GPUs I currently have that support NVlink are my RTX 3090s. Yes, I have them connected via NVLink.
I like to throw in a perspective, currently im using an GTX 960 2gb vram, 7 gb ddr 2 ram @ 600mhz, I also have shared video ram or what it was called which ups my "vram" up to around 5gb (can be checked in task manager for example), AMD tripple core @ 2.1ghz, im fine tuning the smallest gpt2 model via CUDA at around 1 iteration per ~30 seconds and it uses around 3.5gb of my "vram".
Hi there. Thank you so much for the comment. Love when people share specifics like this. Incredibly helpful for the community to understand the performance of real systems. This actually is very impressive. Also, shows that you don't need the latest and greatest hardware for every use case.
Hi there. Thanks so much for your question. I normally buy most of my tech gear from eBay. I try to find refurbished or used gear at a good price from a reliable seller. It hasn't failed me yet. I highly recommend this approach because new hardware in this area is incredibly expensive.
wondering if we can pair a 4060 TI OC 16gb with a data center card for more ai performance will ai be able to use both cards at once? wondering cause i already have the 4060 atm
I made a spreadsheet too, but yours is thorough! I'd also come to similar conclusions as you: that the P40 / P100 were cheap ways to get medium size LLM models into a GPU with decent tokens/second. Your spreadsheet would have saved me time if I'd known about it! At least there's some independent confirmation of your conclusions. There's a lot of detail to add, like how fast/slow models are on certain GPUs ... perhaps another vid on that to save me the effort? :P
Hi there. Thanks so much for the comment! I am glad to hear you can confirm! Especially as hardware prices keep increasing, I think these are actually becoming even more relevant for those who are budget conscious. Funny you mention this. I am actually working right now on a benchmarking suite to enable reliable comparison between GPUs for different models. There is not a reliable open source benchmarking solution for GPUs, so I am trying to create one (or make steps toward it at least). As soon as I get something decent, I will make a video series on it and start using it to benchmark GPUs in a real way with respect to individual models.
Im torn in choosing for Gpu for ai use first (koboldccp + sillytavern) and gaming second. My choices were a 3060 12gb at first then the 4060 ti 16 stood out but then the 4070 ti super got recommend to me. I intend to use the card for at least 3-5 years. The only thing limiting me is my small budget. Like i could buy the 3060 now and the 4060 ti after few weeks. While ill wait and watch out for deals on the 4070...
Hi there. Thanks for the comment! I am not super familiar with koboldccp or sillytavern, so please take what I say with several grains of salt, but from my brief research they need an AI model integrated in some way. I am assuming you want to host this locally. For this I would go with the GPU with the largest VRAM, so the 4060 Ti 16GB is the clear choice in my book.
@@TheDataDaddi yeah. Basically locally hosting an AI model to my pc. I'm not really into machine learning as of yet. Currently I have a 2060 on my pc and 6 GB isn't really enough. Another question is AMD not a good alternative?
@@rukitorin1998 It can be, but AMD is generally not as easy to use for machine learning. However, it may be workable for your use case. AMD definitely seems to offer better price to performance, but there are still a lot of bugs from what I understand. I also cannot strongly recommend AMD because I have not personally dabbled there. What I am telling you now is just based on feedback I have received from viewers.
@@TheDataDaddi thanks for responding, sorry for the late reply. :D i will be buying the 4060 ti 16gb soon when i find a cheaper price point. Living in the Philippines prices are somewhat higher. 30 kph peso(519.08$) to 32k (553.68$) is the prices im looking at right now.
i think you missed out on parts - actual performance per model, and if one can use fp32, fp16 or int8 or tensor. P40 is terrible options for any ai workload due to amount of time one would have to wait... and its power requirement.
Hi there. Thanks so much for your comment! I would agree that for anything below fp32 operations these GPUs would be quite slow. However, the GPU is less than $200 for 24GB of VRAM. So, if you want to experiment with larger models cheaply, I think these GPUs still have good value.
Yup, just got a couple og P40s for a ML350P... after investigating the NVIDIA site.. Slow, yes, but for cheap and something that can run on MS Win2012 Enterprise, it's the ticket. (old mining ETH machine) It will run in the garage without air-conditioning.... Slow but study.
Absolutely. Please feel free to contact me any way you like. All of my contact information can be found in my YouTube bio. I will also paste it below for convenience. 🐦 X (Formerly Twitter): @TheDataDaddi 📧 Email: skingutube22@gmail.com 💬 Discord: discord.gg/RyRHEn3yMx
good video. good thing you did not pick maxwell. plus side. lots of vram, minus side, everything else. its really slow, xformers dont work, (have to compile it your self to get it to work, have to run most things in fp16 mode. no adamx support if you want to do training. i bought the card 5 years ago but ditched it the moment the p40 dropped to 200 dollars. sadly the volta cards are still 700+ so im still stuck with the p40 but it still does everything i need it to do.
Hi there. Thanks so much for the comment! Yeah, the Maxwell GPUs could still be helpful to some who are just getting into machine learning and deep learning and are only looking to run smaller models. Also, as you pointed out, they do have their own set of issues. Overall though, I would recommend in most cases to just start with a P40 or P100. Yep, I think that is the situation for most people. Myself included. Volta GPUs are the logical next step, but the price is still a bit too high to use them in large quantities. Unfortunately, I am not sure if this will change in the immediate future either. The Volta series GPUs and higher have tensor cores and good mixed precision performance, so they are still in demand for a lot of businesses. I do not see this trend changing in the near future.
Hi there. Thanks so much for the question! This is tough. I think you really can't go wrong either way. Both are solid choices for ML/DL GPUs. However, for about the same price (I checked just now and they seem to be about the same price on EBAY in my area) I think I would go with the RTX 3080. You get much better performance and the difference in VRAM from 12 to 16 GB is not significant enough to justify the performance difference.
Is there a current laptop with more than 8 GB vram that is recommended and does not cost 3000? Is it better to wait for new processors or graphics cards that incorporate new chips specially built for AI models?
Off the top of my head, I know that the 4090 mobile GPU has 16GB of VRAM. I do not know which laptops have it as standard. This link may help you. medium.com/@ibrahimcreative172/top-10-laptops-for-deep-learning-machine-learning-and-data-science-in-2023-f8a6ba861c4f I think it depends. For example, if you could wait until 2025 when the RTX 5000 series comes out, that might be worth it, as they will hopefully fix some of the shortcomings of the 4000 series. However, they will be super expensive when they first come out. So I normally prefer to go for older GPUs. I feel that they are a much better value. Long-winded way of saying I would probably not wait and would try to find value in what exists currently.
@@TheDataDaddi Yes, I think I'm going to wait for new laptops to come out with better video cards and maybe better processors with NPU which seems to be coming strong
Now I have a gaming laptop even though I don't play, with an rtx3050 ti but only with 4 vram. When I bought it in 2021, I didn't know that I was going to need more vram for Stable Diffusion. We'll see how things develop. Thank you so much
I did some more research last night in the area, and I think this might be a good option for the mobile/laptop route. It seems NPU technology stands to make AI/ML/DL much more viable on laptops. @@RSV9
Yeah, it would definitely be difficult to do much with stable diffusion on 4GB of VRAM. It might be worth upgrading to a GPU with 8 or 12 GB of VRAM while you wait for a better new laptop. You can find some pretty good deals on eBay if you are patient. @@RSV9
Personally I love the P40 at the moment, however BE AWARE that the Pascal cards do NOT offer NVLink; for that you have to go to the Ampere cards. That said, you can operate the P40 in x8 PCIe mode without significant loss in performance. Not ideal, but if you have a consumer motherboard and are trying to get one more GPU in there, this one might not be a bad choice for a Gen 4 board where the increased PCIe bus speeds more than make up for the lack of full x16 access for this Gen 3 card.
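For anyone weighing x8 against x16, the theoretical link bandwidth is easy to estimate. The sketch below uses the published per-lane transfer rates for PCIe Gen 3 to Gen 5 with 128b/130b encoding and ignores protocol overhead, so real throughput will be somewhat lower. One caveat worth keeping in mind: a PCIe link runs at the highest generation that both the card and the slot support, so a Gen 3 card such as the P40 in a Gen 4 slot still runs at Gen 3 rates.

```python
# Raw transfer rate per lane in GT/s for each PCIe generation.
GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}
ENCODING = 128 / 130  # 128b/130b line encoding used by Gen 3 and later


def link_gb_per_s(card_gen, slot_gen, lanes):
    """Theoretical one-direction bandwidth of a PCIe link in GB/s."""
    gen = min(card_gen, slot_gen)  # link trains to the slower side
    return GT_PER_S[gen] * ENCODING / 8 * lanes


print(f"Gen 3 card, Gen 4 slot, x8 : {link_gb_per_s(3, 4, 8):.2f} GB/s")
print(f"Gen 3 card, Gen 4 slot, x16: {link_gb_per_s(3, 4, 16):.2f} GB/s")
print(f"Gen 4 card, Gen 4 slot, x8 : {link_gb_per_s(4, 4, 8):.2f} GB/s")
```

So x8 on a Gen 3 card is roughly 7.9 GB/s regardless of the board generation; whether that matters depends on how much data moves between host and GPU, and for many single-GPU workloads that keep the model resident in VRAM it often does not.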
Hi there! Thanks so much for the comment. Yeah unfortunately they do not seem to. Although interestingly they have a cut out on one side that looked like it was made for something NVLink related. However, I have never been able to find anything that would link them together. Yeah that is true. In most cases for consumer mobos you will be working with x8 pcie not the full x16 unfortunately. Like you said though for most it shouldn't make that much of a difference.
@@TheDataDaddi and if you are looking to water cool there will be challenges for the P40 as well. The PCB cutout is the same as the 1080, but with the rear plug on the P40 you will likely have to do some modding on a 1080 water block to make it work. The 40x40x28 15000rpm fans I have on there scream when at 100%, so be prepared to write some custom code to control the fan speeds. If you are interested I'll push mine to GitHub and send you the link.
It seems the Chinese suppliers has noticed the higher buy rate and have inflated the prices accordingly. Now P40 and P100's are at $300, so it makes more sense just to buy 4060 ti 16GB as new.
Yep, I have noticed this trend as well. Demand is so high that even older hardware is now selling at a premium. I agree with your assessment. The 4060 Ti is probably a better choice now.
Great video^^, getting tired of seeing those gpu comparison video where all they think are just about gaming. it pops up right in time when i'm thinking to build a new pc. I was thinking about buying the 4060 ti with its 16gigs vram to help me with my thesis research that I assume the 16gigs would be really helpful for the ML/DL(used to have 1660 with 6gigs vram and its horrendous XD) but also pretty good enough for my daily use such as streaming and editing. totally in a tight budget that i needed to squeeze a bit more to get that 470$(the price in my country rn) card or should i just wait for the rtx 50 series to come out later hoping the older gen price drop?
Hey there. Thanks so much for the kind words and the question! I think the 4060 Ti is a solid choice in general for a budget-constrained build for a master's thesis project. Of course it depends on your exact use case, but barring work with larger LLMs, I think this should be a great choice. As far as waiting for the RTX 5000 series GPUs to come out, I would not really hold my breath for a huge price drop. I think even once the RTX 5000 series does come out, it will take a while for the prices of older GPUs to be substantially affected. If it were me, I wouldn't wait. I would just go ahead and buy. Best of luck with your project! Hope this helps!
So would this mean I can't use a 7900XT to make AI meme pictures? I've actually been interested in the whole AI thing, even though I'm not a smart dude on tech. (I find it cool just because I can use my computer for something other than just gaming/streaming/video editing, but I'm going to try to resist a little against our developing AI overlords lol) I know the tech is still developing, but I thought it would be cool to use AI to create a Vtuber model to stream with. (Even if it came out bad, I thought it would be a fun little experiment to do for some views and laughs.) However, one of the hardest parts to upgrade, in my mind, was the GPU. I know AMD is a step or two behind Nvidia (my last card and current card is a 1070), but when it comes to price, it's hard to beat. I just didn't know if something like a 7900XT or -XTX would at least make up for it vs a 4070 Ti Super in terms of AI generation. (I still have no idea what app to use to even make use of my GPU to make stuff with AI.) Alright, enough rambling with the thoughts in my brain, I'll keep watching 👌
Hi there! Thanks so much for the comment. So, my take here is that NVIDIA GPUs are going to be much easier to work with at this stage. I have heard from some of my viewers that AMD GPUs can and do work. It is just a lot more of a pain to work around bugs, and the learning curve is steeper. NVIDIA is more or less plug and play when it comes to AI/ML/DL, but that is also why you pay a premium. I guess what I would say is that if you want the easier route or don't have time to do much troubleshooting, NVIDIA might be the better way to go. However, from the sounds of it you are more partial to AMD GPUs and have many other workloads besides just AI. In your case, it may be a better idea to go with an AMD GPU, because you will get better price for performance on all of your other workloads, and then deal with the pain of setting up your AMD GPU for your specific AI use case. The 7900XT is definitely a powerful card and can handle AI tasks, though you might need to use specific software or frameworks that support AMD GPUs, like ROCm. Creating a VTuber model sounds like an interesting project! I would recommend maybe starting with programs like DeepFaceLab for deepfake-style video or some Stable Diffusion flavors to generate images as a starting point. Tools like Blender could be helpful for 3D modeling, and for real-time animation you could use VMagicMirror or VSeeFace, which can utilize your GPU to bring your VTuber model to life. Hope this helps!
I'd love to know your opinion of the modified Nvidia p102-100's with 10 gb of vram being sold for about $50-$60 on ebay since they have no display outputs. They are basically 1080 ti's with a bit of performance nerfing. they have no display outputs, but seem like they'd be ideal to plop into a system with an existing AMD Gpu just to give Cuda acceleration. or perhaps multiple cards?
Hi there. Thanks for your question! I would definitely go with the RTX 3060 in this case. The extra 4GB of VRAM will make a ton of difference. Best of luck!
What’s your thoughts on Apple silicon, for example, would 32GB of unified memory on an M2 Max, be an equivalent of 24GB in a dGPU (and not considering a CUDA advantage)?
Hi there. Thanks so much for your question. I actually had someone ask a similar question yesterday. Let's dive in. Apple's silicon, particularly the M2 Max, represents a significant shift in computing architecture. The concept of unified memory in Apple's design is quite innovative. Unified memory essentially allows the CPU and GPU to share the same memory pool, which can lead to more efficient use of resources. Regarding your question about the equivalence of 32GB of unified memory to 24GB in a discrete GPU setup, it's not a straightforward comparison. In traditional setups, the CPU and GPU have separate memory pools, and data needs to be transferred between them, which can create a bottleneck. With Apple's unified memory, this bottleneck is reduced, as both the CPU and GPU can access the same memory pool directly. This can make the system more efficient, potentially allowing 32GB of unified memory to perform comparably to, or even outperform, a 24GB discrete GPU setup in certain scenarios. However, this doesn't mean it's superior in all aspects. For example, tasks heavily reliant on GPU performance, especially those optimized for CUDA (a parallel computing platform and API model created by Nvidia), might still perform better on a traditional discrete GPU setup. This is because CUDA has been around for a longer time and is extensively optimized for specific professional and scientific applications. So to sum everything up, while 32GB of unified memory on an M2 Max might offer comparable performance to a 24GB dGPU in many use cases, the actual performance can vary depending on the specific applications and workloads. @@rayf3244
@@rayf3244 Machine learning rests on linear algebra. Massively parallel matrix algebra is what video cards do (until we see neuromorphic chips in wide production, cheap and actually successful, which could just be a pipe dream (RNNs)). The RTX 3090 with NVLink is the best bang for the buck, and the old Tesla cards are the cheapest entry. Two 3090s give you 48GB and let you play with models up to Llama 2 70B. No, I can't afford two 3090s either, let alone a couple of last-generation A6000s. The Tesla P100s have 16GB and NVLink, for a total of 32GB for fewer dollars. When PCIe 5.0 boards come out NVLink won't be necessary (we are told), but until then we are pretty limited as hobbyists. Nvidia is really the only game in town and they are focused on the enterprise, not us. AMD and Intel haven't invested in AI at this level and Apple isn't even in the game.
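A rough way to sanity-check VRAM claims like the one above is to estimate the memory taken by the model weights alone: parameter count times bytes per parameter. The sketch below is only a back-of-the-envelope estimate. The function names and the 20% overhead margin are assumptions for illustration, and real usage also depends on context length, KV cache, and the framework.

```python
def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes)."""
    bytes_total = params_billions * 1e9 * bits_per_param / 8
    return bytes_total / 1e9


def fits(params_billions: float, bits_per_param: int, vram_gb: float,
         overhead_fraction: float = 0.2) -> bool:
    """True if weights plus an assumed overhead margin fit in vram_gb.

    overhead_fraction is a hypothetical allowance for activations,
    KV cache and framework buffers, not a measured value.
    """
    needed = weight_memory_gb(params_billions, bits_per_param) * (1 + overhead_fraction)
    return needed <= vram_gb


if __name__ == "__main__":
    # A 70B-parameter model on 48 GB of combined VRAM at different precisions.
    for bits in (16, 8, 4):
        gb = weight_memory_gb(70, bits)
        print(f"70B at {bits}-bit: ~{gb:.0f} GB of weights, "
              f"fits in 48 GB: {fits(70, bits, 48)}")
```

Under these assumptions a 70B model only fits in 48GB when quantized to around 4 bits per parameter, and even then the weights have to be split across both cards.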
Hi, I am new to the idea of learning about ML/AI. I appreciate your video and am contemplating piecing together a budget-friendly system to start learning with. In the past, I was able to purchase some used crypto mining rigs from a person who was getting out of crypto mining. I parted most of the systems out and made a profit, but I kept a couple of crazy 8-GPU motherboards. My first question is: are there any restrictions that would prevent one from using multiple GPUs (more than 2, which seems more common)? My second question is: is there a certain GPU that would make sense from a budget standpoint, where having multiple of them would be more beneficial than one or two standard GPUs? I would think a system running 8 x Tesla M40 with a total of 96GB of VRAM would be better than a system running 1 or even 2 3060s with 12GB or 24GB of VRAM. I look forward to hearing your response if you find the time to respond. I appreciate your time in advance!
Hi there. Thanks so much for the great question. So glad that you have found this video helpful! QUESTION 1 There are a few considerations here: 1. Unless you shard or split the model itself, you are limited to the smallest memory size among your available GPUs when loading and training models. For example, 4 Tesla P40 GPUs give you 4x24 GB = 96 GB of VRAM in total, but in many cases people use data parallelism by default, and then the full model must be loaded onto each GPU. This speeds up training because you can process more batches in parallel across all GPUs, but it does not allow you to load larger models. There are also many ways to do model parallelism, in which you split parts of your model across different GPUs and process the data in a pipeline-like fashion. I have never actually had to do this, so I cannot really get into particulars here, but it is definitely possible. From what I understand, though, this is much more involved because it requires you to logically partition your model in a way that makes sense and assign layers or segments of it to the various available GPUs. All of this is a long-winded way of saying that if your model is not split across GPUs, you may be limited in terms of the size of model you can load and use even if you have many GPUs. 2. Every motherboard has a maximum number of PCIe lanes that it can support, so even if a motherboard has slots for 8 GPUs, it may not give all of them the full x16 lanes each GPU needs to use its full bandwidth. This is okay in many cases because the GPUs will still work, but it will limit performance and may cause a bottleneck if you have a ton of data IO. 3. GPU form factor. This is a lot more important than I originally thought. Each GPU manufacturer may use slightly different dimensions and specs. For example, the RTX 3090 Founders Edition is physically quite different from the RTX 3090 Zotac I bought. 
The RTX 3090 Founders Edition might be able to fit in one of my servers, but the RTX 3090 Zotac was much larger and would not. While this is not a major consideration, you definitely do need to confirm that the GPU will fit in or on whatever chassis/mobo you are thinking about working with. QUESTION 2 This question is also a bit murky to answer because it depends on what you will eventually be doing and the size of the models you will be working with. That said, I will recommend what I would do. If you want to fill up one of your 8-GPU mobos, I would suggest 8 Tesla P40 GPUs. They are ~$200 apiece. They are great in terms of price to performance. Having this many GPUs also allows you to use an immense amount of data parallelism to train and test faster for models that will fit inside 24GB of VRAM. It also gives you the ability to split very large models across all 8 of your GPUs. This theoretically would be enough to earnestly start playing around with some of the largest open source LLMs currently available. In addition, you can section off subsets of your 8 GPUs for different models, or train/test different versions of your models all at once. Personally, I find the last point invaluable in my research. Finally, these GPUs are cheap enough to add slowly over time. You can buy them one at a time as you have the funds rather than having to shell out thousands for a single GPU. If you value performance over the flexibility mentioned above and have a bit more cash available, I would probably go with 2 RTX 3090s with NVLink. I have not been able to test this setup, but I think it should be excellent in terms of performance. You could also use the same idea here by adding RTX 3090s over time as you have the funds. In summary, if you want a scalable mid-to-high-end rig where performance is your main concern, I would go with the dual RTX 3090s with NVLink. 
If you value flexibility and have less cash to invest upfront I think the P40 route is a great way to go. Very excited to hear that you are starting your journey, and I am glad I can help you along the way. I hope this helps, and if you have any other questions along the way please feel free to reach out!
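To make the data-parallel vs. model-parallel point from Question 1 concrete, here is a minimal sketch of the memory arithmetic. It is an idealized upper bound under stated assumptions: the function name and strategy labels are made up for illustration, and it ignores activations, optimizer state, and communication buffers, all of which reduce the usable space in practice.

```python
def max_model_vram_gb(gpu_vram_gb: list[float], strategy: str) -> float:
    """Idealized upper bound on the VRAM available to a single model.

    data_parallel:  every GPU holds a full copy of the model, so the
                    smallest card sets the limit.
    model_parallel: the model is split across the GPUs, so the limit is
                    (at best) the sum of all cards.
    """
    if not gpu_vram_gb:
        raise ValueError("need at least one GPU")
    if strategy == "data_parallel":
        return min(gpu_vram_gb)
    if strategy == "model_parallel":
        return sum(gpu_vram_gb)
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    p40_rig = [24.0] * 8  # eight 24 GB cards
    print(max_model_vram_gb(p40_rig, "data_parallel"))   # 24.0
    print(max_model_vram_gb(p40_rig, "model_parallel"))  # 192.0
```

So an 8-card rig trains a 24GB-class model faster under data parallelism, but it only reaches the combined pool if the model is actually partitioned across the cards.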
Hi there. All of my pricing is pulled directly from eBay. I normally try to find the lowest-cost reputable seller. GPU prices have been skyrocketing recently, so that deal probably no longer exists, unfortunately.
Hi there. Thanks so much for the comment! It really depends on your use case. If you plan on expanding to larger models, like some of the diffusion-related models, or want to use larger batch sizes, then 12GB of VRAM would be helpful. However, for most conventional deep learning models 8GB should be fine, and you will get faster training and inference speeds. Personally, though, I would probably go with the 3060 with 12GB. I normally default to the GPU with the higher VRAM even if the performance is slightly worse. I would rather be able to load the model and have training and/or inference be slower than get out-of-memory errors and not be able to load a model I want to work with.
thank you so much for the info but I am so conflicted, the information in this video is great, but the pacing gives me severe anxiety. nothing against you it's my own issue, so this comment is not a complaint is to help others who possess an overactive brain and 0 attention span. watch the video at 1.75 speed while reading through the additional resources. This will cut the play time down to 23.1 minutes which I admit, is still a very long time based on the amount of information, but it is good information so its worth it! this should keep your brain from losing interest during the loooong pauses. again , great content
Hey there! Thanks so much for the comment. I really appreciate the honest feedback. I have been trying to get better about being more direct and to the point in my videos. I know that some of them are unnecessarily long. My assumption is that most people would likely watch at faster speeds or hop around rather than listen to me drone on and on. I do agree, though, that I need to do a better job of keeping the audience engaged and make shorter videos with fewer long pauses. I appreciate the candid feedback, and I will work to do better on this in the future. Glad you at least thought the content itself was good though!
Hey there. So I was not able to find anything official from Nvidia, but I did find this online: www.techpowerup.com/gpu-specs/tesla-t40-24-gb.c3942 It might not be as reliable as actual documentation from Nvidia, though, so I would take it with a grain of salt. From what I can tell it's a solid card, but a bit expensive for the performance. I would compare it to the P40 before going that route to see if the performance gains justify the price for your use case.
@@TheDataDaddi I didn't buy the T40 due to the lack of documentation. Can't even find the driver for it online. I ended up buying P100s as the P40 can't do gpu-only inference which is what I need.
Hi there. Thanks so much for the question! So, in terms of theoretical performance the 4070 ti super is about twice as performant. Since you are just interested in inferencing, I would say that in this case having the better performance of the 4070 will benefit you more.
@@TheDataDaddi thanks a lot for your answer. I think it's worth it for the better performance, since I don't get any performance benefit from having two RTX 4060 Tis besides more memory. In case I do some training at some point, I can still rent some hardware. In my case I want to have a local LLM for work, where I cannot use anything connected to the internet. So I would use it primarily for inference, and if I need to optimize the model this should only occur once or twice (hopefully).
@@pixelslayertv7140 Sure! Yeah unfortunately since the memory pools for both 4060s would be separate you really don't get much benefit even having more total VRAM. You may be able to get away with fine tuning some of the smaller open source LLMs especially if you look into quantization. However, my gut tells me you will have a hard time doing much beyond that with 16GB VRAM. Like you said though you could always rent for the few times you do need access to more VRAM.
I'm studying artificial intelligence and machine learning engineering in India. Is an Intel i5 14600K with an RTX 4060 Ti 16/8GB good? If not, please suggest some alternatives😊. My brother is also studying machine learning and deep learning. Can you suggest some laptops for him? I need a PC and he wants a laptop.
Got it. The i5 is a solid choice and so is the 4060 Ti 16GB. I would not get a GPU with less than 12GB for AI/ML/DL these days. That would be my minimum. Also, for AI/ML/DL I would really not recommend a laptop. I think having a server or workstation that you SSH into or access remotely from a laptop is fine. However, I do realize that as a student a laptop makes a lot more sense. I will try to find a good solution for your brother. My first thought is one of the Mac M3 Pros. These are really expensive, though, and I really can't speak to how good they are from a machine learning perspective. I have an M1, and it has actually been pretty good. It has just definitely been buggy from time to time for machine learning. I actually just found this article. Tell your brother to check it out. It has 10 options that vary in price. He should be able to find something here that fits his budget. If it were me, I would probably start by checking out the Mac M3 Pro first though. As far as laptops go, I have always been impressed with Macs, minus the price tag. medium.com/@ibrahimcreative172/top-10-laptops-for-deep-learning-machine-learning-and-data-science-in-2023-f8a6ba861c4f Hope this helps! @@maamla_boy5208
All things considered, 2 x RTX 4060 Ti 16GB is the best investment if you run AI loads that can utilize both GPUs. 8704 modern CUDA cores and 32GB are better specs than a 4070 Ti 16GB. Memory is key here. The M40s are way too expensive; even the used ones are insanely priced.
Hey there. Thanks so much for the comment. I think the 4060 Ti is a great choice. However, all of these choices at the end of the day depend on current GPU prices. In general though, I would definitely agree with your assessment!
It's funny how regularly I hear "oh don't bother with the Pascal cards, just buy an Ada Lovelace, just sink $1,200 on just the GPU, don't waste your time" every time I point out the two Pascal Tesla cards are the best bang for your buck right now, assuming you can cool them. Minor errata btw, the Maxwell Tesla cards don't actually supply their complete VRAM capacity and CUDA count in 1 addressable device, but rather 2-4 smaller GPUs with 4-8gb of VRAM each, and most DL/SD/LLM applications don't scale linearly with CUDA cores spread across devices. Worse, many apps don't support anything older than Pascal anyways, meaning the effective usefulness gap will be even larger than your scoring system lets on.
Hey there! Thanks so much for the great information. Yeah, it is amazing to me how often people overlook the Pascals. I understand that they are old and definitely far from the state of the art, but they are still a great value for tons of compute. You can't argue with 16GB or 24GB of VRAM for ~$200. Honestly, for most people starting out, they are more than enough for most use cases anyway. I use both the P40 and P100 almost every day in my PhD research, and they are great for the money. Ah okay. I was reading something about that a while back. That definitely makes sense. Thanks so much for sharing. Great info for anyone considering Maxwell series GPUs.
@@TheDataDaddi Appreciate your content by the way! Been binge watching your channel, there's a lot here for me to learn from, been a huge help in figuring out how to scale up from my little workstation rig in the future. (T7910, 2x E5-2699 v4s, 128gb of ECC DDR4, P40 w/ cooling duct in the overhead slot, a v5900 as a display adapter, 4x1tb NVMe drives in a Hyper M.2 carrier)
So glad you have been taking a lot from it. I primarily made this channel because of how hard it was for me to learn this stuff. There are not many channels that focus on how to cost-effectively create home labs or workstations specifically for machine learning. Most that do focus on things that are out of the price range of most normal individuals. Really glad to hear that it has been helping you. Great setup, by the way! Wonderful place to start. Keep me updated on how the scaling goes! I'd be curious to see what you do. @@KiraSlith
@@TheDataDaddi Current plan is to move the P40 down to the main chamber, add a second P40 below it, and cool the P40s by pulling air through the back via a high CFM 120mm fan, rather than pushing through the front. I'd have to move the v5900 up to the top CPU1 slots though. I know you couldn't attach GPUs to CPU1 with the old T7800, but I've never tried with the T7910. The NVMe drives will be just fine though, thankfully.
Just looked at the layout for the T7910. Seems like a good plan and that looks like it should work from what I can tell. Definitely agree adding a fan to pull through the back is a good call. Let me know how it ends up working! @@KiraSlith
Hi there. Thanks so much for the comment! I have also been interested in non-NVIDIA solutions. The ARC GPUs have certainly interested me. However, I would caution you. If you leave the NVIDIA ecosystem, it is like going into the wild west, so just make sure you are prepared. Here is a Reddit thread that might shed some light. www.reddit.com/r/MachineLearning/comments/z8k1lb/does_anyone_uses_intel_arc_a770_gpu_for_machine/ If you do decide to go the ARC route, please let me know how it goes for you. I would be super curious to better understand where those GPUs are in terms of AI/ML/DL applications.
Hey man. I really appreciate you reaching out. I am going to make an X or threads or similar account soon to become more reachable. For now though, just shoot me an email at: skingutube22@gmail.com
Hi there. Thanks so much for reaching out. You should be able to download a local copy and change it in any way that you see fit. I acknowledge that the spreadsheet needs to be updated. I am actually working right now on a website that lists GPU specs and keeps track of historical price trends. So all of the information in the spreadsheet and more should soon be available, updated on a daily basis. Unfortunately, I am not comfortable giving you direct editable access to the original version in the Google Drive. I apologize. It is nothing personal. I just don't know you well enough.
Hi there! Thanks so much for the comment. I am sorry though. I am not sure I understand what is being asked here. Could you give a little more context?
Brother - you went down a serious rabbit hole. A man after my own heart. One thing you didn't mention that is very important to data integrity and therefore results is ECC RAM. The pro Nvidia GPUs have ECC RAM. Most casual users don't realize how many bits are flipped through flaws in silicon and cosmic events (literally). Then some garbage gets written to disk. That's why I would never own a workstation that doesn't have ECC from top to bottom. Better if the OS / Hypervisor is using ZFS (the best file system IMO... with 40 years building enterprise and global systems). Consumer equipment is fine to test and learn on, especially on a budget. But if you want data and result integrity, at a minimum, buy pro equipment.
Like you - I've had great results buying refurbs from eBay. Used electronic prices drop faster than pulling out of an auto sales lot in a new car. Well - at least that used to be true. But it still is for workstations and servers. I recently looked at some Dell PowerEdge 730's (NVMe M2 bootable) with 128 GB ECC RAM and dual, upper-end, v3 Xeon processors for about $400 - with IDRAC (out of band management).
You did emphasize the use case that the GPU is for, and that's what I'd emphasize too. If the data you are processing is something that you can't replace even with numerous backups, or that can't suffer glitches - go with professional equipment - either new or used.
Then use ZFS RAID and not hardware RAID. ZFS controls the whole data stack - from RAM to permanent storage. You'll want to disable hardware RAID so firmware doesn't fight with ZFS. If you are learning and experimenting, but not relying on the end result - use cheaper consumer products. The learning curve is lower and so is the price tag. If you've never managed a PowerEdge server - that is an entirely different animal because it has to be. The difference between one of those and a consumer PC is like the difference between a flip phone and a Linux workstation. Night and day. But if you're a nerd like me - that's what you want as a platform.
Hey there. Thanks so much for the detailed response! I really appreciate comments like this.
Yeah, I definitely went down a rabbit hole with the video. Really did not mean to, but once I started I couldn't stop. lol
I will definitely keep that in mind going forward. I have never really had any issues thus far but what you are saying definitely makes sense. Also, from my research it really does not appear to be too much more expensive.
Yeah, I actually have 3 Dell PowerEdge R720s right now in my basement in a rack. Unfortunately, I do not believe any of them have ECC RAM though.
I have always wondered about ZFS RAID. I have never used it myself, but I will have to go that route. The next server I buy, I want something newer, and I will make sure to buy ECC RAM and use ZFS RAID.
Also, that is a great price on the R730. Is it still available? lol
> Most casual users don't realize how many bits are flipped through flaws in silicon and cosmic events (literally).
Yup, almost none.
I can attest. I would not consider myself a casual user, but that is something I never really considered until recently. Upon further research, this is indeed quite common and definitely highlights the need for ECC RAM, especially for critical systems or production environments. @@Christopher-lb6rf
@@Christopher-lb6rf when I was a hiring manager for sys admins and someone would put 'expert' on their field of knowledge related to Sun servers, you can bet questions about errors and external errors came up - found a few, mostly people that had worked at Sun - answered/ knew the area I was interested in :)
@@jeffm4284 this was a very helpful comment. Thank you!
I am learning at this stage and building my first nas/pc, with one of my goals being local AI with fine tuning + RAG for clinical notetaking. I'm not (currently) paid nor can I pay myself for this added work, it's for my learning, development, and the long term good of my clients.
In essence, I'm experimenting and not relying on the end result, but I fully intend to rely on it 3-5 years from now. From a cost-benefit (time and money) perspective, and if you were a relative noob, is it worth it to start on consumer equipment and then have to spend the additional time and money to upgrade most of your hardware later on?
My previous plan was to put a 16GB VRAM 4070 Ti Super in my PC, build a NAS, learn, and add a 24GB 3090 or 4090 in a separate enclosure later on (for dual GPUs on my PC or single GPUs on both PC and NAS). But full ECC sounds essential down the line in my use case, so now I'm not so sure.
00:11 Choose GPU with newer architecture for better performance
02:28 Choose NVIDIA GPUs with active support and sufficient VRAM for future scalability.
06:51 Key considerations for choosing an NVIDIA GPU for deep learning
09:08 Consider driver support for deep learning framework compatibility.
13:10 Factors to consider when choosing an NVIDIA GPU for deep learning
15:12 Understanding the key GPU metrics is crucial for making the right choice.
19:46 Choosing GPU based on performance, memory, and bandwidth criteria.
22:00 GeForce RTX 2060 Super and GeForce RTX 4060 Ti 8GB are the best bang for your buck GPUs.
26:27 Comparison of NVIDIA GPU models for Deep Learning in 2023
28:45 GeForce RTX 4060 Ti 16GB has the best raw performance
33:18 Choosing NVIDIA GPUs for Deep Learning in 2023
35:36 Best bang for your buck: P100 and P40 GPUs
39:22 P100 and P40 are recommended for deep learning
Crafted by Merlin AI.
Thanks so much for adding these! I appreciate it.
What most people don't think about is that time is money !
You are definitely right. There is a reason most companies use the cloud rather than on-prem hardware. There are definitely arguments in both directions. In my case, however, I cannot afford the cloud-based solution, and I like having full control of my own hardware.
And the electricity cost, and airconditioning if your climate requires some extra work to dissipate that heat.
Well, this is what I was actually thinking. Most "best bang for the buck" charts forget that the longer it takes to generate a result, the more that time can also be considered a loss.
This is a cool little project you did.
Hi there. I am so glad you enjoyed the content! Really really appreciate the donation. Really helps the channel!
GFLOPS is not calculated as shown in the video at 15:18. Remove the Giga, which we know, and it is simply FLOating Point operations per Second (there is some history behind why this is used). It is somewhat archaic, since a lot of other things are being done too that aren't incorporated in this, but in general most other operations take fewer cycles than a floating point one does, because the comma (decimal point) needs special attention, so to speak. 15*15 and 1,5*1,5 are the same thing except for tracking the comma separately, with the result being 225 or 2,25. What I mean is that the circuit needs additional logic to track commas, or rather fractions, which is why we separate floating point from integer operations: additional hardware is required to track the comma "on top of" the integer-type numerical operations. No idea if this makes any sense or is useful. I thought it would be simple to explain until I thought it through and realized I need to type this as opposed to scribbling and showing it on a whiteboard. I'm sure there is a good explanation for it out there. I'm just trying to point out why, since to a person doing math it's not as obvious as it is when designing a circuit to do it.
Hi there. Thanks so much for the comment!
This is great information. Thanks so much for sharing. I did know that floating point ops were different fundamentally than other operations though I was not 100% sure why. This makes a lot of sense! Also, if you know the correct formula for calculating FLOPs in general please let me know.
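On the formula question: the usual back-of-the-envelope estimate for theoretical peak throughput is cores x clock rate x floating point operations per core per cycle, where a fused multiply-add is conventionally counted as 2 operations. The sketch below assumes that convention. The function name is made up, and the example numbers are illustrative rather than taken from any datasheet, so check the vendor specs for a real card. Measured throughput on real workloads is normally well below this peak.

```python
def peak_tflops(cores: int, clock_ghz: float,
                flops_per_core_per_cycle: int = 2) -> float:
    """Theoretical peak throughput in TFLOPS.

    flops_per_core_per_cycle defaults to 2 because one fused
    multiply-add (a*b + c) is conventionally counted as two
    floating point operations.
    """
    flops = cores * clock_ghz * 1e9 * flops_per_core_per_cycle
    return flops / 1e12


if __name__ == "__main__":
    # Illustrative values only: 3584 cores at an assumed 1.3 GHz clock.
    print(f"{peak_tflops(3584, 1.3):.2f} TFLOPS")
```

With these assumed inputs the estimate comes out a little over 9 TFLOPS of single-precision throughput.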
This is a fantastic video explaining how to choose a GPU for deep learning/AI/ML. He extended Tim Dettmers' single GPU performance chart into a masterpiece of a spreadsheet and Power BI dashboard. Masterful. I wonder if you factored electricity cost, the cost of removing heat from the room, and total decibel output into the decision. I see in a subsequent video that the server is installed in what looks like a basement. The rack is within a few feet of a gas can. Those Dell machines can run hot, so you might want to move the gas can elsewhere. How noisy is the final product with two P100s?
Hi Prent. Thank you so much for the kind words. This took me quite a while to put together so I really appreciate the positive feedback.
I have not directly calculated the electricity cost for all of the GPUs. However, for the P100s in the Dell PowerEdge R720, I estimated roughly $200 per year, based on the 15 ¢/kWh average in GA where I live and my anticipated usage. However, that is really just a guess. I need to buy a gauge to actually measure average power consumption over a month or so and extrapolate that out. This seems like a good idea for another video, so stay tuned. lol
Heat removal has not been an issue at this point, as the basement is fairly large and naturally stays around 65 F. I will have to see what it is in a week or so, after the servers heat things up.
Also, as far as the noise is concerned: as long as it is in the basement or a room that you don't use often, it is fine. They were in my bedroom for a while, but I had to move them because of the noise lol. As far as decibels are concerned, I have not measured. However, if you are interested I can try to check and let you know more precisely.
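For anyone who wants to redo this kind of estimate with their own numbers, the arithmetic is just energy (kWh) times the price per kWh. This is a minimal sketch with hypothetical function names. The 150 W average draw in the example is an assumed figure, not a measurement from the server discussed here.

```python
def annual_energy_cost(avg_watts: float, hours_per_day: float,
                       cents_per_kwh: float, days: int = 365) -> float:
    """Yearly electricity cost in dollars for a given average power draw."""
    kwh = avg_watts / 1000 * hours_per_day * days
    return kwh * cents_per_kwh / 100


def implied_avg_watts(annual_cost_dollars: float, cents_per_kwh: float,
                      hours_per_day: float = 24, days: int = 365) -> float:
    """Average draw in watts that would produce a given yearly cost."""
    kwh = annual_cost_dollars * 100 / cents_per_kwh
    return kwh * 1000 / (hours_per_day * days)


if __name__ == "__main__":
    # Assumed 150 W average draw, running 24/7 at 15 cents/kWh.
    print(f"${annual_energy_cost(150, 24, 15):.2f} per year")
    # What average draw does a $200/year bill correspond to at that rate?
    print(f"{implied_avg_watts(200, 15):.0f} W average")
```

At 15 ¢/kWh, a $200 yearly bill corresponds to an average draw of roughly 150 W around the clock, so the real figure depends heavily on how many hours per day the GPUs are actually under load.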
Thankfully the gas cans are empty, but I agree probably a good idea to move them.
Thanks again for the interest! Glad you found this helpful.
Your insight and the spreadsheet you provided are invaluable. Thanks.
Hi there. Thank you so much for the feedback! I am so glad this was useful to you.
Great video, I'm very grateful; it was worth watching in its entirety. Thank you for your effort, greetings from Panama. Your video will greatly assist me in a project that my classmates and I want to undertake at the university. Many thanks for sharing such valuable information
Hi there! Thanks so much for your comment! I am so glad that my video was able to help!
WOW dude, this is an amazing study. Thank you so much for the energy you've put into this!!! This is tremendous help.
Hi there! Thank you so much for the kind words. I am so glad that this was able to provide a lot of value for you.
My brother... Thank you so much for this video. I'm pretty new to this and was JUST about to start a similar workbook before I thought to check opinions on TH-cam. You saved me at least a day of figuring out weights and priorities. I completely agree with your logic and thought process.
Hey there! Thanks so much for the kind words. I am so glad to know that this was useful for you. I am hoping to have my website up and running in the next couple of months. It will basically have this workbook available in real time with up-to-date pricing. Stay tuned!
I'm watching this as a newbie from a hotel room on my laptop with sub-par speakers. Just 1 request from my unique context would be to amp up the volume on future uploads so it is easier to listen when in similar situations.
Hi there. Thanks so much for the feedback. I will certainly do that in the future!
@@TheDataDaddi hi. Thank you for the quick reply and I appreciate the willingness to up the volume. Just wanted to add. The audio issue was on my laptop’s end and roughly half way through the video, the volume issue fixed itself. Earlier they sounded like they were at half volume. I forgot to edit my comment earlier. Much love and support ❤️
Oh okay, no worries at all. I really appreciate you reaching back out to let me know. You are awesome! @@kidrock777
Just what I was looking for! Thank you for all the hard work in putting this together - you make the world a better place :)
Hi there. Thank you so much for your kind words and the feedback. I am just glad this was helpful to you.
This is just great! This is just what I was looking for. You have my great respect! Extremely helpful to people. Kudos of the highest order!
Hi there. I am so glad that this was able to help you! Really appreciate the feedback!
Very good analysis and helpful ! You got yourself a subscriber.
Hi there. Thanks so much for the positive feedback! I am so glad the video was useful for you and thank you very much for subscribing. I hope you continue to enjoy the content!
I live for this deep dive stuff. Thanks for your thorough work!
Thanks so much! Glad you enjoyed the video!
Seriously great video man. I can tell this was a labor of love. Thank you for taking the time to create this!
Hey there. Really appreciate the kind words! So glad you enjoyed the content!
It would be nice to add in energy consumption, heat vs cool, most durable, and real life comparison using an actually local LLM system (and which LLM size) with these cards. For instance, the best card for single use performance, vs multi use performance in using a local LLM system like privateGPT, for an example.
Yep this is coming eventually. Things like that just take awhile to do correctly. I am working right now to create a set of comprehensive benchmarks that the community can use to evaluate GPU performance across the major AI/ML/DL areas.
I wish I had watched this video 1 day ago. Great material for beginners in ML. Thank you. I ended up choosing an RTX 4070 12GB. Not the best choice for the money, but I guess it is still very powerful.
Hey man! Thanks so much for the feedback. Yeah, unfortunately, most of the 4000 series do not have the greatest price to performance ratio. However, they perform better at general tasks like rendering and gaming, so there is definitely something to be said for them if you have the money to afford them. Anyway, that said, I think the 4070 is a fine GPU and will be an absolute workhorse when it comes to smaller machine learning problems. If you remember, let me know how you like it after a few months of experimenting with it. Very curious about the performance of the 4000 series in general. Thanks again for your feedback!
@@TheDataDaddi very good reply man
The work you did to put this together is very much appreciated. Thank you for the thorough and thoughtful analysis!
Hi there. So glad this was able to help you!
@@TheDataDaddi Is the MSI RTX 4070 Ti Ventus 3X OC 12GB GDDR6X a good graphics card? Please help me. I am thinking of buying the Black Diamond 2.0 from Dell, which comes with this GPU. Please give me some suggestions.
I can only invest once, so can you please help me? My budget is $2500. @@TheDataDaddi
Thank you in advance.
@@TheDataDaddi
Hi there. Thanks for the comment.
I think this would be a fine GPU overall. It would help to know your use case, but I would say that it is a good GPU for most small to mid-size applications.
Since this is a prebuilt machine, do you have any other options for GPUs? @@cattnation6257
I heard that Tensor Cores are the most important for AI and ML, and that CUDA cores are second in importance. I don't think I have seen Tensor Cores referenced on your sheet.
Yes, they are definitely a consideration. Admittedly I should have included those stats as well. I will likely go back and add them. However, in most cases a higher number of CUDA cores also translates into a higher number of tensor cores for those GPUs that have them.
Nice work. I think the only big flaw I see in your analysis is that purchase price is not the entire upfront cost. Each card should have an overhead cost based on the fraction of a chassis, mobo+cpu, and PSU it would use. I think you just had a chassis for 2 cards as a sunk cost in your mind so it didn't matter. But for anyone building a full system (or systems) it would have a big impact on their purchasing decision.
Hi there! Thanks for the kind words and the feedback! This is a good point. I was doing all this research to figure out what to put in my Dell PowerEdge R720, so my analysis is a bit biased in that way, I suppose. My assumption was just that people would be able to compare GPUs most easily by looking at purchase price to performance. However, I do agree that for those who are building or using different servers this might shift the total cost significantly. For example, if I wanted to work with RTX 4090s (even in my server), I would have to buy external PSU(s) for power in order to use them. That would likely mean I would have to build an entire external rig, so that would definitely increase the overall cost. These are definitely things to consider when thinking about your build. I appreciate you bringing that up, and I will definitely keep it in mind for any future analysis I do. Thanks again!
@@TheDataDaddi It is a helpful comparison but as with everything else, no tool is perfect in and of itself, it may be the perfect tool for one job but not another, or at a certain scale. Mostly for me it reinforces my previous idea which is to aim for 4090 as the most versatile and best fitting for my usecase, also because it isn't a one trick pony and can do other things as well. Some of the other options do come close but do not 'cut the mustard' for one reason or another, primarily that it needs to fit in my one rig which can only take 2 GPU's max physically but then power becomes a limit since 1200w is the biggest PSU that makes sense since above it you run into breakers popping and wiring becoming a factor etc. Every possible factor can't be realistically accounted for or factored in, in a spreadsheet. The 'slot cost' I think would be valuable for many to be able to add to it, since some uses do not require much from the platform itself while others very much do.
@@noth606 You are definitely right. One of the reasons I am trying to focus more on hardware at this stage of my journey is because it is so nuanced. For best results, it really should be thought about on a case by case basis. This was just meant as a way to show people generally how to start thinking about GPUs for their particular use case and hopefully lessen the research burden a little bit for anyone interested.
@@TheDataDaddi I'm sure this is helpful to get people to start thinking on more concrete terms when they get to a point at which they want to put together hardware specifically to "crunch numbers" rather than just using a typical general use PC configuration. I would think that the journey in a certain sense can or might be split into 3 "stages" conceptually where stage 1 is a normal PC, stage 2 is a PC built to "crunch numbers" but at a general level and stage 3 is a problem/usecase specific PC or set of hardware designed around the exact specific task they are meant to tackle, where for stage 3 you would include efficiency calculating "work units/sec/price/watt" type factors into it. It's important to factor in time and energy on some level because they are of a slightly different nature than calculations per second are conceptually. What I mean is they are inflexible, you don't have access to unlimited amounts of either regardless of other factors. Saying this not to be patronizing, but as someone who has at times forgotten which parts of this kind of equation are inflexible 🙂 and paid a price for that. Ask my ex wife, she'd have tips and examples of the inflexibility of time and energy for sure, lol.
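For anyone who wants to try the "slot cost" and "work units/sec/price/watt" ideas from this thread in their own copy of the spreadsheet, here is a minimal sketch. The function names and every number in it are made up for illustration; none of them come from the video or the workbook.

```python
def effective_cost_per_gpu(gpu_price, platform_cost, gpu_slots):
    """GPU price plus its share of the chassis, mobo+CPU, and PSU cost."""
    if gpu_slots < 1:
        raise ValueError("gpu_slots must be at least 1")
    return gpu_price + platform_cost / gpu_slots


def perf_per_dollar_per_watt(throughput, total_cost, watts):
    """Throughput (in any consistent unit) divided by cost and by power draw."""
    return throughput / (total_cost * watts)


# Hypothetical prices: an $800 card and a $200 card sharing a $1200 two-slot platform.
card_a = effective_cost_per_gpu(gpu_price=800.0, platform_cost=1200.0, gpu_slots=2)
card_b = effective_cost_per_gpu(gpu_price=200.0, platform_cost=1200.0, gpu_slots=2)

# On sticker price card_a costs 4x card_b; with the slot cost included the gap shrinks.
print(card_a, card_b, card_a / card_b)
```

The point is just that a fixed platform cost narrows the relative gap between cheap and expensive cards, which is what the "slot cost" comment is getting at.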
Excellent my friend. thank you so much for compiling. I'm definitely going to use this for my upcoming rig. cheers.
Hey there! Thanks so much for the kind words. I am so glad to know that this was useful for you. I am hoping to have my website up and running in the next couple months that will basically have this work book available in real time with up to date pricing. Stay tuned!
Hi, I'm a deep learning beginner with a 4070 Super (12GB) looking to potentially add more GPU(s) to the 2 empty slots on my GIGABYTE TRX50 AERO D motherboard. I'd like to have more VRAM on a budget of no more than $1000-$2000. What are some of my options, and do I need to consider power consumption with a 1200W PSU and other compatibility issues?
I have a pretty good 32-core, 64-thread Threadripper CPU and 256GB of RAM. What are some benefits of those for ML? I bought this PC for 3D simulation use originally.
So, I think the Threadripper cpu and mobo are wonderful for deep learning applications just really expensive relatively speaking. They are really the only cpu and mobo combos I know of that go over 2 x16 lane pcie slots. This means you can use more than 2 gpus with full x16 lane bandwidth.
For me personally, I believe the best you can get right now for that price range is a pair of RTX 3090s with NVLink. I would scan EBAY and wait for a good deal then grab 2 when they are cheap or on sale.
Hope this helps!
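On the 1200W PSU part of the question, a rough power budget check can be sketched like this. Everything here is an assumption for illustration: the wattages are placeholder figures (check the spec sheets for your exact CPU and cards), and the 80% sustained-load margin is just a common rule of thumb, not a hard limit.

```python
def psu_headroom_watts(psu_watts, component_watts, safety_margin=0.8):
    """Watts left over after keeping sustained load under a fraction of the PSU rating."""
    usable = psu_watts * safety_margin
    return usable - sum(component_watts.values())


# Placeholder power figures (assumptions, not measurements).
build = {
    "cpu": 350,
    "rtx_4070_super": 220,
    "rtx_3090_a": 350,
    "rtx_3090_b": 350,
    "rest_of_system": 100,
}

headroom = psu_headroom_watts(1200, build)
print(f"Headroom: {headroom:.0f} W")  # a negative value means the budget is exceeded
```

If the headroom comes out negative for your numbers, the usual options are fewer cards, power-limiting the GPUs, or a bigger or second PSU.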
I'm just starting this video and I already know it's going to be good. Thanks a ton ahead of time!
Hi there. So glad that you are enjoying the content. Hope it helps you!
@@TheDataDaddi You literally changed my life. I could have gone off the deep end and bought a top-tier $1000+ gaming GPU, but something like that is just not ideal for the kind of AI stuff I'm interested in like SDXL and high-end 70b LLM's. This video is definitely going to be a re-watch to assure I retain the information you've kindly shared in the video! Thanks again!!
So glad this was helpful to you man! I am big proponent of buying what you need for your specific application. Please let me know if I can help in any way. @@goldholder8131
Thank you brother for your hard work. You have saved me a lot of time. Your spreadsheet is amazing! We can sort GPUs by the desired category! After viewing your results, I believe that the most notable GeForce GPU (especially considering the price) is the 3080 Ti. It is close in CUDA cores to the 3090 (I am aiming for a 3090, but it might be easier and cheaper for me to get a second-hand 3080 Ti).
Hi there. So glad this video has helped you!
Yep, I would definitely agree with that statement. It may well be easier and cheaper to get your hands on a 3080. Most people, it seems, have their eye on the 3090s for the extra VRAM. However, if your use case does not require that, I think the 3080 would be an excellent way to go for sure.
Excellent video. I very much appreciate the time and research that you put into this.
So glad you enjoyed it!
Loving the content! Looking forward to the next video!!!!
Hi there. So glad you are loving the content! Thanks for the support!
Hello! First off, I want to thank you for the well researched and presented explanation regarding why GPUs are important to the topic of machine learning and the comparison of the major line of GPUs that you are familiar with.
I'm currently in the middle of doing research for building a creator PC for performing GIS (Cartography/Data science/Computer science/programming/Machine and Deep Learning/etc.) with photo editing and Gaming as a side benefit. I'm looking for a GPU that can handle a wide variety of tasks with a focus on visualization and processing of high resolution RGB imagery, high resolution 4+ band multispectral imagery, Hyperspectral imagery and LiDAR (from Unmanned Aerial Vehicles) as well as machine learning and deep learning tasks.
I was hoping to play around with the settings in the tool you provided, however I was not able to get power bi working and I'm too lazy to want to spend time trying to figure out how to set up the program properly.
I was already looking at purchasing a 4070 ti for my build, but would you say that a 3080 ti, a 4080 series or 4090, if I can find one for a good price, would be a better choice?
Hi there. Thanks so much for your feedback and the question!
So since you also want to do photo editing and gaming as well as AI/ML/DL applications, I think the RTX family of NVIDIA GPUs is definitely the right way to go.
I think the 3 main questions here are:
1) What is your budget?
2) How large are your expected datasets?
3) Do you plan on expanding the number of GPUs in the future?
Based on your needs and the price points you're considering, my recommendation leans towards the RTX 3090 or 3090 Ti, especially if you can find a compelling deal. These GPUs offer exceptional value around the $1,000 mark for your specific applications. Moreover, it's possible to find them at even lower prices, approximately $800, on platforms like eBay with a bit of patience. Their price-to-performance ratio is among the best in this price range, making them a highly attractive option.
A significant advantage of these last-generation NVIDIA RTX GPUs is their support for NVLink, which, in my opinion, offers a notable benefit over the more powerful 4090 for your use case. Starting with a single 3090 or 3090 Ti allows for a robust setup. As your requirements expand and your budget allows, you can further enhance your system by adding another GPU and linking them with NVLink. This approach provides a scalable and highly effective setup for a wide range of tasks.
@@TheDataDaddi Thanks for the response, It's pretty hard to find information about creator PC building, especially if you're not doing video editing, so I really appreciate it. I'll definitely be keeping my eye out for a 3090 or 3090ti with a good deal.
The datasets I'm working with right now vary pretty wildly in size depending on the study area and which sensors we're using to collect data (a 10-band multispectral image will be larger than a RGB image and much smaller than a LiDAR point cloud which is in turn smaller than a Hyperspectral profile for example).
Anyway, thanks again for the information! I hope you have a great rest of your day!
@@TheDataDaddi I just checked the current retail prices for the 3090 and 4090, and was reminded that the reason I was looking at the 4070 Ti was that good old sticker shock. It's less than half the price of even a 3090 (the cheapest 3090 at retail is about $2100.00 Canadian, compared to the 4070 Ti's $899.00 Canadian at retail) here in Canada, where everything is 1.4x more expensive due to the exchange rate.
I guess that's the price we pay for all that extra VRAM. I'll still be keeping my eyes open for a deal, but I may end up buying a less expensive graphics card in the meanwhile. I can always upgrade in the future, and when I do the graphics card I buy now can be recycled into part of a home server later.
Of course! Glad you found it helpful. Yeah one of the reasons I focus more on hardware on this channel is because not many other people out there focus on that aspect of machine learning and it is super important. @@dragonmaster1500
Yeah, the sticker shock with GPUs is really hard to take sometimes. lol. Check out this one on eBay. I bought 2 recently. They are a pretty good deal if you are willing to use refurbed equipment (almost all of my equipment is used, bought from eBay or similar). I think it would be about $1200 CAD.
www.ebay.com/itm/155867314803?epid=28044609256&hash=item244a6a7a73:g:CnMAAOSwmxNlRStE&amdata=enc%3AAQAIAAAA0GhaLrApc303M8MFhLKXaC1XCZUsnm98lj%2BZeFSruH9oJCFANdXBqU29SOoKs%2BWXGvlPyaIiK5XaubhTqwcQcesmE5FwiNLe0DFWbTLSQ%2FedCQeh%2FYGwxBressF0aNTusfEfh6%2FPh2A%2FG7Uz%2B%2FxEz5CVwvRLABldqDMSoIn%2BM32M3Spzp9f5vb9qFjFE3B7TxotPhewTVPG5AlHyBpu4J07YixG%2FvLiZ2XJDt4nOaaDYjXWNF89%2F8WSbWK8TIBBumuk1germV%2BC3pNIkixMDAGA%3D%7Ctkp%3ABFBMpqHIra5j
Yep definitely agree here. That might be the better way to go for now. The best GPU is always the one that fits in your budget. Lol.
@@dragonmaster1500
Things have changed, time to step into an Arc A770. Will need a slight adjustment with the tools but now you can get the higher end GPU for a lot less money.
Hi there. Thanks so much for the comment!
This is great to hear! What AI/ML/DL applications have you used the ARC A770 for? I would be super curious to know if you have tried it out for anything yet!
Hello, How much additional performance can I expect when running deep learning models if I upgrade from an RTX 3060 Ti with 8 GB to an RTX 4070 Ti Super with 16 GB?
It is hard to say because it is largely use case dependent, but in general you should see a significant improvement in terms of raw throughput (~75%-100% over the RTX 3060 Ti). However, for me the best benefit is the larger VRAM size. This would allow you to load models twice as large, which would be extremely helpful for working with LLMs or diffusion based models. Eventually, I hope to be able to test these GPUs directly to give a more definitive answer.
Thank you SO much, you've saved me hours of life down this rabbit hole!!
Hi there. So glad this helped you! Love to hear that.
hey, please consider adding the 4070ti super 16gb model to your spreadsheet!
Good call! I will update it as soon as I get a chance.
Wow ! How many eons worth of time have you put into this ?
There are not very many people who would go so far out of their way for others like this. This is a great work of art, thank you.
So, for LLMs like Mixtral I can just use P 40 ? Yay
Hi there. Thank you so much for the comment!
This video took quite a while for me to put together. I think a couple weeks if I remember correctly. I was doing all the research for myself so I figured that I could save others some time and frustration by sharing my results. I am glad to hear that it is appreciated!
Yes! After looking at the specs on Hugging Face, it looks like you can run inference with even the largest quantized Mistral 7B available there, which has a max memory requirement of 10.20GB. Unfortunately though, if you wanted to use a non-quantized version you may run into memory issues.
huggingface.co/TheBloke/Mistral-7B-v0.1-GGUF
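As a rough way to see why quantization matters here, the memory needed just for the weights is roughly parameter count times bits per weight. This is a minimal sketch under loud assumptions: the 7 billion parameter count is rounded, the function name is my own, and it ignores the KV cache, activations, and framework overhead, so real usage will be higher than these numbers.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate memory for model weights only, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


PARAMS_7B = 7_000_000_000  # rounded assumption; the real count differs slightly

for label, bits in [("fp16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(PARAMS_7B, bits):.1f} GB for weights")
```

Compare the result against the card's VRAM (24 GB on a P40) and leave some room for the context and overhead.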
Nice video!
I'm planning to do a university project for my computer science degree and I need a GPU for this (budget = 600€ - 800€ approx).
The project is about an autonomous driving system based on the CARLA simulator.
The main parts are:
* Ryzen 9 7900X
* 32GB RAM 6000Mhz/CL30
* 1TB WD_BLACK 850X M.2
I've been researching the RTX 3090 since they have 24GB of VRAM and are sold used in my country for 650-750€ but I'm a bit afraid to buy them on the second-hand market because most of them don't have warranties, they are repaired (stickers and gold solders) and I don't really know if they will end up breaking down in a few months.
The other option I've considered is a new RTX 4070 TI, with 16GB of VRAM, which I could get for 700-800€.
My questions are:
* Which GPU do you recommend?(It doesn't necessarily have to be the ones I've listed)
* Does it matter so much to have more VRAM instead of power? I mean, 16GB isn't enough?
I think my university can give me access to training servers but I don't know the hardware of these. In this case, I would also need the GPU because I have to run CARLA on my computer.
I don't rule out using third-party services but I don't know if it will be economically possible due to the fact that I need large amounts of memory for the datasets.
Should I use a more normal GPU to do research by training small models and then do the hard work on those servers? Should I pay more and buy something completely local?
* Would it be possible to develop the entire project locally and without using AI servers?
I am a simple student who is planning to do a master's/PhD next year.
Thanks
Hi there! Thank you so much for the kind words and the great question.
First of all that sounds like a really interesting project and tough research area!
A lot of what GPU you should buy depends on how large the models you will be working with are. I am not super familiar with CARLA, but from my research just now, it looks like CARLA itself is a simulation environment, and it is primarily loaded into conventional RAM. It looks like 32 GB should be sufficient here. As for the machine learning/deep learning piece, this is what will be loaded into VRAM. So, the choice of GPU depends a lot on what models you will be planning to use. If you can let me know which you plan to use, I can give you a more targeted answer.
In general though, my initial thought is that I would probably go with the refurbed RTX 3090. It is one of my favorite GPUs from a cost to performance perspective and should be able to handle pretty much whatever you throw at it pretty well. That said, I would make sure to buy from a reputable seller. I have never had any issues buying second-hand GPUs, but I have always been really picky about where/who I buy them from. If you have the ability to use eBay, I would recommend that. It gives you at least 30 days of protection to make sure the GPU is working as expected.
Some additional thoughts. I would almost never recommend that people buy a cloud based GPU solution. I have not found any that I really like that much and that are at a reasonable price point. I think you would almost always be better off buying the GPU for the duration of the project and then selling it at the end if needed, rather than renting a cloud based solution.
As far as can you develop the whole thing locally, I am not sure. I don't have enough information to say for sure. However, from my brief research it looks like that should be possible.
Anyway, I hope this helps! Please let me know if you have any more questions, and please keep me updated on your journey!
For anything with image generation, higher vram and cuda cores are pretty much the main priority. I was looking at 3060 with 12gb vram, 3090 with 24gb vram and 4090 with 24gb vram. The 4090 was twice as fast as the 3090 due to the updated architecture and more cuda cores, but considering I couldn't find one second hand, it would have cost me about 4 times the price.
For something I'm just testing out, the 3060 and 3090 were the options I was willing to consider. I would have taken a 3060 if it didn't look as though more vram would have been a necessity for a lot of upcoming technology. The 3090 has been really nice to have and is at least 10 times faster, maybe even 20 times faster than running on an M1 MacBook pro with 16gb of unified memory. I just need to look for a feasible cooling option for it for when I'm running large stints.
I completely agree with this comment. I would almost always choose a GPU with higher VRAM over one with better performance these days. The 3090s rn are my favorite. I think they are the sweet spot of price, performance, and VRAM. Unfortunately, cooling them properly, especially in a case (I am assuming this is your situation), can be challenging. You could try adding more/better case fans and writing a program to adjust their speed based on GPU temp, upgrading to a bigger case, or potentially even selling and swapping for a water cooled variant if possible. They also make aftermarket coolers for these GPUs. I have never explored this, but these could be options.
@TheDataDaddi I was looking into cooling options and there aren't many out there. After I wrote the first comment, I checked out the temperature and it was in the ideal temperature for maximum compute even after running it at that level for a few hours. I'm not sure if that's still a potential issue or not because of how often it's at maximum compute, but it did put my mind at ease somewhat. Due to the age of the GPU, there weren't any cooling options in stock in my country. I may consider selling it and buying a new second hand device with cooling built in, or if I'm able to monetize some of the work, I may just buy a 5090 when they release. Fingers crossed.
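The "program to adjust fan speed based on GPU temp" idea mentioned above could start from something like this. It is only a sketch: the temperature thresholds and fan percentages are assumptions to tune for your own card, and actually applying the fan speed is left out because that part depends on your motherboard or fan controller. The temperature read uses the standard `nvidia-smi --query-gpu=temperature.gpu` query.

```python
import subprocess


def fan_percent(temp_c, min_temp=40.0, max_temp=80.0, min_fan=30.0, max_fan=100.0):
    """Linear fan curve: min_fan at or below min_temp, max_fan at or above max_temp."""
    if temp_c <= min_temp:
        return min_fan
    if temp_c >= max_temp:
        return max_fan
    frac = (temp_c - min_temp) / (max_temp - min_temp)
    return min_fan + frac * (max_fan - min_fan)


def read_gpu_temps():
    """Return one temperature in Celsius per GPU, as reported by nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=temperature.gpu", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [float(line) for line in out.splitlines() if line.strip()]


# Example of the curve alone (no GPU needed for this part).
for t in (35, 60, 85):
    print(t, fan_percent(t))
```

You would call `read_gpu_temps()` in a loop every few seconds, take the hottest GPU, and feed `fan_percent(...)` to whatever tool controls your case fans.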
I scored a 3080 10gb for 353.00 out the door a few weeks back to replace/alternate with a 3060 12gb.
Hi there. Thanks so much for the comment!
Man! That is an excellent price. Great pick up! Hope it works well for you!
Thank you for putting this information together.
Reading up on some of the cards mentioned here in the context of deep learning, cooling issues seemed to come up a couple of times. People were talking about the Tesla M40, for example, not having enough native cooling to deal with constant loads from deep learning. Have you had any issues from that in your builds? EDIT: I see you have some videos about heat and cooling just after this. I'll take a look.
Hi there. Thanks so much for the positive feedback!
Yep, I was just about to suggest that video. I have not personally had any horrible issues with any of my GPUs overheating, but it certainly can be an issue. If you have any specific questions after the video, feel free to reach out, and I will do my best to help you.
@@TheDataDaddi Thank you. I'll let you know if I have any questions.
Sounds good! @@Sensorium19
Hello, Great video! Another thing, you have not benchmarked the Nvidia RTX 3000 Ada Generation Laptop GPU yet. It will be helpful if you do that also.
Hi there! Thanks for letting me know. I will try to get that added as soon as I can!
this is some serious data science stuff nice work.
Hey there! Thanks so much for the kind words. I really appreciate it. I am hoping to have my website up and running in the next couple months that will basically have this work book available in real time with up to date pricing. Stay tuned!
Amazing. Thanks a lot bro.
If you were to choose between 3060 12GB and 4060TI 16GB, which one would you go for?
Thanks so much for the kind words! So glad this video was helpful. I found it extremely helpful myself in choosing GPUs for my own projects.
Mmmm. Depends on cost, as you may be able to find some great deals for Black Friday, but if you want to get the most performance for your money I would go with the 3060. However, realistically the 4060 Ti is not that much more. Overall though, I am unimpressed with the 4000 series GPUs compared to the 3000 series. If it were me, I would buy the 3060 (unless you really need the larger VRAM) and wait until NVIDIA releases the next series of GPUs, then consider those options. On a personal note, I use the 3060 in my daily driver PC. I use it as a test bed for models I want to scale up to run on my home lab servers. It works really well for that. I have not had any issues loading and working with fairly decent sized models.
Thanks a ton bro@@TheDataDaddi
Sure man! Glad I could help
@@jubayehossainarnob6069
4060 ti 16GB. More memory is better and the first priority
Probably the way I would go as well.@@TekTakes
Would have loved to see the top 2 of each category put in a list together and compared in the spreadsheet. Great video
Hi there. Thanks so much for the comment!
I am actually in the process right now of creating my own website: thedatadaddi.com. One of the first things I am going to put on the website is a real time GPU price to performance dashboard. I will certainly add in the ability to make this kind of comparison. Please stay tuned for progress here.
Awesome! Very helpful!
Thanks so much for the kind words!
4:02 But Sir, LLMs like ChatGPT are only trained on multiple GPUs, right?
Hi there! Thanks for the comment.
For LLMs like GPT-3, the resources required to train or even pre-train such models from scratch are beyond the reach of most individuals and many companies. This is due to the immense computational power and data handling capacities needed. For example, models like GPT-3 were basically trained on most of the internet (Common Crawl, WebText2, etc.), tons of full books, a snapshot of Wikipedia at that time, and more. Training was performed with literally thousands of high-end GPUs across various datacenters. Also, ChatGPT and other similar models are proprietary, so the lack of detailed specifics about their training processes and architecture makes it hard to know exactly what it would take to train something like GPT-3 (or another similar model) from scratch.
What is accessible for most people and organizations is running inference using these models. "Inference" refers to the process of using a pre-trained model to make predictions or generate text in the case of LLMs. The feasibility of running inference smoothly depends largely on the amount of VRAM available, as larger models require more memory to operate efficiently. For instance, smaller versions of LLMs might run on a single GPU with 12 GB of VRAM, while more extensive models might require a GPU setup with significantly more memory.
For those with more robust computing setups, such as advanced home labs or small to medium-sized enterprises, fine-tuning an LLM might be within reach. Fine-tuning involves adjusting a pre-trained model on a new dataset or for a specific task, which typically requires fewer resources than full-scale training from scratch. This process allows users to tailor the model's responses to better fit particular contexts or industry-specific needs without the prohibitive cost of training a new model from the ground up. The following Reddit thread is pretty useful in providing more details here:
www.reddit.com/r/MachineLearning/comments/15uzeld/d_estimating_hardware_for_finetuning_llm/
For fine-tuning, a setup with one or more high-end GPUs, such as the NVIDIA A100 or V100 (or RTX 3090 as I advocate for), would generally suffice. This allows for modifications of large LLMs using varied sizes of data, making it a viable option for enhancing model performance on specialized tasks.
In summary, while training large-scale LLMs from scratch is out of reach for most, leveraging these models through inference or fine-tuning them for specific applications is quite feasible with the right hardware setup. This opens up opportunities for a wide range of applications, from personalized AI assistants to sophisticated data analysis tools, even for smaller organizations or dedicated individuals with the appropriate resources.
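To put rough numbers on the gap between inference and training, here is a back-of-the-envelope sketch. The bytes-per-parameter figures are a commonly quoted rule of thumb that I am assuming here (2 bytes per parameter for fp16 inference weights; about 16 bytes per parameter for full fine-tuning with Adam in mixed precision: fp16 weights and gradients plus fp32 master weights, momentum, and variance). Activations and batch size are ignored, so treat the results as lower bounds. The names are my own.

```python
# Assumed bytes needed per model parameter in each mode (rule of thumb only).
BYTES_PER_PARAM = {
    "inference_fp16": 2,             # fp16 weights only
    "full_finetune_adam_mixed": 16,  # 2 (weights) + 2 (grads) + 12 (fp32 optimizer state)
}


def memory_gb(num_params, mode):
    """Rough memory in GB (1 GB = 1e9 bytes), excluding activations."""
    return num_params * BYTES_PER_PARAM[mode] / 1e9


params = 7_000_000_000  # a hypothetical 7B-parameter model
for mode in BYTES_PER_PARAM:
    print(f"{mode}: ~{memory_gb(params, mode):.0f} GB")
```

That difference is a big part of why inference fits on a single consumer card while full fine-tuning of the same model usually needs several high-VRAM GPUs.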
Excellent content! Congratulations! I had a question: Why didn't you include AMD GPUs? Would that change anything?
Hi there. Thanks so much for the positive feedback!
I have not included AMD GPUs here because, to my knowledge and in my experience, AMD GPUs are not well suited to machine learning. The AMD drivers are really buggy and make using them for machine learning a huge pain. NVIDIA unfortunately is the only manufacturer I trust at this point for GPUs for AI/ML/DL.
That said, as I have time in the next few months, I am going to experiment in depth with an older AMD GPU I recently bought to see what the current limitations of the AMD drivers are. I will report my findings in a future video. Also, if the results are good, I will make another video just like this but include AMD GPUs.
@@TheDataDaddi Yes please!!
Have you heard of the Radxa Fogwise AI SBC? Would be worth reviewing and comparing to these cards
I have never heard of this. Let me check it out!
Wow! This is actually incredible. Really impressive specs and the price is great! This is my favorite by far of the ones you have suggested. Definitely gonna have to make a video here. Really appreciate the suggestions!
@@TheDataDaddi I look forward to it. I'm also considering buying an Orin NX 16GB from SeeedStudio... so would LOVE a comparison of these two :D The Fogwise is half the price or less but would be cool to see them head to head. Thanks!
@@mickeymouseman Gotcha. Definitely, think this would be cool. I will see what I can do here! What is your use case for these if you don't mind me asking?
Thanks for the guidance, man.
Can you help me on this,
4060ti 16 gb is same at price for me as 4070 Super.
Which is the better card for DL, I want to get into AI so I wanted a GPU to get me going
again that would depend on you if ur work does not require 16gigs of vram, then go for the 4070 and it will cut the training times.
Hi there. So glad the video could help!
Congrats on the start of your journey!
If it were me, I would go with the 4060 Ti 16 GB. I almost always go for the GPUs with more VRAM in the context of AI/ML/DL. I think this should suit you better as you grow to larger and larger models.
Good luck! Feel free to reach out if you have any questions along the way.
oh thanks for replying, well in the meantime I bought a much cheaper 12 gb 3060 thats like almost 1/4 the price of a 4060ti. And in the future, I am planning to buy another 12 gb 4070super to make the total vram 24gb, how good is this strategy ?@@TheDataDaddi
You are missing the most important point: tensor core fp16 and bf16 performance. Shader fp16 isn't used anywhere where its performance matters.
Hi there! Thanks for your comment!
I completely agree that tensor core FP16 and particularly BF16 performance are crucial for LLMs and many other deep learning tasks. However, I don’t think we can entirely dismiss the relevance of standard CUDA cores or shader cores in AI workloads.
While tensor cores are optimized for FP16 and BF16 precision, CUDA and shader cores still play a significant role, especially for FP32 tasks, which remain the default in many deep learning models, particularly in computer vision and audio processing.
Moreover, it's important to note that CUDA cores can handle models trained in FP16 or quantized formats. The key difference is in efficiency: tensor cores will perform these tasks faster, but GPUs without tensor cores can still run the models, often at a fraction of the cost. This leads to a trade-off between performance and price.
If top-tier performance is the goal, then GPUs with a high number of tensor cores are the way to go. However, for many users, especially those on a budget or with less performance-critical needs, GPUs with fewer tensor cores (or none) can still offer viable solutions. It's all about balancing your requirements with what’s available.
@@TheDataDaddi Yes you are right, CUDA=shader core fp32 is very relevant especially in training because it's easier to use for lazy quick prototyping/dev work or sanity checking your low-precision result, or the rare (training) case where you actually need the range and precision. However, for tensor core GPUs shader fp16 performance has no effect. There shader fp16 is only for stuff like activation functions and perf could be tens of times lower without having any effect. The point was just that for tensor core GPUs you would want to only list tensor core fp16 flops.
Grabbing a dirt cheap P100 16GB space heater for fp16 memory-demanding workloads could possibly make sense; here shader fp16 would matter. (Even int8 maybe, but I don't know if common frameworks offer dp4a implementations.)
Adding int8 and fp8, even fp6 and fp4, to the comparison for inference purposes would be interesting too since LLMs are now often available in those precisions and fp8 effectively doubles memory size and bandwidth too compared to fp16.
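To put rough numbers on the precision point, here is a minimal sketch of how much memory the weights alone take at different precisions. The 7B parameter count is just a hypothetical example, and the figures ignore activations, KV cache, and framework overhead, so treat them as a lower bound rather than a real measurement.

```python
# Approximate storage per parameter, in bytes, for common weight precisions.
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16": 2.0,
    "bf16": 2.0,
    "fp8": 1.0,
    "int8": 1.0,
    "fp4": 0.5,
}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    params = 7e9  # hypothetical 7B-parameter model, for illustration only
    for precision in ("fp32", "fp16", "fp8", "fp4"):
        print(f"{precision}: {weight_memory_gb(params, precision):.1f} GB")
```

Halving the bytes per parameter halves the weight footprint, which is why the same card can hold roughly twice the model at fp8 compared to fp16.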
Great job!
Thanks so much for your positive feedback!
Useful info. Thanks
Hey there! Thanks so much for the kind words. I am so glad to know that this was useful for you. I am hoping to have my website up and running in the next couple months that will basically have this work book available in real time with up to date pricing. Stay tuned!
Just picked up a 3060 12gb for under $200. Glad to see it stacks up with a p40. It's just unfortunate that the p40 is about the same price with more ram, but I just couldn't have those loud small fans on my PC.
Hi there. Thanks for the comment! Yeah the RTX 3060 is a good choice for sure, and it has 3rd gen tensor cores as an added bonus. Definitely agree that having noisy fans on a small PC is not the best way to go.
Would it be possible to link a 4070 ti super with another card? I'm just starting out but, I was thinking 16gb of vram could start me out and I could add a second card in an enclosure to connect to with my pc running dual gpus or my nas in the future. I can't afford much beyond one gpu right now while building out other systems 😕
Hi there. Thanks so much for the question!
Absolutely, this is really common. GPUs are really expensive so in many cases people (myself included) will buy GPUs incrementally as they have the money.
@@TheDataDaddi Cool, I appreciate your reply! I feel like I'm witnessing the future watching videos like these and I'm both excited and overwhelmed!
Is it fine if a card doesn't have nvlink, or if they're cards from different generations? I'm thinking a 4070 ti super right now for my pc and a 3090 as the "vram/AI" server card (or a 4090 if the price drops enough once the 5090 comes out). Thanks again for your reply and helpful videos!
@@philiphimmelstein9510 Yep! This is one reason I love this area so much. Things change so quickly. It is always exciting.
Yep, that should be fine! NVLink is definitely not a requirement. In fact, I have not actually tested the performance gains (or lack thereof) you actually get from it. I need to do that. lol. I have 3 different GPUs in one machine and don't have any issues. Of course! Happy to help.
@@TheDataDaddi Sweet! That's good news. Thank you 😊
@@philiphimmelstein9510 Of course! Happy to help!
Amazing video! I wonder if you could advise me on my next setup...
I'm a deep learning engineer who typically works in medical computer vision, however, I'm looking to build something for my home office. My budget is 3-4k for the entire setup. Datasets I use tend to be quite large, and 20gb vram is probably a minimum in terms of model size. I've been looking at a 4090 prebuilt because for that budget, I could get a nice spec with the option to do some occasional gaming.
In an ideal world, i'd want more vram! 48gb would be amazing. I wondered about going for a 2x 3090 using nvlink. What do you think about something like this?
Thank you in advance!
Hi there! Thanks so much for reaching out.
So for your budget, I highly recommend going the 2x 3090 route. This is the best configuration I have found so far that balances price and performance for what I would consider mid to upper end setups in terms of compute resources. I actually just bought 2x 3090s and am going to be making some videos exploring their performance with and without NVLink. This will also be good in your case because you are interested in occasional gaming.
Unfortunately, for 48GB VRAM at this stage you are going to have to go the Tesla series GPUs and those break the bank even used. For right now, unless you absolutely have to have 40+ GB of VRAM I think 2x 3090s with NVlink is the way to go for mid to upper range computing projects.
@@TheDataDaddi Thanks so much for your response!
I'm having a tough time clarifying whether 2x3090's with NVlink allow for larger models to be loaded during training/testing...?
I understand data parallelisation and how that is beneficial, but can I actually distribute the model in a way that allows me to experiment with really large models with a 2x3090 setup? I use both tensorflow and pytorch and have seen forum posts going either way with regards to this setup. I would be really interested to hear your thoughts.
So, a dual RTX 3090 setup with NVLink can expand your capacity to experiment with larger models during training and testing, but how much depends on how you use the two GPUs. Typically, in such configurations, the entire model is loaded onto each GPU. The remaining memory is then used for data batches during training, testing, or inference.
With each RTX 3090 having 24GB of VRAM, you should find this setup sufficiently robust for handling fairly large models (pretty much most things up to the mid to large open source LLMs). That said though, it really depends on the size of the models you will be loading. I would recommend loading a typical model you might work with onto the CPU and checking the memory utilization before and after. This should give you a good idea of how large the model actually is. Then you can see a) whether the model in its current state will fit into memory on both GPUs and b) what kind of batch sizes you might be able to work with. Smaller batch sizes, while manageable, could become a limiting factor in your workflow, potentially impacting training efficiency and model performance evaluation. From here you can make a decision as to whether or not you really need more VRAM.
An alternative strategy (especially relevant with NVLink because you have direct memory access between GPUs) involves distributing different parts of a model across the two GPUs, allowing for batch processing to occur sequentially across both units. This method, albeit more complex, can effectively mitigate VRAM limitations by leveraging the combined memory more efficiently. However, it's worth noting that this approach requires careful implementation and might not be suitable for all models or scenarios. It is also important to note that NVLink does not create a single unified memory space accessible by all GPUs in the traditional sense. Put another way, 24GB + 24GB of VRAM does not equal 48GB of VRAM in this case as it would on a single GPU.
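As a rough way to think about the difference between the two approaches, here is a minimal sketch. The function name and strategy labels are my own for illustration, not anything from TensorFlow or PyTorch, and the numbers are optimistic upper bounds that ignore activations, batches, and framework overhead.

```python
def max_model_gb(gpu_vram_gb: list[float], strategy: str) -> float:
    """Upper bound on the model size (in GB) a set of GPUs can hold.

    data_parallel:  every GPU holds a full copy, so the smallest GPU is the limit.
    model_parallel: the model is split across GPUs, so the memory adds up.
    """
    if strategy == "data_parallel":
        return min(gpu_vram_gb)
    if strategy == "model_parallel":
        return sum(gpu_vram_gb)
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    dual_3090 = [24.0, 24.0]
    print("data parallel: ", max_model_gb(dual_3090, "data_parallel"), "GB")
    print("model parallel:", max_model_gb(dual_3090, "model_parallel"), "GB")
```

Even in the model parallel case each individual piece still has to fit on one card, so the combined figure is a ceiling and not a single 48GB pool.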
Frameworks like TensorFlow and PyTorch do support multi-GPU setups and offer varying degrees of support for model parallelism and data parallelism. For TensorFlow, strategies like tf.distribute.MirroredStrategy can be employed for data parallelism, which synchronizes training across the GPUs for each step. PyTorch users can leverage torch.nn.DataParallel or torch.nn.parallel.DistributedDataParallel for similar purposes. For model parallelism, where the model is split across multiple GPUs, PyTorch provides more explicit support through manual implementation, allowing you to define how different parts of the model reside on separate GPUs.
With full transparency, I have not had to use model parallelism thus far so I cannot comment more specifically on how to implement it. It will also vary from use case to use case. I think this would be a great video topic though and certainly something that would be great to know as models continue to get larger and larger. I will try to make a video on this specifically as soon as I get a chance.
Anyway, I apologize for the long winded answer, but I hope that this response is useful to you!
@@dirtdabest
Ah this is a brilliant response! I am going to go for the dual 3090 setup! The potential freedom of having a larger model is what's most important to me on reflection.
Are there any other special considerations for a build like this?
Thank you so much for your response, it's really cleared things up! I will be watching out for the video when it comes!
Glad to hear it! I think that setup will suit you well. They also hold their value well, so if it does not end up being what you need you can easily get most of your money back out.
Only thing I would say about the build is if you are going the server route make sure that you choose one that whichever RTX 3090 you choose will fit in. I might consider an external rig with PCIe extenders. I am doing this myself actually so I will have a video on this soon. You might consider that as well. External rigs are also much easier to keep cool.
So glad I could help and keep me updated throughout your journey!
@@dirtdabest
Is there a big difference in performance and speed in AI tasks like stable diffusion etc. between the RTX 4080 Super and RTX 4090? Which one should I buy, as I seldom play games, or should I wait for the 5090 at the end of the year? I am not a video editor and don't hold any job related to designing or editing, just a casual home user.
Hi there! Thanks so much for the question.
I would say that at this current moment, for stable diffusion related tasks, I would go with the 4080. 16GB VRAM should be enough to comfortably handle pretty much all stable diffusion tasks at this point (to my knowledge) and its performance is 60% of what you get with the 4090 for less than half the price.
All that said, If you are not pressed for time, I would probably wait to see what happens to the market when the new 5000 series GPUs come on market. It could bring the prices down for the 3000 and 4000 series GPUs as people dump their older GPUs in favor of the latest and greatest. This approach is always a gamble though so if you prefer a safe bet I would look for good deals on a 4080 and not worry too much about what will happen down the road.
This is a badass video. Thank you.
Really appreciate the feedback. So glad it helped you.
What do you think about the Titan V? I didn't see it listed.
Yeah I did not cover it unfortunately. I guess I missed it somehow. Anyway, I would say that it is a great GPU. However, the best price I can find even for a used one is about $489.99 atm. This is still about double what you can get the P100 or P40 for at this point and the performance gain is not that much greater for the money (11.7 TFLOPS vs 13.8 TFLOPS for single precision) (4.6 TFLOPS vs 6.9 TFLOPS for double precision). So overall I would still recommend the P100 if you have a need for higher double precision performance or the P40 if you care more about single precision and more VRAM. With that said, I think this is a fine GPU and one of the best you can get in the $500 price range. It also has a newer architecture so it will be relevant for longer. So at the end of the day it really depends on your budget, project needs, and how long you need it to last. Hope this helped! Please let me know how it works for you if you decide to get it!
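For anyone who wants to redo this comparison with their own prices, here is a minimal sketch of the performance per dollar arithmetic. The TFLOPS figures and the $489.99 Titan V price come from the reply above; the $245 price for the older card is my assumption of "about half", and I am assuming the 11.7 TFLOPS figure is the P40's single precision number.

```python
def gflops_per_dollar(tflops: float, price_usd: float) -> float:
    """Throughput per dollar, in GFLOPS per USD."""
    return tflops * 1000.0 / price_usd


if __name__ == "__main__":
    # (name, single precision TFLOPS, price in USD)
    cards = [
        ("Titan V", 13.8, 489.99),  # price quoted in the reply above
        ("P40", 11.7, 245.00),      # assumed: roughly half the Titan V price
    ]
    for name, tflops, price in cards:
        value = gflops_per_dollar(tflops, price)
        print(f"{name}: {value:.1f} GFLOPS per dollar")
```

Under those assumptions the older card comes out well ahead on raw FP32 throughput per dollar, which is the reasoning behind the recommendation. It says nothing about tensor cores or architecture age, so it is only one input to the decision.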
For stable diffusion, which is the best price/performance?
Hi there. Thanks for the question. What is your price range for GPUs? The best price/performance overall is going to be the p40 and/or p100. But if you care more about performance and willing to spend a bit more I could recommend something more performant. Hope this helps!
great video and information, really helpful.. thanks for the hard work
Hi there. Of course! So glad to hear that it helped you!
Thank you for your hard work.
Was p40 or p100 the better choice? ALso were you using nvlink?
Hi there. Thanks so much for the kind words and the comment!
It really depends on the use case, but overall for most people I would say that the P40 is actually the better choice. The only GPUs I currently have that support NVlink are my RTX 3090s. Yes, I have them connected via NVLink.
I like to throw in a perspective, currently im using an GTX 960 2gb vram, 7 gb ddr 2 ram @ 600mhz, I also have shared video ram or what it was called which ups my "vram" up to around 5gb (can be checked in task manager for example), AMD tripple core @ 2.1ghz, im fine tuning the smallest gpt2 model via CUDA at around 1 iteration per ~30 seconds and it uses around 3.5gb of my "vram".
Hi there. Thank you so much for the comment.
Love when people share specifics like this. Incredibly helpful for the community to understand the performance of real systems. This actually is very impressive. Also, shows that you don't need the latest and greatest hardware for every use case.
Where do you buy the P100? and P40s?
Hi there. Thanks so much for your question. I normally buy most of my tech gear from eBay. I try to find refurbished used gear at a good price from a reliable seller. Hasn't failed me yet. I highly recommend this approach because new hardware in this area is incredibly expensive.
wondering if we can pair a 4060 TI OC 16gb with a data center card for more ai performance will ai be able to use both cards at once? wondering cause i already have the 4060 atm
Hi there. Thanks so much for the question!
Yep, you should be able to do that! It is a bit unconventional, but it shouldn’t cause any major issues!
I made a spreadsheet too, but yours is thorough! I'd also come to similar conclusions as you: that the P40 / P100 were cheap ways to get medium size LLM models into a GPU with decent tokens/second. Your spreadsheet would have saved me time if I'd known about it! At least there's some independent confirmation of your conclusions. There's a lot of detail to add, like how fast/slow models are on certain GPUs ... perhaps another vid on that to save me the effort? :P
Hi there. Thanks so much for the comment!
I am glad to hear you can confirm! Especially as hardware prices keep increasing, I think these are actually becoming even more relevant for those who are budget conscious.
Funny you mention this. I am actually working right now on a benchmarking suite to enable reliable comparison between GPUs for different models. There is not a reliable open source benchmarking solution for GPUs so I am trying to create one (or make steps toward it at least). As soon as I get something decent, I will make a video series on it and start using it to benchmark GPUs in a real way with respect to individual models.
I'm torn in choosing a GPU for AI use first (koboldcpp + sillytavern) and gaming second. My choices were a 3060 12gb at first, then the 4060 ti 16gb stood out, but then the 4070 ti super got recommended to me. I intend to use the card for at least 3-5 years. The only thing limiting me is my small budget. Like I could buy the 3060 now and the 4060 ti after a few weeks, while I wait and watch out for deals on the 4070...
Hi there. Thanks for the comment!
I am not super familiar with KoboldCpp or SillyTavern so please take what I say with several grains of salt, but from my brief research they need an AI model integrated in some way. I am assuming you want to host this locally. For this I would go with the GPU with the largest VRAM, so the 4060 Ti 16GB is the clear choice in my book.
@@TheDataDaddi yeah. Basically locally hosting an AI model to my pc. I'm not really into machine learning as of yet. Currently I have a 2060 on my pc and 6 GB isn't really enough. Another question is AMD not a good alternative?
@@rukitorin1998 It can be, but AMD is generally not as easy to use for machine learning. However, it may be applicable for your use case. AMD definitely seems to offer better price to performance, but there are still a lot of bugs from what I understand. I also cannot strongly recommend AMD because I have not personally dabbled there. What I am telling you now is just based on feedback I have received from viewers.
@@TheDataDaddi thanks for responding, sorry for the late reply. :D I will be buying the 4060 Ti 16GB soon when I find a cheaper price point. Living in the Philippines, prices are somewhat higher. 30k PHP ($519.08) to 32k PHP ($553.68) are the prices I'm looking at right now.
I think you missed out on some parts: actual performance per model, and whether one can use fp32, fp16, int8 or tensor cores. The P40 is a terrible option for any AI workload due to the amount of time one would have to wait... and its power requirement.
Hi there. Thanks so much for your comment!
I would agree that for anything below fp32 operations these GPUs would be quite slow. However, the GPU is less than $200 for 24GB of VRAM. So, if you are wanting to experiment with larger models cheaply, I think these GPUs still have good value.
Yup, just got a couple of P40s for a ML350P... after investigating the NVIDIA site. Slow, yes, but for cheap and something that can run on MS Win2012 Enterprise, it's the ticket. (old mining ETH machine) It will run in the garage without air-conditioning.... Slow but steady.
Hi there. Thanks so much for the comment!
Yep. These are still a great option in my mind. Slow and steady wins the race as they say. Lol
can we reach out to you for Universities AI requirement ?
Absolutely. Please feel free to contact me any way you like. All of my contact information can be found in my YouTube bio. I will also paste it below for convenience.
🐦 X (Formerly Twitter): @TheDataDaddi
📧 Email: skingutube22@gmail.com
💬 Discord: discord.gg/RyRHEn3yMx
Thanks for the work man ! Love from France
Hi there. Thanks so much for the comment. So glad you enjoyed the video!
good video. good thing you did not pick maxwell. plus side: lots of vram. minus side: everything else. it's really slow, xformers doesn't work (you have to compile it yourself to get it to work), you have to run most things in fp16 mode, and there is no adamx support if you want to do training.
i bought the card 5 years ago but ditched it the moment the p40 dropped to 200 dollars.
sadly the volta cards are still 700+ so im still stuck with the p40 but it still does everything i need it to do.
Hi there. Thanks so much for the comment!
Yeah, the Maxwell GPUs could still be helpful to some that are just getting into machine learning and deep learning and are only looking to run smaller models. Also, as you pointed out, it does have its own set of issues. Overall though, I would recommend in most cases to just start with a P40 or P100.
Yep, I think that is the situation for most people. Myself included. Volta GPUs are the logical next step, but the price is still a bit too high to use them in large quantities. Unfortunately, I am not sure if this will change in the immediate future either. The Volta series GPUs and higher have tensor cores and good mixed precision performance so they are still in demand for a lot of businesses. I do not see this trend changing in the near future.
3080 ti 2nd or 4060 ti new ? which the best for ML or DL
Hi there. Thanks so much for the question!
This is tough. I think you really can't go wrong either way. Both are solid choices for ML/DL GPUs. However, for about the same price (I checked just now and they seem to be about the same price on eBay in my area) I think I would go with the RTX 3080 Ti. You get much better performance and the difference in VRAM from 12 to 16 GB is not significant enough to justify the performance difference.
Thanks!!!
The effort you put into this video is just mind-blowing. I subscribed immediately 😁
Hi there. So glad you enjoyed this video and thank you very much for subscribing! I will do my best to continue to make great content for you.
Is there a current laptop with more than 8 GB vram that is recommended and does not cost 3000?
Is it better to wait for new processors or graphics cards that incorporate new chips specially built for AI models?
Off the top of my head, I know that the 4090 mobile GPU has 16GB of VRAM. I do not know which laptops come with it as standard. This link may help you.
medium.com/@ibrahimcreative172/top-10-laptops-for-deep-learning-machine-learning-and-data-science-in-2023-f8a6ba861c4f
I think it depends. For example, if you could wait until 2025 when the RTX 5000 series comes out, that might be worth it as they will hopefully fix some of the shortcomings of the 4000 series.
However, they will be super expensive when they first come out. So I normally prefer to go for older GPUs. I feel that they have a much better value. Long winded way of saying I would probably not wait and try to find value in what exists currently.
@@TheDataDaddi Yes, I think I'm going to wait for new laptops to come out with better video cards and maybe better processors with NPU which seems to be coming strong
Now I have a gaming laptop even though I don't play, with an rtx3050 ti but only with 4 vram. When I bought it in 2021, I didn't know that I was going to need more vram for Stable Diffusion. We'll see how things develop. Thank you so much
I did some more research last night in the area, and I think this might be a good option for the mobile/laptop route. It seems NPU technology stands to make AI/ML/DL much more viable on laptops. @@RSV9
Yeah it would definitely be difficult to do much with stable diffusion on 4GB of VRAM. It might be worth upgrading to a GPU with 8 or 12 GB of VRAM while you wait for a better new laptop. You can find some pretty good deals on eBay if you are patient. @@RSV9
Personally I love the P40 at the moment, however BE AWARE that the Pascal cards do NOT offer NVLink; for that you have to go to the Ampere cards. That said, you can operate the P40 in x8 PCIe mode without significant loss in performance. Not ideal, but if you have a consumer motherboard and are trying to get one more GPU in there, this one might not be a bad choice for a Gen 4 board where the increased PCIe bus speeds more than make up for the lack of full x16 access for this Gen 3 card.
Hi there! Thanks so much for the comment.
Yeah unfortunately they do not seem to. Although interestingly they have a cut out on one side that looked like it was made for something NVLink related. However, I have never been able to find anything that would link them together.
Yeah that is true. In most cases for consumer mobos you will be working with x8 pcie not the full x16 unfortunately. Like you said though for most it shouldn't make that much of a difference.
@@TheDataDaddi and if you are looking to water cool there will be challenges for the P40 as well. The PCB cutout is the same as the 1080 but with the rear plug on the P40 you will likely have to do some modding on a 1080 water block to make it work. The 40x40x28 15000rpm fans I have on there scream when at 100% so be prepared to write some custom code to control the fan speeds. If you are interested I'll push mine to GitHub and send you the link.
It seems the Chinese suppliers have noticed the higher buy rate and have inflated the prices accordingly. Now P40s and P100s are at $300, so it makes more sense just to buy a 4060 Ti 16GB new.
Yep, I have noticed this trend as well. Demand is so high that even older hardware is now selling at a premium. I agree with your assessment. The 4060 Ti is probably a better choice now.
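For anyone wondering what that kind of fan control code might look like, here is a minimal sketch of just the fan curve part. This is not the commenter's code. The breakpoints are hypothetical values picked for illustration, and reading the GPU temperature and actually writing the fan duty are left out because they depend entirely on your fan controller and motherboard.

```python
from bisect import bisect_right

# (temperature in Celsius, fan duty in percent) breakpoints.
# Hypothetical values for illustration; tune them for your own card and fans.
FAN_CURVE = [(40, 20), (60, 40), (75, 70), (85, 100)]


def fan_duty(temp_c: float, curve=FAN_CURVE) -> float:
    """Linearly interpolate the fan duty (percent) for a given temperature."""
    temps = [t for t, _ in curve]
    if temp_c <= temps[0]:
        return float(curve[0][1])   # clamp to the quietest setting
    if temp_c >= temps[-1]:
        return float(curve[-1][1])  # clamp to full speed
    i = bisect_right(temps, temp_c)
    (t0, d0), (t1, d1) = curve[i - 1], curve[i]
    return d0 + (d1 - d0) * (temp_c - t0) / (t1 - t0)


if __name__ == "__main__":
    for temp in (35, 50, 70, 90):
        print(f"{temp} C -> {fan_duty(temp):.0f}% duty")
```

A curve like this keeps the fans quiet at idle and only ramps them to 100% near the top of the temperature range, instead of running them flat out all the time.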
Great video^^, getting tired of seeing those gpu comparison video where all they think are just about gaming.
it pops up right in time when i'm thinking to build a new pc. I was thinking about buying the 4060 ti with its 16gigs vram to help me with my thesis research that I assume the 16gigs would be really helpful for the ML/DL(used to have 1660 with 6gigs vram and its horrendous XD) but also pretty good enough for my daily use such as streaming and editing. totally in a tight budget that i needed to squeeze a bit more to get that 470$(the price in my country rn) card or should i just wait for the rtx 50 series to come out later hoping the older gen price drop?
Hey there. Thanks so much for the kind words and the question!
I think the 4060 Ti is a solid choice in general for a budget constrained build for a master's thesis project. Of course it depends on your exact use case, but barring working with larger LLMs I think this should be a great choice.
As far as waiting for the RTX 5000 series GPUs to come out, I would not really hold my breath for a huge price drop. I think even once the RTX 5000 series does come out it will take a while for the prices of older GPUs to be substantially affected. If it was me, I wouldn't wait. I would just go ahead and buy.
Best of luck with your project! Hope this helps!
So would this mean I can't use a 7900XT to make AI meme pictures? I've actually been interested in the whole AI thing, even though I'm not a smart dude on tech. (I find it cool just because I can use my computer for something other than just gaming/streaming/video editing, but I'm going to try a resist a little against our developing AI overlords lol)
I know the tech is still developing, but I thought it would be cool to use AI to create a Vtuber model to stream with. (even if it came out bad, I thought it would be a fun little experiment to do for some views and laughs) However one of my hardest parts to upgrade, in my mind, was a GPU. I know AMD is a step or two behind Nvidia (My last card and current card is a 1070) but when it comes to price, it's hard to beat. I just didn't know if something like a 7900XT or -XTX would at least make up for it vs a 4070 TI Super in terms of AI generation. (I still have no idea what app to use to even make use of my GPU to even make stuff with AI)
Alright, enough rambling with the thoughts in my brain, I'll keep watching 👌
Hi there! Thanks so much for the comment.
So, my take here is that NVIDIA GPUs are going to be much easier to work with at this stage. I have heard from some of my viewers that AMD GPUs can and do work. It is just a lot more of a pain to work around bugs and the learning curve is steeper. NVIDIA is more or less plug and play when it comes to AI/ML/DL, but that is also why you pay a premium. I guess what I would say is if you want the easier route or don't have time to do much troubleshooting, NVIDIA might be a better way to go. However, from the sounds of it you are more partial to AMD GPUs and have many other workloads besides just AI. In your case, it may be a better idea to go with AMD GPUs because you will get better price for performance for all of your other workloads, and then deal with the pain of setting up your AMD GPUs for your specific AI use case.
The 7900XT is definitely a powerful card and can handle AI tasks, though you might need to use specific software or frameworks that support AMD GPUs, like ROCm. Creating a VTuber model sounds like an interesting project! I would recommend maybe starting with programs like DeepFaceLab for deepfake-style video or some stable diffusion flavors to generate images as a starting point. For generation, tools like Blender for 3D modeling could be helpful and for real-time animation you might could use VMagicMirror or VSeeFace which can utilize your GPU to bring your VTuber model to life.
Hope this helps!
I'd love to know your opinion of the modified Nvidia p102-100's with 10 gb of vram being sold for about $50-$60 on ebay since they have no display outputs. They are basically 1080 ti's with a bit of performance nerfing. they have no display outputs, but seem like they'd be ideal to plop into a system with an existing AMD Gpu just to give Cuda acceleration. or perhaps multiple cards?
I responded to you on the other comment, but in case you see this one first. Short answer. I think this would be a great way to go provided they work!
Sir which do you think I should take rtx 3060 12gb, rtx 3070 or rtx 4060 my budget only suits this gpu
Hi there. Thanks for your question!
I would definitely go with the RTX 3060 in this case. The extra 4GB of VRAM will make a ton of difference.
Best of luck!
Modern AI needs heaps of video ram. You shouldn't bother looking at any without 24gb
Hi there. I definitely agree. I think 24gb is a good happy medium between price and performance.
What’s your thoughts on Apple silicon, for example, would 32GB of unified memory on an M2 Max, be an equivalent of 24GB in a dGPU (and not considering a CUDA advantage)?
Hi there. Thanks so much for your question.
I actually had someone ask a similar question yesterday. Let's dive in.
Apple's silicon, particularly the M2 Max, represents a significant shift in computing architecture. The concept of unified memory in Apple's design is quite innovative. Unified memory essentially allows the CPU and GPU to share the same memory pool, which can lead to more efficient use of resources.
Regarding your question about the equivalence of 32GB of unified memory to 24GB in a discrete GPU setup, it's not a straightforward comparison. In traditional setups, the CPU and GPU have separate memory pools, and data needs to be transferred between them, which can create a bottleneck. With Apple's unified memory, this bottleneck is reduced, as both the CPU and GPU can access the same memory pool directly. This can make the system more efficient, potentially allowing 32GB of unified memory to perform comparably or even outperform a 24GB discrete GPU setup in certain scenarios.
However, this doesn't mean it's superior in all aspects. For example, tasks heavily reliant on GPU performance, especially those optimized for CUDA (a parallel computing platform and API model created by Nvidia), might still perform better on a traditional discrete GPU setup. This is because CUDA has been around for a longer time and is extensively optimized for specific professional and scientific applications.
So to sum everything up, while 32GB of unified memory on an M2 Max might offer comparable performance to a 24GB dGPU in many use cases, the actual performance can vary depending on the specific applications and workloads.
@@rayf3244
@@rayf3244 Machine Learning rests on linear algebra. Massively parallel matrix algebra is what video cards do. (until we see neuromorphic chips in wide production, cheap and actually successful. Could just be a pipe dream (rnn's). The RTX 3090 with NVlink is the best bang for the buck and the old Tesla cards are the cheapest entry. 2-3090's give you 48GB and let you play up to llama 2 78b. No I can't afford two 3090's either let alone a couple of last generation a6000s. The Tesla P100's have 16GB and NVlink for a total or 32GB for less dollars. When PCI-e 5.0 boards come out NVlink won't be necessary (We are told) but until then we are pretty limited as hobbyists. NVidia is really the only game in town and they are focused on the Enterprise not us. AMD and Intel haven't invested in AI at this level and Apple isn't even in the game.
@@TheDataDaddi did you use ChatGPT to answer this?
Hi, I am new to the idea of learning about ML/AI. I appreciate your video and am contemplating piecing together a budget friendly system to start learning with. In the past, I was able to purchase some used crypto mining rigs from a person that was getting out of crypto mining. I parted most of the systems out and made a profit, but I kept a couple crazy 8 GPU motherboards.
My first question: are there any restrictions that would prevent one from using multiple GPUs (more than the 2 that seem more common)? My second question: is there a certain GPU that would make sense from a budget standpoint, where having many of them would be more beneficial than one or two standard GPUs? I would think a system running 8 x Tesla M40 with a total of 96GB of VRAM would be better than a system running 1 or even 2 3060s with 12GB or 24GB of VRAM.
I look forward to hearing your response if you find the time to respond, I appreciate your time in advance!
Hi there. Thanks so much for the great question. So glad that you have found this video helpful!
QUESTION 1
There are a couple considerations here:
1. Unless you shard or split the model itself, you are going to be limited to the smallest memory size among your available GPUs when loading and training models. For example, 4 Tesla P40 GPUs would give you 4 x 24GB = 96GB of VRAM total, but in many cases people end up using data parallelism by default, and then the full model must be loaded onto each GPU. This speeds up training because you can process more batches in parallel across all GPUs, but it does not allow you to load larger models.
There are also many ways to do model parallelism, in which you break parts of your model across different GPUs and process the data in a pipeline-like fashion. I have never actually had to do this, so I cannot really get into particulars here, but it is definitely possible. From what I understand, though, this is much more involved because it requires you to logically partition your model in a way that makes sense and assign layers or segments of it to the various available GPUs.
All of this is a long-winded way of saying that if your model is not set up to be split across GPUs, you may be limited in terms of the size of model you are able to load and use, even if you have many GPUs.
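As a rough illustration of the difference between the two approaches, here is a minimal sketch in plain Python. It is not a real training setup: the function names are hypothetical, the layer sizes are made-up numbers, and it ignores gradients, optimizer state, and activations. It only shows that data parallelism is capped by the smallest card, while a simple pipeline-style split assigns consecutive layers to GPUs until each one is full.

```python
def max_model_gb_data_parallel(gpu_vram_gb: list[float]) -> float:
    """Every GPU holds a full copy of the model, so the smallest card is the limit."""
    return min(gpu_vram_gb)


def max_model_gb_model_parallel(gpu_vram_gb: list[float]) -> float:
    """The model is split across GPUs, so the (idealized) limit is the total."""
    return sum(gpu_vram_gb)


def assign_layers(layer_sizes_gb: list[float], gpu_vram_gb: list[float]) -> list[int]:
    """Greedy pipeline-style split: fill GPU 0 with consecutive layers, then GPU 1, etc.

    Returns the GPU index chosen for each layer, in order.
    """
    placement = []
    gpu = 0
    used = 0.0
    for size in layer_sizes_gb:
        # Move on to the next GPU while this layer does not fit on the current one.
        while gpu < len(gpu_vram_gb) and used + size > gpu_vram_gb[gpu]:
            gpu += 1
            used = 0.0
        if gpu == len(gpu_vram_gb):
            raise ValueError("model does not fit on the available GPUs")
        placement.append(gpu)
        used += size
    return placement


if __name__ == "__main__":
    p40s = [24.0, 24.0, 24.0, 24.0]             # four 24GB cards
    print(max_model_gb_data_parallel(p40s))     # limit with full replicas
    print(max_model_gb_model_parallel(p40s))    # idealized limit when split
    print(assign_layers([10.0] * 6, p40s))      # six hypothetical 10GB layers
```

In practice a framework handles the placement and the data movement between stages, but the capacity arithmetic is the same idea.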
2. Every motherboard has a maximum number of PCIe lanes that it can support, so in some cases, even if a motherboard has slots for 8 GPUs, it may not support them all at the full x16 lanes required for each GPU to utilize its full bandwidth.
This is okay in many cases because the GPUs will still work, but it will limit performance and may cause a bottleneck if you have a ton of data IO.
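To put rough numbers on the lane point, here is a small sketch. The per-lane figure is the standard PCIe 3.0 rate (8 GT/s with 128b/130b encoding, about 0.985 GB/s per lane in each direction). The 40-lane budget in the example is only an assumed figure for illustration, the function names are hypothetical, and real boards may use PCIe switches or fixed slot wiring that this simple even split ignores.

```python
# PCIe 3.0: 8 GT/s per lane with 128b/130b encoding, converted to GB/s.
PCIE3_GB_PER_S_PER_LANE = 8 * (128 / 130) / 8

# Standard link widths, widest first.
LINK_WIDTHS = (16, 8, 4, 2, 1)


def link_width(total_lanes: int, num_gpus: int) -> int:
    """Widest standard link each GPU can get if lanes are shared evenly."""
    per_gpu = total_lanes // num_gpus
    for width in LINK_WIDTHS:
        if width <= per_gpu:
            return width
    raise ValueError("not enough lanes for every GPU")


def bandwidth_gb_per_s(width: int) -> float:
    """Theoretical one-direction PCIe 3.0 bandwidth for a link of this width."""
    return width * PCIE3_GB_PER_S_PER_LANE


if __name__ == "__main__":
    for gpus in (2, 4, 8):
        width = link_width(total_lanes=40, num_gpus=gpus)  # 40 lanes is an assumption
        print(f"{gpus} GPUs: x{width} each, about {bandwidth_gb_per_s(width):.2f} GB/s per GPU")
```

With an assumed 40 lanes, eight GPUs would drop to x4 links, roughly a quarter of the x16 bandwidth. That matters most when a lot of data moves between host and GPU and much less once the data is already on the card.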
3. GPU form factor. This is a lot more important than I originally thought. Each GPU manufacturer may have slightly different dimensions and specs. For example, the RTX 3090 Founders Edition is physically quite different from the Zotac RTX 3090 I bought. The Founders Edition might be able to fit in one of my servers, but the Zotac card is much larger and would not. While this is not a major consideration, you definitely do need to confirm that the GPU will fit in or on whatever chassis/mobo you are thinking about working with.
QUESTION 2
This question is also a bit murky to answer because it depends on what you will eventually be doing and the size of the models you will be working with. That said, I will recommend what I would do.
If you want to fill up one of your 8 GPU mobos, I would suggest 8 Tesla P40 GPUs. They are ~$200 apiece and great in terms of price to performance. Having this many GPUs also allows you to use an immense amount of data parallelism to train and test faster for models that will fit inside 24GB of VRAM. It also gives you the ability to split very large models across all 8 of your GPUs. This theoretically would be enough to earnestly start playing around with some of the largest open source LLMs currently available. In addition, you can section off subsets of your 8 GPUs for different models or for training/testing different versions of your models all at once. Personally, I find the last point invaluable in my research. Finally, these GPUs are cheap enough to add slowly over time. You can buy them one at a time as you have the funds rather than having to shell out thousands for a single GPU.
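One common way to section off subsets of GPUs for separate jobs is the `CUDA_VISIBLE_DEVICES` environment variable, which restricts which devices a CUDA process can see. Below is a minimal sketch that just builds those settings for an 8 GPU box. The helper names are hypothetical, and the `train.py` script in the comment is a placeholder, not something from the video.

```python
def gpu_groups(num_gpus: int, group_size: int) -> list[list[int]]:
    """Split GPU indices 0..num_gpus-1 into consecutive groups of group_size."""
    ids = list(range(num_gpus))
    return [ids[i:i + group_size] for i in range(0, num_gpus, group_size)]


def visible_devices_env(group: list[int]) -> dict[str, str]:
    """Environment setting that exposes only the GPUs in this group to a process."""
    return {"CUDA_VISIBLE_DEVICES": ",".join(str(i) for i in group)}


if __name__ == "__main__":
    # Four independent experiments, each on its own pair of GPUs.
    for group in gpu_groups(num_gpus=8, group_size=2):
        env = visible_devices_env(group)
        print(env)
        # To launch a job with this setting (train.py is a hypothetical script):
        #   subprocess.Popen(["python", "train.py"], env={**os.environ, **env})
```

Inside each process the visible GPUs are renumbered from 0, so the training code does not need to know which physical cards it was given.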
If you value performance over the flexibility mentioned above and have a bit more cash available, I would probably go with 2 RTX 3090s with NVLink. I have not been able to test this setup, but I think it should be excellent in terms of performance. You could also use the same idea here by adding RTX 3090s over time as you have the funds.
In summary, if you want a scalable mid-to-high-end rig where performance is your main concern, I would go with the dual RTX 3090s with NVLink. If you value flexibility and have less cash to invest upfront, I think the P40 route is a great way to go.
Very excited to hear that you are starting your journey, and I am glad I can help you along the way. I hope this helps, and if you have any other questions along the way please feel free to reach out!
Excellent work!
Hi there. Thanks so much for the comment! Really appreciate your positive feedback!
Hi, I noticed that the Tesla V100 is around $1400. Where did you find the one for $670?
Hi there. All of my pricing is pulled directly from eBay. I normally try to find the lowest-cost reputable seller. GPU prices have been skyrocketing recently, so that deal probably no longer exists, unfortunately.
Which one is better, the RTX 3060 12GB or the 3060 Ti?
Hi there. Thanks so much for the comment!
It really depends on your use case. If you plan on expanding to larger models, like some of the diffusion-related models, or want to use larger batch sizes, then 12GB of VRAM would be helpful. However, for most conventional deep learning models the 8GB on the 3060 Ti should be fine, and you will get faster training and inference speeds. Personally, though, I would probably go with the 3060 with 12GB. I nearly always default to the GPU with more VRAM even if the performance is slightly worse. I would rather be able to load the model and have training and/or inference be slower than get out-of-memory errors and not be able to load a model I want to work with.
For my LLM hobby the 4070 is decent enough, wish Nvidia had a 16GB variant.
That would definitely be nice. Maybe with the 5000 series GPUs there will be something that fits that bill.
Thank you so much for the info, but I am so conflicted. The information in this video is great, but the pacing gives me severe anxiety. Nothing against you, it's my own issue, so this comment is not a complaint; it is to help others who possess an overactive brain and 0 attention span. Watch the video at 1.75 speed while reading through the additional resources. This will cut the play time down to 23.1 minutes, which I admit is still a very long time based on the amount of information, but it is good information, so it's worth it! This should keep your brain from losing interest during the loooong pauses. Again, great content.
Hey there! Thanks so much for the comment. I really appreciate the honest feedback. I have been trying to get better about being more direct and to the point in my videos. I know that some of them are unnecessarily long. My assumption is that most people would likely watch at faster speeds or hop around rather than listen to me drone on and on. I do agree, though, that I need to do a better job of keeping the audience engaged and make shorter videos with fewer long pauses. I appreciate the candid feedback, and I will work to do better on this in the future. Glad you at least thought the content itself was good though!
@@TheDataDaddi thanks for the response! believe me there was no hate, keep up the good work!
@@Travis-jl3wx Of course man. I did not take it that way at all. I really appreciate the honest feedback. Something I just got to work on.
Hey man, do you know of any online resource on the NVIDIA T40? Planning on using it to serve LLMs, but I see nothing about it online.
Hey there. So I was not able to find anything official from Nvidia, but I did find this online:
www.techpowerup.com/gpu-specs/tesla-t40-24-gb.c3942
Might not be as reliable as actual documentation from Nvidia though so I would take it with a grain of salt.
From what I can tell, though, it's a solid card but a bit expensive for the performance. I would compare it to the P40 before going that route to see if the price justifies the performance gains for your use case.
@@TheDataDaddi I didn't buy the T40 due to the lack of documentation. Can't even find the driver for it online. I ended up buying P100s as the P40 can't do gpu-only inference which is what I need.
Gotcha. Yeah, I think that is a solid choice for sure. Sucks the documentation is so sparse for the T40 though. @@wood6454
What would you recommend for LLM inferencing: one RTX 4070 Ti Super or two RTX 4060 Ti 16GB? I know there are a lot of things to consider.
Hi there. Thanks so much for the question!
So, in terms of theoretical performance the 4070 ti super is about twice as performant. Since you are just interested in inferencing, I would say that in this case having the better performance of the 4070 will benefit you more.
@@TheDataDaddi thanks a lot for your answer. I think it's worth the better performance, since I don't get any performance benefit from having two RTX 4060 Ti besides more memory.
In case I do some training at some point, I can still rent some hardware.
In my case I want to have a local LLM for work, where I cannot use anything connected to the internet. So I would use it primarily for inference, and if I need to optimize the model, this should only occur once or twice (hopefully).
@@pixelslayertv7140 Sure! Yeah unfortunately since the memory pools for both 4060s would be separate you really don't get much benefit even having more total VRAM. You may be able to get away with fine tuning some of the smaller open source LLMs especially if you look into quantization. However, my gut tells me you will have a hard time doing much beyond that with 16GB VRAM. Like you said though you could always rent for the few times you do need access to more VRAM.
sweet video!
Hi there. Thanks for the comment. So glad you enjoyed it!
Hi there. Can you suggest a midrange or budget CPU and GPU for an AI/ML/DL PC?
Sure! What all are you looking to do specifically in the ai/ml/dl space?
I'm studying artificial intelligence and machine learning engineering in India.
Are the Intel i5 14600K and RTX 4060 Ti 16GB/8GB good? If not, please suggest some 😊. My brother is also studying machine learning and deep learning. Can you suggest some laptops for him? I need a PC and he wants a laptop.
Got it. The i5 is a solid choice and so is the 4060 Ti 16GB. I would not get a GPU with less than 12GB these days for AI/ML/DL. That would be my minimum.
Also, for AI/ML/DL I would really not recommend a laptop. I think having a server or workstation that you SSH into or access remotely from a laptop is fine. However, I do realize that as a student a laptop makes a lot more sense.
I will try to find a good solution for your brother. My first thought is one of the Mac M3 Pros. These are really expensive, though, and I really can't speak to how good they are from a machine learning perspective. I have an M1, and it has actually been pretty good. It has definitely been buggy from time to time for machine learning, though.
I actually just found this article. Tell your brother to check it out. It has 10 options that vary in price. He should be able to find something here that fits his budget. If it were me, I would probably start by checking out the Mac M3 Pro first though. As far as laptops go, I have always been impressed with Macs minus the price tag.
medium.com/@ibrahimcreative172/top-10-laptops-for-deep-learning-machine-learning-and-data-science-in-2023-f8a6ba861c4f
Hope this helps!
@@maamla_boy5208
Thank you so much.
Appreciate your efforts
Sure. Glad to help! @@maamla_boy5208
All things considered, 2 x RTX 4060 Ti 16GB is the best investment if you run an AI load which can utilize both GPUs. 8704 modern CUDA cores and 32GB are better specs than a 4070 Ti 16GB. Memory is key here. The M40s are way too expensive; even the used ones are priced insanely high.
Hey there. Thanks so much for the comment. I think the 4060 Ti is a great choice. However, all of these choices at the end of the day depend on current GPU prices. In general though, I would definitely agree with your assessment!
What about AMD Radeon cards? They have much more VRAM and power for their price. CUDA is no longer a big deal after the ROCm and HIP-RT features from AMD.
It's funny how regularly I hear "oh don't bother with the Pascal cards, just buy an Ada Lovelace, just sink $1,200 on just the GPU, don't waste your time" every time I point out the two Pascal Tesla cards are the best bang for your buck right now, assuming you can cool them.
Minor erratum, btw: the Maxwell Tesla cards don't actually supply their complete VRAM capacity and CUDA core count in 1 addressable device, but rather as 2-4 smaller GPUs with 4-8GB of VRAM each, and most DL/SD/LLM applications don't scale linearly with CUDA cores spread across devices. Worse, many apps don't support anything older than Pascal anyway, meaning the effective usefulness gap will be even larger than your scoring system lets on.
Hey there! Thanks so much for the great information.
Yeah, it is amazing to me how often people overlook the Pascals. I understand that they are old and definitely far from the state of the art, but they are still a great value for tons of compute. You can't argue with 16GB or 24GB of VRAM for ~$200. Honestly, for most people starting out, they are more than enough for most use cases anyway. I use both the P40 and P100 almost every day in my PhD research, and they are great for the money.
Ah okay. I was reading something about that a while back. That definitely makes sense. Thanks so much for sharing. Great info for anyone considering Maxwell series GPUs.
@@TheDataDaddi Appreciate your content by the way! Been binge watching your channel, there's a lot here for me to learn from, been a huge help in figuring out how to scale up from my little workstation rig in the future. (T7910, 2x E5-2699 v4s, 128gb of ECC DDR4, P40 w/ cooling duct in the overhead slot, a v5900 as a display adapter, 4x1tb NVMe drives in a Hyper M.2 carrier)
So glad you have been taking a lot from it. I primarily made this channel because of how hard it was for me to learn this stuff. There are not many channels that focus on how to cost effectively create home labs or work stations specifically for machine learning. Most that do focus on things that are out of the price range of most normal individuals. Really glad to hear that it has been helping you.
Great setup, by the way! Wonderful place to start. Keep me updated on how the scaling goes! I'd be curious to see what you do.
@@KiraSlith
@@TheDataDaddi Current plan is to move the P40 down to the main chamber, add a second P40 below it, and cool the P40s by pulling air through the back via a high CFM 120mm fan, rather than pushing through the front. I'd have to move the v5900 up to the top CPU1 slots though. I know you couldn't attach GPUs to CPU1 with the old T7800, but I've never tried with the T7910. The NVMe drives will be just fine though, thankfully.
Just looked at the layout for the T7910. Seems like a good plan and that looks like it should work from what I can tell. Definitely agree adding a fan to pull through the back is a good call. Let me know how it ends up working! @@KiraSlith
omg so detailed.. thank you for your time.
Hi there. Thanks so much for your comment. So glad that the content was useful for you!
29:11 haha, that's the exact card I am looking at. Comparing it with ARC a770 actually.
Hi there. Thanks so much for the comment!
I have also been interested in non-NVIDIA solutions. The ARC GPUs have certainly interested me. However, I would caution you: if you leave the NVIDIA ecosystem, it is like going into the wild west, so just make sure you are prepared. Here is a Reddit thread that might shed some light.
www.reddit.com/r/MachineLearning/comments/z8k1lb/does_anyone_uses_intel_arc_a770_gpu_for_machine/
If you do decide to go the ARC route, please let me know how it goes for you. I would be super curious to better understand where those GPUs are in terms of AI/ML/DL applications.
Really good video but had to play at 1.75X 😅
Hi there. Thanks so much for the feedback! In the future, I will work on keeping things more concise.
Hey bud, how can I DM you for some Qs?
Hey man. I really appreciate you reaching out. I am going to make an X or threads or similar account soon to become more reachable. For now though, just shoot me an email at:
skingutube22@gmail.com
amazing! thanks.
Glad you liked it!
Can I get access to this spreadsheet
Hi there. Thanks so much for reaching out.
You should be able to download a local copy and change it in any way that you see fit. I acknowledge that the spreadsheet needs to be updated. I am actually working right now on a website that lists GPU specs and keeps track of historical price trends. So all of the information in the spreadsheet and more should soon be available, updated on a daily basis.
Unfortunately, I am not comfortable giving you direct editable access to the original version in the Google drive. I apologize. It is nothing personal. I just don't know you well enough.
damn thats the analysis
Can we emulate GPU with USB?!!
Hi there! Thanks so much for the comment.
I am sorry though. I am not sure I understand what is being asked here. Could you give a little more context?