Tips Tricks 16 - How much memory to train a DL model on large images

  • Published on Feb 9, 2025
  • Rough calculation to estimate the required memory (esp. GPU) to train a deep learning model.
    Code generated in the video can be downloaded from here:
    github.com/bns...
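    For reference, a minimal sketch of this kind of estimate in Keras (assuming float32 storage and counting only per-layer feature maps plus weights; the exact script from the video is at the GitHub link above):

    import numpy as np
    from tensorflow.keras import backend as K

    def get_model_memory_usage(batch_size, model):
        """Rough lower-bound estimate (in GB) of the memory needed to
        train a Keras model: feature maps of every layer (scaled by
        batch size) plus trainable and non-trainable parameters, all
        assumed to be stored as float32 (4 bytes). Gradients, optimizer
        state, and framework overhead are NOT included."""
        features_mem = 0  # feature-map elements per sample
        for layer in model.layers:
            out_shape = layer.output_shape
            if isinstance(out_shape, list):  # multi-output layer
                out_shape = out_shape[0]
            single_layer_mem = 1
            for dim in out_shape:
                if dim is not None:  # skip the batch dimension
                    single_layer_mem *= dim
            features_mem += single_layer_mem
        trainable = np.sum([K.count_params(w) for w in model.trainable_weights])
        non_trainable = np.sum([K.count_params(w) for w in model.non_trainable_weights])
        total_bytes = 4.0 * (batch_size * features_mem + trainable + non_trainable)
        return total_bytes / (1024.0 ** 3)

    For example, get_model_memory_usage(8, model) on a U-Net built for 512x512x3 inputs gives a quick sanity check before renting or buying a GPU.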

Comments • 53

  • @KarthikArumugham 3 years ago +1

    Great explanation. I used to manage the image and batch size on an ad-hoc basis resulting in OOM errors. This tool now gives me a better way to anticipate memory requirements.

    • @DigitalSreeni 3 years ago +1

      Actual memory requirements depend on a few other factors but this calculation gives a rough estimate.

  • @jacobusstrydom7017 3 years ago +1

    Great explanation!! Incredible to see how big a single image can get once it's gone through the whole network.

  • @AsaadFSaid 2 years ago +3

    This is correct if you want to keep all output data in memory. In many cases, the output data is temporary (used once) and there is no need to keep it. In such cases, you can create a buffer to swap data in and out, which saves a significant amount of memory.
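    (One concrete form of this idea in TensorFlow 2.x, sketched below: tf.recompute_grad discards a block's intermediate activations after the forward pass and recomputes them during backprop, trading compute for memory. The `block` model here is only an illustrative example.)

    import tensorflow as tf

    block = tf.keras.Sequential([
        tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu"),
        tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu"),
    ])

    @tf.recompute_grad
    def checkpointed_block(x):
        # Activations inside this block are not kept for backprop;
        # they are recomputed from x when gradients are needed.
        return block(x)

    x = tf.random.normal((1, 512, 512, 3))
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(checkpointed_block(x))
    grads = tape.gradient(loss, block.trainable_variables)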

  • @laohu1514 3 years ago +4

    Great video as always! I learn more from you than from most of my teachers at graduate school 😅

    • @DigitalSreeni 3 years ago +1

      Not sure if I am happy to hear that or sad but I am glad you find these tutorials to be informative.

  • @behnoudshafizadeh3281 3 years ago +1

    This is an important video; when you decide to buy a GPU, you usually face the challenge of selecting which one.
    Thanks, dear Sreeni

    • @DigitalSreeni 3 years ago +1

      You are absolutely right, this is the basic calculation you need to do while shopping for a GPU. In fact, I recommend signing up for Colab Pro for a month to see if 16GB works for you. If not, try Google Cloud AI, where you can subscribe to higher-powered GPUs. Once you confirm that a specific GPU works for your application, do the math on purchasing a GPU vs. subscribing via Google (or AWS or something else).

    • @behnoudshafizadeh3281 3 years ago +1

      @@DigitalSreeni Thanks for replying, Mr. Sreeni. You have been making very important courses, and I really appreciate you.
      Best wishes,
      Behnoud from 🇮🇷

  • @diogosilva3152 3 years ago

    Such useful information in a relatively short video! Amazing!

    • @DigitalSreeni 3 years ago

      Glad you think so!

    • @diogosilva3152 3 years ago

      @@DigitalSreeni By the way, I assume this would not work for models that contain transfer learning, for example a model with VGG-16 as a backbone followed by some FC layers. When we iterate over model.layers we don't have the option to expand the nested layers. Please correct me if I'm wrong.
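      (A sketch of one way to handle that case, assuming TF 2.x; flatten_layers is a hypothetical helper that recurses into nested models so every sub-layer's output shape can be counted in the memory estimate:)

      import tensorflow as tf

      def flatten_layers(model):
          # Recursively yield leaf layers, expanding nested models
          # (e.g., a VGG16 backbone wrapped inside a larger model).
          for layer in model.layers:
              if isinstance(layer, tf.keras.Model):
                  yield from flatten_layers(layer)
              else:
                  yield layer

      backbone = tf.keras.applications.VGG16(include_top=False, weights=None,
                                             input_shape=(224, 224, 3))
      inputs = tf.keras.Input((224, 224, 3))
      x = tf.keras.layers.GlobalAveragePooling2D()(backbone(inputs))
      model = tf.keras.Model(inputs, tf.keras.layers.Dense(10)(x))

      print(len(model.layers))                 # nested: the backbone counts as one layer
      print(len(list(flatten_layers(model))))  # flattened: all VGG16 sub-layers included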

  • @abubakrshafique7335 3 years ago

    Best Channel for Information on DL and AI.
    Thank you

  • @schwatzgelber8444 3 years ago +2

    Thank you very much for this video. I have a question and maybe you have the time to answer it:
    On the slide at 8:38, you say that the backward pass consumes a similar amount of memory as the feature maps from the layers. Consequently, shouldn't a factor of two be applied to "features_mem" in your function that approximates the total memory needed for the neural net?
    The memory required for the parameter gradients is also neglected, isn't it? Independent of the optimizer, a factor of two applied to "parameter_mem_MB" should be reasonable, since the gradients must always be calculated. Please correct me if I am wrong! In addition, the momentum used by some optimizers would require even a factor of three applied to "parameter_mem_MB", as information about previous iterations is needed for the momentum calculation.
    Most of this is mentioned on the slide at 8:38 but not considered in the "get_model_memory_usage" function. Maybe you can give some feedback on this. That would be really nice.
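    (Folding those corrections into the estimate might look like the sketch below; optimizer_slots is a hypothetical parameter for the per-parameter state an optimizer keeps: 0 for plain SGD, 1 for SGD with momentum, roughly 2 for Adam.)

    BYTES_PER_FLOAT32 = 4.0

    def rough_training_memory_gb(batch_size, features_per_sample, n_params,
                                 optimizer_slots=1):
        # ~2x the feature maps: activations stored during the forward
        # pass plus a comparable amount consumed by the backward pass.
        activation_bytes = 2 * batch_size * features_per_sample * BYTES_PER_FLOAT32
        # Weights + gradients + optimizer state (momentum, Adam moments, ...).
        param_bytes = (2 + optimizer_slots) * n_params * BYTES_PER_FLOAT32
        return (activation_bytes + param_bytes) / 1024 ** 3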

  • @ganapathyshankar2994 3 years ago +1

    Thanks Mr Srini for this video. Very useful

  • @khushpatelmd 3 years ago +1

    Wow. Simply amazing.

  • @dimitrisspiridonidis3284 2 years ago +1

    You can use mixed precision with float16; this will bring the size down by half.
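    (In Keras this is a one-line policy switch, sketched below for TF 2.4+: activations are computed and stored in float16 while the master weights stay float32, roughly halving the feature-map term in the estimate.)

    import tensorflow as tf

    # Compute and store activations in float16; keep float32 master weights.
    tf.keras.mixed_precision.set_global_policy("mixed_float16")

    inputs = tf.keras.Input((512, 512, 3))
    x = tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu")(inputs)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    # Keras recommends a float32 output layer for numeric stability.
    outputs = tf.keras.layers.Dense(10, activation="softmax", dtype="float32")(x)
    model = tf.keras.Model(inputs, outputs)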

  • @aaronabelachelseafc 2 years ago +1

    Absolutely amazing video 👌🥰 Is it possible to do a similar example for a GAN model, such as StackGAN?

  • @vimalshrivastava6586 2 years ago +1

    Very informative video. I have a query. I am trying to train a standard UNet model on a 512x512x3 dataset with batch_size=4 on an RTX 3090 with 24GB of GPU memory, and I am getting an out-of-memory error. Please help me resolve this issue.

    • @vimalshrivastava6586 1 year ago

      Thank you for the response.

    • @vimalshrivastava6586 1 year ago +1

      @@MARTIN-101 Actually, there were some issues with the RAM. I have replaced it and now I am able to run it.
      But if I increase the dataset or the model parameters, then I have to do it in two steps:
      (i) Train the model, save the weights, and restart the kernel.
      (ii) Load the trained weights and test.

  • @ashwinig5160 3 years ago

    Thank you very much for sharing your knowledge, sir...
    Sir, can I use an NVIDIA Jetson developer kit for training deep learning Python code?

  • @haqkiemdaim 3 years ago +1

    I have recently learned a lot from your YouTube tutorials, sir! Anyway, I'm working on some R&D regarding human segmentation (black and white) and using your tutorial as my guide.
    Just want to ask: what makes a good segmentation dataset, sir? Is it the quality? The variety of human bodies?

  • @gomathig6128 3 years ago

    Respected sir, for my cardiac MRI segmentation work I used the U-Net architecture, and for training I used an (80, 100) ratio. There are nearly 900 images in total, but only 224 images are displayed in the output. Can you please explain?

  • @kamilchodzynski9395 3 years ago

    Hi, nice video. However, have you ever considered or tried gradient accumulation? Even if your batch does not fit in your GPU's memory, there are techniques in TensorFlow to cut your big batch into mini-batches. Maybe it would be worth trying and making a video about it?
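    (A minimal gradient-accumulation sketch in TF 2.x, assuming `model`, `dataset`, and a classification task already exist: gradients from accum_steps small batches are summed and applied once, mimicking a batch accum_steps times larger without its activation-memory cost.)

    import tensorflow as tf

    accum_steps = 4
    optimizer = tf.keras.optimizers.Adam()
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()

    # One zeroed accumulator variable per trainable weight.
    accum = [tf.Variable(tf.zeros_like(v), trainable=False)
             for v in model.trainable_variables]

    for step, (x, y) in enumerate(dataset):
        with tf.GradientTape() as tape:
            # Divide so the applied update matches the large-batch average.
            loss = loss_fn(y, model(x, training=True)) / accum_steps
        grads = tape.gradient(loss, model.trainable_variables)
        for a, g in zip(accum, grads):
            a.assign_add(g)
        if (step + 1) % accum_steps == 0:
            optimizer.apply_gradients(zip(accum, model.trainable_variables))
            for a in accum:
                a.assign(tf.zeros_like(a))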

  • @evyatarcoco 3 years ago

    A great episode!

  • @rajeshdhanda5913 3 years ago

    I didn't get the "21 times" at 2:31. Can you please elaborate?

    • @DigitalSreeni 3 years ago +1

      I meant to say that your input image has 3 channels, and if your feature map depth is 64, it is like copying your input image 64/3 times, which is approximately 21 times.

  • @abdulla6741 3 years ago +1

    Thanks, Sreeni. I see a lot of machine learning learners recommending Google Colab Pro; what do you think about it?

    • @DigitalSreeni 3 years ago +2

      I will probably record a video on the topic, but the short answer is: it depends. The free version of Colab is great, but it disconnects frequently, so you need to babysit the training. The Pro version is a bit more lenient when it comes to session time and gives you 24 hrs, so it may be worth it if you are annoyed by the constant babysitting of the free version. Although, I should mention that the Pro version will not give you a lot more resources, so it will not solve any large-data problems, if that is your challenge.

    • @yifeipei5484 3 years ago +1

      Google Colab Pro is worth it if you cannot get an RTX 3090, a Titan RTX, or a GPU better than these two. Free Colab users only get a Tesla P100, but Pro users can probably get a Tesla V100 GPU, which doubles the speed. Also, the Pro version gives 25 GB of CPU memory versus only 12 GB for the free version, and the Pro version offers a terminal, which does not exist in the free version. I train my GAN model on a Colab Pro Tesla V100 GPU and it takes 200-210 seconds per epoch (on the free version's Tesla P100 it takes about 350 seconds per epoch), but if I train my GAN model on an RTX A6000 (a $5000 GPU), it takes about 180 seconds per epoch, saving only about 30 seconds per epoch. I had already set the batch size to the maximum. Considering the price tags, you can see how generous Google is to offer us a Tesla V100 on Colab Pro for only $9.99 a month.

    • @DigitalSreeni 3 years ago +1

      Thanks for sharing your experience, Yifei. I should also add that with Colab Pro you have a good chance of getting 26GB RAM, but it is not guaranteed and also depends on your location. The same goes for the GPU: you have a higher chance of getting a better piece of hardware compared to the free version, but it is not guaranteed. I got frustrated too many times with Colab Pro, so I paid extra to rent a virtual machine. It was (and is) expensive, but it makes things predictable and gets the work done. I wouldn't spend that kind of money for hobby coding; it was for real work.

    • @yifeipei5484 3 years ago

      @@DigitalSreeni Thank you!

    • @yifeipei5484 3 years ago

      @@DigitalSreeni Hello Dr. Sreeni, just to let you know, Google has launched Colab Pro+ ($49.99/month). I have upgraded my Colab Pro to Pro+. However, Colab Pro+ still gives me a Tesla P100, no matter how many times I disconnect.

  • @sumitbali9194 3 years ago +2

    Thanks for this video. Super helpful. Can you please benchmark the 3080 Ti vs the 3090 for deep learning?

    • @DigitalSreeni 3 years ago +3

      There are 2 primary specs you need to consider when picking a GPU.
      1. Memory: This is important if you work with large data sets, for example if you'd like to work with large images or large batches. Please note that having too little memory means you cannot work with large data; it stops you from getting the work done.
      2. Processing speed: This is of course highly relevant, especially if time to results is important. Please note that lower processing speed means your time to results is slower, but it will not stop you from getting the work done (generally).
      Considering the above two dimensions, you can now compare the 3080 Ti and the 3090.
      The primary difference is in memory: the 3090 comes with 24GB whereas the 3080 Ti comes with 12GB, a significant difference. When it comes to processing speed, both cards have near-identical specs for CUDA cores, tensor cores, etc.
      In summary: can you get your work done with 12GB? Also, can you get away with using the 12GB provided for free by Colab, or even the $10/mo Colab Pro? If not, consider the 3090.

    • @sumitbali9194 3 years ago

      @@DigitalSreeni Thanks for your reply. I will keep this in mind while making my purchase.

  • @shriramjaju 3 years ago

    Thank you, this is really helpful.

  • @SohailKhan-tt5eh 2 years ago

    Thank you very much for this video

  • @yifeipei5484 3 years ago

    Thank you! But I feel the real situation is more complex than this. I do GAN research in my PhD program. I use the same model, same code, same image dataset, and same batch size on a Colab Tesla V100 (16GB VRAM) and an RTX A6000 (48GB). On the Colab Tesla V100 it takes about 16GB of VRAM, but on the RTX A6000 it takes more than 30GB. So I think the number of CUDA cores also affects the memory usage, since more CUDA cores will do more parallel computing. The RTX A6000 has about 10,000 CUDA cores and the Tesla V100 has about 5,000.

    • @DigitalSreeni 3 years ago

      I am not sure how the number of cores affects storage memory; maybe I am missing something. Of course, it impacts calculation speed and may offload some overhead storage, but the ability of the GPU to work with given data primarily depends on the model, batch size, and data type (float, int, etc.). In any case, for learning purposes you need about a 12GB GPU, and for serious work you need at least 32GB.

    • @khushpatelmd 3 years ago

      I don't think the number of CUDA cores affects the model size calculations in any way.

    • @yifeipei5484 3 years ago

      @@khushpatelmd But my experiments show it does.

    • @yifeipei5484 3 years ago

      @@DigitalSreeni Something must be missing here. My GAN is for image compression, so the generator has an encoder and a decoder. I calculated the size of the encoder's outputs, and it came to 65.75 GB for a single image. However, on a Colab Tesla V100 (16GB) GPU I can set the batch size to 8 for my whole GAN model. Based on my previous experiments, mentioned in my earlier comment, I think the GPU dynamically records the output images in memory to save space, rather than recording all output images at once. Your method calculates the upper bound of the GPU memory the model uses. But thank you for your effort again.

  • @falahfakhri2729 3 years ago

    Very interesting stuff, thanks a lot, but as usual the code isn't available in your amazing GitHub repository.

  • @user-wr4yl7tx3w 2 years ago

    Idea: a call graph of memory usage.

  • @jiuyi7273 1 year ago

    So I think for most graduate students who do research on images in DL, their lab should basically provide a GPU with >=24 GB. And I am starting to apply now 😂 a single 4070 totally doesn't work.