Always a pleasure to watch someone as talented as you! Keep it up :)
Wow, much appreciated:) Thanks:)
Brooooo this was so good.
Glad you liked it!
Welcome back, we missed you!
Hehe, thank you! Nice to hear that:)
I agree!
Great example. Thanks for the information
My pleasure!
OH !!!!! Glad to meet you again !!!!
Glad you are here:))
Thank you for the detailed tutorial!
But TorchServe now has Kubernetes integration.
I will definitely look into it:) Thank you for pointing it out!!
Really helpful for foundation on ml ops
Glad to hear that!
Great video, thanks a lot, really liked the explanation!!!
Glad it was helpful!
he is back 🎉
Amazing video. At 5:25, how did you open the second bash shell in the console? I searched for a long time and couldn't find anything. Thanks and regards!
Thank you! You need to install a tool called tmux. One of its features is that you can have multiple panes on a single screen.
@@mildlyoverfitted Thank you! Will dig into it now.
You're great. Thanks for sharing this in such a nice way.
My pleasure!
Great video very informative.
Glad you liked it!
Thank you, it helped me a lot.
Happy to hear that!
Would appreciate a video using VS Code that covers Dockerfiles, k8s manifests, and FastAPI.
I am having a problem at 18:00: the model load keeps getting killed. I tried `minikube config set memory 4096` but still have the same problem. Any ideas? I've been looking for a solution for 3 hours with no luck.
Hm, I haven't had that problem myself. However, yeah, it might be related to a lack of memory.
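For anyone hitting the same thing: the kill can also happen at the pod level rather than the VM level. If the container's memory limit is lower than what the model needs at load time, Kubernetes OOM-kills it. A minimal sketch of explicit memory requests/limits on the deployment (the name, image, and sizes here are illustrative assumptions, not from the video):

```yaml
# Illustrative: give the model container enough memory headroom.
# Name, image, and sizes are assumptions; adjust to your model's footprint.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: model
  template:
    metadata:
      labels:
        app: model
    spec:
      containers:
      - name: model
        image: my-model-image:latest
        resources:
          requests:
            memory: "4Gi"   # the scheduler reserves this much per pod
          limits:
            memory: "6Gi"   # exceeding this gets the container OOM-killed
```

Also note that `minikube config set memory` only applies to clusters created afterwards, so you may need to run `minikube delete` and then `minikube start` for the new value to take effect.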
Cheers mate!
Really nice video. Would you see any benefit in using this deployment on a single node with an M1 chip? I'd say somewhat yes, because a single inference might not be using all of the M1 chip's CPU, but what about scaling the model in terms of RAM? One of those models might take 4-7GB of RAM, which adds up to 21GB of RAM for just 3 pods. What's your opinion on that?
Glad you liked the video! Honestly, I filmed the video on my M1 using minikube mostly because of convenience. But on real projects I have always worked with K8s clusters that had multiple nodes. So I cannot really advocate for the single node setup other than for learning purposes.
@@mildlyoverfitted Got it. So very likely more requests could be handled at the same time, but with very limited scalability and probably some performance loss. By the way, what are those fancy combos in the terminal? Is it tmux?
@@davidpratr interesting:) yes, it is tmux:)
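On the RAM math from this thread: the scheduler sums per-pod memory requests, so 3 replicas each requesting 7Gi need 21Gi of allocatable RAM across the cluster's nodes — which a single M1 node typically cannot provide. A fragment sketching that (values are illustrative, not from the video):

```yaml
# Illustrative fragment: 3 replicas each requesting 7Gi means the scheduler
# must find 21Gi of allocatable memory in total; pods that don't fit on any
# node stay Pending.
spec:
  replicas: 3
  template:
    spec:
      containers:
      - name: model
        resources:
          requests:
            memory: "7Gi"
```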
excellent!! I'm curious why my search always shows garbage and videos like this never come up. This was suggested by Gemini when I asked a question about ML model deployment.
very cool video!
Thank you! Cheers!
Hi, I would like to use a GPU to accelerate this demo. Can you give me some tips? Thank you!
If you want to use minikube, this seems to be the solution: minikube.sigs.k8s.io/docs/handbook/addons/nvidia/
@@mildlyoverfitted Thank you, I used the `--device` flag of transformers-cli to enable the GPU. I found that the serving app takes up almost all the GPU memory but barely any compute power. Anyway, thank you for your video!
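For anyone else enabling the GPU in code rather than via the CLI: a minimal sketch of picking the device index that `transformers.pipeline()` expects (0 for the first CUDA GPU, -1 for CPU). The pipeline call is shown as a comment because it needs the `transformers` package and a model download; the model name there is the usual example, not necessarily the one from the video.

```python
def pick_device() -> int:
    """Return the device index transformers' pipeline() expects:
    0 for the first CUDA GPU, -1 for the CPU."""
    try:
        import torch  # optional dependency; fall back to CPU without it
    except ImportError:
        return -1
    return 0 if torch.cuda.is_available() else -1

# Hypothetical usage (requires `transformers` and a model download):
# from transformers import pipeline
# unmasker = pipeline("fill-mask", model="bert-base-uncased",
#                     device=pick_device())

print(pick_device())
```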
What terminal application is this, with the different panels?
tmux
New video 🤩
Great!
Really good!
👏👏👏
Great
Looking forward to seeing your face a lot :))
The reason you got `.`, `,`, `?` as the output for [MASK] is that you didn't end your input with a full stop. BERT masked language models should be prompted that way: "my name is [MASK]." should have been your request.
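To make that concrete, here's a tiny helper (my own sketch, not from the video) that appends the full stop before handing the prompt to a fill-mask pipeline; the pipeline call is commented out because it needs the `transformers` package and a model download:

```python
def ensure_full_stop(prompt: str) -> str:
    """BERT-style fill-mask models often predict punctuation for [MASK]
    when the prompt lacks terminal punctuation; append a full stop."""
    stripped = prompt.rstrip()
    return stripped if stripped.endswith((".", "!", "?")) else stripped + "."

# Hypothetical usage with a transformers fill-mask pipeline:
# from transformers import pipeline
# unmasker = pipeline("fill-mask", model="bert-base-uncased")
# unmasker(ensure_full_stop("my name is [MASK]"))

print(ensure_full_stop("my name is [MASK]"))  # → my name is [MASK].
```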