Your lectures are invaluable sir. Thanks.
Thank you 🫡
Hi Sascha! I think a great differentiator between your videos and the official ones is your ability to be the official "unofficial guide" to ML engineering on GCP, i.e. the most trusted source with less fluff and more practice. Please continue mentioning tradeoffs, limitations, etc. without reservation, as the official videos are not as quick to make these clear upfront.
Some content suggestions:
I would love to see a deep dive on Custom Prediction Routines that demos more sophisticated use of the *preprocessing* of features, including how to manage dependencies appropriately (e.g. `setup.py`?) - I didn't see this covered in the labs / blog / notebook examples.
Perhaps an overall strategic view of Vertex AI - how to think of all the components as a set of Legos that can be pieced together. This isn't emphasized enough out there.
Thank you so much for this amazing feedback.
Love your content suggestions around custom prediction routines and the overall strategic view of Vertex AI.
@@ml-engineer In terms of Custom Prediction Routines, what's not obvious and would benefit from at least a code example is how to address the preprocessing on the training side: CPR will handle the preprocessing for the prediction container (serving), but what about putting the training preprocessing into a container so that this can all be automated, e.g. in a Pipeline? It seems a best practice would be to write the preprocessing once and then make it available to both the training and serving containers, but CPR only addresses the latter, at least in the docs. It is almost as if the expectation with CPR is that one would do their training in a notebook, but I am pretty sure that's not technically necessary.
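To sketch the "write the preprocessing once" idea, this is roughly what it could look like (an illustration only; module, file, and class names are made up, and the CPR Predictor interface should be checked against the current docs):

# preprocessing.py -- the shared transformation logic (hypothetical module)
def preprocess_features(instances):
    # whatever feature engineering is needed, written exactly once
    return [[float(x) for x in row] for row in instances]

# train.py -- imported by the training container / pipeline step
# from preprocessing import preprocess_features
# X = preprocess_features(raw_rows)
# model.fit(X, y)

# predictor.py -- used by the CPR serving container
import pickle
from google.cloud.aiplatform.prediction.predictor import Predictor
from preprocessing import preprocess_features

class SharedPreprocessingPredictor(Predictor):
    def load(self, artifacts_uri: str):
        # in practice the model artifact is downloaded from artifacts_uri first
        with open("model.pkl", "rb") as f:
            self._model = pickle.load(f)

    def preprocess(self, prediction_input: dict):
        return preprocess_features(prediction_input["instances"])

    def predict(self, instances):
        return self._model.predict(instances)

Packaging the shared module (e.g. via the setup.py mentioned above) and installing it in both the training and the serving image would keep the two sides in sync.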
@@ml-engineer Another note: you mentioned as a best practice to bake the model into the container, although I suppose a tradeoff would be that you have to rebuild the container each time the model is trained? This seems to be a tradeoff to consider depending on the expected amount of retraining vs. autoscaling, and maybe some other things I didn't consider. For instance, consider a Kubeflow pipeline that retrains a small model prior to making the latest batch of predictions, doing this every time.
Yes, it depends on the model size. If it is a small model of a few MB, there is no need to embed the model into the container. That said, the additional effort to do it is just one line in your cloudbuild.yaml, so even if the model is small, the extra implementation effort is minimal.
Thank you very much for your content, it is helping me a lot.
I’m currently deploying a model using a container with Nvidia Triton Server, but I’ve encountered the limitation you mentioned in the video: the 1.5MB maximum request size. Do you have any advice or potential workarounds for this issue?
Nice video! What does one need to change to run batch prediction with a custom container?
No need to change anything, you can use the same container. Google spins up distributed infrastructure to run the predictions in parallel.
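For example, starting a batch prediction job against a model that uses a custom serving container looks the same as for any other model. A minimal sketch with the Python SDK (project, model ID, and bucket paths are placeholders):

from google.cloud import aiplatform

aiplatform.init(project="your-project", location="us-central1")

# the model was uploaded to the Model Registry with the custom container attached
model = aiplatform.Model("1234567890")  # placeholder model resource ID

model.batch_predict(
    job_display_name="batch-prediction-custom-container",
    gcs_source="gs://your-bucket/input/*.jsonl",
    gcs_destination_prefix="gs://your-bucket/output/",
    machine_type="n1-standard-4",
)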
What have you defined as the batch output, BigQuery or Cloud Storage? Check there; in case of errors you should get an error file.
Logging with Batch Prediction jobs unfortunately does not work out of the box. I wrote an article about that topic:
medium.com/google-cloud/add-cloud-logging-to-your-vertex-ai-batch-prediction-jobs-9bd7f6db1be2
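A rough sketch of one way to surface logs from inside the serving container (not necessarily the exact approach from the article): attach the Cloud Logging handler to Python's standard logging, so that log lines from the prediction code show up in Cloud Logging:

import logging
import google.cloud.logging

# attaches a Cloud Logging handler to the root logger
client = google.cloud.logging.Client()
client.setup_logging()

logging.info("prediction request received")  # now visible in Cloud Logging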
Hi Sascha, thank you very much for the video, it was very useful for our team! Can I ask a question about the pricing you've mentioned? Does the price from the video only cover the custom container functionality, or is it the total price for any AI solution built with a trained Vertex model? From my understanding, in order to use a Vertex model (for image recognition, for example) you have to deploy it to a Google endpoint, which costs about $2 per hour, i.e. around $1,500 per month. Am I right?
Hi Михаил,
the costs I mentioned in the video are only for the custom container. If you use other ML products on GCP you have different pricing. I assume you are referring to the AutoML capabilities of Vertex AI? Those have a dedicated pricing model: cloud.google.com/vertex-ai/pricing#automl_models. For example, AutoML Object Detection costs around $2.002 USD per node hour; over a full month (roughly 730 hours) that is about $1,460, so your calculated ~$1,500 is correct.
Hi Sascha, thanks for your video :) Just one question. Does a custom model written by inheriting from sklearn.base.BaseEstimator fit this use case?
Yes, since it's a Docker container it works with anything, no limitations.
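As an illustration, a minimal serving sketch for a custom sklearn-style estimator inside such a container (model file, feature handling, and the web framework are placeholder choices; Vertex AI injects the AIP_* environment variables):

import os
import pickle

from flask import Flask, request, jsonify

app = Flask(__name__)

# the custom BaseEstimator subclass, pickled at training time (placeholder path)
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route(os.environ.get("AIP_HEALTH_ROUTE", "/health"), methods=["GET"])
def health():
    return "ok", 200

@app.route(os.environ.get("AIP_PREDICT_ROUTE", "/predict"), methods=["POST"])
def predict():
    instances = request.get_json()["instances"]
    predictions = model.predict(instances)
    # Vertex AI expects a JSON body with a "predictions" key
    # (assumes numeric predictions here)
    return jsonify({"predictions": [float(p) for p in predictions]})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=int(os.environ.get("AIP_HTTP_PORT", 8080)))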
I have deployed a registered model to a private endpoint using a custom VPC. How can I access the private endpoint from my laptop?
You can't, that's the purpose of a private endpoint. If you want to access it from outside of your Google Cloud project, I recommend using the public endpoint.
Can you make a video implementing the third option? Using custom prediction routines?
Hi Christóbal
yes, you're not the first one asking for a custom prediction routine video. I'll start preparing it in the next couple of days.
@@ml-engineer were you ever able to figure this out? I am frustrated with the fact that I can't take the exported model from Vertex AI and run it locally.
@@chetanmunugala8457 what exported model? Are you referring to an AutoML model?
Is it necessary to go with Cloud Build? Can I just create a GitHub Action to achieve this?
Hi Simba
Two things are necessary with those custom containers.
1. You need to build the container. This can be done locally, with Cloud Build, but also with GitHub Actions. You are fully flexible here.
2. The Docker image needs to be uploaded to Google Cloud Container Registry / Artifact Registry.
@@ml-engineer many thanks 🙏
The build fails for the service account with an error "...access to the Google Cloud Storage bucket. Permission 'storage.objects.list' denied on resource (or it may not exist)"
Did you check that the service account indeed has the right permission and that the bucket exists? Also check that you are referencing the correct project.
Hello, great video.
I have been overlooking the models/schemata step, so now I'm figuring out that the parsing of the request & response was incorrect. Are any technical details available on this? Is it a feature from AutoML?
Another solution would be to export the schemata from openapi.json to yaml and provide it at model upload. Not tried yet though.
Thank you.
Are you referring to the format that the request and response need to follow?
@@ml-engineer Yeah, so starting from a standard application, the route functions needed to be modified along with the models. I wasn't rigorous enough to notice it from the start; in fact the changes are very simple.
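For reference, the default shapes a Vertex AI endpoint expects the container to accept and return look roughly like this (placeholder values):

request_body = {"instances": [{"feature_a": 1.0, "feature_b": "x"}]}
response_body = {"predictions": [0.87]}

so the route functions mainly need to unwrap "instances" and wrap the results in "predictions".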
Can you give an example of creating an endpoint with GPU?
I go into it in my article
medium.com/google-cloud/serving-machine-learning-models-with-google-vertex-ai-5d9644ededa3
And here is also a code example that uses a serving container with GPU
github.com/SaschaHeyer/image-similarity-search
It comes down to the base image, which needs to support GPUs. This could be, for example, the TensorFlow GPU image or the PyTorch one, whatever you prefer. For example:
gcr.io/deeplearning-platform-release/pytorch-gpu
Regarding the deployment, you only need to add an accelerator:
!gcloud ai endpoints deploy-model 7365738345634201600 \
--project=sascha-playground-doit \
--region=us-central1 \
--model=8881690002430361600 \
--traffic-split=0=100 \
--machine-type="n1-standard-16" \
--accelerator=type="nvidia-tesla-t4,count=1" \
--display-name=image-similarity-embedding