Container Bytes
Improve Infrastructure Autoscaling with Custom Compute Classes in GKE
In this video we'll explore the new custom compute class feature in GKE, which improves infrastructure autoscaling with fallback compute priorities and advanced node configuration via a new declarative, CRD-based API.
For more information see: cloud.google.com/kubernetes-engine/docs/concepts/about-custom-compute-classes
Blog Post: cloud.google.com/blog/products/containers-kubernetes/introducing-new-gke-custom-compute-class-api/
Views: 194
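To make the API shape concrete, here is a minimal sketch of a ComputeClass manifest with fallback priorities. This is an illustrative example based on the docs linked above, not taken from the video; the class name and machine families are assumptions.

```yaml
# Illustrative ComputeClass: prefer Spot N2 nodes, fall back to on-demand E2.
apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: cost-optimized            # hypothetical name
spec:
  priorities:                     # tried in order during scale-up
  - machineFamily: n2
    spot: true
  - machineFamily: e2
  nodePoolAutoCreation:
    enabled: true                 # let GKE create matching node pools
```

Workloads would then opt in with a `cloud.google.com/compute-class: cost-optimized` nodeSelector, per the documentation above.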

Videos

GPU Sharing on GKE with Multi Instance GPU
87 views · 21 days ago
The A100 GPU and H100 GPU consist of seven compute units and eight memory units, which can be partitioned into GPU instances of varying sizes. Overview of all GPU Sharing Techniques on GKE: th-cam.com/video/Y7hdy-mFLlU/w-d-xo.html GKE Multi Instance GPU: cloud.google.com/kubernetes-engine/docs/how-to/gpus-multi NVIDIA Partitioning Table: docs.nvidia.com/datacenter/tesla/mig-user-guide/index.htm...
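As a hedged sketch of how a workload consumes a MIG partition on GKE (the partition size and image here are illustrative; see the GKE Multi Instance GPU doc above for supported sizes):

```yaml
# Illustrative Pod requesting one 1g.5gb MIG partition on a GKE node.
apiVersion: v1
kind: Pod
metadata:
  name: mig-example
spec:
  nodeSelector:
    cloud.google.com/gke-gpu-partition-size: 1g.5gb  # size from NVIDIA's partitioning table
  containers:
  - name: cuda
    image: nvidia/cuda:12.2.0-base-ubuntu22.04
    command: ["nvidia-smi", "-L"]
    resources:
      limits:
        nvidia.com/gpu: 1   # one partition, not one whole GPU
```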
Different ways of Running RayJob on Kubernetes
92 views · 21 days ago
RayJob is a Kubernetes Custom Resource Definition that combines both the Ray job (a packaged Ray application) and the submitter (something that sends a Ray job to a RayCluster). In this video Kent and Ali cover the different ways of running a RayJob on Kubernetes using persistent or ephemeral clusters.
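A minimal sketch of the ephemeral-cluster variant, assuming a recent KubeRay API version; the entrypoint script and image tag are placeholders:

```yaml
# Illustrative RayJob: KubeRay creates a RayCluster for the job and
# tears it down afterwards. For the persistent-cluster variant, replace
# rayClusterSpec with a clusterSelector targeting an existing RayCluster.
apiVersion: ray.io/v1
kind: RayJob
metadata:
  name: sample-rayjob
spec:
  entrypoint: python /home/ray/samples/my_script.py  # hypothetical script path
  shutdownAfterJobFinishes: true   # delete the cluster when the job completes
  rayClusterSpec:
    headGroupSpec:
      rayStartParams: {}
      template:
        spec:
          containers:
          - name: ray-head
            image: rayproject/ray:2.9.0
```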
Simplify Kuberay with Ray Operator on GKE
89 views · a month ago
KubeRay is a powerful, open-source Kubernetes operator that simplifies the deployment and management of Ray applications on Kubernetes. The Ray Operator add-on on GKE makes it easy to install and set up Ray on GKE. In this video Mofi Rahman and Ali Zaidi walk through the steps to install the Ray add-on on an existing cluster and set up a Ray Cluster. Simple command to run a Ray job: ray job subm...
GKE Multi Tenancy with Teams
59 views · a month ago
In this video Nick walks us through another GKE Enterprise feature, Teams.
Fleet Level Feature Management with Feature Manager
333 views · 2 months ago
The Feature Management view lets you view the state of GKE Enterprise features for your fleet clusters. In this video Nick goes over what Feature Manager is and the key features you can enable in your fleet. Intro to GKE Enterprise: th-cam.com/video/sXcA-Zsxvvw/w-d-xo.html
Build Internal Developer Platforms on GKE using GKE Enterprise
242 views · 3 months ago
In this video Nick Eberts walks us through how to build an Internal Developer Platform on top of GKE using GKE Enterprise. The topics covered in this video are GKE Enterprise, Fleets, and Rollout Sequencing. Learn About GKE Enterprise: cloud.google.com/kubernetes-engine/enterprise/docs/concepts/gke-editions Fleets: cloud.google.com/kubernetes-engine/docs/fleets-overview Rollout Sequencing: cloud...
Tips for Securing your Ray Cluster on GKE
225 views · 3 months ago
As a distributed compute framework for AI applications, Ray has grown in popularity in recent years, and deploying it on GKE is a popular choice that provides flexibility and configurable orchestration. Learn how to secure Ray on GKE in this video. How to secure Ray on Google Kubernetes Engine: cloud.google.com/blog/products/containers-kubernetes/securing-ray-to-run-on-google-kubernetes-engine ...
Effective GPU Sharing Strategies in GKE
242 views · 3 months ago
GPU sharing strategies allow multiple containers to efficiently use your attached GPUs and save running costs. GKE provides the following GPU sharing strategies: multi-instance GPU, GPU time-sharing, and NVIDIA MPS. Learn more: cloud.google.com/kubernetes-engine/docs/concepts/timesharing-gpus
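For example, a Pod targeting a time-sharing node pool might look like the sketch below (the node labels follow the GKE time-sharing doc above; the image and client count are illustrative):

```yaml
# Illustrative Pod targeting a GKE node pool configured for GPU time-sharing.
apiVersion: v1
kind: Pod
metadata:
  name: timeshare-example
spec:
  nodeSelector:
    cloud.google.com/gke-gpu-sharing-strategy: time-sharing
    cloud.google.com/gke-max-shared-clients-per-gpu: "2"
  containers:
  - name: app
    image: nvidia/cuda:12.2.0-base-ubuntu22.04
    resources:
      limits:
        nvidia.com/gpu: 1   # a time-shared slice, not a dedicated GPU
```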
Serving Gemma on GKE on TPU using Jetstream
143 views · 3 months ago
Gemma is a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models. JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs. In this video Mofi Rahman and Ali Zaidi walk through the process of deploying Gemma on GKE on TPU using JetStream. Find Gemma on Huggingface - huggi...
Improve Resource Obtainability (GPUs, TPUs) with Dynamic Workload Scheduler on GCP
210 views · 4 months ago
Dynamic Workload Scheduler is a resource management and job scheduling platform designed for AI Hypercomputer. Dynamic Workload Scheduler improves your access to AI/ML resources, helps you optimize your spend, and can improve the experience of workloads such as training and fine-tuning jobs, by scheduling all the accelerators needed simultaneously. Dynamic Workload Scheduler supports TPUs and N...
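Based on the DWS documentation, queued provisioning on GKE is typically driven by a ProvisioningRequest object. The sketch below is an assumption-laden illustration: the API version, class name, and the `training-pod-template` PodTemplate come from my reading of the docs, not from this video.

```yaml
# Illustrative ProvisioningRequest for DWS queued provisioning: ask for all
# accelerator capacity at once, and start Pods only when it is granted.
apiVersion: autoscaling.x-k8s.io/v1beta1
kind: ProvisioningRequest
metadata:
  name: training-run              # hypothetical name
spec:
  provisioningClassName: queued-provisioning.gke.io
  podSets:
  - count: 4                      # all four workers scheduled together
    podTemplateRef:
      name: training-pod-template # hypothetical PodTemplate in the same namespace
```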
Reducing data pre-processing time by 95% using Ray
1.5K views · 5 months ago
In ML, your model is only as good as your data, yet this crucial step is often overlooked. In this video Google Solutions Architects Shobhit and Kavitha walk through a reference solution where they brought the data pre-processing time for downloading over 20,000 data points, each with multiple image URLs, down from 8 hours to about 20 minutes using Ray workers. Ray is an open-source unified compute fram...
Serving Gemma on GKE using Nvidia TRT LLM and Triton Server
709 views · 6 months ago
Gemma is a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models. Nvidia TensorRT is an SDK for high-performance deep learning inference that includes a deep learning inference optimizer and runtime delivering low latency and high throughput for inference applications. Triton Inference Server is an open source inference s...
Serving Gemma on GKE using Text Generation Inference (TGI)
445 views · 6 months ago
Gemma is a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models. Text Generation Inference (TGI) is a toolkit for deploying and serving Large Language Models (LLMs). In this video Mofi Rahman and Ali Zaidi walk through the process of deploying Gemma on GKE using the TGI serving engine. Find Gemma on Huggingface - huggingfa...
Serving Gemma on GKE using vLLM
529 views · 6 months ago
Gemma is a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models. vLLM is a fast and easy-to-use library for LLM inference and serving. In this video Mofi Rahman and Ali Zaidi walk through the process of deploying Gemma on GKE using the vLLM serving engine. Find Gemma on Huggingface - huggingface.co/google Follow along the ...
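A minimal sketch of such a Deployment, assuming vLLM's OpenAI-compatible server image and a Kubernetes Secret (here called `hf-secret`, a placeholder) holding a Hugging Face token for the gated model:

```yaml
# Illustrative Deployment serving Gemma 2B with vLLM's OpenAI-compatible server.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-gemma
spec:
  replicas: 1
  selector:
    matchLabels: {app: vllm-gemma}
  template:
    metadata:
      labels: {app: vllm-gemma}
    spec:
      containers:
      - name: vllm
        image: vllm/vllm-openai:latest
        args: ["--model", "google/gemma-2b"]
        env:
        - name: HUGGING_FACE_HUB_TOKEN   # needed to pull the gated model
          valueFrom:
            secretKeyRef: {name: hf-secret, key: hf_api_token}
        resources:
          limits:
            nvidia.com/gpu: 1            # one L4 or similar GPU
```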
Improve LLM accuracy and performance with Retrieval Augmented Generation
1.3K views · 7 months ago
Monitoring ML Training Platform using Kueue Metrics and Cloud Monitoring
211 views · 7 months ago
AI/ML on GKE: 2023 A Year in Review
189 views · 8 months ago
Architecture of a ML Platform with Resource Sharing on Kubernetes
490 views · 8 months ago
Serve LLM on Google Kubernetes Engine on L4 GPUs
444 views · 8 months ago
Intro to Kueue
738 views · 9 months ago
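For context on what Kueue looks like in practice, here is a hedged minimal sketch of its two core queueing objects; it additionally assumes a ResourceFlavor named `default-flavor` exists in the cluster:

```yaml
# Illustrative Kueue setup: a ClusterQueue with a CPU quota, plus a
# namespaced LocalQueue pointing at it. Jobs opt in via the
# kueue.x-k8s.io/queue-name label and stay suspended until admitted.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: cluster-queue
spec:
  namespaceSelector: {}        # admit workloads from any namespace
  resourceGroups:
  - coveredResources: ["cpu"]
    flavors:
    - name: default-flavor     # assumed pre-existing ResourceFlavor
      resources:
      - name: cpu
        nominalQuota: 10
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: user-queue
  namespace: default
spec:
  clusterQueue: cluster-queue
```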
Monitoring Batch Workloads on GKE
84 views · 10 months ago
Basic Job Patterns on Kubernetes
229 views · 10 months ago
Building a Batch Platform on Kubernetes
225 views · 11 months ago
What is HPC and Overview of a HPC architecture
872 views · 11 months ago
Understanding Horizontal Pod Autoscaling
191 views · a year ago
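As a concrete reference for the concept, a minimal `autoscaling/v2` HPA manifest looks like this (the target Deployment name is a placeholder). Note that `averageUtilization` is computed against the Pods' CPU requests, not their limits:

```yaml
# Illustrative HPA: scale a Deployment between 1 and 10 replicas to hold
# average CPU utilization (relative to Pod CPU requests) near 50%.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web          # hypothetical Deployment name
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```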
Kubernetes Job YAML Fields You Should Know
164 views · a year ago
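A sketch of a Job exercising the commonly tuned fields (all values here are illustrative):

```yaml
# Illustrative Job showing the fields you most often need to set.
apiVersion: batch/v1
kind: Job
metadata:
  name: batch-example
spec:
  completions: 5                 # run 5 successful Pods in total
  parallelism: 2                 # at most 2 Pods at a time
  backoffLimit: 3                # retries before marking the Job failed
  activeDeadlineSeconds: 600     # hard wall-clock limit for the whole Job
  ttlSecondsAfterFinished: 3600  # auto-clean the Job an hour after it ends
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: busybox
        command: ["sh", "-c", "echo processing && sleep 5"]
```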
Create a Simple Web App with Java
66 views · a year ago
Containerize Java Spring Boot Application
403 views · a year ago
Create a Simple Web App with PHP
58 views · a year ago

Comments

  • @hamzazahidulislam3490
@hamzazahidulislam3490 6 hours ago

    Thanks for sharing this.

  • @ajithkumarspartan
@ajithkumarspartan 7 days ago

    Could you also share the code for this video ?

  • @ajithkumarspartan
@ajithkumarspartan 7 days ago

I believe in the method you are using, the Gemma 2B model is downloaded in each pod, and it will increase the size of all 24 pods. So could we store it in an S3 bucket or some storage service in Gcloud and reduce the size of pods? Correct me if I am wrong

    • @ContainerBytes
@ContainerBytes 3 days ago

      For performance reasons we definitely recommend caching the model in some storage to save on bandwidth downloading the model over and over again. You should also look into GKE secondary boot disk as well to preload the container image, massively improving pod startup time for inferencing.

    • @ajithkumarspartan
@ajithkumarspartan 3 days ago

      @@ContainerBytes when u r free could you make a video on that like caching the model or storing in S3. Also MIG can be used only in high end GPUs like A100 etc... could u do some video on GPU sharing methods like time slicing in low end GPUs

  • @stephennfernandes
@stephennfernandes 21 days ago

    can i do this on the free gcloud tier with 300$ of free cloud credits ? i am ok with enabling billing but will the free tier suffice this ?

    • @ContainerBytes
@ContainerBytes 21 days ago

      If your project has the quota for it, you could run this for a few hours.

  • @starkpister1565
@starkpister1565 24 days ago

    Very cool, cant wait to try these out!

  • @LukeSchlangen
@LukeSchlangen a month ago

    1:54 “I sort of trust Mofi” is an important security policy to live by. 😂

    • @ContainerBytes
@ContainerBytes a month ago

      “Trust but verify”, Motto to live by.

  • @balakrishnag1707
@balakrishnag1707 2 months ago

    thank you

  • @BigStupidTech
@BigStupidTech 4 months ago

Honestly, this channel has the best explanations I've come across.

    • @ContainerBytes
@ContainerBytes 4 months ago

      Glad you like them!

  • @user-lq2rn4de4r
@user-lq2rn4de4r 4 months ago

It's great, I've been looking into GCP docs forever... Please make another one on using it with GCE

  • @yoginchopda1515
@yoginchopda1515 4 months ago

    Thanks for the great tutorial! Is there a github repo where we can find the code used in the tutorial? Specifically, k8s files

  • @rsrini7
@rsrini7 4 months ago

    Thanks for the video. I was under impression resource request is mandatory to decide auto scaling. Ex: if i set req cpu 100m and limit cpu 500m , average utilization is 50%, so, the 50% is mapped with request or limit ?

  • @g0t4
@g0t4 5 months ago

    prefer to refer to what I am doing with it (ie the subcommand) and not speak kube-C-T-L otherwise :)

  • @g0t4
@g0t4 5 months ago

    cube C-T-L 👍

  • @NameSingh-eg5ky
@NameSingh-eg5ky 5 months ago

    Can you give the link to notebook to practice

  • @stanrock8015
@stanrock8015 5 months ago

    Please add links you mentioned

  • @stanrock8015
@stanrock8015 5 months ago

    Great video. Will target Ray at next

  • @shezanbaig895
@shezanbaig895 6 months ago

    Hey there, i have finetuned Mistral Modela and I have also created TensorRT engine. I wanted to ask do I need preprocessing and postprocessing script or do I just need pbtxt file to serve it on Triton Inference server? Shall I need to follow what you did for gemma?

  • @AmitKumar-hm4gx
@AmitKumar-hm4gx 6 months ago

    thanks for the video but a little disappointed that the future video never came :(

    • @ContainerBytes
@ContainerBytes 5 months ago

Hello there, we actually have a few videos on the channel showing the containerization and pushing of images to Artifact Registry. Will update the description to link these too. Thanks for watching.

  • @UltramaticOrange
@UltramaticOrange 7 months ago

    I'll still pronounce it as, "cube cuddle" because that's where the Borg cuddle you: in a cube.

  • @user-er8iy5dg8b
@user-er8iy5dg8b 10 months ago

    I wrote my Cloud Engineer exam few months ago, and I am writing my Solutions Architect next week. I hope I pass.

    • @ContainerBytes
@ContainerBytes 10 months ago

      You got this!!!

  • @krzysiek5806
@krzysiek5806 10 months ago

    Hi thanks for this video! Is there a video where you push to artifactory?

    • @ContainerBytes
@ContainerBytes 10 months ago

Are you asking about JFrog Artifactory? We don't have a video planned for Artifactory at the moment, but their developer docs should have some guidance.

  • @NoumanArshad83
@NoumanArshad83 10 months ago

    Good work guys. Love your very simple, easy and clean approach. Excited to wait for upcoming videos.

    • @ContainerBytes
@ContainerBytes 10 months ago

      Thank You! Glad you enjoyed it!

  • @alvardev07
@alvardev07 a year ago

    Cube C T L team!

  • @pavel.pavlov
@pavel.pavlov a year ago

    Yes, that's a real problem in big projects

  • @aaronwanjala506
@aaronwanjala506 a year ago

    I'm Cube C-T-L as well!

  • @bryangockley8570
@bryangockley8570 a year ago

    I pronounce it kubectl

  • @gelvezz23
@gelvezz23 a year ago

    How is the correct or most economical way to run the rest api?

    • @ContainerBytes
@ContainerBytes a year ago

Correct is hard to say. The most economical way would probably be to do something in Cloud Run. You get 2 million requests per month for free, and for most hobby to small projects that is more than you will ever need. If you get beyond it, you are already fairly successful as a business and the cost would be negligible. Cloud Run is also comparatively easy to get started with.

  • @gelvezz23
@gelvezz23 a year ago

    but is it necessary to create a virtual machine? or just with the mkdir command?

    • @ContainerBytes
@ContainerBytes a year ago

      Not sure I follow. We did not create a virtual machine. You might be talking about the cloud shell editor? Your local machine would work just fine if you have the necessary tools installed. Mainly the node runtime. I will add some links to setup local dev environments in the description.

  • @ayushKumar-9835
@ayushKumar-9835 a year ago

Can you make these videos more beginner-friendly? I don't have any idea about Google Cloud

    • @ContainerBytes
@ContainerBytes a year ago

      Of course. Anything specific that was confusing? Like the commands I used or tools used?