Keep HPC Running - an SRE's Guide to Supporting GPUs on Kubernetes - Christopher Dutra, JP Morgan

CNCF [Cloud Native Computing Foundation]

597 views


Published on 20 Sep 2024
Operating a traditional Kubernetes cluster requires specific knowledge of telemetry, observability, and the criteria that determine when human intervention is needed to restore service. While general (CPU-only) compute paths are well known, introducing GPUs into the fleet of nodes presents additional challenges for "day 2" operational practices, and specific attention must be paid to how these resource pools are supported. HPC and AI use cases add further challenges as customer expectations of Generative Pre-trained Transformers (GPTs), machine learning, and quantitative modeling continue to rise. This presentation provides best practices on which metrics SRE teams should incorporate into their operational tooling to support High-Performance Compute workloads on Kubernetes. As a working example, it explores custom plugin monitors for the Kubernetes node-problem-detector daemon that interact with NVIDIA's open-source DCGM and NVML bindings. The talk also reviews the metrics that NVIDIA's DCGM-Exporter exposes to Prometheus, highlighting their operational importance to the health of both the cluster and the workloads running on it.
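To make the custom plugin monitor idea concrete, here is a minimal sketch in Python. It is not the speaker's implementation. It assumes the check receives DCGM-Exporter output in the Prometheus text format and follows the node-problem-detector custom plugin convention of exit code 0 for OK, 1 for a detected problem, and any other code for unknown. The temperature threshold, the 80-character message cap, and the function names are illustrative assumptions; `DCGM_FI_DEV_XID_ERRORS` and `DCGM_FI_DEV_GPU_TEMP` are fields from DCGM-Exporter's default metric set.

```python
import math
import re
import sys

# node-problem-detector custom plugin exit codes:
# 0 = OK, 1 = problem detected, anything else = unknown.
OK, NON_OK, UNKNOWN = 0, 1, 2

# Assumed threshold for illustration only; tune for your hardware.
MAX_GPU_TEMP_C = 90.0
# Assumed cap on the status message length.
MAX_MESSAGE_LEN = 80

# Matches one sample line: name, optional {labels}, value.
_SAMPLE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)


def parse_samples(text):
    """Yield (metric_name, gpu_label, value) from Prometheus text output."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        match = _SAMPLE.match(line)
        if not match:
            continue
        try:
            value = float(match.group("value"))
        except ValueError:
            continue
        if math.isnan(value):
            continue
        gpu = re.search(r'gpu="([^"]*)"', match.group("labels") or "")
        yield match.group("name"), (gpu.group(1) if gpu else "?"), value


def check_gpu_health(metrics_text):
    """Return (exit_code, message) for a node-problem-detector style check."""
    problems = []
    seen = False
    for name, gpu, value in parse_samples(metrics_text):
        if name == "DCGM_FI_DEV_XID_ERRORS":
            seen = True
            if value != 0:  # value is the last XID error code; 0 means none
                problems.append(f"GPU {gpu}: XID {int(value)}")
        elif name == "DCGM_FI_DEV_GPU_TEMP":
            seen = True
            if value > MAX_GPU_TEMP_C:
                problems.append(f"GPU {gpu}: {value:.0f}C")
    if not seen:
        return UNKNOWN, "no DCGM metrics found"
    if problems:
        return NON_OK, "; ".join(problems)[:MAX_MESSAGE_LEN]
    return OK, "GPUs healthy"


def main(stream=sys.stdin):
    """Read metrics text from a stream, print the message, return the code."""
    code, message = check_gpu_health(stream.read())
    print(message)
    return code
```

A wrapper script could pipe the exporter's metrics endpoint into `main()` and pass its return value to `sys.exit`, so that node-problem-detector can turn a non-zero result into a node condition or event. Reading the exporter's text output rather than calling NVML directly keeps the sketch free of GPU driver dependencies; a production monitor might query DCGM or NVML bindings instead.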
