How to Monitor a Kubernetes Cluster in 2022 with Prometheus & Grafana

  • Published on 3 Jul 2024
  • Today we take a look at the kube-prometheus stack. We create a Kubernetes 1.23 cluster and deploy all the monitoring components. We look at how everything is stitched together, which will help you understand not only how to monitor the latest version of Kubernetes, but also how to monitor future versions.
    Check out the source code below 👇🏽 and follow along 🤓
    Subscribe to show your support! goo.gl/1Ty1Q2 .
    Patreon 👉🏽 / marceldempers
    Also if you want to support the channel further, become a member 😎
    marceldempers.dev/join
    Check out "That DevOps Community" too
    marceldempers.dev/community
    Source Code 🧐
    --------------------------------------------------------------
    github.com/marcel-dempers/doc...
    If you are new to Kubernetes, check out my getting started playlist on Kubernetes below :)
    Kubernetes Guide for Beginners:
    ---------------------------------------------------
    • Kubernetes development...
    Kubernetes Monitoring Guide:
    -----------------------------------------------
    • Kubernetes Monitoring ...
    Kubernetes Secret Management Guide:
    --------------------------------------------------------------
    • Kubernetes Secret Mana...
    Like and Subscribe for more :)
    Follow me on socials!
    marceldempers.dev
    Twitter | / marceldempers
    GitHub | github.com/marcel-dempers
    Facebook | thatdevopsguy
    LinkedIn | / marceldempers
    Instagram | / thatdevopsguy
    Music:
    Track: JOURNAL - intercrime (freedownload) | is licensed under a Creative Commons Attribution licence (creativecommons.org/licenses/...)
    Listen: / inter-crime
    Track: Fox Beat 2 - Joakim Karud - Summer Vibes - Royalty Free Vlog Music [BUY=FREE] | is licensed under a Creative Commons Attribution licence (creativecommons.org/licenses/...)
    Listen: / joakim-karud-summer-vi...
    Track: Sappheiros - Passion | is licensed under a Creative Commons Attribution licence (creativecommons.org/licenses/...)
    Listen: / passion
    Track: souKo - souKo - Parallel | is licensed under a Creative Commons Attribution licence (creativecommons.org/licenses/...)
    Listen: / parallel
    Chapters:
    00:00 Intro
    01:28 Source Code
    02:18 Create a cluster
    03:33 Intro to kube-prometheus
    05:43 Downloading source code
    06:55 The manifests
    08:56 Deploying kube-prometheus
    11:21 Grafana
    13:51 Prometheus configuration
    15:53 Service Monitors
    16:39 Extras: persistence and remote write
    17:22 Outro
  • Science & Technology

Comments • 82

  • @Mano-ii4ng
    @Mano-ii4ng 2 years ago +6

    As always Big Thanks Marcel. You Sir, are a master at explaining K8s and other stuff.

  • @whiteorchid2023
    @whiteorchid2023 2 years ago +4

    A big thank you, Marcel! 🙏🏻 Your hard work is much appreciated! 🥇

  • @wooja2112
    @wooja2112 2 years ago +2

    As always, awesome video! Very helpful!

  • @niketsingh87
    @niketsingh87 11 months ago +1

    Thanks a lot Marcel. Your videos are loaded with useful content. No fillers.

  • @zenobikraweznick
    @zenobikraweznick 2 years ago +2

    Quick, succulent, concise. 10/10 , thank you! 👍

  • @sharathkumar13_
    @sharathkumar13_ 2 years ago +1

    Wonderful Video. This helps to learn a lot. Your support is awesome.

  • @loicgregoire3058
    @loicgregoire3058 2 years ago

    This video is pure gold 😀 Thanks a lot

  • @rupeshegishte
    @rupeshegishte 2 years ago

    Thank you so much marcel, this is very informative

  • @joross8
    @joross8 2 years ago +11

    As someone who has been working in Kubernetes full time for the last 6 months, this video is a great learning source. Thanks Marcel.

    • @ikhlastitouche8057
      @ikhlastitouche8057 1 year ago

      what does working with Kubernetes full time look like ?

  • @nadimkaddis7135
    @nadimkaddis7135 7 months ago

    A big thank you, it is working.

  • @stefanschmidbauer6075
    @stefanschmidbauer6075 2 years ago

    Great video. Already looking forward to a follow up.

  • @farzadmf
    @farzadmf 2 years ago

    Another great video, thanks

  • @nforlife
    @nforlife 2 years ago

    Great video keep it up

  • @DamienMalakay
    @DamienMalakay 1 year ago

    you definitely slowed down from the last time i saw your videos! before i had to change the playback speed but now it's perfect for me lol keep up the good work Marcel

  • @carlomagno55
    @carlomagno55 1 year ago

    Thank you very much for explaining so clearly so many things that are essential to understanding how to work with these new, complex and impressive technologies

  • @halobolah3240
    @halobolah3240 2 years ago

    great video, THX !!! :)

  • @silentwatcher13
    @silentwatcher13 2 years ago +13

    Hey Marcel, great video, really appreciate your efforts 👍🏼👍🏼. I think you should take this Prometheus setup to the next level by creating a detailed video on Thanos and its components (humble suggestion). 🙂

  • @Fayaz-Rehman
    @Fayaz-Rehman 2 years ago

    Good Better Best - Never Let Them Rest - Thank you.

  • @shaheerzaman620
    @shaheerzaman620 1 year ago +2

    This was fantastic! It would be great if you could do a video on how to integrate Datadog with kubernetes

  • @kalapanirdip
    @kalapanirdip 2 years ago

    thank you

  • @gunjanlahoti7694
    @gunjanlahoti7694 2 years ago

    Hey, love the way you explain. Just wanted to check: are there any plans to create a video on Node Problem Detector setup and usage?

  • @shinebayar
    @shinebayar 2 years ago +1

    Kube Prometheus is an awesome project. I wish they maintained an official Helm chart. Working with vanilla YAML files isn't scalable in the long term.

  • @yasiryasin7233
    @yasiryasin7233 2 years ago

    It would be appreciated if you could make a video about monitoring Kubernetes using Prometheus without Helm or the Prometheus Operator, just to see how Prometheus monitors without CRDs.

  • @andrewfigaroa7031
    @andrewfigaroa7031 2 years ago

    Thanks Marcel, this video helped to get prometheus installed. But now how do I monitor my custom apps in their namespaces? I would like to start with just a basic scraping.

  • @shubhamkanwal8977
    @shubhamkanwal8977 2 years ago

    Great video! One question: do these manifests need to be deployed in the same namespace as our other Kubernetes resources? If not, how does Prometheus know which namespace to look in for those resources?

  • @tadeubernacchi3360
    @tadeubernacchi3360 1 year ago

    Nice video. What if I don't want to use port-forwarding? What are my options?

  • @sticksen
    @sticksen 1 year ago

    Hey, what do you think of using the annotations on a Pod/Service/Deployment etc to tell Prometheus where and if it should scrape? Any downsides to this over the ServiceMonitors?

  • @nitishpriyadarshi2591
    @nitishpriyadarshi2591 2 years ago

    Could you please share the PromQL expressions to monitor pod CPU and memory utilization?

  • @ryanwendel6115
    @ryanwendel6115 1 year ago +1

    Would be great to see a video on how you are using prometheus, grafana, and alert manager for visualizations and alerting. I am assuming you are customizing the system you generated in this video, to some degree.
    It would also be great to see how you are monitoring workloads. Is it just as simple as creating service monitors?

    • @MarcelDempers
      @MarcelDempers 1 year ago +1

      I only customise which metrics are sent out using Prometheus remote write. I don't store metrics in-cluster; I send them to a hosted platform, so I don't have to deal with data retention, state and availability.
      Other than that, no customisation is needed since all cluster metrics work out of the box. Combine this with fluentd for app logs and developers have all they need to monitor. For custom metrics, only service monitors are needed, yes.
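
A minimal sketch of what "only service monitors are needed" could look like for a custom app. It builds a ServiceMonitor manifest (the `monitoring.coreos.com/v1` custom resource used by the Prometheus Operator) as JSON, which `kubectl` accepts as well as YAML. The app name `example-app`, the `default` namespace, the `app` label and the port name `web` are hypothetical placeholders, not taken from the video. Whether Prometheus actually picks the object up also depends on its ServiceMonitor selectors and on RBAC for the target namespace.

```python
import json


def service_monitor(name, namespace, app_label, port_name, interval="30s"):
    """Build a ServiceMonitor manifest as a plain dict.

    All argument values are illustrative; adjust them to match the labels
    and named port of the Service that exposes your app's /metrics endpoint.
    """
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "labels": {"app": app_label},
        },
        "spec": {
            # Select Services carrying this label.
            "selector": {"matchLabels": {"app": app_label}},
            # Scrape the named Service port on the conventional metrics path.
            "endpoints": [
                {"port": port_name, "path": "/metrics", "interval": interval}
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical app; the output can be piped into `kubectl apply -f -`.
    manifest = service_monitor("example-app", "default", "example-app", "web")
    print(json.dumps(manifest, indent=2))
```

The port is referenced by name rather than number, so the Service must define a named port that matches.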

    • @ryanwendel6115
      @ryanwendel6115 1 year ago

      @@MarcelDempers thanks, man. Forgot to mention this was another great video!

  • @georgelza
    @georgelza 2 years ago

    Love the video... making me think of changing my setup, this looks so much more stitched together...
    Any chance you can do a similar video, but where Prometheus and Grafana are off the K8S cluster? Imagine an environment with multiple K8S clusters, where Prometheus and Grafana are either #1 on a dedicated cluster or #2 on dedicated EC2 hosts.
    PS: you did not mention where the prometheus.yaml file is stored to inform Prom about off-cluster targets.

    • @MarcelDempers
      @MarcelDempers 2 years ago

      You'll need to run a prometheus instance in every cluster for the service monitors to work, and use remote-write to push telemetry out to your central EC2 Prometheus.
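
A hedged sketch of the remote-write setup described above: each cluster keeps its own Prometheus and pushes samples to a central one. The helper adds a `remoteWrite` entry to a Prometheus custom resource, which is how the Prometheus Operator exposes this setting. The resource name `k8s` in namespace `monitoring` is assumed here and should be checked against your own manifests; the URL is a hypothetical placeholder. A receiving Prometheus would also need its remote-write receiver enabled, which is not shown.

```python
import copy
import json


def with_remote_write(prometheus_cr, url):
    """Return a copy of a Prometheus custom resource with one extra
    remoteWrite target; the input dict is left unmodified."""
    cr = copy.deepcopy(prometheus_cr)
    cr.setdefault("spec", {}).setdefault("remoteWrite", []).append({"url": url})
    return cr


if __name__ == "__main__":
    # Assumed resource name/namespace; verify against your manifests.
    base = {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "Prometheus",
        "metadata": {"name": "k8s", "namespace": "monitoring"},
        "spec": {},
    }
    # Hypothetical central endpoint.
    central = "https://central-prometheus.example.com/api/v1/write"
    print(json.dumps(with_remote_write(base, central), indent=2))
```

In practice the same change is usually made by editing the existing Prometheus manifest rather than generating a new one; the sketch only shows which field carries the setting.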

    • @georgelza
      @georgelza 2 years ago

      @@MarcelDempers hi hi, I was starting to think the same. I was actually thinking Thanos might be a good way of pulling it together onto a central Prometheus/Grafana stack.
      With EKS and multiple AZs, persistent storage etc. of course becomes a much more interesting discussion.
      I've got one cluster spanning 3 subnets, A-App, B-Database and C-Management, and these are then spread over the 3 AZs. So for Prometheus we want to pin the instances to Management in A, Management in B, and so on...

    • @farrukhbekshamsutdinov9022
      @farrukhbekshamsutdinov9022 1 year ago

      @@MarcelDempers hi, can you share your experience on how to configure remote-write in Prometheus inside a k8s cluster and receive the metrics on another, central Prometheus?

  • @shaikahamadulla5623
    @shaikahamadulla5623 2 years ago

    Can you share the system configuration of your demo machine, for a smooth experience when trying and learning your lessons? 🙂

    • @MarcelDempers
      @MarcelDempers 2 years ago +2

      Nothing fancy, I have an Intel i9 CPU, any i7 would do.
      Memory is key, especially when running a bunch of containers, virtual machines and k8s clusters locally.
      I have 32GB which is more than enough.
      And an SSD, I have a 1TB Samsung SSD (one of those tiny PCIe ones😁 )

  • @venkataramanareddy7735
    @venkataramanareddy7735 2 years ago

    can you pls add one session on Grafana alerts for kubernetes

  • @KaykeTeixeira
    @KaykeTeixeira 2 years ago

    Hey Marcel, thanks for the video!! With the solution in your source code, can I visualize the latency of the nodes in Grafana?

  • @kanakorn.h
    @kanakorn.h 1 year ago

    Great, but how do I use Prometheus to monitor servers outside the K8S cluster?

  • @pindajatt730
    @pindajatt730 2 years ago +1

    Hey Marcel, I followed this guide to run prometheus & grafana in k3s with an ingress instead of port-forward and I did not have to do the datasources fix, it just works. Not sure if they have fixed it or because I'm running it in k3s.

    • @antoniorodrigo310
      @antoniorodrigo310 7 months ago

      Hey, new to k3s, could you share the ingress?

  • @lalchandrajak
    @lalchandrajak 2 years ago

    Hey, I need some help: where can I configure my Slack channel to receive alert messages, and how can I bind the port permanently?

  • @hereforyouwhat
    @hereforyouwhat 2 years ago

    Hi Marcel
    What is the difference between Prom+Grafana and the Kubernetes Dashboard if we want to monitor a k8s cluster?
    Or in which scenarios should Prom+Grafana be used over the Kubernetes Dashboard?

    • @MarcelDempers
      @MarcelDempers 2 years ago

      The Kubernetes dashboard mainly relies on basic API metrics, the ones used by HPA for example. They are highly aggregated and limited, i.e. memory and CPU only.
      The Prometheus solution provides far more in-depth telemetry via kube-state-metrics.

  • @xuantuongvu2020
    @xuantuongvu2020 2 years ago

    I wonder which component in this monitoring stack collects the network metrics. CPU and RAM are the usual ones, but how about the network? How does the operator get the network metrics?

  • @ThesGt
    @ThesGt 2 years ago +1

    Besides blackbox exporter, is there an exporter that we can use to get the performance of the applications? I'm used to the Dynatrace and NewRelic APMs to get the latency of functions being called, but I'm not sure what to use to get this data from inside the containerised apps.

    • @MarcelDempers
      @MarcelDempers 2 years ago

      It's a little more complex than dropping NewRelic in there and it's a different type of architecture. Take a look at Jaeger Tracing.
      th-cam.com/video/idDu_jXqf4E/w-d-xo.html

  • @mohamedelhoussein155
    @mohamedelhoussein155 1 year ago

    why not use the helm chart for prometheus?

  • @georgelza
    @georgelza 2 years ago

    hi hi, I deployed the 0.9 version ... my EKS cluster is still 1.21... so it seems the data source problem does not exist... ;) but now neither do any of the dashboards... any ideas? Do you maybe have a video that follows on from this and addresses how to add more selectors... I'm trying to instrument a Python Flask app deployed on EKS. Thanks

  • @georgelza
    @georgelza 2 years ago

    ... 2nd ask... how do I add new targets... aka labels to scrape? Somehow I missed this.

  • @antoniorodrigo310
    @antoniorodrigo310 7 months ago

    Hello everyone! Do any of you know if it's possible to run all these pods on the worker nodes?
    Thanks in advance

  • @helders
    @helders 1 year ago

    What do you consider to be the most useful/important metrics to be monitored in a K8s cluster?

  • @skchang2239
    @skchang2239 1 year ago +1

    Really helpful! Saved me from tons of docs

  • @madrum
    @madrum 2 years ago +1

    What, if anything, are other viewers using to store data collected by Prometheus? TIA

  • @bensiewert5286
    @bensiewert5286 1 year ago

    When I try to do the port forwarding part, it just times out. Not sure why. It is a remote cluster on DigitalOcean.

  • @benhesketh6995
    @benhesketh6995 2 years ago

    What's your approach for Windows nodes?

    • @MarcelDempers
      @MarcelDempers 2 years ago +1

      I've dug into this before and Windows is a little tricky at the moment, since the kubelet and kube-state-metrics do not export the same metrics for Windows pods as they do for Linux pods.
      However, metrics-server (used by autoscalers) does get CPU and memory stats for Windows pods. You can use metrics-server and look on GitHub for a project called metric-server-exporter to export pod CPU and memory. Add a service monitor for that and you can make a custom dashboard for it.
      That should give you observability into Windows workloads for now, until better support lands.

  • @YouTubers-rj9xv
    @YouTubers-rj9xv 2 years ago +1

    Hey bro why don't you show grafana dashboard

  • @amazinggameplays2275
    @amazinggameplays2275 2 years ago

    Good content.
    One suggestion: put the camera in front of you, so you don't need to record at such a weird angle.

  • @helders
    @helders 7 months ago

    My Prometheus pods use more and more RAM, up to the point where they get OOMKilled and stop monitoring until I restart the pods. Is there any way to minimize this? I was wondering whether adding some persistence to the deployment would do any good.

    • @MarcelDempers
      @MarcelDempers 7 months ago +1

      As far as I know, by default Prometheus is an in-memory database and persistence only writes data to disk to prevent loss during restarts.
      You may need to research the config options to see if it's possible to offload some data to disk to reduce memory usage.
      Alternatively, you can shard by running more Prometheus instances (one per namespace etc.) to reduce it too.

  • @robl39
    @robl39 10 months ago

    Please show us how to monitor ephemeral and short lived jobs with push gateway!

  • @JackReacher1
    @JackReacher1 2 years ago

    17:10 Is Loki also a solution for persistent log writing?

    • @MarcelDempers
      @MarcelDempers 2 years ago +1

      You can persist data in prometheus by using a persistent volume, you can also persist data in Grafana I believe (have not looked into it)
      Grafana Loki can definitely persist data too
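
A small sketch of the persistent-volume option for Prometheus mentioned above, assuming the Prometheus Operator's custom resource is in use as in the video. It sets `spec.storage.volumeClaimTemplate` and `spec.retention` on a Prometheus resource. The size `20Gi`, the retention `15d` and the optional storage class are illustrative values only, not recommendations.

```python
import copy
import json


def with_persistence(prometheus_cr, size, retention, storage_class=None):
    """Return a copy of a Prometheus custom resource that requests a
    PersistentVolumeClaim of `size` and keeps data for `retention`."""
    cr = copy.deepcopy(prometheus_cr)
    pvc_spec = {
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": size}},
    }
    if storage_class is not None:
        # Omit the field entirely to fall back to the cluster default class.
        pvc_spec["storageClassName"] = storage_class
    spec = cr.setdefault("spec", {})
    spec["retention"] = retention
    spec["storage"] = {"volumeClaimTemplate": {"spec": pvc_spec}}
    return cr


if __name__ == "__main__":
    # Assumed resource name/namespace; verify against your manifests.
    base = {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "Prometheus",
        "metadata": {"name": "k8s", "namespace": "monitoring"},
        "spec": {},
    }
    print(json.dumps(with_persistence(base, "20Gi", "15d"), indent=2))
```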

    • @JackReacher1
      @JackReacher1 2 years ago

      @@MarcelDempers I am learning Loki next.
      Can you make videos on how a python application can be monitored using Prometheus + Grafana + Loki?

  • @musclecode
    @musclecode 2 years ago

    Why did you use just create and not server-side apply?

    • @MarcelDempers
      @MarcelDempers 2 years ago +6

      Interesting! I just noticed that. I need to learn what it means (never used server-side apply). Thanks for pointing it out. Learning something new every day.

  • @amjds1341
    @amjds1341 2 years ago

    Do you have any Udemy course?

  • @gab796
    @gab796 2 years ago

    A very newbie question: does Kubernetes still support Docker after all?

    • @MarcelDempers
      @MarcelDempers 2 years ago +1

      Kubernetes still supports docker images.
      The deprecation is about the runtime on the kubelet. Kubernetes will use containerd going forward as a runtime for containers and will not use docker as a runtime.
      Docker images are still a standard supported by containerd, so your pods will and can still use docker images.
      kubernetes.io/blog/2020/12/02/dockershim-faq/

    • @gab796
      @gab796 2 years ago

      @@MarcelDempers Thanks!!

  • @wagfeliz
    @wagfeliz 1 year ago

    A comment about the "bug" on the datasource: it's because of those stupid Network Policies they put in the manifests. I personally just remove them all, so you also won't have problems exposing Grafana through a load balancer.

  • @PePTo-dx2yj
    @PePTo-dx2yj 7 months ago

    DevOps)) Need help: Grafana shows no data for pods (CPU, network, RAM), while physical server monitoring works fine... Where could the problem be?

    • @MarcelDempers
      @MarcelDempers 7 months ago

      Make sure the helm chart you are using matches the compatibility matrix and your cluster version.

    • @PePTo-dx2yj
      @PePTo-dx2yj 7 months ago

      @@MarcelDempers ty for the answer, it seems to me my problem is deeper)) 1. Installed with Kubespray 2. k8s 1.24 3. (major cause) the CRI I'm using in my cluster is Docker. I think Docker is the major cause, ty again

  • @parthtrivedi318
    @parthtrivedi318 2 years ago +1

    Can a DevOps engineer earn 150k per year after 3 years of experience?

    • @whiteorchid2023
      @whiteorchid2023 2 years ago +1

      Anything is possible! Just search for a new job 🥳🥳🥳

    • @parthtrivedi318
      @parthtrivedi318 2 years ago

      @@whiteorchid2023 Thanks sir