Prometheus Monitoring with Julius | PromLabs
Germany
Joined May 2, 2020
Prometheus co-founder Julius Volz shares Prometheus monitoring fundamentals, tutorials, tips and best practices, and more via his company PromLabs. See also promlabs.com/ and training.promlabs.com/.
Grafana Heatmaps for Prometheus Histograms | Grafana Heatmap Panel Configuration and Usage
In this video, I explain how to show Prometheus histograms as a heatmap in Grafana. This is a follow-up video to my last one about Prometheus histograms in general. Take a look at that one first: th-cam.com/video/yYbXak-1hew/w-d-xo.html
Also check out my other Prometheus training courses if you want to learn Prometheus in a structured way from the ground up:
training.promlabs.com/
Chapters:
00:00 Introduction
00:23 Adding and configuring a heatmap panel for Prometheus histograms
02:45 Using and interpreting the heatmap panel
03:19 Outro & PromLabs Trainings
---------------------------------------------------------------------------
CREDITS: "Subscribe Button" by MrNumber112 th-cam.com/video/Fps5vWgKdl0/w-d-xo.html
Views: 2,013
Videos
Understanding Prometheus Histograms | Motivation and Concepts, Instrumentation, Querying in PromQL
9K views, 6 months ago
In this video, I explain Prometheus histograms (for now only the "classic" ones that have been in Prometheus for around a decade - I will make a separate video about the new "native" histograms once they are stable): What are histograms, why are they useful, how can you instrument your service code with histograms, how are histograms exposed as metrics to Prometheus, and how can we query them i...
Relabeling in Prometheus | Relabeling Architecture and Flow, Configuration, Examples, Debugging
9K views, 1 year ago
In this video, I explain the concept of relabeling in Prometheus, a powerful tool for filtering and transforming your flows of labeled objects in the Prometheus server. I go into the motivation for relabeling, explain how relabeling fits into the main Prometheus configuration file, and then show how the relabeling steps and flow works. I also discuss the structure of a single relabeling rule, g...
Understanding Counter Rates and Increases in PromQL | Reset Handling, Extrapolation, Edge Cases
20K views, 1 year ago
In this video, I explain the exact value calculation behaviors of the rate(), irate(), and increase() functions in PromQL for computing rates of increase for counter metrics, show how they are different, how they deal with counter resets, how and when they extrapolate values, and which one you should use. Check out my Prometheus training courses if you want to learn Prometheus in a structured w...
Exposing Custom Host Metrics Using the Prometheus Node Exporter | "textfile" Collector Module
11K views, 1 year ago
In this video, I explain how to use the Node Exporter's "textfile" collector module to expose custom host metrics. The textfile collector example scripts repository mentioned in the video: github.com/prometheus-community/node-exporter-textfile-collector-scripts Check out my Prometheus training courses if you want to learn Prometheus in a structured way from the ground up: training.promlabs.com/...
7 Things You Didn't Know About Prometheus | Little-Known Features and Implementation Details
2.2K views, 1 year ago
In this video, I cover seven little-known features, behaviors, and implementation details in the Prometheus monitoring system that I find mildly interesting. How many of these did you know about? Check out my Prometheus training courses if you want to learn Prometheus in a structured way from the ground up: training.promlabs.com/ Chapters: 00:00 Introduction 00:32 Fact 1: Scrape Offsets 02:07 F...
PromQL Data Selection Explained | Selectors, Lookback Delta, Offsets, and Absolute "@" Timestamps
18K views, 1 year ago
In this video, I explain how Prometheus / PromQL data selection works in detail. I go into instant and range vector selectors, label matchers, the 5-minute lookback delta for instant vector selectors, staleness markers and staleness handling, and then explain relative offsets and absolute "@" timestamps for selectors. Check out my Prometheus training courses if you want to learn Prometheus in a...
Don't Make These 6 Prometheus Monitoring Mistakes | Prometheus Best Practices & Pitfalls
17K views, 1 year ago
In this video, I go through some of the biggest pitfalls you can run into when you're new to Prometheus, and I also explain how to avoid them. Check out my Prometheus training courses if you want to learn Prometheus in a structured way from the ground up: training.promlabs.com/ Related PromLabs blog post about these pitfalls: promlabs.com/blog/2022/12/11/avoid-these-6-mistakes-when-getting-star...
Monitoring Linux Host Metrics with Prometheus | Node Exporter (Setup, Scrape, Query, Grafana)
17K views, 1 year ago
In this video, I show you how you can use the Prometheus Node Exporter to monitor Linux (or Unix) host metrics. We first download and run the Node Exporter and then we monitor it with Prometheus. Then, we explore some of the available metrics in Prometheus and show how to import an exhaustive Node Exporter dashboard into Grafana. Check my Prometheus training courses if you want to learn Prometh...
Understanding "up" and Friends in Prometheus | Synthetic (Auto-Generated) Scrape Metrics
10K views, 1 year ago
In this video, I explain the various synthetic (or auto-generated) metrics that Prometheus records for every target scrape: up, scrape_duration_seconds, scrape_samples_scraped, scrape_samples_post_metric_relabeling, and scrape_series_added. Where do they come from, what do they mean, and what can you do with them? Check my Prometheus training courses if you want to learn Prometheus in a structu...
Understanding Prometheus Metric Types | Meaning and Usage (Gauge, Counter, Summary, Histogram)
47K views, 1 year ago
This video explains the four different metric types in Prometheus: Gauges, Counters, Summaries, and Histograms. It goes into the meaning of each type, explains how to use each type in instrumentation, and how they show up in the exposition format. It also touches on what you have to watch out for when querying each metric type with PromQL. Check my Prometheus training courses if you want to lea...
Creating Grafana Dashboards for Prometheus | Grafana Setup & Simple Dashboard (Chart, Gauge, Table)
97K views, 1 year ago
This video shows you how to use Grafana to build dashboards for Prometheus. The video explains how to download and run Grafana, how to create a Prometheus data source in Grafana, as well as how to create a dashboard with three basic panel types (time series chart, gauge, and table). Check my Prometheus training courses if you want to learn Prometheus in a structured way from the ground up: trai...
Getting Started with Prometheus | Minimal Setup (Download, Config & Run)
59K views, 1 year ago
This video shows you how to download, configure, and run Prometheus in a minimal way, only scraping itself and some demo targets. Check my Prometheus training courses if you want to learn Prometheus in a structured way from the ground up: training.promlabs.com/ Chapters: 00:00 Introduction 00:44 Downloading Prometheus 01:16 Unpacking and Inspecting the Tarball 02:03 Configuring Prometheus 04:22...
Introduction to the Prometheus Monitoring System | Key Concepts and Features
90K views, 2 years ago
Get a quick high-level overview of the key concepts of the Prometheus monitoring system straight from the co-founder of Prometheus. This video explains what Prometheus is, what the system architecture looks like, and what the main features and concepts are that make Prometheus-based monitoring so powerful: the dimensional data model, the text-based metrics transfer format, the PromQL query lang...
Saw him blink at 0:52
Proof that I'm human 🙌
Hi, I am not able to run the command in my Windows terminal. Please guide me.
Well... That was easier than I thought.
thank you so much my good sir!
Very good structured video. Everything is on point. Thank you!
This guy is an AI-backed robot. How can he not blink? I mean WTF. Nice videos BTW. Learning lots of things here.
If you're not in front of a teleprompter with bright lights shining into your eyes every day, that can happen 😅 I think it got slightly better in my more recent videos, at least I tried to blink once or twice.
Honestly, reading the Prometheus documentation, there are so many functions that it's hard to know where to start, but this video is a great way to learn the most useful functions for a beginner.
I’d love to see how Grafana Alloy fits in to the equation, with examples on where it might be useful.
thank you it is so hit!!
If I pass my interview tomorrow on this, you are invited to Lunch in Cairo, Egypt.
That's great to know, I was thinking of visiting there sometime :)
How does Grafana show data as a time series even if we select the range as $__range for increase queries?
If let's say you show a graph over a 1-hour period (making $__range == 1h) with a 1-minute query resolution, then what that means is that Prometheus will calculate the result of increase(foo[1h]) at each of the 60 1-minute steps over the 1-hour window. So it's like a very large sliding window, but producing an output point at every 1 minute. So in the end you see a (usually very smooth) time series.
@PromLabs metric_name[$__range] So this $__range means the Grafana time range that we change from the dropdown, right?
@@realnight_bot2536 Yes, see grafana.com/docs/grafana/latest/dashboards/variables/add-template-variables/#global-variables
@@PromLabs So the 60 dots shown on the graph will be calculated as: increase from 5:00 pm to 6:00 pm, increase from 4:59 pm to 5:59 pm, increase from 4:58 pm to 5:58 pm, and so on? Even if I select $__range = 1h? Is this what you are saying, sir?
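The sliding-window evaluation discussed in this thread can be sketched in Python as follows. This is only an illustration of how the evaluation windows line up, not Prometheus or Grafana code: the function name `range_query_windows` and the example timestamps are made up, and it assumes the range query evaluates at every step from the start to the end of the graph, both inclusive.

```python
from datetime import datetime, timedelta


def range_query_windows(start, end, step, window):
    """Return (window_start, window_end) for every evaluation step.

    Each step evaluates something like increase(foo[window]) over the
    `window` of data that ends at that step's timestamp.
    """
    windows = []
    t = start
    while t <= end:
        windows.append((t - window, t))
        t += step
    return windows


# Hypothetical dashboard: 1-hour graph ending at 18:00, 1-minute resolution,
# with $__range (and therefore the increase() window) equal to 1 hour.
end = datetime(2024, 1, 1, 18, 0)
start = end - timedelta(hours=1)
windows = range_query_windows(
    start, end, step=timedelta(minutes=1), window=timedelta(hours=1)
)

# Show the last three evaluation windows.
for w_start, w_end in windows[-3:]:
    print(f"increase over {w_start:%H:%M} -> {w_end:%H:%M}")
```

Each output point covers a full hour of data, and consecutive points are shifted by one minute, which is why the resulting series usually looks very smooth.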
You are a Gem. Thanks for the Tutorial
I have a quick question to clarify. When we use the `increase` function in our queries, the very first time a metric is reported, it will have a value of 1 in Grafana. However, the `increase` function returns zero in this case, causing the query to fail. Other functions, like `rate`, exhibit the same behavior. I believe this issue arises from the calculation method used in the `increase` function (the first time, the first and last data points are the same). However, from the second metric report onward, it works fine. Do you know the reason behind this behavior and how we can avoid this issue?
Yes, when a counter metric just appears for the first time with a value of 1, the rate() and increase() functions do not know whether this was an actual increase, or whether the time series already had the value of 1, but was just temporarily absent for some reason (like a scrape failure or a too short rate window). That's why both functions currently require at least two samples to compare under the provided window. However, there is some work going on to start tracking the creation timestamps of counters, which could then be used in functions like rate() and increase() to handle these situations better. See for example this PromCon 2023 talk: th-cam.com/video/nWf0BfQ5EEA/w-d-xo.html. And in this PromCon 2024 talk, there is more information about the possible future metadata store to store all kinds of metadata about metrics, including counter creation timestamps: th-cam.com/video/Torm3M23Uyk/w-d-xo.html
@@PromLabs Thank you for the response. However, let’s say a second sample is received after one hour. When the rule is evaluated for the last five minutes at the time the second sample is received, it provides results. (even when I have only one data point). In my scenario, I need to trigger an alert if at least one failure is detected (I cannot use the sum function because I need to capture all labels as well). This is important because the next failure could occur after some time, as my service does not experience heavy traffic. I am monitoring metrics for the last 10 minutes.
@@pulithawanniarachchi7991 "even when I have only one data point" -> No, both functions will return an empty result if they find only one sample under the requested window. What you are maybe seeing is that you do have multiple scraped samples under the window, but only one increment among those samples. In that case, yes, the functions will report that increment.
If you want to detect whether there was any failure at all under a given window, the best course of action would be to either pre-initialize all relevant counter metrics to 0 upon startup (see also promlabs.com/blog/2023/09/13/dealing-with-missing-time-series-in-prometheus/) or use an expression that has a fallback in case of an empty rate result. For example, you could do something like:
increase(mymetric[5m]) or (mymetric unless mymetric offset 5m)
Meaning: give me the increase over "mymetric" over the last 5 minutes, and if it's not present, give me instead the value of "mymetric" right now, but only if it didn't exist already 5 minutes ago (should match the rate window length). Haven't tested it, but something like this.
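The two-sample requirement and the fallback idea from the reply above can be sketched in Python. This is a deliberately simplified model, not the real PromQL implementation: `simple_increase` and `increase_with_fallback` are hypothetical names, samples are plain `(timestamp, value)` tuples, and the extrapolation to the window boundaries that the real `increase()` performs is left out.

```python
def simple_increase(samples):
    """Simplified counter increase over the samples inside one window.

    Returns None (an empty result) when fewer than two samples are present.
    A drop in value is treated as a counter reset that restarted from 0.
    NOTE: the real increase() also extrapolates; this sketch does not.
    """
    if len(samples) < 2:
        return None
    total = 0.0
    prev = samples[0][1]
    for _, value in samples[1:]:
        if value < prev:
            total += value          # counter reset: count from 0 again
        else:
            total += value - prev   # normal increment (possibly 0)
        prev = value
    return total


def increase_with_fallback(samples_in_window, existed_before_window):
    """Rough model of: increase(m[5m]) or (m unless m offset 5m)."""
    result = simple_increase(samples_in_window)
    if result is not None:
        return result
    # Fallback: use the current value, but only for a series that did not
    # already exist one window length ago.
    if samples_in_window and not existed_before_window:
        return samples_in_window[-1][1]
    return None


print(simple_increase([(0, 1)]))                    # one sample only
print(simple_increase([(0, 1), (15, 1), (30, 2)]))  # one increment
print(increase_with_fallback([(0, 1)], existed_before_window=False))
```

With a single sample the plain function yields no result, while the fallback variant reports the first value of a series that has just appeared.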
Basic question - how is the alert option in the Grafana UI different from alerting in Prometheus? And if one is using Grafana for generating alerts, does one really need to set up Alertmanager for Prometheus?
Hi, thanks for this video. The metric I created shows up with the type "untyped". How can I make this metric a gauge?
The Node Exporter takes the HELP and TYPE lines from the custom metric file if they are present, so for what you want, the following should be sufficient:
# HELP my_metric_name My metric description
# TYPE my_metric_name gauge
my_metric_name 42
@PromLabs thank you got it
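A small Python sketch of generating such a file for the textfile collector is shown below. The helper names `render_metric` and `write_prom_file` are made up for illustration, and a temporary directory stands in for whatever directory your Node Exporter's textfile collector is actually configured to read. Writing to a temporary file first and renaming it is one common way to avoid the exporter reading a half-written file.

```python
import os
import tempfile


def render_metric(name, help_text, metric_type, value):
    """Render one metric with HELP and TYPE lines in the text format."""
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} {metric_type}\n"
        f"{name} {value}\n"
    )


def write_prom_file(directory, filename, content):
    """Write the content atomically: temp file in the same dir, then rename."""
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        f.write(content)
    os.chmod(tmp_path, 0o644)  # mkstemp creates the file as owner-only
    final_path = os.path.join(directory, filename)
    os.replace(tmp_path, final_path)
    return final_path


content = render_metric("my_metric_name", "My metric description", "gauge", 42)

# Stand-in directory for the demo; replace with your real textfile directory.
with tempfile.TemporaryDirectory() as textfile_dir:
    path = write_prom_file(textfile_dir, "my_metric.prom", content)
    with open(path) as f:
        written = f.read()

print(written, end="")
```

The file name ends in `.prom` because the textfile collector only picks up files with that extension.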
Thank you very much, Sir. I bought your training at PromLabs, which so far has been very useful. For the histograms part in particular, I struggled a bit until I took my time watching your video and trying things in my own environment. This video was of great help and I think I now have the whole picture. Please keep doing this. Thanks again
Thanks Julius!
Julius, I am not able to install Grafana alongside Prometheus. Even after watching the video I am still facing problems. Can you help me, please?
I am wondering which Linux distro and theme you're using. Looks very clean!
Thanks, it's just Arch Linux with the i3 tiling window manager :)
Thanks a lot Julius!
Great!!! Thank you for such wonderful video!!!
Wonderful course..Thanks for creating and sharing..
Wow. Didn't blink a single time. Quite impressive.
Trained that for a long time.
Impressive, very nice.
Now let's see Paul Allen's card... th-cam.com/video/Kxnf_MJ5IWs/w-d-xo.html
@@PromLabs ;D
Learning from the creator of an app was a great blessing and an incredible opportunity that came our way. Thank you Julius <3
Loving your videos very informative!
Excellent video! What do you use for your diagrams? I am preparing to present at Open Source Observability day in a week and would love to have nice looking diagrams like yours.
Thanks! For the diagrams I just used www.drawio.com/. For many of the animations in my latest videos I used motioncanvas.io/.
These videos are amazing! I have been transitioning from Influx to OTel and Prometheus for custom metrics, and these are super helpful. Just a question about something I noticed in the video: isn't the count always going to be equal to the +Inf bucket since it's a cumulative histogram? Was wondering if there's a reason they are separate other than to avoid confusion?
Thanks! Yes, the count is always going to be the same as the +Inf bucket, but it's still useful to have both: when working with histogram_quantile(), it's easier to just pass in all the *_bucket time series. When working with just the count to get a normal request rate, it's easier to just work with the _count series.
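To illustrate why the count and the +Inf bucket always agree, here is a minimal Python sketch of a cumulative histogram. It is not client library code: `observe_all`, the bucket boundaries, and the observed values are all made up for the example.

```python
import math


def observe_all(upper_bounds, observations):
    """Build cumulative bucket counts plus count and sum for a histogram.

    Every observation increments all buckets whose upper bound is greater
    than or equal to the observed value, including the implicit +Inf bucket.
    """
    bounds = list(upper_bounds) + [math.inf]
    buckets = {bound: 0 for bound in bounds}
    for obs in observations:
        for bound in bounds:
            if obs <= bound:
                buckets[bound] += 1
    return buckets, len(observations), sum(observations)


# Hypothetical request durations in seconds.
buckets, count, total = observe_all([0.1, 0.5, 1.0], [0.05, 0.3, 0.7, 2.5])

for bound, value in buckets.items():
    print(f'le="{bound}": {value}')
print("count:", count)
```

Since every observation is less than or equal to +Inf, the +Inf bucket is incremented on every observation, exactly like the count.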
Awesome video, thank you for posting it!
What software you use for these visualizations?
motioncanvas.io/ mostly :)
Great video, nicely explained
Very informative videos. Thanks a lot !
What are the units in the "value" field of the tooltip? Like for your last example of the 333-499 bucket, the value displayed is 0.007. How do I interpret this number?
I mention it a few seconds before that for the other bucket: it's requests per second. That's because each bucket series is a counter and we're taking the rate() of it, which always returns a per-second result. See also my histograms video (th-cam.com/video/yYbXak-1hew/w-d-xo.html) and the one about rates and increases (th-cam.com/video/7uy_yovtyqw/w-d-xo.html).
Awesome video mate. Thanks a lot for sharing it.
Awesome, huge fan of prometheus (still a beginner) , the simple architecture and the kind of services it can provide, too good
Julius. It was a great video thank you.
Hi @Julius, I wanted to share some feedback regarding the product decisions you made in Prometheus, specifically around the extrapolation of values, as highlighted at 8:21 in the video and earlier. While I appreciate the continuous effort to enhance the platform, I must express my concern over the implementation of extrapolated values and the introduction of arbitrary figures like the 1.1 factor. From my perspective, these changes have complicated the user experience for many developers. The rationale behind such decisions seems unclear, and they appear to contribute more to confusion than to clarity. This kind of over-engineering, especially in such a critical aspect as chart accuracy, undermines trust in the tool. In my experience, the reliability of charts is paramount. Rather than relying on extrapolation, I strongly recommend focusing on gathering more precise data within the range window. Displaying charts based solely on actual measurements before and after the range window would provide a much clearer and more trustworthy visualization. I hope you consider revisiting these decisions to align better with the needs of the developer community. Prometheus has always been a great tool, and with a bit more focus on core functionality, it can maintain that trust and reliability. Thank you for your attention to this matter.
Perfectly explained! Thank you for this!
amazing!
Amazing!
Thank you for this video it worked well
Super nice explanation!
Thank you for such an awesome and crisp explanation. Please make more videos around PromQL and Grafana.
Thanks for this amazing job! I have a question though. How technically Prometheus server is able to scrape the status code of server like nginx ? From my knowledge it is not something that is exposed by nginx through a client library.
See the part of the video at th-cam.com/video/STVMGrYIlfg/w-d-xo.html where I talk about targets that don't have native Prometheus instrumentation. For software (or even hardware devices) that don't expose native Prometheus metrics, you would use a so-called exporter - an agent process sitting next to the thing you want to monitor that gets the metrics from the target and that Prometheus can scrape. For example, for nginx you would use the nginx-exporter: github.com/nginxinc/nginx-prometheus-exporter
@@PromLabs many thanks for your reply
Thanks for this amazing job. I have a question though. When you add the sum by(path) does that mean that the expression will return the sum of all the values of the time series ? I don't quite get this part.
Yes, the sum() aggregator aggregates across multiple time series, adding together all the individual values of the aggregated time series into fewer series. In this case, the "by(path)" preserves the "path" label, so the individual paths are not aggregated over, but still present in the result. That's kind of similar to the behavior of GROUP BY in SQL.
@@PromLabs many thanks for your reply
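The GROUP BY analogy for `sum by(path)` from this thread can be sketched in Python. This only models the grouping behavior for a single point in time: `sum_by`, the label names, and the sample values are invented for the example, and a series missing a grouping label is treated as having an empty value for it.

```python
from collections import defaultdict


def sum_by(series, keep_labels):
    """Sum series values, grouping by the labels listed in keep_labels.

    `series` is a list of (labels_dict, value) pairs. All labels that are
    not in keep_labels are aggregated away, similar to GROUP BY in SQL.
    """
    groups = defaultdict(float)
    for labels, value in series:
        key = tuple((name, labels.get(name, "")) for name in keep_labels)
        groups[key] += value
    return dict(groups)


# Hypothetical per-instance request rates.
series = [
    ({"path": "/api", "instance": "a"}, 2.0),
    ({"path": "/api", "instance": "b"}, 3.0),
    ({"path": "/", "instance": "a"}, 1.0),
]

result = sum_by(series, ["path"])
for key, value in result.items():
    print(dict(key), value)
```

The "instance" label disappears from the output, while each distinct "path" value keeps its own summed series.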
Hi bro, how to backup and restore Prometheus Data ?
You can use the snapshot API (prometheus.io/docs/prometheus/latest/querying/api/#snapshot) to create a snapshot of the entire TSDB that can be copied / backed-up to somewhere else. Just drop it back in place as Prometheus' data directory when you want to restore it. Triggering a snapshot via the API requires the admin API to be enabled via the "--web.enable-admin-api" command line flag.
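As a rough sketch, triggering a snapshot from Python could look like the following. The helper names `build_snapshot_request` and `create_snapshot` and the `localhost:9090` address are assumptions for illustration, the endpoint path and response shape follow my reading of the linked API documentation, and the call only succeeds if the admin API is enabled as described above.

```python
import json
import urllib.request


def build_snapshot_request(base_url):
    """Build the POST request for the TSDB snapshot admin endpoint."""
    url = base_url.rstrip("/") + "/api/v1/admin/tsdb/snapshot"
    return urllib.request.Request(url, method="POST")


def create_snapshot(base_url):
    """Trigger a snapshot and return the snapshot name from the response.

    Requires a running Prometheus started with --web.enable-admin-api.
    """
    with urllib.request.urlopen(build_snapshot_request(base_url)) as resp:
        body = json.load(resp)
    return body["data"]["name"]


# Assumed local Prometheus address; adjust for your setup.
request = build_snapshot_request("http://localhost:9090/")
print(request.get_method(), request.full_url)

# Uncomment to actually trigger a snapshot against a running server:
# print(create_snapshot("http://localhost:9090"))
```

The returned name should correspond to a directory under the "snapshots" subdirectory of the Prometheus data directory, which is what you would copy away as the backup.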
Simple and concise. Great video once again.
Great video. I wanted to setup prometheus with my jellyfi setup. This video helped.
Hi, just want to say that I chanced upon Prometheus while trying to conduct a load test. This is a really interesting tool! I deployed a Postgres exporter to expose certain metrics from my Postgres instance. I encountered some issues, as there weren't many tutorials about how to do this in Azure managed Prometheus, but nonetheless managed to figure it out. I watched most of your videos in one go just a few hours ago, and there is really useful knowledge in them, including the Grafana tips. Thank you!
I'm watching your tutorial at 3:50 AM and having my head racked for 2 weeks thanks 🧍