Literally the only guide that actually shows how to do stuff! Like from me and my team:)
Thanks for the nice words to you and your team ❤
like seriously, god bless you.
Thank you, sounds like this solved an issue for you 😊
Great tutorial to start with Graphana Loki! 👍👍
Thanks a lot! It's Grafana though ;P
This was very well done. Thank you. Please continue to make additional videos like this tutorial.
thanks for your kind words! considering this ;)
Great video! Thanks!
You presented how to use ad-hoc filters perfectly for learning! Thank you, dear sir. I was trying to understand it from the Grafana docs, but they are just overwhelming.
Thanks for the feedback... Anything else that's commonly used but needs clarification?
@@condla Hi, is there a way to get the difference between timestamps so that we can derive an API latency and plot a trend chart?
You saved me a lot of time. Great video
Thanks bro! It's an amazing explanation of how to use Loki more effectively
thanks, I'm happy you found the video useful!
It was awesome, I learned a lot ❤
It is great. Thanks.
Thank you 🙂
Great! Thank you
Thanks :)
Superb
Thank you! Cheers!
Thank you very much!
Haha, you're welcome 😊
Very useful and realistic
Thank you!
Loved your video. Could you please help me get a visualisation that compares against a previous time period?
You can do that using 2 queries: one for the actual time frame, the other using the offset modifier known from PromQL (prometheus.io/docs/prometheus/latest/querying/basics/#offset-modifier)
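A minimal sketch of the two-query approach, assuming a hypothetical stream with the labels `app="checkout"` and `level="error"` (neither is from the video). Each query goes into its own query row of the same panel, and the one-day shift is only an example value:

```logql
# Query A: error lines per interval in the currently selected time range
sum(count_over_time({app="checkout", level="error"}[$__interval]))

# Query B: the same expression, shifted back by one day with the offset modifier
sum(count_over_time({app="checkout", level="error"}[$__interval] offset 1d))
```

In LogQL the offset modifier has to follow the range selector directly, inside the parentheses of the range function.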
Thank ❤
Great video! Please make another one like this. For prometheus maybe? Or tempo
What are you trying to visualize in Tempo that would be helpful? (Sorry for the late reply :) )
@@condla for example making prometheus dashboard, step by step
@@fumaremigel I have the feeling there's a lot of content out there for Prometheus already, but I'll think about it 😊
Thank you so much. This was really helpful!
Thanks for the feedback. I wrote a blog post that accompanies the video, released yesterday: grafana.com/blog/2023/05/18/6-easy-ways-to-improve-your-log-dashboards-with-grafana-and-grafana-loki/
@@condla Thank you so much, I was able to complete a task thanks to this. My logs were in JSON format, so logfmt was not parsing them, and I guess ad-hoc variables would not work in that case
Nice! Got me going. I'm new to LogQL and Grafana, got some Splunk experience and am struggling to translate what I have. But this was a nice start. Any recommended YouTube videos as a next step? I'm still struggling with a few things:
1) The base query is implemented in each panel - a lot of maintenance, and I guess the query spends CPU x number of panels.
2) I have a few regexes, I guess I should consider implementing them in the proxy in front of Loki so they are available in simple filters for performance and maintenance.
3) The drill-downs with Data Links - I only manage to do them in one level, and what corresponds to your "cluster" filter gets stuck for some reason - I want to drill down like 4 levels without making 4 dashboards with 8 separate panels with separate queries, because that's a lot of maintenance.
4) Doing some arithmetic, I guess I have to learn transformations - like error rate in %, not in "per second".
5) Combining similar values in the same graph - some of my log entries have 4 timings - time to first byte, request end time etc - right now in 4 panels.
6) Do the same but for logs in BigQuery.
I'm sure I'll figure some of this out on my own, but one more kick in the right direction would save me some pulled hairs
Hey, quite a lot of questions for a small comment block 😅 but let's try:
1) Have a look at this: grafana.com/blog/2020/10/14/learn-grafana-share-query-results-between-panels-to-reduce-load-time/. Plus, Loki does a lot of caching. Also take a look at recording rules to speed up high-volume Loki queries: th-cam.com/video/qGyoJPUIOz8/w-d-xo.htmlsi=BtYmT94Bt5_U21O3
2) It depends. Generally you want to set as few labels as possible, and with fairly low cardinality, during ingest; it's also rather bad practice to set a label for something that is already in the log message. On the other hand, at query time, you want to use as many labels as possible to speed up the query.
3) Grafana Scenes enters the conversation. "Hi there, let me help you" grafana.com/developers/scenes/
Scenes can help you achieve the connection of multiple dashboards while keeping the context when jumping back and forth.
4) Yes, learn transformations, but you can
* also do arithmetic on queries directly
* depending on the panel type, you will already have suggestions in the suggestions tab of the visualization section that show %
5) Just click "add query" below the first query of the panel and add as many queries to one panel as you want.
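To illustrate the "arithmetic on queries directly" point from 4), here is a rough sketch of an error rate in percent. The `{app="checkout"}` selector, the `"error"` line filter and the 5m range are placeholder assumptions, not taken from the video:

```logql
# Share of log lines containing "error" among all lines, as a percentage
  sum(rate({app="checkout"} |= "error" [5m]))
/
  sum(rate({app="checkout"}[5m]))
* 100
```

Both sides are aggregated with `sum` so the division happens between two single series. If there are no matching error lines in a window, the numerator has no samples and the panel shows no data for it rather than 0.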
A more concrete question maybe? I was counting top user agents, but now that our traffic has increased we have more than 2000 different user agents per time unit and I run into the max series issue, with a query like topk(50, sum by(user_agent_original) (count_over_time({deployment_environment="prod", event_name="request"} [$__range]))), where I naively thought at first that the topk(50, ...) would protect me from that limit. It's an instant query, showing a table view with the values as a gauge. I could parse the user agent harder to get the major browser version and bring the options below 2000, but this is structured metadata, so I can't do that in LogQL; I have to do it in the collector (or in Promtail?). I can't increase the 2000 limit, and I don't want to. Any way to rewrite the query to get around this issue?
Hey, sorry for the late reply. That's weird, the topk should actually protect you from that indeed. Just curious: have you eventually solved your problem?
@condla No I haven't. But we're switching from one CDN to another right now so I'm sure I'll run into much heavier observability issues 😅
@@joffemannen 🤣 I bet you will! Hopefully exciting challenges ;)
Great video! I wish there were more. I wonder if there is any way to do such ad hoc filters with logs parsed by "regex" or "pattern"?
Thanks 😊. Yes, you can use the regex/pattern parser to do any kind of ad hoc filtering. Examples are dependent on the expressions and patterns of course. What's your log pattern and what would you like to filter for?
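As a generic sketch of what this can look like with the pattern parser, assume lines shaped like `<timestamp> <LEVEL> <free-text message>`. Both that line shape and the `{app="checkout"}` selector are made up for illustration:

```logql
# Assumed line shape: "<timestamp> <LEVEL> <free-text message>"
{app="checkout"}
  | pattern `<_> <level> <msg>`
  | level="ERROR"
```

Fields extracted by the parser (`level` and `msg` here) can be used in label filter expressions after the parser stage, which is the same kind of expression an ad hoc filter adds to a query.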
Hi @@condla , Thanks for your quick reply!
This is my LogQL for a huge file with more or less unstructured log rows, which shows the number of all errors that occurred in the selected period:
sum by(logMsgMasked) (count_over_time({env=~"$env", job="core-files", filename=~"activities.log"} |~ `(WARNING|ERROR)` | regexp `^\[(?P.+) (?P.+)\]\[PID:(?P\d+)\] level\.(?P\S+): (?P.*)` | regexp `((a|A)ccount #?(?P\d+))` | label_format logMsgMasked="{{regexReplaceAll \"(\\\\d+)\" .logMsg \"\\\\d+\"}}" | line_format "{{.logMsgMasked}}" [$__range]))
Suppose there was a log message in the "logMsg" pattern match section: "Memory for account 4711 exhausted by 123456 bytes." This will be converted to "Memory for account \d+ exhausted by \d+ bytes.". So the converted message should be in an ad hoc filter panel. Activating the ad hoc filter on it should display all matching messages in a corresponding raw message panel below it, regardless of the number of bytes or the account where the error occurred. I hope I have been able to describe my problem clearly enough.
Can you send me your Promtail configuration for the above dashboard please?
There's nothing notable done in Promtail. What's your challenge?
Currently my office is working on some pilot projects to build a centralized logging and metrics dashboard using Grafana Loki. We found that Grafana and Loki are powerful tools; however, it is quite difficult to find references on Google. This is a very insightful video about Grafana Loki.
However, there is one thing that is not working in our Grafana (v10.0.1). If we change to instant type, then all the different values in a pie chart are aggregated, so it displays one value only. This issue doesn't happen in Query type.
Have you ever heard about this issue?
Hi @albogp thanks for the feedback. I haven't heard about this yet, but you can ask at community.grafana.com or join the community Slack and ask your question there: grafana.slack.com
How to store Grafana Loki logs in Azure Blob Storage?
There are several ways you can accomplish this. Either host your own Loki and use Azure Blob Storage as the storage layer, or ingest the logs into Grafana Cloud Loki and configure an export job (grafana.com/blog/2023/06/08/retain-logs-longer-without-breaking-the-bank-introducing-grafana-cloud-logs-export/)
Can you please share the application you used to create this dashboard?
You mean application as in Grafana for creating dashboards? And Grafana Loki as the solution to store and query logs? I'm confused because I put this in the title. If you search online you should be finding tons of resources for both.
@@condla I wanted the application code that was generating the logs and traces. Thanks a ton for the video !!
Ahhhh 😁 I've used a dummy observability application that can be deployed to test things like this. Follow the link for more information: microbs.io/
This is not working for unstructured logs where we use pattern to match
Hey, generally speaking it works the same way. You just need to define the pattern or a regex first to extract the information you want to visualize.
Which metrics do you want to extract from which type of logs?
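For example, a sketch that counts lines per log level with the regexp parser could look like the query below. It assumes each line contains a fragment such as `level.ERROR:`; the `{app="checkout"}` selector and the capture group name `level` are hypothetical:

```logql
# Extract the level with a named capture group, then count lines per level
sum by (level) (
  count_over_time(
    {app="checkout"} | regexp `level\.(?P<level>\S+):` [$__range]
  )
)
```

The regexp parser uses Go RE2 syntax, and every named capture group becomes an extracted label that can be grouped or filtered on.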
Can we use it for JSON data?
Yes of course, you would just use the json parser instead of the logfmt one
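A minimal sketch of the same idea with the json parser, assuming JSON log lines that carry a top-level `level` field (the selector and field name are placeholders):

```logql
# Parse each line as JSON, then filter on an extracted field
{app="checkout"} | json | level="error"
```

Nested JSON keys are flattened with underscores, so a field like `request.method` would be addressed as `request_method`.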
Please don't add music, it is very disturbing
Thanks for the feedback.
It didn't bother me
Just type a simple query:
fields @message
| filter @message like /$Filter/
| limit 100
Don't make it hard
Hi, thanks for the comment. This comes with a trade-off, and any kind of query language has a certain learning curve. I'm trying to reduce the one for Loki with this video.
In the near future you will see Grafana implementing an explore UI that allows you to query and aggregate logs without any query language at all. But users can still make use of the power of LogQL if they want.