@@statquest Hi Josh, lets say that we divide the data in 4 quantiles, i,e 25 percentile,50 percentile, 75 percentile and 100 percentile. How can there be 5 regions? Should not there be 4 regions?
I'm not sure I understand the question. This video talks about how there is a strict definition of quantile, which is one thing, and then there is how the term is used in practice, which is different. In practice, the terms quantile and percentile are interchangeable.
@@statquest So if I have a data set with a series and its respective frequencies and probabilities: 1.) How can I find the worst 5% tail data points? 2.) What is the best method or technique for creating class / categorical slabs for the worst-event data points?
My intention was that sorting would be implied by the way the data is put on the graph. However, you are correct, I probably should have stated it explicitly.
@@statquest Yes, I understood that for quantiles you split the values into 4 equal parts, but for percentiles you divided them into 15 parts. How? Thanks for uploading this video!
A percentile implies that a percentage of data has smaller values. For example, the 6th percentile implies that 6% of the data has lower values. In contrast, 6% simply means 6% of the data share some feature. In other words, percentile has a narrower definition, and is a specific case of a percentage.
"Quantiles" when the data is considered to be 1. So, the Median is 0.5th quantile. "Percentiles" when the data is considered 100. the Median is the 50th percentile. "Deciles" when the data is considered 10. the Median is 5th Decile. right?
I don't quite understand something. At 3:08 you draw a line and say "this is the 0.25 quantile because 25% of the points are less than this line." However, this is not true. There are 15 points total, and only 3 are less than the 0.25 quantile line. Later in the video at 4:55, you draw a line through the same point and call it the 20th quantile/percentile. You then conclude, "I've shown you just one way to calculate the quantiles and percentiles, however, there are many more." Did you not just show us two significantly different ways that would have a meaningful impact on downstream calculations and interpretation? My head is spinning.
So, with a relatively small dataset like the one I used, rounding plays a big role. If we specifically want to label the quantiles, we have to just pick the point that is closest to it, and go with that. Likewise, we do the same for percentiles. As a result, this means we have a lot of variability in what we might call the 25% quantile or the 20th percentile. However, with larger datasets with more points, these problems go away and have relatively little impact in the final analysis.
Support StatQuest by buying my books The StatQuest Illustrated Guide to Machine Learning, The StatQuest Illustrated Guide to Neural Networks and AI, or a Study Guide or Merch!!! statquest.org/statquest-store/
aaand Sold Out! JK :)
Clearly explained for a novice. Thank you, Josh. I really appreciate the time you've put into creating these. They're very helpful.
Glad you like them!
Dude, your intros are incomparable!
I'm definitely in the minority, I know that. But I really hate them.
Hi dude, your last sentence saved the rest of my day! I was struggling for hours figuring out the results calculated by quantile() on a vector of only 10 entries!!!
Hooray! I'm glad the video helped you figure what was going on. :)
You are technically sound and logically consistent
Since you explain in depth
Therefore I love watching your videos
Thanks! :)
How do you have the first blue dot being a 0.25 quantile at 3:07 and then suddenly becoming a 0.20 quantile at 5:05? I am a little confused, if the technical definition is that of a quantile being the amount of points less than itself. Thanks!
The difference is that when we are explicitly trying to find quartiles (the lines that divide the data into 4 equal-sized bins), we have to round to either the 25th, 50th or 75th quantile. So that's what is going on at 3:07. Later, we are calculating percentiles (but calling them quantiles because that is what is commonly done, and I mention this at 4:16) and we don't have to round, so we call the point the 20th quantile, because that is what it is without rounding.
@@statquest Thank you for clearing it. I also had the same doubt.
Thank you so much for this video. This was really clearly explained as promised in the video's title. I watched so many videos before I found yours. None of the previous videos was as well explained as yours. You are totally right when you said that StatQuest is special. YES IT IS!
Great to hear!
I liked your pronunciation and the absolute clarity of the presentation of information.
Thank you!
Thanks for the hard work.
I didn't expect that intro from a Statistician LOL
:)
just saw someone post a Q-Q plot with regards to the price of an asset and was like, "what the hell is a Q-Q plot? and what the hell is a 'quantile'?" only to find out that quantiles were related to percentiles which were a mathematical concept that i had always struggled with. i love love love math but percentiles were one of those things i just couldn't fully grasp. bookmarking this to my lil "education" folder so i can come back to this when i need it. thanks!! :^)
bam! :)
me feeling so confused before and then watching this THANK YOU!!!!
bam!
Its a joy watching your stats videos. Thanks a lot.
I liked the level at which you explained this. It was easy enough for me to understand, but explained fully so I feel like I totally get it. Thank you!
Thank you very much! :)
Wow!!!! no one can explain quantiles and percentiles better than this explanation, at least I feel this way.
I'm glad you like the video so much! :)
hahaha great lecture and you got good sense of humour which makes the whole video more entertaining :)
Glad you enjoyed it!
Stat Quest is special.....Yes it is!!
:)
Is the blue point the 25th(3:09) percentile or 20th (4:56)? Thanks for answering.
Both! Starting at 0:36 I talk about how quantile and percentile have multiple definitions and multiple ways to be calculated. The point is that with quantiles and percentiles, if you have a lot of data, details are not important, the bigger picture is important. If you don't have a lot of data, then be very cautious with your conclusions.
@@statquest Got it! thanks for helping out
Hi Joshua. Once again, a very good video! Any plans on making videos about quantile regression?
You always have the best explanation My Prof Starmer!
Thank you! :)
Hi Josh, as usual, a super video, taking into account all the subtleties of quantiles/percentiles, Thank you!!
Thank you! :)
"Quantiles and percentiles are just a matter of finding out how many values are less than the value you are interested in". Interesting. Thanks!
:)
AWESOME BEGINNING!!!
Thanks! :)
Wow...This is literally the best movie I've ever seen. Thank you!
bam! :)
If you consider the first element as 0th quantile, then how do you get 100th as you get 14/15 for the last one?
It doesn't really make sense to call the first element the 0th quantile because that means 0% of the data is equal to or less than that quantile.
Easy to understand and to the point. Thanks!
Thank you! :)
What a great and clear way to teach! congrats :)
Thank you! 😃
At 3:37 can be related to Box Plots.
Yes! :)
Explanations are brilliant too!! NICE WORK!!
Thanks!
Dear Josh, so the top dot is 14/15 = 93% quantile? And we never have the 100% quantile?
Suppose we have 1000 dots: the top is 999/1000 = 99.9% quantile. Could we round it and say it is the 100% quantile?
Remember, there are a lot of ways to define quantile and percentile. One way to define it, used in this video, is the percent of values below a specific value. However, it's also defined as the percent of values equal to or less than a specific value. In that case you'd have 100%.
@@statquest thank you so much for the detailed explanation
This is incredibly clear and well explained! Thank you!
Thank you!
@ 3:25, why is the 0.75 quantile 7.3, instead of 7.5? The 0.25 quantile was 2.5...
The 75% quantile crosses the y-axis at 7.3.
5:44 i have the feeling that this kinda explains the central limit theorem, am i wrong?
It's a little different. For example, if you only had 3 points (point A, B and C), then the gaps between datapoints will be large and there would be a relatively big difference between the quantiles if one method said the first quantile was A and another method said the first quantile is B. But when there is tons of data, then the gaps between datapoints will be small and difference between A and B will be much smaller, so if one method says the first quantile is A and the other says it is B, those two values will be close to each other. Does that make sense?
@@statquest yeah totally, thanks
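To make the small-data versus big-data point concrete, here is a minimal Python sketch. It assumes made-up, evenly spaced values between 0 and 1 (not the gene expression data from the video) and compares two of the quartile methods offered by Python's standard `statistics.quantiles`. The helper name `first_quartile_gap` is hypothetical.

```python
from statistics import quantiles

def first_quartile_gap(n_points):
    """Gap between two ways of computing the first quartile of n_points values."""
    # Hypothetical data: n_points evenly spaced values from 0 to 1.
    data = [i / (n_points - 1) for i in range(n_points)]
    q1_exclusive = quantiles(data, n=4, method="exclusive")[0]
    q1_inclusive = quantiles(data, n=4, method="inclusive")[0]
    return abs(q1_exclusive - q1_inclusive)

small_gap = first_quartile_gap(10)      # few points, big gaps between them
large_gap = first_quartile_gap(10_000)  # lots of points, tiny gaps between them
print(small_gap, large_gap)
```

With only 10 points the two methods land noticeably far apart; with 10,000 points covering the same range they nearly coincide.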
I'll be honest 50% of this video is what i need.
I was struggling to find out what Quantile means & finally got it! thank you.
At 2:05, how is the gene expression calculated as 4.5 and how is the scale for the axis chosen?
The scale is arbitrary. However, we pick 4.5 as the median because 50% of the measurements are below that value.
@@statquest thanks for your response, so we can pick 5 as the median as well and we can change the scale to 10 instead of how you chose the scale as 9 and the median as 4.5, am I correct?
@@govamurali2309 In this example, 5 is not a good value for the median because there are more observations with values < 5 than there are observations with values > 5. In contrast, 4.5 is a good value for the median because there are an equal number of observations with values < 4.5 and observations with values > 4.5.
If you changed the scale, then you would change the median value. However, we still want a value that splits the data such that there is an equal number of observations with values > median and observations with values < median.
@@statquest Thanks got it now :)
@@govamurali2309 Hooray! :)
Thank you so much. Your explanation is Top notch👌
Thanks! :)
do we always need to arrange data to ascending order for ungrouped data?
In practice, you can just call a quantile function on your data without having to pre-sort it. The quantile function will sort it for you.
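As a quick check of that, here is a small sketch using Python's standard `statistics.quantiles`, which sorts its input internally. The data values are just illustrative.

```python
import random
from statistics import quantiles

sorted_data = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]

# Shuffle a copy with a fixed seed so the example is repeatable.
shuffled_data = sorted_data[:]
random.Random(0).shuffle(shuffled_data)

cuts_from_sorted = quantiles(sorted_data, n=4)
cuts_from_shuffled = quantiles(shuffled_data, n=4)

# The order of the input does not change the quartiles.
print(cuts_from_sorted == cuts_from_shuffled)
```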
I'm staying for the intros (and the content, of course!)
BAM! :)
awesome explanation! Thanks a lot
Thanks! :)
Dear Josh, why do I get residual plots in some software after fitting a line, where the X-axis is labeled 'Regular Residual' and the Y-axis is labeled 'percentile'? Is this a Q-Q plot of residuals?
I don't know. I've never seen one before.
Hi Josh, I don't understand the graph. Since the y-axis is gene expression, what is the x-axis? Also, what do you mean by gene expression on the y-axis? Are these types of gene expression?
The data come from gene expression measurements made from mouse cells. So the y-axis is gene expression (how much each gene is transcribed) and the x-axis represents the specific mouse. If we had 2 mice, we'd have 2 columns of dots.
So, there's no 100% percentile in the example, because the top point only has 14 points lower than it, giving 14/15, right?
What time point in the video, minutes and seconds, are you asking about?
@@statquest
Thanks for notifying!
At 5:03.
@@BeginnerVille It's a toss up as to whether or not we have a 0% percentile or a 100% percentile. We can have either one, but not both. So you can include the point in the calculation (1/15 = 7% or 15/15 = 100%) or not (0/15 = 0% or 14/15 = 93%). In this example, we don't include the point. Note: We can do it either way because usually these sorts of divisions are only done with a lot of data, and one point doesn't make a big difference.
@@statquest
I like you referring back to the example showed in the video, it becomes much clearer!
Thank you so much!
(I felt like I heard a 'Bam!)
@@BeginnerVille bam!
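The "include the point or not" choice above can be sketched in a few lines of Python. This is only an illustration: the 15 values are hypothetical stand-ins for the measurements in the video, and `fraction_below` is a made-up helper name.

```python
def fraction_below(values, x, inclusive=False):
    """Fraction of values below x (or at or below x when inclusive=True)."""
    if inclusive:
        count = sum(1 for v in values if v <= x)
    else:
        count = sum(1 for v in values if v < x)
    return count / len(values)

# Hypothetical stand-ins for the 15 measurements in the video.
points = list(range(1, 16))
lowest, highest = min(points), max(points)

# Not counting the point itself: 0/15 for the lowest, 14/15 for the highest.
print(fraction_below(points, lowest), fraction_below(points, highest))

# Counting the point itself: 1/15 for the lowest, 15/15 for the highest.
print(fraction_below(points, lowest, inclusive=True),
      fraction_below(points, highest, inclusive=True))
```

Either way you get a 0% or a 100% percentile, but never both from the same definition.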
I feel like I understood the video, but I feel like I'm missing a logical jump. We said that in a sample with fifteen data points the 50% quantile would have seven points below it, and seven points above it. Fair enough. 15/2=7.5, and perhaps that 0.5 comes from the line going through the median point itself. But this doesn't really seem to generalize well in the scheme you present at the beginning of this video.
Perhaps its best shown by saying at 2:56 you highlight a purple point as being the 25% quantile because you've bisected twice. However, at 4:52 you refer to that same point as the 20% quantile, because three of the fifteen points are below it. Both approaches make some intuitive sense to me, but they give notably different results for quantile measurement.
Unfortunately, one of the annoying things about quantiles is that there are a ton of ways to calculate quantiles (as I mentioned at 0:44). Depending on the number of values in your dataset, you end up with situations where it's not possible to form groups that are exactly equal, so we have a lot of different formulas to deal with this, and that means we get small differences in the results. However, that said, when the dataset is large enough, the differences don't matter any more.
Nicely explained !! crystal clear !
i'm confused. if a quantile is dividing the data into groups of equal numbers of points how is, for example a .95 quantile achieving this?
i get that at the .95 point it means 95% of my data points are below that "line" but where does the definition lie here? how are the datapoints divided into equal groups? because below that "line" you have 95% of your data and above it 5% ... the datapoints are not equally divided.
Depending on the number of values in your dataset, you end up with situations where it's not possible to form groups that are exactly equal. In order to deal with this problem, there are a ton of ways to calculate quantiles (as I mentioned at 0:44). However, that said, when the dataset is large enough, the differences don't matter any more.
@@statquest thank you for replying.
I think it's more clear now what quantiles are all about. thanks for the help
very clear explanation!!! tks!
Thank you!
say if today i have 1000 samples, can i refer the third data point (2/1000) as 0.2% quantile or 0.2th percentile
If you wanted to.
What would i even do without you Josh
:)
Can you go over hidden markov models? Love your videos btw.
Always clearly explaining!
Hi Josh. Is there a rule for deciding how many quantiles to use to separate the data? Or can I just pick any number, independent of the characteristics of the data?
Generally speaking, the most commonly used are quartiles (dividing the data into 4 equally sized pieces) or percentiles.
Hi thanks for the great video, do u have an R tutorial on how to get the quantile from a fitted distribution?
I don't, but that's a great idea. :)
@@statquest would love to watch it from your channel soon
Thank you for the clear explanation. :D
You're welcome!
I have so much trouble with our teacher. He just introduced quartiles, percentiles and deciles and taught them all in one lesson, and now it's exam time and we're having trouble because they're included in the test and we barely get any of it 😣
That's why god created youtube! :)
I have a question. How can the blue point (4th from the bottom) be the 20th and 25th percentile at the same time?
Rounding. With more data points, we'd end up with finer, more precise quantiles and percentiles.
Considering the point where you mentioned "the terms quantile and percentile are used when we divide each datapoint in its own group", what happens when we have, let's say, 200 datapoints? Do we have 200 percentiles? If yes, do we plot all these 200 percentiles in a Q-Q plot? I am really stuck at this...
Usually we just use 100 percentiles.
you said "quantiles are just the lines that divide data into equally sized groups". What equally-sized groups does the 25% quantile split the data into?
When we look at all of the quantiles that we are going to use, so, in your case, you might look at the 25% 50% and 75%, you'll create 4 equally sized groups.
hey josh..one doubt!! if someone says 1st quantile is 0.07..what should i interpret from that..does that mean below that value only 1 data point falls or something else?
It depends...however, usually that means 1% of the data are less than that point (0.07).
@@statquest thanks man!!
So the median has 7/15ths of the observations below it. How is it then the .5 quantile?
Because 7/15ths of the observations are below it and 7/15ths are above it, the median is right in the middle, and thus, the 0.5 quantile.
@@statquest The first observation marks the 0% quantile, as there are 0 observations below it.
The second observation marks the 7% quantile, because 1/15th (0.0666...) of the observations are below it.
Following this logic...
The eighth has 7/15 of the observations below it, and would be the 47% quantile.
I obviously understand that it is "in the middle" but I thought you were defining quantiles by what percentage of the observations are below them.
@@uiru2900 Unfortunately there are a ton of ways to define "quantile", however, one common way is
how are you getting the values of 2.5 & 7.3?
Those are just the y-axis values that correspond to the values at the different quantiles.
Hey, can you please recommend book for practicing your taught concepts?
Not yet. I'm writing one right now, though, and it should be out in early 2022.
very nice explanation
Thank you! :)
At 4:14 you said that point is the 25th percentile. At 4:58 you pointed at the same point and called it the 20th percentile? I don't get it
It's just rounding. When we divide the data into just 4 quartiles, each one contains approximately 25% of the data, and we do the best we can. However, when we divide the data into smaller quantiles, we can be more precise. Does that make sense?
@@statquest Oh yes I get it!
Sir I have a question?
How to find interquartile range when full dataset not given. Instead Q1,Q3, min and max values are given.
Plz reply...
The interquartile range is the middle 50%, so the values between Q1 and Q3
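In other words, the interquartile range only needs Q1 and Q3, and the min and max are not used. A tiny Python sketch, with made-up summary values standing in for the ones you were given:

```python
# Hypothetical summary values; only Q1 and Q3 are needed for the IQR.
q1, q3 = 13.75, 41.25
minimum, maximum = 5, 50

iqr = q3 - q1                   # width of the middle 50% of the data
full_range = maximum - minimum  # min and max give the range, not the IQR
print(iqr, full_range)
```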
Thank you! clear and to the point!
Thanks! :)
For the 4th blue dot, which you are calling the 25th percentile, there are 3 dots below that dot, so the percentage becomes 3/15*100 = 20%. If I count that dot as well, then I have 4/15*100 = 26.67%. So why are you calling that dot the 25th percentile? I can't figure it out, as you are saying 25th percentile means 25% of the data is equal to or less than that value.
In this part of the example, I call this the 25th percentile or 25% quantile, because the lines have divided the data into 4 equal portions. This makes the lowest line the 25th percentile, the middle line the 50th percentile and the highest line the 75th percentile. Dividing the data into four equally sized groups is just one way to determine quantiles. When you do it this way, you have to do some rounding because, as you noticed, there is no specific point that is exactly above 25% of the data. However, since each group of points is equally sized, we still call it the 25th percentile. Later, when I show how each data point can be considered its own quantile, you can be much more precise in defining the quantiles for each point. Does that make sense? The important thing to remember is that you should really only trust quantiles when there is a lot of data. When you have a lot of data, the differences among ways of determining quantiles are insignificant.
@@statquest For the data : 5, 10, 15, 20, 25, 30, 35, 40, 45, 50.
I have calculated Q1 = 13.75, Q2 = 27.5, Q3 = 41.25 from the percentile formula, position = P(N+1)/100, and Excel also gives the same results.
But by observing the data, it is 15, 27.5, 40 that divides the data into 4 equal parts, so Q1 = 15, Q2 = 27.5, Q3 = 40, and my CASIO fx-991 EX calculator gives the same results.
So, can you tell why is this absurdity and which answer should I take?
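Both sets of numbers in this question can be reproduced with two different, equally legitimate quartile methods. The Python sketch below is only an illustration: it uses the standard `statistics` module, and the claim that these are the exact methods Excel and the calculator use internally is an assumption, not something confirmed here.

```python
from statistics import median, quantiles

data = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]

# Method 1: interpolate at position p * (N + 1) in the sorted data.
interpolated = quantiles(data, n=4, method="exclusive")

# Method 2: take the median of the lower half and of the upper half.
ordered = sorted(data)
half = len(ordered) // 2
median_of_halves = [median(ordered[:half]), median(ordered), median(ordered[-half:])]

print(interpolated)      # matches 13.75, 27.5, 41.25
print(median_of_halves)  # matches 15, 27.5, 40
```

Neither answer is wrong. With only 10 values the methods disagree, which is the small-data caveat that comes up throughout this thread.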
Thanks for the video! Can you make one on quantile regression, please?
That's on the to-do list, but it might be a while before I get to it.
@@statquest "Waah, waah, waah". :(
Thanks for the explanation. By the way, there is no difference between 0.5 and 50% since 50%=0.5. It's mathematically exactly the same, so both notations can always be used.
That's exactly right! :)
Love watching your videos
Thank you so much! I'm really glad to hear you like the videos :)
So quantile and percentile are the same?
In theory they are different, but practically speaking, they are the same.
Hi Josh, do you have videos on RStudio?
I don't, but maybe one day I will.
Thank you for the beautiful video :-)
Glad you enjoyed it!
Is it possible for someone to score in the 100th percentile of a standardized test?
It depends on how, exactly, you define percentiles. In this video I demonstrate one of many methods. When there is a lot of data, all of the methods are going to give you very similar results, so it's no big deal. However, if you only have a small amount of data, it's worth trying different approaches.
@@statquest Thanks! So when someone says they scored in the 100th percentile of a large standardized test, did they actually score in a percentile labelled "the 100th percentile" or did they really score in the 99.6th and have it rounded up to 100?
I would like to see videos on time series: the ARIMA model, and ACF and PACF plots.
this intro is bomb
bam! :)
Super dude. Keep them coming!!!!
Thank you! :)
Anyone know what percentile errors are? Thank you
I'm not familiar with it. :(
@@statquest No problem, thanks for all the great videos!
Hey and thanks a lot for your amazing videos, they've helped me a lot. One question regarding this one: "quantiles are just the lines that divide data into equally sized groups". Isn't that true only for the median? For example, the 75% quantile in your video splits the data into 2 groups, one with 3 observations larger than the 75% quantile and 11 observations smaller than it.
You are correct in that individual quantiles do not all separate the data into equally sized groups - however, all of the quantiles, taken together, divide the data into equally sized groups. So if someone said, "I divided the data with 4 quantiles" you would know that there were 5 equally sized groups.
@@statquest I see, thank you for the clarification! Have a great day.
@@statquest Hi Josh, let's say that we divide the data into 4 quantiles, i.e. the 25th percentile, 50th percentile, 75th percentile and 100th percentile. How can there be 5 regions? Shouldn't there be 4 regions?
@@xoda345 I guess you could debate whether the 100th percentile is actually a quartile.
Great explanation, but how is the 7% quantile equal to the 7th percentile? It would make sense if they were different. Can you please explain? Thanks
I'm not sure I understand the question. This video talks about how there is a strict definition of quantile, which is one thing, and then there is how the term is used in practice, which is different. In practice, the terms quantile and percentile are interchangeable.
Love your videos!!
Then what is the difference between a decile and a quantile?
Technically a decile divides the data into 10 parts. But practically speaking, people just use quantiles and percentiles.
@@statquest So if I have a data set with a series and its respective frequencies and probabilities:
1.) How can I find the worst 5% tail data points?
2.) What is the best method or technique for creating classes / categorical slabs for the worst-event data points?
First we need to sort the data in this case.
My intention was that sorting would be implied by the way the data is put on the graph. However, you are correct, I probably should have stated it explicitly.
@@statquest No worries Sir. Very well explained tho.
Can I ask why the 75% quantile is 7.3 instead of 7.5?
Because the value of the point, such that 75% of the data is below it, is 7.3.
I never knew the meaning of quantile until today.
bam!
Can someone explain the last part, the 1/15 part, and how it is a percentile?
At 4:36 I say that 1/15 is the 7% quantile or 7th percentile. Is that what you are asking about?
@@statquest Yes, I understood that for quantiles you split the values into 4 equal parts, but for percentiles you divided them into 15 parts. How?
Thanks for Uploading this video!
There are 15 data points, and each one represents a different percentile.
Love your videos
Thanks!
Great explanation, have a nice day :)
Thanks a lot.
Most welcome!
How is percentage different from percentile?
A percentile implies that a percentage of data has smaller values. For example, the 6th percentile implies that 6% of the data has lower values. In contrast, 6% simply means 6% of the data share some feature. In other words, percentile has a narrower definition, and is a specific case of a percentage.
Thank you
:)
is a quartile just a 0.25 quantile ?
I guess the first quartile (or Q1) is a 0.25 quantile. The second (Q2) is 0.5 quantile, the 3rd (Q3) is 0.75 quantile.
OMG so easy THANK YOU SO MUCH 😃😃❤️❤️❤️❤️❤️😍😍😍😍😍
You're welcome 😊
How to calculate???
Thanks
:)
awesome video. It's made my day :)
Thank you! :)
Quantile Dingle
:)
😮
"Quantiles" when the whole dataset is scaled to 1.
So, the median is the 0.5 quantile.
"Percentiles" when the whole dataset is scaled to 100.
The median is the 50th percentile.
"Deciles" when the whole dataset is scaled to 10.
The median is the 5th decile.
right?
That's correct. But it is very common for people to use the terms interchangeably, so try to be flexible.
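A minimal sketch of the scaling described above, where the same position in the data is written on a 0 to 1, 0 to 100 or 0 to 10 scale. The function names are made up for illustration.

```python
def quantile_to_percentile(q):
    """Position on a 0-1 scale rewritten on a 0-100 scale."""
    return q * 100


def quantile_to_decile(q):
    """Position on a 0-1 scale rewritten on a 0-10 scale."""
    return q * 10


median = 0.5
print(quantile_to_percentile(median))  # 50.0
print(quantile_to_decile(median))      # 5.0
print(quantile_to_percentile(0.25))    # 25.0 (the first quartile, Q1)
```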
I don't quite understand something. At 3:08 you draw a line and say "this is the 0.25 quantile because 25% of the points are less than this line." However, this is not true. There are 15 points total, and only 3 are less than the 0.25 quantile line. Later in the video at 4:55, you draw a line through the same point and call it the 20th quantile/percentile. You then conclude, "I've shown you just one way to calculate the quantiles and percentiles, however, there are many more." Did you not just show us two significantly different ways that would have a meaningful impact on downstream calculations and interpretation? My head is spinning.
So, with a relatively small dataset like the one I used, rounding plays a big role. If we specifically want to label the quantiles, we have to just pick the point that is closest to it, and go with that. Likewise, we do the same for percentiles. As a result, this means we have a lot of variability in what we might call the 25% quantile or the 20th percentile. However, with larger datasets with more points, these problems go away and have relatively little impact in the final analysis.
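The point about dataset size can be checked with a small sketch. It compares two interpolation conventions from Python's standard `statistics.quantiles` on made-up, evenly spaced data between 0 and 100. The data and the `q1_gap` helper are assumptions for illustration, not the data from the video.

```python
import statistics


def q1_gap(size):
    """Absolute difference between two definitions of the 25% quantile
    for `size` evenly spaced points between 0 and 100."""
    data = [100 * i / (size - 1) for i in range(size)]
    q1_exclusive = statistics.quantiles(data, n=4, method="exclusive")[0]
    q1_inclusive = statistics.quantiles(data, n=4, method="inclusive")[0]
    return abs(q1_inclusive - q1_exclusive)


for size in (10, 100, 10001):
    # The gap shrinks as the number of points grows.
    print(size, round(q1_gap(size), 3))
```

With 10 points the two definitions disagree by several units on a 0 to 100 scale, while with about ten thousand points the disagreement is a tiny fraction of a unit.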
Superb
Thanks! :)
You saved me, thanks
Thanks!