Quantile Normalization, Clearly Explained!!!

  • Published on Sep 7, 2024
  • Quantile Normalization lets us compare data that has all kinds of noise in it. It sounds fancy but is really super simple. Essentially you just sort each sample's data from high to low. If your samples are columns, you then replace the values with the average of each row. BAM!
    For a complete index of all the StatQuest videos, check out:
    statquest.org/...
    If you'd like to support StatQuest, please consider...
    Buying The StatQuest Illustrated Guide to Machine Learning!!!
    PDF - statquest.gumr...
    Paperback - www.amazon.com...
    Kindle eBook - www.amazon.com...
    Patreon: / statquest
    ...or...
    YouTube Membership: / @statquest
    ...a cool StatQuest t-shirt or sweatshirt:
    shop.spreadshi...
    ...buying one or two of my songs (or go large and get a whole album!)
    joshuastarmer....
    ...or just donating to StatQuest!
    www.paypal.me/...
    Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter:
    / joshuastarmer
    #statquest #quantile
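
A minimal Python sketch of the procedure in the description above, assuming the data is a genes-by-samples table stored as a list of rows (one column per sample). The function name `quantile_normalize` and the tie-breaking rule (tied values keep their original row order) are assumptions made for illustration; they do not come from the video.

```python
def quantile_normalize(matrix):
    """Quantile normalize a genes-by-samples table (a list of rows).

    Each column is one sample. Ties are broken by row order, which is a
    simplification; other tie-handling schemes exist.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]

    # Step 1: sort each sample from high to low.
    sorted_columns = [sorted(col, reverse=True) for col in columns]

    # Step 2: average across samples at each rank
    # (that is, average each row of the sorted table).
    rank_means = [
        sum(col[r] for col in sorted_columns) / n_cols for r in range(n_rows)
    ]

    # Step 3: replace each value with the mean for its rank,
    # keeping every gene in its original row.
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for j, col in enumerate(columns):
        order = sorted(range(n_rows), key=lambda i: col[i], reverse=True)
        for rank, i in enumerate(order):
            result[i][j] = rank_means[rank]
    return result


# Hypothetical example: 4 genes (rows) measured in 3 samples (columns).
data = [
    [5, 4, 3],
    [2, 1, 4],
    [3, 4, 6],
    [4, 2, 8],
]
for row in quantile_normalize(data):
    print(row)
```

After this step every sample contains the same set of values, but each gene keeps the rank it had within its own sample.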

Comments • 116

  • @statquest
    @statquest  2 years ago +1

    Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/

  • @mzheng85
    @mzheng85 1 year ago +6

    I searched for an hour to get a clear understanding of normalization and you explained it in 30 seconds. Thank you!

  • @tanishasharma3665
    @tanishasharma3665 3 years ago +19

    Short, well-explained and so much better than the confusing webpages I was wasting my time on. Saw your video on quantiles and percentiles as well! Thank you so much for these videos, they help me both with my job and statistical knowledge!

    • @statquest
      @statquest  3 years ago

      Glad it was helpful!

  • @alirezaforoozani7833
    @alirezaforoozani7833 5 years ago +16

    Oh how I wish you were my maths teacher, you make it seem so easy! I thank and salute you, sir!

    • @statquest
      @statquest  5 years ago +2

      Hooray! I'm glad you like the video. :)

  • @charliejin8620
    @charliejin8620 4 years ago +5

    thanks! quite a simple and clear explanation! much much better than our lecturers

    • @statquest
      @statquest  4 years ago

      Thank you! :)

  • @RDFannin3
    @RDFannin3 4 years ago +9

    Good grief, this is good stuff! I wish there was something as clear for learning R code

    • @statquest
      @statquest  4 years ago +2

      I have a few videos on R code listed on my website: statquest.org/video-index/

  • @lunapeverell3602
    @lunapeverell3602 2 years ago +1

    As always, this is the only way statistics comes easy and non unbearable to me. Thanks!

    • @statquest
      @statquest  2 years ago

      Happy to help!

  • @hieuthepunk
    @hieuthepunk 1 year ago +1

    This is the clearest answer I've gotten about normalization. Thank you

  • @moomoocheng6009
    @moomoocheng6009 5 years ago +3

    Very great video, explained so clearly that even I can understand. I am pretty new to bioinformatics, and I am very confused about the relationship between quantile normalization and MAS5 or RMA normalization.

  • @severtone263
    @severtone263 5 months ago +1

    Quick, easy and fun. Thank you Josh!

    • @statquest
      @statquest  5 months ago

      Thanks!

  • @afs208
    @afs208 5 years ago +5

    my mum asked what am I watching when she heard the intro, didn't know how to say it's an educational video

  • @blenderwang5061
    @blenderwang5061 2 years ago +2

    You did a great explanation, man! Thank you!

    • @statquest
      @statquest  2 years ago +1

      Glad you liked it!

  • @veeranagoudayaligar
    @veeranagoudayaligar 6 years ago +1

    It looked complicated, after watching your video, Umm, very simple. Thanks a lot.

  • @shuaishigao6356
    @shuaishigao6356 6 years ago +1

    Very helpful, that's exactly what I'm looking for! Thanks Joshua.

  • @lucarauchenberger628
    @lucarauchenberger628 2 years ago +1

    finally, I got it now!!

  • @cjgilmore283
    @cjgilmore283 1 year ago +1

    THANK YOU you're amazing

  • @amberrose8965
    @amberrose8965 1 year ago +1

    I appreciate this!

  • @leesweets4110
    @leesweets4110 2 years ago

    And if the data sets in each sample have different numbers of genes? How do you quantile normalize between sets of different sizes?
    My first thought before starting the real explanation of this video... was that we'd simply scale and shift the data in each sample according to their own standard deviations and means. This would preserve order, fix the means, and preserve relative intensities within each sample.

    • @statquest
      @statquest  2 years ago

      Regardless of the number of genes in each sample, you can match the quantiles.

    • @leesweets4110
      @leesweets4110 2 years ago

      @@statquest but that doesn't explain where to place the point on the graph.

    • @statquest
      @statquest  2 years ago

      @@leesweets4110 Ah. Ok, now I understand the question better. Since the datasets have different sizes, you need to look at quantile normalization with missing values. I believe one commonly used approach is to interpolate the missing values first, to equalize the datasets, and then apply quantile normalization as described in this video.

  • @muhammadabdullahnabeel6039
    @muhammadabdullahnabeel6039 2 months ago

    @StatQuest Doesn't this normalization remove information? For example, in sample 2, the levels of expression are too high compared to sample 1 and we can't conserve this information.

    • @statquest
      @statquest  2 months ago

      Yes, some information is lost, but we gain the ability to make a comparison that we didn't have before.

    • @muhammadabdullahnabeel6039
      @muhammadabdullahnabeel6039 2 months ago +1

      @@statquest Thanks for the reply! I am still learning and transitioning to computational biology. I will further research improved methods if there are any.

  • @oanaflorean83
    @oanaflorean83 3 years ago +1

    Awesome BAM!! Thx buddy :)

  • @mrcoolgs100
    @mrcoolgs100 6 years ago +1

    very good explanation! thank you!

  • @jdm89s13
    @jdm89s13 5 years ago +1

    So what if I have microarray data for different cohorts, and I am not worried about the specific intensity values, but just want to compare gene expression level across cohorts (i.e. which samples express a certain gene high versus those which express it low)? Would quantile normalization be a valid way to scale the data prior to clustering?

    • @statquest
      @statquest  5 years ago

      Quantile normalization is commonly used with microarray data, so I would give it a try.

  • @msumode4493
    @msumode4493 4 years ago

    Thank you so much Josh.

  • @etzhaim
    @etzhaim 5 years ago +3

    Thanks for this video. A question: Why perform quantile normalization instead of z-scores?

    • @statquest
      @statquest  5 years ago +6

      Great question! I think the big difference is quantiles allow you to compare ranks (i.e. quantiles tell us which measurement was the largest, or the 75th largest etc), and z-scores are more quantitative (how many standard deviations away from the mean a given data point is). Test scores are often reported using quantiles since they make it easy to know how your test ranked among the others. If I said your test score was the top quantile, then you would know your test score was the best. In contrast, if I told you your test score was two standard deviations above the mean, you wouldn't know if it was the best or not... Does that make sense? There are also statistical tests that work well with rank data (quantiles), and those might be more appropriate in certain situations - but explaining all that detail might be better done in a video rather than a comment.... :)

    • @pratapseshachalam2859
      @pratapseshachalam2859 5 years ago +1

      Nice video. the order of genes is preserved. My doubt is gene expression is shown on same level among the samples after quantile normalisation. Then, how could you see the difference among the sample for the gene?

    • @statquest
      @statquest  5 years ago +2

      @@pratapseshachalam2859 Like I said in the previous response, there are statistical tests that work with rank data, which is what you have after quantile normalization. That's a subject for another StatQuest. In the meantime, check out the Mann-Whitney U test: en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test

    • @pratapseshachalam2859
      @pratapseshachalam2859 5 years ago

      @@statquest Thanks a lot :)

    • @HR-yd5ib
      @HR-yd5ib 5 years ago

      @@statquest , Actually since the ranks are not changed by QN you can do rank test just as well on the original data. I think the point is that one assumes that the total mRNA distribution in a cell/sample doesn't change with condition. Hence all mRNA distributions from all samples are made identical. This should improve subsequent t-tests, FC computations etc.
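
The point that quantile normalization leaves the ranks within a sample unchanged can be checked with a small Python sketch. The helper name `ranks_high_to_low` and the numbers are hypothetical, made up for illustration, and ties are broken by position for simplicity.

```python
def ranks_high_to_low(values):
    """Return each value's rank, with 1 for the largest value.

    Ties are broken by position, a simplification for illustration.
    """
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


# Hypothetical expression values for one sample, before and after
# quantile normalization (the "after" numbers are made up for illustration).
before = [5.0, 2.0, 3.0, 4.0]
after = [5.67, 2.0, 3.0, 4.67]

print(ranks_high_to_low(before))
print(ranks_high_to_low(after))
```

Both calls give the same ranking, so a test that only looks at within-sample ranks sees the same input either way.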

  • @user-ib9lp8zx6x
    @user-ib9lp8zx6x 6 years ago

    Hej, Joshua. Could you talk about the statistical methods that are used in single-cell RNA-seq, especially the normalization methods we used and the difference in analysis between bulk-RNA-seq and single-cell RNA-seq.

    • @user-ib9lp8zx6x
      @user-ib9lp8zx6x 6 years ago

      Yeah, I have gone through some online tutorials about single-cell RNA-seq. But most of them just talk about how to run the code and the subsequent fancy data visualization. The basic statistical methods are more important, especially considering there are still quite a lot of differences between bulk and single-cell. Really looking forward to your following videos!!!

  • @zhaowu3193
    @zhaowu3193 2 years ago

    Thank you for this simple yet illustrative example of quantile normalization. I would like to know what happens if we have missing values in some of the samples. Can we still do the quantile normalization?

    • @statquest
      @statquest  2 years ago

      That's a good question. I'm pretty sure you would need to impute the missing values first.
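
A rough Python sketch of the "impute first" idea, assuming missing measurements are stored as `None` in a genes-by-samples table. Filling each gap with the mean of that gene's observed values is only one simple choice, picked here as an assumption for illustration; the reply above does not say which imputation method to use, and `impute_with_gene_mean` is a hypothetical helper name.

```python
def impute_with_gene_mean(matrix):
    """Fill missing entries (None) with the mean of that gene's observed values.

    This is one simple imputation rule chosen for illustration only.
    """
    filled = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        if not observed:
            raise ValueError("a gene with no observed values cannot be imputed this way")
        gene_mean = sum(observed) / len(observed)
        filled.append([gene_mean if v is None else v for v in row])
    return filled


# Hypothetical table: 3 genes (rows), 3 samples (columns), two missing values.
table = [
    [5.0, 4.0, None],
    [2.0, None, 4.0],
    [3.0, 4.0, 6.0],
]
complete = impute_with_gene_mean(table)
print(complete)
```

Once the table has no gaps, every sample has the same number of values and quantile normalization can be applied as described in the video.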

  • @paveldvorak2014
    @paveldvorak2014 5 years ago +1

    @Josh, which software do you use for these videos? It looks like PowerPoint, but some more advanced version 😄 👍👍

    • @statquest
      @statquest  5 years ago +1

      I started out using PowerPoint (and this video was done with PowerPoint). But PowerPoint doesn't work well on my computer so I switched to Apple's "Keynote" program. Now I like Keynote a lot more than PowerPoint.

  • @fkhan4504
    @fkhan4504 6 years ago +1

    Thanks for making the video

    • @statquest
      @statquest  6 years ago

      I'm glad you like it! I'll make more! :)

  • @saptashwachatterjee6875
    @saptashwachatterjee6875 4 years ago

    Please do a video on quantile regression

  • @ranjeetkumar273216
    @ranjeetkumar273216 6 years ago +1

    Hi, Nice Explanation. Could you talk on PCA vs Factor analysis difference?

    • @statquest
      @statquest  6 years ago +1

      One day I'll do that. Right now I'm gearing up to cover lasso and ridge regression techniques. Those videos should be out by the end of September.

  • @danielwiczew
    @danielwiczew 3 years ago

    A question: couldn't we just normalize the data in the y axis, by turning it into 0 mean and 1 variance? Then the scale on the y axis would be 0 ... 1.

    • @statquest
      @statquest  3 years ago

      Sure, you could do that, but that would be a different type of normalization. There are lots of ways to normalize data, and quantile normalization is just one of them.
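
For comparison, here is a small Python sketch of the "0 mean and 1 variance" normalization discussed in this thread (z-scores), applied to one sample. The function name `z_scores` and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Shift and scale one sample so it has mean 0 and standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    return [(v - mu) / sigma for v in values]


# Hypothetical measurements for a single sample.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
scaled = z_scores(sample)
print(scaled)
print(min(scaled), max(scaled))
```

The scaled values have mean 0 and standard deviation 1, but they are not confined to the range 0 to 1: values below the mean are negative, and a value more than one standard deviation above the mean exceeds 1. Unlike quantile normalization, this keeps the shape of each sample's distribution instead of forcing all samples to share one set of values.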

  • @adetayoaborisade9346
    @adetayoaborisade9346 3 years ago +1

    Double Bam

  • @dorjexx
    @dorjexx 4 years ago

    Thank you, Josh. BUT, I got a question about 'BUT': at the end of the video, you said: "after quantile normalization, the values for each sample are the same... BUT, the original gene orders are preserved."
    if the values are the same, the orders are the same, right? So, why use 'but'?

    • @statquest
      @statquest  4 years ago

      At 3:43 you see that the original values are on the left and the normalized values are on the right. The normalized values are all equal for each sample (1, 2 and 3), this is what I meant by "same". These equal values, however, are different from the original values on the left. So the "but" means that even though we changed the values to be all equal, the order of the values on the right is the same as the order of the values on the left.

    • @dorjexx
      @dorjexx 4 years ago

      @@statquest Thank you very much, Josh. Now I see. ;)
      Cheers.

  • @shubha1Ana2
    @shubha1Ana2 2 years ago

    Hello Sir, I have a doubt. Is image segmentation for finding the size, shape, and pleomorphism of nuclei always necessary for classification of H&E WSI? If we use deep learning networks, can we pass H&E images (maybe rescaled) as they are, without segmentation? Kindly answer if possible.

    • @statquest
      @statquest  2 years ago

      My series of videos on neural networks, which includes image classification, might help: th-cam.com/video/CqOfi41LfDw/w-d-xo.html

  • @mpat53
    @mpat53 4 years ago

    HI John, I have a question
    1) I have been provided with a table of quantile normalized read data (RNA seq). I want to progress using the program IDEP online . Should I enter this data as 'read count data' or as 'normalised expression values eg RNA seq FPKM, microarray etc'
    I think it's the second one as it is quantile normalised but I'm not fully sure as it's not FPKM.. thanks

    • @statquest
      @statquest  4 years ago

      I think the "etc" in the second option covers the quantile normalization. You can always email support or the authors just to be sure.

  • @grantsmith3653
    @grantsmith3653 4 years ago +1

    Great vid!

  • @urjaswitayadav3188
    @urjaswitayadav3188 6 years ago

    Great video. Thank you!

  • @stephenpower6876
    @stephenpower6876 3 years ago

    Hi Josh, great video. I'm very new to bioinformatics / statistics; I've been provided with a massive RNASeq dataset, and I've no idea if the data is quantile normalised or not. Do you know of any handy way I can check to see if quantile normalisation has been performed?

    • @statquest
      @statquest  3 years ago

      Do all of the highest expressed genes in each sample have the exact same value? If so, it is probably quantile normalized.
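
The check suggested above can be extended from the highest value to every rank. Below is a hedged Python sketch, assuming a genes-by-samples table stored as a list of rows; `looks_quantile_normalized` and the tolerance `tol` are hypothetical names introduced for illustration.

```python
def looks_quantile_normalized(matrix, tol=1e-9):
    """Return True if every sample (column) holds the same sorted values."""
    n_cols = len(matrix[0])
    sorted_columns = [sorted(row[j] for row in matrix) for j in range(n_cols)]
    reference = sorted_columns[0]
    return all(
        abs(a - b) <= tol
        for col in sorted_columns[1:]
        for a, b in zip(reference, col)
    )


# Hypothetical tables: 3 genes (rows), 2 samples (columns).
normalized = [[3.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
raw = [[5.0, 4.0], [2.0, 1.0], [3.0, 4.0]]

print(looks_quantile_normalized(normalized))
print(looks_quantile_normalized(raw))
```

A passing check is only suggestive. Some implementations average tied values, which can make the columns differ slightly, and any filtering or transformation done after normalization will also break an exact match.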

  • @omarabdelrahman3739
    @omarabdelrahman3739 3 years ago

    How about a quantile regression video?...PLEASE?

  • @ai1888
    @ai1888 6 years ago

    At 3:32, one thing I noticed is that the red-colored gene for samples 1 and 3 now has the exact same intensity value. In reality this is almost certainly not true. I notice this a lot when performing RMA for microarrays where quantile normalization compresses smaller fold change differences. Is this just a caveat of the normalization method we just have to accept?

    • @ai1888
      @ai1888 6 years ago

      I just checked and it does perform a strict quantile normalization just the way you described. Following that it fits a linear model to the normalized data and performs a median polish.

    • @ai1888
      @ai1888 6 years ago

      Hooray!

  • @user-or7ji5hv8y
    @user-or7ji5hv8y 4 years ago

    just wondering, could we not normalize each data set into standard normal by using its respective mean and standard deviation?

    • @statquest
      @statquest  4 years ago

      That's definitely a common way to normalize things.

  • @alecvan7143
    @alecvan7143 4 years ago +1

    Amazing!

  • @ChadMc74
    @ChadMc74 4 years ago

    Is this similar to blocking?

    • @statquest
      @statquest  4 years ago

      I'm not sure what you mean. Can you elaborate on your question?

  • @hengdezhu2832
    @hengdezhu2832 5 years ago

    Thank you. Got a question, the same color of each sample represents the same gene measured from different experiment, is it right?

    • @statquest
      @statquest  5 years ago

      Yes. One color per gene.

    • @hengdezhu2832
      @hengdezhu2832 5 years ago

      @@statquest what if different samples have different number of gene, how to do quantile normalization? For example, Sample1 has 3 genes, A, B,C. Sample2 has 4 genes, A, B,C,D. Can I set D gene in Sample1 to zero and do the quantile normalization?

    • @statquest
      @statquest  5 years ago

      @@hengdezhu2832 That might work, but, to be honest, I'm not sure what is best in this situation.

    • @hengdezhu2832
      @hengdezhu2832 5 years ago

      @@statquest Ok, thank you so much!

  • @TheEbbemonster
    @TheEbbemonster 5 years ago +1

    What is the purpose for doing this?

    • @statquest
      @statquest  5 years ago

      It helps normalize data when you have a lot of technical noise.

    • @SergeySenigov
      @SergeySenigov 9 months ago

      Say three perfume experts rate 4 new perfumes.
      It is known that absolute scores are less reliable than relative ones.
      So we want to average equally ranked absolute perfume scores and preserve the relative order.
      Now suppose there is very little distance between the 2nd and 3rd ranks. Then we cannot confidently choose between blue (ranks 1, 2, 2) and yellow (2, 1, 3) because ranks 2 and 3 are close. Presumably we should bring in a fourth expert.
      However, if the distance between the 2nd and 3rd ranks is large, we can confidently choose blue.

  • @gpgor
    @gpgor 4 years ago

    How about median normalization?

  • @eiderdiaz7219
    @eiderdiaz7219 4 years ago +1

    i love it

  • @hamade7997
    @hamade7997 4 years ago

    you are a fucking king.

  • @illiap3865
    @illiap3865 4 years ago

    But doesn't it erase information about how measurements compare to each other in one sample?

    • @statquest
      @statquest  4 years ago

      You still retain information about rank (i.e. gene X is higher than gene Y), but you can no longer quantify the difference. However, you wouldn't quantile normalize in the first place if you were only interested in the values within a single sample.

  • @abdrnasr
    @abdrnasr 4 years ago

    Is there an example where this can be helpful ?

    • @statquest
      @statquest  4 years ago +1

      I believe quantile normalization was invented for microarrays (a method for measuring gene expression). However, I've seen it used in other situations when people wanted a non-parametric way to remove batch effects.

  • @MrWater2
    @MrWater2 7 months ago

    Pros and cons of this normalization?

    • @statquest
      @statquest  7 months ago +1

      Pros, no worries about outliers. Cons? You lose a lot of nuance in the data.

    • @MrWater2
      @MrWater2 7 months ago

      @@statquest Yep! But what I don't understand is that the data (values) after the transformation are the same across variables? It makes no sense to me; I probably misunderstood something.

    • @statquest
      @statquest  7 months ago +1

      @@MrWater2 In this case, what is important is the relative position and ranking of each measurement, rather than its actual value. Lots of non-parametric statistical tests can be performed on ranks.

    • @MrWater2
      @MrWater2 7 months ago

      Aha, perfect. But I guess I can't use it as a preprocessing step in statistical learning because the transformed matrix must be ill-conditioned. Right?

  • @kinzarian8926
    @kinzarian8926 1 year ago +2

    Merci !

    • @statquest
      @statquest  1 year ago +1

      Hooray!!! Thank you for supporting StatQuest!!! BAM! :)

  • @ayoubbakar7907
    @ayoubbakar7907 5 years ago +2

    triple baaaam

    • @statquest
      @statquest  5 years ago

      That's right! :)

  • @Barbirose
    @Barbirose 4 years ago

    Ez pz so ez ur explenation sucks tho