Explaining nonparametric statistics, part 1

  • Published on 12 May 2024
  • The only thing statisticians know how to relax is their assumptions.
    Stay updated with the channel and some stuff I make!
    👉 verynormal.substack.com
    👉 very-normal.sellfy.store

Comments • 61

  • @ln8416
    @ln8416 months ago +18

    What a time to be alive... just open YouTube and get quality educational content to procrastinate on your statistics lectures. Thank you!

  • @2nd_ndr
    @2nd_ndr months ago +20

    nonparametrics sounds like a branch of the SCP Foundation

  • @Sarwaan001
    @Sarwaan001 months ago +2

    I minored in Statistics and I always wondered how we would handle data that doesn’t follow a certain distribution. I’m glad I stumbled on this video

  • @christian7559
    @christian7559 months ago +35

    Bootstrap is life

    • @ilusoriob
      @ilusoriob months ago +1

      Bootstrap is love.

    • @isaacnewtonstolemyjoy
      @isaacnewtonstolemyjoy months ago

      @ilusoriob Bootstrap is joy

  • @prod.kashkari3075
    @prod.kashkari3075 21 days ago

    Cool! Excited for part 2.

  • @sujathaontheweb3740
    @sujathaontheweb3740 6 days ago

    You're a great teacher!

  • @chemistrycapital
    @chemistrycapital months ago +3

    Loving the pharma twist to this video

  • @blessedowo1958
    @blessedowo1958 months ago +1

    Thank you bro. This is helpful

  • @jeffreychandler8418
    @jeffreychandler8418 months ago +10

    I've heard some people argue that rank-based nonparametric methods are not very useful because you aren't measuring the data, but the ranks of the data, which is a fundamentally different problem.
    What do you make of this debate?
    I've seen Wasserman's "All of Nonparametric Statistics" cited as providing alternatives and support for that contention.

    • @very-normal
      @very-normal months ago +11

      Disclaimer: I have not thought a lot about this, but here’s my two cents.
      I think it’s a fair issue to bring up, especially when the specific values of the data have real-world meaning. For example, I’d want a hypothesis test on my blood pressure to say something about my blood pressure, not its rank relative to others.
      Overall, it’s still a valuable tool for people because I view working with a transform of the data as better than totally ignoring the assumptions of a hypothesis test.

    • @robertwilsoniii2048
      @robertwilsoniii2048 15 days ago

      @very-normal The answer is that signed ranks allow us to determine whether or not there are statistically significant effects, even when minorities are present in samples. This is something the Central Limit Theorem can't handle, because when using the Central Limit Theorem you can't tell whether an unlikely sample mean is caused by minorities or by statistically significant effects. This forces you to decide either that the minorities don't matter or that the majority doesn't matter -- it forcibly discriminates against groups that are different from one another. Signed ranks solve this problem, so that you can test hypotheses without discriminating against minorities.
      You need to make the judgment call on which to use based on the situation. In personal medicine, parametric tests make sense because your own body doesn't benefit from signed ranks. But anything involving diverse communities of several people of different backgrounds would benefit from nonparametric techniques.
      Likewise, machine learning involving classification of diverse objects would benefit from nonparametric techniques -- such as k-means clustering.

  • @thiagonunes3183
    @thiagonunes3183 months ago +1

    great video

  • @fadhlyazka
    @fadhlyazka 28 days ago

    Thanks man. Keep it up.

  • @robertwilsoniii2048
    @robertwilsoniii2048 15 days ago

    The Central Limit Theorem *always* applies. But, it *also* marginalizes different groups and minorities in the population. And for that reason, I do prefer non-parametric models.

  • @t0mcc
    @t0mcc months ago

    very helpful!

  • @byronwatkins2565
    @byronwatkins2565 months ago +1

    The ACTUAL name of the sgn(x) function is 'signum.' It is Latin for something that has a sign or a signature.

  • @huhuboss8274
    @huhuboss8274 28 days ago

    Will you cover Dempster-Shafer theory in the future?

  • @Neptoid
    @Neptoid months ago +1

    What if you need to watch a YouTube tutorial? It would still count as a non-work site, wouldn’t it?

  • @maloevain5857
    @maloevain5857 months ago +4

    Excellent video, but at 4:15, shouldn't it be the probability density function rather than the CDF? The CDF is non-decreasing.

    • @very-normal
      @very-normal months ago +4

      Yeah you’re right, the notation is for a general CDF. I chose to show the PDF instead since it’s easier to see the symmetry, but I should have had another bit of notation there to connect the two.

    • @maloevain5857
      @maloevain5857 months ago +1

      @very-normal Yes, it's just a detail. Anyway, the video is super clear.
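
One way to write the missing link between the two pictures, assuming a continuous distribution with CDF F, density f, and center of symmetry θ (symbols picked here for illustration, they may not match the video's notation): symmetry about θ can be stated in terms of either function.

```latex
F(\theta + x) = 1 - F(\theta - x) \quad \text{for all } x
\qquad\Longleftrightarrow\qquad
f(\theta + x) = f(\theta - x) \quad \text{for all } x
```

The density version is the mirror-image shape that is easy to see on a plot; the CDF version says the same thing about tail areas on either side of θ.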

  • @prod.kashkari3075
    @prod.kashkari3075 21 days ago

    You should also cover nonparametric regression stuff, like smoothing

    • @very-normal
      @very-normal 21 days ago

      I think that would be cool! You mean something like kernels or splines, yeah?

    • @prod.kashkari3075
      @prod.kashkari3075 21 days ago

      @very-normal yes

  • @zacsanchez7520
    @zacsanchez7520 months ago

    I'd like to know more about where such a statistic was derived from. I'm not an expert, but it seems like a sort of intuitive (almost back-of-the-envelope) way to get the behaviour you described at 6:47.
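
Assuming the statistic in question is the Wilcoxon signed-rank statistic (the one wilcox.test reports for a single sample), a minimal Python sketch of it might look like the following. The function name and the sample values are made up for illustration. The intuition: if the data really are symmetric about the hypothesized center, positive and negative differences should receive similar ranks, so the two rank sums should roughly balance; a lopsided split suggests the center is somewhere else.

```python
def signed_rank_sums(data, theta0):
    """Return (W_plus, W_minus): rank sums of positive and negative differences.

    Illustrative helper, not taken from the video. Zero differences are
    dropped and tied absolute differences share their average rank.
    """
    diffs = [x - theta0 for x in data if x != theta0]
    # Indices of the differences, ordered by absolute size.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        # Extend j over a run of tied absolute differences.
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return w_plus, w_minus


# Hypothetical data, tested against a hypothesized center of 60.
sample = [72, 55, 91, 64, 47, 80]
print(signed_rank_sums(sample, 60))  # (15.0, 6.0)
```

The two sums always add up to n(n+1)/2 (here 21), so reporting either one is enough; values of W_plus far from half that total are evidence against the hypothesized center.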

  • @Minisynapse
    @Minisynapse 28 days ago +1

    Would love some content on complex linear models, mixed linear models and all that. But maybe you'd have to start with general linear models first.

    • @very-normal
      @very-normal 28 days ago +1

      Yeahhh, it might be a while before I get to the more complex linear models, but I’ll definitely get to them since they’re so commonly used

    • @Minisynapse
      @Minisynapse 28 days ago

      @very-normal Subscribed so I can catch those. Keep up the good work, love the format of the videos!

  • @georgessakr1
    @georgessakr1 months ago

    Opinion about "All of Nonparametric Statistics" by Wasserman?
    Also, any suggestions on Bayesian / Monte Carlo methods?

    • @very-normal
      @very-normal months ago

      I haven’t read all of it, but I have it as a reference! I like his work overall though.
      Not sure about Monte Carlo, but my usual rec for Bayesian stuff is Bayesian Data Analysis by Gelman

  • @Inter_Are
    @Inter_Are months ago +1

    Question!! How could you test if the “typical non-work watch time” was either significantly less than or greater than the 60 min?
    (Let’s say you get mad at your employees for watching on the clock, but in reality they watch near 0 min which is causing the low p-value)

    • @very-normal
      @very-normal months ago +1

      I could specify in wilcox.test that I’d like a one-sided test via its alternative argument (alternative = "less" or alternative = "greater"). By default, it runs a two-sided test.

    • @Inter_Are
      @Inter_Are months ago

      @very-normal Faster response time than most of my professors! I appreciate you and your amazing stats content!! Thanks :)
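
To make the one-sided idea concrete, here is a rough pure-Python sketch of an exact signed-rank p-value with a direction option, loosely mirroring the alternative argument of R's wilcox.test ("two.sided", "less", "greater"). The function name, its option strings, and the watch times are invented for illustration, and the sketch assumes a small sample with no ties and no zero differences.

```python
from itertools import product


def signed_rank_pvalue(data, theta0, alternative="two-sided"):
    """Exact p-value for the signed-rank test by enumerating sign patterns.

    Illustrative only: assumes no tied absolute differences and a small n,
    since all 2**n sign patterns are enumerated.
    """
    diffs = [x - theta0 for x in data if x != theta0]
    n = len(diffs)
    # Rank the absolute differences from 1 (smallest) to n (largest).
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0] * n
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    w_obs = sum(r for d, r in zip(diffs, ranks) if d > 0)

    # Under the null, each rank is equally likely to carry a + or - sign,
    # so the null distribution of W+ comes from all 2**n sign patterns.
    null = [
        sum(r for keep, r in zip(signs, range(1, n + 1)) if keep)
        for signs in product((0, 1), repeat=n)
    ]
    p_greater = sum(w >= w_obs for w in null) / len(null)
    p_less = sum(w <= w_obs for w in null) / len(null)

    if alternative == "greater":
        return p_greater
    if alternative == "less":
        return p_less
    return min(1.0, 2 * min(p_greater, p_less))


# Hypothetical watch times (minutes), all well under the 60-minute benchmark.
times = [5, 12, 20, 31, 44, 52, 58, 3]
print(signed_rank_pvalue(times, 60, "less"))     # 0.00390625
print(signed_rank_pvalue(times, 60, "greater"))  # 1.0
print(signed_rank_pvalue(times, 60))             # 0.0078125
```

In this made-up scenario the two-sided p-value is small, but only the "less" direction is significant, which is exactly the case where employees are watching far less than 60 minutes rather than more.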

  • @OneDSystems
    @OneDSystems months ago

    Which software do you use to show the formulas with the animations, graphs, curves, etc.?

    • @very-normal
      @very-normal months ago +1

      I use manim for those!

  • @lordzekrom2
    @lordzekrom2 months ago +3

    Pretty sure I get an entire class on these and SEMs next semester

    • @very-normal
      @very-normal months ago +1

      Good luck! SEM was rough for me when I took it 💀

    • @samcs8927
      @samcs8927 29 days ago

      What is SEM?

    • @very-normal
      @very-normal 29 days ago +1

      It stands for “structural equation modeling”. It’s often used with latent variables, which are common in fields like psychology.

  • @johanngambolputty5351
    @johanngambolputty5351 28 days ago

    Wait a second, was hypothesis testing "P(param | data) is proportional to P(data | param)" (by Bayes) all along? Makes sense I suppose, you do that in maximum likelihood estimation I think. This seems like the instantaneous version, where you're judging one case before moving to a more likely param candidate? (A single cost evaluation rather than a whole optimisation?)

  • @lexinwonderland5741
    @lexinwonderland5741 months ago

    What would you recommend for students who can't afford a license to use R or MATLAB environments?

    • @very-normal
      @very-normal months ago +6

      R is free tho! Also you could go with Python

    • @jeffreychandler8418
      @jeffreychandler8418 months ago +4

      R is completely free for everyone, same with RStudio
      edit: so are Python, Julia, and VS Code

  • @joelbaptista9725
    @joelbaptista9725 months ago

    I don't have a lot of knowledge in statistics, so this question might sound dumb. The only thing we've assumed about the distribution to perform this test is that the distribution is symmetric, right?

    • @very-normal
      @very-normal months ago

      Yes! And also that it’s continuous

  • @MKhan-zo8xo
    @MKhan-zo8xo months ago +4

    for the algo!!

    • @very-normal
      @very-normal months ago

    • @ufuoma833
      @ufuoma833 months ago

      for the algo!!!!

  • @piaveipvsenlawp7402
    @piaveipvsenlawp7402 months ago

    Love your vids, but isn't it theta-nought, as in zero, not theta-not?

    • @very-normal
      @very-normal months ago

      lol yeah I know, I tried going for a pronunciation type thing but I don’t think it worked out 😅

    • @jamesdavis3851
      @jamesdavis3851 months ago

      @very-normal I took it as a choice reflective of the no-frills, approachable, non-elitist attitude toward a difficult subject where the content is what matters

  • @mop4193
    @mop4193 months ago

    Nonparametric methods are not applied much in practice and don't seem to be preferred in research papers. Why? Parametric analysis seems to be the typical go-to.

    • @very-normal
      @very-normal months ago

      This is my opinion, but I think a lot of it comes from unfamiliarity and unawareness that the methods even exist, especially among non-statistician researchers. I’ve seen some researchers use it, but it is not that common. There are other reasons concerning power & efficiency, but I think most people just don’t think about them

    • @tylersagendorf1453
      @tylersagendorf1453 months ago

      The interpretation of a non-parametric test also tends to be less intuitive and useful to researchers than a parametric test. Bootstrapping would be a good alternative, though most people tend to forget they have it in their bag of tools (including myself)
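
For anyone who wants to dust off the bootstrap mentioned above, a minimal percentile-bootstrap sketch for a confidence interval on the median could look like this in plain Python. The function name, default settings, and data are assumptions for illustration; this is one simple bootstrap variant among several, not a recommendation from the video.

```python
import random
import statistics


def bootstrap_median_ci(data, n_resamples=5000, level=0.95, seed=0):
    """Percentile bootstrap interval for the median (illustrative sketch)."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(data)
    # Resample the data with replacement and record each resample's median.
    medians = sorted(
        statistics.median(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    # Take approximate lower and upper percentiles of the bootstrap medians.
    alpha = (1 - level) / 2
    lower = medians[int(alpha * n_resamples)]
    upper = medians[int((1 - alpha) * n_resamples) - 1]
    return lower, upper


# Hypothetical watch times in minutes.
times = [5, 12, 20, 31, 44, 52, 58, 3, 27]
low, high = bootstrap_median_ci(times)
print(f"approximate 95% interval for the median: ({low}, {high})")
```

Unlike a rank-based test, the interval is stated in the original units of the data, which is part of why it can be easier to explain to collaborators.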

  • @qqq3230
    @qqq3230 months ago +2

    you are 1 week too late i already flunked my nonparametric statistics midterm exam 😅

    • @very-normal
      @very-normal 28 days ago +1

      my b, i gotchu for the final

  • @busbymath
    @busbymath 26 days ago

    fix your mic