The 5 Must-Know Distributions for Data Scientists (not what you think)

  • Published on 20 Sep 2024

Comments • 65

  • @gsimmons4330
    @gsimmons4330 1 year ago +22

    I love your channel’s intersection of higher-level math with stats and data science! Feels like no one does it quite like you.

    • @ritvikmath
      @ritvikmath 1 year ago +1

      Thanks 😊

    • @supersql8406
      @supersql8406 1 year ago

      Yeah, and he teaches very well, too! When I want to understand a specific section of advanced math, most channels either oversimplify the higher-level material until it becomes unusable, or they explain it the way textbooks do, where it's just way above most people's heads.

  • @anishbhanushali
    @anishbhanushali 1 year ago +10

    Dude, I'm so grateful that this channel exists!!

    • @ritvikmath
      @ritvikmath 1 year ago +1

      Thanks! Grateful to you for watching

  • @DaneDuPlessis
    @DaneDuPlessis 2 months ago

    Great summary of the "main" distributions. Thanks.

  • @sharks1349
    @sharks1349 1 year ago +8

    I find teaching the intuition behind data science, and math in general, to be much more important than people might think.

    • @ritvikmath
      @ritvikmath 1 year ago +1

      Thanks! I think so too

  • @platoh
    @platoh 3 months ago

    This is probably the best use of 8.5 minutes I'll see all day. Love the insights, concise and organized delivery, and relatable examples.

  • @zenith_journey
    @zenith_journey 1 year ago +1

    Love this channel too! I love discussions about intuitions… it’s so easy to get lost in statistical jargon and it’s refreshing to step back and put things into perspective.

  • @lashlarue7924
    @lashlarue7924 8 months ago

    Great video, it would be EXTREMELY helpful to me as a perpetually aspiring data scientist if you could show how you might go about fitting a distribution to your data and using it in a simulation exercise. (I have an idea of how I might go about doing this, but I'm acutely aware that others might have better insights!)
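
One way the fit-then-simulate workflow asked about here might look, as a minimal sketch rather than anything shown in the video. Everything in it is an assumption for illustration: the "observed" waiting times are simulated, an exponential distribution is assumed to be the right family, and the 10-wait, 30-minute question is made up.

```python
import random
import statistics

random.seed(0)

# Hypothetical "observed" data: 2,000 waiting times in minutes.
# In practice this would be a real dataset; here it is simulated from an
# exponential with rate 0.5 so the fit can be sanity-checked.
observed = [random.expovariate(0.5) for _ in range(2000)]

# Step 1: fit. For an exponential distribution, the maximum-likelihood
# estimate of the rate is 1 / (sample mean).
rate_hat = 1.0 / statistics.fmean(observed)


# Step 2: simulate. Draw many synthetic runs of 10 waits each from the
# fitted distribution and estimate how often the total exceeds 30 minutes.
def simulate_total(n_waits: int, rate: float) -> float:
    """Total of n_waits independent draws from the fitted exponential."""
    return sum(random.expovariate(rate) for _ in range(n_waits))


n_sims = 10_000
totals = [simulate_total(10, rate_hat) for _ in range(n_sims)]
p_over_30 = sum(t > 30 for t in totals) / n_sims

print(f"fitted rate: {rate_hat:.3f}")
print(f"estimated P(total of 10 waits > 30 min): {p_over_30:.3f}")
```

With real data, the choice of family would need checking first (for example with a histogram or QQ plot) before trusting anything the simulation says.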

  • @AndyInTheAir
    @AndyInTheAir 1 year ago +1

    Excellent work. The casual discussion is great to explain the concepts for newbies in data science or even the old dogs who want to learn new tricks. The most knowledgeable presenters are the ones who can explain something to a 5-year-old. I'm also glad you have some content that formalizes these concepts as well. Always very helpful and thought-provoking.

    • @ritvikmath
      @ritvikmath 1 year ago

      Thanks for the thoughtful words!

  • @maxvaessen
    @maxvaessen 1 year ago

    Awesome stuff, very useful! Thanks ❤

    • @ritvikmath
      @ritvikmath 1 year ago

      Thanks for watching!

  • @ching-tsungderontsai2750
    @ching-tsungderontsai2750 1 year ago

    Amazing content that links stats and real-world data. Greatly appreciate your work and clear examples!

    • @ritvikmath
      @ritvikmath 1 year ago

      Glad it was helpful!

  • @bin4ry_d3struct0r
    @bin4ry_d3struct0r 1 year ago

    This is the most informative video on the intuition behind distribution interpretation I have ever watched!
    For the "pointy" distributions, I've just thought of them as Gaussians with low variance.

  • @pectenmaximus231
    @pectenmaximus231 1 year ago

    I definitely turn noisy data into sensible data by making bins. This is especially true with frequency per day. At the daily level, picking out a trend is difficult, but grouping into several months, or even several years, really helps create some worthwhile numbers.
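
The binning idea in this comment can be sketched as follows. This is an illustration under stated assumptions, not the commenter's actual data or method: the daily counts, the trend slope, and the noise level are all hypothetical.

```python
import random
import statistics
from collections import defaultdict
from datetime import date, timedelta

random.seed(1)

# Hypothetical daily event counts over two years: a slow upward trend
# buried under day-to-day noise.
start = date(2022, 1, 1)
daily = {}
for i in range(730):
    trend = 20 + 0.02 * i
    daily[start + timedelta(days=i)] = max(0.0, random.gauss(trend, 8))

# Bin by calendar month and average within each bin.
bins = defaultdict(list)
for day, count in daily.items():
    bins[(day.year, day.month)].append(count)
monthly = {month: statistics.fmean(values) for month, values in sorted(bins.items())}

# Compare the typical size of a step between neighbouring points at each
# level of aggregation.
daily_values = list(daily.values())
monthly_values = list(monthly.values())
daily_jump = statistics.fmean(abs(b - a) for a, b in zip(daily_values, daily_values[1:]))
monthly_jump = statistics.fmean(abs(b - a) for a, b in zip(monthly_values, monthly_values[1:]))

print(f"mean day-to-day change:     {daily_jump:.2f}")
print(f"mean month-to-month change: {monthly_jump:.2f}")
```

Averaging within a bin shrinks the noise while leaving the trend intact, which is why the monthly series is easier to read. The trade-off is fewer points and a coarser view of anything that happens within a bin.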

  • @danieljaszczyszczykoeczews2616
    @danieljaszczyszczykoeczews2616 1 year ago

    This video is really useful! Please keep explaining the intuition behind data distributions! It's really hard to find such explanations in regular books or any other formal sources.

  • @seanpitcher8957
    @seanpitcher8957 1 year ago

    Love that last one. I use QQ plots more since they make more sense to me, but I've definitely seen these. Thanks for providing well-explained content on a higher level than many do.

    • @ritvikmath
      @ritvikmath 1 year ago

      Thanks for the input and thanks for watching!

  • @bokehbeauty
    @bokehbeauty 1 year ago

    I'm excited that you teach the message “what does it tell me” and explain it through real-life examples 🎉.

    • @ritvikmath
      @ritvikmath 1 year ago

      Thanks!

    • @bokehbeauty
      @bokehbeauty 1 year ago

      @ritvikmath Under which of these types would you put the distribution of income in the US, with its fat tail and big peak at the upper end?

  • @133839297
    @133839297 1 year ago

    I like your teaching style.

  • @asjsingh
    @asjsingh 1 year ago

    Brilliant description of distributions

  • @sajanator3
    @sajanator3 1 year ago

    I absolutely love this channel

  • @jfndfiunskj5299
    @jfndfiunskj5299 1 year ago

    Another fantastic video. Nice job.

  • @karunamayiholisticinc
    @karunamayiholisticinc 1 year ago

    One of the best videos on data science. It helps us understand data better.

  • @НиколайНовичков-е1э
    @НиколайНовичков-е1э 1 year ago

    Thank you! I really like how you can explain everything in a simple way.

  • @shadowblack5455
    @shadowblack5455 1 year ago +1

    I think that the pointy distribution is modelled as a Cauchy distribution, and the skewed distribution is what you call a Pareto distribution or an exponential distribution.

    • @galenseilis5971
      @galenseilis5971 1 year ago

      A Cauchy distribution looks appropriate in this case. There are other "pointy" distributions to keep in mind if a Cauchy does not fit well, such as the Laplace distribution.
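
A rough way to compare the candidate "pointy" shapes mentioned in this thread is to fit each one and compare log-likelihoods. The sketch below is only an illustration: the sample is hypothetical (simulated Laplace data), the normal and Laplace fits use their closed-form maximum-likelihood estimates, and the Cauchy parameters are quick quantile-based estimates (median and half the interquartile range), not the MLE.

```python
import math
import random
import statistics

random.seed(2)

# Hypothetical "pointy" sample: the difference of two independent
# exponentials follows a Laplace distribution (location 0, scale 1).
data = [random.expovariate(1.0) - random.expovariate(1.0) for _ in range(5000)]


def normal_loglik(xs):
    # MLE: sample mean and population standard deviation.
    mu, sigma = statistics.fmean(xs), statistics.pstdev(xs)
    return sum(
        -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)
        for x in xs
    )


def laplace_loglik(xs):
    # MLE: location = median, scale = mean absolute deviation from it.
    loc = statistics.median(xs)
    scale = statistics.fmean(abs(x - loc) for x in xs)
    return sum(-math.log(2 * scale) - abs(x - loc) / scale for x in xs)


def cauchy_loglik(xs):
    # Quantile-based estimates (an assumption made for simplicity):
    # location = median, scale = half the interquartile range.
    q1, loc, q3 = statistics.quantiles(xs, n=4)
    scale = (q3 - q1) / 2
    return sum(
        -math.log(math.pi * scale * (1 + ((x - loc) / scale) ** 2)) for x in xs
    )


scores = {
    "normal": normal_loglik(data),
    "laplace": laplace_loglik(data),
    "cauchy": cauchy_loglik(data),
}
best = max(scores, key=scores.get)

for name, value in scores.items():
    print(f"{name:8s} log-likelihood: {value:.1f}")
print("best fit:", best)
```

All three families have two parameters here, so comparing raw log-likelihoods is reasonably fair. With families of different sizes, a penalised criterion such as AIC would be the more usual choice.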

  • @ireoluwaTH
    @ireoluwaTH 1 year ago

    Practicality and 'rule of thumb'... You excel at that sort of stuff.
    👌🏽

  • @enesdedovic
    @enesdedovic 1 year ago

    Pretty nice. Add another one on how to model those distributions.

  • @mindasb
    @mindasb 1 year ago +1

    Tweedie distribution, baby! Can be seen in some regression datasets where the government / local authority restricts max salary/price or whatever (California housing).

  • @galenseilis5971
    @galenseilis5971 1 year ago +1

    I don't think most data scientists have the additional time to delve into geometry, but geometry is very much about the "shape" of mathematical objects.

  • @abcpsc
    @abcpsc 1 year ago

    Just one realization of that pointy distribution from my work: it happened to a variable that is regulated by a not-so-powerful regulator. In my case, wind velocity in a tunnel (so signed and 1D) that is being regulated by some not-so-powerful fan.

  • @pixeloverflow
    @pixeloverflow 1 year ago

    This was super helpful! Thanks for sharing!

    • @ritvikmath
      @ritvikmath 1 year ago

      Thanks for watching!

  • @16876
    @16876 1 year ago

    would be nice if you'd expand on how to analyze these dists

  • @Joy_jester
    @Joy_jester 1 year ago

    Love the content. I started studying data science and your videos helped me a lot. A small suggestion/request: for each concept or video that you cover, can you also share the resources that you followed? Thanks.

  • @ioannisnikolaospappas6703
    @ioannisnikolaospappas6703 1 year ago

    Thank u for your work brother!🙏

    • @ritvikmath
      @ritvikmath 1 year ago

      Thanks for watching!

  • @galenseilis5971
    @galenseilis5971 1 year ago

    1:44 I don't agree that a max GPA of 4 is a physical limitation in any usual sense of physics. If it is, then by what physical principle? Conservation of angular momentum?
    But I appreciate the video overall. These are definitely common cases in data science. The video is both informative and practical.
    Lately I have been thinking about counterfactual inference when there is an unknown upper bound on a facility's capacity. The bound will not change when intervening on the rates, but how the shape of the distribution will change with respect to the boundary, and what happens to the expectation of the intervention distribution, are non-obvious to me. From the modelling side I could derive a truncated distribution. Or I could derive the distribution of MAX(X, c) where c is a parameter or hyperparameter, although in NUTS/Gibbs/MH sampling I find that such bounds are sampled poorly (i.e. lots of divergences) when they're treated as a parameter. Or you can have a mixture distribution that transitions from "away-from-boundary behaviour" to "near-to-boundary behaviour".
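
The first two modelling options in this comment give visibly different shapes near the boundary, which a small simulation can show. This is a sketch under assumptions, not the commenter's model: the latent variable is taken to be standard normal, the bound `c = 1.0` is hypothetical and known, and the clamped variable is written as `min(x, c)`, which is an assumption about how an upper bound would be recorded.

```python
import random
import statistics

random.seed(3)

c = 1.0       # hypothetical upper bound, e.g. a capacity limit
n = 20_000


def draw_latent() -> float:
    """Unbounded quantity before the limit applies (assumed standard normal)."""
    return random.gauss(0.0, 1.0)


def draw_truncated() -> float:
    """Truncation: values above c never occur, so redraw until one is <= c."""
    while True:
        x = draw_latent()
        if x <= c:
            return x


# Truncated sample: the density is renormalised below c, with no pile-up at c.
truncated = [draw_truncated() for _ in range(n)]

# Clamped (censored) sample: anything above c is recorded as exactly c,
# which creates a point mass at the boundary.
censored = [min(draw_latent(), c) for _ in range(n)]

frac_at_bound = sum(x == c for x in censored) / n

print(f"truncated mean: {statistics.fmean(truncated):.3f}")
print(f"censored mean:  {statistics.fmean(censored):.3f}")
print(f"share of censored values sitting exactly at c: {frac_at_bound:.3f}")
```

The spike at the boundary in the clamped case is one way to tell the two mechanisms apart in observed data. A truncated sample thins out as it approaches the bound instead of piling up on it.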

  • @juaneshberger9567
    @juaneshberger9567 1 year ago +1

    Can you make a video on data engineering vs machine learning engineering vs data scientist vs data analyst? Great vid btw!

    • @ritvikmath
      @ritvikmath 1 year ago

      Thanks for the suggestion!

  • @imtryinghere1
    @imtryinghere1 1 year ago +1

    interesting ideas, but would be more helpful if you had a list of action items w/ each distribution.

  • @lorenzoplaserrano8734
    @lorenzoplaserrano8734 1 year ago

    the power of this video 🔥

    • @ritvikmath
      @ritvikmath 1 year ago

      The power of my viewers 🔥

  • @Eta_Carinae__
    @Eta_Carinae__ 1 year ago

    Isn't the pointy one the plot of distances away from a SLR line with L1 cost? I can't precisely remember the name of the curve, but it's not the curve shown.

  • @LanteLuthuli
    @LanteLuthuli 1 year ago

    Has Bard/ChatGPT impacted your work in any way? How did you land up in DS?

    • @ritvikmath
      @ritvikmath 1 year ago

      Hey, thanks for the questions! We will be covering those topics very soon in future videos.

  • @azimuth4850
    @azimuth4850 1 year ago

    👍

  • @anandiyer5361
    @anandiyer5361 1 year ago

    always great to watch your videos! Is there a way to contact you directly @ritwikmath?