Jensen's Inequality : Data Science Basics

  • Published 1 Dec 2024

Comments •

  • @robharwood3538
    @robharwood3538 1 year ago +52

    Couple of nitpicks:
    1) Technically, the type of the combination of x1 and x2 is not _just_ a 'linear' combination; it is specifically a 'convex' combination. A convex combination is a linear combination where a) the coefficients of the combination add up to 1, and b) all the coefficients are non-negative (i.e. >= 0). A general 'linear' combination does not have these restrictions, so it is helpful to use the more specific term, 'convex' combination, to reduce potential misunderstanding/misuse of the theorem/inequality. [* See Note below.]
    2) Although it is valid and correct to write the convex combination as you have, as t * x1 + (1-t) * x2, it is more common/typical to write it with the (1-t) part first, as (1-t) * x1 + t * x2. This way, it is very easy to see that when t=0, the value is x1, and when t=1, the value is x2. You can then think of t as like a 'slider' parameter, with t=0 representing the 'start', and t=1 representing the 'end', and sliding t between 0 and 1 gives a nice linear slide on the x-axis between x1 and x2. Even better, name them x0 and x1, and it makes the connection even more clear.
    [* Note:
    The in-between case, when you have a linear combination where all the coefficients add to 1, but they are *not* restricted to be non-negative, is called an Affine Combination. For example, in this case, the two coefficients, 1-t and t add up to:
    (1-t) + t = 1 + (t - t) = 1 + 0 = 1
    In an affine combination, t would be allowed to be negative; for example, with t = -3, (1-t) would be 1-(-3) = 4, but the sum would still be -3 + 4 = 1.
    Affine combinations are useful for example for writing a line equation in terms of two given points on the line, among many other uses. So:
    Convex Combo ⊆ Affine Combo ⊆ Linear Combo
    Meanings / Constraints:
    Linear: The coeffs are required to be only scalars (or scalar variables, such as t in this case).
    Affine: It's Linear *and* the coeffs all add up to exactly 1.
    Convex: It's Affine *and* the coeffs are all non-negative, >= 0.
    Took me a while to wrap my head around these three types of combinations when I first ran across them, so I thought it might help some folks to have them all spelled out.]
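
    To make the three kinds of combinations concrete, here is a small Python sketch added for illustration (it is not from the video or the comment above; the convex function f and the sample points are arbitrary choices). It classifies a coefficient vector as linear, affine, or convex, and numerically checks Jensen's inequality along the slide from x0 to x1:

    ```python
    import numpy as np

    def classify_combination(coeffs):
        """Classify a coefficient vector as a linear, affine, or convex combination."""
        coeffs = np.asarray(coeffs, dtype=float)
        if not np.isclose(coeffs.sum(), 1.0):
            return "linear"   # scalars with no further restriction
        if np.any(coeffs < 0):
            return "affine"   # sums to 1, but some coefficient is negative
        return "convex"       # sums to 1 and all coefficients are >= 0

    def f(x):
        """An arbitrary convex function used for the check."""
        return x ** 2

    # Convex combination (1-t)*x0 + t*x1, with t slid from 0 (start) to 1 (end)
    x0, x1 = -1.0, 3.0
    for t in np.linspace(0.0, 1.0, 5):
        point = (1 - t) * x0 + t * x1          # value on the x-axis between x0 and x1
        lhs = f(point)                         # f of the convex combination
        rhs = (1 - t) * f(x0) + t * f(x1)      # convex combination of the f-values
        assert lhs <= rhs + 1e-12              # Jensen's inequality for a convex f

    print(classify_combination([0.3, 0.7]))    # convex
    print(classify_combination([-3, 4]))       # affine (sums to 1, negative coeff)
    print(classify_combination([2, 5]))        # linear (does not sum to 1)
    ```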

    • @ritvikmath
      @ritvikmath  1 year ago +19

      huge thanks for this comment! I really appreciate the nuanced explanation and comments on things that could have been improved.

    • @Aishiliciousss
      @Aishiliciousss 7 months ago

      This was helpful, thank you.

  • @eyuelmelese944
    @eyuelmelese944 11 months ago +1

    This channel is so underrated! I just graduated from an MSc in DS, and I'm still coming back here for concepts. Thanks!

  • @matthewkumar7756
    @matthewkumar7756 1 year ago +3

    Incredibly accessible and insightful explanation. Keep the videos coming!

  • @husseinjafarinia224
    @husseinjafarinia224 1 year ago +2

    I'm stunned by how wonderfully you explained this :D

  • @prashlovessamosa
    @prashlovessamosa 1 year ago +4

    Thanks for providing awesome knowledge in such an easy way.

  • @sergioLombrico
    @sergioLombrico 1 year ago

    Such a good explanation!

  • @davidmurphy563
    @davidmurphy563 1 year ago +1

    6:00 Ah, in game dev we call this "lerping", short for linear interpolation. 0 = point A, 1 = point B, and 0.5 is exactly in between. You use it soooo much. There are other sorts too; cubic is a nice one, but it's not a straight line obvs.
    Anyway, great explanation so far. Subbed!
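
    A minimal sketch of the lerp idea mentioned above, added for illustration (the function name lerp and the sample values are just hypothetical):

    ```python
    def lerp(a, b, t):
        """Linear interpolation: t=0 gives a, t=1 gives b, t=0.5 is exactly in between."""
        return (1 - t) * a + t * b

    print(lerp(2.0, 10.0, 0.0))   # 2.0  (point A)
    print(lerp(2.0, 10.0, 1.0))   # 10.0 (point B)
    print(lerp(2.0, 10.0, 0.5))   # 6.0  (halfway)
    ```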

    • @ritvikmath
      @ritvikmath  1 year ago

      Thanks! Love hearing about applications in other fields

  • @Trubripes
    @Trubripes 8 months ago

    I proved it by defining delta as the change in KL resulting from shifting mass m from x1 to x2, then taking the second-order derivative of that function.
    This is way better, though.
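
    For anyone curious what that looks like numerically, here is a rough sketch added for illustration (the distributions p and q and the bins are made up; it only checks, via second finite differences, that the KL divergence is convex along the mass shift, rather than reproducing the derivative computation described above):

    ```python
    import numpy as np

    def kl(p, q):
        """Discrete KL divergence sum p * log(p / q); assumes strictly positive entries."""
        return float(np.sum(p * np.log(p / q)))

    # Made-up discrete distributions for illustration
    p = np.array([0.2, 0.5, 0.3])
    q = np.array([0.4, 0.4, 0.2])
    x1, x2 = 0, 2                   # shift mass m from bin x1 to bin x2 of p

    def kl_after_shift(m):
        p_m = p.copy()
        p_m[x1] -= m
        p_m[x2] += m
        return kl(p_m, q)

    # Convexity check in m via second finite differences (m kept small so p stays positive)
    ms = np.linspace(0.0, 0.15, 50)
    vals = np.array([kl_after_shift(m) for m in ms])
    second_diff = vals[:-2] - 2 * vals[1:-1] + vals[2:]
    print(np.all(second_diff >= -1e-12))   # True: KL along this shift is convex in m
    ```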

  • @sftekin4040
    @sftekin4040 1 year ago

    That was really cool! Thank you!

  • @starship9629
    @starship9629 4 months ago

    On your second sheet, you generalize Jensen's inequality to linear combinations, which is incorrect. It specifically has to be convex combinations (of convex functions). I think I understand where this confusion arises from. You correctly stated that the inequality applies to the average, or expected value, which is not a general linear combination but a convex combination when applied to a random variable X.
    Consider a random variable X that can take values 1, 3, and 5 with probabilities 0.2, 0.5, and 0.3, respectively. The expectation E(X) = 1⋅0.2 + 3⋅0.5 + 5⋅0.3 = 0.2 + 1.5 + 1.5 = 3.2. Here, E(X) is a convex combination of the values 1, 3, and 5 with weights 0.2, 0.5, and 0.3 (they sum up to 1 and are between 0 and 1).
    Instead of saying Jensen's inequality applies to linear combinations, the most general way is to say it applies to suitable integrals with respect to measures on a general measure space. The expected value of a random variable is the special case where the integral is with respect to a probability measure.
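
    A quick numerical check of that example, added for illustration (the convex function f(x) = x**2 below is an arbitrary choice, not something from the comment):

    ```python
    import numpy as np

    values = np.array([1.0, 3.0, 5.0])
    probs = np.array([0.2, 0.5, 0.3])       # non-negative and sum to 1: a convex combination
    assert np.isclose(probs.sum(), 1.0) and np.all(probs >= 0)

    def f(x):
        """An arbitrary convex function for the check."""
        return x ** 2

    e_x = float(np.dot(probs, values))      # E(X) = 3.2, a convex combination of the values
    e_fx = float(np.dot(probs, f(values)))  # E(f(X))
    print(e_x)                              # 3.2
    print(f(e_x) <= e_fx)                   # True: f(E(X)) <= E(f(X)), Jensen's inequality
    ```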

  • @user-wr4yl7tx3w
    @user-wr4yl7tx3w 1 year ago

    Yes, I remember the chopstick helper function part of the proof. :)

  • @tmwtpbrent14
    @tmwtpbrent14 1 year ago

    Jensen's is my favorite inequality.

  • @mohammadrezanargesi2439
    @mohammadrezanargesi2439 1 year ago

    Gentleman,
    Can you explain how the log of the ratio of two pdfs gets to be convex?
    In theory, a convex function of a non-decreasing function is convex, if I'm not mistaken.
    So there needs to be a proof that the ratio of the pdfs has a positive derivative.
    Am I right?

  • @fran9426
    @fran9426 1 year ago

    Another great video! Had never heard of KL or Jensen's inequality. Would you say the latter is predominantly useful for understanding proofs, or do you ever use it as a Data Scientist in the course of building models? Watching the video I thought maybe it could be the case that Jensen's inequality is useful in estimating whether or not the target (the model output) is convex; if it's not convex, it would tell us that we can't assume a simple minimizer will work on this problem since it might get stuck in a local minimum.

  • @user-wr4yl7tx3w
    @user-wr4yl7tx3w 1 year ago

    Cool. Really helpful.

  • @counting1234
    @counting1234 1 year ago

    Thanks for the great videos! Do you think you could share references? I'd like to share some of your videos with my team and any published references would be very helpful.

  • @ParijatPushpam
    @ParijatPushpam 1 year ago

    Can we say that a constant function is a convex function? [Because it always satisfies Jensen's inequality, with equality.]

  • @phamminh9806
    @phamminh9806 8 months ago

    Could you recommend some books related to data science?

  • @bilalarbouch5849
    @bilalarbouch5849 1 year ago

    Great video, would you mind making a video about the James-Stein paradox? Thanks 🙏

  • @Hassan_MM.
    @Hassan_MM. 1 year ago

    Please make an intro about the probability limit (plim) ❓️❓️

  • @welcomethanks5192
    @welcomethanks5192 1 year ago

    Next topic: Fisher information?

  • @Set_Get
    @Set_Get 1 year ago

    thank you

  • @r00t257
    @r00t257 1 year ago

    You are God! Many of your videos are gorgeous!
