Intuitively Understanding the Cross Entropy Loss

  • Published 26 Jan 2025
  • This video discusses the Cross Entropy Loss and provides an intuitive interpretation of the loss function through a simple classification setup. The video draws the connection between the KL divergence and the cross entropy loss, and touches on some practical considerations.
    Twitter: / adianliusie
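
A minimal NumPy sketch of the quantity the description refers to: the cross entropy between a one-hot "true" class distribution and a softmax prediction. The logits and class count below are illustrative, not taken from the video.

```python
import numpy as np

def softmax(logits):
    # Shift by the max logit for numerical stability before exponentiating.
    z = logits - np.max(logits)
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()

def cross_entropy(p_true, p_pred, eps=1e-12):
    # H(p_true, p_pred) = -sum_i p_true[i] * log(p_pred[i])
    return -np.sum(p_true * np.log(np.clip(p_pred, eps, 1.0)))

logits = np.array([2.0, 0.5, -1.0])   # raw model outputs for 3 classes
p_pred = softmax(logits)              # predicted distribution, approx [0.786, 0.175, 0.039]
p_true = np.array([1.0, 0.0, 0.0])    # one-hot "true" distribution (correct class is 0)

print(cross_entropy(p_true, p_pred))  # = -log(0.786) ≈ 0.24
```

With a one-hot target the sum collapses to a single term, so the loss is just the negative log probability the model assigns to the correct class.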

Comments • 66

  • @leoxu9673 · 2 years ago · +35

    This is the only video that's made the connection between KL Divergence and Cross Entropy Loss intuitive for me. Thank you so much!

  • @jasonpittman7853 · 2 years ago · +12

    This subject has confused me greatly for nearly a year now, your video and the kl-divergence video made it clear as day. You taught it so well I feel like a toddler could understand this subject.

  • @nirmalyamisra4317 · 3 years ago · +13

    Great video. It is always good to dive into the math to understand why we use what we use. Loved it!

  • @ananthakrishnank3208 · 1 year ago

    Excellent expositions on KL divergence and Cross Entropy loss within 15 mins! Really intuitive. Thanks for sharing.

  • @kvnptl4400 · 1 year ago · +1

    This one I would say is a very nice explanation of Cross Entropy Loss.

  • @alirezamogharabi8733 · 2 years ago

    The best explanation I have ever seen about Cross Entropy Loss. Thank you so much 💖

  • @bo3053 · 2 years ago

    Super useful and insightful video which easily connects KL-divergence and Cross Entropy Loss. Brilliant! Thank you!

  • @yingjiawan2514 · 5 months ago

    This is so well explained. Thank you so much!!! Now I know how to understand KL divergence, cross entropy, logits, normalization, and softmax.

  • @Micha-ku2hu · 7 months ago

    What a great and simple explanation of the topic! Great work 👏

  • @viktorhansen3331 · 2 years ago

    I have no background in ML, and this plus your other video completely explained everything I needed to know. Thanks!

  • @shubhamomprakashpatil1939 · 2 years ago

    This is an amazing explanatory video on Cross-Entropy loss. Thank you

  • @ssshukla26 · 3 years ago · +11

    And no one told me that (minimizing KL is almost equivalent to minimizing the cross entropy loss) in 2 years of studying at a university... Oh man... thank you so much...

    • @DHAiRYA2801 · 1 year ago · +4

      KL = Cross Entropy - Entropy.
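
Spelled out (standard identities rather than anything specific to the video), with P* the true distribution and P_θ the model's prediction:

```latex
D_{\mathrm{KL}}(P^{*} \,\|\, P_{\theta})
  = \sum_{i} P^{*}(i)\,\log\frac{P^{*}(i)}{P_{\theta}(i)}
  = \underbrace{-\sum_{i} P^{*}(i)\,\log P_{\theta}(i)}_{H(P^{*},\,P_{\theta})\ \text{(cross entropy)}}
  \;-\; \underbrace{\Big(\!-\sum_{i} P^{*}(i)\,\log P^{*}(i)\Big)}_{H(P^{*})\ \text{(entropy of the targets)}}
```

Since H(P*) does not depend on the model parameters θ, minimizing the cross entropy and minimizing the KL divergence lead to the same optimum, which is why the two are "almost equivalent" as training objectives.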

  • @hansenmarc · 2 years ago

    Great explanation! I’m enjoying all of your “intuitively understanding” videos.

  • @TheVDicer · 2 years ago

    Fantastic video and explanation. I just learned about the KL divergence and the cross entropy loss finally makes sense to me.

  • @matiassandacz9145 · 3 years ago · +2

    This video was amazing. Very clear! Please post more on ML / Probability topics. :D Cheers from Argentina.

  • @allanchan339 · 3 years ago

    It is an excellent explanation, and it makes good use of the previous KL divergence video.

  • @yfd487 · 1 year ago · +2

    I love this video!! So clear and informative!

  • @chunheichau7947 · 5 months ago

    I wish more professors could hit all the insights that you mentioned in the video.

  • @mixuaquela123 · 2 years ago · +3

    Might be a stupid question but where do we get the "true" class distribution?

    • @patrickadu-amankwah1660 · 1 year ago · +1

      Real-world data, bro: from annotated samples.

    • @飛鴻-q1c · 1 year ago

      Humans are the criterion for everything in so-called AI.

    • @AnonymousIguana · 8 months ago · +1

      In a classification task, the true distribution has a value of 1 for the correct class and a value of 0 for the other classes. So that's it, that's the true distribution. And we know it, if the data is labelled correctly. The distribution in a classification task is called a probability mass function, btw.
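
A worked instance of this reply, using the usual one-hot convention (the numbers are made up for illustration): if the labelled class is c, then P*(c) = 1 and P*(i) = 0 for every other class, so the cross entropy collapses to a single term.

```latex
H(P^{*}, P) = -\sum_{i} P^{*}(i)\,\log P(i) = -\log P(c)
```

For example, with predicted probabilities (0.7, 0.2, 0.1) and the first class as the true label, the loss is -log 0.7 ≈ 0.36.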

  • @whowto6136 · 3 years ago

    Thanks a lot! Really helps me understand Cross Entropy, Softmax and the relation between them.

  • @hasankaynak2253 · 2 years ago

    The clearest explanation. Thank you.

  • @francoruggeri5850 · 2 years ago · +1

    Great and clear explanation!

  • @thinkbigwithai · 1 year ago

    At 3:25
    Why don't we model it as argmax Σ P* log P (without the minus sign)?

  • @newbie8051 · 2 months ago

    Oh wow, this was simple and amazing, thanks!

  • @PoojaKumawat-z7i · 6 months ago

    How does the use of soft label distributions, instead of one-hot encoded hard labels, impact the choice of loss function in training models? Specifically, can cross-entropy loss still be effectively utilized, or should Kullback-Leibler (KL) divergence be preferred?

  • @Darkev77 · 3 years ago · +3

    Brilliant and simple! Could you make a video about soft/smooth labels instead of hard ones and how that makes it better (math behind it)?

    • @SA-by2xg · 2 years ago

      Intuitively, information is lost whenever you discretize a continuous variable. Said another way, a class probability of 0.51 and one of 0.99 are very different, yet both collapse to the same hard label. Downstream, soft targets allow for more precise gradient updates.
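
A small sketch of the point made in this thread, assuming a generic softmax classifier (the targets and predictions below are made-up numbers): cross entropy accepts soft targets directly, and for any target it differs from the KL divergence only by the target's own entropy, which is constant with respect to the model.

```python
import numpy as np

def cross_entropy(p_true, p_pred, eps=1e-12):
    return -np.sum(p_true * np.log(np.clip(p_pred, eps, 1.0)))

def kl_divergence(p_true, p_pred, eps=1e-12):
    p_true = np.clip(p_true, eps, 1.0)
    p_pred = np.clip(p_pred, eps, 1.0)
    return np.sum(p_true * np.log(p_true / p_pred))

def entropy(p, eps=1e-12):
    p = np.clip(p, eps, 1.0)
    return -np.sum(p * np.log(p))

p_pred = np.array([0.6, 0.3, 0.1])      # model's predicted distribution
hard = np.array([1.0, 0.0, 0.0])        # one-hot hard target
soft = np.array([0.8, 0.15, 0.05])      # soft target, e.g. from label smoothing

for target in (hard, soft):
    ce, kl, h = cross_entropy(target, p_pred), kl_divergence(target, p_pred), entropy(target)
    # CE = KL + H(target): for hard targets H is ~0, so the two losses coincide;
    # for soft targets they differ by a constant that has zero gradient w.r.t. the model.
    print(f"CE={ce:.4f}  KL={kl:.4f}  H(target)={h:.4f}  CE=KL+H: {np.isclose(ce, kl + h)}")
```

So whether you call the training objective cross entropy or KL divergence, the gradient signal the model sees is the same; only the reported loss value shifts by the target entropy.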

  • @blakete · 2 years ago

    Thank you. You should have more subscribers.

  • @LiHongxuan-ee7qs · 10 months ago

    Such a clear explanation! Thanks!

  • @quantumjun · 2 years ago

    Will the thing at 4:12 be negative if you use information entropy or KL divergence? Are they both > 0?

    • @yassine20909 · 2 years ago

      As explained in the video, the KL divergence is a measure of "distance" between distributions, so it has to be ≥ 0. There are other prerequisites for a function to be a true distance metric, like symmetry and the triangle inequality, which the KL divergence does not actually satisfy; that is why it is only called a divergence.
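
For the non-negativity half of the question above, the standard argument (not something shown in the video) applies Jensen's inequality to the concave logarithm:

```latex
-D_{\mathrm{KL}}(P \,\|\, Q)
  = \sum_{i} P(i)\,\log\frac{Q(i)}{P(i)}
  \;\le\; \log\Big(\sum_{i} P(i)\,\frac{Q(i)}{P(i)}\Big)
  = \log 1 = 0
  \quad\Longrightarrow\quad
  D_{\mathrm{KL}}(P \,\|\, Q) \ge 0 .
```

Cross entropy then inherits a lower bound as well: H(P*, P) = H(P*) + D_KL(P* || P) ≥ H(P*), which is itself non-negative for discrete distributions, so neither quantity can be negative.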

  • @yassine20909 · 2 years ago

    Nice explanation, thank you.

  • @lebronjames193 · 3 years ago

    Really superb video, you should record more!

  • @HaykTarkhanyan · 6 months ago

    great video, thank you!

  • @vandana2410 · 2 years ago

    Thanks for the great video. One question though: what happens if we swap the true and predicted probabilities in the formula?

  • @MrPejotah · 1 year ago

    Great video, but only really clear if you know what the KL divergence is. I'd hammer that point to the viewer.

  • @dirtyharry7280 · 1 year ago

    This is so good, thx so much

  • @shchen16 · 2 years ago

    Thanks for this video

  • @mikejason3822 · 2 years ago

    Great video!

  • @kevon217 · 2 years ago

    Simple and helpful!

  • @fVNzO · 3 months ago

    I skipped through the video, but I don't think you managed to explain how the formula itself deals with the infinities that are created when inputting log(0). That's what I don't understand.
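
On the log(0) concern: in the classification setup the one-hot true distribution zeroes out every term except the correct class (with the convention 0·log 0 = 0), so the only logarithm that matters is the log of the predicted probability of the true class, and a softmax output is never exactly zero for finite logits. Implementations typically also work in log space to avoid underflow. A minimal sketch of that idea, not any particular library's internals:

```python
import numpy as np

def log_softmax(logits):
    # Compute log(softmax(logits)) directly in log space:
    # shifting by the max avoids overflow in exp, and no explicit log(0) is ever taken.
    z = logits - np.max(logits)
    return z - np.log(np.sum(np.exp(z)))

def cross_entropy_from_logits(logits, true_class):
    # -log p(true_class), evaluated without ever forming the probability itself.
    return -log_softmax(logits)[true_class]

logits = np.array([10.0, -20.0, 3.0])
print(cross_entropy_from_logits(logits, true_class=1))
# ≈ 30.0: large but finite, even though softmax(logits)[1] is on the order of 1e-13.
```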

  • @sukursukur3617 · 2 years ago

    Why don't we just use the mean of (p - q)^2 instead of p*log(p/q) to measure the dissimilarity of PDFs?

  • @jiwoni523 · 11 months ago

    Make more videos please, you are awesome.

  • @madarahuchiha1133 · 9 months ago

    What is the true class distribution?

    • @elenagolovach384 · 7 months ago

      The frequency of occurrence of a particular class depends on the characteristics of the objects.

  • @sushilkhadka8069 · 1 year ago

    This is so neat.

  • @yegounkim1840 · 1 year ago

    You the best!

  • @omkarghadge8432 · 3 years ago · +1

    Great! keep it up.

  • @shahulrahman2516 · 2 years ago

    Thank you

  • @starriet · 3 years ago

    essence, short, great.

  • @kutilkol · 2 years ago

    superb!

  • @jakobmiesner3995 · 18 days ago

    Thanks

  • @phafid · 2 years ago

    3:24. The aha moment when you realize what the purpose of the negative sign in cross entropy is.

    • @phafid · 2 years ago

      4:24. Do you know how golden that statement is?

  • @zhaobryan4441 · 2 years ago

    Hello, handsome. Could you share the slides?

  • @tanvirtanvir6435 · 1 year ago

    0:08
    3:30
    P* is the true probability distribution.

  • @genkidama7385 · 8 months ago

    distribution

  • @zingg7203 · 2 years ago

    The volume is low.

  • @ajitzote6103 · 11 months ago · +1

    Not really a great explanation; so many terms were thrown in. That's not a good way to explain something.

  • @commonsense126 · 1 year ago · +1

    Speak slower please

    • @Oliver-2103 · 1 year ago · +1

      Your name is commonsense and you still don't use your common sense, lol. In every YouTube application there is an option to slow a video down to 75%, 50%, or even 25% speed. If you have trouble understanding his language, you should just select the 0.75x speed option.

    • @commonsense126 · 1 year ago · +1

      @Oliver-2103 Visually impaired people have problems seeing some of the adjustments one can make on a phone, even when they know that they exist.