The Softmax : Data Science Basics

  • Published Dec 23, 2024

Comments • 116

  • @wennie2939 · 3 years ago +33

    I really love how you progress step by step instead of directly throwing out the formulas! The best video on YouTube on the Softmax! +1

  • @DFCinBE · 1 year ago +5

    For a non-mathematician like myself, this was crystal clear, thanks very much!

  • @birajkoirala5383 · 4 years ago +39

    Tutorials with boards now... nice one dude... underrated channel, I must say!

    • @ritvikmath · 4 years ago

      Much appreciated!

    • @MrDullBull · 3 years ago

      Agreed. Greetings from Russia!

  • @akum007 · 3 months ago +4

    I love it! The choice of examples and the way you explain ... just bravo!

    • @ritvikmath · 3 months ago

      Thank you! 😃

  • @debapriyabanerjee8486 · 3 years ago +10

    This is excellent! I saw your video on the sigmoid function and both of these explain the why behind their usage.

    • @ritvikmath · 3 years ago +2

      Glad it was helpful!

  • @suparnaprasad8187 · 3 months ago

    This channel is literally the best.

    • @ritvikmath · 3 months ago

      Thanks!

  • @omniscienceisdead8837 · 2 years ago

    The person who is going to be responsible for kick-starting my ML journey with a good head on my shoulders. Thank you Ritvik, very enlightening.

  • @marcusakiti7608 · 2 years ago +1

    Awesome stuff. I searched for this video because I was trying to figure out why the scores/sum-of-scores approach wouldn't work, and you addressed it first thing. Great job.

  • @iraklisalia9102 · 3 years ago +5

    What a great explanation! Thank you very much.
    The "why do we choose this formula versus that formula" explanation truly makes everything clear. Thank you once again :)

  • @okeuwechue9238 · 9 months ago +1

    Thanks. Very clear explanation of the rationale for employing exponential functions instead of linear functions.

    • @ritvikmath · 9 months ago

      Great to hear!

  • @ekaterinakorneeva4792 · 1 year ago

    Thank you!!! This is so much clearer and more direct than two 20-minute videos on Softmax from "Machine Learning with Python: From Linear Models to Deep Learning" from MIT! To be fair, the latter explains multiple perspectives and is also good in its own sense. But you deliver just the most important first bit of what softmax is and what all these terms are about.

  • @MORE2Clay · 3 years ago

    The introduction to softmax, which explains why softmax exists, helped me a lot in understanding it.

  • @zvithaler9443 · 2 years ago +2

    Great explanations; your addition of the story to the objects really helps in understanding the material.

  • @ManpreetKaur-ve5gw · 3 years ago

    The only video I needed to understand the SOFTMAX function. Kudos to you!!

  • @somteezle1348 · 4 years ago +2

    Wow...teaching from first principles...I love that!

    • @ritvikmath · 4 years ago

      Glad you liked it!

  • @zafarnasim9267 · 2 years ago

    Wow, really liked your teaching approach. Awesome!

  • @vamshi755 · 4 years ago

    Now I know why a lot of your videos answer the WHY question. You give importance to application, not theory alone. The concept is very clear. Thanks.

  • @johnlabarge · months ago

    This is a glorious explanation.

  • @karimamakhlouf2411 · 2 years ago

    An excellent and straightforward way of explaining. So helpful! Thanks a lot :)

  • @cobertizo · 4 years ago

    I came for the good-looking teacher but stayed for the really clear and good explanation.

  • @rizkysyahputra98 · 4 years ago +1

    Clearest explanation of softmax... thank you!

    • @ritvikmath · 4 years ago

      Glad it was helpful!

  • @MLDawn · 4 years ago +1

    Please note that the outputs of softmax are NOT probabilities but are interpreted as probabilities. This is an important distinction! The same goes for the sigmoid function. Thanks.

  • @bryany7344 · 3 years ago +2

    1:14, how is it single-dimensional for sigmoid? Shouldn't it be two dimensions?

    • @vahegizhlaryan5052 · 3 years ago

      Well, after applying sigmoid you get only one probability p (the other one you can calculate as 1 - p), so you actually only need one number in the case of sigmoid.
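
A quick numerical sketch of this reply's point, in NumPy (the library choice is mine, not from the video): sigmoid emits a single number p and the second class probability is implied as 1 - p, while softmax emits one number per class.

```python
import numpy as np

def sigmoid(x):
    # Binary case: one score in, one probability out.
    return 1.0 / (1.0 + np.exp(-x))

def softmax(s):
    # General case: one probability per score, summing to 1.
    e = np.exp(s - np.max(s))  # subtract the max for numerical stability
    return e / e.sum()

p = sigmoid(2.0)
print(p, 1.0 - p)                     # both class probabilities from ONE output
print(softmax(np.array([0.0, 2.0])))  # softmax carries one output per class
```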

  • @grzegorzchodak · 1 year ago

    Great explanation! Easy and helpful!

  • @salmans1224 · 4 years ago +1

    Awesome man... your videos make me less anxious about math.

  • @YAlsadah · 2 years ago

    What an amazing, simple explanation. Thank you!

  • @AIPeerAcademy · 1 year ago

    Thank you so much. I now understand why exp is used instead of a simple calculation. 😊

  • @maralazizi · 3 months ago

    More great content, thank you so much!

    • @ritvikmath · 3 months ago

      My pleasure!

  • @jackshaak · 3 years ago +1

    Just great! Thanks, man.

  • @masster_yoda · 10 months ago

    Great explanation, thank you!

  • @eliaslara6964 · 4 years ago +1

    Dude! I really love you.

  • @bittukumar-rv6rx · months ago

    Thanks for uploading ❤❤❤

  • @kausshikmanojkumar2855 · 1 year ago

    Absolutely beautiful.

  • @shiyuyuan7958 · 3 years ago

    Very clearly explained. Thank you; subscribed!

  • @serdarufukkara7109 · 4 years ago

    Thank you very much; you are very good at teaching, very well prepared!

  • @michael88704 · 2 years ago

    I like the hierarchy implied by the indices on the S vector ;)

  • @fatemehsefishahpar3626 · 3 years ago

    How great was this video! Thank you.

  • @seojun2599 · 1 year ago

    How do you deal with high x_i values? I got 788 and 732 as x_i values, and exp(788) gives an error because the result is near infinity.
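
For anyone hitting the same overflow: the standard fix (not covered in the video, but a direct consequence of the shift invariance discussed in it) is to subtract max(x) from every score before exponentiating. The output is unchanged, and every exponent becomes <= 0, so exp cannot overflow. A minimal NumPy sketch:

```python
import numpy as np

def stable_softmax(x):
    # softmax(x) == softmax(x - c) for any constant c; choosing c = max(x)
    # keeps every exponent <= 0, so np.exp never overflows.
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()

# Works for the scores in the comment; naive np.exp(788.0) would overflow.
print(stable_softmax(np.array([788.0, 732.0])))
```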

  • @debaratiray2482 · 2 years ago

    Awesome explanation.... thanks !!!

  • @anandiyer5361 · 2 years ago

    @ritvikmath I want to understand why you chose the subscript N to describe the features; shouldn't they be S_1..S_M?

  • @Nova-Rift · 4 years ago

    You're amazing. Great teacher!

  • @burger_kinghorn · months ago

    How does this relate to multinomial logistic regression?
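
A sketch of the standard connection (my addition, not from the video): multinomial logistic regression is exactly a linear score per class followed by the softmax, so the video's scores play the role of the linear predictors:

$$P(y = k \mid x) = \frac{e^{w_k^\top x + b_k}}{\sum_{j=1}^{K} e^{w_j^\top x + b_j}}$$

With K = 2 this collapses to ordinary binary logistic regression with the sigmoid.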

  • @diegosantosuosso806 · 1 year ago

    Thanks Professor!

  • @bhp72 · 2 months ago

    Really enjoyed that. Thanks!

  • @karimomrane7556 · 2 years ago

    I wish you were my teacher, haha. Great explanation :D Thank you so much ♥

  • @igoroliveira5463 · 3 years ago

    Could you do a video about the maxout unit? I read about it in Goodfellow's Deep Learning book, but I did not clearly grasp the intuition behind it.

  • @ManishaGupta-rj3bq · 3 months ago

    Great tutorials!

    • @ritvikmath · 3 months ago

      Glad you like them!

  • @dragolov · 2 years ago

    Bravo! + Thank you very much!

  • @oligneflix6798 · 3 years ago

    bro you're a legend

  • @yingchen8028 · 4 years ago

    more people should watch this

  • @aFancyFatFish · 3 years ago

    Thank you very much; clear and helpful to me as a beginner 😗

  • @nehathakur8221 · 4 years ago

    Thanks for such an intuitive explanation, Sir :)

  • @ridhampatoliya4680 · 4 years ago

    Very clearly explained!

  • @kausshikmanojkumar2855 · 1 year ago

    Beautiful!

  • @d_b_ · 1 year ago

    Maybe this was explained in a past video, but why is "e" chosen over any other base (like 2 or 3 or pi)...
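
A short answer (my addition, not from the video): any base b > 1 gives a valid softmax, because changing the base only rescales the scores by the constant factor ln b:

$$\frac{b^{x_i}}{\sum_j b^{x_j}} = \frac{e^{x_i \ln b}}{\sum_j e^{x_j \ln b}}$$

So the choice of e is a convention; it is preferred because $\frac{d}{dx}e^x = e^x$, which keeps the derivatives clean.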

  • @zacharydan7236 · 4 years ago

    Solid video, subscribed!

  • @evagao9701 · 4 years ago

    Hi there, what is the meaning of the square summation?

  • @mrahsanahmad · 3 years ago

    I am new to data science. Why would a model output 100, 101, and 102 unless the input had similarity to all three classes? Even in our daily lives, we would ignore a 2-dollar variance on a $100 item but complain if something that was originally free now costs 2 dollars. The question is, why would we give up the usual practice and use some fancy transformation function here?
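
The shift invariance questioned here can be verified in one line: adding a constant c to every score multiplies the numerator and denominator by the same factor e^c, which cancels:

$$\frac{e^{x_i + c}}{\sum_j e^{x_j + c}} = \frac{e^{c}\, e^{x_i}}{e^{c} \sum_j e^{x_j}} = \frac{e^{x_i}}{\sum_j e^{x_j}}$$

So (100, 101, 102) and (0, 1, 2) give identical probabilities by construction; whether that behavior is desirable is exactly the modeling question raised above.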

  • @tm0209 · 11 months ago

    What does dP_i/dS_j = -P_i * P_j mean and how did you get it? I understand dP_i/dS_i because S_i is a single variable. But dP_i/dS_j involves a whole set of variables (Sum(S_j) = S_1 + S_2 ... S_n) rather than a single one. How are you taking a derivative of that?
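
To unpack the notation: S_j is a single coordinate of the score vector, not a sum; the sum over all scores appears only in the denominator of P_i. Writing $P_i = e^{S_i} / \sum_k e^{S_k}$ and differentiating with respect to $S_j$ for $j \neq i$ (so the numerator $e^{S_i}$ is a constant):

$$\frac{\partial P_i}{\partial S_j} = e^{S_i}\,\frac{\partial}{\partial S_j}\Big(\sum_k e^{S_k}\Big)^{-1} = -\,e^{S_i}\Big(\sum_k e^{S_k}\Big)^{-2} e^{S_j} = -P_i P_j$$

Both cases combine into the usual Jacobian $\partial P_i / \partial S_j = P_i(\delta_{ij} - P_j)$.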

  • @zahra_az · 3 years ago

    That was so sweet and inspiring.

  • @azinkatiraee6684 · 1 year ago

    a clear explanation!

    • @ritvikmath · 1 year ago

      Glad you think so!

  • @sukursukur3617 · 4 years ago +1

    3:18 very good teacher

  • @dikshanegi1028 · 1 year ago

    Keep going buddy

  • @shreyasshetty6850 · 3 years ago

    Holy shit! That makes so much sense

  • @kavitmehta9143 · 4 years ago

    Awesome Brother!

  • @brendanamuh5683 · 2 years ago

    thank you so much !!

  • @jeeezsh4704 · 2 years ago

    You teach better than my grad school professor 😂

  • @ヨママ · 4 years ago

    Thank you so much! You made it very clear :)

  • @peterniederl3662 · 4 years ago

    Very helpful!!! Thx!

  • @ZimoNitrome · 3 years ago

    good video

  • @tsibulsky4900 · 1 year ago

    Thanks 👍

  • @johnginos6520 · 4 years ago

    Do you do one-on-one tutoring?

  • @jasonokoro8400 · 2 years ago

    I don't understand *why* it's weird that 0 maps to 0 or why we need the probability to be the same for a constant shift...

  • @Fat_Cat_Fly · 4 years ago +1

    👍🏻👍🏻👍🏻👍🏻👍🏻👍🏻

  • @mmm777ization · 3 years ago

    4:00 I think you have expressed it the wrong way; you meant that we need to go into depth and not just focus on the application, which is the façade. Here, that means deriving the formula.

  • @suyashdixit682 · 1 year ago +1

    Yet again an Indian dude is saving me!

  • @evgenyv5687 · 3 years ago +1

    Hey, thank you for a great video! I have a question: in your example, you said that the probabilities for 0, 1, and 2 should not be different from those for 100, 101, and 102. But in the real world, the scale used to assess students makes a difference and affects probabilities. The difference between 101 and 102 is actually smaller than between 1 and 2, because in the first case the scale is probably much smaller, so the difference between scores is more significant. So wouldn't a model need to predict different probabilities depending on the assessment scale?

    • @EW-mb1ih · 3 years ago

      Same question!

    • @imingtso6598 · 2 years ago

      My point of view is that the softmax scenario is different from the sigmoid scenario. In the sigmoid case, we need to capture changes in relative scale, because subtle changes around the 1/2 probability point result in significant probability changes (turning the whole thing around: drop out or not); whereas in the softmax case, there are more outputs and our goal is to select the case most likely to happen, so we are talking about an absolute amount rather than a relative amount (a final judgment). I guess that's why ritvik said "a change in constant shouldn't change our model".

  • @anishbabus576 · 4 years ago

    Thank you

  • @hezhu482 · 4 years ago

    thank you!

  • @markomarkus8560 · 4 years ago

    Nice video

  • @ayeddie6788 · 2 years ago

    PRETTY GOOD

  • @y0n1n1x · 2 years ago

    thanks

  • @wduandy · 4 years ago

    Amazing!

  • @yuchenzhao6411 · 4 years ago

    Very good video

  • @ltang · 3 years ago

    Oh... softmax is for multiple classes and sigmoid is for two classes.
    I get that your i here is the class. In the post below, though, is their i the observations and k the classes?
    stats.stackexchange.com/questions/233658/softmax-vs-sigmoid-function-in-logistic-classifier
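
On notation: conventions vary, but in write-ups of that comparison i typically indexes observations and k the classes. The two-class link can also be made explicit; a two-entry softmax reduces to a sigmoid of the score difference, which is why sigmoid suffices for binary problems:

$$\frac{e^{z_1}}{e^{z_1} + e^{z_2}} = \frac{1}{1 + e^{-(z_1 - z_2)}} = \sigma(z_1 - z_2)$$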

  • @joelpaddock5199 · 11 months ago

    Hello Boltzmann distribution, we meet again. Cool nickname.

  • @matgg8207 · 2 years ago

    what a shame that this dude is not a professor!!!!!!!!

  • @QiyuanSong · 1 year ago

    Why do I need to go to school?

  • @gestucvolonor5069 · 4 years ago +1

    I knew things were about to go down when he flipped the pen.

    • @mrahsanahmad · 3 years ago

      Are you crazy? The moment he did that, I knew it would be fun listening to him. He was focused. Like he said, theory is relevant only in the context of practicality.

  • @jkhhahahhdkakkdh · 4 years ago

    Very different from how *cough* Siraj *cough* explained this lol

  • @srl2017 · 2 years ago +1

    god

  • @suryatejakothakota7742 · 4 years ago

    Binod stop ads

  • @fintech1378 · 1 year ago

    From minute 11 to 12:30 you are not very clear and are going too fast.

    • @ritvikmath · 1 year ago

      Hey, thanks for the feedback, will work on it.