Softmax - What is the Temperature of an AI??

  • Published on Feb 1, 2023
  • One of the most important parameters of AI models like the one behind ChatGPT is temperature, but how can an AI have a temperature, and what is it good for? While pursuing this question, we uncover a beautiful connection between machine learning and statistical thermodynamics.
    If you want to support my channel, subscribing, liking and sharing is an amazing help. Thank you! If you feel like this video added value to your life and you want to support MarbleScience even more, please consider becoming a patron.
    / marblescience
    On patreon.com you can pledge an amount of your choosing per video that I upload. Even small contributions add up.
    Thanks for considering!
  • Science & Technology

Comments • 54

  • @mikip3242
    @mikip3242 1 year ago +18

    When I first heard about thermodynamics and statistical mechanics I thought they had to be the most boring part of physics. I mean... statistics!? Not clear-cut facts about the fundamentals of reality? Thermodynamics also seemed to be only about the thing that made steam engines possible, and perhaps made chemists happy when they saw a liquid transforming into a solid.
    But as I learned more about both topics I became amazed by the kind of worldview they get you into. I now see these as the most philosophically stimulating parts of physics. You get to understand the nature of information, energy and entropy so deeply. You see thermodynamics playing a role in every aspect of reality, in every process: from Darwinian natural selection to the stock market, from AI to card games, from crystal formation to the emergence of design in ant colonies. You start to see the Milky Way as a gas made of stars instead of atoms, with a specific temperature and processes kicking the gas out of equilibrium. You see crowds at metal concerts making phase transitions in mosh pits, and you start to see the relevance of patterns like flocks of birds or magnetization in a solid. It's crazy how low our expectations of these topics are before we get to learn about them.
    Very nice video. I hope this channel gets bigger and bigger, and in the end makes people aware of the profound consequences of Democritus' and Lucretius' idea that everything and every process is explainable by simple, mindless, interacting atoms (marbles) in the void.

    • @MarbleScience
      @MarbleScience  1 year ago +2

      Beautifully written! I couldn't agree more.

  • @onederrwoman6176
    @onederrwoman6176 14 days ago +2

    5:20 3 years into a data science degree of mindlessly using softmax, and I finally understand the intuition behind it today

  • @AtPrEd
    @AtPrEd 1 year ago +3

    What a great showcase; I understood it on the first watch. Great work and thanks for the video 👍

  • @MichaelPayPlus
    @MichaelPayPlus 11 months ago +2

    This was a VERY GOOD explanation of temperature and the visuals helped clarify the concept even more. Bringing in concepts from other domains and creating those analogies was the cherry on top!

  • @epicwin789
    @epicwin789 1 year ago +4

    This channel is seriously underrated!
    Keep up the great work :)

  • @mateuszpiechowiak2891
    @mateuszpiechowiak2891 1 month ago +1

    This is an amazing explanation. Great value. Thank you so much!

  • @Efretpkk
    @Efretpkk 1 year ago

    Super clean video and explanation! Clear, satisfying to watch, just good. Looking forward to more

  • @LF1780
    @LF1780 1 year ago

    Wow! Amazing video making surprising connections and explaining them brilliantly! Again!
    Love it!

  • @brunabuffara
    @brunabuffara 3 months ago

    Ohh, that was GREAT! And very ... straightforward lol

  • @voidzircon8438
    @voidzircon8438 10 months ago

    Thanks for an easy to understand explanation. Very helpful!

  • @ronaldb5245
    @ronaldb5245 5 months ago

    Very well explained! Thank you.

  • @bhardwajsatyam
    @bhardwajsatyam 10 months ago

    Your comparison of the Boltzmann distribution and the softmax was like flipping a switch in my brain! It was so satisfying to finally understand what the Boltzmann distribution is really saying, now that I already have a strong grasp on what softmax does.
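
    The parallel this comment points at can be written out directly. Writing z_i for the model's ratings (logits) - a notational assumption, not from the video - softmax with temperature T and the Boltzmann distribution have exactly the same form; identifying the energy as the negative rating, E_i = -z_i, and absorbing the Boltzmann constant k_B into T turns one into the other:

        p_i = \frac{e^{z_i / T}}{\sum_j e^{z_j / T}} \qquad \text{(softmax with temperature } T\text{)}

        p_i = \frac{e^{-E_i / k_B T}}{\sum_j e^{-E_j / k_B T}} \qquad \text{(Boltzmann distribution)}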

  • @kevon217
    @kevon217 5 months ago

    Excellent comparison and visuals

  • @JonathonRiddell
    @JonathonRiddell 1 year ago

    Glad to see you posting again! :) Great video

  • @the_smart_home_maker
    @the_smart_home_maker 1 year ago

    Awesome explanation - thank you!

  • @sajandumas-havenel
    @sajandumas-havenel 5 months ago

    Very good, thank you for providing correct information!

  • @suryanarayanmohapatra643
    @suryanarayanmohapatra643 11 months ago

    Keep producing such awesome content... Your channel is an absolute gem. ❤

  • @Eggile
    @Eggile 10 months ago +1

    With my temperature set to 0, it is inevitable that I praise you for this W O N D E R F U L explanation!

  • @Geogeorgefrommelbourne
    @Geogeorgefrommelbourne 1 year ago +2

    One of the best AI related videos out there

  • @yannickkohl8264
    @yannickkohl8264 1 year ago

    Still an incredibly underwatched channel. Thank you for making these videos!

  • @maj3735
    @maj3735 10 months ago

    I am surprised this channel is not famous. My friend, you are doing excellent work. Keep nerding like we all do :)

  • @esdrassantos6831
    @esdrassantos6831 1 year ago

    thanks for sharing!

  • @robocop4050
    @robocop4050 11 months ago

    I don't usually comment on videos, but man.. the visualizations made it so much easier to understand the concept. Thanks!

    • @MarbleScience
      @MarbleScience  11 months ago

      Thanks :) I appreciate it!

  • @nicolasantonio8053
    @nicolasantonio8053 6 days ago

    After all these years, finally found Wally!

  • @adampliszka4855
    @adampliszka4855 1 year ago

    Nice little video and extremely good production value. It's rare to see quality like that in smaller channels. I haven't seen your channel before, and I hope it grows!
    One note from me, although this is obviously a question of preference. I just think some people (and definitely me) might like the videos better if you got a bit more into the slightly more "advanced" stuff and some more interesting examples - not the difficult stuff of course, but not the bare basics either. Some popular channels like 3blue1brown etc. sometimes seem to get into stuff I'd actually call advanced. As an example of a more advanced but approachable example here, maybe you could have talked about simulated annealing? It's very on topic and would follow naturally, it also has a cool physical intuition behind it, and it's one of the most important optimisation algorithms to ever exist. Maybe you could show an animated graph for the probability distribution change? Cause the formula could be pretty cool for people to understand, but I don't know how accessible it could be made. And if you could actually animate a 2-dimensional (as in, a 2 parameter NN or something) case of the simulated annealing, that would be unbelievably cool tbh.
    Especially with the recent hype around AI, I think there would be quite an audience for a video on SA, and it doesn't even seem to have any pop-science videos on TH-cam yet. But this is just an example, cause same probably goes for many other topics. For example in the Monte Carlo sims vid, I don't know much about the topic, but I'd have loved to see some cool maths problem that can counterintuitively be solved in a clever way (if there are any ofc).
    This isn't meant to be criticism or anything, cause the videos are cool, but it's just my opinion on how they could be even cooler. Don't know what's the direction you want to take your channel in, or what your potential viewers like, but I figured maybe you'd find input useful? And even if not, at least it won't hurt haha

    • @MarbleScience
      @MarbleScience  1 year ago +2

      Thanks for the detailed feedback! I don't have very good insight into how most of my audience feels about the depth of the videos. Maybe I should make a poll or something.
      Simulated annealing might actually be a good topic for another video. I'll put that on my list of video ideas :)
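
      For reference, a minimal sketch of what simulated annealing looks like in code. The toy energy function, step size, and geometric cooling schedule here are illustrative assumptions, not anything from the video or the comments:

          import math
          import random

          def simulated_annealing(f, x0, t_start=5.0, t_end=0.01, cooling=0.99, step=0.5):
              """Minimize f, accepting uphill moves with Boltzmann probability exp(-dE/T)."""
              x = best_x = x0
              t = t_start
              while t > t_end:
                  x_new = x + random.uniform(-step, step)   # propose a random move
                  d_e = f(x_new) - f(x)                     # change in "energy"
                  # Downhill moves are always accepted; uphill ones with prob exp(-dE/T),
                  # so at high T the walker roams freely and at low T it settles down.
                  if d_e < 0 or random.random() < math.exp(-d_e / t):
                      x = x_new
                      if f(x) < f(best_x):
                          best_x = x
                  t *= cooling                              # cool gradually
              return best_x

          # Toy example: a rugged function whose global minimum lies near x = -0.3
          print(simulated_annealing(lambda x: x**2 + 3 * math.sin(5 * x), x0=4.0))

      The slow cooling is the whole trick: early on, the high temperature lets the walker escape local minima; as T drops, the Boltzmann acceptance probability shrinks and the walker freezes into a (hopefully global) minimum.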

  • @Moe5Tavern
    @Moe5Tavern 1 year ago

    Such a good video again with a very nice and interesting finishing thought! I hope the algorithm will pick it up for a broader audience soon. Maybe try adding hashtags to the description like "machine learning" and "artificial intelligence" to help it!

    • @MarbleScience
      @MarbleScience  1 year ago

      Thanks :) Maybe I should try that, but actually I think the description isn't that important. For the subtitles, YouTube automatically creates a complete transcript of the video. I guess there is no reason why they should rely on keywords in the description rather than looking for keywords in the transcript.

  • @fastslowboat
    @fastslowboat 1 year ago

    Great explanation as always, would love to see more AI videos!

  • @BTFranklin
    @BTFranklin 1 year ago

    This was a wonderful and clear explanation of how temperature actually works in an LLM.

  • @sheliarahman6685
    @sheliarahman6685 1 year ago

    This is awesome

  • @Geogeorgefrommelbourne
    @Geogeorgefrommelbourne 1 year ago +5

    This video is going to blow up. Guaranteed

  • @charlesmerritt8127
    @charlesmerritt8127 1 month ago

    I would love to make animations like yours as I go through my graduate program; a tutorial on that would be great.

  • @grownupgaming
    @grownupgaming 5 months ago

    After softmax is applied with a certain temperature, do we always draw a random value to actually sample the word?

  • @KarstenBreivik
    @KarstenBreivik 1 month ago

    Amazing!!! A great explanation of the softmax component of the Transformer model from the "Attention Is All You Need" paper en.wikipedia.org/wiki/Attention_Is_All_You_Need. Really helps piece the generative AI stuff together :)

  • @Darkev77
    @Darkev77 1 year ago +1

    This is such an intuitive explanation, brilliant! However, since all the ratings (logits) are divided by the same constant "T" before the softmax, the word with the highest rating always keeps the highest probability. So how will our model ever pick a word that doesn't have the highest probability; what's the mechanism?

    • @MarbleScience
      @MarbleScience  1 year ago +1

      Thanks! I don't know exactly how it is implemented, but you could draw a random number between 0 and 1 and then loop over all the probabilities, checking whether your random number is smaller than the sum of all the previous probabilities (see the sketch after this thread).

    • @Darkev77
      @Darkev77 1 year ago

      @@MarbleScience Thanks for the insight. I guess a weighted sampler also works
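
      A minimal Python sketch of both suggestions: the cumulative-sum loop from the reply above, applied to a temperature-scaled softmax. The example words and logit values are made up for illustration:

          import math
          import random

          def softmax(logits, temperature=1.0):
              """Temperature-scaled softmax: the logits (not the probabilities) are divided by T."""
              scaled = [z / temperature for z in logits]
              m = max(scaled)                        # subtract the max for numerical stability
              exps = [math.exp(z - m) for z in scaled]
              total = sum(exps)
              return [e / total for e in exps]

          def sample(probs):
              """Draw r in [0, 1) and walk the cumulative sum until it first exceeds r."""
              r = random.random()
              cumulative = 0.0
              for i, p in enumerate(probs):
                  cumulative += p
                  if r < cumulative:
                      return i
              return len(probs) - 1                  # guard against floating-point rounding

          words = ["cat", "dog", "quark"]
          probs = softmax([2.0, 1.0, 0.1], temperature=0.7)
          print(words[sample(probs)])                # usually "cat", occasionally the others

      Python's built-in random.choices(words, weights=probs) does the same thing, which is exactly the "weighted sampler" idea from the reply above.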

  • @conradsmith9441
    @conradsmith9441 10 months ago

    I just watched a video of a guy who got an AI playground to increase the temperature to 1.5... (IDK how he did it. Maybe he hacked it or changed the code on the page.)
    It was glitching HARDCORE. It made absolutely no sense and was super hilarious.

  • @protocol6
    @protocol6 1 year ago

    Wouldn't it make sense to tune the temperature to the critical point? It seems like something that could be done algorithmically. Or are there sometimes benefits to being subcritical or supercritical? Edit: If that's meaningless to you, search Wikipedia for "Ising model" and "critical exponent" as starting points.

    • @MarbleScience
      @MarbleScience  1 year ago

      I am not sure what you mean. I know what a critical point is in thermodynamics, but what would the equivalent be for an AI model?

    • @protocol6
      @protocol6 1 year ago

      @@MarbleScience Take a look at the Ising model, and maybe Quanta Magazine's recent video "Could One Physics Theory Unlock the Mysteries of the Brain?", for an overview of the idea behind critical systems in general and how it might apply to neural networks. Ulam, Conway and Wolfram did a bit of work in this space too, with so-called self-organized criticality arising from simple rulesets, but it's also possible to deliberately tune for criticality.

    • @MarbleScience
      @MarbleScience  1 year ago

      @@protocol6 Sounds interesting! I will have a look.

    • @MarbleScience
      @MarbleScience  1 year ago

      @@protocol6 Thanks a lot for sharing this video! It has stimulated a lot of thinking for me today. I watched it this morning, and initially I thought it was really interesting. Now I am not sure if it isn't rather kind of a scam - the typical behavior of scientists blowing up something trivial to make their research seem more important.
      According to Wikipedia, the mathematical definition of a critical point is "a point in the domain of the function where the function is either not differentiable or the derivative is equal to zero". With that in mind, isn't it kind of ridiculous to act surprised that a lot of optimized systems (like a brain) operate close to a critical point? Finding the critical points (points where the derivative equals zero) is literally part of the recipe everyone learns at school for finding optima.
      In the softmax language model example there is certainly a critical point at T=0: above zero, the word with the highest rating is most likely, and below zero, the word with the lowest rating becomes most likely (illustrated numerically after this comment). There might also be other critical points, for example the point where the generated text transitions from sentence structure to more or less random chains of words. However, neither of these points seems very promising when it comes to the ideal temperature setting.
      Critical points aside, there is certainly not one ideal temperature setting for all use cases.
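
      A quick numeric check of the sign-flip claim above, as a self-contained Python sketch (the rating values are made up for illustration):

          import math

          def softmax(ratings, t):
              """Softmax over ratings divided by temperature t."""
              exps = [math.exp(z / t) for z in ratings]
              s = sum(exps)
              return [e / s for e in exps]

          ratings = [3.0, 2.0, 1.0]
          print(softmax(ratings, t=0.1))   # ~[1.0, 0.0, 0.0]: the highest rating dominates
          print(softmax(ratings, t=5.0))   # ~[0.40, 0.33, 0.27]: high T flattens the distribution
          print(softmax(ratings, t=-0.1))  # ~[0.0, 0.0, 1.0]: negative T favors the lowest rating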

    • @protocol6
      @protocol6 1 year ago

      @@MarbleScience Yes, I wasn't sure if the temperature you were speaking of was really analogous, and now I think it isn't, but I'd been thinking about and looking for what would be analogous when I ran across your video. I was more interested in the point where you start getting self-similarity and avalanches (chains of triggered activations that peter out) in ANNs. It'll mean optimizing for a specific function, so identifying that function is the key. I'd expect it to be somewhat dependent on the average connectedness, and possibly on weight vs. distance. Magnetic domains get that automatically, as they are fully connected and the weight varies with the inverse square, though it'll look more like inverse cube at certain distances due to dipole separation. The temperature, when you assume a given connectedness, seems to me like it'd be some kind of tuning parameter on the activation function: perhaps skewing the sigmoid midpoint, adjusting the slope, or both. No idea how to adjust those while still being able to train the network effectively, but I'll probably be trying it in a NEAT model.

  • @christopherellis2663
    @christopherellis2663 1 year ago

    Scarce does not rhyme with farce. It rhymes with scare, fare, share, rare, snare, care, bare, ware.

  • @SuperFinGuy
    @SuperFinGuy 2 months ago

    The AI terminology is terrible. Why call the token randomness "temperature"? Why not call it randomness, stochasticity, or even noise? The word "temperature" is just a loose analogy that makes it harder to understand. Don't even get me started on "attention"; the paper should have been called "context is all you need", but I guess that would be too obvious, right? tsk tsk