Knowledge Distillation with TAs

  • Published Feb 1, 2025

Comments • 14

  • @keres993
    @keres993 5 years ago +2

    You've been talking about knowledge distillation for a while, and now I know why! This will permit deployment on hardware of arbitrary specifications in exchange for a far smaller penalty on inference accuracy. Beautiful.

    • @BinuJasim
      @BinuJasim 5 years ago +1

      Finally you got it!

    • @connor-shorten
      @connor-shorten  5 years ago +1

      Awesome! Knowledge distillation definitely seems to be one of the most interesting / promising ideas in Deep Learning right now!

  • @user-wz6fp1vl8m
    @user-wz6fp1vl8m 4 years ago

    Thank you for such a clear presentation! This is an amazing summary of the four papers and saves me a lot of time :)

  • @00DarkSoul
    @00DarkSoul 5 years ago +4

    Thank you! Awesome explanation.

  • @PeterOtt
    @PeterOtt 5 years ago +2

    This is awesome, I’m excited to try this on my Raspberry Pis

    • @connor-shorten
      @connor-shorten  5 years ago +1

      You should make a video about that if you have time! I would be very interested in seeing if this could work on Raspberry Pis!

  • @sajjadayobi688
    @sajjadayobi688 3 years ago

    Great video about knowledge distillation, now I need to go to the lab.

  • @MiottoGuilherme
    @MiottoGuilherme 4 years ago +1

    It is interesting to look at Noisy Student together with knowledge distillation. But it is important to note that Noisy Student is NOT a distillation technique. It is exactly the opposite: it leverages unlabeled data to make models larger and larger.

    • @connor-shorten
      @connor-shorten  4 years ago +1

      Thank you! In this case we are just using "distillation" to describe predicting the labeled outputs of another network, particularly with some kind of temperature smoothing in the softmax as well (a minimal code sketch follows this thread). I agree the term is misleading; self-training might be better.

    • @MiottoGuilherme
      @MiottoGuilherme 4 years ago

      @@connor-shorten True. I like the term "knowledge transfer". I see it as an umbrella term for any student-teacher dynamic, regardless of whether the student is smaller than the teacher (e.g. distillation, compression) or larger (e.g. network morphisms, Noisy Student).
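
To make the temperature-smoothed softmax mentioned in the reply above concrete, here is a minimal PyTorch sketch of a distillation loss. The tiny teacher/student networks, the function name distillation_loss, the temperature T=4.0, and the mixing weight alpha=0.5 are illustrative assumptions, not the exact recipe from the video or the papers it covers.

    # Minimal sketch of a temperature-smoothed distillation loss (illustrative assumptions only).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
        """Blend cross-entropy on hard labels with KL divergence between
        temperature-softened teacher and student distributions."""
        soft_targets = F.softmax(teacher_logits / T, dim=1)      # teacher's smoothed probabilities
        log_student = F.log_softmax(student_logits / T, dim=1)   # student's smoothed log-probabilities
        # T*T rescales the soft-term gradients back to a comparable magnitude
        soft_loss = F.kl_div(log_student, soft_targets, reduction="batchmean") * (T * T)
        hard_loss = F.cross_entropy(student_logits, labels)
        return alpha * soft_loss + (1.0 - alpha) * hard_loss

    # Toy usage: a small "student" imitating a larger "teacher" on random data.
    teacher = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 10))
    student = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 10))

    x = torch.randn(8, 32)
    y = torch.randint(0, 10, (8,))
    with torch.no_grad():
        teacher_logits = teacher(x)
    loss = distillation_loss(student(x), teacher_logits, y)
    loss.backward()

A higher temperature spreads probability mass over more classes, so the student also learns which wrong classes the teacher considers plausible rather than only the argmax label.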

  • @nguyenminhoan7882
    @nguyenminhoan7882 5 years ago +1

    Thank you!