Stanford Seminar - Information Theory of Deep Learning, Naftali Tishby

  • Published on 28 Sep 2024

Comments • 29

  • @krasserkalle
    @krasserkalle 5 years ago +127

    This is my personal summary:
    00:00:00 History of Deep Learning
    00:07:30 "Ingredients" of the Talk
    00:12:30 DNN and Information Theory
    00:19:00 Information Plane Theorem
    00:23:00 First Information Plane Visualization
    00:29:00 Mention of Critics of the Method
    00:32:00 Rethinking Learning Theory
    00:37:00 "Instead of Quantizing the Hypothesis Class, let's Quantize the Input!"
    00:43:00 The Information Bottleneck
    00:47:30 Second Information Plane Visualization
    00:50:00 Graphs for Mean and Variance of the Gradient
    00:55:00 Second Mention of Critics of the Method
    01:00:00 The Benefit of Hidden Layers
    01:05:00 Separation of Labels by Layers (Visualization)
    01:09:00 Summary of the Talk
    01:12:30 Question about Optimization and Mutual Information
    01:16:30 Question about Information Plane Theorem
    01:19:30 Question about Number of Hidden Layers
    01:22:00 Question about Mini-Batches

    • @clusteralgebra
      @clusteralgebra 5 years ago

      Thank you!

    • @zhechengxu121
      @zhechengxu121 4 years ago

      Bless your soul

    • @willjennings7191
      @willjennings7191 4 years ago +1

      I have used your personal summary as a template for a section of my personal notes.
      Thank you very much!

  • @paritoshkulkarni6354
    @paritoshkulkarni6354 2 years ago +11

    RIP Naftali!

  • @FlyingOctopus0
    @FlyingOctopus0 6 years ago +13

    I wonder if, based on this, we can create better training algorithms. For example, the effectiveness of dropout may have a connection to this theory. Dropout may introduce more randomness in the "diffusion" stage of training.

  • @alexkai3727
    @alexkai3727 4 years ago +6

    I read another paper, "On the Information Bottleneck Theory of Deep Learning", by Harvard researchers, published in 2018, and they hold a very different view. It seems it's still unclear how neural networks work.

    • @Checkedbox
      @Checkedbox 3 years ago +2

      Is that the one he mentions at ~29:00?

  • @phaZZi6461
    @phaZZi6461 5 years ago +2

    1:22:31 - thesis statement about how to choose mini batch size

  • @jaimeziratearzate
    @jaimeziratearzate 10 months ago

    Does anybody know how to show that the Gibbs distribution converges to the optimal IB bound?
    And what is the epsilon cover of a hypothesis class?
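
For readers with the same two questions, here are the standard textbook forms of the objects being asked about, written out as a sketch. The notation (a metric d on the hypothesis class H, a compressed representation T, a trade-off parameter β) is assumed here and is not taken from the talk's slides. The block only states the definitions and the Gibbs-form self-consistent equations of the Information Bottleneck; it does not prove the convergence claim.

```latex
% epsilon-cover of a hypothesis class H under a metric d (standard definition)
H_\epsilon \subseteq H,\quad |H_\epsilon| < \infty,\quad
\forall h \in H\ \exists h' \in H_\epsilon :\ d(h, h') \le \epsilon

% Information Bottleneck objective over stochastic encoders p(t|x), trade-off \beta > 0
\min_{p(t \mid x)}\ I(X;T) - \beta\, I(T;Y)

% Self-consistent (Gibbs-form) solution of the objective above
p(t \mid x) = \frac{p(t)}{Z(x,\beta)}
              \exp\!\big(-\beta\, D_{\mathrm{KL}}[\,p(y \mid x)\,\|\,p(y \mid t)\,]\big)
p(t) = \sum_x p(x)\, p(t \mid x)
p(y \mid t) = \frac{1}{p(t)} \sum_x p(y \mid x)\, p(t \mid x)\, p(x)
```

Here Z(x, β) is the normalizer that makes p(t|x) sum to one over t.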

  • @paulcurry8383
    @paulcurry8383 3 years ago +1

    Anybody know what a “pattern” is in information theory?

  • @applecom1de509
    @applecom1de509 6 years ago +2

    Aah this is so relaxing.. Thank you!

  • @nickybutton2736
    @nickybutton2736 3 years ago

    Amazing talk, thank you!

  • @zessazzenessa1345
    @zessazzenessa1345 6 years ago +7

    "Learn to ignore irrelevant labels" yes intriguing..........

  • @amirmn7
    @amirmn7 5 years ago +16

    Can he use deep learning to fix the audio problems of this video?

    • @DheerajAeshdj
      @DheerajAeshdj 3 years ago +2

      probably not because there are none

    • @AZTECMAN
      @AZTECMAN 2 years ago

      Seems like this was asked in jest, but it's actually a good question.

  • @AlexCohnAtNetvision
    @AlexCohnAtNetvision 3 years ago +6

    such a loss… blessed be his memory

  • @hanchisun6164
    @hanchisun6164 2 years ago +1

    This theory looks correct!
    When neural networks became popular, everybody in the scientific computing community eagerly wanted to describe them in their own language. Many achieved limited success. I think the information-theoretic account makes the most sense, because it finds simplicity of information in the complexity of data. It is like how humans think. We create abstract symbols that capture the essence of nature and conduct logical reasoning, which means that the number of degrees of freedom behind the world should be small, since it is structured.
    Why did the ML community and industry not adopt this explanation?

  • @minhtoannguyen1862
    @minhtoannguyen1862 2 years ago

    44:25

  • @absolute___zero
    @absolute___zero 4 years ago

    Oooo! So it is SGD? If I hadn't listened to the Q&A session, I wouldn't have understood it at all. Now I do. Well, with second-order algorithms (like Levenberg-Marquardt) you won't need all these balls floating around to understand what's going on with your neurons. Gradient descent is the poor man's gold.

  • @binyuwang6563
    @binyuwang6563 6 years ago +5

    If the theories are true, maybe we can compute the weights directly without iteratively learning them via gradient descent.

    • @zessazzenessa1345
      @zessazzenessa1345 6 years ago

      Binyu Wang oh

    • @prem4708
      @prem4708 5 years ago +13

      How so?

    • @Daniel-ih4zh
      @Daniel-ih4zh 2 years ago

      I've been thinking about this a lot too. The weights are partly a function of the data, of course, and we also have things like the good regulator theorem that kinda point towards it. Also, a latent code and the learned parameters aren't distinguished in Bayesian model selection.

  • @alexanderkurz2409
    @alexanderkurz2409 8 months ago

    11:30 "information measures are invariant to computational complexity"
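
One way to read the quoted remark: mutual information depends only on the joint distribution, so an invertible re-encoding of X leaves it unchanged no matter how hard the encoding is to compute, while a lossy re-encoding can only keep or lose information. A minimal sketch of that reading, assuming a small made-up joint table (the numbers and the helper name `mutual_information` are illustrative, not from the talk):

```python
import math

def mutual_information(joint):
    """Return I(X;Y) in bits for a joint probability table joint[x][y]."""
    px = [sum(row) for row in joint]            # marginal p(x)
    py = [sum(col) for col in zip(*joint)]      # marginal p(y)
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:                           # 0 * log 0 is taken as 0
                mi += p * math.log2(p / (px[i] * py[j]))
    return mi

# Hypothetical joint distribution p(x, y): 3 values of X, 2 values of Y.
joint = [
    [0.30, 0.05],
    [0.10, 0.20],
    [0.05, 0.30],
]

# An invertible re-encoding of X is only a relabeling of the rows.
relabeled = [joint[2], joint[0], joint[1]]

# A lossy re-encoding: merge the first two values of X into one symbol.
merged = [
    [joint[0][0] + joint[1][0], joint[0][1] + joint[1][1]],
    joint[2],
]

mi_original = mutual_information(joint)
mi_relabeled = mutual_information(relabeled)
mi_merged = mutual_information(merged)

print(f"original:  {mi_original:.4f} bits")
print(f"relabeled: {mi_relabeled:.4f} bits")
print(f"merged:    {mi_merged:.4f} bits")
```

The relabeled table gives the same value as the original up to floating-point rounding, and the merged table gives a value that is no larger, as the data processing inequality requires.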

  • @dexterdev
    @dexterdev 3 years ago

    23:04

  • @julianbuchel1087
    @julianbuchel1087 6 years ago +2

    When was this talk given? Has he published his paper yet? I found nothing online so far, but maybe I just didn't see it.

    • @Chr0nalis
      @Chr0nalis 5 years ago +16

      1) Deep Learning and the Information Bottleneck Principle, 2) Opening the Black Box of Deep Neural Networks via Information