Lesson 5: Practical Deep Learning for Coders 2022

  • Published 24 Jul 2024
  • 00:00:00 - Introduction
    00:01:59 - Linear model and neural net from scratch
    00:07:30 - Cleaning the data
    00:26:46 - Setting up a linear model
    00:38:48 - Creating functions
    00:39:39 - Doing a gradient descent step
    00:42:15 - Training the linear model
    00:46:05 - Measuring accuracy
    00:48:10 - Using sigmoid
    00:56:09 - Submitting to Kaggle
    00:58:25 - Using matrix product
    01:03:31 - A neural network
    01:09:20 - Deep learning
    01:12:10 - Linear model final thoughts
    01:15:30 - Why you should use a framework
    01:16:33 - Prep the data
    01:19:38 - Train the model
    01:21:34 - Submit to Kaggle
    01:23:22 - Ensembling
    01:25:08 - Framework final thoughts
    01:26:44 - How random forests really work
    01:28:57 - Data preprocessing
    01:30:56 - Binary splits
    01:41:34 - Final Roundup
    Timestamps thanks to RogerS49 on forums.fast.ai.
    Transcript thanks to azaidi06, fmussari, wyquek, heylara on forums.fast.ai.

Comments • 47

  • @mattambrogi8004 2 months ago +3

    Great lesson! I found myself a bit confused by the predictions and loss.backward() at ~37:00. I did some digging to clear up my confusion, which might be helpful for others:
    - At ~37:00, when we're creating the predictions, Jeremy says we're going to add up (each independent variable * coef) over the columns. There's nothing wrong with how he said this, it just didn't click for my brain: we're creating a prediction for each row by adding up each of the indep_vars*coeffs (see the sketch after this comment). So at the end we have a predictions vector with the same number of predictions as we have rows of data.
    - This is what we then calculate the loss on. Then using the loss, we do gradient descent to see how much changing each coef could have changed the loss (backprop). Then we go and apply those changes to update the coefs, and that's one epoch.
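
    A minimal sketch of that prediction step (the sizes and variable names here are stand-ins, assumed to match the lesson notebook):

    import torch

    t_indep = torch.rand(891, 12)   # hypothetical: one row per passenger, one column per variable
    coeffs = torch.rand(12) - 0.5   # one coefficient per column

    # multiply each column by its coefficient, then sum over the columns:
    preds = (t_indep * coeffs).sum(axis=1)
    print(preds.shape)              # torch.Size([891]) -- one prediction per row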

  • @GilesThomas 4 months ago +7

    Awesome as always! Worth noting that the added columns ("Sex_male", "Sex_female", etc.) are now bools rather than ints, so you need to explicitly coerce df[indep_cols] at around @25.42 -- t_indep = tensor(df[indep_cols].astype(float).values, dtype=torch.float)

  • @goutamgarai9624 2 years ago +11

    Thanks Jeremy for this great tutorial.

  • @minkijung3 11 months ago +1

    Thank you so much for this lecture, Jeremy🙏

  • @DevashishJose 1 year ago

    Thank you for this lecture, Jeremy.

  • @tumadrep00 1 year ago +4

    What a great lesson given by the one and only Mr. Random Forests

    • @dhruvnigam7488 5 months ago

      why is he Mr. Random Forests?

  • @zzznavarrete 11 months ago +2

    Amazing as always Jeremy

    • @howardjeremyp 10 months ago +2

      Glad you think so!

  • @user-xn1ly6xt8o 3 months ago

    it's awesome! thanks a lot

  • @senditco 4 months ago

    I might be cheating a lil because I've already done a deep learning subject at Uni, but this course so far is fantastic. It's really helping me flesh out what I didn't fully understand before.

  • @blenderpanzi 8 months ago +4

    1:02:38 Does trn_dep[:, None] do the same as trn_dep.reshape(-1, 1)? For me reshape seems a tiny bit less cryptic (though the -1 is still cryptic).
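
    A quick check (a sketch, not from the lesson) suggests the two are indeed equivalent here:

    import torch

    t = torch.rand(5)
    a = t[:, None]        # add a trailing dimension: shape (5, 1)
    b = t.reshape(-1, 1)  # -1 means "infer this dimension from the rest"
    print(a.shape, b.shape, torch.equal(a, b))  # torch.Size([5, 1]) torch.Size([5, 1]) True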

  • @tegsvid8677 1 year ago +1

    Why do we divide layer1 by n_hidden?

  • @garfieldnate 1 year ago +7

    It drives me absolutely batty to do matrix work in Python because it's so difficult to get the dimension stuff right. I always end up adding asserts and tests everywhere, which is sort of fine but I would rather not need them. I really want to have dependent types, meaning that the tensor dimensions would be part of the type checker and invalid operations would fail at compile time instead of run time. Then you could add smart completion, etc. to help get everything right quickly.

    • @howardjeremyp 1 year ago +10

      You might be interested in hasktorch, which does exactly that!

    • @garfieldnate 1 year ago

      @@howardjeremyp Hey that's pretty neat! Wish it worked in Python, though :D

    • @c.c.s.1102 1 year ago

      What helped me was reading the PyTorch source code with the `??` operator and thinking about the operations in terms of linear algebra. It's hard to keep all of the ranks in mind. At the end of the day I just have to keep hacking through the errors.

  • @michaelphines1 1 year ago +7

    If the gradients are updated inline, don't we have to reset the gradients after each epoch?

    • @LordMichaelRahl 1 year ago +1

      Just to note, this has been fixed in the actual downloadable "Linear model and neural net from scratch" notebook.
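
      For reference, a sketch of what the fixed epoch might look like (data and shapes here are stand-ins, not the notebook's):

      import torch

      trn_indep = torch.rand(891, 12)
      trn_dep = torch.rand(891)
      coeffs = (torch.rand(12) - 0.5).requires_grad_()

      def one_epoch(coeffs, lr=0.1):
          preds = (trn_indep * coeffs).sum(axis=1)
          loss = torch.abs(preds - trn_dep).mean()   # mean absolute error
          loss.backward()
          with torch.no_grad():                      # don't track the update itself
              coeffs.sub_(coeffs.grad * lr)          # in-place gradient descent step
              coeffs.grad.zero_()                    # reset, or gradients accumulate across epochs

      one_epoch(coeffs)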

  • @blenderpanzi 8 months ago

    What does Categorify do? I looked it up and didn't understand. Is it converting names ("male", "female") to numbers (1, 2) or something?

  • @ansopa 11 months ago +1

    coeffs = torch.rand(n_coeff) - 0.5
    What is the use of subtracting 0.5 from the coefficients?
    Is there a problem if the values are just between 0 and 1?
    Thanks a lot.

    • @m-gopichand 8 months ago

      torch.rand() generates random numbers in the range 0 to 1. Subtracting 0.5 from the random coefficients is a simple technique to center the values around zero, which I believe helps gradient descent.

    • @emirkanmaz6059 8 months ago

      Shifting the range to [-0.5, 0.5) lets the coefficients take both positive and negative values. There are different strategies; you can google "weight initialization strategy". Libraries do this automatically for ReLU or tanh, etc.
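
      A quick illustration of the shift (a sketch):

      import torch

      raw = torch.rand(5)   # uniform in [0, 1): always non-negative
      centered = raw - 0.5  # uniform in [-0.5, 0.5): mixes negative and positive values
      print(raw.min().item() >= 0, centered.min().item(), centered.max().item())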

  • @rizakhan2938 9 months ago

    54:43 Looks like it's tough to control here. Good question.

  • @mustafaaytugkaya3020 1 year ago +1

    @howardjeremyp I haven't examined the best-performing Gender Surname Model for the Titanic dataset in detail, but something seems rather strange to me. Doesn't using the survival status of other family members constitute a data leak? After all, at the time of inference, which is before the Titanic incident, I would not have this information.

    • @twisted_cpp 1 year ago +2

      Depends on how you look at it. If you're trying to predict whether a person survived or not, and you already have a list of confirmed survivors and casualties, then it's probably a good way to make the prediction: if Mrs X has died, then it's safe to assume that Mr X has died as well. Or if their children have died, then it's safe to assume that both parents are dead, considering that women and children boarded the lifeboats first.

  • @420_gunna 8 months ago +1

    Does anyone have an alternate way of explaining what he's trying to get across at 1:05:00?

    • @mdidactics 5 months ago

      If the coefficients are too large or too small, they create gradients that are either too steep or too gentle. When the gradient is too gentle, a small horizontal step won't take you down very far, and gradient descent will take a long time. If the gradient is too steep, a small horizontal step will correspond to a big vertical drop and a big swoop up the other side of the valley, so you might even end up further from the minimum. What you want is something in between.
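
      A toy illustration of that step-size effect (not from the lesson), doing gradient descent on f(x) = x**2:

      def descend(lr, x=10.0, steps=5):
          for _ in range(steps):
              grad = 2 * x    # f'(x) = 2x
              x -= lr * grad
          return x

      print(descend(lr=0.01))  # ~9.0: steps too gentle, barely moves toward the minimum at 0
      print(descend(lr=0.1))   # ~3.3: converging
      print(descend(lr=1.1))   # ~-24.9: overshoots the valley and diverges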

  • @thomasdeniffel2122 2 months ago

    Adding a dimension at th-cam.com/video/_rXzeWq4C6w/w-d-xo.html is very important: otherwise the minus in the loss function does incorrect broadcasting, leading to a model that achieves at most 0.55 accuracy. The error is silent, as the mean in the loss function hides it.
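
    A sketch of that silent failure mode (shapes here are hypothetical):

    import torch

    preds = torch.rand(5)     # shape (5,)
    targs = torch.rand(5, 1)  # shape (5, 1) -- dependent variable as a column
    diff = preds - targs      # broadcasts to (5, 5) instead of elementwise (5,)!
    print(diff.shape)         # torch.Size([5, 5])
    loss = diff.abs().mean()  # still a single number, so the wrong shape goes unnoticed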

  • @blenderpanzi 8 months ago +4

    Semi off topic: What I really dislike about Python is the lack of types (or that type hints are optional). It really makes it difficult to understand things when you're learning complicated new stuff like this. Is that argument a float or a tensor? What is the shape of the tensor? If that were part of the type of the function argument, it would make the code much easier to read when learning this stuff.

    • @Kevin-mw6kc 7 months ago +2

      If Python had a strong type system, it would be misaligned with its purpose.

    • @noahchristie5267 5 months ago

      You can enforce stricter typing and type restrictions with external tools and scripts.

  • @alyaaka82 4 months ago

    I was wondering why you replaced the NaNs in the data frame with the mode rather than the mean?

    • @user-ic9oi8qo3g 1 month ago

      What would be the mean of names?

  • @navrajsharma2425 6 months ago

    27:18 Why don't we have a constant in our model? How can we know that there's not going to be a constant in the equation? Can someone explain this to me?

    • @Deco354 5 months ago

      I think the dummy variables effectively act as a constant because they’re either 1 or 0
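
      A small check of that idea (a sketch): because the two dummy columns sum to 1 in every row, adding the same value to both coefficients acts exactly like a constant term.

      import torch

      sex_male = torch.tensor([1., 0., 1., 0.])
      sex_female = 1 - sex_male  # the two dummies always sum to 1 per row
      c = 0.7
      print(torch.allclose(c * sex_male + c * sex_female, torch.full((4,), c)))  # True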

  • @iceiceisaac 5 months ago

    I get different results after the matrix section.

  • @jimshtepa5423 1 year ago +1

    why use sigmoid and not just round the absolute values of predictions to either 1 or 0?

    • @hausdorffspace 9 months ago +1

      Clipping the values might not be too bad if they are mostly in the range of 0 to 1, but if they were evenly spread out between 0 and 1000 (for example) then most of the values would get clipped to 1. You would have to scale them down first, and that means you would need to know how much to scale them by. With the sigmoid, it doesn't matter how big the numbers get, they will be squashed to the range of 0 to 1. Also, the sigmoid is differentiable, which makes it easy to calculate a gradient.
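
      A quick demonstration (a sketch): however large the raw outputs get, sigmoid squashes them into (0, 1).

      import torch

      x = torch.tensor([-1000., -2., 0., 2., 1000.])
      print(torch.sigmoid(x))  # tensor([0.0000, 0.1192, 0.5000, 0.8808, 1.0000])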

  • @VolodymyrBilyachat 3 months ago

    Instead of splitting code into cells, I like to run the notebook in VS Code so I can debug as normal.

  • @hausdorffspace 9 months ago +1

    This is going to sound very pedantic, but you use the word "rank" where I think "order" would be more correct. Rank usually means the number of independent columns in a matrix. At about 1:02:00, you say that the coefficients vector is a rank 2 matrix, but I would say its rank is 1 and its order is 2.

  • @thomasdeniffel2122 2 months ago +1

    In `one_epoch` at 44.09, there is a `coeffs.grad.zero_()` missing :-)

  • @anthonypercy1770 6 months ago

    Simply brilliant workshop... I had to change/add dtype=float, e.g. pd.get_dummies(tst_df, columns=["Sex","Pclass","Embarked"], dtype=float), to get it to work -- maybe due to a later version of pandas?

  • @leee5487 5 months ago

    torch.rand(n_coeff, n_hidden)
    How does one set of coeffs produce 20 (n_hidden) values? I mean, mathematically, a single set of coefficients multiplied by a specific set of values will always equal the same thing, right?

    • @bobuilder4444 4 months ago

      I'm assuming you are in the section about neural networks (before deep learning). The term n_hidden is a bad variable name: it's only 1 hidden layer, but that hidden layer is a linear combination of n_hidden ReLUs.
      Each of the ReLUs has its own coefficients to learn, which we store in a matrix of size n_coeff by n_hidden (see the sketch below).
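
      A sketch of the shapes involved (variable names assumed from the notebook, sizes made up):

      import torch

      n_coeff, n_hidden = 12, 20
      layer1 = (torch.rand(n_coeff, n_hidden) - 0.5) / n_hidden  # one column of coefficients per hidden ReLU
      layer2 = torch.rand(n_hidden, 1) - 0.3                     # combines the 20 ReLU outputs into one prediction

      x = torch.rand(891, n_coeff)            # hypothetical batch of rows
      hidden = torch.relu(x @ layer1)         # (891, 20): 20 different linear combinations per row
      preds = torch.sigmoid(hidden @ layer2)  # (891, 1): one prediction per row
      print(hidden.shape, preds.shape)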