Andrej has the most unique way of explaining things, and they generally come out more intuitive. What a maniac!
Yeah, completely agree, he is just excellent!
*My takeaways:*
*1. Why image classification is hard **3:00*
*2. Data-driven approach is more scalable than hand-crafted approach **6:48*
*3. Nearest Neighbor classifier **9:11**: remember all training images, compare the test image with every single one of them, and transfer the label from the most similar training image to the test image*
3.1 How to measure similarity? 10:51: L1 distance (Manhattan distance)
3.2 Classification speed depends on the size of the training data 12:50
3.3 Speed up classification 14:06
3.4 How to measure similarity? 14:23: L2 distance (Euclidean distance)
3.5 The choice of distance is a hyperparameter 15:00
*4. k-Nearest Neighbor **15:17**: similar to the Nearest Neighbor classifier, but instead of finding the single most similar training image, kNN finds the k most similar images and has them vote on the label (a minimal sketch follows this list)*
4.1 The distance metric and k are both hyperparameters 18:33; we have to try different values to see what works best
4.2 Training/validation/test data, and cross-validation 18:56
4.3 kNN is never used on images in practice 24:00
4.4 Summary 25:20
*5. Linear classification **25:59*
5.1 Parametric approach 29:47
5.2 Linear classifier 31:21
5.3 Interpreting a linear classifier 40:01
5.4 What would be a very hard test dataset for a linear classifier? 47:33
5.5 The need for a loss function 52:25
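To make items 3 and 4 concrete, here is a minimal numpy sketch of the Nearest Neighbor / kNN idea described above. The function name and data layout (one flattened image per row of X_train) are my own assumptions rather than the course's starter code; with k=1 it reduces to the plain Nearest Neighbor classifier.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=1, distance="L1"):
    """Label each test image by a majority vote of its k nearest training images."""
    preds = np.zeros(len(X_test), dtype=y_train.dtype)
    for i, x in enumerate(X_test):
        if distance == "L1":                       # Manhattan distance (10:51)
            dists = np.sum(np.abs(X_train - x), axis=1)
        else:                                      # L2, Euclidean distance (14:23)
            dists = np.sqrt(np.sum((X_train - x) ** 2, axis=1))
        nearest = np.argsort(dists)[:k]            # indices of the k closest training images
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        preds[i] = labels[np.argmax(counts)]       # majority vote; ties go to the smallest label
    return preds
```

Note how "training" is just memorizing X_train and y_train, while all the work happens at test time, which is exactly the speed problem mentioned in item 3.2.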
some questions students asked were exceptionally good...
32:25 Probably the first coherent explanation I've heard about what "bias" actually represents.
Thank god for making cats, they seem to be essential to understanding ML
I was around 12 when I used to watch your speedcubing videos and got into sub-20s. I am now 20 and watching your CNN videos. Your explanation style, like before, is extremely lucid.
Time flies so fast! Welcome back to YouTube, Mr. Badmephisto.
+World Music Cafe Is this really Badmephisto?? I never connected the dots, but I used to watch him and knew that he was going to Stanford
Yup, he is Badmephisto. I knew early on that his name was Andrej, but I never knew he would become this famous. He is smart, after all.
Wow, I used to watch his speedcubing videos too and now I am here watching his CNN videos.
Haha, this thread is so cool.
Me too... He was the OG cubing YouTuber.
What happened there at 15:00 is called "Dropout", and it's also a regularization technique like L1 and L2.
After watching so many MOOCs, this is the first class I've seen where students ask so many questions. Just an observation.
really like the part of interpreting simple models in various ways :) looking forward for more such intuitive interpretations for more complicated networks
Terrific explanation. One hell of a faculty
I know! I want to study there so bad!
You're a very good lecturer, keep it up. I'll be looking for more online courses from you, buddy.
At 22:22, a student asks if the reason larger K values lead to poorer accuracy is high bias (i.e., underfitting), and the instructor says he can't say that for sure. But it seems to me the student was correct: too high a K value leads to underfitting, at least on that particular dataset. (Conversely, too low a K value leads to overfitting.) Anyone agree? (A cross-validation sketch follows this thread.)
Andrej actually said "I would be careful with that terminology."
@@GohOnLeeds But he concluded with "It's basically hard to say." 22:41
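Whatever terminology one prefers, the lecture's practical answer (items 4.1 and 4.2 in the outline above) is to choose k empirically on validation data. A hedged sketch of that sweep, reusing the hypothetical knn_predict defined in the sketch near the top of the comments; the candidate k values and fold count are arbitrary choices:

```python
import numpy as np

def choose_k(X, y, candidate_ks=(1, 3, 5, 7, 10, 20, 50, 100), num_folds=5):
    """Pick k by cross-validation: small k tends to overfit, large k to underfit."""
    X_folds, y_folds = np.array_split(X, num_folds), np.array_split(y, num_folds)
    best_k, best_acc = None, -1.0
    for k in candidate_ks:
        accs = []
        for i in range(num_folds):
            X_val, y_val = X_folds[i], y_folds[i]                    # held-out fold
            X_tr = np.concatenate(X_folds[:i] + X_folds[i + 1:])     # remaining folds
            y_tr = np.concatenate(y_folds[:i] + y_folds[i + 1:])
            preds = knn_predict(X_tr, y_tr, X_val, k=k)              # from the earlier sketch
            accs.append(np.mean(preds == y_val))
        if np.mean(accs) > best_acc:
            best_k, best_acc = k, float(np.mean(accs))
    return best_k, best_acc
```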
Great explanation! Wish my prof was as good at explaining as you are!
Thanks a lot; it would be great if there could be a MOOC for this course.
It would be very helpful if all the links shown in class were in the description below.
What would be a good approach for grayscale images? For example, a radiograph, looking for a known defect?
At 33:33, why does W equal (3, 4)? I know that we have 3 classes, but there is just one example, whose output must be (1, 1). So what if I have 200 examples, 20 classes, and 1000 nx; should W be (20, 200, 1000)?
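On the shape question: in the lecture's setup, W has one row per class and one column per input dimension, and it does not grow with the number of examples; extra examples just become extra columns of the data matrix. A minimal numpy sketch using the asker's hypothetical numbers (20 classes, 1000 input dimensions, 200 examples), so W would be (20, 1000), not (20, 200, 1000):

```python
import numpy as np

num_classes, num_features, num_examples = 20, 1000, 200

W = np.random.randn(num_classes, num_features)    # (20, 1000): one row of weights per class
b = np.random.randn(num_classes, 1)               # (20, 1): one bias per class
X = np.random.randn(num_features, num_examples)   # (1000, 200): one column per example

scores = W @ X + b                                # (20, 200): a score per class, per example
assert scores.shape == (num_classes, num_examples)
```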
Lecture starts at 3:03
What happened at 51:13?
Thanks for uploading the videos, great work. Please keep it up.
Why would translating the object's position in the image not have a large effect on the linear classifier?
AFAIK, it does have a huge effect on the performance of the linear classifier.
To clarify, he is talking about the situation where a dog in the center is one class and a dog in the corner is a different class. Then it is not a problem for the linear classifier: it can update its weights to distinguish these classes, which differ only by position (the image content can be the same, but the position defines the class).
I do not understand the images that would be hard to classify with a linear classifier, and also the interpretation of a linear classifier in a high-dimensional space.
How does the classifier draw boundaries around clusters of 'dog' images when they are projected into the high-dimensional space?
I imagined the images as 3x1-pixel images projected into 3D space. Multiple images can be plotted here, and let's say the images with high brightness in pixel 2 get clustered along the y axis with different x and z values; now a 2D plane (xz) can be drawn separating the area above and below these brightness clumps, which can be treated as a separation boundary. Is this interpretation correct?
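One way to sanity-check that picture is a toy 3-pixel example: each class score is a linear function of the three pixel values, and the boundary between any two classes is the plane where their scores tie. The weights and pixel values below are made up purely for illustration:

```python
import numpy as np

# Toy 3-pixel images live in R^3; each class gets a weight vector and a bias.
W = np.array([[ 0.2, -1.0,  0.5],    # hypothetical "dog" weights
              [-0.3,  0.8,  0.1]])   # hypothetical "cat" weights
b = np.array([0.1, -0.2])

x = np.array([0.9, 0.4, 0.7])        # one 3-pixel image
scores = W @ x + b                   # one linear score per class

# The dog-vs-cat decision boundary is the set of images where the two scores are equal:
#   (W[0] - W[1]) . x + (b[0] - b[1]) = 0
# which is a plane in 3D pixel space with normal vector W[0] - W[1]; images on one side
# score higher for "dog", on the other side higher for "cat".
normal, offset = W[0] - W[1], b[0] - b[1]
print(scores, normal, offset)
```

So the boundary is always a flat hyperplane through pixel space rather than a surface wrapped around a cluster, which is what makes the datasets discussed at 47:33 hard for a linear classifier.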
Q: What does the linear classifier do, in English?
A: Create a dart board with N high-score areas. When you throw a new dart, it will tell you your dart is an airplane.
Amazing Course ! Thank you!
I summarized his awesome lecture: we tried to understand the linear function in linear classification, f(x) = ax + b (more detail at: en.wikipedia.org/wiki/Linear_function).
Linear classification is the starting point toward convolutional neural networks (CNNs). In my opinion, linear classification in one word is "trend"; in a fuller explanation, it is a railway line that gives every object near or on it a label, the name of the object.
47:40 Same colors but different objects, like a ship and a car of the same color, or sky and ocean.
Also quadratic or circular class distinctions (see the sketch below).
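A tiny, made-up example of such a "circular" dataset that no single linear boundary can separate in the raw input space (one class inside a circle, the other outside):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 2))                 # points in the plane
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 0.5).astype(int)   # label 1 inside the circle, 0 outside

# Any single line w . x + b = 0 splits the plane into two half-planes, but the class
# regions here are concentric, so a large fraction of points is always misclassified.
```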
How is the L2 distance the same between the original and distorted images? Wouldn't the distortion lead to a non-zero difference in the pixel values, making them different?
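If I remember the slide correctly, the distances are not zero; the point is that several visually different distortions can be constructed to sit at exactly the same L2 distance from the original, so the distance says little about perceptual similarity. A made-up illustration of that construction (not the slide's actual images):

```python
import numpy as np

rng = np.random.default_rng(0)
original = rng.random((32, 32))

shifted = np.roll(original, 1, axis=1)                  # distortion 1: shift every pixel one column
d_shift = np.sqrt(np.sum((original - shifted) ** 2))    # its (non-zero) L2 distance to the original

noise = rng.standard_normal((32, 32))                   # distortion 2: random noise...
noisy = original + noise * (d_shift / np.sqrt(np.sum(noise ** 2)))  # ...scaled to match that distance
d_noisy = np.sqrt(np.sum((original - noisy) ** 2))

print(d_shift, d_noisy)   # equal (up to floating point), even though the images look very different
```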
Hello, I'm here to see for myself how fast he talks, after reading the comments on the Deep RL Bootcamp 3rd lecture. Wow, it was true.
One of the questions was: would it be a problem if the image were at the centre vs. at the right? I guess the answer is no, i.e. the linear classifier would handle that, but I'm not sure that's correct.
To clarify, he is talking about the situation where a dog in the center is one class and a dog in the corner is a different class. Then it is not a problem for the linear classifier: it can update its weights to distinguish these classes, which differ only by position (the image content can be the same, but the position defines the class).
Great stuff. Just FYI in case you didn't notice: the video resolution is a bit low (360p). The video for the first lecture was available in up to 1080p which is sometimes helpful for reading small details. Thanks a lot anyway.
+Daniel de Freitas Adiwardana The original video is recorded at 1080p. It takes TH-cam a few hours to fully process it (which is why it starts at 360p)
In a linear classifier, what do the images in the training set do if we select the W matrix randomly?
How are templates for classes made?
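On templates: the lecture's interpretation is that each row of the learned W can be reshaped back to image dimensions and viewed as a template for its class (the inner product with a row is largest for images that resemble it). A hedged sketch assuming CIFAR-10-sized 32x32x3 images and a random stand-in W; a trained W would show the blurry per-class templates from the slides:

```python
import numpy as np
import matplotlib.pyplot as plt

num_classes, D = 10, 32 * 32 * 3
W = np.random.randn(num_classes, D)     # stand-in for a learned (10, 3072) weight matrix

for c in range(num_classes):
    template = W[c].reshape(32, 32, 3)                                           # back to image shape
    template = (template - template.min()) / (template.max() - template.min())   # rescale to [0, 1]
    plt.subplot(1, num_classes, c + 1)
    plt.imshow(template)
    plt.axis("off")
plt.show()
```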
Nice to have videos this time, I'm looking forward to more. One nitpick, you referred to a linear classifier as "parametric" and nearest neighbor as "nonparametric" but these terms seem to have a different meaning in statistics. Parametric refers to parameterized probability distributions, e.g. the normal distribution is defined by the mean and variance parameters. In this sense, even if we have a neural network approximating a normal distribution the model would be nonparametric since it wouldn't explicitly use the mean and variance parameters. To avoid this confusion, even though it's a mouthful, I prefer the term "parameterizable" to distinguish models that do and don't use any parameters.
Thanks for the awesome videos!
Recommendable, but a bit too trivial
0.75 speed is good for me.
WOW. You are really living on the edge here. I would go with 0.25 :D ;)
On slide 51 at 33:12 the ship score is incorrect for the numbers shown. It should be 60.75, the value of 61.95 comes from calculating with the last ship weight as 0.3, and not -0.3.
He is so smart.
Can anybody explain how the Voronoi tessellation changes with k in k-NN? Or how to get decision boundaries for k-NN?
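One common way to see how the k=1 Voronoi-like cells smooth out as k grows is to classify every point of a dense 2D grid and color it by the predicted label. A sketch on made-up 2D toy data, reusing the hypothetical knn_predict from the sketch near the top of the comments:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 2))                     # toy 2D "images"
y = (X[:, 0] + X[:, 1] > 0).astype(int)          # toy labels

xs, ys = np.meshgrid(np.linspace(-3, 3, 100), np.linspace(-3, 3, 100))
grid = np.stack([xs.ravel(), ys.ravel()], axis=1)

for k in (1, 5, 15):
    preds = knn_predict(X, y, grid, k=k, distance="L2")        # from the earlier sketch
    plt.figure()
    plt.contourf(xs, ys, preds.reshape(xs.shape), alpha=0.4)   # decision regions
    plt.scatter(X[:, 0], X[:, 1], c=y)                         # training points
    plt.title(f"kNN decision regions, k={k}")
plt.show()
```

With k=1 the regions hug individual training points (the Voronoi picture); larger k averages over more neighbors and the boundary gets smoother.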
Very good lecture, Andrej. Also, the student who interrupted Andrej by saying 'that was what I said' didn't have a good tone and annoyed me.
How can people in China load the virtual machine and the assignment link? Thanks.
My teacher
Watched the lectures in the hotel room at Lake Tahoe
before that everything I know can be easily traced to IOI and that gap year I spent in Bulgaria 2015-2016
The Chinese translation for this lecture is not as good as lecture 1's.
cannot understand the machine learning :((((((((((
So the prof does not know how to fix the projector but knows AI.
Who else thinks that he reminds you of Justin Timberlake....?
I believe the Chinese subtitle is translated by machine. Total disaster :(
And he talks so fast it's really hard to keep up.
Good lecture though.
He talks so fast compared with Prof. Li Fei-Fei. Why doesn't Prof. Li FF continue teaching cs231n??
+huynh huy nguyen I believe Andrej has always been the main instructor. Click on the CC button for the subtitles.
+huynh huy nguyen She mentions in the lecture 1 video that she is going to have a baby sometime in the coming weeks.
+huynh huy nguyen You can adjust the speed with the little gear icon on the bottom of the video
@@NeigeFraiche Thank you
The Chinese translation is terrible.