I raise my hand unconsciously when you say "Raise your hand if you understood." Best lectures ever!
your lectures aren't boring at all!!!
The lecture starts at 2:00!
Amazing explanation at 27:10 on how to pick the right algorithm for your dataset, and why the wrong pick causes bad ML choices!
The lecture gets to the k-NN algorithm at 36:00 (before that it's about the training, validation, and test sets and about minimizing the expected error).
“When you watch these from bed, they get boring”: sorry Professor, I'm rewatching this class for the fifth time and it has NEVER bored me.
Every time I rewatch, I gain a new appreciation for some subtlety in the things you say.
It's gotten to the point that I kind of imitate you when I'm interviewing with companies. It takes the pressure off when I just think of the interview as your class, with me explaining what was taught.
Haha, thanks, and good luck with your interviews!
I took a grad-level class in machine learning and got an A, but only now do I realize how crappy my professor was and how little I actually understood. I am really glad I am able to view these lectures for free. Thank you, Dr. Weinberger!
At 39 minutes Prof. Weinberger said "raise your hand if that makes sense", and I actually did!! Super high-quality content here. That's the level of engagement being created across the world. Respect from India!!
You, sir, are the GOAT of ML.
This lecturer has tremendous charisma!
These videos help people from other countries who for some reason can't get access to a degree in machine learning. In my case, I now know exactly why I should not split the data randomly with the datasets that I use at my work. Thanks so much.
Absolutely!!
I have shared summaries of these lectures translated to Spanish. I live in the US but grew up in México
I want to get started studying some machine learning and this is great! It's easy to watch while performing menial tasks at work, and I can review anything I have questions on at home. Having the notes available to read ahead of time and to look at during and after the video is tremendous for understanding. Thank you very much for providing everyone with such a great fount of knowledge.
Welcome to 2020 where your entire college semester is done from your bedroom. :D
Professor Weinberger,
I have taken two graduate-level courses in ML, and I believed I had an understanding until I started your course at eCornell. Man, build your own university! I'm speechless at the quality of your lectures! Thank you!
Excellent intuition on why validation sets are needed: 13:20
I have a prediction to make: "If the classroom has one of those boards that, you know, move up and down, and the prof uses it instead of a projector, the class is going to be awesome." Thanks, Dr. Kilian, for the whole series.
2:00 Lecture begins - Recap of last lecture
7:50 Can’t split train/test any way we want
11:35 Very often people split train/validation/test. Take best on validation set
24:20 Question for class. as n goes to infinity…
26:15 Weak Law of Large Numbers: the average of a random variable converges to its expected value in the limit
27:30 How to find the hypothesis class H. The Party Game.
36:00 k-Nearest Neighbors
41:45 Only as good as its distance metric
I just finished my first semester studying Data Science and today was supposed to be my first day of holidays, yet I have already watched three of the lectures and I'm still going. I knew how to apply some of the algorithms in R, but knowing the intuition behind them makes everything much clearer. Thank you, Professor Weinberger, for the amazing content.
I was looking for an answer that was quite technical in another video but I got hooked. Thank you so much for providing such great knowledge.
Thank you for speaking to the assumptions associated with different models and the chaos of data in the real world.
Many thanks for the systematic presentation of ML. You make it so easy to follow the subject.
thanks for the lessons and especially providing coursework, notes, and exams
Thanks for the lecture! The party game example is really insightful and one that you will definitely remember in the future. I also appreciate the jokes a lot; they make the lectures highly engaging!
This series of lectures brought back my love of learning
My uncle recommended this channel to me. Very, very, very great class!!!
I honestly have more respect for Cornell because of Professor Weinberger's lectures.
one of my favorite lectures on ML
Amazing lectures, sir. Loved them. This was just the thing I was looking for and couldn't find earlier.
He is Hermann Minkowski, who was Einstein's teacher. The Minkowski metric is the metric of flat spacetime and forms the backbone of special relativity. The ideas developed by Minkowski were later extended by Einstein to develop the theory of general relativity.
Thanks from Italy!!!
KNN starts at 35:57
"Choosing between your mama and papa or something, what are you gonna do? I like them both."
Hello Dr. Kilian,
Greetings from India!
I loved your videos. Could you please take up some modules/lectures specialized in deep learning? I will binge-watch those too. 😃
Best,
Sumit
Hello Prof. Weinberger, I am really enjoying your lectures a lot. I wish I could be there in person this fall or next spring. I was wondering if we online viewers could have access to some older homeworks or assignments for practice. That would be the best! Thanks!
Sir, you mentioned a case with about 11 data points at 23:00. How about we try bootstrapping on it and then find the best hypothesis class and function subsequently?
Was there a paper on the medical problem with only 11 samples? I was doing a study on small-sample-size problems and was curious what sort of algorithms are used on such a small dataset.
Thanks for uploading these Lectures!
Thanks a lot for this nice course. I think at 48:06 it's just 32 and not 32 to the power of 32. Am I missing something, dear @kilian?
yep, you are right. well spotted :-)
Guys, where is the video lecture for the 1-NN convergence proof?
Cover and Hart 1967 [1]: as n→∞, the 1-NN error is no more than twice the error of the Bayes optimal classifier.
I want that too.
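As an aside, the Cover and Hart result quoted above can be written as the following asymptotic bound (my own restatement in symbols, not taken from the lecture notes):

```latex
% As the number of training points n grows, the 1-NN error approaches a value
% that is at most twice the error of the Bayes optimal classifier.
\lim_{n \to \infty} \varepsilon_{\mathrm{1NN}} \;\le\; 2\,\varepsilon_{\mathrm{Bayes}}
```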
Hi Sir. Can you please guide me as to where I should study the maths required for ML? I did a few courses but they only covered basic calculus and such. I had no clue about the weak law of large numbers you talked about at 26:50. Please help.
Maybe check out Khan Academy www.khanacademy.org/
It is pretty good.
Thanks a lot sir
@@kilianweinberger698 thank you!!!!!!!!!!!!!!!!
Who has access to attend this class in person and still prefers to watch it online, really???
Hello sir :) How are you? Hope you are doing well. It is 2022 and nothing can beat your ML lectures. Watching them again :)
Once again, thanks Prof. Kilian Weinberger for the amazing lecture. One question about the lecture notes:
in the 1-NN convergence proof section it says, "Bad news: We are cursed!!", and the convergence proof is for n tending to infinity. But after watching the lecture, it seems the curse-of-dimensionality problem occurs when the number of dimensions d becomes large. Did I misinterpret the statement about being cursed as n tends to infinity?
14:46 What is the difference between a validation dataset and a testing dataset? I think they are the same
No, the validation set is split off from the training data and is used while building the model (e.g. for selecting hyperparameters), whereas the test set is used only at the end to assess how well your model generalizes.
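For anyone wondering how such a split looks in practice, here is a minimal sketch (my own illustration, not course code), assuming i.i.d. data stored in a NumPy feature matrix X with a label vector y; the 80/10/10 proportions are just an example:

```python
import numpy as np

def train_val_test_split(X, y, val_frac=0.1, test_frac=0.1, seed=0):
    """Randomly split i.i.d. data into train / validation / test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))             # shuffle the indices once
    n_test = int(len(y) * test_frac)
    n_val = int(len(y) * val_frac)
    test_idx = idx[:n_test]
    val_idx = idx[n_test:n_test + n_val]
    train_idx = idx[n_test + n_val:]
    return (X[train_idx], y[train_idx],       # fit the model here
            X[val_idx], y[val_idx],           # select hyperparameters here
            X[test_idx], y[test_idx])         # touch only once, at the very end
```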
knn starts at 36:02
What is the programming language of choice for writing the assignments and the project?
This is a great way of letting learners study. However, is there a way to add the questions raised by students in a link? The recording has some noise that makes it hard to hear the questions well. Adding the questions would add more value; we would be able to relate our questions to theirs and have fewer doubts!
Just to make sure, the x and z in the distance function (at 42:50) are the rth dimensions of the position vectors of the two points being considered, right?
x and z are the two vectors and [x]_r is the r-th dimension of vector x. Hope this helps.
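Just to make that notation concrete, here is a tiny sketch of the Minkowski-style distance the lecture builds on (my own illustration, assuming x and z are NumPy arrays and p >= 1):

```python
import numpy as np

def minkowski_distance(x, z, p=2):
    """dist(x, z) = ( sum_r |[x]_r - [z]_r|^p )^(1/p).

    p=1 gives the Manhattan distance, p=2 the Euclidean distance.
    """
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    return float(np.sum(np.abs(x - z) ** p) ** (1.0 / p))
```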
@@kilianweinberger698 Oh, I get it now. Thanks for the clarification, professor!
I look forward to coming to Cornell this fall
Thanks a lot for your enthusiasm! Coming back to the discussion you had early on concerning splitting the dataset into training, cross-validation, and test sets: my understanding is that for a given dataset D with m values, the first step is to train the algorithm on the training set for each candidate parameter, evaluate each candidate on the cross-validation set and pick the one with the smallest error, then retrain that choice on a larger training set (training plus cross-validation data), and finally test it on the test set. Is that correct? Also, concerning the kNN algorithm, do you obtain the parameter k on the training set or the cross-validation set? I am a bit confused. Best regards, Axel from Norway.
Yes, if by “smallest one” you mean the one that leads to the smallest error. For kNN you can even compute the leave-one-out error, i.e. you go through each training sample, pretend it was a test sample, and check whether you would classify it correctly with k=1,3,5,7,...,K.
After you have done this for the whole set, you pick the k that leads to the fewest misclassifications (and in case of a tie, the smallest k). Hope this helps.
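A rough sketch of that leave-one-out procedure (my own illustration, not course code, assuming a NumPy feature matrix X, label vector y, and plain Euclidean distance):

```python
import numpy as np

def loo_best_k(X, y, ks=(1, 3, 5, 7)):
    """Pick k by leave-one-out error: classify each training point using
    all the other points as neighbors and count the misclassifications."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    n = len(y)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    np.fill_diagonal(D, np.inf)          # a point may not be its own neighbor
    order = np.argsort(D, axis=1)        # each row: neighbors sorted by distance
    errors = {}
    for k in ks:
        preds = []
        for i in range(n):
            labels, counts = np.unique(y[order[i, :k]], return_counts=True)
            preds.append(labels[np.argmax(counts)])        # majority vote
        errors[k] = np.mean(np.array(preds) != y)          # leave-one-out error
    best = min(ks, key=lambda k: (errors[k], k))           # ties -> smallest k
    return best, errors
```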
@@kilianweinberger698 Can you do hyperparameter tuning on the training/validation sets for multiple algorithms like SVM and Random Forest and then compare the results on the test set, or should the comparison of the multiple models also be done on the training/validation sets, if you are reproducing it for a paper?
You said that if the data is i.i.d., we can split it uniformly at random. What would the correct approach have been for the spam filter case then? Is it i.i.d.? I think not, since some emails might be similar to others. Thank you.
You have to split by time. Let's say you have 4 weeks worth of data, put the first 3 weeks into training and the last week into validation. This way you simulate the real application case, namely that you train on past data to predict the labels of future data.
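In code that could look roughly like this (my own sketch; the record format (timestamp, features, label) and the names `first_timestamp` and `emails` are hypothetical):

```python
from datetime import timedelta

def temporal_split(records, cutoff):
    """Split time-stamped (timestamp, features, label) records:
    everything before `cutoff` goes to training, the rest to validation."""
    train = [r for r in records if r[0] < cutoff]
    val = [r for r in records if r[0] >= cutoff]
    return train, val

# With 4 weeks of emails: train on the first 3 weeks, validate on the last week.
# cutoff = first_timestamp + timedelta(weeks=3)
# train, val = temporal_split(emails, cutoff)
```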
We have a quiz question in the lecture notes: "How does k affect the classifier? What happens if k = n? What happens if k = 1?"
I do not think it is discussed in the lectures. In my opinion, k is the only hyperparameter in this algorithm. For k = n, we take the mode of the entire dataset's labels as the output for the test point, whereas for k = 1 the test point is assigned the label of its single nearest neighbor.
I have a doubt here: since we are using a distance metric, what if we have 2 points (for simplicity) that are at equal distance from the test point and have different labels? What happens in that case for k = 1? Similarly, for k = n, if we have an equal proportion of the binary class labels, how does the mode work in that case?
Yes, for k=n it is the mode and for k=1 it is the nearest neighbor. If the label assignment is a draw (e.g. two points are equidistant), a common option is to break ties randomly.
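To illustrate the two extremes and the random tie-break (my own sketch, assuming NumPy arrays and Euclidean distance):

```python
import numpy as np

def knn_predict(X_train, y_train, x_test, k, seed=None):
    """Predict one test point by majority vote over its k nearest neighbors.

    k=1 returns the label of the single closest point; k=len(y_train)
    returns the mode of the whole training set. Vote ties are broken
    uniformly at random.
    """
    rng = np.random.default_rng(seed)
    dists = np.linalg.norm(X_train - x_test, axis=1)   # Euclidean distances
    nearest = np.argsort(dists)[:k]                    # indices of the k closest points
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    winners = labels[counts == counts.max()]           # all labels tied for the top count
    return rng.choice(winners)                         # random tie-break
```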
@@kilianweinberger698 Thank you for the answer, Prof. Weinberger, and for this amazing series as well!
perfect courses sir, thanks.
Can D(validation) also be defined as a beta test for h(x)?
great lecture!
He is the best
Is the algorithm only affected by the Euclidean distance, or does the number of classified points also matter?
just so brilliant
Sir, I want to learn deep learning. Can I skip the rest of the classes? I watched the first 3 classes. Please guide me.
Hmm, you may need to be patient. I would recommend you first understand logistic regression and gradient descent. If you cannot wait, skip ahead after that, but you will be missing out on some important concepts.
@@kilianweinberger698 Thank you so much Sir
@Kilian Weinberger I have the same situation, but I can go further than gradient descent. How far do you recommend going before jumping into deep learning, so that the loss in understanding DL is minimized?
Love your lectures! You briefly mentioned metric learning in regards to finding a good distance function, do you know of any good primers or general reading advice on this topic?
Maybe read one of my first papers on Large Margin Nearest Neighbors ( papers.nips.cc/paper/2795-distance-metric-learning-for-large-margin-nearest-neighbor-classification )
Is it possible to get the test questions?
Is there a link to the homeworks, exams, and their solutions? That would be helpful.
Past 4780 exams are here: www.dropbox.com/s/zfr5w5bxxvizmnq/Kilian past Exams.zip?dl=0
Past 4780 Homeworks are here: www.dropbox.com/s/tbxnjzk5w67u0sp/Homeworks.zip?dl=0
@@kilianweinberger698 Sir, it would be very helpful if you shared the assignments too, because from your demonstrations I see they are quite different from the general ones we get in other colleges, and we can learn a lot from them. I learn a lot from your lectures; every video I have watched is the best I have seen on that topic.
Sir, where can I find the project files?
Hi Professor, thanks for making these videos publicly available. In your formalisation of the algorithm you define a test point as x (presumably a vector), but in your specification of the conditions for points excluded from the k-NN set you introduce y' and y'', which to me either seem redundant if x' and x'' are vectors, or have not been applied consistently if a point is now a tuple (x, y), in which case the distance function should be applied to two tuples. Am I missing something?
I also don't understand that part. At 39:45 he uses (x', y'); I'm not sure if he meant an ordered pair or two vectors named x' and y'. Is there a difference between a vector and a tuple?
@@bluejimmy168 Hi, yes, the notation is a bit confusing in my opinion. I think there is a technical difference between a vector and a tuple; what I meant above was whether x represents the entire vector object or a value in one coordinate of a two-coordinate representation, which I call a tuple (an ordered pair is a tuple, I think).
sorry, yes, I was a little sloppy there. :-/ I hope you can figure it out from the context.
@@kilianweinberger698 Yes it's clear - just wanted to confirm that I hadn't missed anything. Your lectures are lucid on the whole. Many thanks for sharing.
Thanks professor !!!
Thanks from Germany
Professor, in regard to your spam classifier example: instead of splitting train and test data by time, what if you eliminated all duplicate emails prior to splitting and training? Would that work in this case? Thank you, and thanks for posting these!
The problem is that there may be new spam types that appear. E.g. imagine that on Saturday spammers suddenly start sending out "lottery spam". Even if the emails are not identical, with a random split some of them would land in the training set and your spam filter would pick up on the word "lottery" as very spammy. But this is unrealistic, as in the real world you wouldn't have seen any such spam before it first appears. Hope this makes sense.
YOU ARE AWESOME!
this is amazing, thank u sir
Professor Kilian, I am coming to Cornell to enroll in a Ph.D. in civil engineering this fall. I have watched some of your lectures and find them really engaging. I have some understanding of most of the topics in this course, but I would like to take some classes on ML. Would you recommend that I enroll in this course or another one? Is this a grad course?
Welcome to Cornell! This is a graduate course, offered every fall. It’s probably a good choice if you want to learn the basics in ML. It also “unlocks” several more specialized courses.
@@kilianweinberger698 thank you, professor. I will try to enroll this fall.
@@kilianweinberger698 Is this lecture series, along with implementing these algorithms with Python libraries, enough so that I can dive into deep learning?
Reply please!
Thank you sir
Dear Professor,
This is the best ML lecture I've ever seen. Are you going to provide more materials of that kind?
PS. Are you looking for any postdocs? ;)
Thanks! Unfortunately not at the moment.
the reason that "most people do not do right actually" is not by themself, but the gap btw theory model and practical situations is not described clearly in almost all of lectures in all classes in the world!!!! this gap makes students confused heavy super a lot including me :D:D:DD:D
Now in 2020 all classes are online :(. I am an undergrad and I want to learn about machine learning
Day 3 ✅
You are not too fast. In fact, I am watching the playlist at a minimum speed of 1.75x (due to my schedule) :D
After a certain time the students were trying to buy some more time by stalling the Professor from moving on... been there, done that
Are these undergrad or grad lectures?
Mostly undergraduates, but cross listed for graduates.
I'm watching this lecture at home...
Coool!
👍
My normal speed for most YouTube lectures is 1.5x and sometimes 1.75x. I think you're speaking a bit fast, because 1.5x sounds way too fast and I had to switch to 1.25x.
Hahaha, 60 years of ML :D:D
ahaaan.😅
Someone give this man some water
Minkowski? Is it really Minkowski? I wonder where he's from.. Russia? :P :P :P :P
Ahhaha, who said studying at home is boring :D:D:D Going to class just gets stuff crammed into your head, huh :D:D I'm not falling asleep at all :D:D:D:D
hi nice
who said Germans ain't funny lmao
awful lectures, very unclear.