Lecture 6 "Perceptron Convergence Proof" - Cornell CS4780 SP17

  • Published on 18 Oct 2024

Comments • 76

  • @sebastianb6982 · 4 years ago · +50

    These lectures are gold! Thank you so much for putting them online! :-)

  • @rodas4yt137 · 4 years ago · +12

    Love that you ask people to raise their hands "if you understand". It really shows that the will to teach is there.

  • @marijastojchevska9193 · 4 years ago · +8

    Very comprehensive and clear! Thanks for sharing this video with us.

  • @dantemlima · 5 years ago · +16

    Great teacher! I had never heard of any of this a week ago and I'm able to keep up at each step. Danke schön, Prof. Weinberger. Is it possible to make the placement exam available? Thank you.

  • @mikkliish · 1 year ago

    This is the best explanation of the convergence proof on YouTube at the moment.

  • @tonightifeellikekafka · 4 years ago · +9

    I'm really hoping you still view the comments on these videos.
    Is there any way to know what the programming projects involved? The assignments and lecture notes are obviously incredibly useful, but I don't feel confident without doing any coding. Would appreciate it so much if the programming projects, or at least their descriptions, were made available.

  • @kbkim-f4z · 5 years ago · +4

    Great lecture! Now I can understand it. Many thanks from South Korea.

  • @junjang7020 · 1 year ago

    Currently taking 4780, and I still come home and watch your videos!

  • @hypermeero4782 · 1 year ago

    Professor Kilian, I really wish I get to meet you someday. I can't express how much I appreciate you and value these lectures.

  • @jamespearson5470 · 1 year ago

    Great lecture! Thank you Dr. Weinberger!

  • @samarthmeghani2214 · 2 years ago

    Thanks a lot, sir, for making this video. I loved the way you explain each and every step of the proof in an easy way. Thank you again, @Kilian Weinberger, sir.

  • @omalve9454 · 1 year ago

    This proof is beautiful!

  • @j.adrianriosa.4163 · 3 years ago · +1

    Amazing explanation!!

  • @shuaige2712 · 3 years ago · +1

    Great lectures, great teacher.

  • @rezasadeghi4475 · 3 years ago · +1

    I'm really enjoying your lectures, professor. Is there any way I can access the projects?

  • @arddyk · 2 years ago

    It was amazing, professor. Really helpful.

  • @connorfrankston5548 · 2 years ago

    The intuition (for me) is that wTw* grows at least linearly and wTw grows at most linearly in the number of updates M, but wTw* is linear in w while wTw is quadratic in w.
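
A minimal Python sketch (not from the course materials; the data and all names below are made up for illustration) of the two quantities in the comment above: it runs the perceptron on synthetic, linearly separable data with ||x|| <= 1 and a unit-length w*, and prints w.w* (which grows by at least gamma per update) and w.w (which grows by at most 1 per update).

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical separable data: labels come from a fixed unit vector w_star.
    w_star = np.array([0.6, 0.8])                        # ||w_star|| = 1
    X = rng.uniform(-1.0, 1.0, size=(500, 2))
    X /= np.linalg.norm(X, axis=1).max()                 # rescale so that ||x|| <= 1
    keep = np.abs(X @ w_star) > 0.05                     # enforce a positive margin gamma
    X, y = X[keep], np.sign(X[keep] @ w_star)

    w = np.zeros(2)                                      # perceptron starts at the zero vector
    M = 0                                                # number of updates (mistakes) so far
    while True:
        mistakes = [(x_i, y_i) for x_i, y_i in zip(X, y) if y_i * (w @ x_i) <= 0]
        if not mistakes:
            break                                        # no misclassified points left: converged
        x_i, y_i = mistakes[0]
        w += y_i * x_i                                   # perceptron update
        M += 1
        # w @ w_star grows by at least gamma per update; w @ w grows by at most 1 per update.
        print(f"M={M:3d}   w.w*={w @ w_star:7.3f}   w.w={w @ w:7.3f}")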

  • @sankalpbhamare3759 · 2 years ago

    Amazing lectures!!

  • @shashihnt · 3 years ago · +1

    I really enjoy watching your lectures, thank you very much. Do you plan to put this course (along with the projects) on Coursera or any other online platform?

    • @kilianweinberger698 · 3 years ago · +1

      Cornell offers an online version through their eCornell program. ecornell.cornell.edu/

    • @shashihnt · 3 years ago

      @@kilianweinberger698 Thank you very much. I will have a look.

  • @vatsan16 · 4 years ago · +7

    Where is the valentine's poem of the proof??!?!?!

  • @blackstratdotin · 3 years ago

    brilliant lecture indeed!

  • @hdang1997 · 4 years ago · +1

    17:54 "The HOLY GRAIL weight vector that we know actually separates the data"

  • @jumpingcat212 · 7 months ago

    Hi Professor Weinberger, it looks like this lecture is about a smart algorithm created by smart people which can classify data into two classes. But in the first introduction lecture you mentioned that machine learning is about the computer learning to design a program by itself to achieve our goal. So I'm confused: what is the relationship between this perceptron hyperplane algorithm and machine learning? It looks like we humans just design this algorithm, code it into a program, and feed it to a computer to solve the classification problem...

    • @kilianweinberger698 · 5 months ago

      So the Perceptron algorithm is the learning algorithm which is designed by humans. However, given a data set, this learning algorithm generates a classifier and you can view this classifier as a program that is learned from data. The program code is stored inside the weights of the hyperplane. You could put all that in automatically generated C code if you want to and compile it. Hope this helps.

    • @jpatel0924 · 2 months ago

      Bro, MIT's AI professor said in one of his videos that something is intelligent as long as we don't know how it works; once we know how it works (fixed algorithms), all the intelligence vanishes...
      That was really philosophical 😅😅.
      I thought I might share that with you.
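
A hedged Python sketch (not the course's project code; the function names are made up) of the point in Prof. Weinberger's reply above: the training loop is the human-designed learning algorithm, while the weight vector it returns acts as the learned classifier, i.e. the "program" stored in the weights of the hyperplane.

    import numpy as np

    def perceptron_train(X, y, max_epochs=1000):
        """Human-designed learning algorithm: learns a weight vector from data."""
        w = np.zeros(X.shape[1])
        for _ in range(max_epochs):
            updated = False
            for x_i, y_i in zip(X, y):
                if y_i * (w @ x_i) <= 0:          # mistake (or point exactly on the boundary)
                    w += y_i * x_i                # perceptron update
                    updated = True
            if not updated:                       # a full pass with no mistakes: converged
                break
        return w

    def learned_classifier(w, x):
        """The learned 'program': one dot product followed by a sign."""
        return np.sign(w @ x)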

  • @consumidorbrasileiro222 · 4 years ago

    There's one thing I couldn't get: why is gamma defined from the "best" hyperplane? If M is bounded by 1/gamma² and gamma could be arbitrarily close to zero (if you pick the worst possible hyperplane, for instance), then the proof is spoiled.

    • @consumidorbrasileiro222 · 4 years ago

      Oh okay, I get it. Finding other, bigger bounds for M says nothing about the lowest bound you found.

  • @abs4413 · 1 year ago

    Hi Professor. At 32:04, you write that w.T dot w_star = abs( w.T dot w_star ). How does it follow that the dot product of those two vectors is necessarily positive? My intuition says that the first update of w will point w in the direction of w_star making the dot product positive. It makes sense, but it does not seem a trivial statement to me. w.T and w_star could be pointing in opposite directions and thus yield a negative dot product. What am I missing? :) Thanks.

    • @abs4413 · 1 year ago

      I somehow figured out the answer minutes after posting this question. w starts as the zero vector and w.T dot w_star can only increase after each iteration, by at least gamma. Thus w.T dot w_star stays positive, which makes the statement true: w.T dot w_star = abs( w.T dot w_star )
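
For reference, a short worked version of the argument in the reply above, in the lecture's notation (w initialized to the zero vector, and every training point satisfying y (x^T w*) >= gamma):

    (w + yx)^\top w^* = w^\top w^* + y\,(x^\top w^*) \ge w^\top w^* + \gamma

    % starting from w_0 = 0, after M updates:
    w^\top w^* \ge M\gamma > 0 \quad\Longrightarrow\quad w^\top w^* = \lvert w^\top w^* \rvert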

  • @kirtanpatel797 · 5 years ago · +2

    Does anyone know which 5 inequalities the professor is talking about?

  • @nipo4144 · 4 years ago

    Is it possible to detect divergence of the algorithm during learning, i.e. the case when the data is not linearly separable? Can we infer gamma from the data to check whether we have exceeded 1/gamma^2 updates?

  • @yangyiming1985 · 3 years ago · +2

    love the last story haha

  • @MrWhoisready · 3 years ago · +1

    I can't understand something:
    M is positive and gamma is positive, so M times gamma is positive.
    After 1 update, M = 1.
    (w^T)(w^*) can be negative, since w might have started out pointing in the opposite direction of w^*.
    Then how can (w^T)(w^*), a negative number, be greater than a positive number, 1 times gamma (M*gamma)?

    • @tama8092 · 2 years ago

      Here, w is initialized as the 0 vector, so (w^T)(w^*) starts at 0. Thus, after the first update, it will be at least gamma. And like this, (at least) gamma keeps getting added at each update, making it a positive value.
      Refer to th-cam.com/video/vAOI9kTDVoo/w-d-xo.html to see how it converges when w is initialized randomly.

  • @jiviteshsharma1021 · 4 years ago · +2

    How is y^2 xTx smaller than one when we know that y^2 is equal to 1? So if the xTx term is less than one but positive, the whole term becomes greater than 1, and the inequality that the expression is greater than...

    • @consumidorbrasileiro222 · 4 years ago · +1

      y^2 = 1 and xTx <= 1, so y^2 xTx <= 1

    • @jiviteshsharma1021 · 4 years ago · +2

      @@consumidorbrasileiro222 Oh yeah, I missed the fact that we're raising the whole term to a power, and if xTx is less than 1 the whole term will be less than 1.
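
For reference, the step this thread is discussing, written out under the lecture's conventions (y in {+1, -1}, ||x|| <= 1, and an update is only made on a mistake, i.e. y w^T x <= 0):

    (w + yx)^\top (w + yx) = w^\top w + 2y\,(w^\top x) + y^2\, x^\top x \le w^\top w + 0 + 1

    % since y\,(w^\top x) \le 0,\quad y^2 = 1,\quad x^\top x = \lVert x \rVert^2 \le 1

So y^2 xTx is at most 1 (not necessarily smaller than 1), and each update increases w^T w by at most 1.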

  • @AhmadMSOthman · 5 years ago · +1

    In 27:05 you write:
    2y(w^T • x) < 0
    Why is it not

    • @kilianweinberger698 · 5 years ago · +2

      Oh yes, good catch,

    • @dariannwankwo2718 · 5 years ago · +2

      2y(w^T * x) < 0 ==> that the data point was classified incorrectly. Even if it was

  • @bumplin9220 · 2 years ago

    Thank you sir

  • @tudormanoleasa9439 · 2 years ago

    32:39 What are the other 4 inequalities that everyone should know?

  • @lkellermann · 4 years ago

    I am watching these lectures and wondering if there will be any moment on the data science journey where the matrices will be self-adjoint.

  • @hrt7180828 · 2 years ago

    If our data is sparse, after scaling it to a circle with a radius of 1, won't it shift to a dense data distribution and cause problems when we scale our data?

  • @omkarchavan2259 · 3 years ago

    Why didn't you write the second constraint as wT(w+yx) instead of (w+yx)T(w+yx)? I'm confused.

  • @Alien-cr1zb · 1 year ago

    Did anyone manage to find the projects or anything related to this class?

  • @ghoumrassi · 4 years ago · +2

    I understand up to the point that w^Tw* increases by at least gamma and w^Tw increases by at most 1, but I do not understand how this proves that w necessarily converges to w*. Could someone help me out, please?

    • @shivammaheshwary2570 · 4 years ago · +3

      I think he means that if the second condition is true, then the only way the inner product of w and w* can increase is if they align themselves better than before (cos theta increases), so w is indeed moving towards w*.
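
Putting the two bounds together as in the lecture (w initialized to zero, ||w*|| = 1) bounds the number of mistakes M; the claim is not that w converges to w* itself, but that after at most 1/gamma^2 updates the perceptron stops making mistakes, i.e. it has found some separating hyperplane:

    M\gamma \le w^\top w^* \le \lVert w \rVert\,\lVert w^* \rVert = \lVert w \rVert \le \sqrt{M}
    \quad\Longrightarrow\quad \gamma\sqrt{M} \le 1
    \quad\Longrightarrow\quad M \le \frac{1}{\gamma^2}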

  • @hrushikeshvaidya1786 · 4 years ago

    At 8:38, why do we rescale w star? Can we not just leave it with a norm of 1?

  • @KulvinderSingh-pm7cr · 5 years ago · +8

    What are the other 4 inequalities in computer science?

    • @StarzzLAB · 4 years ago

      Wondering the same thing

  • @HhhHhh-et5yk · 4 years ago · +1

    Professor, or anyone, please tell me why we need to consider the effect of the update on
    w transpose w star and
    w transpose w.
    Please reply!

    • @XoOnannoOoX · 4 years ago · +2

      wTw* increasing means that they are getting more similar. But there is another case, in which w is just being scaled. By showing that wTw does not increase by more than 1 per update, we can show that the growth is not just w being scaled up, so w must actually be getting more similar to w*.

    • @HhhHhh-et5yk · 4 years ago

      @@XoOnannoOoX whoah! Thank u so much 💯
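
One way to make the reply above concrete: after M updates, the cosine of the angle between w and w* satisfies

    \cos\theta = \frac{w^\top w^*}{\lVert w \rVert\,\lVert w^* \rVert} \ge \frac{M\gamma}{\sqrt{M}\cdot 1} = \gamma\sqrt{M}

so the growth of w^T w* cannot come from w merely being scaled up; the angle between w and w* has to shrink (and since cos(theta) <= 1, this also gives M <= 1/gamma^2).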

  • @vincentxu1964 · 5 years ago

    I have a question about convergence. From my understanding, since there are different satisfiable margins, there would be a whole set of valid w*. So w* is not unique, which means that if there exists a set of separating hyperplanes, the algorithm will converge to some w* from that set, but not to a fixed w*. Not sure if I understand correctly.

  • @architsrivastava8196 · 3 years ago

    Why is the minimum distance between the point and the hyperplane = inner product between the point and w*?

    • @kilianweinberger698 · 3 years ago · +1

      Because (w^*)T(w^*)=1. (For details check out the detailed proof here: www.cs.cornell.edu/courses/cs4780/2018fa/lectures/lecturenote03.html )

    • @architsrivastava8196 · 3 years ago

      @@kilianweinberger698 Thank you Professor!! You're a legend!!!
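
For completeness, the distance from a point x to a hyperplane through the origin with normal vector w* is

    d(x) = \frac{\lvert (w^*)^\top x \rvert}{\lVert w^* \rVert} = \lvert (w^*)^\top x \rvert \quad \text{when } \lVert w^* \rVert = 1

which is why, with the lecture's normalization, the margin can be written directly as the smallest |(w*)^T x| over the training points.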

  • @gregmakov2680 · 2 years ago

    Why don't we have Nobel Prizes for Computer Science??? This algorithm is worth 10 Nobel Prizes, indeed.

  • @TrackLab · 5 years ago · +1

    0:40 Hang on, Mr. Weinberger, are you German? Your last name might give a hint, but you never know. That German was perfect! Not only the chosen words, but also the way you pronounced them all, was absolutely perfect German.

  • @gregmakov2680 · 2 years ago

    hahahaha, see! Sitting in the class is not always the optimal choice :D:D:D:D The professor led the students along and crammed things in for a while, and nobody understood a thing :D:D:D Clear proof, the professor noticed it himself, it's not us badmouthing the professor :D:D:D

  • @DommageCollateral · 1 year ago

    It would be cool if you also had a course in German.

  • @abunapha · 5 years ago · +1

    Starts at 0:53

  • @yrosenstein · 5 years ago

    You defined the margin wrongly.
    xt•w is the projection of the vector xt onto w.
    The distance is ||x-w||.

    • @vincentxu1964 · 5 years ago

      I think what he said was that the margin is the minimum distance of x to the hyperplane. Since w is the normal direction of the hyperplane, xTw is the projection of x onto w, which is the distance from x to the hyperplane.

    • @yrosenstein · 5 years ago

      @@vincentxu1964 only assuming that ||w||=1

    • @vincentxu1964 · 5 years ago

      @@yrosenstein Yeah I think so. You can take a look at lecture 14. I think he redefines the margin with any w.

    • @ivanvignolles6665 · 5 years ago

      He said the distance to the hyperplane defined by w, not the distance to w itself. The distance of x to the hyperplane is equal to the projection of x onto w.

    • @sudhanshuvashisht8960 · 4 years ago

      No, the Prof is correct. The margin is defined correctly as well (i.e. the distance of the closest point from the hyperplane). Read the proof here: www.cs.cornell.edu/courses/cs4780/2018fa/lectures/lecturenote09.html#:~:targetText=Margin,closest%20point%20across%20both%20classes.

  • @roronoa_d_law1075 · 4 years ago

    Plot twist:
    gamma = 0
    M ...

  • @mrobjectoriented · 5 years ago

    At 43:02, that face on the blackboard