Thanks so much for your class; I am a Ph.D. student, and your channel is helping me - way to go and Salute from Brazil.
Excellent! Thanks very much. That's outstandingly clear.
The resulting system implements diffusion with conservation of total, um, node stuff.
I enjoyed the video. Thank you! Would be awesome to mention example applications of message passing.
Thanks Jay! This is relevant whenever information is gathered over a graph neighborhood. Graph neural network techniques and Label Propagation are examples. I hope this helps. A follow up video will expand on this to show how this same approach is used by Graph Convolutional Networks.
@@welcomeaioverlords wonderful! Thanks for the answer! Looking forward to it!
It's a great video. One suggestion (if I understood correctly): use different values for the node feature vector instead of the nodes' initial index positions. I didn't catch that the first time.
Thanks! :) Enjoyed the video. Well explained and great examples.
Loving your channel. Too few likes for such a good video.
great video, thank you so much
Great video! Very clear.
Great explanation!
Wonderful video! Pity that it looks like you have stopped making them. If that were part of a Udemy course, I'd definitely take it.
I have this free course: www.graphneuralnets.com/p/basics-of-gnns, and this paid course: www.graphneuralnets.com/p/introduction-to-gnns.
And I hope to start making videos again, but needed a little (or not so little) break :)
@@welcomeaioverlords Thanks for the reply! I read the video description later, and I have already enrolled. Do you have any plans to make the full course self-paced as well? I have a project that will involve GNNs, but I'm on a postdoc that involves learning a lot of things, and that leaves me little room to set aside a few months just for the course.
Great explanation. Thanks
Brilliant
Great video, but that t-shirt tho!
edit: seriously, thanks for the animations as well!
\m/
7:35 I didn't get this: when we sandwich A between the two factors, as in D^(-1/2) A D^(-1/2), how do we know that the D on the right side is related to the destination node?
Because if it's on the right, it multiplies the columns of A and if it's on the left, it multiplies the rows of A. To see this, do it by hand. It will also make it easier to recall that D only has non-zero entries along the diagonal.
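For anyone who would rather check this numerically than by hand, here is a minimal NumPy sketch. The all-ones matrix `A` and the degree values in `d` are made up purely so the scaling is easy to see; they are not taken from the video.

```python
import numpy as np

# Hypothetical values: an all-ones A makes the scaling pattern obvious.
A = np.ones((3, 3))
d = np.array([1.0, 4.0, 16.0])   # made-up "degrees"
D = np.diag(d)

left = D @ A    # D on the left: row i of A is multiplied by d[i]
right = A @ D   # D on the right: column j of A is multiplied by d[j]

# The "sandwich": entry (i, j) becomes A[i, j] / sqrt(d[i] * d[j]).
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
sandwich = D_inv_sqrt @ A @ D_inv_sqrt

print(left)
print(right)
print(sandwich)
```

So the left factor acts per row and the right factor acts per column. Which of the two you call "source" and which "destination" depends on how rows and columns of A are interpreted in your setup.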
@@welcomeaioverlords Thanks a lot. I think I will buy your course, because I saw the content and you give a good explanation of GNNs. Also, I thought that GNN propagation was done with a BFS algorithm, but you show the "waterdrop" approach. Where did you read about it? I tried to find good, readable GNN sources, but I only found difficult-to-read scientific articles.
Great video!! I am currently working on an algorithm that builds an adjacency matrix from features extracted on a convolutional layer for a set of images (deep features). However, the formula just states that A is softmax(F)@softmax(F).T. This leaves me with an adjacency matrix with real numbers. Is this valid for message passing algorithms? I could imagine that it is some sort of “weighted edges”? Thanks!
I couldn't understand why you changed the degree of each node from (1,2,3,1,1) to (1,2,3,4,5). Maybe I missed something, but I'm curious to know the answer.
(1,2,3,4,5) is unrelated to the node degrees; the degrees stay as they are. (1,2,3,4,5) was just an arbitrary assignment of values to each node, to demonstrate node features. You could just as well assign any other values. The idea is that when you perform the matrix multiplication, each row corresponds to one node, and only the neighbors of that node contribute their feature values to it.
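A small sketch of the point above, in case it helps. The edge list is an assumption: it is simply one graph whose degrees come out as (1,2,3,1,1), and the graph in the video may be wired differently.

```python
import numpy as np

# Assumed edge list (1-indexed nodes) that reproduces degrees (1, 2, 3, 1, 1).
edges = [(1, 2), (2, 3), (3, 4), (3, 5)]
n = 5

A = np.zeros((n, n))
for u, v in edges:
    A[u - 1, v - 1] = 1.0   # undirected graph: fill both directions
    A[v - 1, u - 1] = 1.0

degrees = A.sum(axis=1)                   # property of the graph structure
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # arbitrary node features

# Row i of A selects the neighbors of node i, so (A @ x)[i] is the sum
# of the neighbors' feature values.
messages = A @ x

print(degrees)    # [1. 2. 3. 1. 1.]
print(messages)   # [ 2.  4. 11.  3.  3.]
```

Changing `x` to any other values changes `messages` but leaves `degrees` untouched, which is the distinction being made in the reply.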
Even though I've almost finished my master's, the generalization of simple scalar arithmetic such as multiplication and division to vectors and matrices still surprises me. For example, going from a/d to the inv(D)*A formula, which would seem even harder to comprehend if D weren't diagonal.
Can anyone help me or point me to any resources that explain (not just define) these fundamental matrix operations?
D is always diagonal by construction. I'm not sure this is what you're asking, but I'm not claiming that element-wise division can always be straightforwardly expressed by an inverted matrix. In this particular case, since D is diagonal and the inverse is defined by D*D_inv = I (the identity matrix), I think it's straightforward to see (do it by hand for an example) that the inverse of a diagonal matrix is simply the diagonal matrix of reciprocals, 1/d_i for each diagonal entry d_i.
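Here is a quick numerical check of that claim, as a sketch on a made-up 3-node path graph (the graph is only an illustration, not the one from the video).

```python
import numpy as np

# Hypothetical 3-node path graph: 0 - 1 - 2.
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])

d = A.sum(axis=1)          # degrees: [1, 2, 1]
D = np.diag(d)

# Inverse of a diagonal matrix: take the reciprocal of each diagonal entry.
D_inv = np.diag(1.0 / d)

# inv(D) @ A divides row i of A by d[i], i.e. the matrix form of "a / d".
row_normalized = D_inv @ A

print(D @ D_inv)           # identity matrix
print(row_normalized)
```

Note that this shortcut relies on D being diagonal with non-zero entries; for a general matrix the inverse is not element-wise.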
Why wouldn't message passing work without scaling with respect to the destination, and without the square root? I.e., just by using the initial formulation (inv(D) * A) with self-loop edges in A (and hence accounted for in D), why can't I do the message passing?
It’s not that it wouldn’t work. It might work as well or better, depending on the problem. Sometimes no normalization is better (i.e. just a sum of messages). But the paper used this parameterization because it aided in the derivation and is numerically stable.
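To make the options concrete, here is a rough side-by-side of three aggregation choices on a made-up 3-node path graph with self-loops added. The graph, the features, and the variable names are illustrative assumptions, not the video's example.

```python
import numpy as np

# Hypothetical 3-node path graph 0 - 1 - 2, plus self-loops.
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
A_hat = A + np.eye(3)
d = A_hat.sum(axis=1)                  # degrees including self-loops: [2, 3, 2]
x = np.array([1.0, 2.0, 3.0])          # arbitrary node features

# 1) No normalization: plain sum of messages.
sum_agg = A_hat @ x                    # [3, 6, 5]

# 2) Row normalization inv(D) @ A_hat: mean over the neighborhood.
mean_agg = (A_hat / d[:, None]) @ x    # [1.5, 2.0, 2.5]

# 3) Symmetric normalization D^(-1/2) A_hat D^(-1/2).
d_inv_sqrt = 1.0 / np.sqrt(d)
S = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
sym_agg = S @ x

# Eigenvalues of S; for this normalization they stay within [-1, 1].
eigs = np.linalg.eigvalsh(S)

print(sum_agg, mean_agg, sym_agg, eigs)
```

All three run fine as message passing. The bounded eigenvalues of the symmetric version mean repeated application does not blow up the values, which is one way to read the stability remark; which variant performs best is problem-dependent, as the reply says.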
Thanks a ton! You're awesome! :)
Is it possible to create an adjacency matrix from an image of a molecule, or from just coordinates, without knowing which components are connected? Thanks in advance.
You might dig into "Latent Graph Learning".
In order to derive an adjacency matrix from a coordinate file, you would need to define pairwise bond lengths to use as thresholds.
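A minimal sketch of the thresholding idea, under simplifying assumptions: the function name `adjacency_from_coords`, the single global `cutoff`, and the coordinates are all made up for illustration. For real molecules you would likely want a separate threshold per pair of element types rather than one cutoff.

```python
import numpy as np

def adjacency_from_coords(coords, cutoff):
    """Connect two points when their Euclidean distance is <= cutoff.

    Hypothetical helper: uses one global cutoff instead of per-pair
    bond-length thresholds.
    """
    coords = np.asarray(coords, dtype=float)
    # Pairwise differences via broadcasting: shape (n, n, dims).
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    A = (dist <= cutoff).astype(int)
    np.fill_diagonal(A, 0)   # no self-loops
    return A

# Made-up coordinates in arbitrary units: three points on a line.
coords = [[0.0, 0.0, 0.0],
          [1.0, 0.0, 0.0],
          [2.5, 0.0, 0.0]]

A = adjacency_from_coords(coords, cutoff=1.2)
print(A)
```

With these values only the first two points are within the cutoff, so the third point ends up isolated.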
I didn't understand the part with 1,2,3,1,1
@ 4:57
As explained in 3:01 - node 1 has degree 1, node 2 degree 2, node 3 degree 3, node 4 degree 1, node 5 degree 1