Stanford CS224W: Machine Learning with Graphs | 2021 | Lecture 6.3 - Deep Learning for Graphs

  • Published Oct 1, 2024
  • For more information about Stanford’s Artificial Intelligence professional and graduate programs, visit: stanford.io/3v...
    Jure Leskovec
    Computer Science, PhD
    In this lecture, we’ll give an introduction to the architecture of graph neural networks. One key idea covered in the lecture is that GNNs generate node embeddings based on the local network neighborhood. Instead of a single layer, GNNs usually consist of an arbitrary number of layers, in order to integrate information from even larger contexts. We then introduce how GNNs are trained as an optimization problem, and their powerful inductive capability.
    To follow along with the course schedule and syllabus, visit:
    web.stanford.ed...

Comments • 23

  • @cumin_side
    @cumin_side 6 months ago +3

    After watching this lecture, I literally had an epiphany and got the whole picture about graph neural networks. Finally
    Thank you, Jure!

  • @TravelIsLove-22
    @TravelIsLove-22 2 years ago +10

    the best set of videos about graph neural networks, thanks a lot.

  • @richardyim8914
    @richardyim8914 2 years ago +5

    This is exactly what I needed. What a useful idea.

  • @BorisVasilevskiy
    @BorisVasilevskiy 1 year ago +2

    Nice lecture.
    One follow-up thought. I suppose that at each GNN layer we can calculate not one but several hidden values for each node, just like ConvNets produce several channels per pixel. In other words, aggregated…

    • @tcveatch
      @tcveatch 1 year ago +1

      I think you are referring to the dimensions of Wl and Bl. In old-school NNs those were one-dimensional, representing the activation level of a node. Here each node carries many dimensions of information, and those are transformed at each layer l by the matrices Wl and Bl.
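
      For concreteness (writing out what I believe is the lecture's formulation), the per-layer update is

          h_v^{(l+1)} = \sigma\left( W_l \frac{1}{|N(v)|} \sum_{u \in N(v)} h_u^{(l)} + B_l \, h_v^{(l)} \right), \qquad W_l, B_l \in \mathbb{R}^{d_{l+1} \times d_l}

      so choosing the output dimension d_{l+1} > 1 already gives each node several "channels" at every layer, analogous to a ConvNet producing several channels per pixel.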

  • @andreaurgolo1870
    @andreaurgolo1870 2 years ago +1

    The cross-entropy loss at 30:15 should be minus the summation (blah, blah, blah...)
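
    Assuming the slide uses the standard binary cross-entropy on \sigma(z_v^\top \theta) (I'm reconstructing it from memory), the corrected form would be

        \mathcal{L} = -\sum_{v \in V} \Big( y_v \log \sigma(z_v^\top \theta) + (1 - y_v) \log\big(1 - \sigma(z_v^\top \theta)\big) \Big)

    i.e. the whole summation carries a leading minus sign, so that minimizing the loss maximizes the log-likelihood.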

  • @surbhisoni6340
    @surbhisoni6340 1 year ago +2

    the explanation was so clear and very useful, thanks a lot

  • @NadavBenedek
    @NadavBenedek 1 year ago +1

    Shouldn’t the loss function at 31:42 have a minus sign?

  • @maksimkazanskii4550
    @maksimkazanskii4550 1 year ago

    In the main vectorized formula we have the W matrix. Would this matrix depend on the number of neighbors? If so, we could not write it this way: in its current form, the matrix multiplication assumes that changing the nodes would change the matrix.

    • @xintongbian
      @xintongbian 8 months ago

      In practice a GNN algorithm will sample a node's neighbourhood, so not all neighbours participate in the aggregation process. In my experience, a neighbourhood sample of size 10 or 15 is often used: if a node has more than 10 neighbours, only 10 are used; if it has fewer than 10, the sample is padded. You can refer to the PinSage paper for details on industrial practice.
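
      A rough sketch of that fixed-size sampling (uniform sampling with repeat-padding; the function name and the padding choice here are just illustrative, not necessarily what PinSage does):

```python
import random

def sample_neighbors(adj, node, k=10, seed=0):
    """Return a fixed-size neighbour sample for one node.

    adj: dict mapping node id -> list of neighbour ids
    k:   fixed sample size
    If the node has more than k neighbours, subsample k of them;
    if it has fewer, pad by repeating neighbours so that downstream
    tensors keep a fixed shape.
    """
    rng = random.Random(seed)
    neighbours = adj[node]
    if len(neighbours) >= k:
        return rng.sample(neighbours, k)
    return neighbours + [rng.choice(neighbours) for _ in range(k - len(neighbours))]

# toy usage: node 0 has 3 neighbours, so a sample of size 5 gets padded
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(sample_neighbors(adj, 0, k=5))  # a length-5 list drawn from {1, 2, 3}
```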

  • @tcveatch
    @tcveatch 1 year ago

    Wl and Bl are independent of nodes and edges. In a regular NN, edge weights scale input nodes before summation at the next-level node, and learning the edge-weight matrix is the goal of training. Here, not so: there is no per-edge weighting of inputs before summation, or rather the weight is identically 1/d for every edge entering a node, where d is the node's input degree, so the accumulation step is an average. Instead, the significance of a predecessor node lies in the product of Bl with its own value.
    Basically, the edge information is pushed back into the predecessor node's value, which is broadcast uniformly to all its successor nodes rather than variably weighted according to edge weights. Wl and Bl merely weight the dimensions of the node embedding, i.e. they linearly transform the embedding space at each layer, so that the kind of information being pushed around is transformed. For example, pixels here might become low-level features at the next layer, and then higher-level features as the evaluation depth l increases. If I understand correctly.
    So the GNN learning system is not learning computation-graph weightings (like Hebbian learning, where usage increases edge weighting) but instead learns information transformations within the given graph structure, such that the output is best predicted. This is a totally different concept and should not be called "neural networks", a term that came to machine learning through the Hebbian model. Graph-constrained information transformation models is what these are.

    • @shubhampatel6908
      @shubhampatel6908 8 months ago

      I think the prof clearly stated at 31:55 that W and B are improved. Yes, we decide the aggregator function, and the loss is written in terms of the embeddings rather than directly in terms of W and B, but improving the embeddings for better outputs means W and B are what actually get learned in the example he explains. I don't think it's only a transformation of information; the model learns in a similar way to standard neural nets. Please correct me if I'm wrong; I'm pretty new to GNNs and not an expert in deep learning either.
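
      For what it's worth, here is a minimal PyTorch sketch of the layer being discussed (mean aggregation over neighbours, with Wl and Bl as the trainable parameters); the class and variable names are mine, not from the course code:

```python
import torch
import torch.nn as nn

class MeanAggLayer(nn.Module):
    """One GNN layer: average the neighbours' embeddings, transform them
    with W_l, add B_l applied to the node's own embedding, then a ReLU.
    W_l and B_l are the only trainable parameters; there are no learned
    per-edge weights in this basic formulation."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)  # W_l
        self.B = nn.Linear(in_dim, out_dim, bias=False)  # B_l

    def forward(self, h, adj):
        # h:   (num_nodes, in_dim) embeddings at layer l
        # adj: (num_nodes, num_nodes) 0/1 adjacency matrix, no self-loops
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)  # avoid divide-by-zero
        neigh_mean = adj @ h / deg                       # neighbour average
        return torch.relu(self.W(neigh_mean) + self.B(h))

# toy usage: a 4-node path graph, 3-d features -> 2-d embeddings
adj = torch.tensor([[0, 1, 0, 0],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [0, 0, 1, 0]], dtype=torch.float)
h0 = torch.randn(4, 3)
layer = MeanAggLayer(3, 2)
print(layer(h0, adj).shape)  # torch.Size([4, 2])
```

      Backpropagating a loss on the output embeddings updates self.W and self.B, which is the sense in which "the embeddings are learned".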

  • @maksimkazanskii4550
    @maksimkazanskii4550 1 year ago

    Why do we not use a NN after we get the encodings for the nodes? Currently, I assume we use classification weights and some non-linearity. It would seem logical to have more hidden layers after putting so much effort into creating the embeddings.

    • @xintongbian
      @xintongbian 8 months ago

      The idea is that a GNN, just like a CNN, is a "feature extractor" for non-Euclidean data. So think of what he talks about here as the feature-extraction process; you can then use those graph embeddings in whatever downstream architecture you like. For example, I stack a DSSM-like structure after the GNN in my recommendation matching models.
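
      A minimal illustration of stacking a downstream head on the GNN output (the random tensor below just stands in for real GNN embeddings; all sizes are arbitrary):

```python
import torch
import torch.nn as nn

num_nodes, emb_dim, num_classes = 100, 64, 5
z = torch.randn(num_nodes, emb_dim)   # stand-in for GNN node embeddings

# any downstream architecture can consume the embeddings; a small MLP
# classifier is one option (a DSSM-style tower is another, as mentioned)
head = nn.Sequential(
    nn.Linear(emb_dim, 32),
    nn.ReLU(),
    nn.Linear(32, num_classes),
)

logits = head(z)                                  # (100, 5) class scores
labels = torch.randint(0, num_classes, (num_nodes,))
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()   # trains the head; if z came from a live GNN, the same
                  # backward pass would also update the GNN parameters
```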

  • @gcvictorgc
    @gcvictorgc 2 years ago +1

    What are the dimensions of the matrix Wl? (at 21:40)

    • @pinkpanther7637
      @pinkpanther7637 2 years ago +1

      The shape of W_l is dim(h^{(l+1)}) × dim(h^{(l)}).

  • @karanbania2785
    @karanbania2785 1 year ago

    Thanks for the video! I have this one question,
    If our loss functions still target the same similarity notion as in node2vec, how are we learning different node embeddings? Maybe there is some contribution from the features, but the result will still be very similar to the previous embeddings; so why not just use node2vec? My main concern is that using this similarity notion will lead to the same problem as with node2vec, i.e. structural information will be lost. (We do overcome the other two limitations.)

    • @alexwong1120
      @alexwong1120 1 year ago

      After you train a model, you can apply the trained model (with W, B) to unseen nodes for classification, prediction, etc., as mentioned. Node2Vec only returns an embedding matrix, which represents information from the original graph and cannot be used on new, unseen nodes.

    • @karanbania2785
      @karanbania2785 1 year ago

      @alexwong1120 thanks for this reply; however, I did write in my question that we do overcome the other two limitations, which includes this one. My main concern is that I think the embeddings cannot capture similarity between nodes that are separated yet similar.

    • @alexwong1120
      @alexwong1120 1 year ago +1

      @karanbania2785
      Because the model is learning the weights, biases, and other trainable parameters, not the embedding itself. A NN model targets the embedding by optimizing those parameters.
      E.g. y = wx + b, where y is the given target embedding and x is a given feature. If y = 10 and x = 6, the model takes x as input and learns different w and b to match the value of y.
      After training, e.g. w = 1 and b = 4, you get (1)*(6) + (4) -> 10, which matches the target y.
      Like you said, this example and the given target embedding y end up with exactly the same embedding result (probably overfit, haha), namely y = 10, but what the model returns is the two parameters w and b. With the trained w and b, you can classify new nodes and evaluate their similarity to nodes in the original graph just by inputting their features. With Node2Vec alone, you would have to rerun Node2Vec every time a new node joins the graph. (Not sure if this is what you mean by "separated yet similar".)
      Rather than "learning different node embeddings", you might also interpret this as a "different embedding method", one which can capture high-dimensional patterns among the features.
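
      A tiny sketch of that inductive use (W and B here are random stand-ins for trained parameters, and the aggregation mimics a one-layer mean-aggregating GNN; shapes are arbitrary):

```python
import torch

# Hypothetical trained parameters of a one-layer mean-aggregating GNN
# (in a real run these come from training, not torch.randn).
W = torch.randn(16, 8)  # transforms the averaged neighbour features
B = torch.randn(16, 8)  # transforms the node's own features

def embed_unseen_node(x_new, neighbor_feats):
    """Embed a node that was NOT in the training graph by reusing the
    trained W and B. Node2Vec offers no analogue of this: its embeddings
    exist only for the nodes it was trained on."""
    neigh_mean = neighbor_feats.mean(dim=0)          # average the neighbours
    return torch.relu(W @ neigh_mean + B @ x_new)    # one GNN layer

x_new = torch.randn(8)               # features of the unseen node
neighbor_feats = torch.randn(3, 8)   # features of its 3 neighbours
print(embed_unseen_node(x_new, neighbor_feats).shape)  # torch.Size([16])
```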

  • @docteurlambda
    @docteurlambda 6 months ago

    THX

  • @vinaykumardaivajna5260
    @vinaykumardaivajna5260 1 year ago

    Great Explanation...