nit: Slide at 0:58 should read "Run a standard node embedding ..."
To represent the graph, why not take a singular value decomposition of your node-embedding lookup matrix and use the singular vector associated with the largest singular value?
What if the explained variance of the largest singular value is low, say 10-20%? Then it might not be a very good representation.
@@AlexAlex-ij6eo Good point. It might be decent if we concatenated the first few singular vectors into one long vector, or even all of them: [SV1, SV2, ..., SVN] (SV = singular vector).
I guess the main reason is computational cost. Sampling a walk may be expensive per iteration, but the overall asymptotic complexity is linear. Computing the largest eigenvector with the power method is at least quadratic, since each iteration involves a matrix-vector product. And if you want several eigenvectors, the cost is cubic or nearly cubic, depending on the approximations you make. For deep learning you always want linear methods, because any quadratic method applied to a problem with millions of samples is simply infeasible, even with parallelism.
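To make the quadratic cost concrete, here is a minimal power-iteration sketch (toy matrix, NumPy only): each iteration is one dense matrix-vector product, which costs O(n^2) for an n x n matrix, versus the O(1)-per-step cost of extending a random walk.

```python
import numpy as np

def power_iteration(A, num_iters=100, tol=1e-10):
    """Estimate the dominant eigenvector of a square matrix A.

    Each iteration does one dense matrix-vector product, costing
    O(n^2) for an n x n matrix -- the quadratic cost discussed above.
    """
    v = np.random.default_rng(0).standard_normal(A.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        w = A @ v                      # O(n^2) per iteration
        norm = np.linalg.norm(w)
        if norm == 0:
            break
        w /= norm
        if np.linalg.norm(w - v) < tol:
            return w
        v = w
    return v

# Toy symmetric matrix with dominant eigenvalue 3 and
# eigenvector proportional to [1, 1].
A = np.array([[2.0, 1.0], [1.0, 2.0]])
v = power_iteration(A)
```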
@@joseperezcano4954 Yeah, I see that. But I think the cost of embedding one graph with SVD is not prohibitive, though I can see the cost growing if you are embedding thousands of molecules.
I tried to find the order of complexity of the SVD algorithm, but it was too difficult.
@@tag_of_frank To my knowledge, the time complexity of SVD is roughly cubic: O(min(mn^2, nm^2)) for an m x n matrix.
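A minimal sketch of the idea proposed in this thread, using a random matrix as a stand-in for a real node-embedding lookup table (in practice it would come from DeepWalk/Node2Vec):

```python
import numpy as np

# Hypothetical node-embedding lookup matrix Z: one row per node.
# Random here purely for illustration.
rng = np.random.default_rng(42)
Z = rng.standard_normal((50, 16))   # 50 nodes, 16-dim embeddings

# Full SVD of an m x n matrix costs O(min(m*n^2, n*m^2)).
U, S, Vt = np.linalg.svd(Z, full_matrices=False)

# Graph-level vector: concatenate the first k right singular
# vectors, as suggested above.
k = 3
graph_vec = Vt[:k].ravel()                     # shape (k * 16,)
explained = (S[:k] ** 2).sum() / (S ** 2).sum()  # fraction of variance kept
```

The `explained` ratio directly addresses the earlier objection: if the top singular values carry only 10-20% of the variance, the truncated representation discards most of the structure.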
Approach 3 has been excluded from the slides as of March 2023. Should we ignore it?
Are ideas based on random walks used only for small graphs, or am I wrong?
Anonymous walk embeddings look like a sophisticated but potentially fruitful topic.
I didn't understand the benefit of the delta-size window idea. It seems that random walks don't depend on each other, and their order is random as well, i.e. I don't see much graph information in the fact that specific random-walk instances occur consecutively; it looks more like a random artifact. So I'm particularly curious to see a demo of what embeddings are learned on an example graph.
Anonymous walks are useful, and do capture graph information, because they give an unbiased view of which local structures are present in the graph while ignoring the identity of the source node. It would be a truly random thing if all walk paths had equal probability, but in practice they don't, and those differing probabilities produce distinguishable points in embedding space.
Now for your \delta question: look at the Continuous Bag of Words (CBOW) model to understand the significance of \delta. It relates to overfitting and to space and time complexity.
Now, the advantages of this anonymous-random-walk (ARW) approach:
1. By forgetting node identity, we give the model the ability to find different local structures arising from possibly the same node. Note how Node2Vec has fixed, biased transition probabilities for a given node, so even if some interesting alternative structure arises from that node, it will be dominated by the bias introduced by p and q.
2. Given the length of the anonymous random walks, we can tell how many of them we need to sample to encode the graph's anonymous-walk probabilities to a given accuracy. This is a big advantage!
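The sample-size guarantee in point 2 can be sketched as follows. Assuming the bound from the Anonymous Walk Embeddings paper (Ivanov & Burnaev, 2018): if eta is the number of distinct anonymous walks of the chosen length, then m samples suffice for the empirical distribution to be within eps of the true one with probability 1 - delta.

```python
import math

def num_samples(eta, eps=0.1, delta=0.01):
    """Number of anonymous random walks to sample so the empirical
    walk distribution is eps-close to the true one with probability
    1 - delta (bound assumed from Ivanov & Burnaev, 2018):
        m = ceil((2 / eps^2) * (log(2^eta - 2) - log(delta)))
    """
    return math.ceil((2.0 / eps**2) * (math.log(2**eta - 2) - math.log(delta)))

# There are 877 distinct anonymous walks of length 7, so the
# required sample count depends only on eta, not on graph size:
m = num_samples(877)
```

Note the bound is independent of the number of nodes, which is why the method scales to large graphs.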