I needed to learn about KD Trees and Ball Trees for a specific application, but after just two videos I am hooked! Now I am going to watch the whole course. Thank you so much, Kilian, for making it public :)
🐘? 😂
Decision Trees at 35:14
This guy inspires me. Thank you.
Thank you for uploading this video.
Really great content. And very much appreciated! Thanks :)
Helpful. Sololearn uses KD trees in the machine learning tutorial.
Great
1:20 I thought he would say, "if something does not make sense then contact me" 😂😂😂
Thanks a lot for the very unique explanation! :)
Could anyone explain to me why ball trees are better than KD trees in a higher-dimensional space (with a lower-dimensional manifold)? I find it hard to imagine/understand when the Prof. explains (12:55) that it is because of the axis-aligned splits that KD trees are bad even in this setting (a low-dimensional manifold in a high-dimensional ambient space).
Because in higher-dimensional space the boxes are more 'crushed up' together. In high dimensions, points become roughly equally far apart, so their bounding boxes become more indistinguishable. One of the earlier lectures covers this.
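In case a concrete demo helps: here is a minimal sketch (assuming NumPy and SciPy are available; not from the lecture) showing how pairwise distances concentrate as the dimension grows, which is exactly why axis-aligned boxes stop being useful for pruning.

```python
import numpy as np
from scipy.spatial.distance import pdist

# As the dimension d grows, the farthest and nearest pairwise distances
# approach each other, so bounding boxes overlap more and prune less.
rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    X = rng.random((500, d))   # 500 uniform points in the d-dimensional unit cube
    dists = pdist(X)           # all pairwise Euclidean distances
    print(f"d={d:4d}  max/min distance ratio = {dists.max() / dists.min():.2f}")
```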
Question about ball tree complexity. To construct the ball tree, we need to compute argmax distance(x, xi) over xi in S, so the algorithm still has to go through all the data points and compute their distance to the randomly chosen point. In this sense, compared to plain kNN, I don't see any advantage to using a ball tree, since the complexity is almost the same.
Or is it because we consider constructing the ball tree a pre-computation done before testing, and we don't add that part of the running time to the final running time?
Great, thanks!
Yes. It is precomputed.
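To make the "precomputed" point concrete, here is a rough sketch of the build step (a hypothetical helper, not the lecture's exact pseudocode). The argmax-distance scans you mention happen in here, once per split during the one-time build; at query time you only descend the finished tree.

```python
import numpy as np

def build_ball_tree(points, leaf_size=20):
    """Recursively split `points` into nested balls, stored as dicts."""
    center = points.mean(axis=0)
    radius = np.linalg.norm(points - center, axis=1).max()
    node = {"center": center, "radius": radius, "points": None, "children": None}
    if len(points) <= leaf_size:
        node["points"] = points
        return node
    # The O(n) scans: pick a random point, take the point x1 farthest from it,
    # then the point x2 farthest from x1.
    x0 = points[np.random.randint(len(points))]
    x1 = points[np.argmax(np.linalg.norm(points - x0, axis=1))]
    x2 = points[np.argmax(np.linalg.norm(points - x1, axis=1))]
    # Split by assigning every point to whichever of x1 / x2 is closer.
    to_x1 = np.linalg.norm(points - x1, axis=1) <= np.linalg.norm(points - x2, axis=1)
    left, right = points[to_x1], points[~to_x1]
    if len(left) == 0 or len(right) == 0:   # degenerate split (duplicates): stop here
        node["points"] = points
        return node
    node["children"] = [build_ball_tree(left, leaf_size),
                        build_ball_tree(right, leaf_size)]
    return node
```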
If axis alignment is the issue, would random axes work? Why did we create spheres instead of randomly oriented hyperplanes? Are there any particular advantages to spheres vs. hyperplanes? Any pointers would help. Thanks.
Hi,
I got a slightly different understanding of KD trees from the Python scikit-learn implementation. It says it uses, or switches to, a brute-force method in the final leaf to find the k nearest neighbors. The documentation does not talk about going back up the tree and checking for neighbors hiding in other partitions.
Not sure if I am able to state my confusion clearly.
Your ball tree example seems good, but scikit-learn is quiet about ball trees, although it supports them for the kNN algorithm.
What you describe sounds like approximate nearest neighbor search. Here, one variant is to also build a tree (ball tree, KD tree, or any other variant); however, during test time you descend into the leaf into which the test point falls and return the nearest neighbor from this leaf only. There is no guarantee that you will indeed return the exact nearest neighbor in a global sense, but often this is good enough and can be very fast in practice.
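For illustration, here is a rough sketch of that greedy descent, reusing the node dictionary produced by the construction sketch above (again hypothetical code, not what scikit-learn actually does internally).

```python
import numpy as np

def approx_nearest(node, q):
    """Greedy descent: at each level walk into the child whose ball center is
    closer to the query q, then scan only the leaf you land in."""
    while node["children"] is not None:
        node = min(node["children"],
                   key=lambda child: np.linalg.norm(q - child["center"]))
    leaf = node["points"]
    return leaf[np.argmin(np.linalg.norm(leaf - q, axis=1))]
```

This is fast but only approximate: the true nearest neighbor may sit in a ball you never entered.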
What if we incorporate the label as an additional dimension and begin splitting on it first? Won't that always ensure that each leaf has only one label? (Because, in a way, we are trying to explain the variability in labels through variability in the dimensions/features, so the variability in labels must be significant.)
@Saket Dwivedi That wouldn't help, because during testing your feature vector will not have that label dimension you just added; the label is exactly what we want to predict.
A huge thanks to you, Prof. Kilian!!! Big fan of your teaching. You really kill the topic, and it seems nothing more is left to know about it. 😀 I have a little question: why are ball trees not that famous, given that they work so well and are invariant to the dimensionality of the feature space?
They have fallen a little out of fashion because they are not that well suited to GPUs. On GPUs it is often more effective to perform approximate nearest neighbor search with brute-force parallelism.
Thanks for sharing these lectures. :)
Question: why construct spheres instead of hyperplanes like a KD tree, just not axis-aligned ones?
A sphere is typically a tighter bound on the distance. In these data structures you want to bound the distance between the test point and a potential training point, and see if it could possibly be your nearest neighbor. In KD trees you say the distance from the test point to the training point is _at least_ the distance to the hyper-plane. In ball-trees you say, it is at least the distance to the sphere. Typically the latter is larger, i.e. a better approximation and allows you to prune away more points. Hope this helps.
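Put differently, here is a tiny sketch (hypothetical function names, assuming Euclidean distance) of the two lower bounds being compared; whichever bound is larger lets you discard a node without ever visiting the points inside it.

```python
import numpy as np

def kd_lower_bound(q, split_dim, split_value):
    """KD tree: any point on the other side of the splitting hyperplane is at
    least this far from q (the distance to the axis-aligned plane)."""
    return abs(q[split_dim] - split_value)

def ball_lower_bound(q, center, radius):
    """Ball tree: any point inside the ball is at least ||q - center|| - radius
    away from q (clipped at 0 if q already lies inside the ball)."""
    return max(0.0, np.linalg.norm(q - center) - radius)

# Pruning rule (same for both): if the lower bound already exceeds the distance
# to the current best candidate, the whole subtree can be skipped.
```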
6:20 It's like the bottom-up method, huh?
Thanks !!!
That answer was really psych. Damnnnn!!!
There goes the professor recruiting Arthur for a PhD :P
Hi. Are ball trees and KD trees the same?
No, KD trees only split along features and essentially place the data into boxes. Ball trees fit the data into little spheres. Another way of thinking about it: imagine you have a data set and you rotate it. KD trees will now create a very different data structure, because the axes have changed. Ball trees will be exactly the same.
In general, KD trees tend to be better in very low-dimensional spaces (2D, 3D) and are often used in computer graphics to find nearest neighbors for meshes. Ball trees tend to be better in medium-sized dimensionalities (e.g. 10D-50D).
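If it helps, here is a small usage sketch with scikit-learn (assuming it is installed): both structures answer exact k-NN queries and return the same neighbors; they differ only in how they partition the data internally (boxes vs. balls), and therefore in speed.

```python
import numpy as np
from sklearn.neighbors import BallTree, KDTree

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 20))      # 1000 training points in 20 dimensions
queries = rng.standard_normal((5, 20))

kd = KDTree(X, leaf_size=40)
bt = BallTree(X, leaf_size=40)

_, kd_idx = kd.query(queries, k=3)       # indices of the 3 nearest neighbors
_, bt_idx = bt.query(queries, k=3)
print(np.array_equal(kd_idx, bt_idx))    # True: same answers, different internals
```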
Decision Trees start at: th-cam.com/video/E1_WCdUAtyE/w-d-xo.html