excellent video, thanks .. I have a question if the sorting takes O(n log n), then how many I/Os are needed for the sort-merge join? how does buffering help in improving performance?
Thanks! O(n log n) is due tothe assumption that (main-memory) sorting has that complexity, the actual merge is only O(n). So for the relational sort-merge join shown in this video the runtime complexity mostly depends on the complexity of the sorting method used. However, the overall complexity changes if you use a different sorting algorithm which has a runtime complexity different from O(n log n). In addition, the complexity of sort-merge join may change as well if this algorithm is generalized to use more complex "sweep lines" (maybe I do a video on this one day). O-Notation has additional problems for performance measurements in data management which I explain in this video: watch?v=9M2ErfVimJo
Hi Jens, Thanks you very much for your lectures. I have bought the electronic version of your 'Patterns in Data Management' book. Is it possible to access Labs, mentioned both in the Book and youtube lectures? For me, this is the only thing that I am lacking in your wonderful course. Thanks.
I think there is a small bug in the pseudo code at 10:00 . If PR.x == PS.x, we output. Then, the PX.x PS.x AND PS.x == (PS + 1).x ? Or in other words: in PS we would see the same value in the next row, but in PR the next row has a larger value.Then we should have incremented PS, but not PR. My proprosed fix can be found at pastebin.com/BnHmzt0P . The idea is to have a 1-lookahead into either of the columns.
+Immanuel Haffner Yes, with a lookahead you could fix this problem as well. I discuss this problem of this sort-merge implementation in 10:25. For a normal equi-join along foreign key constraints, this issue simply shouldn't happen as long as you alyways move the foreign key column (the one having duplicates) first.
I mainly looked at the logical part before 6:40. Very clear and useful. Thanks for sharing.
thanks!
excellent video, thanks .. I have a question if the sorting takes O(n log n), then how many I/Os are needed for the sort-merge join? how does buffering help in improving performance?
this is a good question. did you figure out the answer?
Excellent video! Would be great if you talked more about the time complexity of the merge sort join O(nLogn), particularly why its O(nLogn)
Thanks! O(n log n) is due tothe assumption that (main-memory) sorting has that complexity, the actual merge is only O(n). So for the relational sort-merge join shown in this video the runtime complexity mostly depends on the complexity of the sorting method used. However, the overall complexity changes if you use a different sorting algorithm which has a runtime complexity different from O(n log n). In addition, the complexity of sort-merge join may change as well if this algorithm is generalized to use more complex "sweep lines" (maybe I do a video on this one day). O-Notation has additional problems for performance measurements in data management which I explain in this video: watch?v=9M2ErfVimJo
+Jens Dittrich Is the complexity with CoGroups (Having no keys) also O(n*log(n)) or will the grouping itself increase the complexity?
Excellent Video.. very useful
Hi Jens,
Thanks you very much for your lectures. I have bought the electronic version of your 'Patterns in Data Management' book. Is it possible to access Labs, mentioned both in the Book and youtube lectures?
For me, this is the only thing that I am lacking in your wonderful course.
Thanks.
I think there is a small bug in the pseudo code at 10:00 . If PR.x == PS.x, we output. Then, the PX.x PS.x AND PS.x == (PS + 1).x ? Or in other words: in PS we would see the same value in the next row, but in PR the next row has a larger value.Then we should have incremented PS, but not PR.
My proprosed fix can be found at pastebin.com/BnHmzt0P . The idea is to have a 1-lookahead into either of the columns.
+Immanuel Haffner Yes, with a lookahead you could fix this problem as well. I discuss this problem of this sort-merge implementation in 10:25. For a normal equi-join along foreign key constraints, this issue simply shouldn't happen as long as you alyways move the foreign key column (the one having duplicates) first.
+Jens Dittrich that makes sense. thank you.
What if cityId is not an number value? How to determine (PR.x
a bit late but for the 'key' column to be sorted, it must be a comparable value. The details of the comparison is dependent on the data type.
Best explanation ever. Thank you
Fehlt da nicht ein PR++ unter dem output?
Sprich müsste es nicht so sein:
if PR.x == PS.x:
output ((PR, PS));
PR++;
nee, da steht kein "else": das if in der nächsten Zeile "if PR.x
@@jensdit Ah okay, danke :)
Thank you🌹😊
Great video helped me a lot, the accent and the hand writing was a bit rough on me since English is not my native language .
awesome lessons.
THANK YOU SO MUCH. It was truly helpful.
Very good explanation!!! Thank you!
PERFECT ANIMATION!!!
Obrigado, este video foi muito instrutivo para mim, vai me ajudar bastante em meu projeto interdisciplinar.
Quality content
excellent video..
+Madhu Kashyap thanks!
eyvallah hacı