"10k subscriber mark which I should hit and probably I don't know one to two years from now". Congrats boi, 28.7k subs done. Miles to go ahead.
🚀🚀
@@jordanhasnolife5163 30.7k 🚀, 2k in 2 weeks
Hey man, would you be interested in making a video about your learning journey: the technologies, languages, frameworks, and other system design topics? Just wondering what a massive chad sigma 10x programmer's journey looks like...
Ah well, 10k should be coming up soon... Perhaps then :)
@@jordanhasnolife5163 Are you asking me to buy 300 subscribers? [insert "bombastic side eye dog pic"]
@@recursion. I'm certainly not asking you not to! (*cute anime girl expression*)
@@jordanhasnolife5163 I don't know what you want (300 sub otw🤤)
@@jordanhasnolife5163 80 less subs 😩😩 (please make my wish come true tho)
Hello Sir, I’ve got to be honest: sort-then-shuffle is still shuffling my brain. I’m not seeing how sorting helps here. From what I understand, after sorting, we hash the key to decide its partition and then add the key-value pair. But wouldn’t the merge process be the same even if we skipped the sorting foreplay? Or am I missing some magical sorting powers here?
Or are we merging right after the sort, and then merging those sorted, merged values again on the partition?
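Think about the time complexity of merging lists that are already sorted versus sorting everything from scratch on the reducer. Each mapper sorts its own output, so in the shuffle phase we're still sending the data in sorted form to each reducer, and the reducer only has to merge those sorted runs instead of doing a full sort.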
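To make that concrete, here's a rough sketch in Python of what the reply is describing. It is not how any real framework implements it: the function names (`partition_for`, `map_side`, `reduce_side`) and the use of CRC32 as the partition hash are assumptions made up for illustration. Each mapper partitions by hashed key and sorts each partition, and the reducer merges the pre-sorted runs and groups equal keys in one pass.

```python
import heapq
import zlib
from itertools import groupby
from operator import itemgetter


def partition_for(key: str, num_reducers: int) -> int:
    # Hypothetical partitioner. CRC32 is used because Python's built-in
    # hash() for strings is salted per process and so isn't stable.
    return zlib.crc32(key.encode()) % num_reducers


def map_side(records, num_reducers):
    """Split one mapper's output into per-reducer runs, each sorted by key."""
    runs = [[] for _ in range(num_reducers)]
    for key, value in records:
        runs[partition_for(key, num_reducers)].append((key, value))
    for run in runs:
        run.sort(key=itemgetter(0))  # each mapper only sorts its own small run
    return runs


def reduce_side(sorted_runs):
    """Merge already-sorted runs and group values by key in a single pass."""
    # heapq.merge never re-sorts; it only compares the head of each run.
    merged = heapq.merge(*sorted_runs, key=itemgetter(0))
    return [
        (key, [value for _, value in group])
        for key, group in groupby(merged, key=itemgetter(0))
    ]
```

With n records spread over k sorted runs, the merge costs roughly O(n log k) comparisons, and because equal keys arrive next to each other the reducer can finish one key before it even looks at the next. Without the map-side sort, the reducer would have to sort or hash all n records itself before it could group anything.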
What does it mean that the merging can be done entirely on disk? My little mind just can’t seem to comprehend it!
You don't need to load the whole dataset into memory, just one entry at a time from each dataset that you're merging.
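A minimal sketch of that idea in Python, assuming each input is a text file whose lines are already sorted and every line ends with a newline. The function name `merge_sorted_files` is made up for this example.

```python
import heapq


def merge_sorted_files(input_paths, output_path):
    """Merge text files whose lines are already sorted into one sorted file."""
    files = [open(path) for path in input_paths]
    try:
        with open(output_path, "w") as out:
            # heapq.merge is lazy: it keeps one pending line per input file
            # and pulls the next line from a file only after emitting its
            # current one. Memory use depends on the number of files (plus
            # normal I/O buffers), not on the size of the data.
            for line in heapq.merge(*files):
                out.write(line)
    finally:
        for f in files:
            f.close()
```

The inputs and the output all live on disk the whole time, so this works even when the combined data is far bigger than RAM. That only holds because the inputs are sorted: the smallest remaining line overall is always at the front of one of the files.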
Just curious, where did you source your information for this particular video? DDIA doesn't do a super clear job explaining how the different joins work imo, and other online sources I've looked at are equally vague.
I worked pretty much exclusively on big data pipelines for a bit haha. But besides that, most of these lessons from DDIA you can think through yourself. When would it help me to perform a sort merge join? If my data isn't already sorted, what would be the penalty there?
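For anyone working through those two questions, here's a rough Python sketch of the merge step of a sort merge join. It assumes both inputs are lists of (key, value) pairs already sorted by key; `sort_merge_join` is just an illustrative name, not a library function.

```python
def sort_merge_join(left, right):
    """Inner-join two lists of (key, value) pairs, both sorted by key."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][0], right[j][0]
        if left_key < right_key:
            i += 1  # no match possible for this left row, move on
        elif left_key > right_key:
            j += 1  # no match possible for this right row, move on
        else:
            # Find the run of rows on the right that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == left_key:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left[i][0] == left_key:
                for _, right_value in right[j:j_end]:
                    out.append((left_key, left[i][1], right_value))
                i += 1
            j = j_end
    return out
```

If both sides are already sorted, this is a single pass over each input (plus the size of the output). If they aren't, the penalty is that you have to sort each side first, at O(n log n) per side, before the merge can start.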
Solid video my guy. What resources do you use to learn this stuff?
DDIA, random YouTube videos/websites, my own experience
@@jordanhasnolife5163 care to share any of those YT channels/sites?
so whens the onlyfans starting
The second I get fired (imminent)