Stop messing up your stream processing joins! | Systems Design Interview 0 to 1 with Ex-Google SWE

Comments • 24

  • @art4eigen93
    @art4eigen93 9 months ago +2

    Congratulations on 10K, Jordan!

  • @vankram1552
    @vankram1552 9 months ago +4

    Congrats on your data-intensive applications YouTube channel

  • @zachlandes5718
    @zachlandes5718 7 months ago +1

    Can you review some cases where we'd do table-to-table joins with streams? Is it mainly to offload work from the db (separation of consumers and the db)? Or to improve performance?

    • @jordanhasnolife5163
      @jordanhasnolife5163  7 months ago +1

      If you want realtime joins that actually stay up to date, it would be useful.
      Think of a books table and an authors table. They're both big, so you don't want to query them both over and over.
      Maybe a new book gets added, and it turns out many authors contributed to it, some of whom were already in the table. Let's do a join.
      Maybe a new author gets added, and it turns out they contributed to many books in the table. Now we can fetch only the books we need without having to redo a full join.
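
The incremental join described above can be sketched with plain Python dicts standing in for Flink keyed state (all names here are illustrative, not a real Flink API): each side buffers its rows per join key, and a new event on either side joins only against the other side's buffered rows for that key, rather than redoing a full join.

```python
from collections import defaultdict

class StreamJoiner:
    """Incremental book/author join: per-key state, no full re-join."""

    def __init__(self):
        self.books = {}                   # book_id -> title
        self.authors = defaultdict(list)  # book_id -> list of author names

    def on_book(self, book_id, title):
        # New book event: join against authors already seen for this book.
        self.books[book_id] = title
        return [(title, a) for a in self.authors[book_id]]

    def on_author(self, book_id, name):
        # New author event: join against the book if we've already seen it.
        self.authors[book_id].append(name)
        return [(self.books[book_id], name)] if book_id in self.books else []

j = StreamJoiner()
j.on_author(1, "Kleppmann")        # book not seen yet -> no output, state buffered
print(j.on_book(1, "DDIA"))        # book arrives -> joins the buffered author
print(j.on_author(1, "Coauthor"))  # later author -> joins only this one book
```

In a real Flink job the two dicts would be keyed state partitioned by `book_id`, so each parallel task holds only its shard of both tables.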

  • @LeoLeo-nx5gi
    @LeoLeo-nx5gi 9 months ago +1

    Hi, thanks for covering the issues in depth at the end - I was about to post a comment asking how the in-memory state across all the consumers would work, among other things.
    Just as a side note: if this problem had to be solved without Flink or any such tool, is there any other approach?

    • @jordanhasnolife5163
      @jordanhasnolife5163  9 months ago +1

      I think you'd probably end up reinventing the wheel - taking distributed snapshots is really expensive, so Flink found a great way to do it without hugely impacting performance.

    • @LeoLeo-nx5gi
      @LeoLeo-nx5gi 9 months ago

      @jordanhasnolife5163 makes sense

  • @shibhamalik1274
    @shibhamalik1274 months ago +1

    Hi @jordan, awesome video! Is there a deep-dive video on CDC? Is it a queue push after the db save? And if so, isn't that similar to a 2-phase commit?

    • @jordanhasnolife5163
      @jordanhasnolife5163  months ago

      Similar, but keep in mind that the push to the queue is *from* the db! And we don't need that push to happen for the write to be committed to the database. If the db goes down, or can't communicate with the queue, it's ok if we place those writes there later. That's the main difference.
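
The committed-then-shipped behavior described above can be sketched with toy in-memory classes (all names illustrative, not a real CDC pipeline): the db write commits without ever touching the broker, and a separate relay ships the committed log to the queue later, safely retrying after outages - unlike 2-phase commit, where the write could not commit while the broker was down.

```python
class Broker:
    def __init__(self):
        self.up = True
        self.messages = []

    def publish(self, msg):
        if not self.up:
            raise ConnectionError("broker down")
        self.messages.append(msg)

class Database:
    def __init__(self):
        self.log = []      # committed writes (the CDC source)
        self.shipped = 0   # how far the relay has gotten through the log

    def write(self, record):
        self.log.append(record)  # commit does NOT involve the broker (unlike 2PC)

    def relay(self, broker):
        # Asynchronously push committed-but-unshipped writes; safe to retry.
        while self.shipped < len(self.log):
            try:
                broker.publish(self.log[self.shipped])
            except ConnectionError:
                return  # try again later; the write stays committed regardless
            self.shipped += 1

db, broker = Database(), Broker()
broker.up = False
db.write({"id": 1})     # commit succeeds even though the broker is down
db.relay(broker)        # nothing ships yet
broker.up = True
db.relay(broker)        # now the CDC event lands in the queue
print(broker.messages)
```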

  • @indraneelghosh6607
    @indraneelghosh6607 5 months ago +1

    Would it make sense to have querying the db as an option for data that is not in memory and has no CDC event in the queue for that ID? If you have several TBs of data in the db, maintaining such a large number of consumers may be rather costly, right (as RAM is expensive)? Is there a more cost-effective solution?

    • @jordanhasnolife5163
      @jordanhasnolife5163  5 months ago

      You could theoretically store the Flink state on disk, I believe, but yeah, if latency isn't a main concern you could always just query a db.

  • @Rahul-pr1zr
    @Rahul-pr1zr 2 months ago +1

    So if the in-memory tables are huge, you mentioned we can partition the incoming info across multiple queues - does this mean we need to maintain multiple consumers, each consuming from a specific queue?
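
A minimal sketch of that partition-per-consumer layout (hypothetical names; a byte-sum stands in for a real hash function): each key is routed deterministically to one of N queues, so exactly one consumer owns each key's slice of the in-memory table, and the same key always reaches the same consumer.

```python
NUM_PARTITIONS = 4

def partition_for(key: str) -> int:
    # Deterministic: the same key always lands in the same partition,
    # so its owning consumer has all the state needed to join it locally.
    return sum(key.encode()) % NUM_PARTITIONS  # stand-in for a real hash

# One consumer per partition, each holding only its shard of the table.
consumers = [dict() for _ in range(NUM_PARTITIONS)]

def route(key, value):
    consumers[partition_for(key)][key] = value

for k, v in [("book:1", "DDIA"), ("book:2", "SICP"), ("book:1", "DDIA 2e")]:
    route(k, v)

# Every distinct key lives in exactly one consumer's memory.
print(sum(len(c) for c in consumers))
```

This is the same idea as Kafka's key-based partitioning: N partitions, one consumer instance per partition, and per-key state sharded across them so no single consumer needs the whole table in RAM.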

  • @sushantsrivastav
    @sushantsrivastav 9 months ago +3

    I noticed that a majority of your designs lean heavily towards the Kafka + Flink combination. Is this a personal choice, or do you run your designs by senior engineers? Don't get me wrong, these designs are not textbook-y; they are real and lean heavily towards what is "in" now (as against Alex Xu's designs, which seem, for lack of a better word, a bit dated).
    I have 17 years of experience in the industry (I am a fossil), but I get to learn many things from your discussions. This is incredibly rare from someone with 2 years of experience. Thanks for everything that you do!

    • @jordanhasnolife5163
      @jordanhasnolife5163  9 months ago +4

      Mainly a personal choice, but I experience the pitfalls every day at work of using non-replayable message brokers, so I'm definitely pro log-based MQs and stateful consumers whenever I notice the opportunity to use them.
      That being said, I understand that in practice this may have a high cost to implement due to storage and/or latency, and many people might compromise and opt for fewer partitions/simpler solutions. My designs are generally pretty idealized and not at all optimized for cost, which is why you don't see stuff like this too much IRL.

    • @sushantsrivastav
      @sushantsrivastav 9 months ago

      @jordanhasnolife5163 On the contrary, I honestly believe these *are* real-world, and not textbook-y and cookie-cutter like "Grokking". If someone were to build these systems in 2023, they would choose this tech, as against, say, 2017-18, when "system design" questions became mainstream.

  • @pushpendrasingh1819
    @pushpendrasingh1819 9 months ago +1

    do you make videos after waking up and smoking one joint?

    • @jordanhasnolife5163
      @jordanhasnolife5163  9 months ago +2

      No I do them right between when I smoke crack and go to bed

  • @tamarapensiero8048
    @tamarapensiero8048 9 months ago +2

    congrats on 10k hottie

  • @yrfvnihfcvhjikfjn
    @yrfvnihfcvhjikfjn 9 months ago +1

    Where do I buy the foot pics?

    • @jordanhasnolife5163
      @jordanhasnolife5163  9 months ago

      You don't actually buy them, I just give them out in exchange for job referrals

  • @salmanrizwan9730
    @salmanrizwan9730 9 months ago +2

    video from jordan finaaaaaaaaaaaaaaaaaaalllllllllyyyyyyyyyy😍😍😍😍😍