100% this channel helped me get a FAANG offer last week. Cheers dude! ❤
Congrats man!! You're a legend!
Congrats on 25k!!! You deserve it man
Thanks Phi!!
Watched all your videos over the holidays and locked in my first faang offer. Just wanna say thanks for the goated content, and keep up the good work! I genuinely learned a ton from this channel. Congrats on 25k!📈📈
You're HIM!! Thanks man, I appreciate it!
Brief Outline
00:00:51 Reddit Comments
00:01:58 Problem Requirements
00:03:37 Capacity Estimates
00:06:07 Main Consideration
00:07:56 "Source of Truth" Table
00:08:48 Base Table Options - Native Graph Database
00:12:22 Base Table Options - Use a normal database with an index
00:15:28 Relational vs Graph Based Approach for "New" Ordering
00:17:31 Top Comments
00:19:19 Graph DB for Top Comments
00:21:56 In Memory Graph for Top Comments
00:24:55 Controversial Comments
00:27:04 Hot Comments
00:29:41 Synchronous Updates vs Derived Data
00:31:07 Causal Consistency
00:33:58 Change Data Capture in Practice
00:35:54 Change Data Capture
00:38:42 Upvotes DB
00:41:16 Obligatory Caching Slide
00:53:12 Final Diagram - Reddit Comments
Thanks, Jordan~
Again, a super cool and great video. Just thinking: we could put all the comments in the data lake (internally S3) and only keep the IDs as metadata in the active database. Comments are mostly static data (considering edits are negligible). Not sure if this has any side effects, other than possibly introducing two-phase commit or some other kind of distributed atomicity while writing.
You're saying keep the comment data in a data lake and just manipulate the metadata? Potentially a distributed join when reading, which I think slows us down quite a bit. We'd have to be smart about how we partition the actual comment data in the data lake.
@@jordanhasnolife5163 Yes, thanks for your insights on this thought.
This was a strong description of the task. Thank you!
Hey Jordan, great video. I believe that you did not explain one aspect in the final diagram though. What is this arrow from Spark Streaming to the New Comments/Source of truth DB?
Sorry, brushed over that one! That's just to change the upvote/downvote count per comment in the source of truth table.
Hi Jordan, thanks for another great video! This one is really interesting and beefy.
I have a question about the "hot comments" section. It is clear to me that when there is a new upvote, the CDC triggers the +1 to corresponding nodes in the db. However, how do we know when it is time to expire some upvotes?
Continuously monitoring the end of each linked list is expensive and wasteful, while getting rid of stale upvotes only when new upvotes come in causes inaccurate, stale "hot" comment rankings. Neither of them looks good to me.
I really don't think it's expensive to monitor each linked list, it's a single background thread that can be run on any configurable interval of your choosing.
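A minimal sketch of what that background check might look like, assuming upvotes arrive in time order so each comment's list stays sorted and only the head ever needs inspecting (`expire_stale` and `HOT_WINDOW_SECS` are illustrative names, not from the video):

```python
from collections import deque

HOT_WINDOW_SECS = 3600  # assumed window after which an upvote stops counting as "hot"

def expire_stale(upvote_times: deque, hot_score: int, now: float) -> int:
    """Pop expired upvote timestamps off the head of the deque.

    The deque is append-only at the tail, so it stays sorted by time and
    we only ever inspect the head. Returns the adjusted hot score.
    """
    while upvote_times and now - upvote_times[0] > HOT_WINDOW_SECS:
        upvote_times.popleft()
        hot_score -= 1
    return hot_score
```

A single background thread would call this per comment on whatever interval you choose; each call is proportional to the number of entries that actually expired, not to the total number of upvotes.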
Could you share some example docs/links on how to implement the logic around the 37-minute mark? That is: consume from a Flink operator keyed by post + comment_id, AND consume from the CDC of the upvotes DB, AND have Spark Streaming process a comment and its corresponding score/upvotes only when the comment's presence is true. Not having used these in practice makes it a bit hard to come up with.
OR do a smallish tutorial on implementing this in a future video so that it feels more real, so to speak, lol
Damn tho, your final solution is impressive. Definitely what I was imagining as the most optimized way to handle upvotes by user and upvotes by post: who they came from (which cares about userId) and their aggregated counts (which care about commentId and postId).
Thanks!
Just pseudocoding here but:
state = {}  # commentId -> {"score": int, "commentPresent": bool}

def processVote(commentId, vote):
    entry = state.setdefault(commentId, {"score": 0, "commentPresent": False})
    if vote == "upvote":
        entry["score"] += 1
    else:
        entry["score"] -= 1
    if entry["commentPresent"]:
        updateDatabases(commentId, entry["score"])  # here we rearrange our graph edges

def processComment(commentId):
    entry = state.setdefault(commentId, {"score": 0, "commentPresent": False})
    entry["commentPresent"] = True
    if entry["score"] != 0:
        updateDatabases(commentId, entry["score"])
Could the CDC into Spark be used to calculate comment scores for each ranking system with each incoming vote? e.g. each comment would include: {hot_score: x, controversial_score: y, ...}. Since there are so few comments per post, instead of a derived view, the client could grab all the comments on a post (maybe with exceptions for extremely popular posts), each with all the scores, and handle the sorting client side.
Sure, this is a fair approach, but it depends on how many comments there are. Client-side sorting implies a lot of repeated work on the client, and for posts where we want to paginate the comments it becomes infeasible.
Hello Jordan!
Thanks for another good video with a detailed explanation. I have a question after watching it:
Why don't we use Cassandra here instead of MySQL if we expect a huge load on writes?
I wouldn't really say that I expect a huge write load; it's more that I don't want to two-phase commit a bunch of writes to four places.
More importantly, reddit comments have parents which implies we need to maintain causal relationships which may be challenging in a leaderless setup.
Not related to this specific video, but Saga and TC/C are not covered in your system design concepts series. Do you think we need to learn those for system design interviews? I do see several designs mention these two concepts when discussing distributed transactions, and I'm not sure whether I can just skip them if they're not super important. Curious to hear your thoughts, thank you!
Probably not, but to be fair it's a pretty small topic; you could probably learn it in like 10 minutes.
Next video I'm going to play the Jordan has no life drinking game where you have to take a shot every time he mentions flink
I've gotta get them to sponsor me
flink fetish
@jordan Loving your videos. In a lot of your videos you mention having CDC on a database to get up-to-date info into Flink while the same Flink job is consuming from Kafka. But say a user recently had no activity in the users DB, and that same user tweeted. You said Flink will already have the user's friends available from CDC and will fan out his tweet to all his friends. In that case, would Flink pull details from the user DB directly?
Flink ideally should not have any TTL on that user data, so it should already be there. I suppose what you're describing is more of a caching mechanism, where after a while we drop the data, and then go fetch it from the db, which is also a viable option.
@@jordanhasnolife5163 I wonder if @factsfuture9778 is asking the same thing I was wondering yesterday: the need to possibly "pre-populate" Flink's key store from the db you'd be CDC-ing, since it's likely there will be some users Flink hasn't heard any db events about yet, and thus it doesn't know their state "pre-initialization".
I asked ChatGPT about this, and he said to define an initialization step that "using Flink's JDBC connectors, execute[s] a query against the MySQL database and fetch the current [applicable state] of all users"
is this common? / does flink have an idiom for this?
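Framework specifics aside, the pattern being asked about is essentially lazy initialization on a cache miss: if the keyed state has no entry for a user, fall back to a one-time read from the source-of-truth DB and seed the state from it. A minimal sketch of that idea, where `KeyedState` and `fetch_user` are illustrative names standing in for a stream processor's per-key state and a JDBC lookup, not real Flink APIs:

```python
class KeyedState:
    """Toy stand-in for a stream processor's per-key state,
    seeded lazily from a source-of-truth store on first miss."""

    def __init__(self, db):
        self.db = db     # any object with fetch_user(user_id) -> dict
        self.state = {}  # user_id -> user record

    def get_user(self, user_id):
        if user_id not in self.state:
            # Cold key: one-time read from the DB we're CDC-ing.
            self.state[user_id] = self.db.fetch_user(user_id)
        return self.state[user_id]

    def on_cdc_event(self, user_id, record):
        # Normal path: CDC keeps the state fresh.
        self.state[user_id] = record
```

For the "is there an idiom" question, Flink's State Processor API is worth a look, since it is intended for bootstrapping savepoint state from external data rather than querying the DB at runtime.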
Was wondering about the cost of keeping this in memory graph index for all the posts over the years and whether it's worth it to build a hot/cold tier sort of index.
Half a billion posts per year * 100 comments per post * 10 years * 8 bytes overhead per comment = 4 TB of RAM. 4 GB of RAM in AWS with reserved instances costs about $200 per year, so for 4 TB you pay about $200K.
100 comments per post on average feels like a lot to me. You could also reduce that 8-byte overhead, maybe by a factor of 4, by making an assumption about the max # of comments per post.
On a side note, persisting this index becomes potentially attractive to avoid having to re-stream all of this data in the event of index node failures. Maybe restreaming is not that bad from read replicas of comments and upvote DB.
One of the reasons we use flink is so that we don't have to restream all data and can restore from a snapshot. I see your point and this is an interesting idea to only keep things in flink for x amount of time and otherwise recompute the data on a batch job or something!
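The back-of-envelope estimate in the question above, spelled out in code (the $200-per-4GB-year reserved-instance price is the commenter's assumption, not a quoted AWS figure):

```python
posts_per_year = 500_000_000
comments_per_post = 100
years = 10
bytes_per_comment = 8  # assumed index overhead per comment

total_bytes = posts_per_year * comments_per_post * years * bytes_per_comment
tb = total_bytes / 10**12  # decimal TB, matching the estimate

cost_per_4gb_year = 200  # commenter's assumed reserved-instance price
cost_per_year = (total_bytes / (4 * 10**9)) * cost_per_4gb_year
```

So 4 TB of RAM and roughly $200K per year, which is why a hot/cold tier (or expiring old posts from the in-memory index) is an interesting lever.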
For ordering by "Hot comments", if a comment doesn't get any activity in x mins then some periodic job needs to update its score. It needs to do that for each and every comment every x mins. How would that scale?
Our Flink node that receives upvotes and downvotes, partitioned by (postId, commentId), maintains a linked list locally and checks the head of the list every once in a while to see if the node is expired.
@@jordanhasnolife5163 I don't understand how Flink has the memory to store all this data. It's 8 MB of data per post; if there are 100Ks of posts, this can grow pretty quickly. How do you make sure there is enough memory in Flink to store all comment trees across all posts?
Question: why not a separate topic and a stream-stream join, rather than having two different producers publishing to the same Kafka topic?
I'm not sure which part of the design you're talking about; can you elaborate?
@@jordanhasnolife5163 I think in the end you do show different kafka topics, but th-cam.com/video/BO2gRisnBcA/w-d-xo.html
Loved it.
Around the 22-minute mark, when you suddenly receive a large number of upvotes on a comment and it needs to be reordered ahead of the formerly most-upvoted comment, what sort of complexity are you looking at? Oh, ok, I missed the "maintaining a sorted list of edges" statement you make at 21:41.
Yep, ideally we have to re-sort those edges. It shouldn't be n log n time, because the list is already sorted; just fixing up the existing order should be O(n), like a single iteration of insertion sort.
@@jordanhasnolife5163 It could be simply O(log n) if it is a TreeSet (as opposed to a list); that way you delete the old count and insert the new count, both in log time.
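The sorted-list version of this update can be sketched as a single remove-and-reinsert pass, the insertion-sort-style iteration described above (`reposition` and the tuple layout are illustrative choices, not from the video):

```python
def reposition(edges, comment_id, new_score):
    """edges: list of (score, comment_id), kept sorted descending by score.

    Remove the comment's old entry, then re-insert it at its new rank.
    O(n) overall: one scan to delete, one scan to find the insert point.
    """
    edges[:] = [(s, c) for s, c in edges if c != comment_id]
    i = 0
    while i < len(edges) and edges[i][0] >= new_score:
        i += 1
    edges.insert(i, (new_score, comment_id))
```

The TreeSet variant replaces both scans with O(log n) tree operations; in Python you would reach for something like `sortedcontainers.SortedList` rather than a plain list.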
Why not simply use a Redis sorted set for top and controversial posts?
Well you could use sorted sets within sorted sets within sorted sets, at which point you've just built out the data structure I described
hey jordan, can you make a video on designing a donation system? e.g. donating to list of charities
Hey Joon, besides a normal server and database here what do you consider to be the hard part of this problem?
@@jordanhasnolife5163 hey jordan, thanks for the reply.
I just started studying system design. So topics like real-time data/money aggregation w/ high load/peak, how to reprocess failed payments, and how to avoid hot spots if one charity is popular, other various failure scenarios in an Async design, etc
Also maybe going over little tedious stuff like API designs and DB schema/details for a noob like me? System design interviews seem to ask more in details about these than what I've seen in YT videos :(
Apologies for a big ask
@@joonlee3863 Thanks for elaborating! I think that for a problem like this, it probably would be employing a lot of the other design patterns that we come across in other problems that I've covered. For example, real time money aggregation is similar to the real time counting or real time order handling that I might cover in the tinyURL or Amazon retail case
@@joonlee3863 As for this one, totally makes sense, I try to go over these where they are relevant and spent more time on them in my 1.0 series. That being said, I think that once you see the pattern in these, covering them in each video can take a decent amount of time and may also feel a bit redundant.
Binge watching all your content. It's content dense and fun to watch at the same time. Thanks for doing this!
This is amazing. A system design video with enough algorithmic depth. Can you share any links where I can dive into this furhter?
For this one unfortunately not really, was mostly shit I made up since I couldn't find any great resources for it.
I think my design might not even be necessary for reddit's current needs, but hey it's a fun thought experiment.
Another great video. Would love an occasional live system design session. Also, congratulations on the 25K!!!
Your videos are invaluable! Helped me get an offer from Microsoft!
Congrats man!!! Enjoy the new role and it's awesome to see the hard work pay off!
Hey Jordan, why use Spark Streaming rather than just Flink for the whole setup, maintaining causal consistency on that second Kafka topic?
Hey! Just wanted to use microbatching there, we're just writing a bunch of upvotes and downvotes to a particular comment id, and then potentially reordering the edges of the graph. That could be expensive, so I think batching them up makes for a nice improvement!
also in this setup would you keep the linked list in spark streaming instead to update the edge weights on the hot db?
@@Onislayer yep!
Flink can write in a microbatch manner using windows, but I agree it is not super intuitive.
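To make the microbatching idea in this thread concrete, here's a toy accumulator (not Spark code; `VoteMicrobatcher` and its parameters are my own names): votes for each comment are buffered and flushed as one aggregated delta, so a burst of upvotes triggers at most one write and one edge reorder per comment per batch.

```python
class VoteMicrobatcher:
    """Buffer per-comment vote deltas and flush them in one shot."""

    def __init__(self, flush_fn, batch_size=100):
        self.flush_fn = flush_fn    # called with {comment_id: net_delta}
        self.batch_size = batch_size
        self.pending = {}
        self.count = 0

    def on_vote(self, comment_id, upvote):
        delta = 1 if upvote else -1
        self.pending[comment_id] = self.pending.get(comment_id, 0) + delta
        self.count += 1
        if self.count >= self.batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            # One write + one potential graph-edge reorder per comment per batch.
            self.flush_fn(dict(self.pending))
        self.pending = {}
        self.count = 0
```

A real deployment would also flush on a timer so a quiet comment's votes don't sit in the buffer indefinitely; that's the piece a Spark Streaming microbatch interval (or a Flink window) gives you for free.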
Hoping references for the decisions made in the video could be attached.
Not too many exist about Reddit comment views, if you look.
I see one for materialized path (which I have discussed) and one for Celko trees (which are dumb).
Other than that, I know the tradeoffs of graph databases, the tradeoffs of normalized vs. denormalized data, and also the tradeoffs of how you choose to partition within a stream processing framework.
Can you do a video on API gateways?
Will give it some consideration if there's anything non trivial to be done there