[Question] for the home timeline section. basically it's possible for a user to follow lots of celebrity users right? so when fetching the hometimeline data from celebrities, do we limit the number of celebrities? lets say this user follows hundreds of celebrities. thx
At 8:05, you mentioned that we save the entry for each user -> tweet_ids and tweet_id -> tweet_content in REDIS. Doesn't that effectively mean that we are having all the content that has ever been produced in twitter inside REDIS?
Most of the data gets queried from Redis, but there must be a limit on the amount of data the cache can handle. Narendra didn't talk about flushing the cache once the data reaches a certain limit, but there has to be a background job that clears it (old tweets, old trends, follower data, etc.).
@@MegaKorth I agree to the fact that most of the data gets (and it should be) queried from REDIS. However, seeing one's own timeline doesn't sound like a very frequent scenario and thus could be fetched from the DB instead of keeping everyone's User timeline in the cache itself.
@@karttikmishra4291 Keeping users' timeline data in the cache is a plus point, as we don't want users to experience slow loading of their own tweets. But yes, we can limit the number of tweets per user that we keep in the cache (say 3-5 pages' worth of tweets), and if the user wants to load more we can fetch them from the DB.
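A minimal sketch of the "keep a few pages in cache, fetch the rest from the DB" idea from this thread. It is not from the video: `TimelineStore`, `CACHE_PAGES`, and `PAGE_SIZE` are hypothetical names, and plain Python containers stand in for Redis and the database.

```python
from collections import deque

CACHE_PAGES = 3   # hypothetical: number of pages kept in cache per user
PAGE_SIZE = 20    # hypothetical page size


class TimelineStore:
    """Toy user-timeline store: newest tweets cached, older ones read from a 'DB'."""

    def __init__(self):
        self.db = {}     # user_id -> list of tweet ids, newest first (stand-in for the database)
        self.cache = {}  # user_id -> capped deque of tweet ids (stand-in for Redis)

    def add_tweet(self, user_id, tweet_id):
        self.db.setdefault(user_id, []).insert(0, tweet_id)
        cached = self.cache.setdefault(user_id, deque(maxlen=CACHE_PAGES * PAGE_SIZE))
        cached.appendleft(tweet_id)  # once the cap is hit, the oldest cached id falls off

    def get_page(self, user_id, page):
        """Return (tweet_ids, source) for a zero-based page number."""
        start, end = page * PAGE_SIZE, (page + 1) * PAGE_SIZE
        cached = self.cache.get(user_id, ())
        if end <= len(cached):
            return list(cached)[start:end], "cache"
        # Pages beyond the cached window (or a partly filled page) come from the DB.
        return self.db.get(user_id, [])[start:end], "db"
```

With 100 tweets stored, pages 0 to 2 are served from the cache and page 3 onward falls through to the DB, so the cache size per user stays bounded.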
Is the user timeline also in the Redis cache? Does that mean that in Redis we are maintaining two timelines for every user, one being the user timeline and the other the home timeline? Thanks
Hey Narendra, it's really a great explanation and it couldn't be better. It also makes perfect sense to me. The problem I'm facing right now is with microservice architecture. If the same system has to be designed in a proper microservice architecture, how are we going to do that? In this case the same Redis cluster is shared across different services, which means tight coupling. I would be really thankful if you made a system design video of any system that follows a microservice architecture, where each module has a bounded context and data ownership. And if you already have a video available on this, please point me to it. Thank you, keep up the great work!
Great Video. The only part missing is storage estimation, at least for cache..since we are using it extensively in this design. Can you provide some data on storage please?
How do you determine the threshold for a user to be considered a "Celebrity"? 10K+ followers? 100K+? 1M+? At what point do you decide to use fanout versus a cache lookup of the celebrity's user timeline? At 9:16 you say this is DB heavy and won't work. But you also say that this works for celebrity users, whose tweets are stored in a cache... I'm confused, what is the difference?
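One way to picture the difference is on the write path. The sketch below is only an illustration: `CELEBRITY_THRESHOLD` is a made-up number (the video does not give one), `publish_tweet` is a hypothetical name, and dicts stand in for the Redis structures.

```python
CELEBRITY_THRESHOLD = 100_000  # hypothetical cutoff, chosen only for illustration


def publish_tweet(author_id, tweet_id, followers, home_timelines, celebrity_tweets):
    """Fan out on write for ordinary users; store once for celebrities."""
    author_followers = followers.get(author_id, [])
    if len(author_followers) >= CELEBRITY_THRESHOLD:
        # Celebrity: a single write. Followers pull this list when they load their timeline.
        celebrity_tweets.setdefault(author_id, []).insert(0, tweet_id)
        return "pull"
    # Ordinary user: push the tweet id into every follower's home timeline.
    for follower_id in author_followers:
        home_timelines.setdefault(follower_id, []).insert(0, tweet_id)
    return "push"
```

The cost of the "push" branch grows with the follower count, which is why it is skipped for large accounts. Where exactly to put the cutoff is a tuning decision this sketch does not answer.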
Thanks for the excellent explanation, but I have one doubt, why would you want to go through the Timeline Service get to the Search Service? Why not go to the Search Service directly?
I understand that querying followers etc to create the Home time-line is expensive, and that REDIS can be the solution. However, how does the initial data enter the redis cache in the first place? You have to first do the slow query for everyone?
Are those user timeline caches supposed to accumulate (even after a user sees all the posts in that timeline cache)? I'd imagine so since we don't want the user querying the DB, but then does that mean we'd set a cache limit and if the user wants to access posts beyond what's in the cache, we'd query the DB?
Hi Narendra, How would you design schema for followers of a celebrity? Assuming there will be lot of celebs who have followers in millions and followers list could be very huge. How will you store followers in such a way that follow and unfollow ops are performed quickly? Thanks.
Hi Narendra, great video! I have a question about storing the data in Redis: let's say user X has the home timeline stored in the Redis database under 'user_x_hometimeline'. What kind of data is stored here? Only an array of all tweet IDs, or also the metadata and the tweet itself? Or is the tweet content only stored in the main DB? And a rhetorical question: how would you change the system design if you could change the content of a tweet? Facebook/LinkedIn offer the feature to change the content of a post.
16:05 onwards ... What if a user follows 100 thousand celebrities, will the service go to all 100k celebrities which he follows one by one to check if there is a recent post , I don't think so. what should be a better alternative in this scenario
One thing I don't understand: if Redis stores a list of IDs as values, does this mean the IDs still need to go back to the DB to get populated? For example, if user_tweets=[1,2,3], when do 1, 2, 3 get transformed back into tweets? Shouldn't we store the content itself in the cache instead of a DB reference?
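Another comment in this thread notes that the video keeps both user -> tweet_ids and tweet_id -> tweet_content in Redis. Under that assumption, turning IDs back into tweets is a second cache lookup, and the DB is only touched on a miss. A rough sketch, where `hydrate_timeline` and `load_from_db` are hypothetical names and a dict stands in for the content cache:

```python
def hydrate_timeline(tweet_ids, content_cache, load_from_db):
    """Turn a cached list of tweet IDs into tweet bodies, touching the DB only on misses."""
    missing = [tid for tid in tweet_ids if tid not in content_cache]
    if missing:
        # One batched read for all misses; load_from_db returns {tweet_id: content}.
        content_cache.update(load_from_db(missing))
    # IDs that exist nowhere (for example deleted tweets) are skipped.
    return [content_cache[tid] for tid in tweet_ids if tid in content_cache]
```

Storing IDs in the timeline and the content once per tweet means a tweet that appears in many timelines is not copied into each of them.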
Wonderful explanation, keep it up. I had a question regarding celebrity tweets: when a celebrity tweets, you mentioned that there will be a dynamic query from the user to the celebrity's user timeline to fetch the latest tweet. What happens if a million users send this query at the same time? Would the design work as seamlessly as you mentioned? Could it be a good design to use a combination of fan-out/fan-in based on the users' active state (from most recently active, to active within the last day, to active within the last weeks)?
In the given context, the problem of million users sending this query at the same time for the same query variables would be referred to as thundering herd. It is quite a common problem and can be easily taken care of at the service level, by sending only one request to the DB, while the other threads wait or serve stale data(if required, depending on the problem)
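The "send only one request while the other threads wait" approach is often called request coalescing or single-flight. Here is a small sketch of it using only the standard library; `SingleFlight` and `loader` are hypothetical names, and the stale-data variant is not shown.

```python
import threading


class SingleFlight:
    """Coalesce concurrent loads of the same key into one backend call."""

    def __init__(self, loader):
        self._loader = loader
        self._lock = threading.Lock()
        self._inflight = {}  # key -> (done_event, result_holder)

    def get(self, key):
        with self._lock:
            entry = self._inflight.get(key)
            leader = entry is None
            if leader:
                # First caller for this key becomes the leader and does the load.
                entry = (threading.Event(), {})
                self._inflight[key] = entry
        done, holder = entry
        if leader:
            try:
                holder["value"] = self._loader(key)
            except Exception as exc:
                holder["error"] = exc
            finally:
                with self._lock:
                    del self._inflight[key]
                done.set()  # wake every waiting follower
        else:
            done.wait()  # followers block until the leader has a result
        if "error" in holder:
            raise holder["error"]
        return holder["value"]
```

Callers that arrive after the leader has finished start a fresh load, so in practice this would sit in front of a cache rather than replace one.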
Great work . Few questions for you, Are these asked in amazon interviews? It would be helpful if we can get to know how you select these questions and have you encountered such in any company interviews ?
@nikash Kumar I am collecting questions asked by well-known companies through a few of my contacts/friends, and a few of my channel viewers have also started sending requests to make videos whenever they encounter system design questions in their interviews. Nevertheless, understanding system design helps you crack any interview and also helps you design better software at work.
I have one question, if user is following thousands of celebrities in that case we will have to go to each celebrities user timeline and check for recent tweets. Is that operation going to be costly cause even if it's in memory we have to do lot of processing. How can we tackle this situation if it's time consuming?
Your work is brilliant. You always include small details like "What is Fanout? Fanout is moving from a single point to different directions.". That level of detail is amazing because you are not taking things for granted (i.e. that everyone knows what a fanout is; I initially always got confused by the concept of fanout actually) it makes it much easier to grasp the material! Thanks!!!
It is almost impractical to cover each and every aspect of all components that make up a system in a single video, unless it is several hours long. However, Naren does a great job in putting the most important information across in a way that is simple and easy to understand. He also makes sure he uses the exact keywords and terms that interested users can later research on for a more holistic view into every design. Thanks a ton!!
There is a lot of homework done in each of your videos. You don't just come up with a generic solution that could be applied, but with the actual solutions currently being used at the company (be it Twitter/Netflix). Kudos man, great work! :)
Thanks @Atul :)
@@TechDummiesNarendraL Thanks for your videos. Just wanted to know how can we be sure this is the technical stack used in the companies? Any checks you made with the developers of those company?
There is a lot of work done! You are amazing sir
Aaayee captain! Nice to see you here, your content is awesome as well. Fun fact, I listen to your podcast on Spotify daily while evening walks :D
the best video on twitter system design I've ever seen before in my life
Read "Grokking the System Design Interview" could not understand much and then came here, and everything made sense. Thank you Narendra for these great videos.
I have seen more than 5 videos and read many more articles. But yours one is the best and well detailed. Thank You.
Great explanations! The only additional things that I learned from other videos (of lesser pedagogical quality):
- Regional distribution of cached data (for Redis) is done thanks to the Writer API (which writes the tweet in the primary region for Redis + additional regions).
- In order to have quick response times between the client and the Write API, the client actually talks to a queue, and the Write API picks up messages from that queue.
links to those videos?
Small nitpick: you want to reverse the order of your Redis keys. Putting the ID first (as in 123-user) is less efficient than putting the type first (as in user-123). Think of Redis keys as multi-column indices: if you put the random ID in front, you effectively double the binary search time. The way you have it, the order of your keys might look like this:
Current Redis
---------------------
123-user
124-tweet
125-tweet
126-user
Should Be
----------------
tweet-124
tweet-125
user-123
user-126
If the ratio of tweets to users is not 1:1 you will increase the user lookup time by a factor of the ratio
In Redis is it hash-based lookup from K-V pair or does it use the binary search to find the cache names?
Bikas Katwal that’s a good point. Redis runs in memory, any kv store that uses disk behind the scenes is usually using b-trees
I am preparing for an interview of Amazon, and your video taught me a lot. Thanks and keep it coming!
Shen Chen please help me too. I am also preparing for the same.
Rekha Mor good luck!
When I research the design for a particular product, I watch at least 3-4 videos that explain it on YouTube, and yours is always the most complete and relevant. Great job! I think you could consider writing a book about system design focused on interviews.
I think for every video you must have read a lot of articles and books, compiled all the information, and delivered it in 40 minutes. Keep up the great job, Naren.
Now I understand the importance of reading books. I remember this celebrity example was provided in the book named "Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems". Great work. More power to you.
28:35 Inverted Index search is the reason why twitter limits the length of tweet, as you mentioned earlier that we have enough memory now so limiting length of tweet doesn't make sense. I think computing index is the task which puts limit on tweet length.
I loved the way you keep the videos informative enough for the experienced to stay glued and simple enough for the beginners to understand. Great work Narendra. Appreciate the way you are converting PAPERS to these great VIDEOS.
I agree with the comment from Atul.K.Yadav. Great Job Narendra. You are exceptional in your explanation.
Thank you for putting up this tutorial! Studying videos like this and then practicing at Meetapro with mock interviews will help you land multiple offers.
I came here after clicking on a YouTube recommendation. I am so glad I found this channel through the recommended videos. I went through so many videos, and yours was the best explanation I came across for the System Design interview, especially taking Twitter as an example. Thanks so much for your videos. Hats off!!! Great job. Such an amazing explanation. I wish I had a teacher like you!!! Thanks a lot.
Just saw the Wpp video from 2 months before this. And the difference is enormous kkkkkk.
This video is much better. Good work and thank you for the video
Brilliant. Well done managing the details+clarity in 35 mins!
Redis itself has its own ZooKeeper-like component called "Redis Sentinel", which monitors the Redis nodes, and if any node is down it will make another node the master. The selection of the node is based on the priority of the nodes specified within redis.conf.
Outstanding System Design video. I think this Twitter Design is more realistic than what other videos on the same topic show. Great work!
very well explained, I could get every bit of the system and much better than many other videos on youtube.
I will call you SIR. You are an amazing teacher. Best explanation.
Miles to go before you sleep.
Could you please prepare system design and LLD for the following:
1. Simulation of a cricket match, football match etc.
2. Implementation of Queue like Kafka
3. Ecommerce price drop notification system for 50M products
4. Amazon like website and order management system i.e. everything that happens after clicking checkout
5. Elevator system
6. Scrabble
7. Chess game
8. A library for evaluation of expression
9. Stock Trading System
10. Stock Exchange
Going through your lectures, I was able to get a horizontal approach to many new concepts. Very good work. Thank you
Good job Naren. This design video is very well done
I nailed Amazon system design interview question by watching your videos. Just wanted to say thank you and thank you.
He is doing fantastic job. Ton to learn from the videos.
Thank you also for mentioning the concepts involved (eventual consistency, gather & scatter, fan out, ifti)
One quick way I can think of fetching all Tweet related data for the users to display on the user timeline is to shard the user and tweet databases based on geo-location and building indexes on those shards to fetch the user data and tweet data associated with the user to display on the user timeline. Only when we have this I can guess then it makes sense to put this data into a Redis cache, otherwise for the Redis cache to hold this data you would first need to execute a huge Select query on top of the database. Please correct me if I am wrong in my thinking.
The best video i have ever watched on twitter system design
Absolutely love your work man. Gives so much more confidence when you mention what these companies are actually using. Highly, highly appreciated, thank you!
Hey man! Great work! Well explained! Love to see more coming from you!
At 4:23, you said Redis is persistent. I think it should be said as non-persistent.
There is lot of preparation that you do to come up with these videos. Excellent work!
Thank you for explaining the overall system design in a such a clear, very easy to understand manner!!
Bravo. This is one of the better system design videos.
very good compilation Naren, really like your videos as they are detailed and good quality :)
Narendra, I'm a big fan of your work. I want you to know that I really appreciate the time and effort that you've put into this. It has helped me a lot to improve my understanding of system design.
Your videos are jammed with so much useful knowledge... It's just wonderful... Big thanks
One question- if someone follows hundred celebrities , then to build home timeline for that person, hundred different queries are required to fetch all tweets as per your design. This does not look scalable design. Am I missing something?
As mentioned in the video, a separate list is maintained of the celebrities each user follows, and while processing the user's timeline those celebrities' Redis lists are checked for new tweets. It might seem like a lot of querying, but I think this happens in memory, so it is faster. With respect to the scalability of this design, I would say that the Redis cluster Twitter uses has many nodes, so it is not a single node doing all the preprocessing for a user's timeline. Please let me know your thoughts.
This is a really detailed walk-through of newsfeed system design. Thanks!
At 3:50 in the system diagram, there are no arrows coming from the Redis store?
This is so thorough and specific yet explained in a great way! Shows true mastery of the topic. Thank you very much for your work, really appreciated :D
Thanks for the detailed explanation.
I have a doubt. Redis already has Data persistency. Then why do we need to store those details in DB?
I have the same question.
You use DB to store data in a normalized form. In redis you are storing in a ready to serve form.
@@sagaruv, so in DB we have source of truth (stored persistently), but in Redis we have derived ready to serve data, right? Then why do we need persistency in Redis at all? From Redis docs I can see that enabling persistency in Redis reduces its performance dramatically.
What's the exact use-case for WebSocket here?
Is it for -
1. Send a response to the query for search timeline results?
2. Send notifications for new tweets?
to my understanding, requests are just sent and then all responses are sent using a WebSocket connection
Just a question: as we are putting a lot of data in Redis and using it as a cache, can we use a sliding window or LRU as the cache eviction strategy? Otherwise it will also become very slow.
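For reference, Redis itself can evict keys when memory is full through its `maxmemory` and `maxmemory-policy` settings (for example `allkeys-lru`). The sketch below only illustrates what LRU eviction means, using a plain in-process cache; `LRUCache` is a hypothetical class, not how Redis implements it.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()  # oldest access first, newest access last

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry
```

Timelines of users who have not logged in for a while would be the first to go under this policy, which matches the intent of the question.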
How to improve: you could provide a more generic architecture one can use, at least partially, during an interview. Current video is extremely detailed. I'd prefer something more generic with more time dedicated to discussing possible realizations, their pros and cons.
This video is great as a studying resource, but not so good if you just want a more generic template you can use to architect a system and use it to discuss possible ways to implement a feature or two.
Hi Narendra,
I have just a small suggestion for these system design videos. We should make the videos a little more conversational. So rather than giving the exact solution, the videos should enable viewers to think about the solution.
Told all my friends. You are doing a great job.
Amazing videos! Congrats! One suggestion could be to use a different microphone to reduce echo and noise and make your voice clearer, since for me it seems that every word you say is important! Keep up the good work!
Hi Naren, how do we store the huge tweet and followers data in SQL DB. please let me know if we can use sharding here, if yes then what would be the most appropriate sharding key.
I am still at 7:45 but have to say you are very good technically ! Thanks a ton !
Very nice video.
I have a question. Redis cluster is scalable itself and can take care of data partitioning and scaling well using cluster of nodes. Do we really need zookeeper here?
At 16:41, when you explained the approach of not updating the million followers, it makes sense, but the assumption seemed to be that the celebrity tweets will come after the ones that are already in the home timeline, which may very well not be the case. If the celebrity tweets have to be inserted in between the entries of the already created home timeline, it will cause latency. It would be great if you could tell me how we handle total order.
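One possible way to handle the ordering, sketched under the assumption that each list is already sorted newest first and every tweet carries a timestamp: do a k-way merge at read time instead of appending. `merge_home_timeline` is a hypothetical name, and the video does not confirm that this is what Twitter does.

```python
import heapq


def merge_home_timeline(precomputed, celebrity_timelines, limit=20):
    """Merge newest-first lists of (timestamp, tweet_id) into one newest-first page."""
    merged = heapq.merge(
        precomputed, *celebrity_timelines, key=lambda t: t[0], reverse=True
    )
    seen, page = set(), []
    for ts, tweet_id in merged:
        if tweet_id in seen:  # skip a tweet that shows up in more than one list
            continue
        seen.add(tweet_id)
        page.append((ts, tweet_id))
        if len(page) == limit:
            break
    return page
```

`heapq.merge` is lazy, so only as many entries as the page needs are pulled from each list. The `seen` set also keeps the same tweet from appearing twice in the response.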
Your videos are very helpful...just one doubt..15:55 user is going to check and get the celeb tweets that is not in-memory, what would be the approach!
I am short of words .. Awesome stuff man !!
I had a couple of doubts.
The application is heavily dependent on redis for the timelines. So do we have options to shard redis or something because there will be a lot of data.
Second question: at what frequency do we create the timeline in Redis? Let's say we already have a timeline in Redis and the user follows a new user. How do the old tweets come into the home timeline of the user?
I too thought about adding new follower scenario. Can anyone experienced answer this question please.
As I know, Redis Cluster does not require Apache ZooKeeper. Redis Cluster has its own built-in mechanisms for node coordination, failure detection, and failover. Please correct me if I am wrong.
I was truly amazed when i discovered your channel . Keep the good work mate!
Hi, as you said, the tweets are stored in Redis, an in-memory database. How will we recover/recreate the data in case of a catastrophic failure? It will take a lot of time to rebuild the timeline data.
You can insert into DB and cache
you are the gem master and your videos are gems
How do feed solutions work with pagination? For how long are we going to keep the tweets in the cache? We have infinite pagination, so does that mean we have to keep all tweets in the cache all the time?
Sir, I am entirely new to this topic of system design. For someone like me, it is difficult to follow when you say "scales horizontally" or "we can use Redis", etc. So, can you suggest a starting point for us? Where do we start reading from? Where do we begin? Please help.
Quick question. What does twitter do for all the images/videos people are uploading? Where does that go in the workflow?
This video is not bad. Although I would have started and ended with the overall architecture diagram (starting at 30:40) and broken it down over the course of the video.
It is also important to note that this architecture is nothing more than a Writers, Readers, and Sockets interface to the underlying backend data stores.
Those data stores are each chosen for either their structural properties e.g., fast k/v Redis store for reads ... or for their intermittent processing abilities as found in Kafka Streams for trends data.
Nice work.
You mentioned picking the recent tweets by a celeb and putting them in the response, but how are we going to query for them from the celebrity timeline? How do we make sure we are not including duplicate tweets from the celebrity?
I am gonna wish you teachers day forever . Thanks for awesome tutorial :)
Best content so far I have come across in days. Really good!!
A very basic question: when you say that "twitter writer sends a copy of tweet to search service, or say fan out", do you mean we are doing inter-service blocking API calls, or are we doing asynchronous communication (put the tweet in separate queues for the search service and for fan-out, where workers on the other end serve search and fan-out respectively)? In my opinion, it should be asynchronous communication, as without it the tweet writer remains blocked until the cascaded operations are complete. Do you see any pitfalls in the asynchronous communication?
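To make the asynchronous option concrete, here is a minimal in-process sketch that uses Python's `queue.Queue` as a stand-in for a real message broker. The names `post_tweet`, `search_queue`, and `fanout_queue` are hypothetical, and nothing here is confirmed to be how Twitter does it. The writer only enqueues and returns, while one worker per queue does the downstream work.

```python
import queue
import threading

# One queue per downstream consumer (stand-ins for broker topics).
search_queue = queue.Queue()
fanout_queue = queue.Queue()

# Stand-ins for the search index and the fan-out result.
search_index = []
fanned_out = []


def post_tweet(tweet):
    """Writer path: enqueue for each consumer and return without waiting."""
    search_queue.put(tweet)
    fanout_queue.put(tweet)
    return "accepted"


def worker(q, sink):
    """Consume tweets from q until a None sentinel arrives."""
    while True:
        tweet = q.get()
        try:
            if tweet is None:
                break
            sink.append(tweet)  # real work (indexing, fan-out) would go here
        finally:
            q.task_done()


threads = [
    threading.Thread(target=worker, args=(search_queue, search_index)),
    threading.Thread(target=worker, args=(fanout_queue, fanned_out)),
]
for t in threads:
    t.start()

status = post_tweet({"id": 1, "text": "hello"})

# Wait for the workers to drain the queues, then shut them down.
search_queue.join()
fanout_queue.join()
for q in (search_queue, fanout_queue):
    q.put(None)
for t in threads:
    t.join()

print(status, len(search_index), len(fanned_out))
```

The trade-off is that the writer can report success before search or fan-out has finished, so consumers have to tolerate a short delay and handle retries themselves.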
[Question]
For the home timeline section: it's possible for a user to follow lots of celebrity users, right? So when fetching the home timeline data from celebrities, do we limit the number of celebrities? Let's say this user follows hundreds of celebrities. Thanks.
At 8:05, you mentioned that we save the entry for each user -> tweet_ids and tweet_id -> tweet_content in REDIS.
Doesn't that effectively mean that we are having all the content that has ever been produced in twitter inside REDIS?
Most of the data gets queried from REDIS, but there must be a limit on the amount of data the cache system can handle. Narendra didn't talk about flushing the cache once the data reaches a certain limit, but there has to be a background job that clears the cache (old tweets, old trends, follower data, etc.).
@@MegaKorth I agree that most of the data gets (and should be) queried from REDIS.
However, seeing one's own timeline doesn't sound like a very frequent scenario, so it could be fetched from the DB instead of keeping everyone's user timeline in the cache.
@@karttikmishra4291 Keeping the user's timeline data in the cache is a plus, as we don't want users to experience slow loading of their own tweets. But yes, we can limit the number of tweets (per user) we keep in the cache (say 3-5 pages of tweets), and if the user wants to load more, we can fetch them from the DB.
Is the user timeline also in the Redis cache? Does that mean that in Redis we are maintaining two timelines for every user, one being the user timeline and the other the home timeline? Thanks
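The idea of caching only the first few pages and falling back to the DB could look roughly like this. It is a sketch with plain dicts standing in for Redis and the database. `PAGE_SIZE`, `CACHE_LIMIT`, and `get_page` are made-up names, and it assumes the cache always holds the newest `CACHE_LIMIT` tweet ids for a user.

```python
PAGE_SIZE = 20
CACHE_LIMIT = 3 * PAGE_SIZE  # keep roughly 3 pages per user in the cache (assumed)


def get_page(cache, db, user_id, page):
    """Return (tweet_ids, source) for a zero-based page number.

    Pages that fit entirely inside the cached window come from the cache;
    anything deeper is read from the database.
    """
    start = page * PAGE_SIZE
    end = start + PAGE_SIZE
    if end <= CACHE_LIMIT and user_id in cache:
        return cache[user_id][start:end], "cache"
    return db.get(user_id, [])[start:end], "db"


# Tweet ids 100..1, newest first; the cache holds only the newest 60.
db = {"u1": list(range(100, 0, -1))}
cache = {"u1": db["u1"][:CACHE_LIMIT]}

first, first_source = get_page(cache, db, "u1", 0)
deep, deep_source = get_page(cache, db, "u1", 3)
print(first_source, first[0], first[-1])  # cache 100 81
print(deep_source, deep[0], deep[-1])     # db 40 21
```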
Hey Narendra, it's a really great explanation and it couldn't be better. It also makes perfect sense to me. The problem I'm facing right now is with microservice architecture. If the same system has to be designed as a proper microservice architecture, how are we going to do that?
In this case, for example, the same Redis cluster is shared across different services, which means tight coupling. I would be really thankful if you made a system design video of any system following a microservice architecture, where each module has a bounded context and data ownership.
And if you already have any video available for the same, please point me to that.
Thank you, keep up the great work!
Great Video. The only part missing is storage estimation, at least for cache..since we are using it extensively in this design. Can you provide some data on storage please?
Thank u so much for such a detailed explanation of each important topic
awesome video man ... ! this is relevant to any social media platform design like FB , Linkedin , Instagram.
How do you determine the threshold for a user to be considered a "Celebrity"? 10K+ followers? 100K+? 1M+?
At what point do those users decide to use Fanout versus cache lookup of the Celebrity's user timeline?
At 9:16 you say this is DB heavy and won't work. But you also say that this works for celebrity users, whose tweets are stored in a cache... I'm confused. What is the difference?
Thanks for the excellent explanation, but I have one doubt, why would you want to go through the Timeline Service get to the Search Service? Why not go to the Search Service directly?
Great clarity in the explanation.. demystifies so nicely
I understand that querying followers etc to create the Home time-line is expensive, and that REDIS can be the solution. However, how does the initial data enter the redis cache in the first place? You have to first do the slow query for everyone?
Are those user timeline caches supposed to accumulate (even after a user sees all the posts in that timeline cache)? I'd imagine so since we don't want the user querying the DB, but then does that mean we'd set a cache limit and if the user wants to access posts beyond what's in the cache, we'd query the DB?
Crystal clear system design. Thanks for your time and effort 👍
Should the HTTP Push websocket not be lying between the Load Balancer and the Timeline services?
it is present there actually.
fantastic work bro. You have scaled to a new level :)
Hi Narendra, How would you design schema for followers of a celebrity? Assuming there will be lot of celebs who have followers in millions and followers list could be very huge. How will you store followers in such a way that follow and unfollow ops are performed quickly? Thanks.
Hi Narendra, great video! I have a question about storing the data in Redis:
Let's say user X has the home timeline stored in the Redis database under 'user_x_hometimeline'. What kind of data is stored here? Only an array of all tweet ids, or also the metadata and the tweet itself? Or is the tweet content only stored in the main DB?
And a rhetorical question: how would you change the system design if you could change the content of a tweet? Facebook and LinkedIn offer the feature to change the content of a post.
16:05 onwards ... What if a user follows 100 thousand celebrities, will the service go to all 100k celebrities which he follows one by one to check if there is a recent post , I don't think so. what should be a better alternative in this scenario
One thing I don't understand. If Redis stores lists of ids as values, does this mean the ids still need to go back to the DB to get populated? For example, if user_tweets=[1,2,3], when do 1, 2, 3 get transformed back into tweets? Shouldn't we store the content itself in the cache instead of a DB reference?
Best Twitter system design video I've seen so far.
Really like all of your videos. Thank you for doing the hard work and sharing your knowledge with the community.
Fantastic. I have one doubt:
1. Can we use a NoSQL DB like DynamoDB? Or does it have to be a MySQL DB?
Hello Sir,
For Searching tweets, I think Elastic Search which uses Inverted Index can be used.
Wonderful explanation, keep it up. I had a question regarding celebrity tweets: when a celebrity tweets, you mentioned that there will be a dynamic query from the user to the celebrity's user timeline to fetch the latest tweet. What happens if a million users send this query at the same time? Would the design work as seamlessly as you mentioned? Could it be a good design to use a combination of fan-out/fan-in based on the user's active state (from latest, to a day old, to weeks old)?
In the given context, the problem of a million users sending the same query with the same query variables at the same time is known as the thundering herd problem. It is quite common and can be handled at the service level by sending only one request to the DB while the other threads wait or serve stale data (if required, depending on the problem).
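The "send only one request to the DB while the other threads wait" idea is often called request coalescing or single-flight. Below is a minimal sketch of it with Python threads; the `SingleFlight` class and the `load_celebrity_timeline` loader are hypothetical, and error propagation to waiting callers is left out for brevity.

```python
import threading
import time


class SingleFlight:
    """Coalesce concurrent calls for the same key into one loader call."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> {"done": Event, "value": result}

    def do(self, key, loader):
        with self._lock:
            entry = self._inflight.get(key)
            leader = entry is None
            if leader:
                entry = {"done": threading.Event(), "value": None}
                self._inflight[key] = entry
        if leader:
            try:
                entry["value"] = loader()
            finally:
                # Simplification: if loader raises, waiters receive None.
                with self._lock:
                    del self._inflight[key]
                entry["done"].set()
        else:
            entry["done"].wait()
        return entry["value"]


db_calls = []  # records each simulated DB hit


def load_celebrity_timeline():
    db_calls.append(1)
    time.sleep(0.3)  # simulate a slow DB query
    return ["tweet-3", "tweet-2", "tweet-1"]


flight = SingleFlight()
results = []


def request():
    results.append(flight.do("celeb:42", load_celebrity_timeline))


threads = [threading.Thread(target=request) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# All five callers get the same data; the DB is typically hit only once.
print(len(results), len(db_calls))
```

In a real service, the coalesced result would usually also be written to the cache so that later requests do not reach the loader at all.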
Great work. A few questions for you: are these asked in Amazon interviews? It would be helpful to know how you select these questions and whether you have encountered them in any company interviews.
@nikash Kumar I am collecting questions asked by famous companies through a few of my contacts and friends, and a few of my channel viewers have also started to send requests to make videos whenever they encounter system design questions in their interviews.
Nevertheless, understanding system design helps you crack any interview and also helps you design better software at work.
Maybe in the future blockchain could help to make Twitter less centralized and more democratic. Thanks for the helpful video!
Good video, most comprehensive. Kindly add subsection/bookmarks.
What is the use of persistent http websocket connection in Twitter?
I have one question: if a user is following thousands of celebrities, we will have to go to each celebrity's user timeline and check for recent tweets. Is that operation going to be costly? Even if it's in memory, we have to do a lot of processing. How can we tackle this situation if it's time consuming?