Thanks Tom for having me on this mock interview.
@@crushingtecheducation Thanks, it was good to watch. Also, I was wondering why @crushingtecheducation has no content?
Really interesting series, but I would like to see the interviewer engage more or even challenge the interviewee. Most of the time you just agree with the decisions/design. Overall, great job.
fair point!
@@IGotAnOffer-Engineering Amazing interview, but I agree with @SmartCoder89. The interviewer should ask why the candidate decided to use a certain technology, as I have been asked in interviews.
@@IGotAnOffer-Engineering Exactly. The interviewer should also ask more questions about the whys of the implementation and the trade-offs.
The fastest way to learn is to watch someone do it. Seeing this senior engineer go through system design from scratch is really inspiring. Thanks for sharing.
Thanks Joshua!
Couldn't agree more!
I recently had a Twitter system design interview, and I'm sorry to say that this is not a good mock. In a real interview, the interviewer asks the candidate challenging questions, interrupts, and steers you in a direction that is relevant to how the specific company sees the role. This leaves you much less time to cover some of the items shown in this interview, so your time management needs to be much more precise. In addition, a good portion of staff-level interviews are with 2 interviewers, which adds to the challenge.
Well it depends. If you had an interviewer who was junior or lower mid-level, the interview might go as you see in this video.
Oftentimes it will be shorter than this; for me it was 30 minutes. Just clarifying the assignment took 10 minutes, so you don't need to go into detail about everything. I don't think the point of this is to mirror a real-time interview, but for us to be able to extract information. If you watch 10 of these videos, the chance that you will be able to answer a question in a way that looks like you know what you're talking about will increase greatly. And let's be honest, in the job you won't be designing the system anyway, so that is the main point.
I worked at Twitter and this design is hot garbage. He doesn't even talk about microservice architectures or service discovery, and he just glosses over components. What are the responsibilities of the "twitter processor", for example? He just plops down components and makes them magical do-alls. He goes into data flow and completely loses context. He also implicitly makes it appear that the entire thing stays in memory.
@@drunken_moose Interesting. I have never done a system design interview before, and I have one coming up next week.
Good to hear this might've been hot garbage, so that I don't spend time on something not very useful.
Agree, this does seem a lot like just plopping down components.
IMHO this is lacking important things:
- A message bus, and something in front of it (which is what I would use). In the real world you would want various services, since a service-oriented architecture is more scalable and adaptable, and potentially multiple sources of entry (e.g., Twitter used to, and may still, allow SMS-created tweets). That opens a can of worms around the message bus and the various services. He uses a message bus, but just for media.
- websockets (and pub/sub for that matter).
- I do not understand the choice of Cassandra. I would think a time series database would work better for fetching tweets if you want them ordered by time. If it was a service oriented architecture, you could even do things like have a service for determining a timeline for a given user, or groups of users.
- I didn't catch any mention of a CDN.
- This might be extreme, but I would think rate limiting and security as well.
- Batching updates is something I would think to do, and at least mentioning how to minimize deltas would be interesting.
- Also, with such massive load, I would think Protobufs as a protocol would work really well and would be worth mentioning. Storing keys is expensive at scale.
Still a great interview. Just had some notes, and feel a little concerned I don't see websockets anywhere in the video or comments.
Thanks for sharing! But today I had a system design interview where I was asked to design Twitter, and it was much more difficult because:
1. The interviewer kept interrupting me and asking lots of questions (for example, why use Cassandra, what type of replication we should use, etc.).
2. The diagram had to follow some kind of notation, C4 for example.
3. I had to work out the numbers (RPS, capacity, etc.) and incorporate them into the diagram (how much CPU, RAM, and HDD/SSD there should be, and how not to kill databases or queue managers with too many connections).
Agree, all those videos actually give little to no help in preparing for the real interview, where the candidate gets grilled heavily.
Great content. I wish the interviewer would challenge the interviewee about why he made the decisions he made. I think the explanation of why he chose each component is very important.
Thanks a lot for your content :)
What happened to Eugene's videos? I don't see them available anymore
I didn't follow the math. The max size of a UTF-8 character is 4 bytes. 280 x 4 = 1,120 bytes, which is about 1 KB, not 10 KB.
Yes, it should have been 1 KB.
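For anyone who wants to sanity-check the arithmetic in this thread, here is a minimal sketch. It assumes a tweet is at most 280 Unicode code points and that the text is stored as UTF-8 (at most 4 bytes per code point); the function name is made up for illustration, and media is not counted.

```python
MAX_CHARS = 280          # tweet length limit discussed in the thread
MAX_BYTES_PER_CHAR = 4   # upper bound for one code point in UTF-8


def max_tweet_text_bytes(chars=MAX_CHARS, bytes_per_char=MAX_BYTES_PER_CHAR):
    """Worst-case size of the tweet text alone (hypothetical helper)."""
    return chars * bytes_per_char


worst_case = max_tweet_text_bytes()
print(f"worst case: {worst_case} bytes = {worst_case / 1024:.2f} KB")

# Cross-check against real encodings: ASCII needs 1 byte per character,
# while an emoji outside the Basic Multilingual Plane needs 4.
ascii_tweet = "a" * MAX_CHARS
emoji_tweet = "\U0001F600" * MAX_CHARS
print(len(ascii_tweet.encode("utf-8")), len(emoji_tweet.encode("utf-8")))
```

So the text alone is on the order of 1 KB in the worst case; a 10 KB figure only makes sense if metadata or media is folded into the estimate.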
Really good implementation and covered many topics which I would like to cover for the similar question. Thanks for the sharing.
Thanks!
This is not how typical system design interviews go.
Why does the interviewer always look so disinterested 😂
He is my best in system design
Great video. Could you please share the name of the whiteboard application used in this mock interview?
Thank you! If I remember rightly, I think it's Excalidraw.
Very detailed video. Thanks a lot for posting.
You're welcome :)
Thanks for the feedback!
This is not bad, but it is barely passable for L5 and certainly not L6+ material. There are a lot of holes. Redis Pub/Sub, for instance, is a very fragile part of the design. Also, it would be very hard to quickly get the people that a user follows. There was a bunch of hand-wavy stuff: if we're partitioning by tweet ID, why does it matter that the tweet ID is ordered? If we're partitioning by the user and then by the tweet ID, then each tweet will still go to a different server. What's the purpose of it? I mean, there are some big holes. We did the capacity planning, and what purpose did it serve? What did it help with? Just a waste of time? The more I think about it, even an L5 hire is hard; actually, maybe L4.
@IGotAnOffer At 38:12 it is proposed that every tweet be posted into Kafka.
Coming back to the calculations: the velocity in the first queue (q1) would be 6,000 messages/sec, as that is the number of tweets produced per second.
Second, since you fetch the followers in the consumer (200 per user), there are roughly 6,000 QPS on the user-follower database to fetch all the followers.
This consumer then publishes 200 messages, one for each follower, into the second queue, which comes to 6,000 * 200 = 1.2 million messages/sec.
All downstream services, in this case Redis, would receive as many writes. This is an important issue, especially if we also consider bandwidth.
An alternative approach could be to batch these.
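The fan-out numbers above can be reproduced with a short back-of-the-envelope script. This is only a sketch: the 6,000 tweets/sec and 200 followers per user are the figures quoted in this comment, while the function name and the batch size of 50 are assumptions made up for illustration.

```python
from math import ceil

TWEETS_PER_SEC = 6_000   # figure quoted in the comment
AVG_FOLLOWERS = 200      # figure quoted in the comment


def fanout_rates(tweets_per_sec, avg_followers, batch_size=1):
    """Estimate per-second load for fan-out on write (hypothetical helper).

    batch_size is how many follower IDs are packed into one message
    on the second queue; 1 means one message per follower.
    """
    return {
        # One follower-list lookup per incoming tweet.
        "follower_lookups_per_sec": tweets_per_sec,
        # Messages published to the second queue.
        "queue_messages_per_sec": tweets_per_sec * ceil(avg_followers / batch_size),
        # Timeline entries written to the cache; batching does not reduce this.
        "timeline_writes_per_sec": tweets_per_sec * avg_followers,
    }


print(fanout_rates(TWEETS_PER_SEC, AVG_FOLLOWERS))
print(fanout_rates(TWEETS_PER_SEC, AVG_FOLLOWERS, batch_size=50))
```

Under these assumptions, batching cuts the number of queue messages but not the number of timeline entries that eventually have to be written, so the write load on the cache stays at 1.2 million entries/sec.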
How would you be able to store the timeline for each user in a cache? Wouldn't that be so much data?
@crushingtecheducation Can you explain how Kafka works here, especially for the timeline? If all the tweets go into Kafka, what would the processor (consumer) do with them? Are you going to have consumers for each user, or for active users, that prepare a data set and store it in the cache? Is that the purpose of Kafka + the timeline processor?
Also, how would partitioning on tweet ID help?
Thanks for sharing!
One of the very complex systems which could've been simplified.
Question: If a tweet can have 280 characters, how can it be assumed that it takes 10 KB of space? Isn't that an overestimation?
Good point! I might've assumed average space for the text and image in the tweet.
@@crushingtecheducation Thanks for this video. I have a couple of questions:
1) For tweets, why are you going with a NoSQL DB? Tweet ID, user ID, and tweet content (text) can all be stored in a relational DB, right? Media can be stored in a NoSQL or blob DB, with a reference in the relational DB, probably as part of the content itself. I thought that is how you described it in the initial field design.
2) Why Cassandra and wide rows? What does that mean?
Tweets have a fixed "text" length, which translates to a fixed data size, right?
@@NathanSubramani Any chance it has to do with tweets trending, being tagged, and being searched by word?
Thank you for the good system design interview.
Also, I was wondering why @crushingtecheducation has no content?
At 6:48, the interviewee says that a tweet would be 10 KB in size. Shouldn't this be 280/1024 KB, about 0.3 KB (at 1 byte per character)?
Characters (runes) can occupy up to 4 bytes each, so 280 * 4 = 1,120 bytes, roughly 1 KB for easy calculation. But I would love to also get some feedback on this.
Also, 200 million daily active users and 100 million posting? Didn't he say at the beginning that the write-to-read scale is 1-10 or 1-20, which would make more sense?
I might've thought about average size of the message with media (picture or video). If there is no video, your numbers make sense
@@marianbuciu7853 A 1-20 write-read ratio makes perfect sense.
@@crushingtecheducation Even with media, that would just be a CDN URL, right?
The tweets API could also include a timestamp in the metadata, along with other info.
He talks about that later.
You should really challenge the interviewee more, because without it the whole interview feels absolutely unnatural and you look like you have no idea what you are talking about. However, I'm sure that you are proficient at system design. You just have to show it.
The candidate is a bit weak. I feel like you should grill him a bit to see if he knows what he's talking about or is just regurgitating from a system design book.
Thanks. It is very useful
You are welcome :)
17:50 Do you really need to insert 2 records for 1 following?
Thinking about the same. With one record we can retrieve it for both sides, unless both follow each other.
2 inserts allow us to scale the database better, since we can have 2 independent key-value tables/databases: one for follower => followee and the other for followee => follower.
If we don't do it, we have to use an index on user_id (follower or followee), which is not optimal for billions of records.
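To make the two-table idea concrete, here is a minimal in-memory sketch. The class and method names are invented for illustration, and plain dictionaries stand in for the two independent key-value tables; a real deployment would presumably shard each table by its own key.

```python
from collections import defaultdict


class FollowStore:
    """Toy model of the two-insert design (hypothetical, in-memory only)."""

    def __init__(self):
        # Table 1: follower_id -> set of users they follow.
        self.following = defaultdict(set)
        # Table 2: followee_id -> set of users following them.
        self.followers = defaultdict(set)

    def follow(self, follower_id, followee_id):
        # Two writes per follow, one into each table.
        self.following[follower_id].add(followee_id)
        self.followers[followee_id].add(follower_id)

    def get_following(self, user_id):
        # Single-key lookup; no secondary index needed.
        return set(self.following.get(user_id, ()))

    def get_followers(self, user_id):
        # Single-key lookup in the other table.
        return set(self.followers.get(user_id, ()))


store = FollowStore()
store.follow("alice", "bob")
store.follow("carol", "bob")
print(store.get_followers("bob"))
print(store.get_following("alice"))
```

Both read paths ("who follows X?" and "whom does X follow?") become lookups by primary key, at the cost of a second write and of keeping the two tables consistent.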
How does it make sense to partition tweets by tweet id?
The most common use case, fetching a batch of similarly timed tweets (like for a timeline), won't require any cross-partition joins.
Why did he choose Cassandra and not a SQL DB? His explanation is not clear.
Why 20 followers and not 200 followers?
Not an experienced senior.
The flow diagrams seem too complicated.
I think this presentation is more for senior engineer interviews. For new graduate system design interviews, I think this design is indeed overkill lol
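One way to read the "similarly timed tweets" argument is that the tweet ID itself carries a timestamp in its high bits, so sorting by ID is sorting by time. The sketch below assumes a Snowflake-style layout (timestamp, then worker ID, then sequence number); the video does not spell out the ID format, so the bit widths, the epoch, and the function names here are assumptions for illustration.

```python
EPOCH_MS = 1_288_834_974_657  # arbitrary custom epoch (assumption)
WORKER_BITS = 10              # assumed width of the worker ID field
SEQ_BITS = 12                 # assumed width of the per-millisecond sequence


def make_tweet_id(timestamp_ms, worker_id, sequence):
    """Pack timestamp, worker, and sequence into one sortable integer."""
    if not 0 <= worker_id < (1 << WORKER_BITS):
        raise ValueError("worker_id out of range")
    if not 0 <= sequence < (1 << SEQ_BITS):
        raise ValueError("sequence out of range")
    elapsed = timestamp_ms - EPOCH_MS
    return (elapsed << (WORKER_BITS + SEQ_BITS)) | (worker_id << SEQ_BITS) | sequence


def timestamp_of(tweet_id):
    """Recover the creation time (ms since Unix epoch) from an ID."""
    return (tweet_id >> (WORKER_BITS + SEQ_BITS)) + EPOCH_MS


older = make_tweet_id(1_700_000_000_000, worker_id=3, sequence=7)
newer = make_tweet_id(1_700_000_000_001, worker_id=1, sequence=0)
print(older < newer, timestamp_of(older))
```

With IDs like these, a range scan over IDs returns tweets in creation order. Whether tweets from a similar time window also land on the same partition depends on the partitioning function, which is exactly what several commenters are asking about.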
You're right, in fact new graduates don't usually face system design interviews, at least not at the likes of Google, Meta, etc.