Don't leave your career to chance. Sign up for Exponent's system design interview course today: bit.ly/4a7wyQ2
Liked this video and want to see more? Click "Subscribe" to let us know!
Can a review be added at the end of each mock interview, covering what was done well, what was not, and how to improve it?
1. Spent way too much time on calculations (15 min). In a 45 min interview you want to spend a max of 5 min on that! Use rough estimates and ask the interviewer to help with assumptions. It would have been better to say, "I assume a post will be ~10-20 KB given the type of data," and no more :)
2. Then he started to actually draw his system 25 min in! You want to be able to spend the majority of your time drawing your solution, going back and forth with the interviewer.
3. The solution makes "saving a post" extra complicated.
4. The main question was not answered: how to display the feed. That feed should be optimized and cached, and you should explain how you can make it load as fast as possible without hitting the DB for every request.
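To make point 4 concrete, here is a minimal cache-aside sketch for the feed. Everything in it is an assumption for illustration and nothing comes from the video: the `FeedCache` class, the `load_feed` callback standing in for the slow DB query, and the 60-second TTL are all hypothetical. A real system would more likely use something like Redis or Memcached than an in-process dict.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class FeedCache:
    """Tiny in-memory cache-aside layer with a TTL (hypothetical stand-in for a real cache)."""

    def __init__(self, load_feed: Callable[[str], List[int]], ttl_seconds: float = 60.0) -> None:
        self._load_feed = load_feed  # hypothetical slow path that would hit the DB
        self._ttl = ttl_seconds
        self._entries: Dict[str, Tuple[float, List[int]]] = {}
        self.db_hits = 0  # counts how often we fell through to the slow path

    def get_feed(self, user_id: str, now: Optional[float] = None) -> List[int]:
        """Return the cached feed if it is still fresh, otherwise reload and cache it."""
        now = time.monotonic() if now is None else now
        cached = self._entries.get(user_id)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # served from cache, no DB call
        self.db_hits += 1
        feed = self._load_feed(user_id)
        self._entries[user_id] = (now, feed)
        return feed


# Usage: two reads inside the TTL cost one DB hit; a read after expiry costs another.
cache = FeedCache(lambda user_id: [3, 1, 2], ttl_seconds=60.0)
first = cache.get_feed("u1", now=0.0)
second = cache.get_feed("u1", now=30.0)
hits_within_ttl = cache.db_hits
cache.get_feed("u1", now=61.0)
hits_after_expiry = cache.db_hits
```

The tradeoff is staleness: a feed can be up to one TTL old, which is usually fine for a read-heavy home page.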
to everyone shitting on him, he was still in college at the time he did this interview. Realistically, a very small percentage of college students can answer these questions well.
Quick useful tips: jump to 18:00 to avoid a very long discussion on estimations and play at 1.25x speed.
One thing I like about what the interviewee did is that before drawing out the high-level architecture, he made a list of services that were needed and types of storage for each. I think that’s a great way to transition from requirements gathering to HLD as it acts like a checklist.
I was told to manage time in SD:
reqts + estimation - first 5 min
HLD - 20 min
deep dive - 15 min
roundup - 5 min
He could have asked, "What does a top post mean? Is it the # of upvotes, replies, or views the post has gotten in the last hour?"
12 minutes spent on some pretty random calculations…
I'm here a year after your comment; but yeah, this part was bad. Tbh it's not so much on the interviewee - the interviewer should have stopped him and moved him onto more important tasks
Pretty new to these questions at this point, but it seemed like the point was to gauge how much data would have to be queried per day, to drive home that it needs to be cut down as much as possible.
I guess it could have been condensed to "each post has 5KB of data", but it didn't seem too out of place to me.
You get only 45 minutes for the interview. The requirement phase + Core Entities + APIs should not be more than 6-8 mins.
Good design, but I think the feed generation part was the key ask and somehow he missed it, so she had to remind him. I believe that part of the solution needed more detail on how we can keep regenerating the feed every time. Maybe the subreddit ID and user ID mapping needs to be in the DB design as well.
"Can you cut that part out?" but they leave that request in 🤣
Cool to see this type of interview from someone who's just had internship experience and still in school. I would have to say Kevin is years ahead of me in terms of knowledge even though i've been working full time for almost 3+ years now. Definitely have to practice more myself! Thanks for these videos Exponent!
I do not think people should spend so much time on estimating the amount of bytes, it's not embedded and a quick guesstimate would suffice.
It might seem like you could make a better decision like that, but that's false. What if your estimates for an image are wrong? Then the whole thing is going to be exponentially wrong based on the error in the image estimation.
By the way, the images won't be served by Reddit's network bandwidth per se. An object store like S3 would be used, and a CDN too.
That's another argument for not going deeply into estimation before thinking about the architecture.
Hi Vallerious! Thanks for sharing your thoughts with us!
Serving from S3 or a CDN also counts as Reddit's bandwidth usage, as they're charged for it. Even if they use a third party like Akamai, they'd still be charged for the amount of data transferred.
I totally agree. The engineer who is interviewing you probably has no idea how to do back-of-the-envelope calculations.
Decent design, but I felt a few things were missing; we stayed at too high a level. There are a few challenges specific to Reddit that I would have loved to hear discussed. Reddit has an infinite level of comment nesting, so how do we handle that? Comments have relations, but given the massive number of comments we couldn't have gone relational, so it would have been fair to discuss the tradeoffs.
The 2nd thing was the votes. As software engineers we know how difficult counters are: to avoid race conditions you need a lock on the entry, and you cannot do that for every vote given how frequent they are. So again, tradeoffs....
Votes don't really matter all that much on reddit. Race condition or not, the perfect accuracy of votes is not a big deal imo. With reddit, all you need is eventual consistency and you can resolve conflicts by system clock (i.e. last write has precedence). Does it matter if you lose a few votes? Not really honestly. As for comment nesting, I think past a certain point, reddit sends you to a separate thread chain
@@harrywang9375 yeah if anything I wouldn't be shocked if reddit batched vote traffic and just tried to calculate the deltas to cut down on issuing writes to their database layer. They're more concerned with serving reads and stale data is OK. For signed in users you might even be able to crib ideas from twitter and build individual home pages in memory and keep them in some kind of queue data structure. Let your ranking system take a user's subreddits into account to build a home page and unsigned in users are treated like one big group of users with the same preferences. The amount of data to store pages of homepage data is pretty manageable and just like twitter if a user doesn't sign in for a while you can delete their cached homepage to save space. It also means that even if the server holding that cache data fell over it would be trivial to rebuild so it's really available.
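A rough sketch of the "batch the votes and only write the deltas" idea from the comment above. This is a guess at what such a buffer could look like, not how Reddit does it: `VoteBuffer`, `record`, and `flush` are made-up names, the "database" is a plain dict, and the sketch is single-threaded (a real service would need a lock or a per-worker buffer).

```python
from collections import defaultdict
from typing import Dict


class VoteBuffer:
    """Accumulates +1/-1 votes in memory and writes only net deltas on flush."""

    def __init__(self) -> None:
        self._pending: Dict[str, int] = defaultdict(int)

    def record(self, post_id: str, delta: int) -> None:
        """Buffer a single upvote (+1) or downvote (-1) without touching the store."""
        if delta not in (-1, 1):
            raise ValueError("delta must be +1 or -1")
        self._pending[post_id] += delta

    def flush(self, store: Dict[str, int]) -> int:
        """Apply the net delta per post to the store; return the number of writes issued."""
        writes = 0
        for post_id, delta in self._pending.items():
            if delta != 0:  # votes that cancelled out need no write at all
                store[post_id] = store.get(post_id, 0) + delta
                writes += 1
        self._pending.clear()
        return writes


# Usage: six incoming votes collapse into a single write.
store = {"p1": 10}  # hypothetical stand-in for the vote-count table
buffer = VoteBuffer()
for delta in (1, 1, 1, -1):
    buffer.record("p1", delta)
buffer.record("p2", 1)
buffer.record("p2", -1)
writes_issued = buffer.flush(store)
writes_on_empty_flush = buffer.flush(store)
```

The cost is that votes buffered since the last flush are lost if the process dies, which matches the "does it matter if you lose a few votes?" stance earlier in the thread.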
Votes would be a table themselves not an integer
Dude said "like" 50 million times that is more than DAU
I got frustrated so much that I had to pause the video and debate whether I wanna continue watching the video 😄. I ended up watching it.
What are the app servers in this case? Why can't the web servers talk to the microservices directly?
Maybe he was intending the App Server to be something like an API Gateway (e.g. Kong), where it routes to different microservices. It can handle security, routing, aggregation, caching, etc.
Good content! I think the interviewer did a pretty good job at specifying each microservice, and provided a very clear and logical thought process on how they interact with each other in the service flow. One thing I think could improve is setting aside a section to talk about the data model and database choices ~
What’s the point of asking SDE 1 systems design
It's a tutorial on how your system design interview should NOT be
What's the interviewer's name and experience? They should also provide the interviewee's and interviewer's profiles so that we know they have the relevant experience to understand which questions should be asked and which should not.
What is the point of these estimations at the start if they don't affect the end system design whatsoever?
Hi Hristian! Estimations help with gauging the technical requirements and scale of the system, establishing the “what” (what we are building) before diving into the “how” (how do we build it).
He was supposed to use the capacity estimates to determine what type of servers to use :/
why worry about sharding for NoSQL DB
The key component of horizontal scalability for any data storage type, regardless of SQL/NoSQL, is the partitioning of data into logical or physical "shards" (although they may be referred to differently by various NoSQL vendors, and some NoSQL db's may offer default partitioning strategies out of the box).
Put it in an S3 bucket and use CloudFront
Don't guess the capacity. Can you explain this principle?
thanks Kevin!
Thanks a lot. Please make a video on LLD (low-level design) as well; that would be really helpful.
Why did he do so much calculation at the beginning? What meaningful conclusions can that lead to? Can someone please educate me a little?
At least for the memory calculations, this will give you the amount of storage you will need in GB. Then you know how much you will need to have or buy. The last thing you want in these situations is to not have enough memory to store things!
Tbh they aren't particularly helpful.
If you're designing and architecting a solution, it does help to have an idea of the scale you'll be working with. For example, if you're building a site for your Grandma's bake sale and expect 0 - 20 clicks per day, that will require a different architecture than something like reddit with tens of thousands of clicks per second in various regions around the world.
The interviewer started by saying they'd have millions of clicks per day, and tbh that's all he should have needed in order to make an educated guess about how the architecture should look. Digging so deep into the calculations was not necessary, and tbh was more a sign of inexperience than anything.
I don't fault him, though. At this point in his career he seems quite new and doesn't have experience with building full systems. If anything I'd fault the interviewer for not interrupting him and getting him to focus on other aspects of the design.
Anyways hopefully that answers your question! (even though I'm coming to it almost a year later haha)
Thank you very much! Your answer is helpful!
He is totally wrong about the 6.25 billion posts. And here is how:
Daily active users are 50 million * 5 visits * 25 posts seen (in the last 24 hours).
That cannot be, and doesn't have to be, multiplied, because in the last 24 hours only a certain number of interesting articles are fed into Reddit's storage (let's say X), and every time someone visits Reddit, a few posts will be repeated. A lot of users will also share interests with each other, so that reduces this number for sure.
So instead of doing calculations on posts and number of users and blah blah, candidates should base their calculations on the number of posts that have come in during the last 24 hours (to narrow it down, he should be asking about posts of interest).
Make sense??
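The arithmetic behind this comment can be sketched in a few lines. The 50 million DAU, 5 visits, and 25 posts per visit are the numbers quoted above. The value used for X (new posts per day) is a purely hypothetical placeholder, since the thread never gives one, and the function names are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def feed_read_estimate(dau: int, visits_per_user: int, posts_per_visit: int) -> dict:
    """Post *views* per day and per second: read traffic, not distinct posts stored."""
    post_views_per_day = dau * visits_per_user * posts_per_visit
    return {
        "post_views_per_day": post_views_per_day,
        "post_views_per_second": post_views_per_day / SECONDS_PER_DAY,
    }


def average_views_per_post(post_views_per_day: int, new_posts_per_day: int) -> float:
    """How many times each distinct post is read on average, given X new posts per day."""
    return post_views_per_day / new_posts_per_day


estimate = feed_read_estimate(dau=50_000_000, visits_per_user=5, posts_per_visit=25)

# Hypothetical X: the number of distinct posts created in the last 24 hours.
NEW_POSTS_PER_DAY = 1_000_000
views_per_post = average_views_per_post(estimate["post_views_per_day"], NEW_POSTS_PER_DAY)

print(f"post views/day:    {estimate['post_views_per_day']:,}")
print(f"post views/second: {estimate['post_views_per_second']:,.0f}")
print(f"avg views/post:    {views_per_post:,.0f} (with X = {NEW_POSTS_PER_DAY:,})")
```

Read this way, the 6.25 billion figure is the read load on the feed path, while storage and write sizing depend on X. The same posts get read thousands of times, which is the commenter's point about repetition and also why caching pays off.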
Here is a talk about Reddit design by Reddit Engineers - th-cam.com/video/nUcO7n4hek4/w-d-xo.html
These interview videos are really helpful.
hi ..thanks much for the nice playlist..do we have system design for bookmyshow app?
Felt like he had a script prepared
Thank you.
You have 50 million daily active users and you add a cache and crawler in front of the posts service. Wondering how big that cache would be. Thanks for sharing, but I don't think this is an optimal solution.
Post service, feed service and ranking service kinda reek of functional decomposition
App server acts like a gateway but, does it make sense for the gateway to call one service ?
Sharding sharding sharding ... Why? Sounds like an over engineered solution
I don't know why people want to spend time doing calculations when their design ends up pretty generic and it is impossible to really prove whether it meets the numbers.
I need Exponent! Will apply tonight!
what software you are using for wireframing?
Hey gothboi3385, it's called "Whimsical"!
While this guy was bright and I'm sure will end up being a great systems engineer, why would you bring a college student on with no real-world experience outside of an internship to do this interview? It would have been much more valuable to viewers if you brought on a seasoned engineer to do this.
Hi D! We understand where you're coming from. However, in this video, our goal was to showcase that you don't need to be a seasoned systems engineer to excel in a mock interview. We want to encourage recent graduates who are currently job hunting and going through interviews. Of course, we also have other videos where we interview subject matter experts, so feel free to check those out too!
@@tryexponent Understood, I guess I'm not the target audience for this specific video. Will check out the interviews with the SMEs. Thanks for responding!
What was the point of all of those calculations?
Hey there! The calculations are used to estimate how much load your system will need to be able to handle. This part will impact how you reason about your high-level design.
To find out more, you can check out: www.tryexponent.com/courses/system-design-interviews/intro-architecture
Hope this helps!
could any one help me what tool was used for drawing the components
This whiteboard app being used looks like Whimsical.
I can see disappointment on her face 😂
What is the program he's using?
Hi DashOfjuice! The whiteboard program being used here is “Whimsical”. They have a free and paid version so do check them out if you are interested!
useless, just made boxes and connected them, no details whatsoever
By the way what's the host name?
Great!!! Grand Salute to you
Failed interview. No hire
I like the interviewee, but somehow all Exponent interviewers have like a scarily expressionless mild smile robotic vibe. Very uncanny valley.
she is beautiful.
got a feel that he is from a different profession. sorry..
omg the boy is so cute. All my focus is him, not the interview
For me the girl 😂
@@abhishekchauhan1203 Already failed
Why not shard by interests or subreddits? Sharding by post ID just seems totally useless. Web server and app server are not microservices to speak of. Why is there no talk of the frontend? Certain technology decisions on the frontend can cut the load by something like 10 times. E.g. using a SPA can move a lot of the compute load straight to the user's browser. Splitting the design into one system design for writing/posting and one for reading would simplify this. CQRS, anyone? Very little talk about caching. A system with that load should have caching front and center. Caching should even be done on the frontend. Overall, I think the answers are good starting points, but they could be much better and the design is definitely lacking.
Sharding by subreddit would blow up. Think about wallstreetbets.
You want your shards to have an even distribution; sharding by post ID can do exactly that.
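A quick simulation of the hot-shard argument. The post counts are invented for illustration (one huge subreddit with 9,000 posts and ten small ones with 100 each), as are the 4 shards and the helper names. MD5 is used only as a stable hash, since Python's built-in `hash()` for strings is randomized per process.

```python
import hashlib
from collections import Counter
from typing import Iterable, List, Tuple


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard with a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def make_posts() -> List[Tuple[str, str]]:
    """Hypothetical workload: (post_id, subreddit) pairs with one very hot subreddit."""
    counts = {"wallstreetbets": 9000}
    counts.update({f"small_sub_{i}": 100 for i in range(10)})
    posts = []
    next_id = 0
    for subreddit, n in counts.items():
        for _ in range(n):
            posts.append((f"post-{next_id}", subreddit))
            next_id += 1
    return posts


def load_per_shard(keys: Iterable[str], num_shards: int) -> List[int]:
    """Count how many posts land on each shard for the given shard keys."""
    counts = Counter(shard_for(key, num_shards) for key in keys)
    return [counts.get(shard, 0) for shard in range(num_shards)]


NUM_SHARDS = 4
posts = make_posts()
by_subreddit = load_per_shard((subreddit for _, subreddit in posts), NUM_SHARDS)
by_post_id = load_per_shard((post_id for post_id, _ in posts), NUM_SHARDS)

print("posts per shard, keyed by subreddit:", by_subreddit)
print("posts per shard, keyed by post ID:  ", by_post_id)
```

Keyed by subreddit, whichever shard owns the hot subreddit holds at least 9,000 of the 10,000 posts. Keyed by post ID, the shards should come out roughly even. The flip side is that "all posts in one subreddit" becomes a query across every shard, so the first commenter's instinct is not unreasonable for read locality. It is a tradeoff worth stating in the interview.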