Don't leave your engineering management career to chance. Sign up for Exponent's EM interview course today: bit.ly/3wQmHQu
I have given HLD interviews, and they are not just about drawing high-level boxes. What matters is what kind of boxes (type of cache, type of DB) and how they will solve the problem (e.g., how will the cache and rate limiter interact to decide how many requests a user has made in a window). This definitely needed a more elaborate discussion. Informative otherwise, thanks!
This feels very scripted. It's almost like he is reading out of a reference book.
and the reference book is Alex Xu's book, chapter 6; everything is word for word from that book
and it's completely one-sided
He’s probably built rate limiters before !
Straight outta the Alex Xu book. Is this scripted? It doesn't feel like a conversation at all; all rote learning. Just my two cents here, still a good video.
Almost word for word... especially when it mentions the pros and cons of each rate limiter algorithm and why one would choose to implement the rate limiter on the server side versus the client side. Regardless, it's helpful to see how someone would communicate these concepts in an interview scenario :D
I was actually reading along with the video, more or less. XD
I would want to hear a discussion of locking (or not locking) the high-throughput cache while writing. Great video overall!
The sliding window approach as explained seems the same as token bucket. I think in a time-based sliding window, each request has a timestamp, and whatever requests fall within the window are eligible to be processed. When a new request arrives, the window slides toward the new request's timestamp, removing any older requests that fall out of the window; those removed requests get a 429. Otherwise, the window won't slide at all unless it can accommodate the new request; if it can't, new requests get a 429. Either approach is fine depending on requirements.
It's not quite the same. They differ in accuracy, in how much memory they use (token bucket uses more than windows), and in how well they deal with bursty traffic. Token bucket handles bursts implicitly, whereas bursts cause over-throttling with a fixed window. Sliding window has refinements like the sliding window log (very accurate since it keeps per-request timestamps, but uses a lot of memory) and the sliding window counter (technically an approximation, but it saves memory and smooths the request rate). So depending on the problem statement, one may very well be a better choice than the other. These tradeoffs are discussed in chapter 4 of Alex Xu's System Design book, vol. 1.
@oakvillian5 What do you even mean? If your bucket has 100 tokens and it gets replenished every minute, and you get 100 requests right when it gets replenished, then you'll spend an entire minute dropping requests. Fixed window and a simple bucket are exactly the same; the difference is that one counts from 0 to the quota and the other from the quota to 0 haha
@Damian-cd2tj Buckets refill continuously, so there's more overhead. They are also more accurate when you want to throttle based on consistent usage, like X MB/s. If requests don't line up with your window, you could throttle customers unnecessarily and increase load on the system due to retries, so it does matter.
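For anyone who wants to see the difference concretely, here is a minimal single-process sketch of both algorithms. The class names, capacities, and rates are illustrative assumptions, not taken from the video; a production limiter would keep this state in a shared store like Redis.

```python
import time
from collections import deque

class TokenBucket:
    """Refills continuously; bursts are absorbed up to `capacity`."""
    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity        # max tokens the bucket can hold
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Continuous refill based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

class SlidingWindowLog:
    """Keeps a timestamp per accepted request; accurate but memory-hungry."""
    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.log = deque()  # timestamps of accepted requests

    def allow(self) -> bool:
        now = time.monotonic()
        # Evict timestamps that have slid out of the window.
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(now)
            return True
        return False

tb = TokenBucket(capacity=100, refill_rate=100 / 60)  # ~100 req/min, bursts OK
swl = SlidingWindowLog(limit=100, window_seconds=60)  # exactly 100 req/min
```

Note how the bucket's state is two numbers regardless of traffic, while the log grows with the request rate; that is the memory tradeoff mentioned above.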
I think the part about making the rate limiter distributed could be explained better. What does "one common cache" mean? Also, the "read cache" and "write cache" were quite confusing, but the interviewer didn't do her job of digging deeper.
Hey Linchuan! Thanks for the valuable feedback!
Yes. I was thinking we could have multiple cache servers and use hashing on the IP address to select the right cache for a given user's data.
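In case it helps, here's a minimal sketch of that idea. The shard addresses are made up for illustration; the key point is using a stable hash so every rate limiter replica maps the same IP to the same cache shard.

```python
import hashlib

# Hypothetical cache shard addresses.
CACHE_SHARDS = ["cache-0:6379", "cache-1:6379", "cache-2:6379"]

def shard_for_ip(ip: str) -> str:
    # Use a stable hash (not Python's per-process randomized hash())
    # so all replicas agree on the mapping.
    digest = hashlib.sha1(ip.encode()).digest()
    index = int.from_bytes(digest[:4], "big") % len(CACHE_SHARDS)
    return CACHE_SHARDS[index]

print(shard_for_ip("203.0.113.7"))  # always the same shard for this IP
```

A real deployment would likely use consistent hashing instead of a plain modulo, so that adding or removing a shard remaps only a fraction of the keys.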
How come this vid has only 2k views? Awesome content!!
Felt the same.
It doesn't feel like a real interview
Won't checking the cache on every request to see whether an IP is blocked be detrimental to the system? In a peak-load scenario the window will slide in sub-milliseconds, so a request that is in the cache may be invalid for the new window duration.
A rate limiter that sits between the load balancer and the web servers doesn't seem like a neat design at all, because it creates endless trouble. How do you decide which web server to send a request to when it passes the rate limit? Does the rate limiter service maintain the web server auto-scaling group information?
It should be a sit-aside service that the LB (or the web servers) calls to do a check. Or it could even be a library in the web server, but definitely not a pass-through component.
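For what it's worth, here's a minimal sketch of that sit-aside option, assuming the web server calls a separate rate limit service over HTTP before doing any work. The endpoint is hypothetical.

```python
import urllib.error
import urllib.request

RATE_LIMIT_SVC = "http://ratelimiter.internal/check"  # hypothetical endpoint

def is_allowed(client_ip: str) -> bool:
    """Ask the sit-aside rate limit service whether this request may proceed."""
    try:
        url = f"{RATE_LIMIT_SVC}?ip={client_ip}"
        with urllib.request.urlopen(url, timeout=0.05) as resp:
            return resp.status == 200        # 200 = allowed
    except urllib.error.HTTPError as e:
        return e.code != 429                 # 429 = throttled
    except OSError:
        return True  # fail open: don't drop all traffic if the limiter is down

# The web server calls is_allowed(...) and only forwards work if it returns True.
```

Whether to fail open or fail closed when the limiter is unreachable is itself a design tradeoff worth raising in the interview.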
Hi yzhan004, thank you for your comment! It is a common approach to place the load balancer before the rate limiter. This configuration allows for an even distribution of traffic to the web servers, followed by the application of rate limiting measures.
However, in the specific case of restricting users based on IP addresses, it is advisable to place the rate limiter before the load balancer. This placement enables early evaluation and enforcement of IP-based restrictions, enhancing system efficiency and security.
Your observation is accurate, and it is indeed a smart move to put the rate limiter before the load balancer when implementing IP-based restrictions.
Thank you for pointing that out! 💪
This was wonderful!
Did the candidate even design the rate limiter? The most important part of the design is the actual rate limiter component, and they just put down two boxes called "API Rate Limiter". Maybe the interviewer would get enough signal from this, maybe not. It definitely could have been better.
Thanks for the video!
Could it not cause problems to rate limit on IP if multiple users are behind the same IP? Like in the case of CGNAT or VPN or similar?
The device also has a unique address.
@kolya6955 You can't get the MAC address from an HTTP call.
I was thinking the same thing. Bro's a LeetCode monkey 🤦🤦
You're correct, and this is also a common implementation you see in the real world. I often see throttling errors since I'm behind the firewall at a big company. Designs like this are often about tradeoffs, and I think a great candidate can explain them and make an informed choice.
Awesome content!!
pls make more videos❤
Hey darshitgajjar5199, glad you are enjoying our content!
Let us know what types of videos you are looking for! More system design mocks?
In a distributed env, I was thinking: what if the load balancer did geolocation-based routing, and each region had its own rate limiter with its own isolated, region-specific cache? No?
What if the client changes their IP to another country by using a VPN? Then it defeats the purpose of the rate limiter.
@viraj_singh Even setting that aside, the idea here is that we are rate limiting on the basis of IP; if the client changes country by using a VPN, the IP changes and the rate limiter treats it as a new user anyway. So redirecting based on geolocation seems good enough to me (I also had that in mind).
Thanks. I’ve learned something new
Hitting the cache for every request from every customer to check the rate limit: won't this bring down the cache service/server immediately under even slightly higher load?
Is there a better approach to keep the cache service from going down?
Will he get a hire/strong hire if this answer is given in a real interview?
I would probably have given a weak hire rating for an L6 position. Context: I work at Google and regularly conduct these kinds of interviews.
@groovymidnight How would you suggest I study HLD? With 2 yrs of exp, I was caught off guard by "implement idempotency at the framework level".
It would be better to dig into the consistency and latency issues of using a centralized cache, and to discuss solutions like pessimistic and optimistic locks and sharding.
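One concrete angle on this: many implementations avoid explicit locks altogether by leaning on the cache's atomic operations. Here's a minimal sketch of a fixed-window counter using Redis's atomic INCR; the key format and limits are illustrative assumptions.

```python
import time
import redis  # pip install redis

r = redis.Redis(host="localhost", port=6379)

def allow(client_ip: str, limit: int = 100, window: int = 60) -> bool:
    # One key per client per window. INCR is atomic, so concurrent rate
    # limiter replicas hitting the same centralized cache never lose updates.
    bucket = int(time.time() // window)
    key = f"rl:{client_ip}:{bucket}"
    count = r.incr(key)
    if count == 1:
        r.expire(key, window)  # let old windows expire on their own
    return count <= limit
```

There's still a small gap between the INCR and the EXPIRE; a production setup would typically wrap both in a Lua script so the whole check is one atomic round trip.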
What drawing tool are you using in the mock interview session?
Hey birajendusahu3198, it's "Whimsical"!
If we use the IP address, wouldn't it block all the users in a network, because they would all have the same public IP?
Hi Praveen! This won’t be an issue as the NAT router will do the translation such that the port numbers are not identical before sending it to the external server. So although the users in the same network will have the same public IP, they will have different port addresses which the server can use to identify the users.
More info here: stackoverflow.com/questions/1982222/how-do-two-computers-connect-to-same-external-address-through-nat
Hope this helps! Thanks for watching!
@tryexponent Thanks! In that case the user identifier would be ip:port.
What application is Hozefa using? Thanks in advance.
Hey ThelmaPriscila! It's called "Whimsical"
Thanks!
Feedback: there was no discussion between these two; they were each just running at their own speed...
429 implies the client needs to adjust its behavior.
529 suggests the server is experiencing temporary difficulties.
So I would recommend your rate limiter respond with 429, not 529. You mentioned both, but you drew 529.
The interviewee and your comment are both wrong here... you can only respond with a 429 if there are too many requests from the same client ID, which rarely happens. Instead, I'd design the system to respond with a 503 error (the server is not ready to handle the request, as it's busy processing others, so slow down). Please check the official AWS S3 documentation on 503 server timeouts for reference.
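Whichever status code a design settles on, the shape of the rejection is worth showing. Here's a minimal sketch of the 429 path with a Retry-After hint, assuming a Flask app and an in-memory fixed window (the video's actual stack isn't specified).

```python
import time
from flask import Flask, jsonify, request

app = Flask(__name__)
WINDOW, LIMIT = 60, 100  # illustrative: 100 requests per minute
counters: dict = {}      # (ip, window bucket) -> request count

def allow(ip: str) -> bool:
    # In-memory fixed window, just enough to demo the response path.
    bucket = int(time.time() // WINDOW)
    counters[(ip, bucket)] = counters.get((ip, bucket), 0) + 1
    return counters[(ip, bucket)] <= LIMIT

@app.route("/api/resource")
def resource():
    if not allow(request.remote_addr):
        resp = jsonify(error="rate limit exceeded")
        resp.status_code = 429                     # RFC 6585: Too Many Requests
        resp.headers["Retry-After"] = str(WINDOW)  # hint for well-behaved clients
        return resp
    return jsonify(ok=True)
```

The Retry-After header matters either way: it turns a hard rejection into a backoff signal, which cuts down on the retry storms mentioned elsewhere in these comments.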
I liked how Huzaifa tried to act dumb the whole time, while being an EM.
The lady has no idea what's going on; she just says yes to everything.
Waiting for an interesting and deep video, rather than one that only discusses caches, load balancers, ...
thanks for this
Isn't the cache a single point of failure?
IP blocking might not be the best way to go, as with DHCP or a proxy the IPs can be different. IP blocking can be done with a WAF, and then we might not need a rate limiter. Just my viewpoint.
There is another pretty powerful technique not mentioned in this video. You can cache at the token level: the backend can encrypt basic data into the token and look it up when the client calls. It could eliminate the need for some of the other caches used in this video.
Can you please elaborate on this? I didn't clearly understand this solution.
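Here's one possible reading of the idea, sketched with a signed (rather than encrypted) token: the backend embeds the client's rate limit parameters in the token itself, so the limiter can read them without a lookup in a rules cache. The secret, field names, and limits are all illustrative assumptions.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"server-side-secret"  # assumption: shared by the issuer and the limiter

def issue_token(client_id: str, limit_per_min: int) -> str:
    # Pack the limiter-relevant data into the token and sign it.
    payload = base64.urlsafe_b64encode(
        json.dumps({"cid": client_id, "limit": limit_per_min}).encode())
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return f"{payload.decode()}.{sig}"

def verify_token(token: str):
    # The limiter recovers the data locally; no cache round trip needed.
    payload, _, sig = token.rpartition(".")
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None  # tampered or forged token
    return json.loads(base64.urlsafe_b64decode(payload))

tok = issue_token("stripe-prod", 1000)
print(verify_token(tok))  # {'cid': 'stripe-prod', 'limit': 1000}
```

Note that the per-client request *count* still has to live somewhere shared; what this removes is the per-request lookup of the client's rules and limits.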
YES, you can have different rate limiters per geography!!! Of course!! Not "I don't think so, it will put you in a vulnerable position". Imagine a situation where the backend servers you connect to from the rate limiters are set up differently in different geographies; then each geography has different capabilities and can handle incoming requests at different rates!! And if you are rate limiting based on IP, why would you want a shared cache between geographies? Requests from one IP always go to the nearest geography based on DNS. DUDE!!
I think the rules cache is pointless; just push rules to all rate limiter replicas whenever the rules change, which should be very rare. One could also use a CDN for that (say, a JSON file with the rules) and have the API rate limiter servers read it now and then.
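A minimal sketch of that CDN variant, in case it's unclear: each replica polls a rules JSON on a timer and keeps the last good copy on failure. The URL, interval, and rule shape are illustrative assumptions.

```python
import json
import threading
import urllib.request

RULES_URL = "https://cdn.example.com/rate-limit-rules.json"  # hypothetical
_rules: dict = {}  # last successfully fetched rules

def refresh_rules() -> None:
    global _rules
    try:
        with urllib.request.urlopen(RULES_URL, timeout=5) as resp:
            _rules = json.load(resp)
    except OSError:
        pass  # keep serving the last good copy if the fetch fails
    t = threading.Timer(300, refresh_rules)  # poll again in 5 minutes
    t.daemon = True
    t.start()

refresh_rules()
```

Since rules change rarely, a few minutes of propagation delay is usually acceptable, and the CDN absorbs the read load that a rules cache would otherwise take.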
It should be 429
How did this guy get into Meta?
There are bot engineers all over FAANG; no need to target him.
The interviewer didn't cross-question much and almost agreed with whatever the interviewee was saying, so this doesn't give an idea of what the cross-questions could be for each decision taken while designing.
Rate limiter system design in Hindi : th-cam.com/video/khhe7avsw1g/w-d-xo.html
Easy to understand...
Is this an engineering manager? He appears more nervous than me lol.
Another question I had was at the end: suppose I am a customer like Stripe, being rate limited by a fixed set of rules in the rules engine, and I have no malicious intent. I am getting a 429; who needs to make the change? There is a possibility that the API contract has changed or the rules have become obsolete. Will this cause a maintenance headache? @hozefa
This is so bad. You argued that you're going to globally rate limit users based on IP. But when asked whether you could route users to a different rate limiter based on geographic location, you said no. Your justification was that the IP address could be "easily faked" with a VPN. Hahahah. Make it make sense. His design is poor, and she just nods her head and agrees to everything.
Rate limiting on IP? What if there is a whole family on Facebook? They will have the same IP. It should be on userId!
It shouldn't. You could make like 1k fake accounts, and then if you had a rule per userId, you could still make 1000x the calls from one IP using every account at the same time, without even changing the IP address. I doubt Facebook would block a family; we are talking about hundreds or thousands of requests per minute/second, not 20 requests per second :D
A computer science graduate can do this much design. He should have talked more specifically about the rules engine.
Wrong answers
System design should never be just talking without drawing anything and showing. SHOW YOUR THOUGHTS!!!!
My SDE friend at Amazon recruits for L4 and L5 positions. Going by his tips, the kind of answers in this video would not pass at all.
The interviewee starts spewing and regurgitating knowledge without caring about what the customers want. There were no questions about what this rate limit would be for. It just tells the interviewer that you only know one way to implement it, and cannot weigh the pros and cons of different approaches or explain why you need a certain type of limiter design.
Shouldn't he be conversing instead of mansplaining?
Knowledge wasted; this video could have used more of a format. You are not discussing things with any structure.
Watched.
It is all very stupid. Concurrent requests or threads is the only correct answer; that is what runs out of memory. Everything else is useless solutions, just showing off.
Don't say "ah ah ah" before every word 🤦‍♂️ it's irritating
feels unnatural and scripted
What a pure waste of time... Even LKG kids can do better than meta EMs, I guess.
Bullshit
How annoying are his communication skills.
seems like this guy doesn't know his stuff