One thing I really like about your video is that you always tried to explain things from the client side, which makes it highly understandable for us as clients in our daily life. Thanks for your video!
Good work. There are a few important topics worth covering, though. 1. What happens to a message sent to an offline user? 2. What choices of storage do we have? 3. What kind of load + image/video messages?
1. All messages are stored in the database. Once the receiver comes online, receiver's client application checks with the chat server if the receiver has any pending messages that he/she should receive. If there are, then those messages are delivered to the user's client application via websocket. 2. A wide column database like HBase will be good for this use-case (i.e. lots of small messages with very low relation). 3. I didn't understand what you mean here. For storing images and videos we should make use of CDNs.
For no. 1, the answer depends on how you "persist" the messages. With queues, a pull request can suffice, as all messages in the queues are "pending to be delivered". The trouble (if it is indeed trouble) would be managing such a large number of queues (e.g. one queue per friend pair). However, if messages are stored in a DB, there has to be information like "what's the oldest message that client A hasn't received yet?" so that message order can be retained. And more questions come up, like how sharding/partitioning is done, etc.
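To make the offline-delivery answers above a bit more concrete, here is a minimal sketch, assuming an in-memory per-recipient queue stands in for whatever database or message queue is actually used. `PendingStore`, `enqueue`, and `drain` are hypothetical names for illustration, not anything from the video.

```python
from collections import defaultdict, deque


class PendingStore:
    """In-memory stand-in for the store of undelivered messages (hypothetical)."""

    def __init__(self):
        # One FIFO queue per recipient keeps messages in arrival order.
        self._pending = defaultdict(deque)

    def enqueue(self, recipient_id, message):
        """Keep a message for a recipient who may currently be offline."""
        self._pending[recipient_id].append(message)

    def drain(self, recipient_id):
        """Return and clear everything pending for a recipient, oldest first."""
        return list(self._pending.pop(recipient_id, deque()))


store = PendingStore()
store.enqueue("B", "hi")
store.enqueue("B", "are you there?")

# When B reconnects, the client pulls once and then switches to push.
delivered = store.drain("B")
```

A real system would also need acknowledgements before clearing the queue, so a message is not lost if the connection drops mid-delivery.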
I have only had one or two serious system design interviews till now, and the facial expressions of the interviewer really put you off sometimes. I have never seen anyone explain system design like this; it's good to know that I'm not very far off.
Great video! About the 'last seen' feature: I don't think the Last Seen is based on the last activity (e.g. sending a chat) but rather when the user closed the connection to whatsapp. So in my opinion, as soon as 1) the network is interrupted or 2) the user closes the application, the 'last seen' should be updated.
Hi Gaurav, great video. A few suggestions: 1. Keeping too many gateways would hurt your latency [or maybe I'm overthinking]. If we have some services that authenticate and establish the WebSocket connection, and then other services that work as mediators and communicate back and forth with the client, we can reduce the number of gateways we need. [Essentially the gateway would do this, but we don't need as many gateways as there are clients; imagine 50 million clients at a time on WhatsApp.] 2. We should not stick with only a push mechanism, where you said that "the server will keep trying to send the message to B"; a pull technique should be used here instead. What I mean is, once a user successfully establishes the connection to the chat server, the client asks "do you have anything for me?" and the server responds (through a queue, so that message order is preserved), and after that it is all push until the same thing happens for a different user. This way the server does less redundant work checking repeatedly. Again, think about 50 million users: even if only 0.01% [at least 5K users] of them chat with each other, a very high number of recipients might not be online, and the server keeps retrying for them. 3. We don't need to keep checking the server database for where the user is; we should cache that. That may be 50 million entries at most at a time. As soon as the user goes offline, remove them [make sure the app sends the right information: don't send it when the user just backgrounds the application and will come back in a minute or so, only when there is no internet connection]. In essence, the connection would be of two types {1. active, 2. idle, which helps to build last seen too}. When a connection is idle, we can utilise it for other chats, and as soon as the user receives a message we'll either create a new connection (if that one has gone away) or use the same one.
This is a trade-off, you see: making a new connection is costly, but keeping a lot of idle ones is also poor. And if we mark a user as offline while they are still connected to the internet, we'll lose real-time communication.
If the first client connection is a pull, will that not create an overhead, as a client can connect and disconnect too frequently due to an unstable internet connection? Or is that acceptable?
No. 2 makes sense when queues are in place or the messages themselves are persisted somewhere in a database (so that a select can be performed). For no. 1, a server box can likely handle only a few million connections at most, so at WhatsApp scale, several tens of gateways will be needed.
@@shubhimohta8488 that should be fine, 1. for most users, connections are there for some time. 2. the payload over a pull is generally not very big and is acceptable .
Hey man, nice video, but there's a mistake: long polling is not polling every minute to get updates. Long polling is when you keep an HTTP request open (think: spinner in Chrome while the page is loading) until the server decides to respond. This in effect makes it near real-time. It's not how WebSockets themselves work, but some real-time libraries fall back to it when WebSockets are unavailable.
I was going to mention the same. Major problem in long polling is that if the amount of data on server is too much then you need to keep creating the new paginated requests. But otherwise for small push messages long polling works well.
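A minimal sketch of the long-polling behaviour described above, assuming a blocking queue stands in for the held-open HTTP request. `LongPollServer`, `publish`, and `poll` are hypothetical names; a real server would hold an actual HTTP response open instead.

```python
import queue
import threading


class LongPollServer:
    """Toy server: a poll blocks until a message arrives or the timeout expires."""

    def __init__(self):
        self._inbox = queue.Queue()

    def publish(self, message):
        """Something changed on the server side; wake up a waiting poll."""
        self._inbox.put(message)

    def poll(self, timeout_seconds):
        """Hold the 'request' open and respond only when there is something to send."""
        try:
            return self._inbox.get(timeout=timeout_seconds)
        except queue.Empty:
            return None  # Timed out; the client is expected to poll again.


server = LongPollServer()
# A message arrives 0.1 s after the client has started waiting.
threading.Timer(0.1, server.publish, args=["hello"]).start()
first = server.poll(timeout_seconds=2.0)    # returns as soon as "hello" is published
second = server.poll(timeout_seconds=0.05)  # nothing arrives, so this poll times out
```

The point of the sketch is that the client does not re-ask on a fixed schedule: the response is simply delayed until there is data, which is why it feels near real-time.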
Hi Gaurav, thank you so much for these videos. You're the real deal :-) I was trying to dig in deeper (apologies if these are trivial queries). I was trying to understand the handshake between your gateway and the Sessions Server. Let's say Person A is chatting with Person B. Step 1: Person A hits the gateway (a reverse proxy) that will do most of the heavy lifting (auth, TLS termination etc.) and detect the websocket handshake. Step 2: The gateway forwards the request to your Sessions Server (which is your websocket server). Step 3: The Sessions Server receives the request and sends back a response: HTTP/1.1 101 Switching Protocols, Upgrade: websocket. My understanding gets a bit hazy from here (please also correct me if something is amiss in the above three steps). Do all further communications between A and the proxy now use websockets, and then the same protocol to hit the Sessions Server? Hence, the client calls something like wss://www.whatsapp.com/socketserver and hits the reverse proxy, and that forwards the request to the Sessions Server? Or am I missing something? If this is in an application like Slack, then it would switch between wss and http based on the service it's trying to invoke (say chatting vs creating a channel).
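For Step 3 above, here is a small sketch of how a WebSocket server builds the 101 response, following the opening handshake in RFC 6455. The function names `accept_key` and `handshake_response` are made up for illustration; the GUID and the SHA-1 plus Base64 computation come from the RFC.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(sec_websocket_key: str) -> str:
    """Compute Sec-WebSocket-Accept from the client's Sec-WebSocket-Key header."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def handshake_response(sec_websocket_key: str) -> str:
    """Build the HTTP response that switches the connection to WebSocket."""
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {accept_key(sec_websocket_key)}\r\n"
        "\r\n"
    )


# Sample key taken from the example in RFC 6455.
response = handshake_response("dGhlIHNhbXBsZSBub25jZQ==")
```

After this response the same TCP connection carries WebSocket frames in both directions, so a reverse proxy in front has to keep relaying bytes on that connection rather than treating it as a one-off HTTP request.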
well narrated and presented. never knew whatsapp and tinder has so much thing at backend. interesting when you ask someone for a coffee. and she accepts the coffee invitation. both are important sugar in coffee and web sockets and the micro services. lol. Hats off. truly brilliant .
I watched other videos on messaging systems and this video has a very detailed approach, I was hunting for videos just to know how to have multiple servers for socket processes when the client is a mobile app and I feel this video has helped me a lot with that.
Wow Gaurav, you should pat yourself on the back for being so popular. Lyft HR in the USA forwarded this link of yours as an example to get acquainted with system design interviews. Kudos on this achievement.
Good video! I would have liked it even more if you had included the following points as well, because I felt lost there. 1. How does the gateway manage the TCP connections for each user? Is it in memory or in an external store? What happens to the clients' connections and messages when the gateway goes down? 2. How does the session service talk to the gateway? Is it using RPC or message passing etc.?
1. A map of userId to its corresponding connection object is stored in the chat server's (gateway's) memory/RAM. If a gateway goes down, all of the online clients connected to the affected gateway will lose their websocket connection, but they will quickly make a new connection to another functioning gateway.
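A minimal sketch of the in-memory map described above, assuming one process per gateway. `Gateway`, `FakeConnection`, and the method names are hypothetical; a real gateway would hold actual WebSocket connection objects.

```python
class FakeConnection:
    """Stand-in for a WebSocket connection object (hypothetical)."""

    def __init__(self):
        self.sent = []

    def send(self, message):
        self.sent.append(message)


class Gateway:
    """Keeps userId -> connection in process memory, as the comment describes."""

    def __init__(self):
        self._connections = {}

    def register(self, user_id, connection):
        """Called when a client completes its WebSocket handshake."""
        self._connections[user_id] = connection

    def unregister(self, user_id):
        """Called when the connection closes or the client reconnects elsewhere."""
        self._connections.pop(user_id, None)

    def deliver(self, user_id, message):
        """Push a message if the user is connected here; report whether it was sent."""
        connection = self._connections.get(user_id)
        if connection is None:
            return False
        connection.send(message)
        return True


gateway = Gateway()
conn_b = FakeConnection()
gateway.register("B", conn_b)
sent_to_b = gateway.deliver("B", "hello B")
sent_to_c = gateway.deliver("C", "hello C")  # C is not connected to this gateway
```

Because the map lives only in RAM, it disappears if the gateway crashes, which is why clients have to reconnect and re-register as described.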
I visited the Uber Hyderabad office the other day for an event. Thought of meeting you. Never had a chance to reach out hope I will meet you someday. Thanks a lot for your amazing videos man. They help a lot.
@@shobhitsingh7735 i got other thing in priority last year. but now i will have microsoft interview for sde2 in 1/2 weeks......... jesus.....so now i am reviewing everything...
You did a great job of listing the microservices and interaction between them. One thing which I would like to know more is the database design. How do we maintain a system with so many chats each with thousands of messages growing over time.
Once WhatsApp successfully sends a message to B, they will delete it from their end and save whatever to your GDrive / iCloud. They do not store billions of messages. N.B: I don’t work in whatsapp.
A moderately sized server can maintain around 25k-35k simultaneous TCP connections, so a single server can serve about 25k-35k users (assuming that every user uses just one client). And to handle the ever-growing user base there is no option other than adding more physical servers.
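As a rough back-of-the-envelope check on that estimate, here is a small sketch. The 25k-35k connections per server comes from the comment above; the 50 million concurrent users is only an illustrative assumption borrowed from another comment in this thread, not a real WhatsApp figure.

```python
def gateways_needed(concurrent_users: int, connections_per_server: int) -> int:
    """Ceiling division: every user needs one connection slot on some server."""
    return (concurrent_users + connections_per_server - 1) // connections_per_server


# Illustrative assumption only, not a measured number.
CONCURRENT_USERS = 50_000_000

servers_at_25k = gateways_needed(CONCURRENT_USERS, 25_000)  # 2000
servers_at_35k = gateways_needed(CONCURRENT_USERS, 35_000)  # 1429
```

So under these assumptions the gateway fleet would be somewhere between roughly 1,400 and 2,000 servers, before adding any headroom for failures or traffic spikes.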
I really like the way he smiles and teaches. I know nothing about systems but that smile alone made me comfortable to watch the video......keep the same passion my friend......
A critical piece that I feel is missing here is delivery of messages when the user is offline. You might not have an active connection to all users at all but you do need to deliver all the messages that were sent to them once they are connected.
+1 I was asked this question in an interview a while back. First question when I'd done simple send receive was "What If user B is offline and not in session".
I'm not into programming, though I appreciated this video. I'd really like you to publish a similar lesson, but for Telegram. Thanks in advance, and keep up the good work.
Wonderful video which covers all the technical concepts involved. This is incomplete without any resource estimates like storage/bw and what kind of stores to use like a Key/Value store like etcd will work etc.
Hi, you mentioned how we determine online status, what if the user opened the whatsapp but hasn't done any activity post that on whatsapp, just sitting idle [how we determine online status as per the discussion], but still online.
Great video, helpful!! I feel the 'Last Seen' feature could be improved (based on the steps described in the video). The only shortcoming I see with the current approach is: what if the user replies to a message (i.e. an action taken by the user) directly from the notification without opening the application? One approach could be to send a flag in the request, { 'fromNotif': true }, that could be used to decide whether the Last_Seen table needs to be updated. Another could be to use the WebSocket connection itself and its disconnect callback (once the user leaves the application, the WebSocket connection can be disconnected). Push notifications can be used to update the message count once the user is outside the application, and more approaches could be formulated further. Nevertheless, your video does help beginners in system design.👍
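The disconnect-callback idea above could be sketched roughly like this, assuming the gateway calls `on_connect` and `on_disconnect` when a WebSocket opens or closes. `PresenceTracker` and its method names are hypothetical, and the injected clock is only there to keep the example deterministic.

```python
import time


class PresenceTracker:
    """Tracks online users and records last seen at disconnect time (hypothetical)."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._online = set()
        self._last_seen = {}

    def on_connect(self, user_id):
        """Called when the user's WebSocket connection is established."""
        self._online.add(user_id)

    def on_disconnect(self, user_id):
        """Called from the disconnect callback: app closed or network dropped."""
        self._online.discard(user_id)
        self._last_seen[user_id] = self._clock()

    def status(self, user_id):
        """Return 'online', the last seen timestamp, or None if never seen."""
        if user_id in self._online:
            return "online"
        return self._last_seen.get(user_id)


tracker = PresenceTracker(clock=lambda: 1000.0)  # fixed clock for the example
tracker.on_connect("A")
status_while_connected = tracker.status("A")
tracker.on_disconnect("A")
status_after_disconnect = tracker.status("A")
```

With this approach a reply sent from a notification does not touch last seen at all, since no connection was opened or closed by the app itself.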
Hi Gaurav, fantastic architecture design and explanation. But for group messaging, the "session" microservice is calling another "Group" microservice, which is a kind of HTTP chaining and potentially an example of an anti-pattern. Can't we implement a messaging mechanism like distributed Kafka to make it more decoupled? Just a thought of mine. Let me know your take on this.
Would be great if you have another video to break it down feature wise and why these were the final choices in terms of languages/tools/concepts used to implement all this. Maybe helpful for people coming from non-backend development
@@gkcs The concepts of system design are in place. I saw that playlist. What I mean is, if one has to implement this architecture, we have to think about which features go in what languages. Should this part be in Python/Node? Should we consider gateway-to-database communication in Go routines or Java, etc.? Those kinds of questions.
@@kushal1 Aren't those questions, too vague? I have never seen a system design interviewer ask them. At the end of the day, I may like everything implemented in binary opcodes for efficiency, but I have to perform as per the deadlines set by the product.
@@gkcs Wow! Someone is rude now. I never mentioned this from an interview perspective. It was a suggestion to dig into the tech stack and maybe make another video. Thanks anyway for your content.
@@kushal1 Sorry if I sounded rude, it wasn't my intention in the slightest. Replying to comments all day has taken off my tact perhaps. I'll look into this a little more and get back to you. Digging into the tech stack will certainly be of use :)
QQ: if you try to keep gateways light and delegating all complicated logic and memory usage to other services, you will still have to scale them per number of users/messages. So either there should be some discussion around batching between gateway and sessions or you can just scale the number of gateways and keep some logic there (for example parsing).
Great Video, Really liked this one...!!! Just wanted to ask that for the last seen timestamp wouldn't it be better if the client sends a request(Or should I say message!) to the server when the user exits the messaging app(maybe when he exits the main app activity)., Wouldn't it save a lot of overhead for the last seen timestamp(Not taking into consideration the online stamp)?
@@gkcs I didn't know anything about system design until I started watching your videos. Starting to get interested in system design because of your effort.
Finally found this kind of tutorial. Thank you very much, very good explanation. But one thing I have not understood is the gateway. I know what it does but don't know what it is :)
Thanks for your effort! It's great to learn lots of things from your video. But I still have a question after watching it. Why can consistent hashing help to reduce duplicate information (at 20:57, explaining the group service)? Isn't it just a way of knowing which record should be found on which server?
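For anyone wondering what "knowing which record should be found on which server" looks like, here is a minimal consistent hashing sketch. It assumes group IDs are hashed onto a ring of virtual nodes; `HashRing`, the server names, and the choice of MD5 are illustrative assumptions, not what the video or WhatsApp uses.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a large integer position on the ring."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent hash ring with virtual nodes (a sketch, not production code)."""

    def __init__(self, servers, replicas=100):
        # Each server gets `replicas` points on the ring to even out the load.
        self._ring = sorted(
            (_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def server_for(self, key: str) -> str:
        """Walk clockwise from the key's position to the next server point."""
        index = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[index][1]


keys = [f"group:{i}" for i in range(200)]
full_ring = HashRing(["s1", "s2", "s3"])
smaller_ring = HashRing(["s1", "s2"])  # same ring after s3 is removed

# Groups that were not on s3 stay exactly where they were.
unmoved = all(
    smaller_ring.server_for(k) == full_ring.server_for(k)
    for k in keys
    if full_ring.server_for(k) != "s3"
)
```

Since every request for a given group lands on the same server, that group's membership only needs to be held in one place, which is one plausible reading of the "reduces duplicate information" remark.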
@Gaurav: Man I envy you, you are a gem explaining the complex things. I have 1 basic question: As we are using websockets for communication, don't we require not million but thousands of servers to connect billions of users ?? If you can help me understand this part can help a lot.
I don't think WhatsApp uses WebSockets. A traditional method to scale this kind of application is using telecom technologies such as Jabber or core Erlang.
I also am confused here. If A and B are both connected to the server, is the websocket connection direct between A and B or is it still going through the server?
Having sat in FAANG interviews both as an interviewer and as a candidate, I can say that this is not going to hold up for someone interviewing for a senior position. If you have less than 5-7 years of experience, you may get away with some hand waving, but for 10+ years, we want to see more depth. Some of the users have already brought up questions like delivery of offline messages, which is a basic feature missing. Apart from that, there are some crucial things missing in this video: 1. **API**: You want to discuss basic API (like REST) design, verbs supported, payload etc. You don't need to spend 30 minutes creating a JSON schema, but mention the endpoints and what's being sent and received. This matters for latency, and for some advanced things like security, API versioning, bandwidth usage etc. 2. **Back of the envelope calculations**: This is important for scalability, availability and storage. How many daily active users? Is there a limit to the message size? (You probably don't want people dropping 5 GB pirated movies in a group.) How many gateway servers do you need? How long are the messages stored? Do you need to shard the DB? What is your partition key, and how do you deal with hot partitions? 3. Most importantly, **offer alternative design options and discuss trade-offs**. If you can only talk about one thing, then you most likely learned it from some blog/video/whatever. There's no perfect solution in system design, so we want to know you can think outside the box, and why you are choosing to go one way or the other.
thanx Abhijit! This comments gives me an idea of what else to think of rather than taking a fixed approach. If you could suggest some resources/or just general advice to me for preparing sys design, i'd really appreciate it. (I've 2 yrs of exp )
Hey Gaurav - thanks so much for putting up such quality content! You're truly a wizard.. I've been learning so much about system design purely from your videos. I was wondering though, where exactly should I be placing the message queue in this example? Would it be after the sessions microservice confirms with the group-service which users are part of the specific group that the message is directed towards?
7:15 "This can't be done with HTTP". Actually this is wrong, it's only true for HTTP1, but it is possible using HTTP2 using a mechanism called server push: en.wikipedia.org/wiki/HTTP/2_Server_Push
Your system design videos are excellent! Very easy to understand compared to others. It would be great if you could explain multiple recursion (more than one recursive call in a function). I've referred to some online resources, but it looks more confusing. I know you could make it easy to understand :)
You can watch more system design videos here: interviewready.io
Gaurav, there is one thing you missed discussing here: how is the service-to-service communication bidirectional? Gateway and Sessions are also peer-to-peer. But perhaps that can be achieved via gRPC, or again through WebSockets, or asynchronously through an event bus.
Great if you can clarify.
@@architkapoor2503 Yeah, same question from me: how do we connect the gateway and session service bidirectionally? We have multiple session service instances.
Hi Gaurav, I have been a developer for 43 years and a system architect for 30. As a very experienced professional all I can say is: good work ! You are the real deal.
Thank you Louis! 😁
nice
@@BackToBackSWE r u real?
@@jawwadismail9419 Apparently Yes.
43 years..!!!! wtf..which language did you use..what kind of developer you have been..? World wide web just came into existence on 1989.
I learn more on YouTube than school !!
Yey!
Learn from pune University
It is very tough
Very difficult subjects
Lately I've started seeing so many posts all over saying "I learn more on YouTube than school/college". No need to demotivate the young generation. They cannot teach you system design in school; this is a specialization, not a general subject like PCM. Even in kindergarten, they teach kids the hand movements for writing, but ultimately parents have to help kids practice at home. We cannot expect nursery schools to teach kids writing.
That is a fact. I said the same 8 years ago: in some countries the quality of university content is really poor, so when you discover YouTube as a tool for learning, you agree with this sentence.
I would say by now that is true for me as well and I have a college education
I don't know anything about system design or even what it is, yet I clicked the video and watched it full. Just understood everything, great teaching skills :)
i'm interviewing this week for systems design for the first time in my 6 year career. this isn't part of my job. but here i am, learning an irrelevant (to me anyway) skill to be better at tech interviews. thanks for sharing!
I've gone through all of your system design videos in preparation for my interview - I have found them so useful and easy to understand. Thanks for taking the time to put these together - your sense of humour is also great :D
Thank you!
Hey, just wanted to clarify a bit about long polling.
Long polling is actually more of a request from the client to the server and instead of the server returning a response immediately, the server is waiting for a change of some kind, and when there is a change, the server returns the response finally to the client.
It was more common back in days before Websockets were introduced and adopted by browsers around 2011 roughly.
Excellent video - as a former data architect from JP Morgan Chase looking to expand their knowledge for social applications - excellent.
Thank you 😁
This was great. A minor nitpick, this is not what peer to peer means although it can be confusing. Peer to peer would mean apart from one time registration, the connection never touches your server and instead is established directly between 2 clients(friends on whatsapp). Also HTTP now has server push and whatsapp actually uses XMPP(slightly modified).
@@asurakengan7173 Hi bud. If HTTP supports server push, does it mean we no longer need WebSockets out there?
You're really underrated in the developer circle. People need to watch all this a lot more than programming basics.
He doesn't have enough videos, plus partial knowledge. Although he is very good.
I'm really amazed by how good you are at explaining complex concepts in such a simple way. I can just keep watching all your videos. Great job! ❤️
Thanks!
occasions like good morning in India putting a lot of pressure on the servers🤣🤣
Hahaha!
You forgot about forwards that must be forwarded within 3 seconds :P
Hahahaha 🤣🤣
Hahaha
Bwahahahahaha!! 😂😂
I'm from a non technical background and this video was super helpful and clear. On a side note, "Check with your doctor if group messaging is right for you" - hilarious :'D
I'm happy that I know what the most important part of programming is, which is what you just showed, and not the writing part. It's the pen and paper part that matters.
This really satisfies my superiority complex
Lol 😛
YouTube recommendation, thank you so much for this gem
I personally felt this is too abstract; however, it's a good place to start. There are multiple scenarios which are not handled, and it's assumed that everything is the happy case. I request you to go slightly deeper into each feature. The design explained in the video is "common sense"; any software developer would have thought of it. But I'd really appreciate it if you could answer some of the questions below, which an interviewer could ask us during the interview if we explained it the same way.
1) You first explained that we would store the mapping of each client to its gateway box. But multiple clients will be connected to the same gateway. Let's say A is on gateway no. 1 and B & C are on gateway no. 2, and A sends a message to B. How would gateway 2 know where to route the message for B and C? My hunch is that the gateway will be storing this information.
2) I really wanted to understand when you would use a gateway vs. a load balancer. In your videos you have used a gateway in some places and a load balancer in others. You should explain why a gateway over a load balancer.
3) How you maintain the sequence of messages is not explained; this should be a very basic feature. A and B can see different views of the conversation.
4) Why do we need a message queue to store the failed messages? IMO we should maintain 3 DBs: Sent, Delivered & NotDelivered, with userId as key and a Map. This will be easier. Whenever A sends a message and the chat server receives it, you can start two threads: one stores the message in the NotDelivered table and the other calls SessionService to fetch B's gateway box and send the message. Once B sends an acknowledgement, we can remove the message from NotDelivered and put it in the Delivered table. You can delete the message from the DB and archive it in low-cost storage like S3 after 30 days or once the data is backed up; generally this happens every night. The advantage is that if B is offline, we have all the messages that B got from users in B's contact list stored in the NotDelivered table. Once B turns the internet on, their app will send a long polling request and can easily fetch all the data. Is this a good design? I might be wrong, but this is my thought.
5) You should mention what data we can cache and at which layer.
6) I think you should be clearer about when the connection will be active and when it will not. I think if someone has opened the WhatsApp application, or as long as it is in the app background stack, the connection should be open. Once we remove the application from the background stack, the connection will be lost. Once the connection is lost, the application will continue to send long polling requests at a certain interval as long as the internet is on. This kind of architecture will have the benefits of both WebSockets and long polling. Once the internet is off, it can't even do long polling.
7) What type of database are we using, and what data are we storing? I read in the Grokking system book that we can use Cassandra (wide column family), but I didn't understand why. So what I am suggesting, if possible, is that rather than giving too much abstract information, you make the videos a little more systematic; it would help us. I think there are multiple counter-questions an interviewer can ask: why choose this database over another, what data you would cache, which cache policy you would use (write back/around/through).
8) Here the API gateway is just routing to the chat server, right?
It's very difficult to put all the information together, but making the videos more systematic would certainly help me and others. You could have a defined structure when making a system design video.
At least the following structure might help. Even 5 minutes of explanation for each section would be very helpful to all of us doing system design preparation.
1) Requirement
2) Define constraints/limitations/data constraints, like number of servers, incoming/outgoing data, memory/storage
3) High level design for each feature
4) API Gateway/ Load Balancer
5) Data layer what data you will be storing
6) Caching
7) Sharding
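The NotDelivered/Delivered idea from point 4 above can be sketched roughly as follows. This is only an in-memory illustration under assumptions: the class name `PendingStore` and its methods are hypothetical, plain dictionaries stand in for the database tables, and archiving to S3 is left out.

```python
from collections import defaultdict, deque


class PendingStore:
    """In-memory stand-in for the NotDelivered / Delivered tables (hypothetical names)."""

    def __init__(self):
        self.not_delivered = defaultdict(deque)  # receiver_id -> queued (msg_id, text)
        self.delivered = defaultdict(list)       # receiver_id -> acknowledged messages

    def on_send(self, receiver_id, msg_id, text):
        # Persist first, so the message survives if the receiver is offline.
        self.not_delivered[receiver_id].append((msg_id, text))

    def on_ack(self, receiver_id, msg_id):
        # Move the acknowledged message from NotDelivered to Delivered.
        pending = self.not_delivered[receiver_id]
        for item in list(pending):
            if item[0] == msg_id:
                pending.remove(item)
                self.delivered[receiver_id].append(item)
                return True
        return False  # unknown or already acknowledged message

    def fetch_pending(self, receiver_id):
        # Called when the receiver reconnects; messages come back in send order.
        return list(self.not_delivered[receiver_id])
```

When B comes back online, the app would call something like `fetch_pending("B")` once and acknowledge each message it receives; whether a queue or a table backs this is the trade-off the comment is asking about.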
good observations
@@mitgundigara466 yes, indeed. It is too high level. I wish the interviewers are satisfied with such explanations.
@@wonderstruck. I also want to add Cassandra is a good choice here because this system has usually a 1:1 read/write ratio and Cassandra is quite optimized for write heavy systems
Agreed. This approach is way too broad for an interview setting. Perhaps it's because it was made 5 years ago, where the bar for both content and interview isn't as stringent as today although the first principles are the same.
Age and experience doesn't matter to achieve, you proved it bro
These videos clearly show how far university syllabuses are from real-world problems
4 years ago and your comment is still valid today.
And here I am, looking at some guy teaching IT things I should probably know already... Thanks :D
One thing I really like about your video is that you always tried to explain things from the client side, which makes it highly understandable for us as clients in our daily life. Thanks for your video!
I found your channel randomly and I was so grateful, thank you so much.
Great video, this kind of stuff is not easy to learn and not enough resources out to explain. Thank you for your great efforts
Glad to help 😁
Good work. There are a few important topics worth covering though.
1. What happens to a message sent to an offline user.
2. What choices of storage do we have ?
3. What kind of load + image/video messages ?
I believe the 1st point is already covered by him, the message is stored in queue/db.
1. All messages are stored in the database. Once the receiver comes online, receiver's client application checks with the chat server if the receiver has any pending messages that he/she should receive. If there are, then those messages are delivered to the user's client application via websocket.
2. A wide column database like HBase will be good for this use-case (i.e. lots of small messages with very low relation).
3. I didn't understand what you mean here. For storing images and videos we should make use of CDNs.
For no. 1, the answer depends on how you "persist" the messages. In the case of queues, a pull request can suffice, as all messages in the queues are "pending to be delivered". The trouble (if it's indeed a trouble) would be how to manage such a large number of queues (e.g. one queue per friend pair). However, if messages are stored in a DB, there has to be information like "what's the oldest message that client A hasn't received yet?", so that message order can be retained. And more questions come up, like how sharding/partitioning is done, etc.
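One way to answer "what's the oldest message that client A hasn't received yet?" is a per-conversation sequence number plus a per-user cursor. The sketch below is an assumption-laden illustration, not how WhatsApp does it: `Conversation`, `pending_for` and `ack` are hypothetical names, and a list stands in for the message table.

```python
import itertools


class Conversation:
    """Message log with a monotonically increasing sequence number (hypothetical design)."""

    def __init__(self):
        self._next_seq = itertools.count(1)
        self.log = []      # (seq, sender, text), always in sequence order
        self.cursor = {}   # user_id -> highest sequence number that user has acknowledged

    def append(self, sender, text):
        seq = next(self._next_seq)
        self.log.append((seq, sender, text))
        return seq

    def pending_for(self, user_id):
        # Everything after the user's cursor, in the order it was written.
        last_acked = self.cursor.get(user_id, 0)
        return [entry for entry in self.log if entry[0] > last_acked]

    def ack(self, user_id, seq):
        # Cursors only move forward, so a late or duplicate ack is harmless.
        self.cursor[user_id] = max(self.cursor.get(user_id, 0), seq)
```

Because both participants read the same log in sequence order, A and B see the same view of the conversation; how the log is sharded is a separate question.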
12:43 Now I understand why my gf would still show online for sometime even after we have sent goodnight messages. 😂😂
One of the best system design videos I have ever seen
Thank you!
I wish there was a 100 like button for the video, thanks Gaurav. You've been an inspiration for me in cracking Amazon in 2018 placements. :)
Congratulations Ashish!
Thanks for the feedback 😎
P.S. You could be a channel member now if you like 😉
It's explained by me here :D
gkcsblog.wordpress.com/2018/11/06/gaurav-sen-channel-memberships/
I only had one or two serious system design interviews till now and the face expressions of the interviewer really put you off sometimes. I have never seen anyone explaining system design like this, it's good to know that I'm not very far off.
I love you man, I used this to get an offer in my interview today.
Congratulations!
u wrote same thing under facebook system design video
@@vikrant4666 whatsapp is owned by fb, right?
Finally an Indian tech/engineering related youtuber who can speak English! But more importantly a very well explained video. Keep it up!
That's quite a demeaning comment b888, considering English is rarely our first language.
Thanks for the compliment though 🙂
Great video! About the 'last seen' feature: I don't think the Last Seen is based on the last activity (e.g. sending a chat) but rather when the user closed the connection to whatsapp. So in my opinion, as soon as 1) the network is interrupted or 2) the user closes the application, the 'last seen' should be updated.
It should be updated as well when the user comes online and every 10 or 15 secs. So simple health probing from the app to the gateway should work
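The health-probe idea above could look something like this. It is a minimal sketch: the class name `LastSeenTracker`, the 15-second timeout (taken from the "every 10 or 15 secs" suggestion) and passing `now` in explicitly are all assumptions made for illustration.

```python
class LastSeenTracker:
    """Tracks the time of each user's most recent heartbeat (hypothetical helper)."""

    def __init__(self, timeout_seconds=15):
        self.timeout_seconds = timeout_seconds
        self.last_seen = {}  # user_id -> timestamp of the latest heartbeat

    def heartbeat(self, user_id, now):
        # Called by the gateway on connect and on every periodic probe from the app.
        self.last_seen[user_id] = now

    def is_online(self, user_id, now):
        # A user counts as online only while heartbeats keep arriving.
        timestamp = self.last_seen.get(user_id)
        return timestamp is not None and now - timestamp <= self.timeout_seconds
```

Once the probes stop (network interrupted or app closed), the stored timestamp simply becomes the "last seen" value, with an error of at most one probe interval.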
With this video I got interest in the topic of system design completely,even I don't know what it was before...
Glad to hear that!
Hi Gaurav, Great video
few suggestions;
1. Keeping too many gateways would kill your latency [or maybe I'm overthinking], but if we have some services which authenticate and establish the websocket connection, and thereafter other services which work as mediators and communicate back and forth with the client, then we can reduce the number of gateways we need. [Essentially the gateway would do this, but we don't need as many gateways as there are clients; imagine 50 million clients at a time on WhatsApp.]
2. We should not stick with only the push mechanism, where you said that "the server will keep trying to send the message to B"; instead a pull technique should be used here. What I mean is, once a user successfully establishes the connection to the chat server, it asks "do you have anything for me?" and the server responds (through a queue, so that message order is preserved), and after that it is all push until the same thing happens for a different user.
This way the server does less redundant work checking repeatedly. Again, think about 50 million users. Say 0.01% [at least 5K users] of them chat with each other. Then a very high number of users might not be online, and the server keeps trying for them.
3. We don't need to keep checking the server database for where the user is; we should cache that. That may be 50 million entries at most at a time.
As soon as the user goes offline, remove him [make sure the app sends the right information: don't send it when the user just backgrounds the application and will come back in a minute or so; it should be only when there is no internet connection]. In essence, the connection would be of two types {1. active 2. idle, which helps to build last seen too}. When a connection is idle, we can utilise it for other chats, and as soon as the user receives a message we either create a new one (if that connection has gone off) or use the same one.
This is a trade-off, you see: making a new connection is costly, but keeping a lot of idle ones is also poor.
In case we mark as offline a user who is connected to the internet, we'll lose real-time communication.
For point 2 - Notification service and chat-service will be push, whereas, first time client connection would be pull.
If first time client connection is pull, will that not create an overhead as client can connect disconnect too frequently due to unstable internet connection or is that acceptable?
No. 2 makes sense when queues are in place or the messages themselves are persisted somewhere in a database (so that a select can be performed).
For no. 1, a server box can likely only handle a few million connections at most, so for WhatsApp scale, several tens of gateways will be needed.
@@shubhimohta8488 that should be fine, 1. for most users, connections are there for some time. 2. the payload over a pull is generally not very big and is acceptable .
Thank you Gaurav, you cant imagine how much this video helped me last year.
Thanks Alvaro!
Hey man, nice video, but there's a mistake: long polling is not polling every minute to get updates. Long polling is when you keep an HTTP request open (think: spinner in Chrome while the page is loading) forever, until the server decides to respond. This in effect makes it near real-time. It's one of the ways websockets work.
He meant Ajax polling probably, a lot of people refer to normal polling as long polling
I was going to mention the same. Major problem in long polling is that if the amount of data on server is too much then you need to keep creating the new paginated requests. But otherwise for small push messages long polling works well.
Yes, if he is worried about sending messages from server to client, he can use Server-Sent Events.
long polling still requires client to send a request first, in this case websocket works the best. or a custom protocol on top of tcp also ok.
Yep, he meant to say *short* polling.
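The short vs. long polling distinction discussed in this thread can be shown with a small sketch. It only models the server-side waiting behaviour, under assumptions: `Mailbox` and `demo` are hypothetical names, a `queue.Queue` stands in for the server's pending messages, and no real HTTP request is involved.

```python
import queue
import threading


class Mailbox:
    """Server-side holding area for one client's undelivered messages (hypothetical)."""

    def __init__(self):
        self._messages = queue.Queue()

    def publish(self, message):
        self._messages.put(message)

    def short_poll(self):
        # Short polling: answer immediately, even when there is nothing to return.
        try:
            return self._messages.get_nowait()
        except queue.Empty:
            return None

    def long_poll(self, timeout_seconds):
        # Long polling: hold the request open until a message arrives or the timeout expires.
        try:
            return self._messages.get(timeout=timeout_seconds)
        except queue.Empty:
            return None


def demo():
    box = Mailbox()
    immediate = box.short_poll()  # nothing queued yet, returns straight away
    threading.Timer(0.05, box.publish, args=("hello",)).start()
    held = box.long_poll(timeout_seconds=2.0)  # blocks until the timer publishes
    return immediate, held
```

In both cases the client still has to send the request first, which is the point made above in favour of websockets for server-initiated messages.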
Hi Gaurav, thank you so much for these videos. You're the real deal :-) I was trying to dig in deeper (apologies if these are trivial queries). I was trying to understand the handshake between your gateway and the Sessions Server.
Let's say Person A is chatting with Person B.
Step 1: Person A hits the gateway (a reverse proxy) that will do most of the heavy lifting (auth, TLS termination etc) and detect the websocket handshake.
Step 2: The gateway forwards the request to your Sessions Server(which is your websocket server)
Step 3: The Sessions server receives the request and sends back a response: HTTP/1.1 101 Switching Protocols, Upgrade: websocket.
My understanding goes a bit gray from here (please also correct me if something is amiss in the above three steps).
Do all further communications between A and the proxy now use websockets, and then the same protocol to hit the sessions server? Hence, the client calls something like wss://www.whatsapp.com/socketserver and hits the reverse proxy, and that forwards the request to the sessions server? Or am I missing something?
If this is in an application like slack... then it would switch between wss and http based on the service it's trying to invoke (say chatting vs creating a channel)
Most of the answers are in the tinder video
th-cam.com/video/tndzLznxq40/w-d-xo.html
Not deleting the comment in case someone has similar questions
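For anyone with the same question, the 101 response in Step 3 follows RFC 6455: the server hashes the client's `Sec-WebSocket-Key` together with a fixed GUID and returns the result in `Sec-WebSocket-Accept`. The sketch below shows only that part of the handshake; the function names are made up for illustration, and a real server would use a websocket library rather than build the response by hand.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for computing Sec-WebSocket-Accept.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(sec_websocket_key):
    # SHA-1 of (client key + GUID), then base64, as the RFC specifies.
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def handshake_response(sec_websocket_key):
    # The response that switches the connection from HTTP to the websocket protocol.
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {accept_key(sec_websocket_key)}\r\n"
        "\r\n"
    )
```

After this response the same TCP connection carries websocket frames; whether a reverse proxy keeps proxying those frames to the sessions server depends on how the proxy is configured.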
well narrated and presented. never knew whatsapp and tinder has so much thing at backend. interesting when you ask someone for a coffee. and she accepts the coffee invitation. both are important sugar in coffee and web sockets and the micro services. lol. Hats off. truly brilliant .
One of the best channel to learn system design. Good work.
interview: "ya I'd have a last seen microservice". Reality: "ok just add that as a function in your monolith we don't have time for a new service"
Exactly! I've seen this happen in real life 🙈😂
He is a master of block diagrams
I watched other videos on messaging systems and this video has a very detailed approach, I was hunting for videos just to know how to have multiple servers for socket processes when the client is a mobile app and I feel this video has helped me a lot with that.
@@SK18459 you both didn't post anything yet
First minute into the video, already subbed and clicked the bell icon
Thank you Sir. I moved from India to San Antonio as new college grad. Amazon offering me 500K per year. Excited. Thanks for video.
Wow Gaurav, you should pat yourself on the back for being so popular. Lyft HR in the USA forwarded this link of yours as an example to get acquainted with system design interviews. Kudos on this achievement.
Yeah!
@@gkcs Quit your Tech job and become professor in IIT. Students need a positive smiling professor like you :)
Yet another great video on system design! Keep up the good work, also now I can watch in 1080p YAY!
Yey!
YouTube is such an amazing platform. Thanks for the video
Good video! Would have liked it even more if you had included the following points as well, because I felt lost there.
1. How does the gateway manage the TCP connections for each user? Is it in memory or in an external store? What happens to the clients' connections and messages when the gateway goes down?
2. How does the session service talk to the gateway? Is it using RPC or message passing etc.?
Same question
1. A map of userId to its corresponding connection object is stored in the chat server's (gateway's) memory/RAM. If a gateway goes down, all of the online clients connected to the affected gateway will lose their websocket connection, but they will quickly make a new connection to another functioning gateway.
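The in-memory map described in this answer could be sketched like this. Everything here is an assumption for illustration: `Gateway` and `SessionDirectory` are hypothetical names, a Python list stands in for the websocket connection object, and the session service is reduced to a dictionary from user to gateway.

```python
class Gateway:
    """Holds live connections in RAM, keyed by user id (hypothetical sketch)."""

    def __init__(self, name):
        self.name = name
        self.connections = {}  # user_id -> connection; a list stands in for the socket

    def connect(self, user_id):
        self.connections[user_id] = []

    def push(self, user_id, message):
        connection = self.connections.get(user_id)
        if connection is None:
            return False  # user is not connected to this gateway
        connection.append(message)
        return True

    def crash(self):
        # Everything held in RAM disappears with the process.
        self.connections.clear()


class SessionDirectory:
    """Stand-in for the session service: remembers which gateway each user is on."""

    def __init__(self):
        self.user_to_gateway = {}

    def register(self, user_id, gateway):
        gateway.connect(user_id)
        self.user_to_gateway[user_id] = gateway

    def deliver(self, user_id, message):
        gateway = self.user_to_gateway.get(user_id)
        return gateway is not None and gateway.push(user_id, message)
```

If `deliver` returns `False` after a crash, the message has to wait somewhere (a queue or table) until the client reconnects and registers with another gateway.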
I visited the Uber Hyderabad office the other day for an event. Thought of meeting you. Never had a chance to reach out hope I will meet you someday. Thanks a lot for your amazing videos man. They help a lot.
Thanks Sharik!
This video is awesome! Hoping I can find a good job after months of watching ur videos again and again! Thx! U r amazing!
Thank you!
Have you got the job?
@@shobhitsingh7735 i got other thing in priority last year. but now i will have microsoft interview for sde2 in 1/2 weeks......... jesus.....so now i am reviewing everything...
@@大盗江南 got placed buddy ?
Yours are some of the most helpful videos I've come across. Thank you for doing such a wonderful job.
Thank you!
Brilliant communication skills and very good English accent... great job keep it up
I have hit the like button. I have subscribed. You are awesome!. So happy I found your channel.
thanks for making our life more simple with release this, so we can skip CRUD and start build whatsapp for first implementation
Dude! As a beginner, I'm mind blown! Subscribed!
You are god for beginners man ❤️
Amazing content, kudos from Brazil !!!
Thank you!
finally someone who is making unique content. Much love to u brother.
You did a great job of listing the microservices and interaction between them. One thing which I would like to know more is the database design. How do we maintain a system with so many chats each with thousands of messages growing over time.
Once WhatsApp successfully sends a message to B, they will delete it from their end and save whatever to your GDrive / iCloud. They do not store billions of messages. N.B: I don’t work in whatsapp.
A moderately sized server can maintain around 25k-35k simultaneous TCP connections, so a single server can serve about 25k-35k users (assuming that every user uses just one client). And to handle the ever-growing user base there is no option other than adding more physical servers.
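As a rough worked example of what those numbers imply, the arithmetic below takes the 25k-35k connections per server from this comment and, purely as an assumption, the figure of 50 million concurrent clients mentioned elsewhere in this thread. Neither number is a measured WhatsApp figure.

```python
def servers_needed(concurrent_users, connections_per_server):
    # Ceiling division: a partially filled server still counts as a whole server.
    return -(-concurrent_users // connections_per_server)


CONCURRENT_USERS = 50_000_000  # assumption borrowed from another comment in this thread

best_case = servers_needed(CONCURRENT_USERS, 35_000)   # -> 1429
worst_case = servers_needed(CONCURRENT_USERS, 25_000)  # -> 2000
print(f"Gateway servers needed: {best_case} to {worst_case}")
```

So under these assumptions the gateway tier alone would be somewhere between roughly 1,400 and 2,000 machines, before any headroom for failover.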
I really like the way he smiles and teaches. I know nothing about systems but that smile alone made me comfortable to watch the video......keep the same passion my friend......
A critical piece that I feel is missing here is delivery of messages when the user is offline. You might not have an active connection to all users at all but you do need to deliver all the messages that were sent to them once they are connected.
maybe store the message in the db, and when client status changes retry
+1 I was asked this question in an interview a while back. First question when I'd done simple send receive was "What If user B is offline and not in session".
Brother, this video got me placed, thank you so much
I'm not into programming, though I appreciated this video. I'd really like you to publish a similar lesson, but for Telegram. Thanks in advance, and keep up the good work.
I'll check out Telegram and see if I can make time for it 🙂
Wonderful video which covers all the technical concepts involved. This is incomplete without any resource estimates like storage/bw and what kind of stores to use like a Key/Value store like etcd will work etc.
Hi, you mentioned how we determine online status, what if the user opened the whatsapp but hasn't done any activity post that on whatsapp, just sitting idle [how we determine online status as per the discussion], but still online.
Found this in my YouTube recommendations today. Great video indeed.
Great Video, Helpful !!
I feel the 'Last Seen' feature could be improved (based on the steps described in the video). The only shortcoming I see with the current approach is, 'What if the user replies a message (i.e an action taken by the user) from the notification directly without opening the application !!'. One approach could be if a flag is sent in the request { 'fromNotif': true } that could be used to determine to decide if the Last_Seen table needs to be updated. Another could be to use the WebSocket connection itself and using the disconnect callback (once the user gets out of the application, the WebSocket connection can be disconnected). Push Notification can be used to update the message count once the user is outside an application and more approaches could be formulated further. Nevertheless, your video does help for beginners in system design.👍
Bang on! the tech depth, the teaching style. extra +1 for making the learner think. Great video. Thanks Gaurav
😁
How can Telegram function with lakhs of members in each group?
Making an educational video in a happy mood deserves a big thumbs up!
After watching this.... I am highly interested in system design
Within first five minutes of the video I figured it out, this is going to be a real informative video and subscribed at the end. Great job!
Hi Gaurav, fantastic architecture design and explanation. But for group messaging, the "session" microservice is calling another "Group" microservice, which is a kind of HTTP chaining and potentially an example of an anti-pattern. Can't we use a messaging mechanism like distributed Kafka to make it more decoupled? Just a thought of mine. Let me know your take on this.
Very clear narration and provides a high level overview
Would be great if you have another video to break it down feature wise and why these were the final choices in terms of languages/tools/concepts used to implement all this. Maybe helpful for people coming from non-backend development
Have a look at the playlist mentioned in the description. Each feature is explained with alternatives.
@@gkcs Concepts of system design are in place. I saw that playlist.
What I mean is, if one has to implement this architecture, we have to think about which features go in which languages. Should this part be in Python/Node? Shall we consider gateway-to-database communication in Golang routines or Java, etc.? Those kinds of questions.
@@kushal1 Aren't those questions, too vague? I have never seen a system design interviewer ask them.
At the end of the day, I may like everything implemented in binary opcodes for efficiency, but I have to perform as per the deadlines set by the product.
@@gkcs Wow! Someone is rude now.
I never mentioned from interview perspective. It was a suggestion to dig into tech stack and make another video maybe. Thanks anyway for your content.
@@kushal1 Sorry if I sounded rude, it wasn't my intention in the slightest. Replying to comments all day has taken off my tact perhaps.
I'll look into this a little more and get back to you. Digging into the tech stack will certainly be of use :)
Gaurav! Thank you for uploading these videos. System design made easy !!
Gaurav do you have any tips for uber interview? I have been invited to Onsite for the final round here in Chicago office! Aoladiy@gmail.com
dude you are the man.
Great video Gaurav, thank you.
QQ: if you try to keep gateways light and delegating all complicated logic and memory usage to other services, you will still have to scale them per number of users/messages. So either there should be some discussion around batching between gateway and sessions or you can just scale the number of gateways and keep some logic there (for example parsing).
Today I found it as a YouTube recommendation. It's really a great video👍👍. Bhaiya, please make the same series for a Zomato-like app.
great stuff man,
-16 yr old
Keep making such vids :)
This man is a legend!
Great Video, Really liked this one...!!!
Just wanted to ask that for the last seen timestamp wouldn't it be better if the client sends a request(Or should I say message!) to the server when the user exits the messaging app(maybe when he exits the main app activity)., Wouldn't it save a lot of overhead for the last seen timestamp(Not taking into consideration the online stamp)?
That's a really good idea...it should save a lot of bandwidth in general.
It would also be quite accurate. Nice point!
@@gkcs I didn't know anything about system design until I started watching your videos. Starting to get interested in system design because of your effort.
Nice idea..well thought..!!
There might be chance that user lost internet connectivity before exiting app
Why don't you create a WhatsApp-like app, since you know the design, and overtake it?
Great system design discussion, thank you Gaurav!
You are amazing . love from Pakistan
Love from India :D
@@gkcs Thanks brother
@@ff20e03bbc Shut Up :)
I love your educative system design course announcement. Will buy it.
Great!Thank you. Next topic should be "Service discovery"
Finally found this kind of tutorial. Thank you very much, very good explanation. But one thing I have not understood is the gateway. I know what it does but don't know what it is :)
Try the microservices tutorial video on the channel. It talks about the gateway service with code :)
Doctor recommended not to watch the group messaging feature design.
But i watched that and now Corona is having one to one chat with me . 😂😂
Hahaha 🙈😂
Hey, man first time watching your video but just want to say your explanation is awesome. Appreciated!
Thank you 😁
Thanks for your effort! It's great to learn lots of things from your video. But I still have a question after watching it: why can consistent hashing help to reduce duplicate information (at 20:57, explaining the group service)? Isn't it just a way of knowing which record should be found on which server?
Probably for vertically partitioning table?
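One possible reading of that part of the video: if every request for a given group ID is routed to the same group-service node, that group's membership only needs to live on one node instead of being copied to all of them. Below is a minimal consistent hash ring to illustrate the routing; the class name `HashRing`, the use of MD5 and the 100 virtual nodes per server are assumptions, not details from the video.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas
        self._positions = []  # sorted hash positions on the ring
        self._owner = {}      # position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add(self, node):
        # Each server is placed on the ring many times to even out the load.
        for i in range(self.replicas):
            position = self._hash(f"{node}#{i}")
            self._owner[position] = node
            bisect.insort(self._positions, position)

    def lookup(self, key):
        # Walk clockwise from the key's position to the next server position.
        if not self._positions:
            raise LookupError("ring is empty")
        index = bisect.bisect(self._positions, self._hash(key)) % len(self._positions)
        return self._owner[self._positions[index]]
```

With `ring = HashRing(["s1", "s2", "s3"])`, `ring.lookup("group-42")` always returns the same server, and adding a fourth server only moves the groups that now belong to it rather than reshuffling everything.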
Great job! Very helpful. You are a great teacher
@Gaurav: Man I envy you, you are a gem explaining the complex things.
I have 1 basic question: As we are using websockets for communication, don't we require not million but thousands of servers to connect billions of users ??
If you can help me understand this part can help a lot.
I don't think WhatsApp uses WebSockets. A traditional method to scale this kind of application is using telecom technologies such as Jabber or core Erlang.
@@deepbrar6152 Yes, WhatsApp uses ejabberd, which is implemented in Erlang
I also am confused here. If A and B are both connected to the server, is the websocket connection direct between A and B or is it still going through the server?
@@adamhughes9938 I think the websocket connection is between both A and the server and also between B and the server.
Thanx for the Detailed Video . Really cleared my basics queries.
Having sat in FAANG interviews both as an interviewer and as a candidate, I can say that this is not going to hold up for someone interviewing for a senior position. If you have less than 5-7 years of experience, you may get away with some hand waving, but for 10+ years, we want to see more depth. Some of the users have already brought up questions like delivery of offline messages, which is a basic feature that is missing. Apart from that, there are some crucial things missing in this video:
1. **API**: You want to discuss basic API (like REST) design, verbs supported, payload etc. You don't need to spend 30 minutes creating a JSON schema, but mention the endpoints and what's being sent and received. This matters for latency, and for some advanced things like security, API versioning, bandwidth usage etc.
2. **Back of the envelope calculations**: This is important for scalability, availability and storage. How many daily active users? Is there a limit to the message size? (you probably don't want people dropping 5 GB pirated movies in a group). How many gateway servers do you need? How long are the messages stored? Do you need to shard the DB? What is your partition key, and how do you deal with hot partitions?
3. Most importantly, **offer alternative design options and discuss trade offs**. If you can only talk about one thing, then you most likely learned it from some blog/video/whatever. There's no perfect solution in system design, so, we want to know you can think out of the box, and why you are choosing to go one way or the other.
Yeah there is A LOT missing here. Like.. A LOT
thanx Abhijit! This comments gives me an idea of what else to think of rather than taking a fixed approach. If you could suggest some resources/or just general advice to me for preparing sys design, i'd really appreciate it. (I've 2 yrs of exp )
Analogy is a great way to understand new things
Hey Gaurav - thanks so much for putting up such quality content! You're truly a wizard.. I've been learning so much about system design purely from your videos.
I was wondering though, where exactly should I be placing the message queue in this example? Would it be after the sessions microservice confirms with the group-service which users are part of the specific group that the message is directed towards?
sir thank you very much for this clear explanation. Please keep the hard work!
7:15 "This can't be done with HTTP". Actually this is wrong, it's only true for HTTP1, but it is possible using HTTP2 using a mechanism called server push: en.wikipedia.org/wiki/HTTP/2_Server_Push
Great one! Thank you for these videos. I'm in class 11 now, and I learn a lot from these videos.
Wow, awesome time to start!
Your system design videos are excellent! Very easy to understand compared to others. It would be great if you could explain multiple recursion (more than one recursive call in a function). I've referred to some online resources but it looks more confusing. I know you could make it easy to understand :)
I'll add this to my list of videos to do 😁
Wow! Thanks Gaurav 😊
of course Teaching is the best way to share knowledge , but I want We must have our own messaging app