Surprisingly this is one of the only videos I found that actually goes into specifics of this topic. 👍
All other videos and docs are kind of hand-wavy.
I'm currently developing an app that relies heavily on websockets and you have really given me a few insights to think about. Thank you so much for this valuable info 😊
Great to hear that and thanks for commenting. Are you going to build your own horizontally scaling WebSocket feature?
I want to run a websocket server as replica pods on Kubernetes. Suppose a client establishes a websocket connection with pod X. If pod X shares this client's websocket session information with pod Y, can pod Y send messages to this client? Or is it only pod X that can use this websocket session?
Hi, the easiest way around this is to have a single pod always be responsible for a given WebSocket connection. You use the load balancer to ensure that a pod always has enough compute reserves to handle that connection. If you want to hydrate session state between your pods, then this is a more complicated scenario and you would need to consider using either a realtime database or a distributed source of truth.
Thanks to you, I have a broader understanding of websockets. I really want to see a video about horizontal scaling. Thank you.
Glad to hear you liked the video, thanks for taking the time to comment and we'll keep that in mind :D
thanks, I'm currently needing to scale web sockets horizontally, so this was really helpful
Probably, when people ask how many WS connections can a server have - they actually mean "What is the limit of WS/other connections on LB, and what does it depend on? Is it the number of opened file descriptors? Amount of RAM? Anything else?"
All very good questions, thanks for sharing! Perhaps for a future video👍🏻
High quality content! Looking for a real-life tutorial on horizontally scaling web sockets
The Elixir web framework, Phoenix, solves pretty much all of these problems. The BEAM VM was basically built for this.
I had this exact question and I KNEW (felt it in my bones) that the answer wouldn’t be as simple as just saying “yeah, only like 100”. Thanks for the insight!
Super high quality content 🎉
If all your servers use a shared Redis instance to communicate with each other, don't we just reintroduce the original problem of a single server handling all the load (defeating the purpose of the load balancer)? I see that it still helps, since non-websocket work is still distributed, but at scale, I don't see how anything is solved. Especially for apps like chat apps where the websockets carry a lot of the work. Great video though!
You’re spot-on, except that Redis is much better suited to clustering than your own WebSocket server.
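To make the clustering point a bit more concrete, here is a minimal sketch of spreading pub/sub channels across several broker instances so that no single one carries every room. It assumes channels can be partitioned by name; the shard names are hypothetical, and the hashing is a simplified stand-in, not Redis Cluster's own slot algorithm.

```python
import hashlib

# Hypothetical broker instances; in practice these would be connection handles.
SHARDS = ["redis-a", "redis-b", "redis-c"]

def shard_for(channel, shards=SHARDS):
    """Pick a broker shard for a channel using a stable hash of its name."""
    # hashlib is used instead of hash() so every process agrees on the result.
    digest = hashlib.sha256(channel.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]

if __name__ == "__main__":
    for room in ("room:1", "room:2", "room:3"):
        print(room, "->", shard_for(room))
```

Every WebSocket server that computes `shard_for("room:1")` lands on the same broker, so publishers and subscribers for a room meet on one shard while other rooms go elsewhere.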
Thank you for the helpful video!
I have a question regarding a horizontally scaled websocket implementation. Is it possible to create a lookup table that maps roomId, which is often used in chat applications, to server id, so that users with the same roomId are routed to the same server when load balancing?
Yes, this would be a recommended design pattern, and has increased security benefits over navigating rooms and servers using naming patterns. Thanks for your question!
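A minimal sketch of the lookup-table idea from this exchange, assuming an in-memory dict and made-up server ids (`ws-1`, `ws-2`). The class name and the "fewest rooms" placement rule are illustrative assumptions; a real deployment would likely keep the table in a shared store so every load balancer instance sees the same mapping.

```python
class RoomDirectory:
    """Hypothetical room -> server lookup table (in-memory, single process)."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.room_to_server = {}
        self.rooms_per_server = {s: 0 for s in self.servers}

    def server_for(self, room_id):
        # Reuse an existing assignment so everyone in a room shares a server.
        if room_id not in self.room_to_server:
            # New room: place it on the server with the fewest rooms so far.
            target = min(self.servers, key=lambda s: self.rooms_per_server[s])
            self.room_to_server[room_id] = target
            self.rooms_per_server[target] += 1
        return self.room_to_server[room_id]

if __name__ == "__main__":
    directory = RoomDirectory(["ws-1", "ws-2"])
    print(directory.server_for("room-a"))
    print(directory.server_for("room-b"))
    print(directory.server_for("room-a"))
```

Because the table stores opaque ids rather than deriving a server from the room name, clients cannot guess which server hosts a room, which fits the security point made in the reply.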
@@AblyRealtime I appreciate your reply!
I guess using Kafka or RabbitMQ to distribute the load coming from the business logic, along with horizontal scaling, can further help you achieve more scalability. Great content, really enjoyed it
Thanks for the kind words, glad you liked it! Let us know if there's any other topics you'd like to see next.
@@AblyRealtime Load testing using Artillery would be a great topic, where you not only emit events to the server in a loop but also listen for the events sent by the server on the client side, i.e. in Artillery
Could you please explain how you would do that?
I guess we could have a chat service that creates websocket connections and subscribes/pushes to Redis.
New messages would be pushed to Kafka.
We would have another service subscribing to Kafka queues, dedicated to handling messages, saving them to the DB and then publishing to Redis.
Then the chat service receives this message and sends it via ws.
What do you think?
Hmm, our system's process was: ws => Kafka message => consumer that writes the message to Redis and the DB for recovery. Another Kafka listener would also send the message out over ws, looking in Redis to find where the client was connected so it could be sent back to that client.
redis can do pub-sub and can be a DB too.
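The flow discussed in this sub-thread (ws => queue => consumer => store and deliver) can be sketched without any infrastructure. This is only an illustration under stated assumptions: a `deque` stands in for the Kafka topic, a dict stands in for the Redis presence lookup, a list stands in for the database, and all names are hypothetical.

```python
from collections import deque

kafka_topic = deque()                  # stand-in for a Kafka topic
presence = {}                          # stand-in for Redis: client id -> server id
database = []                          # stand-in for the durable message store
outboxes = {"ws-1": [], "ws-2": []}    # what each WebSocket server would send

def on_ws_message(sender, recipient, text):
    """Called by the WebSocket server that received the message."""
    kafka_topic.append({"from": sender, "to": recipient, "text": text})

def consume_once():
    """Consumer step: persist the message, then route it to the right server."""
    msg = kafka_topic.popleft()
    database.append(msg)               # saved first, so it can be recovered later
    server = presence.get(msg["to"])   # which server holds the recipient's socket?
    if server is not None:
        outboxes[server].append(msg)   # that server pushes it over its own socket
    return server

if __name__ == "__main__":
    presence["bob"] = "ws-2"
    on_ws_message("alice", "bob", "hi")
    print(consume_once(), outboxes["ws-2"])
```

If the recipient is offline, `consume_once` still stores the message and simply returns `None`, which is where a recovery or push-notification path would hook in.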
Thanks for the nice video. Could you share your thoughts on choosing Redis over other DBs, and would you persist the state data to disk?
Hey there! Redis is used in this situation more as a cache, optimised for brokering the messages with ultra-low latency. The classic design to persist messages long term is to have an additional relational DB as a layer after Redis (to the right in the diagram)
Amazing video....thanks
But you still did not answer the question. How many active websocket connections can an average EC2 server hold... or please give a rough ballpark estimate range ....
This info can be used to decide how many servers we need right ?
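The thread does not give a per-server figure, but once you have measured one yourself the sizing arithmetic is simple. A minimal sketch, where every number is a placeholder to be replaced with your own load-test result, and the headroom percentage is an assumption to leave room for spikes and failover:

```python
def servers_needed(peak_connections, per_server_limit, headroom_percent=70):
    """Servers required if each one is only filled to headroom_percent of its limit."""
    usable = per_server_limit * headroom_percent // 100
    if usable <= 0:
        raise ValueError("per_server_limit and headroom_percent must be positive")
    # Integer ceiling division: round up so the last partial server is counted.
    return -(-peak_connections // usable)

if __name__ == "__main__":
    # Placeholder inputs only: 100,000 peak connections, 10,000 measured per server.
    print(servers_needed(100_000, 10_000))
```

With those placeholder inputs each server is planned for 7,000 connections, so the function returns 15.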
really good video, high quality content
Need the scaling video! Great content of course.
Is the WebSocket connection directly between client and server? If not, and the client has a websocket connection to the LB and the LB has one to the server, then how is the LB different from vertical scaling? I am assuming my server is as lightweight as the LB, so its resources are only used to maintain the websocket connection and pass messages to the client.
The load balancer is used to ensure that you have the right number of servers provisioned, and that the load is distributed across them evenly. The server still maintains the websocket connection itself, but when the connection is being initially made the request should be routed via the load balancer, so it can find out its destination
Is it optimum to horizontally scale web socket servers with a load balancer in front of them? I might be wrong but how would the load balancer maintain a persistent connection between the client and the final node, does it create two different WebSocket connections, one with the client and another with the final node? If so wouldn't that easily choke the load balancer, or is there any way load balancer connects the final node and client?
Yes, you are right that the LB just connects the client to the final node. The load balancer doesn't itself maintain any WebSocket connections; it only routes the client's initial HTTP GET request to the most appropriate node, which then allows the server to upgrade the connection to WebSocket.
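For readers wondering what "upgrade the connection" involves: the connection starts as an ordinary HTTP GET carrying `Upgrade: websocket` and a `Sec-WebSocket-Key` header, and the server answers with a `Sec-WebSocket-Accept` value derived from that key as defined in RFC 6455. A minimal sketch of that derivation, with hypothetical function names, using the sample key from the RFC:

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def accept_key(sec_websocket_key):
    """Compute the Sec-WebSocket-Accept value for a client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

def is_upgrade_request(headers):
    """Rough check that a GET request is asking for a WebSocket upgrade."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return (
        lowered.get("upgrade", "").lower() == "websocket"
        and "upgrade" in lowered.get("connection", "").lower()
        and "sec-websocket-key" in lowered
    )

if __name__ == "__main__":
    print(accept_key("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

Whichever node ends up answering the GET is the one that computes this value and then holds the long-lived connection.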
i liked how you just keep the details away and talk the big picture . great video for the big picture
Thanks very much Hamid 🫡
Very insightful video indeed, great work.
Thanks so much !
Nice one. Thanks
We're glad you enjoyed it
I think this video would be much more valuable if you could talk more details about how the horizontally scaled system works for a chat app. Everybody knows horizontal scaling is the way to go.
Thanks for your feedback! If you're interested, this is certainly the kind of content we'll consider delving deeper into in the future.
Could you elaborate a bit on how the Redis based approach works when scaling out?
You have to set up a way to provision and shed Redis instances to match scaling demands. Some use cases will demand a Kubernetes-type service to manage the instances, and others a more homegrown solution.
I have 3 instances with an LB and Kafka. When I send a request that goes to Kafka, the server returns 200, but on the front-end we need to receive the actual response, probably via WebSocket, and this event can be processed on another server. So how can the front-end know which server's socket it needs to subscribe to if we are using an LB?
In a typical setup with a load balancer, you wouldn’t communicate directly with individual server instances from the frontend. Rather, you would communicate with the LB, which would handle redirecting your requests to the appropriate instances - Alex
I think you should run a test with a single mid-tier server and see where the average limit of WS connections would be.
Thanks for the suggestion! We'll keep this in mind for future videos.
First and foremost, it was a good explanation of the complexity of scaling WebSocket connections horizontally. I wish I had seen this video before, instead of spending days on actual debugging and experiencing these failures on my own. But I do find that every video and blog about scaling WebSockets uses the same chat app example, where we need to broadcast messages amongst many clients. This is a typical use case. It would be helpful to get to know the architecture for solving these issues. Please share if any. Broadcasting is one of the challenges; there are many others, for example pushing events to one specific client based on a key (basically the challenge of routing to a single client when clients are connected to different servers in a distributed system). It would be helpful if any related post or blog could be referenced!!
Thanks in Advance.
Hey Raj, thanks for watching! We have several articles on event-driven architecture on our blog, including this general guide with use case examples for each architecture pattern: ably.com/topic/event-driven-architecture . One of our engineers also wrote a piece on data broadcasting here: ably.com/blog/building-realtime-updates-into-your-application . These cover the basic concepts, so if you'd like to learn more about Ably's implementation specifically, feel free to check out our docs .
Hi, did you find solutions to the problem you stated? I'm facing similar issues.
Why can't we use Redis instead of WebSockets?
Alex from the video here 👋🏻 That is a good question.
WebSockets are a realtime communication protocol that provides a full-duplex communication channel between client and server over a long-lived connection, meanwhile Redis is an in-memory data structure store.
Sometimes confusion can arise because Redis does support pub/sub, but that mechanism is primarily designed to handle communication between your app/services and Redis. It's not suitable for realtime interaction between your server and clients (end-users). For example, you'd be hard-pressed to connect to Redis from a browser in a sensible way, but that's exactly what WebSockets are designed for.
@@AblyRealtime Thank you for the reply. Websocket for the realtime experience.
What if your users are actually devices that always need to be connected?
This is often not possible with phones or tablets due to constraints from Apple and Android with apps not allowing background WebSocket connections. It is not even possible to send REST https requests to apps running in the background. The only way around is to send a push notification.
If the app is running in the foreground indefinitely, the socket connection can stay open.
Thanks
The question of "what is the maximum number of socket connections one server can support?" is actually not about CPU utilization or memory usage, but rather about how sockets are handled from a network point of view, and you didn't answer this question.
Thank you for your comment! You raise a great point about the network's role in handling sockets; network architecture and protocols also significantly limit the number of connections a server can support.
@@AblyRealtime Well, not really. The number of ports is around 64k+, but that is per IP address. So one client could open up to that many connections to a single port on the server, and many clients could connect. So, technically, it's not limited from a network point of view. If anything, on Linux we would run out of file descriptors first. I checked on Ubuntu and the hard limit is set to 1 million.
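If you want to check the file descriptor limits mentioned here on your own machine, a minimal sketch using Python's standard `resource` module (Unix-like systems only) is shown below. The helper names are hypothetical, and the values reported are for the current process, so they depend on your distribution and on how the process was started.

```python
import resource

def fd_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)

def describe(limit):
    """Render a limit value, treating RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)

if __name__ == "__main__":
    soft, hard = fd_limits()
    # Each open socket uses one descriptor, so the soft limit caps connections
    # for this process until it is raised (up to at most the hard limit).
    print("soft:", describe(soft), "hard:", describe(hard))
```

Since each accepted socket consumes one descriptor, the soft limit is usually the first ceiling a WebSocket server process runs into, well before the port range matters.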
Ty
wow, not a single useful bit of info