The first option, where the peers connect directly over WebRTC, is called a STUN configuration, and the second option is called a TURN server setup. The ICE protocol hides most of the complexity of STUN and TURN: it gathers the candidate addresses through which the peers might reach each other and picks the best working path between them. STUN is particularly clever in how it navigates NAT.
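As a rough illustration of how ICE ranks the candidates it gathers, here is a minimal sketch of the candidate priority formula recommended in RFC 8445. The type-preference values are the RFC's recommended ones; the `Candidate` class, the `rank` helper and the example addresses are made up for illustration. Real ICE goes further: it pairs local and remote candidates and runs connectivity checks on the pairs.

```python
from dataclasses import dataclass

# Recommended type preferences from RFC 8445 (higher is preferred).
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


@dataclass
class Candidate:
    kind: str            # "host", "srflx" (learned via STUN) or "relay" (via TURN)
    address: str         # illustrative address only
    local_pref: int = 65535
    component: int = 1   # 1 = RTP

    def priority(self) -> int:
        # priority = 2^24 * type pref + 2^8 * local pref + (256 - component id)
        return (
            (TYPE_PREFERENCE[self.kind] << 24)
            + (self.local_pref << 8)
            + (256 - self.component)
        )


def rank(candidates):
    """Order candidates from most to least preferred."""
    return sorted(candidates, key=lambda c: c.priority(), reverse=True)


if __name__ == "__main__":
    gathered = [
        Candidate("relay", "198.51.100.20:50000"),
        Candidate("srflx", "203.0.113.7:54321"),
        Candidate("host", "192.168.1.10:54321"),
    ]
    for c in rank(gathered):
        print(c.kind, c.address, c.priority())
```

Direct (host) and STUN-derived (srflx) candidates outrank the TURN relay, which matches the idea that relaying is the fallback.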
Don't they use STUN in multiplayer games as well?
@ Yes, absolutely. STUN is a protocol that lets two hosts behind NATs talk to each other directly. First, one of the peers sends a test UDP packet to a STUN server, which simply replies with the packet’s source IP and port as the server sees them. This is the peer’s public IP and port for UDP traffic. The peer then shares this address with the other peer by any other means (a separate protocol or channel). The other peer can now send UDP packets to this public IP and port, and your router/NAT automatically forwards these packets to you (the actual peer’s host). Now you can see the other peer’s public source IP and port, send packets back to it, and complete the handshake.
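The exchange described above can be sketched at the byte level. This is a minimal sketch, assuming IPv4 and only the XOR-MAPPED-ADDRESS attribute; the function names are mine, and `build_binding_response` is just a stand-in for the server side so the round trip can be tried without a network.

```python
import os
import socket
import struct
from typing import Optional, Tuple

MAGIC_COOKIE = 0x2112A442       # fixed value in every STUN header
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id: Optional[bytes] = None) -> bytes:
    """20-byte STUN header: type, attribute length (0), cookie, 96-bit id."""
    tid = transaction_id or os.urandom(12)
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + tid


def build_binding_response(ip: str, port: int, transaction_id: bytes) -> bytes:
    """Stand-in for the server: echo back the source address it observed."""
    xport = port ^ (MAGIC_COOKIE >> 16)
    xaddr = struct.unpack("!I", socket.inet_aton(ip))[0] ^ MAGIC_COOKIE
    # attribute header (type, length) + reserved byte, family 0x01 (IPv4), port, address
    attr = struct.pack("!HHBBHI", XOR_MAPPED_ADDRESS, 8, 0, 0x01, xport, xaddr)
    header = struct.pack("!HHI", BINDING_SUCCESS, len(attr), MAGIC_COOKIE)
    return header + transaction_id + attr


def parse_xor_mapped_address(message: bytes) -> Optional[Tuple[str, int]]:
    """Return (ip, port) from the first IPv4 XOR-MAPPED-ADDRESS attribute."""
    msg_len = struct.unpack("!H", message[2:4])[0]
    pos, end = 20, 20 + msg_len
    while pos + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", message[pos:pos + 4])
        value = message[pos + 4:pos + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and value[1] == 0x01:
            port = struct.unpack("!H", value[2:4])[0] ^ (MAGIC_COOKIE >> 16)
            addr = struct.unpack("!I", value[4:8])[0] ^ MAGIC_COOKIE
            return socket.inet_ntoa(struct.pack("!I", addr)), port
        pos += 4 + attr_len + (-attr_len % 4)  # attributes are padded to 4 bytes
    return None


if __name__ == "__main__":
    tid = os.urandom(12)
    request = build_binding_request(tid)
    response = build_binding_response("203.0.113.7", 54321, tid)
    print(len(request), parse_xor_mapped_address(response))
```

Against a real STUN server you would send the request over a UDP socket and feed the reply to `parse_xor_mapped_address`; the address it returns is what you hand to the other peer.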
This works for most NATs, such as home routers. However, if you are behind a hard (symmetric) NAT, which uses a different public port for each destination it talks to, it will not work: the source port the STUN server sees is not the port the NAT will use for traffic to the actual peer. That’s where we turn to TURN.
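A toy model makes the difference concrete. This is only a sketch of port mapping, assuming two simplified NAT behaviours; the class names and addresses are invented, and filtering rules (which packets a NAT lets back in) are ignored.

```python
class ConeNAT:
    """Reuses one public port per internal socket, whatever the destination."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.mappings = {}

    def _key(self, internal, destination):
        return internal

    def map_outbound(self, internal, destination):
        """Return the public (ip, port) the destination sees as the source."""
        key = self._key(internal, destination)
        if key not in self.mappings:
            self.mappings[key] = self.next_port
            self.next_port += 1
        return (self.public_ip, self.mappings[key])


class SymmetricNAT(ConeNAT):
    """Allocates a fresh public port per (internal socket, destination) pair."""

    def _key(self, internal, destination):
        return (internal, destination)


def stun_address_is_usable(nat, internal, stun_server, remote_peer):
    """True if the address learned from STUN is the one the peer will see."""
    seen_by_stun = nat.map_outbound(internal, stun_server)
    seen_by_peer = nat.map_outbound(internal, remote_peer)
    return seen_by_stun == seen_by_peer


if __name__ == "__main__":
    me = ("192.168.1.10", 5000)
    stun = ("198.51.100.1", 3478)
    peer = ("203.0.113.50", 6000)
    print(stun_address_is_usable(ConeNAT("203.0.113.7"), me, stun, peer))
    print(stun_address_is_usable(SymmetricNAT("203.0.113.7"), me, stun, peer))
```

With the cone NAT the STUN-learned address is reused for the peer, so sharing it works. With the symmetric NAT the peer-facing port differs from what STUN reported, which is the case where a TURN relay is needed.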
I love the squint at the end 😂 I appreciate these videos breaking down how systems were/are designed! ❤ ty! 🎉
A video bridge is an essential component for relaying audio/video. Earlier, a single conference had to be hosted on a single video bridge, which caused high latency and system overload. This was solved with Octo relaying, where video bridges can communicate with each other and each client can connect to the nearest video bridge, reducing latency.
If the participant count is low, I think they use the mesh architecture, but as the count grows, I believe Zoom switches to an SFU or MCU, again depending on the load.
How does it work at the network level?
Generally, one computer on the internet can't make a request to another computer on the internet because of firewalls and the like, especially on mobile networks, where you can't even open a port for ingress.
Services like VPNs, proxies and ngrok still work as a middle-man that takes data from one machine and sends it to the other...
So if 100 users are connected in a meeting, does each user create 99 WebRTC connections? Won't that be heavy for each user, or do they all connect to a single media server?
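To put rough numbers on the question, here is a back-of-the-envelope sketch. It assumes every participant publishes exactly one stream, that a full mesh links every pair of users directly, and that with a media server (SFU) each user keeps a single connection to the server; the function name is mine.

```python
def compare_topologies(participants: int) -> dict:
    """Per-user and total connection counts for full mesh vs. a single SFU."""
    n = participants
    return {
        "mesh": {
            "connections_per_user": n - 1,       # one to every other user
            "total_connections": n * (n - 1) // 2,
            "uploads_per_user": n - 1,           # a copy of the stream per peer
        },
        "sfu": {
            "connections_per_user": 1,           # just the media server
            "total_connections": n,
            "uploads_per_user": 1,               # server fans the stream out
        },
    }


if __name__ == "__main__":
    for topology, numbers in compare_topologies(100).items():
        print(topology, numbers)
```

For 100 users the mesh needs 99 connections and 99 uploads per user (4950 connections overall), which is why larger meetings go through a media server.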
Media servers are used for large meetings, once the number of users passes some threshold; I would guess that number is 4.
Have a look at the WhatsApp International calling short for an idea of how multiple media servers can be used in a single call.
In a large conference you can restrict the number of video streams a user can receive. Audio streams generally take less bandwidth, so they are rarely restricted.
For example, if I set the maximum number of video streams a participant can receive to 10 and there are 100 participants, then only 10 video streams will be available to each participant, and the video bridge picks those 10 based on dominant-speaker stats.
This is how the Jitsi architecture works; I'm not sure how Zoom handles it.
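The stream cap described above could look roughly like this. It is a simplified sketch, not Jitsi's actual code: it assumes the bridge has a snapshot of recent audio levels per participant (a hypothetical `audio_levels` mapping) and simply forwards the loudest ones, whereas a real bridge tracks the dominant speaker over time.

```python
from typing import Dict, List


def select_video_streams(
    audio_levels: Dict[str, float], receiver: str, max_streams: int
) -> List[str]:
    """Pick which participants' video to forward to one receiver."""
    # A participant never needs their own video sent back to them.
    others = {p: level for p, level in audio_levels.items() if p != receiver}
    ranked = sorted(others, key=lambda p: others[p], reverse=True)
    return ranked[:max_streams]


if __name__ == "__main__":
    # 100 participants with made-up audio levels: user99 is the loudest.
    levels = {f"user{i}": float(i) for i in range(100)}
    print(select_video_streams(levels, receiver="user0", max_streams=10))
```

With 100 participants and a cap of 10, the receiver gets only the ten most active speakers' video, while audio can still be forwarded for everyone.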
A simulcast SFU architecture is the way to scale any WebRTC application.
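For anyone unfamiliar with simulcast: each sender publishes the same video at several qualities, and the SFU forwards whichever one fits each receiver. A minimal sketch of that choice, assuming three hypothetical layers with made-up bitrates and a bandwidth estimate the SFU already has:

```python
# Hypothetical simulcast layers a sender might publish: (name, bitrate in kbps).
LAYERS = [("low", 150), ("medium", 500), ("high", 1500)]


def pick_layer(available_kbps: float, layers=LAYERS) -> str:
    """Highest-bitrate layer that fits; fall back to the lowest layer."""
    fitting = [layer for layer in layers if layer[1] <= available_kbps]
    if fitting:
        return max(fitting, key=lambda layer: layer[1])[0]
    return min(layers, key=lambda layer: layer[1])[0]


if __name__ == "__main__":
    for estimate in (2000, 600, 100):
        print(estimate, pick_layer(estimate))
```

The sender encodes a few times, but the server never has to transcode, which is what keeps the SFU cheap as the meeting grows.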
@quanta-o3u Yeah, I thought of that too 😎
But WebRTC is free and open source
And there's Jitsi Meet for you
@gnuMan Nice one, thanks
This is how I sail the high seas.
Nice but looks very abstract😅
Lol..