Don't leave your system design interview to chance. Sign up for Exponent's system design interview course today: bit.ly/3qMqMyX
What tool was he using to draw?
@@harisbeg7026 It's called Whimsical! We use it across all our system design interviews
I do not recommend Exponent. I signed up and paid for Exponent's system design course - it is not worth it. It's very generic and the material very shallow. This video is proof :)
Functional Requirements: although it's unlikely you would have delved into this feature, ads are a major component of YouTube. I would have liked to hear it at least mentioned.
Key Characteristics: using FOSA, I'd imagine the most important features for this system depend on which company you are building it for. It's hard to say, as this scenario is unrealistic (as a Reddit post I saw pointed out, a design for a company with this much scale would go through lots of review: architecture and so on). At a small company you'd (hopefully) do something less complex.
This design is based on the assumption that every video is a single blob, but the real YouTube preloads videos partially, and each resolution can vary independently.
First make sure to add the feature to remind users to subscribe to the premium version, endlessly and without an opt-out.
Just kidding. Great details, very useful. Thanks.
Sign up here 😉 bit.ly/2Nl5Bn5
A few things discussed aren't clear:
- Blob storage is sharded... that's a little surprising.
- Adaptive vs. non-adaptive streaming: bandwidth estimation and the request for the corresponding chunk are done by the client, not the server.
- Does sharding by video make better use of resources than sharding by user id?
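On the second point, here is a minimal sketch of client-side adaptive bitrate selection, where the player measures throughput and requests the next chunk at a rendition it can sustain. This is not YouTube's actual logic; the rendition labels and bitrate thresholds are made-up illustrative values.

```python
# Sketch of client-side adaptive bitrate (ABR) selection: the client
# measures throughput from the last chunk download and requests the
# next chunk at the highest rendition it can sustain.
# Bitrates (in kbps) are illustrative, not real values.

RENDITIONS = [  # (label, required bandwidth in kbps), best first
    ("1080p", 5000),
    ("720p", 2500),
    ("480p", 1000),
    ("240p", 400),
]

def pick_rendition(measured_kbps: float, safety_factor: float = 0.8) -> str:
    """Choose the best rendition the measured bandwidth can sustain,
    leaving headroom via safety_factor. Falls back to the lowest."""
    budget = measured_kbps * safety_factor
    for label, required in RENDITIONS:
        if required <= budget:
            return label
    return RENDITIONS[-1][0]

print(pick_rendition(8000))  # fast link -> "1080p"
print(pick_rendition(1500))  # 1500 * 0.8 = 1200 kbps budget -> "480p"
print(pick_rendition(100))   # below every threshold -> "240p"
```

The server just stores pre-transcoded chunks per rendition; all the switching intelligence lives in the player, which is what makes the approach scale.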
He talks about sharding the video metadata DB and then says "Corresponding the same thing could happen to the blob storage as well". So let me get this straight: we use S3 for blob storage, and then we shard it ourselves? What a load of BS!
Which tool is he using to draw those design diagrams? Good info by the way, thank you Exponent.
I believe it's Whimsical.
Yes
Good fun, close to the Kapil Sharma show.
Very high level. Useless for any senior engineer interviews.
Lost interest the moment he started comparing web socket with REST for uploading data.
Why? It's a valid criterion to compare. They are different (simplex vs. duplex), so it's a valid comparison to choose one over the other, right? What am I missing?
@@kumarmanish9046 Manish Kumar vs Kumar Manish
you are right, this guy doesn't know websocket vs REST difference, when to use what.
@@joo02 You won it, bro🤣
@@joo02 a showdown for the ages
Thanks Hozefa!
Good info. Impressed with the content delivered.
How do you avoid hot partitions when sharding by user id?
You shouldn't shard by user id, to avoid hot shards. But if you have to, replication and random reads will help mitigate them.
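To make the mitigation concrete, here is a small sketch of hash-based sharding by user id, with reads spread randomly across a shard's replicas so one hot creator's traffic fans out. The shard and replica counts are arbitrary illustrative numbers.

```python
# Sketch: hash-based sharding by user_id, with reads spread randomly
# across replicas of each shard to soften hot partitions.
# NUM_SHARDS and REPLICAS_PER_SHARD are illustrative.
import hashlib
import random

NUM_SHARDS = 4
REPLICAS_PER_SHARD = 3

def shard_for(user_id: str) -> int:
    """Stable shard assignment from a hash of the user id."""
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def read_replica_for(user_id: str) -> tuple[int, int]:
    """Writes go to the shard's primary; reads can hit any replica,
    chosen at random, so a hot user's reads spread across machines."""
    shard = shard_for(user_id)
    replica = random.randrange(REPLICAS_PER_SHARD)
    return shard, replica
```

Note this only spreads read load; a single user whose writes overwhelm one primary still needs a different strategy (e.g. further splitting that user's data).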
Is there ever a wrong answer in system design interview?
I think as long as you justify it, you're good to go.
eventual
consistency? for what? video? user profile?
Video, since it's on blob storage; user data is ACID-compliant.
For streaming of uploaded videos which we know will become popular, we cache them in the CDN by default. Since we know these are popular videos, we cache them and serve them via CDNs from various regions, based on geography and type of event. We would keep a list of popular events (the Super Bowl, Taylor Swift, soccer matches) that are popular per region (country), and those we cache directly once the video is uploaded. Live event streaming is out of scope for this design.
Do let me know if this is a good approach, and what some of the changes would be if we went for live streaming events.
I like your approach of region and event-based caching. It should give improved performance! 💪
For live streaming, we could potentially look at implementing real-time transcoding, using low-latency protocols, adding origin shields, and developing dynamic scaling capabilities and some security measures.
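The region/event-based pre-warming idea above can be sketched as a small lookup step that runs after upload. Everything here is hypothetical: the event table, region names, and the `cdn_push` callback standing in for a real CDN pre-fetch API.

```python
# Sketch of event-based CDN pre-warming: when an uploaded video is
# tagged with a known popular event, push it to the CDN edges of the
# regions where that event is popular. Events, regions, and cdn_push
# are all illustrative stand-ins.

POPULAR_EVENTS = {
    "superbowl": ["us-east", "us-west"],
    "soccer": ["eu-west", "sa-east"],
}

def regions_to_prewarm(video_tags: list[str]) -> set[str]:
    """Return the set of CDN regions whose edges should be pre-warmed."""
    regions: set[str] = set()
    for tag in video_tags:
        regions.update(POPULAR_EVENTS.get(tag.lower(), []))
    return regions

def on_upload(video_id: str, tags: list[str], cdn_push) -> None:
    """Called once transcoding finishes; cdn_push is a stand-in for a
    real CDN pre-fetch call taking (video_id, region)."""
    for region in regions_to_prewarm(tags):
        cdn_push(video_id, region)
```

For live streaming the same idea would warm edges *before* the event starts, since there is no uploaded file to push ahead of time.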
One suggestion: please do system design with an experienced architect, not engineering managers who have barely 10 years of experience and haven't mastered the technology as such. People management is easy, but grinding on technology is hard. Please do us this favour and avoid such shallow content.
Got lost towards the end of the discussion, when the question was how to funnel the "Super Bowl" live broadcast through this solution. Didn't understand the caching part of it. How reliable would that be? 🥵
I don't know why the hell people try to put on a fake American accent rather than just speaking normally and clearly.
Are FB manager system design interviews as easy to pass as this guy presents? I'm sure an engineer would not pass with this answer.
Well, he only took ~20 minutes. I think he went to a sufficient level of detail given the short time for this mock session.
I read that you can't scale SQL horizontally, why are we using SQL?
The SQL you mentioned is a relational database, right? It's hard to scale a relational DB horizontally, but not impossible; besides, solutions like AWS Aurora can take care of the scale for you. Here is an example where I think a relational DB is better than NoSQL in this case. When you want to know who replied to your comments (a join operation), you'll find that the relational implementation is simple and elegant: all you need are a UserDB and a CommentDB, and a join takes care of the request. But if you choose NoSQL, like MongoDB or DynamoDB, you have to store user info (e.g. name) in the CommentDB, because NoSQL doesn't support joins. And you run into problems when users update their profiles: the name saved in the CommentDB also needs to be updated to prevent inconsistency. Clearly not a good approach. Of course, you could save only the user_id and do another query to mimic the join, but why not use a solution with joins in the first place?
@@yuanhengzhao4188 Yes, that's one of the pros, but I think it gets difficult when you shard: data that requires joins has to live on the same shard, so I'm not sure we can scale that way. In that case NoSQL is easier.
That's the reason the sharding logic was discussed; you need to keep in mind how the recommendation system will use joins between rows, etc. Though the answer isn't satisfactory to me, because it can create a lot of load on shards that hold users with millions of subs. More different strategies to explore.
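The join argument in this thread can be shown concretely with SQLite: comments store only a `user_id`, and a join always returns the user's current name, with no denormalized copies to chase down after a rename. The schema is illustrative, not from the video.

```python
# Demonstrating why joins beat denormalization for the comments case:
# after a user renames themselves, every joined read is automatically
# consistent. Illustrative two-table schema.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY,
                           user_id INTEGER REFERENCES users(id),
                           body TEXT);
    INSERT INTO users VALUES (1, 'alice');
    INSERT INTO comments VALUES (10, 1, 'great video');
""")

# Rename the user once; no comment rows need touching.
conn.execute("UPDATE users SET name = 'alice_2' WHERE id = 1")

row = conn.execute("""
    SELECT u.name, c.body
    FROM comments c JOIN users u ON u.id = c.user_id
""").fetchone()
print(row)  # ('alice_2', 'great video')
```

In a denormalized NoSQL layout, that single `UPDATE` would instead become a fan-out write to every comment the user ever made.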
What software was he using to draw?
waiting for an answer
Notion
Whimsical is the tool
@@bikkina thank you.
@@devanshgarg31 No, it's Whimsical.
I guess when sharding, the right index should be genre / country of origin, shouldn't it?
Yes, indexing by user doesn't make any sense. It should always be by the most common trait of a video.
Same thought here.
User-based sharding only helps index a particular creator's videos.
But geography-based sharding will help with indexing, and it will be more efficient for building recommender systems.
Correct me if I am wrong somewhere
Device agnostic? Better to ask: do we need a mobile app, or will users only use the web application?
What happens on a real product like this if you can't afford to scale blob storage further a few years in? Would you tell users that from now on they have a limited number of videos, or some other limitation? Or is this more of a business decision?
Hey DamjanDimitrioski, thanks for the question!
While there could be technical solutions to slow down the need to scale, this is ultimately a business decision. If you can't afford to scale, it likely means that your business model isn't profitable enough, or that you aren't monetising it well. The question then becomes "is it worth it to keep this product up and running?" (considering accounting profits, opportunity cost, etc.)
Hope this helps!
Server to talk to CDN and get back to users, hmm
You don't need websocket for uploading video
For large videos, would HTTP REST work?
@@kumarmanish9046 Yes, it depends on what the server can support, but technically you can upload at least 1 GB, or you can break the video into small files and then upload them. You can leverage a blob store like S3 here.
@@rajsekhar28 Here WebSockets aren't used just to upload video; he's probably trying to say he would keep the connection alive while uploading videos, I guess.
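The "break the video into small files" approach mentioned above can be sketched as a chunked upload: split the file into fixed-size parts, send each part over plain HTTP (each independently retryable), then finalize. This is the same general shape as S3 multipart upload; `upload_part` and `complete_upload` here are hypothetical stand-ins, not a real storage API.

```python
# Sketch of chunked upload for large files over plain HTTP: split the
# file into fixed-size parts, upload each part, then issue a single
# "complete" call. upload_part/complete_upload are stand-in callbacks.

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MiB parts (illustrative size)

def split_into_parts(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Slice the payload into fixed-size chunks; the last may be short."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def upload_file(data: bytes, upload_part, complete_upload) -> int:
    """Upload each part independently, then finalize. Returns the
    number of parts sent. In practice each part gets retried on
    failure, which is the whole point of chunking."""
    parts = split_into_parts(data)
    for number, part in enumerate(parts, start=1):
        upload_part(number, part)
    complete_upload(len(parts))
    return len(parts)
```

No WebSocket is required for this: each part is an ordinary HTTP PUT/POST, and a failed 8 MiB part is much cheaper to retry than a failed 1 GB request.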
What is the whiteboarding tool used?
Whimsical
What software is this? Is it Figma?
Hi Alice! The whiteboard being used here is “Whimsical”. They have a free and paid version so do check them out if you are interested!
Can we shard by title and description?
Why would you want to do that?
There will be very few similar titles, and forget about the description.
I would say analytics is as important as the main functionality (video upload and streaming). It drives revenue and system efficiency.
My first question is WHY? What's wrong with YouTube? 🤣 Exponent interviews are so basic.
Hey bro, how do YouTube uploads work if the video is large, like 500 MB or 1 GB? Does the upload happen directly from the frontend to S3, with the link then given to the backend, or is an HTTP call made to the backend, which waits for the upload?
Video compression is done, and every video is split into three major tasks: video transcoding, audio transcoding, and metadata persistence.
For video transcoding, videos are split into multiple chunks of files and a DAG of tasks is generated; workers pick up parallel nodes and sequential nodes.
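The DAG idea above can be sketched with Python's standard `graphlib`: chunk transcodes are independent and can run in parallel, while stitching and metadata persistence wait on their dependencies. The task names are illustrative, not from any real pipeline.

```python
# Sketch of the transcoding DAG: workers drain "ready" tasks (all
# dependencies done) in batches; everything in one batch could run in
# parallel on a worker pool. Requires Python 3.9+ for graphlib.
from graphlib import TopologicalSorter

# task -> set of tasks it depends on (illustrative names)
dag = {
    "split_chunks": set(),
    "transcode_chunk_1": {"split_chunks"},
    "transcode_chunk_2": {"split_chunks"},
    "transcode_audio": {"split_chunks"},
    "stitch_video": {"transcode_chunk_1", "transcode_chunk_2"},
    "persist_metadata": {"stitch_video", "transcode_audio"},
}

ts = TopologicalSorter(dag)
ts.prepare()
batches = []
while ts.is_active():
    ready = sorted(ts.get_ready())  # tasks whose dependencies are done
    batches.append(ready)           # workers could run these in parallel
    for task in ready:
        ts.done(task)

print(batches[0])   # ['split_chunks'] must run before anything else
```

The parallel batch (both chunk transcodes plus the audio transcode) is where the real speedup for large uploads comes from.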
No load balancers?
It's understandable at this level; if you scale a service horizontally, it's mostly done with load balancers.
Too shallow a solution. Good enough for freshers maybe
23 minutes of non-sense
Hey, I'm preparing for APM roles. Is there anyone who wants to pair up and interview each other?
Hi Kaushal
I'm 2 years late in responding :D and I would like to do that!
This guy doesn't know when to use WebSocket vs REST? I'm surprised Exponent is using this person for all their demo interviews. Too shallow.
+1
Is this supposed to be an impromptu interview or what? This is a basic question they ask everywhere. I have only read Alex Xu's book, and I would do better than that.