Great video, thanks!!! I have a few doubts - 1. Why did we choose Cassandra instead of HBase for storing the graph? Tutorials about their internals would be good. 2. Sometimes interviewers do not state the requirements clearly, even after being asked, and there is some miscommunication. How do we resolve this? E.g., some ask how we will scale read/write requests (even after horizontal scaling, caching, CDNs, and sharding in case the data storage is huge). 3. Is it OK to name the DB? Some could ask about internals that we don't know. Thanks in advance.
1. You could use HBase as well. At a high level, Cassandra and HBase are very similar in terms of performance and the use cases they cater to. For alternatives to most DBs, you can check this video: th-cam.com/video/cODCpXtPHbQ/w-d-xo.html 2. It depends on the case. But it's usually a good idea to state your assumption and then solve for it. People usually tell you what they expect when you do that. 3. If you have some idea about the DB, you should say a name or a type; for example, you can say that you want to use a key-value store without explicitly mentioning Redis. But if you have no idea about a DB, don't throw out random names. Also, no one expects you to know the internals of a DB unless you are applying for a DBA role. All you need to know is the use cases that the DB is good at.
For predicting what you would watch tomorrow, is it OK to have the client use bandwidth and device storage just for the prediction? Or is there a mature solution to achieve this on the server or CDN side?
Really nice video that goes into the details of many areas. The details about Open Connect show the effort you have put into getting this right. Since it's a long video, do you plan to share a summary link like you did with a few other videos?
Thank you for the detailed explanations. One recommendation for your videos: you could add some pictures, such as showing the Netflix home page while mentioning collaborative filtering. You could also add some color to your videos to make them more engaging. Thanks again for your effort.
Hello, I have a question - what happens in the content filtering stage (piracy, nudity, etc.)? I mean, let's say any of these is present; what happens then? The user has already uploaded the video, so it is already in S3 (at least). Do we then only flag it and send a notification to the user while the content is still available? Is it not the case that if these things get detected while uploading, we restrict it at upload itself?
Hello Sir, I have watched all your videos. They're great, but I have noticed that you use Cassandra quite a lot. Cassandra is not known for high consistency, and reads are also slow in it. So why did you use Cassandra to store video metadata in the YouTube system design?
Very nice presentation. I have a small question: how is video playback by the same user account across multiple devices at the same time handled? And is it synchronized, i.e., if I watch a video on my phone for 20 min and then log in on my laptop, does it still show 20 min? I am guessing we regularly send user activity information to the backend, so anytime we log in or the homepage is refreshed, it loads the user's recent activity. What do you think?
Great video!! Doubt: say, in the original S3, the link for Format A, Resolution 480, Chunk X is L1; for Format B, Resolution 1080, Chunk Y it is L2; and so on.
Doubt 1: What will the link be after storing them in the cache (Open Connect) at the ISP?
Doubt 2: How will the viewer client get to know the URL?
Doubt 3: The database has the links L1, L2, L3... of the original S3 storage. How will it fetch the links saved in the OC (cache at the ISP)? What is the flow? Like: client - ISP - fetch video service - DB - ISP - cache - client?
Why did we choose Cassandra here? Why not MongoDB? MongoDB even provides indexing on all sub-parts of the JSON, whereas in Cassandra we can only query on primary and secondary (clustering) keys. Since we can have multiple indexes in Mongo, we can support a wide variety of queries. So why Cassandra here?
Amazing and to-the-point explanation. Just one thing though: is this approach sufficient for any HLD interview, or do we need to go into DB design and APIs as well?
Cool video. One question: those content filters, how do they work? Is it a machine watching video chunks and somehow determining privacy, nudity, etc. levels? Or will YouTube employees watch those chunks and set the levels/tags for privacy, nudity, legal, etc.?
A lot of your videos use Kafka as a message queue. Are you leveraging the dumb-broker/smart-consumer model for throughput and expecting the consumer to handle things properly, instead of using others like RabbitMQ, Kinesis, SQS, etc.?
Thanks a lot for making the video! You have provided a holistic view of the design! I have a few questions though. 1) Does Elasticsearch have its own data store, or will it be indexing on Cassandra? 2) There is a link shown between the Elasticsearch cluster and the Recommendation Engine. I am not able to understand its purpose.
Thanks!! To answer your queries: 1. ES would have its own data store to store the data and the indexes. 2. The arrow between the Recommendation Engine and Elasticsearch is there because the Recommendation Engine reads from Elasticsearch.
@@codeKarle Reads from ES? In the video, while explaining how to choose between different thumbnails, I think you mentioned that the recommendation engine feeds the data into ES. Then, when the search service queries, it gets results from ES based on the recommendations stored. I am confused.
Thanks for this video. Do you mean that after processing in the Spark cluster you will store the results in HDFS (the Hadoop cluster)? In short, what will be used from the Hadoop cluster?
The raw video is stored in S3, and when we divide it into chunks, we store these in the CDN. But a CDN generally has a TTL. What will happen when the CDN service is down? Do we need to process the raw video again, or do we store the processed chunks for HA?
Great work! Thank you for creating these.
The breadth of your videos is absolutely unmatched.
Hey I am an SDE 2 at Amazon. I went through the entire video. Great content man! Very informative. Worth the length.
Amazon SDE-2s are overrated; they don't know basic things.
Is that true, bro? @@Markcarleous1903
@@Markcarleous1903 Where do you work, buddy?
Extremely high availability... lots of earthquakes everywhere... and still you can see videos. It was really funny when you said "You might not want to see videos at that time, but that's a different story" 🤣🤣
Your code karle videos are amazing, Sandeep. No BS; comprehensive; pure distilled information. Thank you!!!
Very good, bud! Thank you!
Working only on internal applications at a non-big-tech company, without dealing with data at this scale or analytics this detailed, you never become aware of the scope and scale of these big platforms, all the test cases they face, or how smartly and efficiently they work.
The best I have ever seen! Impressive work, Sandeep!
It is really useful to watch your videos to get some understanding of system design, and not only for interviews.
I have doubts -
1) Why do we need to tag each chunk? What purpose does the tagging serve?
2) If the ISP decides to do the caching on its side, how do we plan to collect the statistics, and how do we manage access to such cached content?
Hello Sandeep, good job. I liked your plain, no-nonsense way of teaching. I also liked the color coding of components (services, open source, etc.), which makes it easy to understand. One more thing: you speak slowly and calmly, which helps immensely. It sets the pace, so we can think while we listen.
Great video!! However, if you are in a 45-min interview to design a YouTube-like service, talking about tags, users sharing accounts, or people clicking through pagination to find videos can be seen as diverting from the core requirement. As a candidate you need to focus strictly on, first, how users upload and how the server processes the video formats etc., and second, how videos are streamed when the user clicks on a video. As focused and great as the first part was, the second part was all over the place; if someone were to see just the 2nd half, they might think the question was “design Google search”.
Thanks @codeKarle. Your system design videos are gems. These videos will surely gain traction in the future and get their much-deserved views and likes.
IMO, the E2E coverage is very broad - which is super helpful! But one needs to skim through to get the basics of the components. The choice of DB, caching & partitioning could have gone a little deeper than explaining the user login or analytics (probably a topic on its own). But definitely very helpful overall.
Great Video. Thanks and kudos to the hard work you have put in.
I just have one request - when using an external component like Cassandra, if you could compare it with other alternatives and talk about why you chose it, that would be of great help.
We have done that in some of the videos, at random places. Doing that every time would have become repetitive.
You can check out the Databases video. There you will get to know about the alternatives that are available: th-cam.com/video/cODCpXtPHbQ/w-d-xo.html
Brother, you have put in a lot of hard work to explain this. Thank you for your free service with knowledge.
Channel seems to be extremely underrated, amazing content
Awesome explanation, Sir. I have not seen such a detailed and informative explanation of system design questions.
Excellent, brother Sandeep - you are a combination of good looks, good brain, and good attitude.
Very nice video, complete in all respects. Covers lots of topics: CDN, search, recommendations, optimization - great content, and I thoroughly enjoyed it. Thanks for sharing the knowledge.
Thanks!! Glad that you find it useful :)
Hey, I went through the entire video. Great content, man! Very informative. Worth the length. Can you create more videos on LLD and HLD?
Amazing video like others, thank you Sandeep! One request though, this video does not have a writeup like others. I find that very useful for making notes. Please keep up the good work!!!
All I got to learn was that you're a big-time fan of Mission Impossible! Just kidding. Great work! I appreciate all the hard work you've put in to consolidate all this 👍
Not a bad video. Thanks for this.
Two things I learned
- The presenter's favorite movies :)
- Why I'm so frustrated by the recommendations in any sort of system. The "AI" thinks that because I like numbers 2 and 3, and someone else likes numbers 1, 2, and 3, I'm going to like number 1 as well. However, I don't like non-prime numbers, but the dumb "AI" will never notice that. And because they copy each other's homework, all the recommendation systems in the world let me down.
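The inference described in this comment is roughly what naive user-based collaborative filtering does. Below is a minimal sketch under that assumption; the `recommend` function and the `likes` data are hypothetical illustrations, not the recommendation engine from the video.

```python
from collections import Counter


def recommend(target, likes):
    """Suggest items liked by users who share at least one like with `target`."""
    mine = likes[target]
    scores = Counter()
    for user, items in likes.items():
        if user == target:
            continue
        overlap = len(mine & items)  # how similar this user is to the target
        if overlap == 0:
            continue
        for item in items - mine:  # only items the target has not liked yet
            scores[item] += overlap
    return [item for item, _ in scores.most_common()]


likes = {"me": {2, 3}, "someone": {1, 2, 3}}
print(recommend("me", likes))  # [1]
```

Nothing in this scheme looks at properties of the items themselves (such as being prime), which is exactly the blind spot the comment is joking about.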
Hey, thank you so much for all your knowledge sharing. I am able to perform very well in all my interviews. Keep up the good work. More power to you.
Keep rocking!!!
I really enjoy these videos, but in a way, it's a little daunting - the analysis/explanation of all your components takes at minimum 1 hour, whereas I think most system design interviews take place E2E within maybe 50 minutes. For example, I'm not sure if login is something you really have to cover at all in this kind of interview, as it's mostly irrelevant and non-specific to this application. Also, I might just be dumb, but I feel like your system architecture design is quite a bit higher in quality compared to other interview resources, but at the same time much more unrealistic (in my opinion). I think for me, explaining the design choices would be more helpful than walking through the flow of why this architecture was set up this way. You mentioned in a comment below that it would be repetitive, but I think if you explain it in the context of each interview problem it would actually be very helpful. Thanks for all your content though, I am learning a lot!
It must be intentional - since not everyone has the same skill level, some might need it to be slow. Use the speed button to double the speed and it suddenly turns into a 30-min video.
I appreciate all the effort you put into sharing this video.
49:10 I couldn't understand why you used Cassandra for the home page service. You mentioned that it contains information about the user such as likes, dislikes, etc. But doesn't this information change very often?
For example, if I watch a video on system design today, then I'd like the first row of my home page to contain another system design video. For that to happen, we have to update the Cassandra row. But I heard Cassandra is not good for updates.
I understand that it is good here because it handles reads very well and it's "always on". Cassandra also scales very well with additional data. But we don't need that here, right? There won't be many new user accounts getting created every minute.
So can we use something like MySQL + a cache (maybe write-back), similar to what you suggested for the user service?
Hey Sandeep, your videos are very detailed and help learn a lot. Thanks for making these videos. Please continue to make more videos as these are really awesome!!!
I felt that a good recommendation system should be a functional requirement.
This was really good, but I would appreciate it more if you could speak more about caching and database design with tradeoffs (like relational DBs vs Cassandra), how we would cache content in CDNs, etc.
Awesome video. Seamless delivery. Can the same system design be used for an audio streaming service like Spotify or YouTube Music?
I believe it should be usable for audio streaming. The main difference would be in your bandwidth and storage requirements, because sound files are inherently different from video files. You may also consider that most users tend to listen to the same songs over and over again, stored locally on their device, in contrast to YouTube video, where most users constantly stream new videos while almost never downloading them locally. This could have implications for bandwidth as well as analytics - if you want to aggregate a user's behavior for recommendation purposes when they mostly listen to their local library (as opposed to streaming music), how do you get that information? One way would be to have the client send an event to the server side every time a song is played or completed (this can be done asynchronously; it does not have to be sent immediately after a song is played). Kafka could route that event to a Spark cluster, and the results would then be stored in a Hadoop cluster for the Recommendation Engine to consume.
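The asynchronous client-side reporting suggested in this reply could look roughly like the sketch below. It assumes events are buffered on the device and flushed in batches by a background thread; `PlayEventReporter` and the `send_batch` callback are hypothetical names, and a real client would replace `send_batch` with an HTTP call to whatever service feeds Kafka.

```python
import json
import queue
import threading


class PlayEventReporter:
    """Buffers play events and ships them in batches off the playback path."""

    def __init__(self, send_batch, batch_size=3):
        self._send_batch = send_batch  # hypothetical transport callback
        self._batch_size = batch_size
        self._events = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def record(self, user_id, song_id, action):
        # Returns immediately; the player never waits on the network.
        self._events.put({"user": user_id, "song": song_id, "action": action})

    def close(self):
        self._events.put(None)  # sentinel: flush what is left and stop
        self._worker.join()

    def _run(self):
        batch = []
        while True:
            event = self._events.get()
            if event is None:
                break
            batch.append(event)
            if len(batch) >= self._batch_size:
                self._send_batch(json.dumps(batch))
                batch = []
        if batch:
            self._send_batch(json.dumps(batch))


sent = []
reporter = PlayEventReporter(sent.append, batch_size=2)
for song in ("s1", "s2", "s3"):
    reporter.record("u1", song, "completed")
reporter.close()
print(len(sent))  # 2 batches: one with two events, one with the remainder
```

Batching trades a little delay for fewer requests, which fits the point above that the event does not have to be sent right after a song is played.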
One suggestion: it would be wonderful if you could break the video into chapters. The content was amazing. Thanks for sharing your knowledge.
Well explained, in detail. Looking forward to videos on data structures, with Java as the coding language.
Very detailed and informative explanation. Thanks for all the efforts in teaching us system design.
I still don't know how a video is played. I mean, there are so many chunks stored on the CDN server. How does the CDN organize these chunks and merge them into one whole movie? Will it merge them completely before playing? Or does the service just load one or several chunks at a time to give to the user? If there is any trouble during playback, how can the service fix it?
Super video, bro! This cleared up a lot of questions and helped me understand the end-to-end architecture of video streaming services.
Absolutely brilliant. Thank you SO much for this much detail and depth.
Awesome and easily understood content!
For the Content Processor, instead of storing chunks directly in the CDN, I think the transcoded chunks should be stored in an output S3 bucket, and the CDN can use that output bucket as its origin server.
But wouldn't you prefer to get a file from a CDN rather than from an S3 bucket? What's the benefit?
@@Saurabh2816 I think he is not saying that we should get the file from the S3 bucket instead of the CDN. He is saying that instead of the CDN uploader uploading the chunks to the CDN, why don't we first upload to S3? There is an automatic sync available in AWS that keeps all the CDNs (CloudFront) updated with the origin (the S3 chunks, in this case). I think that would have some pros and cons.
Yes, that would have pros and cons.
Pros
1) There will be one master source of the chunks in S3 for later retrieval if required.
2) Leveraging the existing sync process from AWS S3 to AWS CloudFront would be super efficient.
Cons
1) More redirection and complication in terms of process flow. But I don't think it is a major con, as this happens when the user is not waiting for anything.
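A rough sketch of the S3-as-origin idea from this thread, assuming a pull-through CDN in which an edge fetches a chunk from the origin bucket only on a cache miss. `OriginBucket` and `EdgeCache` are hypothetical in-memory stand-ins, not AWS APIs.

```python
class OriginBucket:
    """Stand-in for the output S3 bucket holding every transcoded chunk."""

    def __init__(self):
        self._objects = {}
        self.reads = 0  # counts how often an edge had to come back to the origin

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        self.reads += 1
        return self._objects[key]


class EdgeCache:
    """Stand-in for one CDN edge location that pulls from the origin on a miss."""

    def __init__(self, origin):
        self._origin = origin
        self._cache = {}

    def get(self, key):
        if key not in self._cache:  # cache miss: fetch once from the origin
            self._cache[key] = self._origin.get(key)
        return self._cache[key]

    def evict(self, key):
        # Models a TTL expiry or an edge losing its copy.
        self._cache.pop(key, None)


origin = OriginBucket()
origin.put("video42/480p/chunk-0001", b"...bytes...")
edge = EdgeCache(origin)
edge.get("video42/480p/chunk-0001")
edge.get("video42/480p/chunk-0001")
print(origin.reads)  # 1: the second request was served from the edge
edge.evict("video42/480p/chunk-0001")
edge.get("video42/480p/chunk-0001")
print(origin.reads)  # 2: the chunk was re-fetched, not re-transcoded
```

This illustrates the first "pro" above: because the bucket remains the master copy, an edge that loses a chunk can fetch it again without the raw video being processed a second time.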
Really awesome content and great explanation. One request from my side: please increase the video's volume or use another mic that provides better sound. In the middle of the video the audio was very low compared to the beginning and end. Thanks for the useful, awesome content.
I gave a thumbs up and watched it till the end. Will this be rated as a super awesome video?
Though it was very long, it was very informative and very detailed. Thanks for it 👍
Awesome video. One piece of feedback: the echo from the mic makes it hard to understand; if you can switch mics, it will be easier to follow.
Loved it! Though it would have been good if you could sketch the diagram and explain each block at the same time.
Agree with this, in an interview we would be asked to create the diagram live. I would prefer if you would create the diagram on camera while explaining what each component does.
Very detailed explanation, Great work!!!
Amazing video, thanks for sharing this. Can you please add a summary for all your design videos? I see you have added one for a few, which makes a lot of sense. Thanks again for all the great work 👍👍👍
Very very nicely explained. Thank you so much!
Before even uploading to S3, don't you think the content processor should first run the filtering checks? If the content is against the policy, it's not worth uploading to S3.
Great video. I have a doubt: will the chunks be stored only in the CDN? Is that the best way? Shouldn't we store the processed chunks as well?
Such great videos on system design. Hats off !!! :) :)
As the traffic is encrypted, how can the ISP cache it?
Hi codeKarle, nice video. I'd appreciate it if you could explain more about why you made your decisions. For instance, why Cassandra? Why use an async pipeline?
Can you please provide a summary for this video, as has been done for several of your other presentations?
Very nice presentation. Thanks very much. A question I have: why do you choose cloud services (like Amazon S3) for some parts and decide to use your own stack in other places, e.g., running your own Cassandra cluster?
There is no technical reason for that. I try to use very common solutions that anyone can use easily. People are generally comfortable using S3 as a file store, because a lot of companies use it, so it makes it easy to understand the larger picture. Any other solution would be equally good :)
My main idea is to tell people how they can design a system using solutions that they can easily get their hands on. This particular thing is explained more in th-cam.com/video/cODCpXtPHbQ/w-d-xo.html
Awesome video man !! Thank you so much 😍
Awesome video .. very in-depth touched a lot of different things, great explanation.
This is awesome! and in great detail. Thank you for this video & Please do more on the subject. Subscribed!
Quite detailed, thanks for your efforts.
You said we are going to upload each chunk to the CDN. If that is true, then why are we aggregating the chunks again using Spark?
Questions:
1) Do you put all the data from multiple events and sources in one Kafka queue, or do you use separate Kafka queues? If it is only one Kafka queue, how do multiple consumers decide which messages they want to process? Do they have to do that inside their code, or does Kafka provide that configuration?
2) Instead of Kafka, would Amazon SNS/SQS work in this case? If not, why not?
Have you found an answer to this question?
Looking for the same.
Kafka has the concept of topics. Different kinds of events are put into different topics, and each topic is subscribed to by the consumers that need it.
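To illustrate the point about topics, here is a toy in-memory broker. It is not Kafka's API (`TinyBroker` and the topic names are made up for the example, and real Kafka consumers pull from a log rather than being pushed messages); it only shows that consumers choose what they receive by subscribing to a topic name instead of filtering messages in their own code.

```python
from collections import defaultdict


class TinyBroker:
    """Toy topic router: each message goes only to that topic's subscribers."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        for handler in self._subscribers.get(topic, ()):
            handler(message)


broker = TinyBroker()
uploads, searches = [], []
broker.subscribe("video-uploaded", uploads.append)  # e.g. a content processor
broker.subscribe("search-events", searches.append)  # e.g. an analytics job

broker.publish("video-uploaded", {"video_id": "v1"})
broker.publish("search-events", {"query": "system design"})
broker.publish("video-uploaded", {"video_id": "v2"})

print(len(uploads), len(searches))  # 2 1
```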
Kafka is not meant for parallelizing processing on a message-by-message basis, but an AMQP-style broker (RabbitMQ, Amazon SQS) is. So I think the video was wrong about choosing Kafka to load balance messages among the content processors.
How is inter-service communication handled in these scenarios? Do the requests go through a load balancer every time one service talks to another? I am assuming multiple instances of one or more services may be running at any point in time.
When you say local CDN 1, 2, 3 at 58:50, do you mean different servers (server1, server2, server3) within the same local CDN?
I have a query on the upload service: how are we handling the case where, let's say, one of the steps in the content processor fails? Are we maintaining the state of each chunk, and if so, how do we decide which chunk needs to be retried?
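For context on the load-balancing point: within a Kafka consumer group, work is divided per partition rather than per message, and each partition is read by one consumer of the group. Below is a toy sketch of that split in plain Python. The round-robin rule and the processor names are assumptions for illustration; real Kafka assignment strategies are configurable.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over consumers round-robin (illustrative rule only)."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Six partitions shared by three content processors: two partitions each.
balanced = assign_partitions(
    [0, 1, 2, 3, 4, 5],
    ["processor-a", "processor-b", "processor-c"],
)

# More consumers than partitions: the extra consumer gets nothing to read.
starved = assign_partitions([0, 1], ["processor-a", "processor-b", "processor-c"])
```

The second call shows the practical limit: parallelism is capped by the partition count, which is the trade-off compared with a queue that hands out individual messages.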
super quality content. Thanks a lot for sharing.
Great content. Thank you Sandeep.
Awesome video .. Covered great depth of this difficult topic ..
One question here: why did you introduce Apache Spark Streaming between Kafka and the Hadoop cluster in this design? Couldn't the Hadoop cluster consume directly from Kafka and perform the batch processing itself, as opposed to Spark consuming all the data and then sending it to Hadoop in batches?
Great video as usual. Does live streaming of cricket matches, say by Hotstar or Disney, use the same process? I am sure there will be different challenges, as we won't have enough time for transcoding, creating lower-quality versions and then uploading chunks to the CDN. How do we handle that scenario, where we have to live stream in very near real time, say with no more than 10 seconds of delay?
Very nice video, very good explanation, thank you so much.
Great video thanks !!! Have a few doubts -
1. Why did we choose Cassandra instead of HBase for storing the graph? Tutorials about their internals would be good.
2. Sometimes interviewers do not state the requirements clearly, even after being asked, and there is some miscommunication. How do we resolve this? E.g., some ask how we will scale read/write requests (even after horizontal scaling, caching, CDNs, and sharding in case the data storage is huge).
3. Is it OK to mention the DB by name? Some interviewers could ask about internals that we don't know.
Thanks in advance
1. You could use HBase as well. Cassandra and HBase are very similar in terms of performance and use-cases they cater to when looking from a high level. For alternatives for most DBs, you can check this video: th-cam.com/video/cODCpXtPHbQ/w-d-xo.html
2. Depends on a case by case basis. But it's usually a good idea to tell your assumption and then solve for it. Usually people tend to tell what they expect when you do that.
3. If you have some idea about the DB, you should say a name, or a type, for example you can say that you want to use a key-value store and not explicitly mention redis. But if you have no idea about a DB, don't throw random names. Also, no one expects you to know internals of a DB unless you are applying for a DBA role. All you need to know is the use-cases that the DB is good at.
Great Content. Super helpful
For predicting what you would watch tomorrow, is it OK to have the client use bandwidth and user device storage just for the prediction? Or is there a mature solution to achieve it on the server or CDN side?
Great explanation and informative content. Keep going 🙂👍
Thank you!! There is a lot more coming your way :)
You are awesome man. More power to you :D
Thanks!! Glad that you liked it :)
Should we use any caching for recommendation (in home page service) as well ?
Really nice video that goes into the details of many areas. The details about Open Connect show the effort you have put into getting this right. Since it's a long video, do you plan to share a summary link like you did with a few other videos?
The Elasticsearch cluster is shared between two services. Is that recommended?
Why did you stop creating content? Awesome video explanations!
Your walkthrough is so much better than those "L8 engineer" or ex-FAANG tutorials. Shame that the accent is a bit thick for me.
Thank you for the detailed explanations. What I recommend is for your videos, you can add some pictures such as showing Netflix home page during mentioning Collaborative filtering. You can add some colors to your videos to engage more. Thanks again for your effort.
Thanks! Honestly, it looks like it might require a lot of editing effort. If that's not too much, we'll definitely try to do that :)
Peer to Peer Protocol - used in Torrents, DC++ or any filesharing across thousands of machines.
Are we storing the CDN URLs for all the chunks in the Cassandra cluster?
Thanks for very informative video.
@15:09 How does the system handle it if the "content filter" or any other step fails for a chunk of video?
Hello, I have a question: what happens in the content filtering stage (piracy, nudity, etc.)? Let's say any of these is present. The user has already uploaded the video, so it is already in S3 at least. Do we then only flag it and send a notification to the user while the content is still available? Or do we detect these things while uploading and restrict it at upload time?
Great Content 👍. Can you please explain how livestreams are handled?
Very nicely explained.
Hello Sir, I have watched all your videos. They're great, but I have noticed that you have used Cassandra quite a lot. Cassandra is not known for strong consistency, and reads are also slow in it. So why did you choose Cassandra to store video metadata in the YouTube system design?
Very nice presentation. I have a small question: how is a video played by the same user account across multiple devices at the same time handled? And is it synchronized, i.e., if I watch a video on my phone for 20 min and then log in on my laptop, does it still show 20 min? I am guessing we regularly send user activity information to the backend, so any time we log in or the homepage is refreshed, it loads the user's recent activity. What do you think?
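The guess above (clients regularly report progress, and any device reads it back) can be sketched like this. This is only an illustration under that assumption, not how the video or any real service does it; the WatchProgressStore class, its method names, and the "latest report wins" rule are all hypothetical.

```python
class WatchProgressStore:
    """Toy in-memory store of the last reported playback position (hypothetical)."""

    def __init__(self):
        # (user_id, video_id) -> (position_seconds, reported_at)
        self._positions = {}

    def heartbeat(self, user_id, video_id, position_seconds, reported_at):
        """Record a progress report; keep it only if it is not older than the stored one."""
        key = (user_id, video_id)
        current = self._positions.get(key)
        if current is None or reported_at >= current[1]:
            self._positions[key] = (position_seconds, reported_at)

    def resume_position(self, user_id, video_id):
        """Position any device should resume from; 0 if nothing was reported yet."""
        entry = self._positions.get((user_id, video_id))
        return entry[0] if entry else 0


store = WatchProgressStore()
store.heartbeat("u1", "v1", 600, reported_at=100)   # phone, 10 min in
store.heartbeat("u1", "v1", 1200, reported_at=200)  # phone, 20 min in
store.heartbeat("u1", "v1", 900, reported_at=150)   # delayed report, ignored

laptop_start = store.resume_position("u1", "v1")
```

Keying on the account rather than the device is what lets the laptop pick up where the phone left off. Two devices playing at the same moment would simply overwrite each other's position under this rule.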
Great Video!!
Doubt:
Say, in the Original S3, the links for
Format A Resolution 480 Chunk X is L1
Format B Resolution 1080 Chunk Y is L2
And so on....
Doubt 1:
What will the link be after storing them in the cache (Open Connect) at the ISP?
Doubt 2:
How will the viewer's client get to know the URL?
Doubt 3:
The database has the links L1, L2, L3... of the original S3 storage. How will it fetch the links saved in the OC (cache at the ISP)?
What is the flow? Like: Client - ISP - fetch video service - DB - ISP - Cache - client ?
Beautiful explanation
Why did we choose Cassandra here? Why not MongoDB? MongoDB also supports indexes on all sub-parts of the JSON, whereas in Cassandra we can only query on the primary and secondary (clustering) keys. Since MongoDB allows multiple indexes, we can run a wide variety of queries. So why Cassandra here?
Amazing and to the point explanation. Just one thing though, is this the sufficient approach for any HLD interview or do we need to go into DB design and API as well?
Amazing. Learnt a lot. Thanks
Does it efficiently handle live streaming as well?
Cool video. One question: those content filters, how do they work? Is it a machine watching video chunks and somehow determining piracy, nudity, etc. levels? Or will YouTube employees watch those chunks and set the levels/tags for piracy, nudity, legal issues, etc.?
A lot of your videos focus on using Kafka as a message queue. Are you relying on the dumb broker / smart consumer model for throughput and expecting the consumers to handle things properly, instead of using others like RabbitMQ, Kinesis, SQS, etc.?
I liked your explanation, but you should try to shorten it a little and make it more bullet-point centric, like you did when explaining Cassandra.
Thanks a lot for making the video! You have provided a holistic view of the design!
I have a few questions though.
1) Does Elasticsearch have its own data store, or will it be indexing on Cassandra?
2) There is a link shown between the Elasticsearch cluster and the Recommendation Engine. I am not able to understand its purpose.
Thanks!!
To answer your queries:
1. ES would have its own data store to store the data and the indexes.
2. The arrow between the Recommendation engine and Elastic Search is because the Recommendation Engine reads from Elastic Search.
@@codeKarle reads from ES?
In the video, while explaining how to choose between different thumbnails, I think you mentioned that the recommendation engine feeds the data into ES. Then when the search service queries, it gets results from ES based on the recommendations stored.
I am confused
Superb explanation. But please also try to cover how the data is sharded across different servers
Thanks!
Sure, probably in the next ones I'll cover that.
Thanks for this video. Do you mean that after processing in the Spark cluster you will store the data in HDFS (the Hadoop cluster)? In short, what will the Hadoop cluster be used for?
The raw video is stored in S3, and when we divide it into chunks we store these in the CDN. But a CDN generally has a TTL. What will happen when the CDN service is down? Do we need to process the raw video again, or do we store the processed chunks for HA?
Are the chunks only stored in CDNs? I would imagine they would have to be stored in S3 as well.