So this is a kind of generic video-serving application design, which is fine in an interview, unless the interviewer restricts you after you present the functional requirements.
For me, what he came up with was fine. The issue that wasn't addressed in this video was the deep dive into the API and data level. I would have liked to see the base data structures and keys used, ideas like S3 paths and the metadata needed.
Thanks for sharing, appreciated. I see a challenge for content creators in this design. For example, if a video is large (10+ GB), the current design requires the user to upload the entire file. If something goes wrong during the upload, they have to start over, which can be frustrating. From my perspective, it would be better to split the video into smaller segments on the client during the upload process; similarly, on the viewing side, use adaptive bitrate streaming to download a segment, stream it, and repeat.
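The viewing side of this comment, adaptive bitrate streaming, boils down to a simple decision: pick the highest-quality rendition whose bitrate fits the bandwidth measured while fetching prior segments. A minimal sketch, with an illustrative rendition ladder (the names and bitrates are made up, not Prime Video's actual ladder):

```python
# Minimal sketch of adaptive bitrate (ABR) rendition selection.
# The rendition names and bitrates below are illustrative only.
RENDITIONS = [
    ("240p", 400),     # (name, bitrate in kbps)
    ("480p", 1_000),
    ("720p", 2_800),
    ("1080p", 5_000),
]

def pick_rendition(measured_kbps: float, safety: float = 0.8) -> str:
    """Pick the highest rendition whose bitrate fits within a safety
    margin of the bandwidth measured on previously downloaded segments."""
    budget = measured_kbps * safety
    best = RENDITIONS[0][0]  # always fall back to the lowest rendition
    for name, kbps in RENDITIONS:
        if kbps <= budget:
            best = name
    return best
```

The client re-runs this per segment, so quality adapts up and down as network conditions change.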
Good mock interview overall. I think the design is missing a few key points that were mentioned but never elaborated. 1. Fast uploads: with one S3 bucket sitting in, say, us-east-1, how would you speed up an upload from Australia or India? Maybe use AWS Global Accelerator for faster file uploads. 2. Clients can't directly upload the entire file if file sizes are in the GBs; they need to use multipart upload. 3. Object partitioning was mentioned but never covered in detail. 4. Disaster recovery/replication concepts weren't discussed.
What does the API for uploading a video look like? Are you going to chunk the video and upload it? What sort of protocol is used for transferring (and streaming) chunked data? There is no API design covered, which I believe there should have been. How do you handle the scenario where a user pauses a video, switches off the device for a few days, then comes back and resumes? Aren't these fundamentals of a platform like Prime?
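The pause-and-resume part of this question is straightforward to sketch: persist the last playback position per (user, video) and seek to it on the next play. A hypothetical illustration (the class and method names are mine, not Prime Video's actual API; production would use a durable key-value store rather than an in-memory dict):

```python
# Illustrative sketch of resume-playback bookkeeping. In production the
# positions would live in a durable, replicated key-value store.
class PlaybackStore:
    def __init__(self):
        self._positions = {}  # (user_id, video_id) -> seconds watched

    def save_position(self, user_id: str, video_id: str, seconds: float) -> None:
        """Called periodically by the client (e.g. every few seconds) and on pause."""
        self._positions[(user_id, video_id)] = seconds

    def resume_position(self, user_id: str, video_id: str) -> float:
        """Where the player should seek to; 0.0 means start from the beginning."""
        return self._positions.get((user_id, video_id), 0.0)
```

Because the position is saved server-side, the user can resume days later and from a different device.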
Well done system design. I think there is a lot to learn from this one over the Netflix system design video from this channel. A lot of the concepts would apply.
Awesome video! I do have one question: in the object store, are we storing the video as a whole, or are we storing chunks of a certain length, i.e., 5- or 10-minute clips?
What does a streaming video platform have to do with the CAP theorem? I am not saying it has no relevance, but if you want your streaming platform to be highly available with some latency tolerance, then you have to think about using CDNs and other caching strategies. You are elevating something that has no major part to play.
I guess the CAP theorem was being applied to the video upload feature and not to video streaming. While uploading, eventual consistency across all the distributed servers is what was being mentioned, IMO.
The only thing I couldn't quite understand is why we use object storage (in this case S3) for storing videos instead of also using the CDN for storage, rather than only as a cache.
Not sure if I understand your comment properly, but the reason I chose to store the video files in an object store is persistence and replication. I would not want to accidentally lose even a single file uploaded onto the platform; it would lead to a bad user experience. The reason for using a CDN over a plain cache is faster video streaming: by ensuring the right set of videos is in close proximity, you get lower latency and a better viewing experience. I hope this helps.
He mentioned that the customer would upload the video. I don't think any customer uploads videos to Amazon Prime. Maybe he prepared YouTube's design and used it here. :) He clearly mentioned "YouTube videos" in the next statement. He simply blurts things out without even understanding the subject; he is just throwing words around without any context. He mentioned the CAP theorem, which is not required there. No doubt Kevin's face is so surprised.
@jasper5016 Think of content creators as directors, producers, or production houses uploading the videos themselves, or Amazon people uploading them on behalf of the stakeholders.
Does the upload part mean Amazon Prime Video has user uploads, where I can just log in and upload whatever video I want, like YouTube? Never knew that was a feature there.
Hey nff_1950! Video splitting has the following benefits: - Size optimization: Videos can be large files, and splitting them into smaller segments allows for more efficient storage and transfer. - Ease of upload: Splitting a large video into smaller segments makes the upload process more manageable. If there's an interruption during the upload, you can resume from where it left off instead of starting the entire process again. - Parallel processing: Splitting a video enables parallel processing, where different segments of the video can be uploaded simultaneously. This can significantly speed up the overall upload time. Hope this helps!
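The "ease of upload" benefit above, resuming after an interruption, comes down to bookkeeping: know the chunk boundaries, record which chunk indices succeeded, and on retry send only the missing ones. A minimal sketch (illustrative, not tied to any specific SDK; the 100 MB chunk size is just an example):

```python
# Sketch of resumable chunked-upload bookkeeping: split a file into
# fixed-size chunks, track which indices succeeded, and on a retry
# only re-send the missing ones instead of restarting the whole file.
CHUNK_SIZE = 100 * 1024 * 1024  # 100 MB, an example chunk size

def chunk_ranges(file_size: int, chunk_size: int = CHUNK_SIZE):
    """Yield (index, start_byte, end_byte_exclusive) for each chunk."""
    for i, start in enumerate(range(0, file_size, chunk_size)):
        yield i, start, min(start + chunk_size, file_size)

def remaining_chunks(file_size: int, uploaded: set, chunk_size: int = CHUNK_SIZE):
    """Chunk indices still to upload after a failed/interrupted attempt."""
    return [i for i, _, _ in chunk_ranges(file_size, chunk_size) if i not in uploaded]
```

If the connection drops, the client asks the server which indices it already has and resumes with `remaining_chunks` instead of starting over.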
Hey Yogesh! The whiteboard being used here is an online whiteboard called “excalidraw”. They have a free and paid version so do check them out if you are interested!
Thanks a lot for this wonderful explanatory video. I have a doubt though: shouldn't the cache also communicate with the object store to retrieve videos, apart from the CDN?
No, we will be using the CDN to interact with the actual files. A cache is a very costly component; video files are large, and using a cache for them would not be an efficient solution. We will use the cache to store the file URL and other metadata for the video, and the CDN will be used to serve the actual video files in a fast and efficient manner. I hope this answers your query.
Hey narutoind! If you are asking about the whiteboard tool then the answer is “excalidraw”. They have a free and paid version so do check them out if you are interested!
Does the data streaming happen directly from S3, or does the video view service get it from S3 and stream it to the user? I would assume we could stream directly from S3.
You can, and S3 provides the option of doing so. The final choice of how you want the streaming to work is yours; all approaches come with their own set of trade-offs.
Why do you need a processing queue? Once you split, each partition can take a different amount of time, so why would you want to process the upload in a synchronous pattern? Instead, each partition can be uploaded asynchronously.
It is mentioned that we use MapReduce concepts to split and encode the videos before pushing them into S3, so why did the interviewer ask whether the videos would be split in the follow-up questions?
Loved the clarity of the component design, but he didn't dive into data modeling or API design. Would his analysis be considered enough in a real interview?
My answer would be yes. I wouldn't have done anything differently even when interviewing at a company. I have given multiple system design interviews at product companies with positive outcomes. An interview is meant to assess a candidate's approach and their ability to explain their design and solution. The second half of an interview is more oriented around interviewer follow-up questions. If they had asked me to do API or DB design, I would have ditched some of the detailed explanation from this video in exchange for explaining the API or DB design. Long story short, within the 45 minutes of an interview, I'd prefer going deep into a few topics (discussed with the interviewer) and explaining them properly. I would not try to overdo it with brief explanations of many topics.
Just to reiterate, this is my personal opinion. I prefer a depth first approach of explanation. Others may prefer a breadth first approach. Do what best suits you. It is also advisable to discuss your potential approach with the interviewer at the start of the session, define the scope of what you are planning to do, and then proceed with the same for the duration of the interview.
Wouldn't you also want a delete-video feature, for content that Amazon loses the rights to, or content they want to remove because they don't want to pay royalties when the video is viewed?
Hey pauldesrivieres7083, you are right that there are other functional requirements that we could consider e.g. deletion. It is important to note that in an interview, it is not expected for the candidates to cover all possible functional requirements since there are often too many. You can just cover the main system requirements to keep the scope manageable (unless otherwise stated by the interviewer).
Very beginner-friendly, but since we are calling it a mock interview, at least some questions from the interviewer were expected. He didn't give any insight on the storage part or replication; I think storage is crucial for streaming platforms. Only mentioning tool names is not going to work in a real interview.
This is a good video but not an accurate replica of an actual system design interview. It looks like someone who already knew the system design. In an actual interview, please ask clarifying questions; in this case: can we add a video and decide the date on which it becomes available to watch, are some videos unavailable in some regions, etc.
When a video is "hot" or quite popular, you'd want it to be available in the cache. So you can either force-update the cache to ensure the popular video is in it, or keep the TTL low, so that objects are evicted quickly and only the popular, frequently re-requested videos remain in the cache. I hope this helps, Abhishek
I have a question: are we storing the full videos at the CDN level, or only the first few seconds of each video? Because if we are storing the full videos, the CDN becomes another storage tier, which will cost very heavily.
Why should the TTL for a hot/trending video be low? If it's meant to be viewed by more users, shouldn't it stay in memory longer, i.e., shouldn't the TTL be higher? One can argue that we won't be able to serve the latest update, but that can also be handled by invalidating the cache.
Hey Alok! The whiteboard being used here is an online whiteboard called “excalidraw”. They have a free and paid version so do check them out if you are interested!
A CDN is a content delivery network, a component designed to cache and serve flat files such as images and videos in a fast, simple, and efficient manner. We are using the CDN as a caching layer between the viewer and the file storage service (S3) to serve the video files to users with the least possible latency. Please refer to Exponent's course video on CDNs for a better understanding.
Hey AwpshuN! The whiteboard app being used here is an online whiteboard called “excalidraw”. They have a free and paid version so do check them out if you are interested!
Hey adityaranjanyadav1776! The tool being used here is an online whiteboard called “excalidraw”. They have a free and paid version so do check them out!
He lost me in the first three minutes when he prioritized uploading videos as part of his functional and non-functional requirements. Platforms like Amazon Prime have hundreds of thousands of reads for a single upload.
The fundamental design is flawed; the interviewee had YouTube in mind when designing this (see timestamp 2:57). He even said so during the functional requirements. That said, Amazon Prime Video does not allow content creators to upload videos; they have production houses, or work with production services, to provide videos. A few questions: How can you upload a video over HTTPS? The videos are 4K/1080p, around 2 GB each; do you expect them to be uploaded through Google Chrome? There was no mention of multipart upload; what if there is a break in the connection? What is this View Video service? Do you expect to download the video from this service? Where is the low-latency non-functional requirement, and how do you expect this to work with tens of thousands of titles? The interviewee talks about CDNs and caches everywhere; do we understand the storage limitations and cost associated with CDNs and caches? So overall, this video is maybe for entry-level software engineers/new grads; for mid-level and senior+, this design will not fly at all.
Imagine you're uploading a single 10GB video file, and if, due to a network glitch, it gets stuck at 99.9%, you'd need to re-upload the entire 10GB. However, if you divide the 10GB file into, say, 100MB chunks, and a failure occurs at 99%, you'd only need to re-upload that specific chunk. Moreover, by utilizing multi-core, multi-thread capabilities, you can upload all these smaller chunks simultaneously, resulting in faster uploads compared to uploading a single large file sequentially. I hope that clarifies it.
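The parallel-upload idea in this comment can be sketched with a thread pool and bounded per-chunk retries. The `upload_chunk` callable below is a stand-in for a real transfer, e.g. one part of an S3 multipart upload; any chunk that keeps failing is reported back, so only it needs to be re-sent, not the whole file:

```python
# Sketch of uploading chunks concurrently with bounded retries.
# `upload_chunk(idx, data)` is a hypothetical stand-in for the actual
# transfer call (e.g. uploading one part of an S3 multipart upload).
from concurrent.futures import ThreadPoolExecutor, as_completed

def upload_all(chunks, upload_chunk, max_workers=4, retries=3):
    failed = []

    def attempt(idx, data):
        for _ in range(retries):
            try:
                upload_chunk(idx, data)
                return idx
            except IOError:
                continue  # transient failure: retry just this chunk
        raise IOError(f"chunk {idx} failed after {retries} retries")

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(attempt, i, c): i for i, c in enumerate(chunks)}
        for fut in as_completed(futures):
            try:
                fut.result()
            except IOError:
                failed.append(futures[fut])
    return sorted(failed)  # empty list means the whole file made it
```

A non-empty return value tells the client exactly which chunk indices to re-send on the next attempt.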
How to support video releases that are predicted to have a lot of demand: 1. Force push to cache: Proactively distribute video content to edge servers before high demand to reduce latency, minimize server load, and ensure faster access for users. 2. Reduce the TTL: Lower the TTL for video content to fetch the latest version more frequently, providing real-time updates and quick adaptation to changes during anticipated periods of high demand.
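The two levers above can be illustrated with a toy TTL cache (illustrative only; a real CDN edge is far more involved): a prewarm inserts the object before anyone asks for it, and a per-object TTL controls how often the origin is re-consulted.

```python
# Toy TTL cache showing "force push" (prewarm) vs. TTL-driven refresh.
import time

class TTLCache:
    def __init__(self):
        self._store = {}  # key -> (value, expires_at)

    def prewarm(self, key, value, ttl_seconds):
        """Force-push content to the edge ahead of anticipated demand."""
        self._store[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key, fetch_from_origin, ttl_seconds):
        entry = self._store.get(key)
        if entry and entry[1] > time.monotonic():
            return entry[0]                 # cache hit: no origin trip
        value = fetch_from_origin(key)      # miss or expired: go to origin
        self._store[key] = (value, time.monotonic() + ttl_seconds)
        return value
```

A prewarmed blockbuster never triggers an origin fetch during the launch spike, while a short TTL makes the edge re-check the origin frequently for content that may change.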
Hey Lakshman! The software being used here is an online whiteboard called “excalidraw”. They have a free and paid version so do check them out if you are interested!
This is really not a good format for an interview. A bunch of boxes and arrows that leaves more questions than answers is NOT how a real interview is conducted. For instance, the moment you draw that box that says "processing queue", the interviewer will ask: what is the nature of that queue, and what is actually in it? Where are the segments of a split video stored? Is it a different storage than the object store depicted? Or when the interviewee mentions an "Elasticsearch API service", what does it mean? Is Elasticsearch actually used? Then what is the metadata store, and why should they be kept separate? This is NOT how a real interview is conducted.
Yes, I too feel that a lot of things are missing. What happens if, while sending the chunks to the split service, the upload server crashes? The same goes for the split service to the encoder service. There is no data model showing how the video chunks are mapped to the actual video ID. And too many services are talking to the metadata DB, which begs the question: why not have a metadata service? If this were an actual interview, I am sure these questions would pop up.
Hey Zhong Yang! The design tool being used here is an online whiteboard called “excalidraw”. They have a free and paid version so do check them out if you are interested!
The guy would have failed this module if I were the interviewer. He is making a ton of assumptions, not asking what the interviewer really wants, and basically building YouTube, not Prime Video. For example, where is the requirement that uploads should be fast coming from?
@@tryexponent No, it does not. I worked at a startup streaming service that was taking in digital masters from studios, re-compressing them, and providing them to users. The studio does not "upload" to the service like a YouTube user; that's not how the industry works. Sorry, but you're wrong. This is a false premise.
I think this guy misses system design concepts at a very basic level. He was lost in his own thoughts, and the interviewer just let him explain, assume, and complete the interview. He missed the basics of clarifying the question. It seemed like he was teaching rather than gathering requirements or finding out what the user wants. The good part is that he explained well and justified what he knows, lol; that is something we can learn from this.
Want to learn how to answer system design interview questions and land the job? Make sure you're interview-ready with Exponent's system design interview prep course. Start free. bit.ly/3uZ2jfu
This is a good video. Great work, Abhishek, and thank you. Here are my two cents as an enthusiastic techie.
1. An autoscaling group needs to be tied to the target group in which the EC2 instances (or compute instances, depending on the cloud provider) are present.
2. The client should break the file into chunks and send each chunk across to the backend video service.
3. Each chunk should be sent to the AWS transcoder service, and you should receive back three chunks from the service: one each for tablet, mobile, and PC.
4. Once this is done, each chunk can be stored in a separate S3 subfolder under a folder with the movie name. (S3 has prefixes and delimiters, which are like a folder structure.) The corresponding metadata table, where a movie is a list, should have information relating to all the chunks of a movie and the type of each chunk, i.e., mobile, PC, or tablet.
There also needs to be error checking of each chunk by the service, and a retry mechanism with a certain number of retries in case a chunk does not get uploaded properly.
5. The search service should basically have a trie internally that gives hints when a movie title is typed in the search bar.
6. The S3 bucket needs to be tied to CloudFront distributions at several locations, which will cache the video. I don't think there is an explicit need to set up several S3 buckets in several regions.
One can just enable the cross-region replication feature in S3 directly.
Regards
Mukund Sridhar
(Architect)
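Point 5 of this comment, a trie that gives hints as a movie title is typed, can be sketched in a few lines (illustrative only; a real search service would add ranking, fuzzy matching, popularity weighting, etc.):

```python
# Minimal trie sketch for prefix-based title hints.
class Trie:
    def __init__(self):
        self.children = {}   # char -> Trie
        self.is_title = False

    def insert(self, title: str) -> None:
        node = self
        for ch in title.lower():
            node = node.children.setdefault(ch, Trie())
        node.is_title = True

    def suggest(self, prefix: str, limit: int = 5):
        """Return up to `limit` stored titles starting with `prefix`."""
        node = self
        for ch in prefix.lower():
            if ch not in node.children:
                return []          # no title has this prefix
            node = node.children[ch]
        out = []
        def walk(n, acc):
            if len(out) >= limit:
                return
            if n.is_title:
                out.append(prefix.lower() + acc)
            for ch in sorted(n.children):
                walk(n.children[ch], acc + ch)
        walk(node, "")
        return out
```

Each keystroke in the search bar triggers a `suggest` call with the text typed so far.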
Wow mukundsridhar4250! Thanks for taking the time to share your knowledge! Appreciate it 🙏
3. You mean you should encode for multiple resolutions?
Point 5 feels like too much detail for the search service, especially when he said "I'm going to use Elasticsearch," which is already built by a team at Amazon.
This is one of the crispest system design mock interviews I have come across. Abhishek, you are really fluent in whatever you are saying. I liked the way you connected everything, thereby holding the interviewer's attention. I really learnt a lot from you. The way you communicate, organize thoughts, and orchestrate everything on a platter is really outstanding. Once again, thanks a lot, Abhishek and team Exponent, for this video. It is by far the best system design interview I have come across till now.
Hey Avinash, we really appreciate the kind words. Glad you found our video useful!
I don't think that Abhi gave enough reasoning for why uploads should be fast. In the case of YouTube, I would agree that uploads should be fast: videos are shorter, you have many smaller creators/uploaders, and viewers are more likely to want to see videos quickly after they are uploaded, since YouTube videos are more likely to be time-relevant. Prime Video is almost entirely movies and TV shows, which are long-form content, with fewer uploads from larger creators (studios), meaning that the workflow for uploading a video is a more formal and involved process (i.e., copyright checks) that would happen days before the content is watchable. Ultimately it doesn't change the design that much (the processing queue is already async), nor do I think this is a huge error on Abhi's part.
Thank you, this is a helpful suggestion. I still think the faster downloads part is key, but yes, it'd have been fun to talk about the additional content validation and preprocessing part.
The way Abhishek articulated, drew, and explained is amazing; I was hooked for the entire video and learnt a ton.
Hoping for more such mock interviews.
Thanks Exponent and Abhishek
Glad you liked it! More system design videos on the way.
Could not agree more
I didn't know interviews could go this perfectly. Abhishek should make some videos on how he approaches system design interview problems.
Love this, thanks !!
This is the best mock interview I have seen on YouTube.
Design Amazon Prime Video is a good question, but where did "content creators uploading videos onto Prime Video" come from? Why should the uploads be fast? (I mean, they should be fast, but that is not important enough to be in the requirements; "no buffering" should be the more important requirement, and both can't be top priorities.)
The Amazon Prime Video team handles the uploading part. They have to acquire licenses and make legal deals before doing that.
Why is the content constantly referred to as "video"? I mean, it is a video, but movie/TV show/content would be the logically obvious choices here. Then there is the fact that, in the beginning, the design was referred to as a design of YouTube multiple times... This makes me think that the interviewee just stated the design of YouTube that they might have rehearsed before.
One million people uploading videos and 100 million viewing them? Oh really? That's YouTube, right?
I also felt the same, and it didn't look like the real interviews I have seen. For me, this interview was over in the first 10 minutes.
Don't fall for this mock interview. Most system design interviews do not happen like this; there will be constant interaction from the interviewer, who will not just sit like a doll listening to you. No system design interview ever happens like this in real life, unless you are interviewing at a FAANG.
That's why they mentioned 'mock'.
Do you mean the interference is good or bad? In my opinion, interference is okay as long as it comes with mutual respect. I have noticed cases where the interviewer misbehaves and makes fun of the candidate.
Usually interviewers have a direction they want to go in; no interference means they're OK with you saying anything and aren't trying to test your abilities.
Hi, could you share what I can expect in an actual system design interview for front-end? I am struggling a lot to prepare for this, because there is no content on YouTube that shows what to expect and what not, what to prepare, etc.
The best video on your channel that I have watched so far. Abhishek crushed it. Great end tips. Great drawing. I really liked drawing out each endpoint as a service, and how he scaled up after doing the base case.
An important point that was not discussed: big video files cannot be uploaded through an HTTP-based service, since HTTP requests typically have size limits. The client needs to chunk the data and load it directly into S3 storage, so there has to be an arrow hooking the client to S3 directly. The upload service could return a presigned S3 URL and hand it over to the client for uploading the video. An amazing point was the introduction of a CDN to bring videos near the users, and the push-based mechanism to populate the CDN for faster viewing of files.
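The presigned-URL idea in this comment can be illustrated with a simplified HMAC scheme. This is NOT AWS SigV4 (real S3 presigned URLs are generated via an SDK, e.g. boto3's `generate_presigned_url`); the secret, host, and parameter names below are hypothetical, but the flow is the same: the upload service signs an expiring URL, and storage verifies the signature, so the client can PUT directly without holding any credentials.

```python
# Simplified stand-in for S3 presigned URLs (not real AWS signing).
import hashlib
import hmac
import time

SECRET = b"shared-secret-between-service-and-storage"  # hypothetical

def presign(key, expires_in, now=None):
    """Upload service: sign an expiring URL for one object key."""
    exp = int((now if now is not None else time.time()) + expires_in)
    sig = hmac.new(SECRET, f"{key}:{exp}".encode(), hashlib.sha256).hexdigest()
    return f"https://storage.example.com/{key}?expires={exp}&sig={sig}"

def verify(key, exp, sig, now=None):
    """Storage side: reject expired links and forged signatures."""
    if (now if now is not None else time.time()) > exp:
        return False  # link expired: client must ask the upload service again
    expected = hmac.new(SECRET, f"{key}:{exp}".encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig)
```

The client never sees `SECRET`; it only receives the finished URL, which stops working after the expiry time.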
I guess we can use Amazon S3's multipart upload to do so and upload the files in chunks, but Abhishek mentioned other things for this part, like encoding the videos later on.
Chapters (Powered by ChapterMe) -
00:00 - Introduction
00:47 - Question
01:12 - Requirements
06:16 - Design
21:34 - Follow-up questions
23:32 - Interview Analysis
24:04 - Tips
I suppose one way to handle a popular release is to bring it onto the platform a few days ahead, cache it, and make it public at release time 🤔
I was thinking the same. Good call!
One of the best HLD mock interviews I've ever seen.
For the interviewee: API design, database design, and estimation were missing. For the interviewer: there could have been more in-depth questions. That's what I encountered in my past interviews.
I didn't think Amazon Prime Video had an upload feature; am I missing something? Great examples and tips at the end about doing one requirement at a time, thanks!
Well, someone does upload the content, right? The number of content creators is not as large as on YouTube.
Most probably, videos on Amazon Prime go through a vetting process; however, there's surely an additional interface for selected people to be able to upload the content.
I agree! I think he actually misunderstood the requirements a little. While uploading *is* an important part of Amazon Prime Video, it's not a user-facing feature. The interviewee seems to assume that "creators" are uploading to the platform. I suspect no one outside Amazon uploads videos directly into Amazon Prime.
Amazon likely has TWO upload interfaces, not one: an external upload interface, where licensed content holders and Amazon subcontractors can pass raw high-resolution content over to Amazon, and an internal interface that processes that raw video for encoding and adds metadata (what actors are on screen at what time, IMDb info, closed captioning, etc.). Only the internal interface would distribute videos to the Amazon Prime Video servers after passing some kind of review.
The interviewee's focus on availability vs consistency seems like a misstep if you expect that Amazon is the only consumer of the video publishing process. He seems to think that Amazon Prime works more like YouTube. And who knows, maybe he's right! But I think he should have asked further about the design parameters instead of assuming he knew them just because he was familiar with the product. At least then we would know better what he was designing!!
@@assumptionsoup Totally agree with you. I have conducted many system design interviews, and this is a big RED flag in an interview.
@@abhishek0647 true, although i think it might have been good to elaborate on that a little more, given that it's probably a very different process from how youtube/instagram/tiktok handles uploads. anyways, thanks for doing the video and responding!
@@shivers222 agreed. I'll try to record it sometime later on. It'd be a good topic to dive deeper into
I agree with availability vs consistency, but I'm pretty sure "end users" don't upload videos on amazon, unless you count internal customers. I know the CTO of a video streaming service and we've talked about this, and they receive these super large files from studios, then do their own manipulation and encoding and upload to storage themselves.
He just copy-pasted the design of YouTube.
So this is a kind of generic video-serving application design, which is fine in an interview, unless and until the interviewer restricts you after you present the functional requirements.
For me, what he came up with was fine. The issue that wasn't addressed in this video was the deep dive to the API and data level. I would have liked to see the base data structures and keys used: ideas like S3 paths and the metadata needed.
Thanks for sharing, appreciated.
I see a challenge for content creators in this design. For example, if a video is large (10+ GB), the current design requires the user to upload the entire file. If something goes wrong during the upload, they have to start over, which can be frustrating. From my perspective, it would be better to split the video into smaller segments during the upload process on the client, and similarly on the viewing side: using adaptive bitrate streaming, download a segment, stream it, and repeat.
Good mock interview overall. I think the design is missing a few key points that were mentioned but never elaborated on.
1. Fast uploads. With one S3 bucket sitting in, let's say, us-east-1, how would you speed up an upload from Australia or India? Maybe use AWS Global Accelerator for faster file uploads.
2. Clients can’t directly upload the entire file if the file sizes are in GBs. They need to use multi-part upload.
3. Object partitioning was mentioned but never went in great detail.
4. Didn’t discuss Disaster Recovery/replication concepts.
Great video! Wish we had enough time for API design and Schema Design, but really great explanation
Agreed. I was looking for that as well.
How does the API for uploading a video look? Are you going to chunk the video and upload it? What sort of protocol is used for transferring (and streaming) the chunked data? There is no API design covered, which I believe there should have been. How do you handle the scenario where a user pauses a video, switches off the device for a few days, then comes back and resumes? Aren't these fundamental to a platform like Prime?
Well done system design. I think there is a lot to learn from this one over the Netflix system design video from this channel. A lot of the concepts would apply.
Umm, it's Amazon Prime, not YouTube! You don't need a load balancer to upload video; you just store the contents in a CDN, I think.
Exactly, I thought the same.
Awesome Video!
I do have one question: in the object store, are we storing the video as a whole, or are we storing chunks of a certain length, i.e., 5- or 10-minute clips?
If there are no counter-questions, anyone with even less experience could answer this question just as well.
Abhishek's explanation was really good. Really appreciate this video. Hope we get more such videos. For sure I am saving and sharing this video.
Thank you so much Abhishek, clear and crisp explanation of System Design
What does streaming video have to do with the CAP theorem? I am not saying it has no relevance. But if you want your streaming video platform to be highly available with some latency tolerance, then you have to think about using CDNs and other caching strategies. You are elevating something that has no major part to play.
I guess the CAP theorem was being applied to the video upload feature and not to video streaming. While uploading, eventual consistency across all the distributed servers is what was being referred to, IMO.
The only thing I couldn't quite understand is why Object Storage (in this case S3) is used for storing the videos instead of also storing them in a CDN, rather than using the CDN only as a cache.
Not sure if I understand your comment properly, but the reason why I chose to store the video files in an object store is for persistence and replication. I would not want to accidentally lose even a single file uploaded onto the platform. It'd lead to a bad user experience.
The reason for using a CDN over cache is for faster video streaming service by ensuring the right set of videos are in close proximity and thus lower latency and better viewing experience.
I hope this helps.
It would be great to have a capacity estimation of the storage and its implementation.
He mentioned that the customer would upload the video. I don't think any customer uploads videos to Amazon Prime. Maybe he prepared for YouTube and used it here. :) He clearly mentioned "YouTube videos" in the next statement. He simply blurts things out without even understanding the subject; he is just throwing words around without any context. He mentioned the CAP theorem, which is not required there. No doubt Kevin's face is so surprised.
@jasper5016 Think of content creators as directors, producers, or production houses uploading the video themselves, or Amazon people uploading on behalf of the stakeholders.
Yeah. I noticed the same )) It's not about Prime
Great clarity of thought right through the video! Really loved the way Abhishek approached it.
Does the upload part mean Amazon Prime Video has user uploads, where I can just log in and upload whatever video I want, like YouTube? Never knew that was a feature there.
Who is this guy? That was so good. Abhishek nailed it.
Not experienced with video uploads and such, but why do we need a video splitter instead of uploading directly to S3?
Hey nff_1950! Video splitting has the following benefits:
- Size optimization: Videos can be large files, and splitting them into smaller segments allows for more efficient storage and transfer.
- Ease of upload: Splitting a large video into smaller segments makes the upload process more manageable. If there's an interruption during the upload, you can resume from where it left off instead of starting the entire process again.
- Parallel processing: Splitting a video enables parallel processing, where different segments of the video can be uploaded simultaneously. This can significantly speed up the overall upload time.
Hope this helps!
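To make the "resume from where it left off" idea concrete, here is a minimal Python sketch of client-side chunking with resume support. This is an illustration only, not a real S3 client: `upload_chunk` and the `uploaded` set are hypothetical stand-ins for whatever acknowledgement mechanism the backend actually exposes.

```python
import hashlib

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MiB per chunk, a typical multipart part size

def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Yield (index, chunk, sha256) tuples for a video payload.

    The digest lets the server verify each chunk independently.
    """
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        yield i // chunk_size, chunk, hashlib.sha256(chunk).hexdigest()

def upload_with_resume(data: bytes, uploaded: set, upload_chunk):
    """Upload only the chunks not already acknowledged by the server.

    `uploaded` holds indices of chunks that previously succeeded, so an
    interrupted transfer resumes instead of restarting from zero.
    `upload_chunk` is a hypothetical per-chunk network call.
    """
    for index, chunk, digest in split_into_chunks(data):
        if index in uploaded:
            continue  # already on the server; skip on resume
        upload_chunk(index, chunk, digest)
        uploaded.add(index)
    return uploaded
```

A production version would wrap `upload_chunk` with a bounded retry count and persist the `uploaded` set locally so a crashed client can resume.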
Thanks Abhishek. Can you also recommend some books to grasp system design concepts?
Designing Data-Intensive Applications is a really good book.
Thanks for the video! Curious to know what tool you are using for documenting and drawing.
Hey Yogesh! The whiteboard being used here is an online whiteboard called “excalidraw”. They have a free and paid version so do check them out if you are interested!
Thanks a lot for this wonderful explanatory video.
I have a doubt though: shouldn't the cache also communicate with the object store to retrieve the video, apart from the CDN?
No, we will be using the CDN to interact with the actual files. A cache is a very costly component. The size of video files will be large and using cache for them will not be an efficient solution.
We will use cache to store file url and other metadata for the video. And the CDN will be used to render the actual video files in a fast and efficient manner.
I hope this answers your query.
Even I had the same question. Thanks for asking, and for the answer.
Only one question: which tool are you using? It looks fast and friendly.
Hey narutoind! If you are asking about the whiteboard tool then the answer is “excalidraw”. They have a free and paid version so do check them out if you are interested!
Amazing video! Thanks to both interviewer and interviewee!
Glad you enjoyed it!
Good video.
Slightly more pushback from the interviewer would have been appreciated, though.
Does the data streaming happen directly from S3, or does the video view service get it from S3 and stream it to the user? I would assume we could stream directly from S3.
You can and S3 provides you the option of doing the same. The final choice of how you want the streaming to work is yours. All approaches come with their own set of tradeoffs.
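Streaming "directly" works because video players fetch byte ranges over HTTP rather than whole files; the storage service just slices the object. Here is a toy sketch of handling a `Range: bytes=start-end` request against an in-memory blob standing in for S3 (no open-ended multi-range support, just the common single-range cases):

```python
def serve_range(blob: bytes, range_header: str):
    """Parse a simple 'bytes=start-end' Range header and return
    (status, body, content_range), mimicking an HTTP 206 Partial
    Content response. 'bytes=start-' means 'from start to the end'."""
    unit, _, spec = range_header.partition("=")
    assert unit == "bytes", "only byte ranges are supported in this sketch"
    start_s, _, end_s = spec.partition("-")
    start = int(start_s)
    end = int(end_s) if end_s else len(blob) - 1  # open-ended range
    body = blob[start:end + 1]
    content_range = f"bytes {start}-{start + len(body) - 1}/{len(blob)}"
    return 206, body, content_range
```

This is also what makes seeking cheap: jumping to minute 42 is just a new Range request, with no need to download the preceding bytes.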
Why do you need a processing queue? Once you split, each partition can take x, y, z amount of time, so why process the upload in a synchronous pattern? Instead, each partition can be uploaded asynchronously.
It was mentioned that we use MapReduce concepts to split and encode the videos before pushing to S3, so why did the interviewer ask whether the videos would be split in the follow-up questions?
Since when does Amazon Prime Video allow you to upload?
There is no upload feature available for end users, so I think we can ignore it.
Loved the clarity of the component design but he didn't dive into data modeling or api design. Would his analysis be considered enough for a real interview?
It would also have been better if a cost analysis was done.
My answer would be yes. I wouldn't have done anything differently even when interviewing at a company.
I have given multiple system design interviews at product companies with positive outcomes.
An interview is to assess a candidate's approach and ability to explain their design and solution.
The 2nd half of an interview is more oriented around interviewer follow-up questions. If they had asked me to do API or DB design, I would have ditched some detailed explanation from this video in exchange for explaining the API or DB design.
Long story short, within the 45 minutes of an interview, I'd prefer going deep into a few topics (discussed with the interviewer) and explaining them properly. I would not try to oversell my skills with brief explanations of many topics.
Just to reiterate, this is my personal opinion. I prefer a depth first approach of explanation. Others may prefer a breadth first approach.
Do what best suits you. It is also advisable to discuss your potential approach with the interviewer at the start of the session, define the scope of what you are planning to do, and then proceed with the same for the duration of the interview.
It depends on the interviewer, whether they want to change the path or are fine with it.
Wouldn't you also want delete video for content that Amazon loses the rights to or if they want to remove it because they don't want to pay royalties when the video is viewed?
Hey pauldesrivieres7083, you are right that there are other functional requirements that we could consider e.g. deletion. It is important to note that in an interview, it is not expected for the candidates to cover all possible functional requirements since there are often too many. You can just cover the main system requirements to keep the scope manageable (unless otherwise stated by the interviewer).
Pushing to the cache will grow the cache size; how do we deal with that?
Great Video , thanks .
Which whiteboard are you using here?
Thanks for the compliment mpmohi! The whiteboard used here is "Excalidraw"
Very beginner-friendly, but since we are calling it a mock interview, at least some questions from the interviewer were expected. He didn't give any insight into the storage part or replication, and I think storage is crucial for streaming platforms. Just mentioning tool names is not going to work in a real interview.
This is a good video but not an accurate replica of an actual system design interview. It looks like the candidate already knew the question. In an actual interview, please ask clarifying questions; in this case, for example: can we add a video and decide the date on which it becomes available to watch? Are some videos unavailable in some regions? Etc.
Hi Tejendra! Thanks for watching and taking the time to share your thoughts!
Can someone explain the hot video part, where Abhishek suggested two solutions: 1. do a cache force reset, and 2. make the TTL very low?
When a video is "hot", i.e. quite popular, you'd want it to be available in the cache.
So you can either force-update the cache and ensure the popular video is available in it.
Alternatively, if you make the TTL of the cache low, objects in the cache will be evicted quickly and only the popular videos will remain in it.
I hope this helps,
Abhishek
@@abhishek0647 This would also evict the trendy video sooner when there is a run of requests for other videos, wouldn't it?
Shouldn't the cache layer speak to the object store as well for new requests?
I have a question: are we storing the full videos at the CDN level, or only the first few seconds of each video? Because if we store the full videos, the CDN becomes another storage tier, which would be very costly.
No, we will not be storing the full video, but chunks of it at the CDN level.
good point on cost optimization.
What is the editor that Abhishek is using?
Excalidraw
Why should the TTL for a hot/trending video be low? If it's meant to be viewed by more users, shouldn't it stay in memory longer, i.e., shouldn't the TTL be higher? One can argue that we won't be able to serve the latest update, but that can also be pushed by invalidating the cache.
Thanks Abhishek! This is really helpful!
What's the name of the drawing tool Abhishek used? It's kind of neat, and the drawings look very natural.
Hey apurvraveshia5168! The drawing tool is called “excalidraw”. They have both a free and paid version!
he is very good
Which whiteboard platform is being used here?
Hey Alok! The whiteboard being used here is an online whiteboard called “excalidraw”. They have a free and paid version so do check them out if you are interested!
I didn't understand the CDN service and the view video service. What is the CDN for? The viewer service is interacting with the CDN.
A CDN is a content delivery network: a component designed to cache and serve flat files, such as images and videos, in a fast, simple, and efficient manner.
We are using CDN as a caching service between the viewer and the file storage service (S3) to serve the video files to users with the least possible latency.
Please refer to exponent's course video on CDN for a better understanding of the same.
Really good one!
I'd hire this guy
Great video, thank you so much !
Really good and easy to follow
So this guy came up with all this on the fly without any prior knowledge of Prime other than being a user? 🙂 But good pointers for a mock interview.
Is it a Prime Video design or YouTube?
What is the tool you've used to type and draw?
Hey sreenathp8359, it's called "Excalidraw"!
What is the tool you are using for the design in this video?
Hey redfly2963! The tool is "excalidraw"
Abhishek, do you take Nick interviews?? If so, please let me know how I can book one?
Are you referring to mock interviews?
@@abhishek0647 yes!!
What app did Abhishek use for note-taking?
Hey AwpshuN! The whiteboard app being used here is an online whiteboard called “excalidraw”. They have a free and paid version so do check them out if you are interested!
gold stuff
Amazing explanation Abhishek🤩
What tool did he use to draw the diagram? Does anyone know?
Hey discoverAnkitG, it's called "Excalidraw"!
Which tool is Abhishek using to draw?
Hey adityaranjanyadav1776! The tool being used here is an online whiteboard called “excalidraw”. They have a free and paid version so do check them out!
He lost me in the first three minutes when he prioritized uploading videos as part of his functional and non-functional requirements. Platforms like Amazon Prime have hundreds of thousands of reads for a single upload.
The fundamental design is flawed; the interviewee had YouTube in mind when designing this (see timestamp 2:57). He even said so during the functional requirements. That said, Amazon Prime Video does not allow content creators to upload videos; they have production houses, or work with production services, to provide videos.
My few questions:
How can you upload a video over HTTPS? The videos are 4K/1080p, i.e., ~2 GB. Do you expect this to be uploaded through Google Chrome?
There was no mention of multipart upload. What if there is a break in the connection?
What is this View Video service? Do you expect to download the video from this service?
Where is the low-latency non-functional requirement? How do you expect this to work with tens of thousands of titles?
The interviewee talks about CDNs and caches everywhere; do we understand the limitations (storage) and costs associated with a CDN and a cache?
So overall, this video is maybe for entry-level software engineers/new grads; for mid-level and senior+, this design will not fly at all.
Why are we splitting the video? Why not store it as just one object?
Imagine you're uploading a single 10GB video file, and if, due to a network glitch, it gets stuck at 99.9%, you'd need to re-upload the entire 10GB. However, if you divide the 10GB file into, say, 100MB chunks, and a failure occurs at 99%, you'd only need to re-upload that specific chunk. Moreover, by utilizing multi-core, multi-thread capabilities, you can upload all these smaller chunks simultaneously, resulting in faster uploads compared to uploading a single large file sequentially. I hope that clarifies it.
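The "upload all these smaller chunks simultaneously" part can be sketched with a thread pool and a per-chunk retry. This is a hedged illustration: `upload_one` is a hypothetical stand-in for the real per-chunk network call, and the retry limit is an arbitrary choice, not anything prescribed by the video.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_RETRIES = 3  # arbitrary bound for this sketch

def upload_chunks_parallel(chunks, upload_one, max_workers=4):
    """Upload chunks concurrently, retrying each failed chunk up to MAX_RETRIES.

    `chunks` is a list of (index, payload) pairs; `upload_one(index, payload)`
    is a hypothetical per-chunk network call that raises OSError on failure.
    Returns the sorted list of successfully uploaded chunk indices.
    """
    def with_retries(item):
        index, payload = item
        for attempt in range(MAX_RETRIES):
            try:
                upload_one(index, payload)
                return index
            except OSError:
                if attempt == MAX_RETRIES - 1:
                    raise  # give up: only this chunk needs re-uploading later

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return sorted(pool.map(with_retries, chunks))
```

Note how a transient failure costs only one chunk's worth of retransmission, which is exactly the benefit described above.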
Quite nicely done!
What is this board that he is using to draw the diagram?
Hey santosh_bhat! The whiteboard used here is "Excalidraw"
I didn't understand the response given to the follow-up question. Can someone help me here?
How to support video releases that are predicted to have a lot of demand:
1. Force push to cache: Proactively distribute video content to edge servers before high demand to reduce latency, minimize server load, and ensure faster access for users.
2. Reduce the TTL: Lower the TTL for video content to fetch the latest version more frequently, providing real-time updates and quick adaptation to changes during anticipated periods of high demand.
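The two strategies above can be illustrated with a tiny TTL cache. This is a toy model, not a real CDN edge; the clock injection exists only so eviction is easy to demonstrate.

```python
import time

class TTLCache:
    """Minimal TTL cache illustrating pre-warming (force push) and
    TTL-based eviction. Not a real CDN edge server."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable for testing
        self._store = {}            # key -> (value, expires_at)

    def push(self, key, value):
        """Force push: place content in the cache before anyone asks for it."""
        self._store[key] = (value, self.clock() + self.ttl)

    def get(self, key):
        """Return the cached value, or None if absent or expired."""
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._store[key]    # TTL elapsed: evict
            return None
        return value
```

Before a big release you would `push` the title to every edge. Note the caveat raised elsewhere in this thread: a low TTL evicts the hot video too, so pre-warming typically has to be repeated, or the hot item given a longer TTL than the rest.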
@@tryexponentthank you
Thanks Abhishek
Which app is used to draw in this video?
Hey PoornaBengaluruShivajiRao, it's called "Excalidraw"!
What software is he using to whiteboard this out?
Hey Andyjohnsheridan, the whiteboard is called "Excalidraw"!
What's the software he's using?
Hey Lakshman! The software being used here is an online whiteboard called “excalidraw”. They have a free and paid version so do check them out if you are interested!
This is really not a good format for an interview. A bunch of boxes and arrows that leaves more questions than answers is NOT how a real interview is conducted. For instance, the moment you draw that box that says "processing queue", the interviewer will ask: what is the nature of that queue, and what is actually in it? Where are the segments of a split video stored? Is it a different storage than the object store depicted? Or when the interviewee mentions an "Elastic Search API service", what does that mean? Is Elasticsearch actually used? Then what is the metadata store, and why should they be kept separate? This is NOT how a real interview is conducted.
Yes, I too feel that a lot of things are missing. What happens if the upload server crashes while sending the chunks to the split server? The same goes for the split service sending to the encoder service. There is no data model showing how the video chunks are mapped to the actual video ID. And too many services are talking to the metadata DB, which begs the question: why not have a metadata service? If this were an actual interview, I am sure these questions would pop up.
It's not enough; you need to go deeper with more details, such as the database schema, the QPS, and the number of hosts.
What is the text editor he is using?
Hey 1nOnlySB! Do you mean the whiteboard? If so, it's called "Excalidraw"
I wonder what design tool the guy used.
Hey Zhong Yang! The design tool being used here is an online whiteboard called “excalidraw”. They have a free and paid version so do check them out if you are interested!
@@tryexponent thanks dude ^_^
❤
1 out of 100 users are uploading videos on Amazon Prime?
The guy would have failed that module if I were the interviewer. He is making a ton of assumptions, not asking what the interviewer really wants, and basically building YouTube, not Prime Video. For example, where is the requirement that uploads should be fast coming from?
We don't have to make a service for each verb 😓
But... the public can't upload video to Amazon Prime... why the heck is this treated like it's a thing?!
Prime is not YouTube.
Film and TV distribution works similarly to uploading to YouTube, although it's restricted to specific users and production companies.
@@tryexponent No, it does not. I worked at a startup streaming service that was taking in digital masters from studios, re-compressing them, and providing them to users. The studio does not "upload" to the service like a YouTube user. That's not how the industry works.
Sorry, but you're wrong. This is a false premise.
I think this guy misses system design concepts at a very basic level. He was lost in his own thoughts, and the interviewer just let him explain, assume, and complete the interview. The guy was totally lost.
He missed the basics of asking clarifying questions. It seemed like he was teaching rather than gathering requirements and finding out what the user wants. The good part was that he explained well and justified what he knows, lol; that much we can learn from this.
Is he storing every video in the CDN?!
Amazon Prime does not have creators. He designed YouTube, not a Prime-like service.
Thanks for the video! The font on the screen is a bit hard to read though...