Great video
A few questions for clarification:
1. Since we have high throughput, why didn't we use something like Kafka / Flink instances? It would have made the design more fault-tolerant and efficient, right?
2. Instead of using an API for orchestrating all the microservices, can't we just use an API gateway?
Complex design broken down just to the extent it should be. Very nicely put together and explained.
Assuming we aren't allowed to use S3 pre-signed URLs for upload/streaming, the question arises: how can the system be designed without these AWS services? How do we use a distributed object store instead? Can you also mention how to achieve service availability & reliability?
Also, the transcoding part is only covered briefly here. Considering the number of uploads (on the order of millions), that would be a great piece of the scalability discussion: using message queues and async workers to run transcoding at scale, how many workers are required, all of that should be accounted for in the design. Streaming via CDN should also be elaborated on, because that is another major scalability requirement in this system design.
Excellent video!!!! I’m shocked by how few views this has. I’m glad I found it.
Thank you for your kind words
Please make more system design videos.
This is really amazing man. You are very knowledgeable. I highly appreciate your effort.
Nice video, one clarification: why is storage needed for videos watched? Also, regarding the CDN, I think streaming of the video should go through the CDN?
Nailed it 🎉
Great, thanks! I don't understand the total metadata storage required per day.
4:05
Why isn't 1 million videos per day * 1 KB enough? You add 100 million * 1 KB to it. Why?
Uploading to the blob store directly from the UI shouldn't be done: it means anyone can upload whatever they want to your blob store, and on top of that, checks on the data can't be enforced, since the local UI can be overridden. Always have an API forward the data to the blob. If you are only reading data, fetching it from the blob to the UI is fine.
Thank you for pointing that out! Yes, it is always advisable to have an API that forwards the data to the blob store. That way, one can enforce proper authentication, authorization, and validation checks on the server side, ensuring that only authorized and validated data is uploaded to the blob store. It adds an extra layer of security and control over the data being uploaded, whereas uploading directly to the blob from the UI can pose security risks and skip proper validation checks.
If we upload to our server first and then to blob storage, it will take double the time to upload, which can be quite significant for large files.
@DK-ox7ze No it won't, as you can just stream it straight through... Though it seems that you have missed the point!
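For what it's worth, here is a minimal sketch of the "API in front of the blob store" idea from this thread: the endpoint authenticates and validates the request, then streams the body straight through to object storage without buffering the whole file in memory. It assumes FastAPI and boto3; the bucket name, auth helper, and allowed content types are made-up placeholders, not details from the video.

```python
# Sketch of an upload API that validates the request and streams it to the blob store.
import boto3
from fastapi import FastAPI, Header, HTTPException, UploadFile

app = FastAPI()
s3 = boto3.client("s3")
BUCKET = "raw-video-uploads"                      # hypothetical bucket name
ALLOWED_TYPES = {"video/mp4", "video/webm"}       # example server-side allow-list

def user_from_token(token: str) -> str:
    # Placeholder for real authentication (JWT validation, session lookup, ...).
    if not token:
        raise HTTPException(status_code=401, detail="missing auth token")
    return "user-123"

@app.post("/videos")
def upload_video(file: UploadFile, authorization: str = Header(default="")):
    user_id = user_from_token(authorization)

    # Server-side validation that a client-side UI alone cannot be trusted to enforce.
    if file.content_type not in ALLOWED_TYPES:
        raise HTTPException(status_code=415, detail="unsupported content type")

    key = f"{user_id}/{file.filename}"

    # upload_fileobj streams the request body to S3 in chunks,
    # so the API node never holds the whole file in memory.
    s3.upload_fileobj(file.file, BUCKET, key)
    return {"status": "uploaded", "object_key": key}
```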
I stopped watching the video as soon as I saw he was recommending uploading the file from the UI
Hi, great video!
I'm assuming that for search, we would be using something like Logstash in the Video Catalog service to send the data to Elasticsearch?
Yes, using Logstash in the Video Catalog service to send data to Elasticsearch for search functionality is a common and effective approach. Logstash can be utilized to collect, transform, and enrich data from various sources, including databases like MongoDB or Cassandra, and then send that data to Elasticsearch for indexing and fast search. By integrating Logstash into the Video Catalog service, you can ensure that the video metadata is efficiently indexed and made available for quick and accurate search queries in Elasticsearch. This approach enhances the search capabilities of the system and provides a seamless way to keep the search index up to date with changes in the video metadata.
Or we can use a message queue so that other systems can consume the same events as well. Other consumers could be something like IllegalVideoDetection, Analyser-service, Datalake, etc.
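As a rough sketch of that fan-out idea (not the exact setup from the video): the catalog service publishes a "metadata changed" event to a topic, and independent consumers such as a search indexer pick it up. This assumes kafka-python and the Elasticsearch 8.x Python client; the topic, index, and host names are placeholders.

```python
# Catalog service publishes metadata events; a search-indexer consumer keeps
# Elasticsearch in sync. Other consumers (moderation, analytics, data lake)
# would subscribe to the same topic with their own group ids.
import json
from kafka import KafkaProducer, KafkaConsumer
from elasticsearch import Elasticsearch

TOPIC = "video-metadata-events"  # hypothetical topic name

# Producer side (Video Catalog service), called whenever video metadata changes.
producer = KafkaProducer(
    bootstrap_servers="kafka:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def publish_metadata_event(video: dict) -> None:
    producer.send(TOPIC, value=video)

# Consumer side (search indexer): reads events and upserts them into Elasticsearch.
def run_search_indexer() -> None:
    es = Elasticsearch("http://elasticsearch:9200")
    consumer = KafkaConsumer(
        TOPIC,
        bootstrap_servers="kafka:9092",
        group_id="search-indexer",
        value_deserializer=lambda m: json.loads(m.decode("utf-8")),
    )
    for message in consumer:
        video = message.value
        # Using the video id as the document id makes re-indexing idempotent.
        es.index(index="videos", id=video["video_id"], document=video)
```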
It was a good explanation thx!
Nicely explained!
Have a few doubts though:
1. What is the difference between the metadata service and the video catalog? As you mentioned, both are used to store and retrieve metadata about the video.
2. At timestamp 12:43, you mentioned pre-signed URL generation, and then that gets uploaded from the UI to S3. I wanted to know why we can't upload directly with this pre-signed URL to S3 from the video uploader service; that might save us some bandwidth and round trips to the server. Am I missing something here?
Thank you for the question.
1. The Video Catalog database is used to store the information about the available videos in the system, such as their title, description, and the user who uploaded them. This database also keeps track of the number of views, likes, and dislikes for each video. The purpose of the Video Catalog database is to provide a quick and easy way to access basic information about the videos without having to query the more complex Video Metadata Database.
On the other hand, the Video Metadata Database contains all the information related to the video content, such as the video file location, encoding parameters, resolution, bitrate, and other technical details. This database also keeps track of the various versions and renditions of the video that are available for streaming. The purpose of the Video Metadata Database is to provide a centralized repository of all the video-related information that can be accessed by different microservices in the system.
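To make the split concrete, here are two purely illustrative example records, one per store. The field names are assumptions chosen for illustration, not the exact schema from the video.

```python
# Video Catalog: lightweight, user-facing facts about a video.
catalog_entry = {
    "video_id": "abc123",
    "title": "My first upload",
    "description": "Short demo clip",
    "uploader_id": "user-42",
    "views": 1024,
    "likes": 57,
    "dislikes": 3,
}

# Video Metadata: technical details plus the renditions available for streaming.
metadata_entry = {
    "video_id": "abc123",
    "original_file": "s3://raw-video-uploads/user-42/abc123.mp4",
    "codec": "h264",
    "renditions": [
        {"resolution": "1080p", "bitrate_kbps": 8000, "url": "s3://transcoded/abc123/1080p.m3u8"},
        {"resolution": "720p", "bitrate_kbps": 5000, "url": "s3://transcoded/abc123/720p.m3u8"},
        {"resolution": "480p", "bitrate_kbps": 2500, "url": "s3://transcoded/abc123/480p.m3u8"},
    ],
}
```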
2. A presigned URL is a URL that allows access to an object or a file (in this case, a YouTube video) in a cloud storage service for a limited period of time without requiring authentication, and it is generated by the cloud storage service provider. In the future I might do a follow-up video on presigned URLs, but here is how it works at a high level, which should help you understand the flow better.
Step 1: The User Requests a Presigned URL
When a user wants to upload a video to YouTube, they first request a presigned URL from YouTube's server. The presigned URL contains a set of instructions that the user can use to upload the video file. The server generates the presigned URL with the HTTP method parameter set to PUT, which means that the user can upload the video file to the URL using the HTTP PUT method.
Step 2: The Server Generates the Presigned URL
YouTube's server generates the presigned URL and sends it back to the user. The presigned URL is a unique URL that is valid for a limited period of time, usually a few minutes. This means that the user must upload the video file to the URL within the specified time limit, otherwise the URL will expire and become invalid.
Step 3: The User Uploads the Video File
Once the user receives the presigned URL, they can use it to upload the video file directly to the underlying object storage (e.g., S3), without the bytes having to pass through YouTube's application servers. The user can use any tool or application that supports HTTP PUT requests to upload the video file. The presigned URL encodes the target location of the video file in storage, as well as any other parameters that may be required.
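Here is a minimal sketch of those three steps, assuming S3 with boto3 on the server side and a plain HTTP PUT on the client side; the bucket name, key layout, and 5-minute expiry are illustrative assumptions, not values from the video.

```python
# Steps 1-2: backend generates a time-limited PUT URL; Step 3: client uploads with it.
import boto3
import requests

s3 = boto3.client("s3")

def create_upload_url(user_id: str, filename: str) -> str:
    # Generates a URL that allows a single PUT to this bucket/key until it expires.
    return s3.generate_presigned_url(
        ClientMethod="put_object",
        Params={"Bucket": "raw-video-uploads", "Key": f"{user_id}/{filename}"},
        ExpiresIn=300,  # URL is only valid for 5 minutes
    )

def upload_with_presigned_url(url: str, path: str) -> None:
    # The client PUTs the file straight to object storage using the presigned URL,
    # so the video bytes never flow through the application servers.
    with open(path, "rb") as f:
        response = requests.put(url, data=f)
    response.raise_for_status()
```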
@ByteMonk - Won't it cause challenges while editing a video's title? Since the two tables are in different DBs, how do you do it atomically? Doesn't it make more sense to cache the details of the VideoCatalog table and keep everything in Video Metadata?
Awesome explanation sir
Does it make sense to store the video in the object store, retrieve it from the object store, encode/transcode it, and send it back to the object store? Why not encode it first before sending it to the object store?
Thanks for the great work, helped a lot!
Great work
WoW just awesome bro!!
Thank you
Hi, can you add a video on food delivery system design, like Uber Eats, DoorDash, or Swiggy? @ByteMonk
sorry for the late reply, I will work on it. Thank you for the topic!
Great work here. Can you share your contact with me?
Thank you.
Hello! Our website is going through an upgrade. Please email here: bytemonksystems@gmail.com