S3 system design | cloud storage system design | Distributed cloud storage system design

  • Published on 20 Dec 2024

Comments •

  • @kumarc4853
    @kumarc4853 4 years ago +33

    I interviewed a candidate recently and he mentioned your channel to me. Thank you for the good content, for teaching a lot of people, and for helping them crack system design interviews.

  • @metalalive2006
    @metalalive2006 3 years ago +29

    20:28 overview of the design with example
    * 22:04 partition layer
    * 23:40 stream layer
    * 26:34 different partition strategies
    27:34 stream layer
    * 28:06 store new file in append-only fashion
    * 29:00 seal file server that is full
    * 31:24 monitor space of all these file servers
    * 32:36 garbage collection performed on sealed file servers
    * 34:30 replication
    * 38:01 health check on the file servers
    * 40:32 block group
    45:27 partition layer
    48:56 performance improvement tips
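
The append-only and sealing steps in the outline above (28:06 and 29:00) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the design from the video: `FileServer`, `Stream`, the `fs-N` names, and the byte capacity are all hypothetical, data lives in memory, and each payload is assumed to be smaller than one server's capacity.

```python
from dataclasses import dataclass, field


@dataclass
class FileServer:
    """Hypothetical append-only file server with a fixed byte capacity."""
    name: str
    capacity: int
    data: bytearray = field(default_factory=bytearray)
    sealed: bool = False

    def append(self, payload: bytes) -> int:
        """Append payload and return the offset at which it starts."""
        if self.sealed:
            raise RuntimeError(f"{self.name} is sealed and read-only")
        offset = len(self.data)
        self.data.extend(payload)
        return offset


class Stream:
    """Hypothetical stream: an ordered list of file servers.

    Only the last server accepts writes; all earlier ones are sealed.
    """

    def __init__(self, capacity_per_server: int):
        self.capacity = capacity_per_server
        self.servers = [FileServer("fs-0", capacity_per_server)]

    def write(self, payload: bytes) -> tuple:
        """Append to the open server, sealing it first if the payload does not fit.

        Returns (server_name, offset, length), which a caller would keep as
        the file's location. Assumes len(payload) <= capacity_per_server.
        """
        current = self.servers[-1]
        if len(current.data) + len(payload) > current.capacity:
            current.sealed = True  # no further writes to a full server
            current = FileServer(f"fs-{len(self.servers)}", self.capacity)
            self.servers.append(current)
        offset = current.append(payload)
        return current.name, offset, len(payload)

    def read(self, server_name: str, offset: int, length: int) -> bytes:
        """Read a stored payload back from its recorded location."""
        server = next(s for s in self.servers if s.name == server_name)
        return bytes(server.data[offset:offset + length])
```

Writing `b"hello"` and then `b"world!"` to a `Stream(10)` would seal `fs-0` on the second write and place the second payload at offset 0 of `fs-1`. Existing bytes are never modified in place, which is the property the append-only design relies on.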

    • @metalalive2006
      @metalalive2006 2 years ago

      At 28:06, you mentioned that spinning hard disks were a cheap, feasible hardware choice for a scalable storage service like S3 and that SSDs were expensive. I am interested to know whether that is still true in 2022, since I know very little about the detailed architecture and marketing of SSD storage.

  • @kumarc4853
    @kumarc4853 3 years ago +9

    A friend of mine got into FB and APPLE. He found your channel (and a couple of other SD channels) very helpful in his prep.
    We can do this!
    Thank you

  • @pramodsingh4668
    @pramodsingh4668 3 years ago +1

    This channel covers a lot of ground and is probably one of the best channels. But... and a big but... it takes 2-3 times longer than needed. There is a lot of duplication and unrelated content, which turns a 20-minute video into an hour-long video. For example, everything in the first 20 minutes could have been covered in just 2-3 minutes. Please keep it short and precise. I appreciate all the hard work you put in and the knowledge you are sharing. Keep going.

  • @kunchasaikrishna
    @kunchasaikrishna 4 years ago +18

    Really, your channel's content is no less than that of any top online education platform.
    Appreciate your content 😊 Thank you so much 🙏

  • @rohitsharma-rp2jh
    @rohitsharma-rp2jh 3 years ago

    Splendid, fantastic, long may it live!

  • @gunhound45
    @gunhound45 4 years ago +10

    Just want to say that I really love watching these videos. Even though I'm not preparing for system design interviews, it's fun to do these thought exercises to design a big system.

    • @asahikitase5398
      @asahikitase5398 4 years ago +2

      you got recognized by Kim Jong-un!

  • @bhavyamishra3502
    @bhavyamishra3502 4 years ago +3

    Nice content....keep it up👍👍

  • @forgotten225522
    @forgotten225522 3 years ago

    Most valuable information ever on your channel.

  • @renon3359
    @renon3359 3 years ago +1

    Your channel is priceless brother, thank you.

  • @fendy0390
    @fendy0390 3 years ago

    Really appreciate your video here. You explain it very clearly.

  • @ramakrishnanvisvanathan3378
    @ramakrishnanvisvanathan3378 2 years ago

    Really liked this comprehensive design session. Great, keep it up, and all the very best. I really appreciate the work you have done towards bringing such wonderful content to us.

  • @amlanch
    @amlanch 3 years ago

    Terrific presentation! Love your videos

  • @amanpervaiz2843
    @amanpervaiz2843 3 years ago

    This channel is gold!

  • @balakrishnan3725
    @balakrishnan3725 3 years ago

    Thank you Naren! Nice video. I could feel the effort you have put into creating this video.

  • @aneksingh4496
    @aneksingh4496 4 years ago +2

    Must say, it must have taken a lot of time to prepare this content. Kudos!!!

  • @Siddharth42280
    @Siddharth42280 3 years ago +12

    @Tech Dummies Narendra L: Could you please make videos on a centralized logging system and a distributed job scheduler?

  • @a.yashwanth
    @a.yashwanth 4 years ago +9

    Amount of work you put in making these 50 minute long videos is insane.

    • @kumarc4853
      @kumarc4853 3 years ago

      Phenomenal work. We don't have to read books, they are for dummies :p

  • @kirankothandan5529
    @kirankothandan5529 3 years ago +1

    You are an amazing teacher, bro. I am a frontend person, but I am still interested in system design because of you. The way you explain how the designs are made makes me very curious. Thanks for the big effort. Cheers 👌

  • @anuragagnihotri5238
    @anuragagnihotri5238 3 years ago +1

    Thanks a lot for putting in the effort and providing design details of the distributed cloud storage. I had a few questions, though:
    1. I see the Cluster Manager is a SPOF. How do we handle it if the CM is down?
    2. Why do we use the DNS approach to update the available Region routing? DNS resolution is usually cached for a few minutes or so, which will increase the downtime.
    3. How do we handle concurrent updates (not appends) to the same file from different users?

  • @prashant211087
    @prashant211087 4 years ago +24

    I appreciate your efforts. If possible, can you also share the references you go through for such design questions.

    • @vijayprajapati8475
      @vijayprajapati8475 4 years ago

      444r

    • @fragrancias972
      @fragrancias972 4 years ago +2

      He seems to read a lot of tech companies’ engineering blogs, based on his content.

    • @metalalive2006
      @metalalive2006 3 years ago

      Really appreciate his effort. The engineering blogs at these tech companies are mostly very long articles.

  • @abhiruchi16
    @abhiruchi16 4 years ago +3

    Really appreciate the level of detailed information provided in this video. Thanks a lot for your hard work and creating such awesome content !! :D

  • @Miguel-ym2rr
    @Miguel-ym2rr 2 years ago

    This is the first time I have seen how S3 works. Thank you so much! I decided to focus my career on Distributed Systems as a Software Engineer. How do you get the base knowledge to design and implement a Distributed System?

  • @content-consumer-max
    @content-consumer-max 3 years ago +1

    Time 48:10 Remapping of range from 0-100 to 0-50 and 50-100 is fine. But what happens to the files which are already written in the previous partition? How will the reads for UUIDs with hashes 0-50 map to the older partition?

    • @SudhanshuTamhankar
      @SudhanshuTamhankar 2 years ago

      In that case, the mapping is not updated until the new stream is "warmed up", meaning that the files with 0-50 hashes have already been copied over to the new stream. Once this is done, there is a cut-over transaction in the partition manager DB, which then starts routing the calls for 0-50 to the new stream. In the meantime, some files might have been written to the old stream while this transaction was still in progress. That is handled by a catch-up routine which ensures all files have been copied over.
      Think of it as a two-stage commit. When the cut-over begins, there is a soft commit which says: write all new files for 0-50 to the new stream. At the same time, reads try both the new and the old stream. Once all files are copied over and there are no stale writes left in the old stream, the commit is finalized. Now all reads and writes for 0-50 go to the new stream, and garbage collection runs on the old stream to free up space.
      Hope this helps.
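
The soft-commit and finalize stages described above can be sketched as a small state machine. This is only an illustration under stated assumptions: `PartitionRouter` and its state names are hypothetical, plain dicts stand in for the old and new streams, and the whole catch-up copy happens inside `finalize` rather than as a background routine.

```python
class PartitionRouter:
    """Hypothetical router for one hash range moving from an old stream to a new one.

    States: "stable_old" -> "soft_commit" -> "finalized".
    """

    def __init__(self, old_stream: dict, new_stream: dict):
        self.old = old_stream
        self.new = new_stream
        self.state = "stable_old"

    def begin_cutover(self) -> None:
        """Soft commit: from now on, new writes go to the new stream."""
        self.state = "soft_commit"

    def write(self, key: str, value: bytes) -> None:
        target = self.old if self.state == "stable_old" else self.new
        target[key] = value

    def read(self, key: str):
        if self.state == "stable_old":
            return self.old.get(key)
        if self.state == "soft_commit":
            # During cut-over, try the new stream first, then fall back to the old one.
            if key in self.new:
                return self.new[key]
            return self.old.get(key)
        return self.new.get(key)

    def catch_up(self) -> None:
        """Copy files that exist only in the old stream; never overwrite newer data."""
        for key, value in self.old.items():
            self.new.setdefault(key, value)

    def finalize(self) -> None:
        """Finish the cut-over once every file is present in the new stream."""
        if self.state != "soft_commit":
            raise RuntimeError("finalize is only valid during soft_commit")
        self.catch_up()
        self.state = "finalized"
        self.old.clear()  # stands in for garbage collection of the old stream
```

Because `catch_up` uses `setdefault`, a file rewritten in the new stream during the soft commit is not clobbered by the stale copy still sitting in the old stream.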

  • @kveldgorkon4611
    @kveldgorkon4611 2 years ago

    Thank you .. Great Explanation

  • @abrarisme
    @abrarisme 3 years ago

    this was great, can't wait to see more videos!

  • @sushantasaha9938
    @sushantasaha9938 4 years ago

    Appreciate your hard work behind it

  • @sureshnathann8360
    @sureshnathann8360 4 years ago

    Hi Narendra, You awesome man! Keep posting ! Keep learning!!

  • @trybeingakr
    @trybeingakr 4 years ago

    Appreciate the drastic improvement in delivery style.

  • @ravitandon9351
    @ravitandon9351 2 years ago

    Very well done!

  • @ullas06
    @ullas06 4 years ago

    Thank you for your time and effort. It's very helpful.

  • @tanayakarmakar2407
    @tanayakarmakar2407 2 years ago

    great content

  • @adithyaks8584
    @adithyaks8584 3 years ago +1

    Wow!! simply wow... Now I can cross question managers at Amazon during interviews

  • @pravaskumar7078
    @pravaskumar7078 4 years ago

    awesome...very helpful

  • @OnkarSingh-fc8mu
    @OnkarSingh-fc8mu 3 years ago +1

    (Time 48:10) When there is more load on the partition servers and the partition manager splits the range across two partition servers, how would the newly created partition server talk to the older file server in the streaming layer (where the file was actually stored)? Does anything change in the streaming layer as well?

    • @amishsumit
      @amishsumit 3 years ago

      When the partition manager assigns a new partition for a subrange, say 1-50 out of 1-100, it also updates the partition map table entries. For example, say the hash values 14, 36, 42, 58, and 89 were all initially mapped to partition server 2. Once the new partition server is added, the corresponding existing stream servers in the map table (for 14, 36, and 42) will be mapped to this new partition server. That way, any further read request for those existing stream servers will be served by the new partition server.
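
The range split described above can be sketched with a sorted list of range starts. This is a rough illustration under stated assumptions: `PartitionMap`, the server names `ps2` and `ps3`, and the half-open integer hash space are hypothetical, and the remapping of stream servers is reduced to a change of range ownership.

```python
import bisect


class PartitionMap:
    """Hypothetical partition map: sorted range starts mapped to partition servers.

    Range i covers [starts[i], starts[i + 1]); the last range ends at hi (exclusive).
    """

    def __init__(self, lo: int, hi: int, server: str):
        self.hi = hi
        self.starts = [lo]
        self.owners = [server]

    def _index(self, hash_value: int) -> int:
        if not self.starts[0] <= hash_value < self.hi:
            raise KeyError(f"hash {hash_value} is outside the mapped space")
        return bisect.bisect_right(self.starts, hash_value) - 1

    def lookup(self, hash_value: int) -> str:
        """Return the partition server that owns hash_value."""
        return self.owners[self._index(hash_value)]

    def split(self, at: int, new_server: str) -> None:
        """Split the range containing `at`.

        new_server takes over [range_start, at); the previous owner keeps
        [at, range_end).
        """
        index = self._index(at)
        if self.starts[index] == at:
            raise ValueError("split point must fall strictly inside a range")
        self.owners.insert(index, new_server)
        self.starts.insert(index + 1, at)
```

With `PartitionMap(1, 101, "ps2")` followed by `split(51, "ps3")`, lookups for 14, 36, and 42 would resolve to `ps3`, while 58 and 89 would stay on `ps2`, matching the example in the comment.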

    • @phildinh852
      @phildinh852 2 years ago

      ​@@amishsumit But a partition server is assigned to 1 stream only?

  • @groinache
    @groinache 2 years ago

    Very nice presentation. Concise, with good pronunciation. However, there is too much echo. I suggest getting a better recording system or setup with anti-echo.

  • @amlanch
    @amlanch 3 years ago

    Excellent explanation. You didn't talk about the leader election and manager election in any of the layers, but that's just some more detail.

  • @zianxu2006
    @zianxu2006 4 years ago +19

    great content. Really appreciate it. I'm wondering, is it a good idea to start with a simple design and then scale up towards the final target design? I tried that at an interview and got the feedback that I didn't address many of the complexities until later in the discussion... Some other times I jumped into details upfront and got the feedback that I was focusing on details too much too soon....

    • @RajenderReddy12sw
      @RajenderReddy12sw 2 years ago +2

      it's always a good idea to ask the interviewer.. what they are interested in..

  • @boombasach
    @boombasach 2 years ago

    Really appreciate you putting up quality content. Very insightful. A couple of suggestions, though: starting with the high-level user flow, which you began talking about at 21:00, may be useful. Also, I am not sure that having the API server and the Cluster Manager, two separate components, talk to one DB is a good idea.

  • @harishkrish14386
    @harishkrish14386 4 years ago

    Very nice videos, including your perspective on how to get jobs in Germany. Keep going, bro 👌🏻👌🏻

  • @Vendettaaaa666
    @Vendettaaaa666 3 years ago +1

    The partition server + linked list of file servers idea seem like "Consistent Hashing on steroids"!
    Basically instead of a single server on a ring for a given hash range, it's an array of servers.

  • @praveenjain183
    @praveenjain183 3 years ago

    Great Stuff Narendra, I appreciate the effort you make in gaining all this knowledge from multiple sources and sharing with us. Thanks a lot.

  • @hydtechietalks3607
    @hydtechietalks3607 4 years ago +5

    Great talk, I love this. But to differentiate yourself from others, please announce who the audience is and what level of depth you will go to in the video. For example, are you going to discuss the algorithms used in the design or give an overview of it? Is it scoped for an application developer or for a systems design developer?

  • @vigneshrajarajan6724
    @vigneshrajarajan6724 4 years ago +2

    Hi Naren,
    Thanks for your work. I have a question on Uber/food delivery design. From what I have gathered, most of these applications rely on state machines to proceed to the next step. Could you please explain how a finite state machine is used in food delivery/Uber designs?

  • @icey3080
    @icey3080 4 years ago

    this is very useful, thank you

  • @mattleahy3951
    @mattleahy3951 3 years ago +1

    Great video! The only question I had is about the table you showed for the Stream Manager, where it tracked the start and stop offsets for the primary. It also had fields for the secondary and tertiary replicas, but it didn't separately track their offsets; that would need to be included as well, right? Thanks.

  • @asahikitase5398
    @asahikitase5398 4 years ago

    thanks buddy, I do prefer the way you started with a simple architecture, and improve the system while increasing the traffic.

  • @progfan234
    @progfan234 4 years ago

    Awesome stuff as always! I have a couple of questions:
    1. What impact will consistent hashing in realtime have on serving requests?
    2. What will happen when a particular partition server goes down? Will it be replaced by a standby? How many standbys should you consider maintaining?
    3. Is the Partition Map table a single point of failure? Or is it a within-cluster replicated data store?
    4. Would there be any benefits to replicating a given file server within a cluster?

  • @prasadg9583
    @prasadg9583 4 years ago +1

    loved it mate!! thanks ❤️

  • @kdakan
    @kdakan 1 year ago

    How do you do file and disk operations on the remote file server, from the partition server and the stream server (like copying, clearing up space from unused blocks, etc.)? Do you mount an NFS share on these servers and issue local shell commands on these remote shares?

  • @ankita8867
    @ankita8867 3 years ago

    Thanks for posting!!

  • @paraschawla3757
    @paraschawla3757 3 years ago

    S3 uses Object Storage instead of the Block Storage mentioned at 43:00. Correct me if I misunderstood.

  • @shantanu143
    @shantanu143 2 years ago

    Good content. However, one doubt: if we are replicating from Europe to Asia, isn't it asynchronous replication?

  • @JashanPreetsingh-mi2nl
    @JashanPreetsingh-mi2nl 3 years ago

    Nice

  • @zakariamaaraki1130
    @zakariamaaraki1130 4 years ago +2

    Great video, keep going! I have only one remark: at minute 11 you said that replicas must be in another region in case of a disaster. I think data must stay in the same region for some reasons (latency, RGPD, ...) but in different Availability Zones instead (this is the default option used by S3). Am I right?

    • @phildinh852
      @phildinh852 2 years ago

      Yes, data is replicated in AZs of same region. There is an option to replicate data to another bucket in another region.

  • @ranjithsudhakar9304
    @ranjithsudhakar9304 4 years ago +2

    Great work. A small suggestion, if it makes sense for you: videos shorter than 20 minutes are more appealing than longer videos. If the content cannot be condensed, it could be split into parts.
    Awesome work on all your system design videos. Thanks

    • @Reji012345
      @Reji012345 4 years ago +2

      It's better to be at file.. otherwise it will break the flow.

    • @ellakkiankvp6267
      @ellakkiankvp6267 4 years ago +2

      Not really, that can be left to the audience, I mean if you need break, you can pause, right? Also since this is a single entity, it's good to be a single video, honestly, I don't see any partitions here. Also psychologically imo if you recall the flow and feel something's hazy it's Less cognitive load to look for it in the flow compared to thinking between videos.

  • @doydoybb
    @doydoybb 4 years ago

    I have a question. In your first simple design, you have a separate server to store metadata. On your second scaled storage system, where are the metadata stored? Is it all stored in the stream manager? Or is it stored on each individual partition server? Thanks!

    • @kinandchowdary6456
      @kinandchowdary6456 4 years ago

      That will be handled by the cluster manager.

    • @rujhanarora7892
      @rujhanarora7892 3 years ago

      @kinandchowdary6456 Stream manager, dude

  • @eugenee3326
    @eugenee3326 2 years ago

    Great video but why can't ZooKeeper just do what Partition Manager does?

  • @rohanbundelkhandi3202
    @rohanbundelkhandi3202 4 years ago

    Very nice video. One doubt: how does the Partition Server communicate with the Stream Manager? We don't have a direct link there.

  • @himanshuupadhyay6749
    @himanshuupadhyay6749 3 years ago

    Quick question: when a file upload request goes to the server, is the file chunked on the client side? If so, where does the sync service come into the picture?

    • @Gerald-iz7mv
      @Gerald-iz7mv 2 years ago

      Good question. Shouldn't there be a chunk service that splits the file into chunks?

  • @rahulketech-h8e
    @rahulketech-h8e 3 years ago

    good

  • @sowjanyav6570
    @sowjanyav6570 3 years ago

    what happens if a user wants to add more content to a file, (say file has 1-100 lines, and user wants to add 10 more lines to it) which is already in a sealed storage server? Will the file be copied to a new server? Or only the extending part in a different file server?

  • @noypi613
    @noypi613 4 years ago

    What technology do you use to store the file? Is it a database?

  • @viewforsourav
    @viewforsourav 4 years ago

    How does Partition Server handle concurrent write requests if the system wants to honor append mode of writing to disk?
    One solution will be for a Single Stream - one can have multiple writers, each of which write to different file servers. However orchestrating such a model would be excruciatingly complex.
    Or Partition Servers can be logical entities with a 1-1 mapping to the stream id. Definitely that will lead to having many stream ids and some house keeping work for the Stream Manager. This will ensure the append mode of writing data and a better spread of file servers to stream ids.
    Let me know your thoughts Naren@Tech Dummies.
    Thanks for your videos.

    • @willinton06
      @willinton06 4 years ago

      "excruciatingly complex" sounds about right, there's a reason why only a handful of companies even try to get something like this working.

  • @DarwinLo
    @DarwinLo 4 years ago

    The Cluster Manager is responsible for updating the DNS entries upon a cluster failure. What do you suggest doing for client-side caching of DNS queries?

  • @SunilKumar-yd8xv
    @SunilKumar-yd8xv 4 years ago

    Amazing Content! Really appreciate your efforts.
    One question - Do you need cluster manager in this architecture? Simple, failure, geo, weighted routing are supported by DNS mostly.

  • @happyandinformedlife1212
    @happyandinformedlife1212 4 years ago

    Given a set of processes running on a cluster of hosts, design a system that load balances the hosts through live migration of the processes. The goal of the load balancer is to minimize or prevent resource starvation, a situation in which processes are not allocated the amount of resources they want to consume. In cases where all hosts in the cluster are overloaded, we want to distribute resources evenly across the demanding processes. Given an imbalanced cluster, we want to bring it to a balanced state as soon as possible at the lowest cost. Can you do Load Balancer next?

  • @KimetsuNoYaiba100
    @KimetsuNoYaiba100 4 years ago

    Good followup: How does PUT API work for large files?

  • @baoleijia3764
    @baoleijia3764 4 years ago

    Appreciate your sharing, but:
    1. I don't think the different replicas are located in different Regions; it costs too much to transfer data between replicas.
    2. I don't think the failover switch is done by DNS.

  • @andybhat5988
    @andybhat5988 2 years ago

    Ceph RADOS layer with remote replication can handle this much better. It also does not need metadata server for replication. Using CRUSH, proper availability can be guaranteed.

  • @RachnaDiary
    @RachnaDiary 3 years ago

    How do you store images or videos? What is the mechanism behind that? What you have explained is fine for storing a file, but how does it work for photos/videos?

  • @Vendettaaaa666
    @Vendettaaaa666 3 years ago

    Mind blown!

  • @akashjain2990
    @akashjain2990 2 years ago

    Why do we need partition layer? Why can't the API layer directly talk to Streaming layer since there is 1:1 of Partition to streaming layer anyway?

  • @PoojaMehta271
    @PoojaMehta271 3 years ago

    Isn’t API server at 23 min nothing but a load balancer?

  • @pearlssnowboard3793
    @pearlssnowboard3793 4 years ago

    Do you have any idea how to design a system to load a 5G file onto 5000 servers?

  • @mopsyched
    @mopsyched 4 years ago

    Something like Raft, Frangipani, or Spanner is always used for file servers

  • @tylerscott6531
    @tylerscott6531 3 years ago

    Do AWS regions each represent a continent? I thought "us-east-1" and "us-west-2" were both in the US.

  • @metalalive2006
    @metalalive2006 3 years ago

    does anyone know how cloud storage like Amazon S3 handle access control of each uploaded file ? for example , Amazon S3 exposes API endpoints for consumers to read and edit access control list of a file object , how does S3 do things ? really appreciate any reply or hints.

  • @noypi613
    @noypi613 4 years ago

    how will the api insert data to the data store server?

  • @rishabhgoel1877
    @rishabhgoel1877 4 years ago

    Thanks, it would have been much better if you had related these concepts in terms of S3 keys and buckets

  • @viditmathur8437
    @viditmathur8437 4 years ago

    what happens if cluster manager goes down?

  • @ariellyrycs
    @ariellyrycs 4 years ago +1

    Hey , how can I deposit you the dollar 💵, this is too much work, I have an interview coming up and I’m watching all your videos , thank you

    • @TechDummiesNarendraL
      @TechDummiesNarendraL 4 years ago +1

      Thanks, Join the channel. You will find join button in the channel page!

  • @prasenjitkundu7904
    @prasenjitkundu7904 3 years ago

    do you know captain america

  • @sumonmal009
    @sumonmal009 3 years ago

    Solution 20:28

  • @nalamda3682
    @nalamda3682 2 years ago

    why not zip?

  • @zuowang5185
    @zuowang5185 7 months ago

    Is this a mid level answer?

  • @gijduvon6379
    @gijduvon6379 3 years ago

    I think no one today uses spinning disks in production, at least in new projects. SSDs are not as costly as they used to be.

  • @MohanRaj-vp1zt
    @MohanRaj-vp1zt 4 years ago +1

    Lot of content, but language & presentation is quite poor. Because of that the flow is broken multiple times. This really doesn't help in an interview setting of 45 mins. The first major thing that an interviewer would want to see is the REST API signature of different functionalities offered , for example upload_file.