Database Sharding and Partitioning

  • Published on 15 May 2024
  • System Design for Beginners: arpitbhayani.me/sys-design
    System Design for Experienced Engineers: arpitbhayani.me/masterclass
    Redis Internals: arpitbhayani.me/redis
    Build Your Own Redis / DNS / BitTorrent / SQLite - with CodeCrafters.
    Sign up and get 40% off - app.codecrafters.io/join?via=...
    In the video, I explained the concepts of sharding and partitioning for scaling systems at a database level. Initially, I discussed scaling databases through different stages, the difference between sharding and partitioning, and when to introduce these concepts. I highlighted the advantages and disadvantages of adopting them. I also promoted my system design course, emphasizing collaborative learning and practical problem-solving. The video covered vertical scaling, read replicas, horizontal scaling, and the importance of sharding and partitioning for high throughput in database systems.
    Recommended videos and playlists
    If you liked this video, you will find the following videos and playlists helpful:
    System Design: • PostgreSQL connection ...
    Designing Microservices: • Advantages of adopting...
    Database Engineering: • How nested loop, hash,...
    Concurrency In-depth: • How to write efficient...
    Research paper dissections: • The Google File System...
    Outage Dissections: • Dissecting GitHub Outa...
    Hash Table Internals: • Internal Structure of ...
    Bittorrent Internals: • Introduction to BitTor...
    Things you will find amusing
    Knowledge Base: arpitbhayani.me/knowledge-base
    Bookshelf: arpitbhayani.me/bookshelf
    Papershelf: arpitbhayani.me/papershelf
    Other socials
    I keep writing and sharing my practical experience and learnings every day, so if you resonate then follow along. I keep it no fluff.
    LinkedIn: / arpitbhayani
    Twitter: / arpit_bhayani
    Weekly Newsletter: arpit.substack.com
    Thank you for watching and supporting! It means a ton.
  • Science & Technology

Comments • 127

  • @shishirchaurasiya7374 · 11 months ago · +9

    I was literally confused and struggling to get clarity until you came to the point where you turned this theory into something understandable through tables and SQL query references. Thanks a lot for your efforts on this beautiful explanation, Arpit sir.

  • @ranjithpals · 1 year ago · +4

    Thanks a lot! That was clearly and concisely explained. Looking forward to enrolling in your complete system design course.

  • @AlokMehta24 · 9 months ago · +1

    Excellent video, Arpit. Coming from no software or systems engineering background, I found this the best video explaining data sharding and partitioning. I am a Tech PM for AWS Supply Chain, and data partitioning and sharding are a real deal for us. Thanks for making this extremely easy-to-understand video.

  • @nuclearniraj · 9 months ago

    One video and all the clutter on Sharding and Partitioning is clear. Thank you so much Arpit.

  • @jaskiratwalia · 3 months ago · +1

    Wonderfully explained! Cleared all my doubts. Please keep making such videos. These are also well timed, not too short nor too long.

  • @aditijalaj5036 · 9 months ago

    This is an amazing video, and your explanations are very clear.

  • @___vandanagupta___ · 1 year ago · +1

    The amount of knowledge in this video is tremendous!!! Extremely helpful 👍👍👍 Thank you, sir!!

  • @kritibindra4232 · 1 year ago · +1

    Wow, this was really, really helpful! Thank you for posting this.✨

  • @chaitanyawaikar382 · 1 year ago · +4

    One of the best videos explaining the nuances between partitioning and sharding. Thank you @ArpitBhayani

  • @nimitkanani1691 · 1 year ago

    Very beautifully and simply explained. The content of the video flowed so smoothly. Thank You @ArpitBhayani

  • @jithinb7047 · 9 months ago

    Awesome content, Arpit! Thanks a lot, and please continue posting more on concepts like these, as well as analyses of real use cases.

  • @timamet · 1 year ago · +1

    amazing explanations, thank you

  • @neerajdixit7102 · 1 year ago

    Awesome, Arpit. Thanks, I truly admire your way of teaching.

  • @mohitkumartoshniwal · 1 year ago · +1

    A very clear and detailed explanation. ♥️

  • @AqibJavaid-zl7vc · 1 month ago

    Excellent video ❤. Finally, I got a good grasp of the whole concept.

  • @Jamsessions0 · 19 days ago

    One of the best explanations on the internet, well done sir

  • @zeyuli53 · 1 year ago · +1

    well explained, thank you

  • @vamsidharvemuluri3817 · 1 month ago

    Best explanation so far. thanks brother

  • @hanzalasiddique6313 · 1 year ago · +1

    Mind Blowing ❤

  • @sameer1571 · 5 months ago

    Bro, your diagram example made my day. Such a clear and concise explanation of this topic. Bro, love you from the heart ❤❤ for making this video.

  • @kalinduabeysinghe8917 · 10 months ago

    Such a clean explanation🙌

  • @varshard0 · 4 months ago

    Thank you. I always assumed that they were the same thing. This cleared things up for me.

  • @Sharmasurajlive · 1 year ago

    Simple and efficient explanation 👍🏻

  • @lazry1773 · 1 year ago

    Dude this was amazing

  • @aneksingh4496 · 8 months ago

    super video Arpit

  • @KishoreThatavarthi · 4 months ago

    Thanks a lot, Arpit sir. Really enjoyed it and got full clarity.

  • @PoojaDurgi · 7 months ago

    Amazing !!

  • @jasper5016 · 3 months ago

    Thanks so much Arpit!!

  • @vijaymunavalli335 · 1 year ago · +1

    It's a very practical explanation... cool one.

  • @iMakeYoutubeConfused · 2 months ago

    Very clear explanation, thanks!

  • @ryan-bo2xi · 11 months ago

    Very good, bro... outstanding.

  • @DEEPAKKUMAR-wk5pk · 1 year ago · +1

    Wow great explanation

  • @anandahs6078 · 1 month ago

    Very good explanation with the right examples. Hats off to you. Thanks for the great content. I always thought shards and partitions were the same, but you clarified it very well.

  • @heykalyan · 1 year ago

    Kudos to you❤

  • @kaal_bhairav_23 · 1 month ago

    thanks a lot arpit for an awesome explanation as always

  • @KriszSch · 2 months ago

    Great explanation!

  • @letsexplorewithanika2642 · 1 year ago · +1

    Very clear explanation

  • @shintojoseph9166 · 1 year ago · +1

    Clear explanation

  • @prashantkamble898 · 10 months ago

    Greatly explained

  • @akshayrahangdale8511 · 6 months ago

    Very Nice Video, I just loved the explanation.

  • @TechSpot56 · 2 months ago

    Nice explanation, Arpit.

  • @pramodpatil-ue8sm · 7 months ago

    Great explanation, as always. Please post a link if you have recorded any video on partitioning strategies.

  • @jivanmainali1742 · 2 years ago

    Arpit sir, I need your help clarifying a few doubts.
    In an e-commerce platform like Shopify, each merchant is given their own collections for orders, carts, and accounts, differentiated by a merchant identifier (projectId-order), vs. a single shared order table indexed by the merchant identifier, i.e. projectId. So we can't apply sharding in the first case.
    Also, is it a wise idea to deploy each merchant's application separately, given that we would have to maintain each merchant app separately? What do you suggest in such cases?

  • @nikhilrajput8696 · 1 month ago

    Wow... really nice. Nowadays a lot of people are selling and talking about system design, and they always try to jump straight to some optimistic solution without going into the internals; in fact, they have not even worked on many systems. I strongly feel your way of explaining is very, very nice, and I am going to buy your system design plan to improve mine.

    • @AsliEngineering · 1 month ago

      Thanks. Looking forward to having you enrolled 🙌

  • @dhaanaanjaay · 1 year ago

    One question: at 21:00 the matrix shows what it looks like when we have both sharding and partitioning. How is that different from having two databases on two different EC2 instances for two applications?

  • @shreyanshsinha37 · 1 year ago · +1

    When we say Shard1 or Shard2, do we mean the SQL server hosted on the EC2 instance, taken together, as a shard?

  • @amananurag07 · 23 days ago

    @arpit Thanks for such dense information in such a short and simple video.
    However, I have a question on a corner case:
    - How can we have replicas when we have multiple shards with partitioning?
    - In this case, is replication local to the shard, or can data also be replicated to other shards for high availability across availability zones or for DR (like the Kafka architecture)?

  • @anshujaiswal5622 · 19 days ago

    Simple and to the point explanation .. Thanks Arpit, Liked & Subscribed :)

  • @codecspy3479 · 6 months ago · +1

    Two important points that I felt could be discussed more: 1) When you said the choice of partitioning depends on the load, use case, and access patterns, can you please give an example of each case? 2) When you were talking about the advantages and disadvantages of sharding, did you write these points considering only sharding with no partitioning, or considering both sharding and partitioning?

  • @pranjalchoudhury1670 · 3 months ago

    Nicely explained. :)

  • @ranjithpals · 1 year ago · +2

    Thanks!

  • @sarthaknarayan2159 · 1 year ago

    Awesome!!!!

  • @vikasbhutra9400 · 2 years ago · +1

    Thanks a lot, Arpit, for explaining it in such a simple way. One request: can you please make a video on sharding strategies, and also on how composite indexes are stored on disk?

    • @AsliEngineering · 2 years ago

      Soon.

    • @hc90919 · 1 year ago

      @asli engineering - Bro, any update on the sharding strategies?
      Also, one more request: examples of scenarios that explain shard key selection.
      And how is the data replicated behind the scenes, and so on, please?

  • @ankitmaheshwari2341 · 11 months ago

    Do we use sharding when we have better options available, like Oracle RAC, where the database can be scaled horizontally?

  • @aditiagarwal7081 · 14 days ago

    When running two databases on the same machine, are we not still sharing the same underlying resources such as CPU, memory, and disk I/O?

  • @user-dq8sg4ik5k · 10 months ago

    Literally one of the best videos I have ever seen on this topic.

  • @hemsagarpatel8992 · 1 year ago

    If we had horizontal partitioning and one partition was getting a lot of traffic in real time, how could we load-balance that traffic? Is it possible?

  • @sumeetsingh1729 · 2 months ago

    How is it decided which shard a request hits? Is there a router in front that handles the routing of requests?

  • @tawseefbhat977 · 1 year ago

    How do we know which partition or shard our data is located in when we make a query? Any detailed explanation?

  • @pixiedustdreams · 26 days ago

    I think I'm in love with this guy. 😢

  • @rahulpanjwani1887 · 1 year ago · +1

    Beautiful

    • @rahulpanjwani1887 · 1 year ago · +1

      It makes you understand the value of a unified data platform team when scale increases.

  • @geekmuralin · 8 months ago

    Wow

  • @aditigupta6870 · 3 months ago

    Hello Arpit, at 5:49, why did you mention that the new resources are being allocated to the EC2 machine? I think they should be allocated to the DB server running on the EC2 machine, right?

    • @AsliEngineering · 3 months ago

      I meant the server running the database. The database is eventually running on some VM.

    • @aditigupta6870 · 3 months ago

      @@AsliEngineering thanks arpit

  • @ohmygosh6176 · 1 year ago

    Cross-shard queries are very, very expensive. It's best to use tools to find out how the database is being used before making these decisions. I use the PG Analizer tool for PostgreSQL.
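Why cross-shard queries cost more can be seen in a few lines. This minimal sketch assumes hypothetical in-memory shards that hold order rows routed by `user_id % 3`. A lookup that includes the shard key touches one shard. An aggregate without it has to fan out to every shard and merge the partial results. In a real deployment, each extra touch is a network round trip to another server.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """Hypothetical in-memory shard holding order rows as dicts."""
    name: str
    rows: list = field(default_factory=list)
    queries_served: int = 0  # how many queries have touched this shard

    def scan(self, predicate):
        self.queries_served += 1
        return [row for row in self.rows if predicate(row)]


SHARDS = [Shard("shard-0"), Shard("shard-1"), Shard("shard-2")]


def shard_for(user_id: int) -> Shard:
    # Modulo routing on the shard key (user_id); an assumption for illustration.
    return SHARDS[user_id % len(SHARDS)]


def insert(order: dict) -> None:
    shard_for(order["user_id"]).rows.append(order)


def orders_for_user(user_id: int) -> list:
    # Single-shard query: the shard key says exactly where the rows live.
    return shard_for(user_id).scan(lambda row: row["user_id"] == user_id)


def total_revenue() -> int:
    # Cross-shard query: no shard key, so scatter to every shard, then gather.
    return sum(row["amount"] for shard in SHARDS for row in shard.scan(lambda _row: True))


for order_id, (user_id, amount) in enumerate([(1, 100), (2, 250), (3, 75), (4, 40)]):
    insert({"order_id": order_id, "user_id": user_id, "amount": amount})

print(orders_for_user(2))                  # served by shard-2 alone
print(total_revenue())                     # 465, but every shard was queried
print([s.queries_served for s in SHARDS])  # [1, 1, 2]
```

Joins across shards are worse than this sum, because rows from different servers have to be shipped somewhere to be matched.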

  • @kritibindra4232 · 1 year ago

    Also which software did you use in this video to create pictures and write content?

  • @GaneshSrivatsavaGottipati · 1 month ago

    what if we have read replicas and still have partitioning?

  • @abhigujjar7439 · 10 months ago

    Can you please share the notes

  • @imperfecto7734 · 9 months ago · +1

    @arpit What's the benefit of partitioning the data but not sharding it? Can you give me a use case, please?

    • @AsliEngineering · 9 months ago · +2

      Partitioning allows your database to read/access/move the required subset of data easily and efficiently.
      1. Imagine you partition data by time, creating one partition for every hour, and someone queries how many events happened in the last 10 hours. You would only need to access the last 10 partitions to fulfil this query; the others do not even need to be read.
      2. In a distributed setup, instead of moving individual rows/elements, we can easily and efficiently move whole partitions across the cluster to balance the load.
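The first point above is usually called partition pruning. This minimal sketch assumes hypothetical hourly partitions kept in a dict, keyed by the hour each event falls in. A "last 10 hours" count covers the current hour plus the nine before it. It opens only those 10 partitions, no matter how many older ones exist.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical hourly partitions: key = event time truncated to the hour.
partitions: dict = defaultdict(list)


def partition_key(ts: datetime) -> datetime:
    return ts.replace(minute=0, second=0, microsecond=0)


def insert_event(ts: datetime, name: str) -> None:
    partitions[partition_key(ts)].append({"ts": ts, "name": name})


def count_last_n_hours(now: datetime, n: int) -> tuple:
    """Return (event count, partitions read) for the current hour and the n - 1 before it."""
    current = partition_key(now)
    wanted = [current - timedelta(hours=h) for h in range(n)]
    # Pruning: only partitions inside the window are opened; `in` does not create keys.
    read = [partitions[key] for key in wanted if key in partitions]
    return sum(len(p) for p in read), len(read)


start = datetime(2024, 5, 15)
for hour in range(48):               # two days of data, three events per hour
    for minute in (5, 25, 45):
        insert_event(start + timedelta(hours=hour, minutes=minute), "click")

now = start + timedelta(hours=47, minutes=50)
print(len(partitions))               # 48 partitions stored
print(count_last_n_hours(now, 10))   # (30, 10): only 10 of them were read
```

The second point works at the same granularity. A whole hourly list can be handed to another node in one operation instead of row by row.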

    • @imperfecto7734 · 9 months ago

      Understood! Thanks 🙏

  • @sachinjindal4921 · 2 years ago · +1

    Awesome, can you give some practical examples.

    • @AsliEngineering · 2 years ago · +1

      These are as practical as they can get while keeping it generic and not touching upon the SRE side of things :) Every database comes with its own partitioning and sharding strategy, and we need to go through its documentation to apply it.
      I talked about using a database proxy to bifurcate the requests in one of the earlier videos, in case you are looking for that.
      I would recommend picking a database and seeing how you can actually create shards and manage them. Elasticsearch can be a great start.

  • @aditigupta6870 · 3 months ago

    One shard also must be having replicas right? I mean if a shard is handling the first 2 partitions, then all data from those first 2 partitions will go to this shard, but what if the shard is down?

    • @AsliEngineering · 3 months ago

      Shards can have replicas to scale the reads. If a shard goes down, then either you auto-promote a replica to take over, or you take the downtime.
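The "auto-promote a replica" option is sketched below. It assumes a hypothetical shard made of one primary and a list of replicas, each with a health flag. The sketch deliberately ignores replication lag. With asynchronous replication, writes the old primary had not yet shipped can be lost on promotion, and real failover tooling has to deal with that.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    healthy: bool = True


@dataclass
class ShardGroup:
    """Hypothetical shard: one primary plus replicas that hold the same data."""
    primary: Node
    replicas: list = field(default_factory=list)

    def writer(self) -> Node:
        """Return the node that should take writes, promoting a replica if needed.

        Raises RuntimeError when no healthy node is left (the "take the downtime" case).
        """
        if self.primary.healthy:
            return self.primary
        for i, replica in enumerate(self.replicas):
            if replica.healthy:
                # Promote: the replica leaves the replica list and becomes primary.
                self.primary = self.replicas.pop(i)
                return self.primary
        raise RuntimeError("no healthy node left: this shard is unavailable")


shard = ShardGroup(Node("db-a-primary"), [Node("db-a-replica-1"), Node("db-a-replica-2")])
shard.primary.healthy = False            # simulate the primary crashing
print(shard.writer().name)               # db-a-replica-1
print([r.name for r in shard.replicas])  # ['db-a-replica-2']
```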

  • @Bluesky-rn1mc · 2 years ago · +1

    How are foreign key constraints managed when two tables are in different shards?

    • @AsliEngineering · 2 years ago · +6

      Foreign keys are dropped when you adopt sharding. You cannot maintain FK when data is partitioned across multiple shards.

    • @Bluesky-rn1mc · 2 years ago

      @@AsliEngineering thanks

  • @arbazadam3407 · 1 year ago

    When you say we can have these partitions on the same server, that confuses me. On my Linux server I installed MySQL, which runs on port 3306. I have one MySQL process in this situation, so how can I spread the partitions on this server?

    • @AsliEngineering · 1 year ago

      multiple databases within same MySQL server.

  • @shrad6611 · 6 months ago

    finally I understand what sharding is, thanks a ton

  • @dbads · 2 years ago · +1

    💯

  • @gigachad400 · 1 year ago · +1

    One of the biggest disadvantages of sharding a SQL database is that you lose ACID guarantees, so you have to be careful when doing it with SQL databases.

  • @sachthecool · 1 year ago

    Hi Arpit... You have nice videos. I like the interviews with people involved in growing high-scale systems.
    However, in this video the concept explained is wrong. Partitions and shards are the same (the terms are used interchangeably). What you are referring to as a shard is a node (or host container). You may want to correct this. Hope this helps.

    • @AsliEngineering · 1 year ago

      I agree the terms are used interchangeably, but overall what I explained is correct, and I clarified this in the video as well.

  • @jineshbagrecha6278 · 1 year ago

    When should we use master-master vs. master-candidate master replication?

    • @AsliEngineering · 1 year ago

      master master - scaling writes beyond one machine
      master replica - scaling reads
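The master-replica half of this answer can be sketched as a small read/write-splitting router. Writes go to the primary, and reads rotate over the replicas. The node names and the "starts with SELECT" test are simplifying assumptions, since a real proxy parses statements properly. Reads served by a replica may also be slightly stale because of replication lag.

```python
import itertools


class ReadWriteRouter:
    """Hypothetical router: writes go to the primary, reads rotate over replicas."""

    def __init__(self, primary: str, replicas: list):
        self.primary = primary
        # itertools.cycle on an empty list would stop immediately, so keep None instead.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql: str) -> str:
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None:
            return next(self._replicas)  # round-robin across read replicas
        return self.primary              # writes, and reads when no replica exists


router = ReadWriteRouter("primary-1", ["replica-1", "replica-2"])
for query in ["SELECT * FROM orders", "INSERT INTO orders VALUES (1)", "select 1", "SELECT 2"]:
    print(query, "->", router.route(query))
```

Scaling writes with master-master is harder to sketch honestly, because two primaries accepting writes to the same rows need a conflict-resolution rule.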

  • @GaganJain2508 · 10 months ago

    Does it mean Sharding and replication are the same? 22:16

  • @user-nu5nn7by6t · 3 hours ago

    How do we know which shard our data resides in?

    • @AsliEngineering · 1 hour ago

      That depends on your routing strategy - Range/Hash/Static. In any case, you pick a partitioning key and depending on the approach you deduce which shard to go to.
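The three routing strategies named here can be sketched side by side. This minimal sketch assumes three hypothetical shards, made-up range boundaries, and a made-up tenant directory. In each case, the partitioning key alone decides the shard.

```python
import bisect
import hashlib

SHARDS = ["shard-0", "shard-1", "shard-2"]

# Range: each shard owns a contiguous key range. Hypothetical boundaries:
# user_id < 1M -> shard-0, < 2M -> shard-1, everything else -> shard-2.
RANGE_UPPER_BOUNDS = [1_000_000, 2_000_000]


def route_by_range(user_id: int) -> str:
    return SHARDS[bisect.bisect_right(RANGE_UPPER_BOUNDS, user_id)]


def route_by_hash(key: str) -> str:
    # Use a stable hash: Python's built-in hash() of a str changes between processes.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


# Static: an explicit lookup table, e.g. one entry per tenant (entries are hypothetical).
DIRECTORY = {"merchant-42": "shard-2", "merchant-7": "shard-0"}


def route_static(tenant_id: str) -> str:
    return DIRECTORY[tenant_id]


print(route_by_range(1_500_000))    # shard-1
print(route_by_hash("user:123"))    # the same shard every time for this key
print(route_static("merchant-42"))  # shard-2
```

Each strategy has a trade-off:

- Range routing keeps neighboring keys together, which helps range scans but can create hot shards.
- Modulo hashing spreads keys evenly, but changing the shard count remaps most keys, which is why consistent hashing is often used instead.
- A static directory gives full control at the cost of maintaining the table.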

  • @aadimanchekar1032 · 1 year ago

    How do we know which partition the data lies in?

  • @pranavnadimpalli4929 · 1 year ago

    22:34 Cross-shard queries are expensive.

  • @iHariPatel · 7 months ago

    In my view, partitioning is more complex because you have to work with the partition key! With the wrong query, you can accidentally scan all partitions.

  • @mudassarh4268 · 2 years ago

    Sharding strategies could have been covered too, like range-based and hash-based sharding, with their use cases.

    • @AsliEngineering · 2 years ago · +1

      Sir, the video would have been too long; no one would have watched it. But I am definitely planning it for the next one.

    • @mudassarh4268 · 2 years ago

      Definitely, sirji, that could have added another 30 minutes of content. Awesome content as always, and looking forward to further stuff 👍

  • @ManojYadav-ls6wo · 1 month ago

    12:10
    20:12 👍👍

  • @anupkut · 4 months ago

    I think we should not consider read replicas alone as a sharding concept.

  • @abhishekdhillon7110 · 6 months ago · +1

    Dude, the way you have explained higher availability as an advantage of sharding is not right. When you have a sharded DB and the shards live on different servers, if one of the shards goes down, availability is not an advantage, since you can't perform any operations on that specific shard. For example, if you have two shards named A and B and shard A is down, you can't read anything from it, so all queries that are expected to read from shard A would fail unless you have a read replica of that shard. I feel there is a better way to explain it. However, thanks for all your efforts; your content is helpful to a large extent.

    • @AsliEngineering · 6 months ago

      Yes we cannot perform operation on that shard but we can still serve requests that can be served from the other shards. Hence the system still remains partially available.
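The partial-availability point is sketched below with two hypothetical shards that split users by odd and even `user_id`. With shard A down, requests for users on shard A fail, while requests for users on shard B are still served.

```python
class ShardUnavailable(Exception):
    """Raised when the shard that owns a key is down."""


# Hypothetical two-shard cluster: odd user_ids live on A, even ones on B.
shards = {
    "A": {"up": True, "users": {1: "alice", 3: "carol"}},
    "B": {"up": True, "users": {2: "bob", 4: "dave"}},
}


def shard_of(user_id: int) -> str:
    return "A" if user_id % 2 == 1 else "B"


def get_user(user_id: int) -> str:
    name = shard_of(user_id)
    if not shards[name]["up"]:
        raise ShardUnavailable(f"shard {name} is down")
    return shards[name]["users"][user_id]


shards["A"]["up"] = False  # simulate shard A going down
for user_id in (1, 2, 3, 4):
    try:
        print(user_id, get_user(user_id))
    except ShardUnavailable as err:
        print(user_id, "failed:", err)
```

Users 2 and 4 are still served. Only shard A's users see errors until it recovers or a replica takes over.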

  • @kumarshubham4640 · 2 months ago

    Why did the course price go up by 20k in one year?

    • @AsliEngineering · 2 months ago · +1

      In 2 years, not one.
      The course has changed completely: I go much more in-depth, and the sessions run for 4 hours each. Earlier they used to be 2.5 hours.

  • @akshatreddy9870 · 2 months ago

    Hi

  • @arun10071990 · 5 months ago

    I think sharding has specific use cases; not every solution requires sharding. The way he arrives at the sharding solution is totally absurd.
    If one really wants to scale the writes, he can also upscale the master DB servers. Why shard then?

    • @AsliEngineering · 5 months ago

      When did I not consider vertical scaling?

    • @arun10071990 · 5 months ago · +1

      @@AsliEngineering It's not about vertical scaling; it's that we can scale the database horizontally, and that too without sharding,
      like multiple master servers for writes and multiple slave servers to handle reads.

  • @sharoonaustin551 · 1 year ago

    Small suggestion: don't put ads in the middle, bro; it breaks my concentration.

    • @AsliEngineering · 1 year ago · +1

      YouTube puts them in. I just enable them. It is up to their algorithm to decide where to place them.

    • @AsliEngineering · 1 year ago · +2

      And I totally understand your frustration with ads but the world runs on them. Can't do much without it.

  • @luisdanielmesa · 8 months ago

    We both worked for Amazon and you know nobody there would have taken this course... So you're either lying or... nah, you're lying.

    • @AsliEngineering · 8 months ago

      15 SDE-2s, 3 SDE-3s, 1 PE, and 1 HoE took my course. If you do not believe it, that is up to you.

    • @AsliEngineering · 8 months ago

      Fun fact, after I replied to your comment I went on a 1:1 call and it was with an SDE-2 at Amazon working in CCF org :D

  • @jose000 · 2 years ago

    Iio

  • @akshatreddy9870 · 2 months ago

    Very bad. Hindus never shave off the moustache and keep the beard. Do you intend to become a Muslim? Please understand that you are Sanatani.

  • @akshatreddy9870 · 2 months ago

    Either shave both beard and moustache or keep both moustache and beard. Don't just shave moustache only and keep beard.

    • @iMakeYoutubeConfused · 2 months ago

      He's put so much effort into the content of this video, and this is all you've got to say?

  • @eatajerkpal99 · 2 months ago

    Hey Arpit, can you drop a link to the notes that you presented in this video? Thanks!

    • @eatajerkpal99 · 2 months ago · +1

      Found them on your GitHub; I won't spam anymore. Thanks!!

  • @amogu_07 · 2 months ago

    thank you so much , clearly understood!!