Key based Sharding | Shard Key | Hash function | Advantages and disadvantages | 2021

  • Published on 27 Jul 2024
  • This is the twenty-second video in the System Design Primer Course series. It covers one more important component of system design: key-based sharding.
    We want software engineers and aspiring software engineers to build strong fundamentals, get ready for interviews, and excel as software engineers.
    More on the video:
    www.citusdata.com/blog/2018/0...
    blog.yugabyte.com/four-data-s...
    ------------------------------------------------------------------
    Recommendations
    ------------------------------------------------------------------
    Our full courses on YouTube:
    ✒ System Design Primer Course: • System Design Primer C...
    ✒ REST APIs made easy: • REST APIs MADE EASY
    Some paid courses that we recommend:
    ✒Educative.io: bit.ly/3qnW5ku
    ✒Interviewready.io: get.interviewready.io/ (Use coupon code SUDOCODE for extra discount)
    ------------------------------------------------------------------
    About Us
    ------------------------------------------------------------------
    Created and Instructed by:
    Yogita Sharma
    ✒ LinkedIn - / yogita-sharma-83400b55
    ✒ Instagram - / sudo.code1
    ✒ Facebook - / sudo.code
    ✒ Medium - / yogita088
    Post-production (editing, thumbnails, etc.) managed by:
    CiKi
    ✒ Website: www.ciki.co.in
    ✒ LinkedIn: / 74735937
    Colors and design by:
    Naini Todi
    ✒ LinkedIn - / nainitodi
    Both Arpit and Yogita are software engineers who want to help other software engineers get better by providing high-quality, well-researched content with their own creative teaching twist.
    ------------------------------------------------------------------
    Join Us
    ------------------------------------------------------------------
    Hang out with sudoCode:
    ✒Discord Server: / discord
    For business:
    ✒Email: sudocode.yogita@gmail.com
    ------------------------------------------------------------------------------------------------------------------------------------
    Timestamps:
    0:00 - Intro
    0:27 - What is key-based sharding?
    1:35 - Shard key and hash function
    3:44 - Advantages of key-based sharding
    4:37 - Disadvantages of key-based sharding
    5:44 - How to use a sharding key?
    7:20 - Outro
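A minimal sketch of the idea covered at 0:27 and 1:35, assuming a hypothetical cluster of 4 shards, in-memory dicts standing in for the databases, and MD5 (via Python's `hashlib`) as the hash function; the video does not prescribe a particular hash. A stable hash is used because Python's built-in `hash()` on strings is randomized per process, so it would route the same key differently after a restart.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count

def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a shard key to a shard id with a stable hash (MD5 chosen only for illustration)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then take it modulo the shard count.
    return int.from_bytes(digest[:8], "big") % num_shards

# Each dict stands in for one database shard: {user_id: row}.
shards = {i: {} for i in range(NUM_SHARDS)}

def write(user_id: str, row: dict) -> None:
    # The shard key (user_id) alone decides which shard stores the row.
    shards[shard_for(user_id)][user_id] = row

def read(user_id: str):
    # Only the shard that owns the key is consulted; returns None if the key is absent.
    return shards[shard_for(user_id)].get(user_id)

write("user-42", {"name": "Asha", "city": "Pune"})
print(shard_for("user-42"), read("user-42"))
```

Because `read` recomputes the same shard id as `write`, no lookup table is needed. That is also why the hash function and the shard count must stay fixed once data has been written.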

Comments • 56

  • @roshedulalamraju7936 · 1 year ago

    I've watched a lot of videos, but yours is the best. Love you. Thanks.

  • @akshayshah8264 · 2 years ago

    A concise, to-the-point video. Thank you

  • @TheANKUSHVIDEOYOUTUB · 3 years ago

    Good to have such an informative video with clear explanations from an Indian creator.
    Keep it up, and best wishes for your journey 🙏

  • @sreenivasnaidu6904 · 2 years ago

    Simple and very informative.

  • @vijaysharma5679 · 2 years ago · +1

    Hi Yogita, your videos are very helpful for understanding system design concepts thoroughly. Can you please make some videos on Cassandra and HBase too? More specifically on data modeling, using some existing system design examples of where we can use MySQL, Cassandra, and HBase.
    Thanks for your wonderful contribution to our preparation.

  • @HamidAli-gy4um · 2 years ago

    It is very informative. Thank you

  • @basawarajshivashetty5232 · 3 years ago

    It's a great source for learning system design. Ma'am, can you please make videos on system design interview questions?

  • @neha6000 · 1 year ago

    Thanks for the nice explanation

  • @thecloudbaba8668 · 2 months ago

    Wonderful @sudoCode. Choosing the right sharding key (or combination of keys) is crucial and calls for careful, deliberate decisions... kudos to you!

  • @vazzdoin · 1 year ago

    Good work.

  • @hi2ritesh · 3 years ago · +1

    Very informative, Yogita, thank you for making this video. One question: what happens if we have to move multiple tables onto different physical shards? In that case, fetching the data requires contacting multiple shards, right?

  • @subee128 · months ago

    Thank you very much

  • @kunalsoni7681 · 3 years ago

    Wow, it's such an informative video 😊

    • @sudocode · 3 years ago · +1

      Thanks a lot 😊

  • @SakshiSingh-arcane05 · 1 year ago

    Very informative, Yogita, thanks a lot!
    I have a question: say Tinder uses location-based sharding. If I go to another city, I would have to be added to that DB immediately. If we consider many such movements, how would Tinder handle these dynamic changes?

  • @ekoprasetyooo · 1 year ago

    Thanks for the video. I have a question: what is the difference from a partition key? They look like they serve the same function after all. Thanks

  • @TheDiscreet · 2 years ago

    Very good introduction to sharding. How is replication taken care of with sharded DB servers? How is the setup made fault tolerant?

  • @eros_1234 · 3 years ago · +1

    Nice lecture, Ma'am 😇😇
    Keep it up ✌✌

    • @sudocode · 3 years ago · +1

      Thanks a lot

  • @deepakjiitn · 4 months ago

    One disadvantage is that with this kind of sharding, range queries can't be served efficiently. For example: list the top 10 employees by salary.
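Strictly speaking, such a query is still possible, but it cannot be sent to a single shard. A common approach, sketched here with made-up data and shards modeled as Python lists, is to ask every shard for its local top k and then merge the partial results. This is enough because any row in the global top k must also be in its own shard's top k.

```python
import heapq

# Hypothetical employee rows (id, salary), already spread over three shards by a hashed id.
shards = [
    [("e1", 120), ("e4", 95), ("e7", 60)],
    [("e2", 150), ("e5", 80)],
    [("e3", 110), ("e6", 130), ("e8", 70)],
]

def top_k_by_salary(shards, k):
    # Scatter: every shard returns only its own k highest-paid rows.
    partials = [heapq.nlargest(k, rows, key=lambda r: r[1]) for rows in shards]
    # Gather: merge the small partial lists and keep the global top k.
    merged = (row for part in partials for row in part)
    return heapq.nlargest(k, merged, key=lambda r: r[1])

print(top_k_by_salary(shards, 3))  # [('e2', 150), ('e6', 130), ('e1', 120)]
```

The cost is that every shard is contacted for every such query, which is the inefficiency the comment points at.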

  • @sundaramjha1776 · 3 years ago

    If you can, please create a video on how to do MySQL sharding from scratch, with a practical example. I appreciate your effort. Great video content. Keep it up 👍

    • @sudocode · 3 years ago · +1

      Noted. It is in the pipeline.

  • @fadeddota4956 · months ago

    Rather than moving data to shard three, can't we modify the hash function so that every write operation is stored in shard three?
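The hash function decides both where a write goes and where a later read looks, so changing it (or the shard count it uses) changes the answer for keys that are already stored. The sketch below assumes hypothetical MD5-modulo routing as a plain illustration and measures how many keys land on a different shard when a third shard is added.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    # Stable hash (MD5, for illustration only) reduced modulo the shard count.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

keys = [f"user-{i}" for i in range(10_000)]
# Count keys whose shard differs between a 2-shard and a 3-shard cluster.
moved = sum(1 for k in keys if shard_for(k, 2) != shard_for(k, 3))
print(f"{moved / len(keys):.0%} of keys change shard when going from 2 to 3 shards")
```

With plain modulo, roughly two thirds of the keys move when going from 2 to 3 shards. Routing only new writes to shard three would not avoid this: reads would then need to know which rule applied to each key, and shard three would take all new write traffic.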

  • @ravikumar-yq5df · 3 years ago

    Please also consider implementing these concepts.

  • @babitanewbaby · 3 years ago

    Is `_id: hashed` the right shard key? If not, please explain why.

  • @TheDiscreet · 2 years ago

    I'm confused about where a load balancer for DB servers comes into the picture if the DB is sharded. I can't visualize load balancer + sharded DB servers. DB sharding is done to distribute data, so we would have different DB servers, but then a load balancer cannot send a request to just any of the sharded DB server instances based on load, right? Kindly shed some light on this.
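One way to picture the two pieces together, sketched under assumptions not taken from the video: 3 hypothetical shards, each with one primary and two read replicas, MD5-modulo routing, and round-robin reads. The shard key picks the shard, and load balancing only happens among the copies of that one shard.

```python
import hashlib
import itertools

def shard_for(key: str, num_shards: int) -> int:
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# Hypothetical topology: 3 shards, each a primary plus two read replicas.
TOPOLOGY = {
    s: {"primary": f"shard{s}-primary", "replicas": [f"shard{s}-replica{r}" for r in (1, 2)]}
    for s in range(3)
}
# One round-robin cursor per shard, used to spread reads across that shard's replicas.
_cursors = {s: itertools.cycle(nodes["replicas"]) for s, nodes in TOPOLOGY.items()}

def route(key: str, is_write: bool) -> str:
    shard = shard_for(key, len(TOPOLOGY))   # step 1: the key decides the shard
    if is_write:
        return TOPOLOGY[shard]["primary"]   # writes go to that shard's primary
    return next(_cursors[shard])            # step 2: reads are balanced among its replicas

print(route("user-42", is_write=True))
print(route("user-42", is_write=False), route("user-42", is_write=False))
```

A generic load balancer cannot pick any DB server at random because only one shard holds a given key; in this sketch the balancing choice is made only after the shard is fixed.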

  • @sunilkj8281 · 2 years ago

    Hi Yogita, thanks for the video, very nicely explained. One doubt:
    since the records are distributed across shards based on the sharding key (say, city), how will each shard have an equal number of records?

    • @answerme4147 · 2 years ago · +2

      With city name as the sharding key, there is no guarantee of an equal number of records in each shard.
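The skew is easy to reproduce. Hashing spreads distinct key values across shards, not rows, so if one city holds most users, all of those rows land on the same shard. This sketch uses a made-up population (70% Mumbai, 20% Pune, 10% Goa), 4 hypothetical shards, and MD5-modulo routing, and compares city against a high-cardinality key such as the user id.

```python
import hashlib
from collections import Counter

def shard_for(key: str, num_shards: int = 4) -> int:
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# Hypothetical user base: 70% Mumbai, 20% Pune, 10% Goa.
users = [
    (f"user-{i}", "Mumbai" if i % 10 < 7 else "Pune" if i % 10 < 9 else "Goa")
    for i in range(10_000)
]

# Rows per shard under each choice of shard key.
by_city = Counter(shard_for(city) for _, city in users)
by_user = Counter(shard_for(user_id) for user_id, _ in users)
print("shard key = city:   ", dict(by_city))
print("shard key = user_id:", dict(by_user))
```

With `city` as the key, at most three shards receive any data and one of them holds at least 70% of the rows, while `user_id` spreads the rows roughly evenly.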

  • @rupeshjha4717 · 3 years ago

    Short, clear, and easy to understand!!
    Ma'am, when we use the city as the shard key, let's say a user is in Berlin on 20 Feb 2021 and on 24 Feb 2021 moves to Bangalore. How would that situation be handled? Earlier the write operations went to the shard key = Hash(Berlin), but now the key changes to key = Hash(Bangalore). So if that user sends a read or write operation, the data is not in the shard with key = Hash(Bangalore)?

    • @sudocode · 3 years ago · +1

      Exactly, that's why in key-based sharding we choose keys that are static and don't change frequently. If a user's city changes frequently, then we don't choose it as the sharding key.

  • @AshishShukla-ro8oy · 3 years ago

    Can it happen that our data is not equally distributed among the shards and there is a large load on one of them? Is there a way to overcome this? Sorry if my question is silly.

  • @swaroopas5207 · 2 years ago

    Nice video. Suppose we have 2 tables in the database. Should both tables be sharded if we go for sharding? If so, how do we derive the sharding key when there are multiple tables?

    • @sudocode · 2 years ago

      It depends on the table sizes. Each table can have its own sharding key, depending on the use case.

  • @ChandraShekhar-by3cd · 3 years ago · +1

    Also please cover topics such as:
    - Consistent Hashing
    - Quad Tree
    - WebSocket, HTTP and LONG POLLING over HTTPS
    - REDIS CACHE
    - CASSANDRA
    Thanks

    • @sudocode · 3 years ago · +1

      All of them are in the pipeline!

    • @brajkishoredubey9319 · 1 year ago

      @sudocode Any update? I am specifically looking for consistent hashing.

  • @shrikantkhadilkar4019 · 2 years ago

    Quick question: for systems like Netflix, if we choose a sharding key based on the city and the user travels to a different city, how does the system handle this? What strategies could we use?

    • @sudocode · 2 years ago

      Mostly the key won't be chosen by city. The key is chosen so that data can be distributed equally. In the case of Netflix, data could be divided on the basis of categories, origin country, or even alphabetically or by creation date ranges.

  • @streammxc · 3 years ago

    Nice video! One question: how do we query the sharded database? Will the query run on each of the shards?

    • @sudocode · 3 years ago

      Nope, the algorithmic sharding logic takes care of routing the query. This is one of the advantages of sharding: read and write performance improves a lot because these queries run only on the relevant shard, not on the whole dataset.

    • @streammxc · 3 years ago

      @sudocode If the shard key is the user's first name but the query selects all users with the same last name, for example Smith, then it has to query all the shards, is that correct?

    • @kunal_tanti · 3 years ago · +1

      @streammxc Yes, in that case you have to send the query to all the shards and aggregate the results. That is called scatter-gather.
      And this is one bottleneck of key-based sharding: any read query that does not use the shard key will not be efficient.
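A small sketch of the difference, assuming 3 hypothetical in-memory shards, MD5-modulo routing, and the first name as the shard key: a lookup by first name touches one shard, while a lookup by last name has to scatter to every shard and gather the matches.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

NUM_SHARDS = 3
shards = [[] for _ in range(NUM_SHARDS)]  # each list stands in for one shard
people = [("Alice", "Smith"), ("Bob", "Jones"), ("Carol", "Smith"), ("Dan", "Lee"), ("Eve", "Smith")]
for first, last in people:
    # Shard key = first name, as in the question above.
    shards[shard_for(first, NUM_SHARDS)].append({"first": first, "last": last})

def find_by_first_name(first):
    # Key-based lookup: exactly one shard is queried.
    return [r for r in shards[shard_for(first, NUM_SHARDS)] if r["first"] == first]

def find_by_last_name(last):
    # Non-key lookup: scatter the query to every shard, then gather the results.
    results = []
    for shard in shards:
        results.extend(r for r in shard if r["last"] == last)
    return results

print(find_by_first_name("Bob"))  # [{'first': 'Bob', 'last': 'Jones'}]
print(sorted(r["first"] for r in find_by_last_name("Smith")))  # ['Alice', 'Carol', 'Eve']
```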

  • @ChandraShekhar-by3cd · 3 years ago · +1

    Thanks a lot for such a detailed and informative video. Request you to upload some video on full system design on UBER, FACEBOOK or any other system so that we can apply all the concepts that we have learnt into one system. Thanks a lot for your time and hard work. Truly appreciate your dedication. Kind regards.

    • @sudocode · 3 years ago

      Yes, will upload. :)

  • @JPN-bx3yd · 1 year ago

    How do you handle changes in shard key? For instance, the record is stored in DB1. What will happen if the shard key changes for the record and say the hash function will return DB2? The hash function will return a different database and the record will not exist there. How do you handle this scenario?

    • @FoodpediaIndia · 1 year ago

      Usually you don't change the shard key very often. Let's say you have the shard key on the city column, and people are mapped to shards based on their city. A person living in New York is mapped to shard 1 and later moves to SFO. In this scenario, every read or write request passes through a partition-aware load balancer, which first hashes the shard key 🔑 (New York, which maps to shard 1), and the write then updates the city column to SFO. A CDC (Change Data Capture) process then detects the change and re-shards this record to shard 2 (SFO). Remember that shard 2 may be replicated for better throughput, so we need to make sure the entry is created in all replicas.
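The relocation step described above can be sketched as a delete from the shard picked by the old key followed by an insert into the shard picked by the new key. This is a simplified, hypothetical version: shards are in-memory dicts, routing is MD5 modulo 4, the caller passes the old city explicitly, and the move happens inline rather than through a CDC pipeline. A real system also has to make the two steps atomic or tolerate the gap between them.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

NUM_SHARDS = 4
shards = [{} for _ in range(NUM_SHARDS)]  # each dict is one shard: {user_id: row}

def insert(row):
    # Shard key = city (hypothetical choice, mirroring the comment above).
    shards[shard_for(row["city"], NUM_SHARDS)][row["user_id"]] = row

def change_city(user_id, old_city, new_city):
    """Re-shard one record whose shard key (city) changed; returns (old_shard, new_shard)."""
    old = shard_for(old_city, NUM_SHARDS)
    new = shard_for(new_city, NUM_SHARDS)
    row = shards[old].pop(user_id)   # remove from the shard chosen by the old key
    row["city"] = new_city
    shards[new][user_id] = row       # write to the shard chosen by the new key
    return old, new

insert({"user_id": "u1", "city": "New York"})
print(change_city("u1", "New York", "San Francisco"))
```

Without this move, a later lookup hashes the new city and searches a shard that does not hold the record, which is exactly the problem raised in the question.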

  • @priyankataneja7347 · 1 year ago

    In key-based sharding, do we first run the hash function on the userId and then add the data to a shard based on the result? In that case, if I have 5000 rows, do I need to run the hash function on 5000 userIds to decide which shard each row goes to?

  • @ajaypathak8107 · 3 years ago

    👍

  • @mohitagrawal1555 · 1 year ago

    Suppose the data is sharded with location as the key, and now I want to fetch a user's information by user id without knowing the location. How could I find out which shard the user with the given user id exists in?

    • @abhinavjaglan1782 · 7 months ago

      You would need to query all the shards, aggregate their data, and then filter it.

  • @babitanewbaby · 3 years ago

    I don't understand how to choose a shard key. Please briefly explain in the comments.

  • @mukeshbisht2411 · 2 years ago

    Apart from the good content, the nicely decorated headings are awesome :-)

    • @sudocode · 2 years ago

      Yeah thanks

  • @sumonmal009 · 3 years ago · +1

    shard key 1:15
    hash sharding is also known as algorithmic sharding 2:00
    disadvantage of hash sharding 4:50
    to solve the hash sharding problem, use consistent hashing
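A minimal consistent-hashing sketch for comparison, assuming MD5 as the hash, 100 virtual nodes per shard, and hypothetical shard names. Each key belongs to the first shard point found clockwise from the key's position on the ring, so adding a shard only takes over the arcs in front of its new points.

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")

class HashRing:
    """Minimal consistent-hash ring with virtual nodes (an illustrative sketch)."""

    def __init__(self, shards, vnodes=100):
        self.vnodes = vnodes
        self._points = []  # sorted hash positions on the ring
        self._owner = {}   # hash position -> shard name
        for shard in shards:
            self.add(shard)

    def add(self, shard):
        # Place `vnodes` points for this shard around the ring.
        for v in range(self.vnodes):
            point = _hash(f"{shard}#{v}")
            self._owner[point] = shard
            bisect.insort(self._points, point)

    def shard_for(self, key):
        # Walk clockwise to the first point at or after the key's hash, wrapping around.
        i = bisect.bisect_left(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[i]]

keys = [f"user-{i}" for i in range(10_000)]
ring = HashRing(["shard-1", "shard-2"])
before = {k: ring.shard_for(k) for k in keys}
ring.add("shard-3")
moved = sum(1 for k in keys if ring.shard_for(k) != before[k])
print(f"{moved / len(keys):.0%} of keys moved")
```

Roughly a third of the keys move when a third shard joins, compared with about two thirds under plain modulo, and every key that moves goes to the new shard.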