What is CONSISTENT HASHING and Where is it used?

แชร์
ฝัง
  • เผยแพร่เมื่อ 15 พ.ค. 2024
  • Load Balancing is a key concept to system design. One of the popular ways to balance load in a system is to use the concept of consistent hashing. Consistent Hashing allows requests to be mapped into hash buckets while allowing the system to add and remove nodes flexibly so as to maintain a good load factor on each machine.
    The standard way to hash objects is to map them to a search space, and then transfer the load to the mapped computer. A system using this policy is likely to suffer when new nodes are added or removed from it.
    Consistent Hashing maps servers to the key space and assigns requests(mapped to relevant buckets, called load) to the next clockwise server. Servers can then store relevant request data in them while allowing the system flexibility and scalability.
    Some terms you would here in system design interviews are Fault Tolerance, in which case a machine crashes. And Scalability, in which case machines need to be added to process more requests. These two principles are allowed by Consistent Hashing, and hence it is an important building block to a system design architect's toolbox.
    Another term used often is request allocation. This means assigning a request to a server. Consistent hashing assigns requests to the servers in a way that the load is balanced are remains close to equal.
    Server architecture is a subjective concept, and there are outliers for many cases. Don't think of Consistent Hashing as a silver bullet for fault tolerance and scalability, but a useful concept for request allocation.
    Use it to solve software questions in interviews and real life. Best of luck!
    Prerequisite: • What is LOAD BALANCING...
    Recommended system design video course:
    interviewready.io
    00:00 Request Hashing
    03:00 Request Mapping
    06:02 Problems
    07:01 Virtual Servers
    09:40 Applications
    10:18 Thank you!
    Along with video lectures, this course has architecture diagrams, capacity planning, API contracts and evaluation tests. It's a complete package.
    References:
    www.hackerearth.com/practice/...
    www.tomkleinpeter.com/2008/03/...
    michaelnielsen.org/blog/consis...
    • Consistent Hashing - G...
    System Design:
    highscalability.com/
    • What is System Design?
    Code:
    github.com/coding-parrot/Syst...
    #consistent-hashing #system-design #load-balancing

ความคิดเห็น • 647

  • @pranay020692
    @pranay020692 4 ปีที่แล้ว +427

    Sitting in the hotel room, watching this 1 hour before my google interview in New York. Thanks Gaurav!

    • @gkcs
      @gkcs  4 ปีที่แล้ว +23

      All the best!

    • @gkcs
      @gkcs  4 ปีที่แล้ว +18

      @@pranay020692 wow, tough stuff. How'd you reckon it went?

    • @pranay020692
      @pranay020692 4 ปีที่แล้ว +60

      @@gkcs I believe, It went well. I have watched most of your system design videos, they were quite helpful. I am on the junior side 3 YOE so I think they went easy on me in Sys Design. Also, I was able to complete all coding questions in time. Google is always a long shot though. 🤞🤞

    • @karthikmucheli7930
      @karthikmucheli7930 4 ปีที่แล้ว +16

      @@pranay020692 hope you got the job

    • @Leptoszom
      @Leptoszom 3 ปีที่แล้ว +6

      You got the job, Bajpai?

  • @shreysom2060
    @shreysom2060 3 ปีที่แล้ว +73

    I used to see your "Competitive Programming" videos before getting into a company and now after getting learning things there ,I am watching your "System Design" it feels good to grow with this channel. Thank you so much 😊

  • @headoverbars8750
    @headoverbars8750 3 ปีที่แล้ว +7

    What an outstanding video!
    No shortage of tutorials on how to code or write algorithms out there buy not enough on Systems design...
    This is truly outstanding... been writing software 10 years and fringely do I touch these concepts, heck work within them daily yet either forgot or never knew.
    Thanks so much!!

  • @timurmukhtarov1319
    @timurmukhtarov1319 4 ปีที่แล้ว +9

    This was amazing! Havent seen other videos that talked about provisioning virtual servers/using multiple hash functions! Hooked!

  • @andreimarculescu911
    @andreimarculescu911 5 ปีที่แล้ว +591

    the best solution is not to use K hash functions, but to generate K replica ids for each server id. Designing K hash functions while maintaining random uniformity and consistency is hard. Generating K replica ids is easy: xxx gives K replicas xxx + '1', xxx + '2', ..., xxx + 'K'. Then you take these replicas and generate K points on the ring with the same hash function
    and this is what is actually used in practice. Chord algorithm is just an example of this technique to add K replicas for each server id

    • @gkcs
      @gkcs  5 ปีที่แล้ว +84

      That makes sense. K numbers assigned to each server would do the job :)

    • @pradipacharjee4915
      @pradipacharjee4915 5 ปีที่แล้ว

      Hi Andrei, can you just tell me how to choose idle replica count(k) ? for efficiently add or remove servers.

    • @dudejaa
      @dudejaa 5 ปีที่แล้ว +6

      The example that you took mentions xxx+1,+2,+3...+k. Correct me if I am wrong but if you assign k consecutive numbers to the same server the load wouldn't distribute (on adding or removing a server) uniformly. That could be one reason to look for different hash functions ?

    • @charchitpatodi8677
      @charchitpatodi8677 5 ปีที่แล้ว +6

      @@dudejaa Just a thought : he probably not means +1, +2... instead if xxx is id, M is ring capacity and k is number of servers then second position (after hash(xxx) )will be hash(xxx) + (M/k) OR hash(xxx+M/k).. And probably third position will be hash(xxx) + 2*(M/k) and so on till multiple of 'k'

    • @rishabhmalhotra7058
      @rishabhmalhotra7058 5 ปีที่แล้ว +25

      @Abhishek Dudeja xxx, xxx+1.. are ids for one server to take a hash on and then reach the respective points on the ring, not the points on the ring itself. And then the hash generated on xxx and on xxx+1.. would be completely different and random, and hence would plot k uniformly random points.
      @CHARCHIT PATODI I dont think that's the case cause if you think about it , if you add multiple servers each with k different points with that technique -> hash(xxx) + 2*(M/k)..till K, then you're not really randomizing and there would be no difference between adding 1 point or k points per server when it comes to choosing a server for a request. It would be like if you multiplied the ring length into k after choosing one point per server which would not get us what we want.

  • @akshatagrawal3300
    @akshatagrawal3300 3 ปีที่แล้ว

    You are simply amazing gaurav, system design concepts couldn't be explained better than this!

  • @jeffruan7701
    @jeffruan7701 4 ปีที่แล้ว +23

    Knowledgeable and confident presenter!

  • @Justinkol
    @Justinkol 6 ปีที่แล้ว +2

    Thanks for making these videos! I was always unsure about load balancing, but this helped explain a lot of my unanswered questions :)

    • @gkcs
      @gkcs  6 ปีที่แล้ว

      Glad it helped :)

  • @harshdusane8687
    @harshdusane8687 5 ปีที่แล้ว +1

    Awesome explanation. This has truly elevated my understanding of Hashing and Load Balancing in general. Keep up the good work!!!! :)

  • @bouzie8000
    @bouzie8000 2 หลายเดือนก่อน

    That virtual server solution blew my mind I'm so sorry. Geniuses have really paved the way for us in computer science.

  • @user-oy4kf5wr8l
    @user-oy4kf5wr8l 4 ปีที่แล้ว +1

    u r amazing Gaurav! i watched ur video one year ago, i didnt understand then, now i watch again lol ...i understand most of it... thank u !

  • @UlfAslak
    @UlfAslak 2 ปีที่แล้ว +70

    Notes to self:
    * The previous video gives the impression that there is a mapping from ranges of integers to server ids, and that consistent hashing is about to mapping request ids to integers in ranges resulting in more consistent routing of requests to same servers.
    -> I did realize that this would not work very well over time, as you would end up completely changing the ranges for higher-index servers with the addition of multiple servers.
    * In this video, requests ids map to an index in a ring with `M` indices. The "trick" then, is the map the server indices to indices in the ring using the same hash function that also hashes request ids. Now, to assign a server to a request, one simply looks clockwise for the nearest server.
    * To make it less likely that load will be unbalanced due to (what I would call) unlucky hashing, another idea is used: simply have multiple hash functions for the servers, such as to map them to multiple locations in the index ring! (clever).
    * @Andrei Marculescu points out that better than using multiple hash functions for server ids, it is easier to maintain multiple aliases for each server id ("...xxx gives K replicas xxx + '1', xxx + '2', ..., xxx + 'K'.") and thus map servers to multiple locations in the index ring.

    • @Luk3Stein
      @Luk3Stein 2 ปีที่แล้ว +2

      Thank you!! I was having so much doubts after watching, reading this made it more clear.

    • @codingfork6708
      @codingfork6708 2 ปีที่แล้ว

      How can we determine the value of `M`? Is [0, M-1] the range of the output of the hash function?

    • @UlfAslak
      @UlfAslak 2 ปีที่แล้ว

      @@codingfork6708 Correct. I think there are good heuristics for choosing M (and probably everyone uses the same standard values). Your hash function has to apply modulus M, otherwise you get an index out of range.

    • @nxpy6684
      @nxpy6684 ปีที่แล้ว

      Thank you! This helped me a lot!

  • @SP-db6sh
    @SP-db6sh ปีที่แล้ว +1

    This channel is like System-Design Wala , far far better than most paid courses, simple explanation

  • @raghuvamsi8740
    @raghuvamsi8740 4 ปีที่แล้ว +2

    After this video, I downloaded the entire playlist!! More love More support!! Gratitude _/\_

  • @jananiravichandran8370
    @jananiravichandran8370 6 ปีที่แล้ว +7

    Thanks for doing this! Your videos have really helped me understand things better =)

    • @gkcs
      @gkcs  6 ปีที่แล้ว

      Thanks Janani!

  • @johnleonardo
    @johnleonardo 2 ปีที่แล้ว +2

    your content is insanely good. seriously, the best! you were destined to teach others!

  • @shishirkumar8335
    @shishirkumar8335 5 ปีที่แล้ว +1

    Great video.
    One comment is using consistent hashing seems good option for distributed search scenarios (like you pointed for distributed cache, DB search algo) but not for use cases of load balancing where nodes are added to server large numbers of request (like web servers, applications etc).
    Please comment your view

  • @krishnasandeep4779
    @krishnasandeep4779 4 ปีที่แล้ว +1

    Your Videos are very informative. Thanks for making it Gaurav. Your explanation is crisp and Clear

  • @consistentthoughts826
    @consistentthoughts826 3 ปีที่แล้ว +2

    When you said try to think of solution, first thing come to my mind is "change the hash function"
    Thank god i'm understanding it well and its first time I am studying

  • @AbhishekKumar-ub8co
    @AbhishekKumar-ub8co 5 ปีที่แล้ว +3

    I loved the way with ease and simplicity you explained the problem using some pictorial diagram.
    Good work keep it up!!

    • @gkcs
      @gkcs  5 ปีที่แล้ว

      Thank you 😋

  • @fiveyearclub6024
    @fiveyearclub6024 5 ปีที่แล้ว

    Super helpful, thanks! I never got a CS degree and needed to learn more about sharding.

  • @jrajesh11
    @jrajesh11 3 ปีที่แล้ว +1

    Simply brilliant and clear explanation . Keep doing such awesome work.

  • @subhabera5775
    @subhabera5775 4 ปีที่แล้ว

    Legendary tutorial, specially I really like where you try to prove your logicswith mathematical equations, same goes for one of the video called "finding loop in a linked list". Thanks Gaurav again :)

  • @arnabthakuria2243
    @arnabthakuria2243 2 ปีที่แล้ว

    Learned a lot from the actual implementation in the attached git repo . Thanks

  • @AbhishekChoudhary-tu7ig
    @AbhishekChoudhary-tu7ig 3 ปีที่แล้ว +1

    I am a 3rd sem student and I guess I should not be bothering about these things but your explanations are sooooo gooood that I always wanna watch them :D

  • @rishabhagarwal9871
    @rishabhagarwal9871 5 ปีที่แล้ว +2

    A good video. I am really impressed. Thanks a lot.

  • @asafmesika
    @asafmesika ปีที่แล้ว

    Brilliant explanation! I read the Wikipedia article on this and Cassandra docs and your video clicked everything together!

  • @gautamtyagi8846
    @gautamtyagi8846 3 ปีที่แล้ว

    many thanks Gaurav for making this concept so clearly explained.

  • @SuiMizu
    @SuiMizu 5 ปีที่แล้ว +5

    You are a really good teacher, Gaurav! Please keep up your good work! :)

    • @gkcs
      @gkcs  5 ปีที่แล้ว

      Thanks!

  • @nafeezahid214
    @nafeezahid214 4 ปีที่แล้ว

    Excellent video Master. Thanks a lot.

  • @azeeztaiwo2802
    @azeeztaiwo2802 3 ปีที่แล้ว

    best explanation of consistent hashing i have seen so far.

  • @mattwilson1845
    @mattwilson1845 5 ปีที่แล้ว

    Awesome, thanks for making this video, really helped me understand. :D

    • @gkcs
      @gkcs  5 ปีที่แล้ว

      Glad to hear that :D

  • @deepakrao1100
    @deepakrao1100 5 ปีที่แล้ว +4

    boss code dal na ..!! studying for interviews with your videos, which btw the THE most helpful resource. Thanks for time you put into this !!!

    • @Arif.Amirov
      @Arif.Amirov 4 ปีที่แล้ว +1

      how did your interview go?

    • @gkcs
      @gkcs  3 ปีที่แล้ว

      A little late to arrive 🙈:
      github.com/coding-parrot/SystemDesignCourse/blob/master/service-orchestrator/src/main/java/algorithms/ConsistentHashing.java

  • @xiuwenzhong7375
    @xiuwenzhong7375 4 ปีที่แล้ว

    thx a lot, really helpful for people like me has no sense of system design.

  • @SayHelloMeetPatel
    @SayHelloMeetPatel 5 ปีที่แล้ว +2

    Very nice explanation. Really liked the video. Thanks. Keep making it. 👍

    • @gkcs
      @gkcs  5 ปีที่แล้ว

      Thank you!

  • @alexpanov4678
    @alexpanov4678 ปีที่แล้ว

    Thank you! It was clear to understand.

  • @jatinderarora2261
    @jatinderarora2261 5 ปีที่แล้ว

    Thanks Gaurav. Excellent video.

  • @satheeshprabhakaran5330
    @satheeshprabhakaran5330 4 ปีที่แล้ว

    Read the article about consistent hashing in wikipedia, this video has clearly articulated the core idea. Thank you!

  • @NehaKumari-my7cv
    @NehaKumari-my7cv 4 ปีที่แล้ว +2

    Hi Gaurav,
    Thanks for sharing such a nice concept.I have one doubt what happen if one server die suppose s1 for 2 hr and then again come back after that so in this case how request are handled.

  • @hellaren
    @hellaren 3 ปีที่แล้ว

    Thank you! It was extremely helpful

  • @gymbeestar
    @gymbeestar 5 ปีที่แล้ว

    This video is so helpful! Thanks!

  • @AbhishekKumar-ky3uc
    @AbhishekKumar-ky3uc 3 ปีที่แล้ว +4

    To be honest this video was more clear than the previos one in the playlist (what is load balancing), the pie chart concept in the previous one made me confused but this hopefully made it clear. Nice work!

    • @gkcs
      @gkcs  3 ปีที่แล้ว

      Thanks!

  • @tanvirt16
    @tanvirt16 3 ปีที่แล้ว +4

    Gaurav, thanks so much for your videos! Very informative and easy so follow despite the complexity of the concepts. Just had a couple questions from this video!
    #1
    So one of the original problems in the regular hashing solution was that when you add a new server, you'd have to destroy much of the local caches of the other servers because they become useless, which makes sense. So in this case, less changes occur, but how would you update the local caches to make sure you don't have to clear out the entire cache? Do you need some form of algorithm to determine what cache items should be evicted?
    #2
    Also, how about the algorithm required to determine what the "closest" server is in the ring which will serve the request? Is there a simple mathematical solution for that, or is it somewhat complex?
    It does seem that the additional complexity in maintaining a consistent hashing system is worth the advantages, just want to understand a bit about how complex it actually is, or if it's simply just a genius solution to a problem.

    • @soumyajitdas4433
      @soumyajitdas4433 ปีที่แล้ว

      Try looking into Chord Algorithm (en.wikipedia.org/wiki/Chord_(peer-to-peer) for #2
      Tl;dr; - every node in the hash ring maintains something called a finger table containing the information around it's predecessor node, successor node and also pointer to nodes (n+2, n+4, n+8 ... n+2^k). This way we can query any node and find the successor node to a particular hash value in O(log n) time.

  • @ashishmittal7048
    @ashishmittal7048 ปีที่แล้ว

    Thanks for the amazing video and describing the ring buffer based design for load balancing. I am wondering how this design will work efficiently when say for an example a subset of users are making too many requests? Because of consistent hashing the requests may land to the same machine , and certain machines might get more work assigned whereas all other machines are starving for the jobs.

  • @responsive_random
    @responsive_random 6 ปีที่แล้ว

    Clearly explained. Thank you!

    • @gkcs
      @gkcs  6 ปีที่แล้ว

      Thanks!

  • @vaibhav8257
    @vaibhav8257 3 หลายเดือนก่อน

    Thank you for Teaching this in such a nice manner.😊

  • @sivas09
    @sivas09 3 ปีที่แล้ว +2

    'n' being the number of servers and 'm' being possible hash values, would spacing out the servers at a value of m/n be a working solution?
    For ex - with m as 256 and n as 4, first server could be at 64, second be at 128, third at 192 and 4 at 256 - along those lines
    Understood the possibility of skewed allocations and the need for replicating ids tho. Hooked to your amazing content! kudos

  • @i-tingchen439
    @i-tingchen439 5 ปีที่แล้ว

    Very clear and helpful! Thank you.

    • @gkcs
      @gkcs  5 ปีที่แล้ว

      Thanks!

  • @osamaa.h.altameemi5592
    @osamaa.h.altameemi5592 4 ปีที่แล้ว

    fantastic explanation. Thx a ton.

  • @RihanPereira
    @RihanPereira 5 ปีที่แล้ว

    @Gaurav sen, hashing(applying h2 function) all the servers again, will reshuffle data and request routing of all existing nodes to entirely new nodes.

  • @xbeta84
    @xbeta84 4 ปีที่แล้ว +1

    This is great stuff!

  • @nankitable
    @nankitable 4 ปีที่แล้ว +5

    With multiple hash being applied, can there be case of collisions, i.e. multiple servers ending up on the same bucket? If not , why?
    If yes, how is it handled?

  • @chandrakantasaini6438
    @chandrakantasaini6438 ปีที่แล้ว

    Explain in an easy and nice way.

  • @SOULOFBUU7
    @SOULOFBUU7 6 ปีที่แล้ว +1

    Great explanation clear and concise

    • @gkcs
      @gkcs  6 ปีที่แล้ว

      Thanks!

  • @avinashkondaveti112
    @avinashkondaveti112 8 หลายเดือนก่อน

    Okay,I'm a little bit confused,some one correct me if I am wrong,Basically he's creating replicas of server by hashing multiple times(i.e k) to generate k replica servers,so that the load is evenly increased (or decreased)!

  • @tarushreesabarwal6618
    @tarushreesabarwal6618 2 ปีที่แล้ว

    Thanks Gaurav for this video, I am new to System Design so one naïve question: Even if we use consistent hashing to map multiple server ids using same hash function, physically the server is just one, for eg: I created 3 random server ids for a single server and used single hash function to compute the hash values s1, s2, s3 so it could be possible that one request is served by s1 and other by s3 but in actual there is just one physical server . I am trying to understand how by computing multiple hash values of single server would serve multiple request, the concept of virtual server is something I guess I miss here.

  • @smitmandavia5044
    @smitmandavia5044 3 หลายเดือนก่อน

    Hi Gaurav, Thanks for the video!
    How about we divide request into M groups and assign each group to a given server. By say keeping a map? If a server goes down, we can remap its ids to other servers randomly.
    If new server is added, we can take one group from each server?
    Is having a mapping somewhere makes requests slow??

  • @manveersingh5822
    @manveersingh5822 2 ปีที่แล้ว

    This was a pretty good video. Thanks Gaurav g!

  • @klbamniya
    @klbamniya 5 ปีที่แล้ว

    Thanks Gaurav for making on this topic . Could you please make an video on Object oriented chess design?

  • @debjyotibiswas3793
    @debjyotibiswas3793 5 ปีที่แล้ว +2

    You should write a book on system design, this is really great.

  • @angadpathania
    @angadpathania 4 ปีที่แล้ว

    Hey Gaurav... Great video... I'm a non techie and was trying to get familiar with these concepts... Essentially what ur saying is as your requests increase the load balancer should have those many server allocation hash functions so that there is a higher probability of equal distribution of load... Virtual servers here basically mean having those many server hash mapped points on the ring... Am I correct..? :)

  • @vinayaka1234
    @vinayaka1234 4 ปีที่แล้ว

    Hi Gaurav, Thanks explaining consistent hashing in simple ways. After finishing the video I got the doubt on what is the advantage of consistent hashing over round robin. Both can manage adding and removing of server dynamically.

    • @sadihassan8407
      @sadihassan8407 4 ปีที่แล้ว

      I think in round robin, you have to check all the servers in worst-case to find a free one. But in consistent hashing, one request is tied with one server which should be much faster.

  • @OmarNg7X
    @OmarNg7X 3 ปีที่แล้ว

    Great explanation. Thank you.

  • @keshavabhamidipaty3126
    @keshavabhamidipaty3126 4 ปีที่แล้ว +7

    Great video! I was wondering though, with this architecture, do you have to ensure that the hash functions don't ever collide though right? What would happen if an incoming request suddenly mapped to two servers that fell on the same point?

    • @gkcs
      @gkcs  4 ปีที่แล้ว +2

      It's answered in the other comments 🙂

  • @dacao0711
    @dacao0711 5 ปีที่แล้ว

    Thank you so much!

  • @VishalYadav-gk1kg
    @VishalYadav-gk1kg 5 หลายเดือนก่อน +1

    Very Nice Explanation Sir, Thank you !

  • @phaneendran4208
    @phaneendran4208 4 ปีที่แล้ว +4

    Hi Gaurav,
    Great series of videos. Thank you for sharing your experiences.
    I have one question on consistent hashing..
    Which component of the distributed system is responsible for implementing this technique.
    1) Is it load-balancer's job because it is a load distribution technique?
    2) Or is it application's responsibility.?
    Curious to hear your thoughts.
    Cheers!

    • @romeopeter1922
      @romeopeter1922 ปีที่แล้ว

      I don’t know if you still need answer to this but it’s the Load Balancer’s job because distributes the request and allocate them to the right servers.

  • @prakharsaxena796
    @prakharsaxena796 4 ปีที่แล้ว

    Amazing videos man!!

  • @nagarishyendarpanguluri6322
    @nagarishyendarpanguluri6322 3 ปีที่แล้ว

    Hi Gaurav, Great work! Quick question, How do we choose the number "m"?

  • @giobaldu
    @giobaldu 4 ปีที่แล้ว +2

    Great video!
    Question: where do the requests sit in practice? Is there a node acting as a scheduler dispatching request by request, or the requests are mapped immediately to a server and kept internally in memory? Or both, so that the requests can be rescheduled if the server goes down? (I suppose this would require the scheduler to periodically ping each server, or set a timeout). What happens if the scheduler goes down?
    Second question: would it be possible to use work-stealing instead do reduce inbalance? Whenever a server is out of work, it would steal a request from the back of the queue of another random server. Or could this skew too much the execution order of the requests?

    • @gkcs
      @gkcs  4 ปีที่แล้ว +3

      Thanks!
      The load balancer is a service which needs to tell the other services where a request is to be routed. It can either be queried per request (which is very expensive), or a snapshot of the current assignments can be cached by all services.
      If the snapshot changes at the load balancer, it can notify all interested clients.
      The service is distributed and backed by a 'reliable' database, so a single failure won't take the system down.
      Second answer: It sounds complicated and I have never seen it implemented on a large scale system.

  • @rakeshvarma8091
    @rakeshvarma8091 3 ปีที่แล้ว +1

    Gaurav, This video is wonderful
    Have small doubts
    Let's assume that request R1 is served by server S1.
    Now we have added a new server S2. Because of this let's assume the request R1 is now coming to S2.
    How the above scenario gets handled ?
    Is it like when a new server S2 is added , we have to move some portion of the data from the existing servers (S1) to the new server S2 based on its position on the ring?
    If it is the case, how can we do the distribution in real time ?

  • @pradyum88
    @pradyum88 4 ปีที่แล้ว +2

    Would the User caching still exist in consistent hashing?
    Great videos BTW!

  • @maitrivasa613
    @maitrivasa613 3 ปีที่แล้ว

    This question might have been asked before, but how do we choose the value of M for the ring? And do we increase M if the no of requests increase such that one slot in the ring can only contain either one request or one server

  • @iammehrabalam
    @iammehrabalam 4 ปีที่แล้ว

    @Gaurav Sen Can you plz explain or share any resource about
    Adding a node in the live system where consistent hashing is used and how insert and search query will work?

  • @srinivasasrikanthpodila4376
    @srinivasasrikanthpodila4376 2 ปีที่แล้ว +5

    Gaurav,
    The Addition/Deletion of Servers using the k-hash functions with the fixed ring size is a hard problem to solve to ensure the correctness. It could be simplified with generating the multiple ids of the same server.

    • @gkcs
      @gkcs  2 ปีที่แล้ว +2

      That's right 👍

  • @rockrock5838
    @rockrock5838 3 ปีที่แล้ว +1

    Really well explained man....

  • @SK-ur3hw
    @SK-ur3hw 5 ปีที่แล้ว +8

    Great video!! I thought that we can add a load factor or load limit like one server can have x requests. So once the load limit is reached, the incoming requests will point to next clockwise server. That way, no server will have too much load. But of course the virtual servers concept is good. Can you please add the code in the desc? Thanks. :)

    • @gkcs
      @gkcs  5 ปีที่แล้ว +2

      Sounds interesting. There are variations on consistent hashing which allow this.
      Code link: github.com/coding-parrot/SystemDesignCourse/blob/master/service-orchestrator/src/main/java/algorithms/ConsistentHashing.java 😁

    • @sreeram8942
      @sreeram8942 2 ปีที่แล้ว

      @@gkcs As you said in previous video about the User's cache data in a particular server , How does consistent hashing solve that issue ?

  • @59sharmanalin
    @59sharmanalin 2 ปีที่แล้ว

    Brilliant explanation Gaurav!! can anyone please explain how consistent hashing works in case of cassandra? Gaurav can you please make a video on that(you can also cover your fav. bloom filter in that :))

  • @jyotir124
    @jyotir124 2 ปีที่แล้ว

    Thanks Gaurav. One question, can consistent hashing be applied on SQL db

  • @rishabhmalhotra7058
    @rishabhmalhotra7058 5 ปีที่แล้ว

    Awesome stuff man :)

  • @RajeevSoni007
    @RajeevSoni007 3 ปีที่แล้ว

    @Gaurav Sen , this is the Cassandra ring architecture. right?

  • @eyalpery8470
    @eyalpery8470 2 ปีที่แล้ว

    I learned a lot, you're awesome

  • @xawnia
    @xawnia 4 ปีที่แล้ว +3

    Thanks a lot Gaurav, this was very clear! I was wondering what would happen if there is a clash between different (or the same) hash functions h(x)=h1(y) which server will the load get assigned to?

    • @vikassaran6430
      @vikassaran6430 3 ปีที่แล้ว

      same question .....do you know the answer

  • @nehamadaan3328
    @nehamadaan3328 3 ปีที่แล้ว +1

    @Gaurav Sen , Great Video! Thanks a lot ! Question - You mentioned at the end, its used in many many places. Are there places where systems don't use Consistent Hashing at all ? Also, are there systems using some other techniques for consistent hashing? Is this the only approach or one of the approaches to implement consistent hashing?

    • @gkcs
      @gkcs  3 ปีที่แล้ว +4

      Yes, definitely. Consistent hashing has it's own issues, and is usually only used for servers which need to maintain state (caches).
      Some databases also use consistent hashing. You can also try to reduce data migration by keeping master slaves for DB servers.

  • @gauravgarg9842
    @gauravgarg9842 4 ปีที่แล้ว

    Gaurav, nice video. However i had one doubt i mind. When a new node is added then how does key invalidation happens in existing servers.

  • @notthatguy1923
    @notthatguy1923 5 ปีที่แล้ว +1

    Gaurav - Awesome playlist for system design. Can you also include the explanation for when a server goes down and a request comes for a key which was saved in that server, how is the request handled? Are we going to replicate the data to not just one server but multiple servers to ensure availability. And if that is the case how to ensure consistency?

    • @gkcs
      @gkcs  5 ปีที่แล้ว +3

      We can fail the request and retry on the newly assigned server for this request key.

  • @perfectlyfantastic
    @perfectlyfantastic 4 ปีที่แล้ว +9

    8:33 it was told that k value should be log(M),Is it just a suggestion or its the value we should definitely consider

  • @arjundixit5913
    @arjundixit5913 2 ปีที่แล้ว

    Nice explanation but I have 2 doubts …please help me understand where I’m losing the track?
    Doubt 1: In the whole process we assumed that the no. of requests is constant (0-M), based on which the whole calculation of uniform distribution is carried on . But is that really the case practically ? I mean how would we know the range of the number of requests we are getting in advance?
    Doubt 2: If we use k hash functions for a particular server id and divide each by M, isn’t there a possibility of same id of 2 different server …specially if we have less number of numbers?

  • @siddhartheswaramoorthy6413
    @siddhartheswaramoorthy6413 3 ปีที่แล้ว

    Gaurav, In case of a server failure the next server in the ring will serve the load which means the user has to re-login right?, Why don't we use a session store using any in-memory data stores and store the session. So that we don't need to worry about the server going down we will still have the session data in the session store.

  • @sasirekhamsvl9504
    @sasirekhamsvl9504 2 ปีที่แล้ว

    Explained very well

  • @325venkat1
    @325venkat1 3 ปีที่แล้ว

    Gaurav: What is maximum value to use for the ring size i.e. M? And how to design the ring node size efficiently?

  • @krishnareddy3010
    @krishnareddy3010 6 ปีที่แล้ว +1

    Wow back to back !!!

  • @sridharbalabhadrapatruni1247
    @sridharbalabhadrapatruni1247 3 ปีที่แล้ว +1

    @Gaurav, What would happen to the caching that we talked about in the initial part of the discussion? I understand caching is not going to happen because the requests are too randomized for caching to occur. Is this algorithm so efficient that even without caching it's more efficient than having an algorithm that relies on caching?
    Also, as a performance engineer, i dealt with load balancing a few times, but never got to see these kinds of algorithms for load balancing. we have implemented algorithms that distributed load across servers based on several parameters such as - geographic proximity of the request to the server, hardware utilization (Server with least CPU, RAM, utilization gets the request), Least connections(Server serving the least active connections gets the new request), etc.
    Do you have anything to say about those logics, and whether they are related to the hashing algorithm we have seen...

    • @kaustubhparmar4274
      @kaustubhparmar4274 6 หลายเดือนก่อน

      May be late, but I think the caching will happen because the hashes will always return the same output for same input, so if the servers do not change then caching is not affected. But if the number of server changes we need consistent hashing so as to minimise the remapping of the request to the server.

  • @Wise___Man
    @Wise___Man 2 ปีที่แล้ว

    great explanation, thanks!

  • @NitishGupta-hs2rj
    @NitishGupta-hs2rj 5 ปีที่แล้ว

    Hi Gaurav.. by applying this circular algorithm for consistent hashing problem of local caching will still be there.. bcs every time your hash function changes your server range will also change then how this can be better approach wrt traditional hashing ?

  • @abdelrhmansamir1426
    @abdelrhmansamir1426 2 ปีที่แล้ว +1

    What would happen if there is a collusion when you calculate the virtual servers?
    I mean if h1(S0) = h2(S1) = 1.
    So there are 2 servers with the same ID right?

  • @jeyakumar4728
    @jeyakumar4728 4 ปีที่แล้ว

    Hi Gaurov,
    Wont removing / adding servers to the cluster affects the hash function modulo(%)
    Example:
    initially we have 4 servers hash(req for same id) % 4 -> s2
    if we remove 1 server :- Hash(req for same id) % 3 -> s1
    in this way, still the server 2 have stale cache data right?

  • @soumyaranjanmohanty7839
    @soumyaranjanmohanty7839 4 ปีที่แล้ว

    Great explanation Gaurav.

  • @manasdubey8667
    @manasdubey8667 4 ปีที่แล้ว

    Got it Gaurav, Very well explained. I am new to system designing and believe me I am enjoying it thoroughly.
    What I wanted to ask is, that how would you choose the value of K, to decide the number of hash functions?

    • @bsratuoh
      @bsratuoh 9 หลายเดือนก่อน

      One way he told us is to use the log function with the number of servers we have. There could be another way as well.