How to implement TinyURL (System Design Interview)

แชร์
ฝัง
  • เผยแพร่เมื่อ 25 พ.ย. 2024

ความคิดเห็น • 120

  • @malcolmdinz1912
    @malcolmdinz1912 3 ปีที่แล้ว +15

    Best TinyURL design video I've found, and some concepts here can be applied to other design questions too. Thanks for making this!

  • @alexbordon8886
    @alexbordon8886 3 ปีที่แล้ว +5

    omg this is the best system design tutorial for tinyURL !!! Each minute hits the point instead of bullshit. 赞赞赞

  • @cynthia7000
    @cynthia7000 3 ปีที่แล้ว +18

    Good quality! Would love to see more of these system design videos!!!

  • @seeboonsoo
    @seeboonsoo 9 หลายเดือนก่อน

    No one explain better than this guy!!

  • @AnangPhatak
    @AnangPhatak ปีที่แล้ว +1

    Please make more videos on systems design. The content here is presented succinctly with non-complex simple diagrams and licid explanation.

  • @Nimzo003
    @Nimzo003 2 หลายเดือนก่อน

    Best system design video I've ever seen. I'd hire you on the spot.

  • @gsb22
    @gsb22 3 ปีที่แล้ว +13

    @13:11 Everything was going, you even mentioned security issue of using counter making the url guessable and then you said, we use 10-15 bit to add randomness. Now we cannot add bit that will make this number a bigger number because zookeper might reach that range when it starting from 1million. 2nd, lets say, we add random character to it, then the base62 encoding wont be 7 char long and then you have to take first 7 chars which WILL result in collision sometime in future.

    • @ericfries7229
      @ericfries7229 2 ปีที่แล้ว

      Convert the number from Zookeeper and the random bits to strings, then concatenate the strings instead of adding ints.

    • @Maffeos
      @Maffeos 2 ปีที่แล้ว +2

      ​@@ericfries7229 Could you elaborate? From what I understand, if we concatenate the number with the random bits, then the resulting string would be more than 7 chars. Say the new length is 10, base62 encoding would give another 10 char string, which can still result in collision, no?

    • @alperkocatas2353
      @alperkocatas2353 8 หลายเดือนก่อน

      Up. Same concern here... Looks like we need another method for randomness.

  • @tenaciousbali
    @tenaciousbali 8 หลายเดือนก่อน

    Comprehensive explanation and good pointers. Loved this!!!

    • @umeshhbhat
      @umeshhbhat 8 หลายเดือนก่อน

      maine be yahi bataya tha

  • @vaibhav_cs
    @vaibhav_cs ปีที่แล้ว

    Excellent approach and proper delivery of content !

  • @rohinirenduchintala
    @rohinirenduchintala ปีที่แล้ว +2

    The best comprehensive tinyURL video for system design i've seen so far. Please make more!

  • @bhavana1711
    @bhavana1711 4 ปีที่แล้ว +5

    This is the best video with example solution for this problem, keep it up!

  • @mariezhang6818
    @mariezhang6818 4 ปีที่แล้ว +2

    Best TinyURL design video! Good pace.

  • @neosapien247
    @neosapien247 4 ปีที่แล้ว +1

    This is the best video on this particular topic.

  • @DK-ox7ze
    @DK-ox7ze ปีที่แล้ว +8

    If we add 10-15 bits at the end of counter number then it will increase the base62 output size and exceed the 7 character limit. So can you clarify how that addition of random string works?

    • @jinushaun
      @jinushaun ปีที่แล้ว

      TLDR: md5(“123”) vs md5(“123,xyz”)
      The original solution hashed the returned counter value, which is a simple increasing number. The resulting hash is then base62 encoded.
      The new solution takes the counter number and appends some extra characters at the end. This new string is then hashed and base62 encoded.
      In this way, the string sent into hash is not guessable. This works because the numeric portion is still guaranteed not to collide.

    • @DK-ox7ze
      @DK-ox7ze ปีที่แล้ว +1

      But in this new solution also, there is a chance of collision because we are hashing the counter+random_chars and limiting the hash to a short length.

    • @SchoolScience
      @SchoolScience ปีที่แล้ว

      Even without appending extra digits, base62 does not guarantees 7 chars. That simply means, we are not confining ourselves to 7 chars. We probably need to tell the interviewer that we would get as small as possible hash but no guarantees. This is a possible tradeoff to avoid collision.

  • @tanujaphadke8219
    @tanujaphadke8219 3 ปีที่แล้ว

    this is the BEST video on this topic

  • @RandomShowerThoughts
    @RandomShowerThoughts ปีที่แล้ว

    this was very good, clean and concise. Really like this video, especially the logic as to not pick sql

  • @PallNPrash
    @PallNPrash 3 ปีที่แล้ว +2

    Excellent video!! You packed everything of importance in a nice, short video. GREAT job!! And THANK YOU!!

  • @vandal144
    @vandal144 4 หลายเดือนก่อน

    GOAT tinyURL vid

  • @rajathrao3209
    @rajathrao3209 2 ปีที่แล้ว +1

    Would love to see more of these system design videos!!!

  • @talktovideos4349
    @talktovideos4349 2 ปีที่แล้ว

    Thanks a lot for making this video

  • @europana7
    @europana7 2 หลายเดือนก่อน

    Resiliency can already be done by AWS identical Lamda in multi regions and Elasticache. … and Azure prob has an equivalent.

  • @mrowox
    @mrowox 2 ปีที่แล้ว

    Concerning the round robin approach to load balancing not taking server load into consideration, such that it might still forward request to an overloaded or slow server, I believe that due to the style of this approach, if one server is overloaded, then all the servers are possibly overloaded else seeing that it would have been distributing the loads equally except if the servers are not of equal capacity which will then bring the question of why will you spin up a server of lesser capacity

  • @9939364566
    @9939364566 11 หลายเดือนก่อน

    Amazing way of explanation.... I watched many YT videos and they all confused me. You nailed it ✌️ Tysm

  • @deep8843
    @deep8843 6 หลายเดือนก่อน

    Really helpful!

  • @Yoyomanmanholla
    @Yoyomanmanholla 2 ปีที่แล้ว +1

    LRU cache eviction has a big short coming. If we assume the cache is always at capacity with each new shortURL creation, then that means one of the top 20% URL will surely get evicted from cache, to make room for a random unpopular URL someone has submitted.
    This eviction approach means there will be a steady state of URLs in the cache that are not popular at all.

    • @boyangzheng269
      @boyangzheng269 8 หลายเดือนก่อน

      LRU is used to maintain the top 20%, so the usage count also matters.

  • @sindhumohana6164
    @sindhumohana6164 3 ปีที่แล้ว

    Best video on this Topic !

  • @GauravSharma-wb9se
    @GauravSharma-wb9se 2 ปีที่แล้ว +2

    does Base62Encoding of counter guarantee to return only 6 characters or 7 characters string ? if this is the case then when we do Base62Encoding on MD5 hashed value, in that case also it should return 6 or 7 characters of string...…please clear this doubt

    • @SchoolScience
      @SchoolScience ปีที่แล้ว

      Believe me brother, none of the guys were able to clearly explain this on TH-cam/Books/Blogs. Base62 does not guarantee in terms of result length. It generally increases in this case though.

  • @fannyzheng1077
    @fannyzheng1077 9 หลายเดือนก่อน

    Thank you

  • @bayareacarnatic
    @bayareacarnatic 2 ปีที่แล้ว +1

    If you are using a unique counter why do you still need md5 and base62 encoding.? And then you need additional complexity like integrate zoo keeper etc just to maintain counters.

    • @boyangzheng269
      @boyangzheng269 8 หลายเดือนก่อน

      md5 is not required but base62 encoding is otherwise image how long the url could be when the count increases to 1 trillion.

  • @d86clot
    @d86clot 4 ปีที่แล้ว

    Amazing video. Thanks for do it short and clear

  • @studyiq5015
    @studyiq5015 2 ปีที่แล้ว

    Great explanation Plz post more system design videos Thanks :)

  • @narindersharma303
    @narindersharma303 2 ปีที่แล้ว +2

    if we have to base60 the #, then why not precompute it and distribute it to servers, i.e. S1 will keep the hash60 of 0-1M , S2 will keep the hash from 1M-2M and so on.

  • @hamzasanwar
    @hamzasanwar ปีที่แล้ว +1

    Why base 62 not base 64

    • @abjbreal
      @abjbreal 6 หลายเดือนก่อน

      52 alphabets and 10 digits = 62 total characters

  • @coop4476
    @coop4476 4 ปีที่แล้ว +1

    This is excellent! Thank you!

    • @coop4476
      @coop4476 3 ปีที่แล้ว

      Fake

  • @jamiepearcey9335
    @jamiepearcey9335 3 ปีที่แล้ว

    Great video explanation.

  • @hyrdeshgangwar
    @hyrdeshgangwar 2 ปีที่แล้ว

    Good Stuff!

  • @arjunsankarlal5296
    @arjunsankarlal5296 3 ปีที่แล้ว

    Well explained!

  • @yuanyuanweng9784
    @yuanyuanweng9784 4 ปีที่แล้ว

    Thanks! best system design video. Waiting for more video

  • @ashwinnatty
    @ashwinnatty 4 ปีที่แล้ว +5

    Hi. Good video. I have a question though. If we do a base 62 of smaller numbers like 0, 1, 2 ... we wont be getting a 7 char long tiny URL right? So how do we make sure all our short URL's are 7 char long? Kindly clarify

    • @p516entropy
      @p516entropy 4 ปีที่แล้ว

      base62("000007") -> "F2nt9Syt"; base62("000023") -> "F2nt9T75"

    • @ashwinnatty
      @ashwinnatty 4 ปีที่แล้ว

      @@p516entropy : Hi. Thanks for responding. But now consider a bigger number , say 1000000, it's base 62 -> 11PVWGSpX6 which is greater than 7 characters. I am still unable to understand how to fix your short url using counter method to exactly 7 characters

    • @p516entropy
      @p516entropy 4 ปีที่แล้ว +1

      ​@@ashwinnatty Oh sorry I was hurry and now I understood, here is base62 algorithm has the same approach as base2 (000, 001, 010, 011, 100, 101) or base16 (0000... FFFF), but now that is base62. (0 000000
      ,
      1 0000001
      ,
      2 0000002
      ,
      3 0000003,
      4 0000004,
      59 000000X
      ,
      60 000000Y
      ,
      61 000000Z
      ,
      62 0000010
      ,
      63 0000011
      ,
      64 0000012
      ,
      65 0000013,
      56_800_235_583 0ZZZZZZ
      ,
      56_800_235_584 1000000,
      3_521_614_606_207 ZZZZZZZ). And indeed, as you can see no any collisions

    • @journeytoraceday
      @journeytoraceday 4 ปีที่แล้ว +2

      @@p516entropy sorry, still don't understand this. base62(101020)=FMA4abRo and base62(101021)=FMA4abRp. If we take the first 7 chars from each, we get a collision, "FMA4abR"

    • @gurumack
      @gurumack 4 ปีที่แล้ว

      @@journeytoraceday I have the same question...

  • @chronoel
    @chronoel 4 หลายเดือนก่อน

    is there a way to replace zookeeper? why not just store a record of ranges used in a db or redis cache. since the operation of generating a new range happens once in a while. specifically, when a new server is being initialized and another when the range of a server rans out and need to fetch a new one

  • @FlashFreelancers
    @FlashFreelancers 5 หลายเดือนก่อน

    How do we identify the same long url which got converted to tiny url already ? Since we are using counter each time it will get a new tiny url no ?

  • @DrDoctopus
    @DrDoctopus ปีที่แล้ว

    For caching, will it be better to have some kind of background job that populates the cache with the most popular URLs from the database? Else if you're always adding a URL to the cache, then it's no longer just the popular URLs but all of them (or at least the capacity of the cache).

  • @georgesmith9178
    @georgesmith9178 4 ปีที่แล้ว

    Thumbs up, thank you.

  • @kavyapallari2902
    @kavyapallari2902 3 ปีที่แล้ว

    Good video

  • @fjeld11
    @fjeld11 4 ปีที่แล้ว +3

    Thanks for the excellent video. A question, in solution 2 with zookeeper, how does each server records the range it was given and when the range exhausted? Is there a counter on each server?

    • @ketanmalwa4580
      @ketanmalwa4580 3 ปีที่แล้ว

      That logic can reside on the zookeeper service itself. For instance, the coordination service can use a custom hashing function based on the incoming request and forward it to specific servers depending on the range value.

    • @fjeld11
      @fjeld11 3 ปีที่แล้ว

      @@ketanmalwa4580 Got it, thanks a lot!

  • @chilly2171
    @chilly2171 2 ปีที่แล้ว +1

    what if the custom url entered by the user already exists in the database? it results in duplication.

  • @renguomin1
    @renguomin1 3 ปีที่แล้ว +1

    Why introducing a counter guarantees you no collisions?

  • @tgiflying
    @tgiflying 3 ปีที่แล้ว +1

    Why do MD5 or SHA256? Just pick 7 characters at random with replacement from the set of base62 symbols

  • @eleanorwang6226
    @eleanorwang6226 3 ปีที่แล้ว

    Very clear illustration! Thank you

  • @Freez2018
    @Freez2018 3 ปีที่แล้ว

    Hey Kevin, should we wait for new system design videos?

  • @funnyclipz520
    @funnyclipz520 2 ปีที่แล้ว

    why can't the database (mongodb) itself generate an auto increment ID and then server can generate MD5 or base62 after adding salt to it ?
    Since db upon creation would never fail to generate a unique primary key it would avoid single point of faliure.
    Making a distributed solution for generating ID seems overkill here..
    I am new to this so let me know if there is a problem in this solution.

  • @mrowox
    @mrowox 2 ปีที่แล้ว

    You mentioned zookeeper. what is the number of servers is not stable, could be 8 today, then 5 tomorrow, then 10 next tomorrow. How does zookeeper manage those ranges based on thee number of servers available. I really need to understand this. Please help

  • @dbenbasa
    @dbenbasa 4 ปีที่แล้ว +2

    question about using counters - isn't that mean that if the same user asks to shorten the same URL multiple times, he will get multiple short URLs? Is it ok from a requirement standpoint?

    • @PABJEEGamer
      @PABJEEGamer 3 ปีที่แล้ว

      Whenever you get a request to shorten URL 1st check should be to see if it already exists either in cache or main DB. If no, then only proceed with the short URL creation.
      Or you could use the insert "IF NOT EXISTS" feature provided by no sql solution like cassandra.

    • @gsb22
      @gsb22 3 ปีที่แล้ว +1

      You cant return same url ever again.

  • @kakakukukakakuku
    @kakakukukakakuku 3 ปีที่แล้ว +1

    A good candidate would be the person who has watched this video :-)

    • @blaycoder
      @blaycoder 2 ปีที่แล้ว

      🤣🤣

  • @michelmdf
    @michelmdf 8 หลายเดือนก่อน

    Why not just rely on a database cluster to generate a sequencial numeric id , and just persist the id and the long url ? So when a request comes in, we convert from base62 to decimal and find the record in the db. What is wrong of this idea ?

  • @ubaidmanzoorwani6254
    @ubaidmanzoorwani6254 2 ปีที่แล้ว

    Why do we have to use base62enode after MD5???
    Why is MD5 not enough??

  • @benjaminwestphal9685
    @benjaminwestphal9685 2 ปีที่แล้ว

    How does your hashing work o.O - you take an 128 bit md5, and then make a base62 out of it with 20+ chars. How are you sure there are not collisions when only taking a subset of the base62 aka md5 hash(just differently displayed)? o.O

  • @jochenjochen4246
    @jochenjochen4246 3 ปีที่แล้ว

    i am curious why 128 bit goes to about 20 chars by using MD5 + base 64/62. 128bit to 32 char in hex. but what next? thanks in advance

  • @niteshupadhyaya007
    @niteshupadhyaya007 2 ปีที่แล้ว

    The counter example is incorrect - Another solution could be to append the user id (which should be unique) to the input URL. However, if the user has not signed in, we would have to ask the user to choose a uniqueness key. Even after this, if we have a conflict, we have to keep generating a key until we get a unique one.

    • @spacepacehut3265
      @spacepacehut3265 ปีที่แล้ว

      it isn't incorrect because the counter itself will be unique everytime even if the random bits will be same. And so does the generated hash will be unique for every url.

  • @brownbearnishant
    @brownbearnishant 8 หลายเดือนก่อน

    do more videos brother

  • @SchoolScience
    @SchoolScience ปีที่แล้ว

    base62 encoding can result in a string of any length. And we are not supposed to take first 7 chars to avoid collision. So that means we take what ever output we get from base64.
    This is what I want to hear from these videos, but believe me none of them emphasize on this. They just say blabla and use base62.

    • @boyangzheng269
      @boyangzheng269 8 หลายเดือนก่อน

      We are not encoding the original URL because as u said it would result in a string of any length. Instead we are encoding the numbers (0 - 3.5 trillion) which would not exceed 7 chars because 62^7 > 3.5 trillion.

  • @cbest3678
    @cbest3678 3 ปีที่แล้ว +1

    Can someone explain why zookeeper itself is not single point of failure in this case?

    • @jamiepearcey9335
      @jamiepearcey9335 3 ปีที่แล้ว +2

      Zookeeper provides increased availability through redundancy by operating in a multi-node cluster configuration. Multiple zookeeper nodes would have to go down for service interruption. As the URL shorting service reserves a range of values rather than a single value, the number of roundtrips to this counter reservation system is also reduced, which in turn also helps to reduce system load, network traffic, and the number of services involved in handling each reservation. These all help to increase availability, and can also help to increase performance and reduce running costs. You can increase availability further by spanning your nodes and services across separate datacenters, to avoid your infrastructure becoming a single point of failure.

  • @nicholasmanning4307
    @nicholasmanning4307 4 ปีที่แล้ว +2

    Nosql is in no way related to a database not excelling at joins. Graph databases excel at joins and relationships and they are categorized as nosql

  • @lucianomonterovidela
    @lucianomonterovidela 3 ปีที่แล้ว

    One question, If we take the first 7 characters even with numbers differents, we can have colission. What I didnt undestand?

    • @gsb22
      @gsb22 3 ปีที่แล้ว

      With number that is less than 3.5 trillion, it would always result in a unique base 62 string, but the issue here is,, it would be guessable, because if 1 million -> xyz001 and 1million 1 would be xyz002. In order to tackle that, we could add random characters to the number which will result in base 62 string with len > 7 and now it is an issue.

  • @sengo669
    @sengo669 4 ปีที่แล้ว

    Base62(100000000000) + another 10 to 15 bit number would be super long. Do you think about it?

    • @sengo669
      @sengo669 4 ปีที่แล้ว

      So our goal is to generate a short URL not to make a longer URL.

    • @jamiepearcey9335
      @jamiepearcey9335 3 ปีที่แล้ว

      Base62 should be considerably shorter than base 10, and the value you describe will approximate to 7 characters. Base64 is considerably shorter still, but the extra two characters might not be suitable for urls! You may be attempting to convert a string of numbers to base62 which would could create a longer base64 value, as you would be starting off in a higher base unit.

  • @johnhuynh9445
    @johnhuynh9445 4 ปีที่แล้ว

    Let's say the requirements change and we want a unique_url every time the same long_url is entered. (Use Case: you want to conduct analytics on the different entry points to the long_url site). How would we accommodate unique short_urls for each successive insert of the same long_url?

    • @jamiepearcey9335
      @jamiepearcey9335 3 ปีที่แล้ว

      This design already allows us to associate the same long URL with one, or more short URLS.
      To prevent the same long URL being associated to multiple short URLS the service would first have to check if the long URLS is already in use, and if so return the existing URL, which isn't something that has been described in the requirements or implementation detail.

  • @shamim520778774
    @shamim520778774 5 ปีที่แล้ว +1

    what else can we use instead of zookeeper ?

    • @abhinavprakash3324
      @abhinavprakash3324 4 ปีที่แล้ว +1

      etcd is a similar strongly consistent, highly available, key-value store.

  • @kumarmanish9046
    @kumarmanish9046 3 ปีที่แล้ว

    What about database? He never talked about how to store in db

  • @mayurkoli4145
    @mayurkoli4145 4 ปีที่แล้ว +1

    Well the lecture is great i have a doubt, MD5 is generating unique hash each time so we can grab first 7 char of that hash, then why we need to convert it in base62.?

    • @jangrahul
      @jangrahul 4 ปีที่แล้ว

      Please let us know if you get the answer to this ques

    • @jangrahul
      @jangrahul 4 ปีที่แล้ว +3

      ok, got it. MD5 generates 128 bit output which is generally represented in hexadecimal, if you will take first 7 char of this string made up of hexadecimal you will not be able to make many combinations. only 16^7 combinations. However if you use base64 encoded string we you will able to have many combinations in 7 char i.e 64^7

    • @mayurkoli4145
      @mayurkoli4145 4 ปีที่แล้ว

      @@jangrahul got it man thanks u✌️✌️☺️

    • @gsb22
      @gsb22 3 ปีที่แล้ว +1

      beware this WILL result in collision if two users use the same url.

  • @stiffyBlicky
    @stiffyBlicky 2 ปีที่แล้ว

    This video is inaccurate base62encode(0) is not aAbB123

  • @dhruvdnar
    @dhruvdnar 3 ปีที่แล้ว +1

    You rushed through some important aspects. EG: Why would base 62 give 21-22 chars? Its because Base encoding encodes 6 bits at a time. 128/6 = 21

  • @bostonlights2749
    @bostonlights2749 4 ปีที่แล้ว

    I like your accent

  • @YT-yt-yt-3
    @YT-yt-yt-3 3 ปีที่แล้ว

    aka :)

  • @mtung05
    @mtung05 4 ปีที่แล้ว +8

    This is a straight rip off from Tushar Roy tinyUrl video

    • @elfchosen1477
      @elfchosen1477 4 ปีที่แล้ว +1

      I don't think so. This one is much better explained than Tushar Roy's one.

    • @gsb22
      @gsb22 3 ปีที่แล้ว

      somewhat.

  • @chriszeng5406
    @chriszeng5406 4 ปีที่แล้ว +1

    listening u talk gives me the biggest anxiety. but good video still.