Considering 3+ app servers, the way the LB routes requests to them, and the fact that you don't know how many requests are coming into the service from other users, it is somewhat random (although only random within a range).
The approach of using a counter for generating short URLs would NOT be optimal if the user is given a choice to pick a short URL of their liking. In that case, let's say some counter values between 100-200 already correspond to used short URLs. The application server would then need to check the database whether a value between 100-200 is already taken, and use it only if it is free. Let me know your thoughts on tackling this extension of the problem.
Thanks for the comment. One of the requirements we discussed is that we are not going to return the same tiny URL for a given big URL. Let's see what others say if we change this requirement.
@@ThinkSoftware The pace you work at and the way you approach the problems are both very helpful for me while I study system design.
For base64 encoding you need 6 bits per character. So if you encode every 6 bits of a 128-bit MD5 hash as one base64 character, you need ceil(128/6) = 22 characters in total.
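As a quick sanity check of that arithmetic, here is a Python sketch (the URL is just an illustrative input):

```python
import base64
import hashlib

# An MD5 digest is always 16 bytes (128 bits).
digest = hashlib.md5(b"https://example.com/some/very/long/url").digest()

# Each base64 character carries 6 bits, so 128 bits need ceil(128/6) = 22
# characters; standard base64 then pads the output to a multiple of 4 with '='.
encoded = base64.urlsafe_b64encode(digest).rstrip(b"=")
print(len(digest), len(encoded))  # 16 22
```

So a full MD5 hash needs 22 base64 characters; truncating that to 6 is exactly what reintroduces the collision risk discussed in the video.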
Great system design. Regarding API design, I have been wondering whether we should use the "redirect" approach, where get_long_url returns a 302 response, or the approach you are suggesting in your design. I am leaning towards your suggestion, but it probably means we will need a layer of code to redirect after getting the long URL. Curious what your thoughts are on that.
In the non-functional requirements, you said strong consistency. Can you explain how the different components (specifically the datastore/counter database) ensure strong consistency during component failures? Thanks!
Thank you for the great content. I need some clarification on key generation. Suppose the counter value is 65535. Converting it to base64, it becomes "NjU1MzU". Now, you replied in a comment below that if the base64-converted value is less than 6 characters, we can add 0 or z as a prefix to make it 6 characters. But here, for the counter value 65535, the base64 value NjU1MzU is longer than 6 characters. How will we make it 6 characters? If we trim some characters, we get the same issue as with trimming the MD5 hash value. Please help. Thanks in advance. :)
What you did is a string encoding of "65535". What you need to do instead is first convert the integer 65535 (base10) to its equivalent value in base62 or base64, whichever you choose (e.g. its equivalent in base16 is FFFF). Once you have the integer's base62 or base64 representation, you write it out as a string.
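A minimal Python sketch of that conversion, including zero-padding to a fixed width of 6 characters (the alphabet ordering here is just one possible choice):

```python
# One possible base62 alphabet; any fixed ordering works as long as
# encoding and decoding agree on it.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def to_base62(n: int, width: int = 6) -> str:
    """Convert a non-negative base10 integer to base62, left-padded to `width`."""
    digits = ""
    while n:
        n, rem = divmod(n, 62)
        digits = ALPHABET[rem] + digits
    return (digits or ALPHABET[0]).rjust(width, ALPHABET[0])

print(to_base62(65535))  # 000H31 -- the numeric conversion, not the string "NjU1MzU"
```

Note this converts the numeric value; base64-encoding the character string "65535" is a different operation entirely, which is where the 7-character "NjU1MzU" came from.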
Hi, thanks for the video. I have one doubt regarding the counter approach. You mentioned that the app server ranges will be 101-200 or 301-400 or even 1001-2000, etc., but if we convert any number within these ranges to its base62 representation, it will generate a key of fewer than 6 characters, whereas you decided in your requirements that a key should be 6 characters long. How are you going to resolve this problem?
@@ThinkSoftware I think even if you consider 100 as 000100 and then try to convert it to base62, I don't think it will end up as a 6-character-long key. Someone please correct me if I am wrong.
The padding is applied after conversion to base62. Another example: let's say after conversion the value is abc. This can be padded to 000abc or zzzabc, depending on which character the base62 encoding uses for 0 (i.e. 0 or z).
@@ThinkSoftware First, thank you for the video, it's very well explained. Following up on the above question: assuming we have counters in the DB which are regular decimal numbers, and we then use base64 encoding on the ranges on the app side, what if the counter goes above, say, a million? Then the base64-encoded value of 1 million is longer than 6 characters, and taking a substring of it raises the same issue as with MD5, correct? Thanks again for the video.
I think you can always skip auth APIs; internal applications integrate with internal identity brokers, and external applications integrate with external identity brokers. You will almost never implement authentication from scratch.
I've watched some of your videos, thanks for the great content. One thing I noticed is that the voice quality is sometimes not good, too quiet or unclear. Maybe the microphone is too far away or something.
Thanks for the feedback. Let me know if you find this issue in other videos as well, especially the newer ones. I am still learning to produce better videos day by day 🙂
Very nice explanation, and especially nice to see a comparison of the different answers available. I have one doubt: in the final approach, you said that we will keep a counter in the datastore, but since we have around 68 billion possible keys, won't the counter value reach 68 billion after some time? Would storing that huge number in the DB be a problem?
@@ThinkSoftware Thanks, got it. 64^6 = 68719476736, whereas 2^64 = 1.844674407371E+19, which is much bigger, so a 68-billion integer can be accommodated easily.
Thanks for the comment. Here I am passing the userToken in the API to emphasize the need to pass the userToken. The API section describes the API at a high level. At the REST API level the userToken will be set in the header.
17 minutes into the video and all I can see is a generic approach to making a distributed system scalable and available, while the title speaks of URL shortener system design. Am I wrong to expect TinyURL-specific design throughout the video, as opposed to just the last part?
The TinyURL problem is a very generic, very simple problem that introduces the viewer to how to think about scalability and how to handle availability. It's not a complicated problem by any means, unlike Twitter or Netflix or Facebook. Most of the concepts used here will be taken for granted when you try to solve the more complicated problems. In those problems, however, you will face other kinds of issues that require you to make a simple design more complicated. It's a good starter for system design videos.
If a collision occurs, what should be done (we are choosing the first 7 chars from MD5 hash + base64 + counter values)? We could try another hash algorithm, or choose different index positions in the output of (MD5 hash + base64 + counter values). Issues: 1) For every request we need to check in the DB whether that short URL already exists. 2) If we need the same short URL for every request for a given long URL, then once the first request has generated a short URL with the solution above, every subsequent request for the same long URL has to repeat those extra steps to arrive at the same URL. Your opinion on this?
Instead of relying on an external data store for getting valid count ranges, why don't we just append a UUID to every input URL, hash it with MD5, and base62-encode it? This reduces the possibility of collisions and eliminates the network overhead. The individual app servers can generate UUIDs independently.
I have already discussed this in the course and in this video: an MD5 hash can't be used because, for a 6-character short URL, we have to drop bits from the MD5, which will result in conflicts.
Hey, this video was great. You have made all the concepts abundantly clear. I had a quick question though. While sending the write request, we ask the user to provide the API key, right? Can't we just use that along with a counter variable that keeps incrementing until a unique URL is generated? Though I'd think that might tick up the overall complexity.
Thanks for the comment 🙂. Are you suggesting different counters for different users? That will not work due to conflicts, and you would need to keep increasing the counter.
@@ThinkSoftware Another quick question. I'm sort of trying to design this system in NodeJS, so I have created 3 Heroku accounts for 3 different servers and one for exposing the API service. I was pondering whether creating the global cache in the API server would be a good idea. Like, we could expose the read operation for the global cache through the API. Whenever a read request is sent to any of the servers from the API, it calls the cache endpoint on the API; if no match is found, the server checks its own local cache and then ultimately resorts to accessing the database. What's your stance on this approach?
I don't really agree with having 3 servers running all the time just in anticipation of maintenance or new releases. You can have 2 servers running, and when you need to release a new version you just spin up another instance; once it has successfully launched you remove one of the old instances, and then do the same with the other instance. Kind of a blue-green deployment, in a way.
Thanks for the comment. It totally depends on what type of infrastructure support your service has. If the underlying infrastructure supports adding and removing hosts seamlessly, and you don't have to go modify various configuration settings, then you can go that route. Otherwise, it will be less hassle to have at least three servers (and I have rarely seen anyone take the route you suggested in my 15+ years of experience). You also need to consider your CI/CD pipeline. We don't do deployment or maintenance once a month; nowadays the deployment frequency is much higher, if not daily then still once or twice a week. Of course, when you are starting out (think a two- or three-person startup), you can still go with two servers, because you might be at a stage where you are the only person (or one of two or three) who knows when to perform the deployment. But I am talking in general about working at a cloud company at the scale of Stripe, Lyft, Uber, or FAANG.
@@ThinkSoftware Okay, so just for this single-column, single-row data value, do we have to deploy a separate database like MySQL here? Since for the rest of the system you are recommending a NoSQL DB, which would not let you use sequences that easily...
@@sanketvictor You have many options. There are NoSQL databases that support transactions (lightweight transactions) and DB counters; you can use those. As a last resort, you can have a separate SQL DB deployment for this purpose.
Had a question on the range keys being maintained on each app server. Wouldn't multiple threads have to synchronize with each other to get the next available offset in this range, leading to higher latency? I am assuming this counter logic is needed only when we run into collisions.
This leads into another detailed discussion about how replication happens between the DB instances (synchronous vs asynchronous), and then what the consequences of synchronous vs asynchronous replication are for reads and writes.
Thanks for the comment. No one is bulk-pulling IDs. It is just a counter that is atomically incremented by some threshold (e.g. 100 or 1000) at the DB level; when the app server gets the counter value back, it can assume it owns the IDs in the range [counterValue, counterValue + threshold).
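A toy Python sketch of this range-allocation scheme (names are illustrative; the lock stands in for whatever atomic read-and-increment the real DB provides):

```python
import threading

class CounterStore:
    """Stands in for the DB row that holds the global counter."""
    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def allocate(self, size: int) -> int:
        # The real DB performs this read-and-increment atomically.
        with self._lock:
            start = self._value
            self._value += size
        return start

class AppServer:
    """Hands out IDs from [start, start + size), refetching when exhausted."""
    def __init__(self, store: CounterStore, size: int = 100):
        self._store, self._size = store, size
        self._next = self._end = 0

    def next_id(self) -> int:
        if self._next == self._end:  # one DB round trip per `size` IDs
            self._next = self._store.allocate(self._size)
            self._end = self._next + self._size
        n = self._next
        self._next += 1
        return n

db = CounterStore()
s1, s2 = AppServer(db), AppServer(db)
print(s1.next_id(), s2.next_id())  # 0 100 -- disjoint ranges, no coordination needed
```

Each server only touches the shared counter once per 100 IDs, and because the ranges are disjoint, no two servers can ever hand out the same ID.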
I think I have to make a video on this soon. People have misunderstood the CAP theorem. First of all, they don't understand in which scenarios the CAP theorem is applicable. Secondly, according to the CAP theorem, in scenarios where it is applicable, it is only in the presence of a partition that you can't have both high availability and strong consistency and must give up one for the other. It does not mean that when there is no partition you still can't have both. And there is a third reason as well, which I have answered in other comments before.
What if there are too many write requests, leaving lots of app servers needing to fetch counter ranges, resulting in write failures and overload on the counter DB? To avoid this scenario, do you think that instead of a counter stored in the DB with app servers going to it directly, we could use a counter service that pushes counter ranges to a queue, with app servers just picking a range from it? It could keep pushing to the queue at regular intervals. This design is more complex and is only needed if there are too many write requests to handle.
We can always increase the range size if that happens. With 100 as the size, servers go to the DB only once per 100 IDs, and the scenario you are mentioning would require all the app servers to somehow exhaust their ranges of 100 at the same time.
Thanks for your video. One question: when you mentioned the non-functional requirements (@5:32), you said they should include high availability and strong consistency, but based on the CAP theorem we can only choose one of them. How can we keep high availability and strong consistency at the same time?
Thanks for the comment. As we discussed, the service has strong consistency in the sense that once we have created and returned a short URL to the user, if the user queries the big URL using that short URL, the service should be able to return the big URL. Otherwise, it would not be a good user experience. Now, how can a service be available and consistent at the same time? It boils down to the fact that no service is 100% available. The question is whether we are targeting five 9s or four 9s of availability, and also whether we are talking about the availability of the whole service or only part of it. The service can be highly available as a whole and at the same time fulfill the consistency requirement as a whole (the keyword is "as a whole"), even though the individual partitions in the datastore may have to prefer either availability or consistency. It is entirely possible for one partition to be down due to a network partition without affecting the functionality of the other partitions, so the overall service is still considered highly available. And please note, I didn't discuss the design of the datastore, where we would cover all these things (that will be a topic of a future video). I hope this clarifies.
I would like to add on to what has already been said: the question of choosing between availability and consistency arises only in the case of a partition. Otherwise, in a normal running scenario, a service can be both available and consistent. Now let's discuss the partition scenario. Here, we could make the read service highly available while the write service becomes unavailable during the partition, thus maintaining consistency. So different microservices of a system can be configured differently to favor availability or consistency! Hope this helps.
@@ThinkSoftware I would think eventual consistency should be OK, as most of the time the short URLs that are created are not queried immediately. The flow would be: the user creates a short URL and posts it to their followers, and the followers may not click on it right away. If we use a NoSQL database like Cassandra, we can improve read performance tremendously.
Thanks for the video, but I have a question for you. How is this approach any different from the ZooKeeper approach? Your data store is still a single point of failure, whereas ZooKeeper, on the other hand, is considered more reliable and highly available.
The datastore is not a single point of failure. I didn't discuss this in this video, but it is already covered in the course, which you can check out. Now, regarding using ZooKeeper to generate the counter values: I think people suggest that without thinking about what it means. In an interview, you cannot just say that you will use ZooKeeper to generate the counter values and that ZooKeeper will magically handle generating this value while also providing high availability (i.e. no single point of failure). Generating a unique sequence is the crux of this tiny URL system design, and an interviewer will then ask you how ZooKeeper achieves this. In practice, nodes/machines in ZooKeeper fail, and they fail all the time. When one server in the ZooKeeper cluster fails, the system somehow needs to make sure that the others don't return a duplicate range. The only way to do that is to make all servers agree on which ranges have been given out and which have not. This is the distributed consensus problem, and in fact ZooKeeper uses a consensus protocol for this as well. Now, if you want to use ZooKeeper just to assign different counter ranges, then that is a bad use of ZooKeeper, because you can achieve the same thing by providing counter ranges in some configuration form, where all the servers read their ranges from a file uploaded somewhere. You don't need to run and maintain a separate service for it. This was a long answer, as it was not easy to explain in a few words. I hope it clarifies your question.
Will the global cache also trigger invalidation of the app server caches? Because let's say on app server 1 the mapping was not found (there was no entry after going to the DB), and that absence got cached. Some time later, a write request for that same mapping came to app server 2, so after going to the DB the global cache now has the mapping. Now, if a request from a different user comes to app server 1, since app server 1 cached the absence of that mapping, we will return wrong data. So if any write happens on the global cache, shouldn't it tell all app servers to invalidate their cache entry for that mapping?
It's a very small trade-off. 7 characters can generate 68B combinations, and we can easily afford to lose some chunks in case of server failures. Also, when the cron job runs to delete expired tiny URL mappings, we can add those tiny URLs back to the table containing the available URLs so that they can be reused.
Are you suggesting converting the long number value to bytes and then encoding those bytes as a base64 string? If yes, then converting 67 billion = 67000000000 to a base64-encoded string returns the value "AAAAD5mC3gA=", which has length 12. How do you shorten that to length 6 again? Or, let's say a long in Java is a 64-bit number, so with base64 encoding and 2 bits of padding it becomes 66 bits, and 66/6 = 11, so the encoded string length can be 11.

long n = 67000000000L;
final ByteBuffer allocate = ByteBuffer.allocate(8);
allocate.putLong(n);
final byte[] bytes = allocate.array();
final String s = Base64.getEncoder().encodeToString(bytes);
System.out.println(s);
Neither of the approaches you mentioned is correct. Consider the base10 number 15: when you convert it to base16, it becomes F. 10000 in base10 becomes 2710 in base16. In the same way, you convert the integer to base62 or base64, whichever you choose. More information is in the course; there is a free preview chapter on the TinyURL design.
Question: in your approach of generating URLs using a counter maintained in the DB (discussed at 38-40 minutes in your video), I'm unable to understand one thing. Let's say we use counters to generate number ranges, but we are limited to 6 characters; doesn't that mean we can only generate short links from 0 to 100000 or so before exceeding the 6-character limit? How can we get 62^6 combinations? Can you please explain with a sample URL how it will look?
The 6 characters hold a base62-encoded number, not a decimal (base10) number. The 100000 you mentioned is base10; we convert it to base62, which is way shorter.
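A quick Python check of that claim (same idea as converting to hex; the alphabet is one possible ordering):

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def to_base62(n: int) -> str:
    """Convert a non-negative base10 integer to its base62 representation."""
    digits = ""
    while n:
        n, rem = divmod(n, 62)
        digits = ALPHABET[rem] + digits
    return digits or ALPHABET[0]

print(to_base62(100000))     # Q0u -- only 3 characters
print(to_base62(62**6 - 1))  # zzzzzz -- the largest 6-character value
```

So 6 base62 characters cover everything up to 62^6 - 1 ≈ 56.8 billion; a decimal counter of 100000 is nowhere near the limit.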
@@ThinkSoftware Are you suggesting converting the long number value to bytes and then encoding those bytes as a base64 string? If yes, then converting 67 billion = 67000000000 to a base64-encoded string returns the value "AAAAD5mC3gA=", which has length 12. How do you shorten that to length 6 again? Or, let's say a long in Java is a 64-bit number, so with base64 encoding and 2 bits of padding it becomes 66 bits. So now 66/6 = 11, and the length can be 11.
A semaphore/mutex is used within a single codebase. Here we have multiple servers. If you want to use a mutex on the DB side or in the cache, the problem will be the same: that DB or cache will become a bottleneck. That's why he's incrementing it by 100, to reduce the load on the DB.
You talked about updating the counter values inside a transaction construct. But if we store this in a NoSQL datastore, is this transaction construct possible in DynamoDB or Cassandra-like DBs? Maybe Spanner or CockroachDB offers this, but my question is in a generic sense. Thanks for the awesome work!
Thanks for the comment. Other databases provide similar constructs as well. E.g. Cassandra provides atomic writes and lightweight transactions for updating a single record, etc.
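For illustration, here is the compare-and-set idea behind such lightweight transactions, sketched as a toy in-memory model in Python (this is just the semantics, not the Cassandra API):

```python
def compare_and_set(row: dict, column: str, expected, new) -> bool:
    """Update `column` to `new` only if it still holds `expected`.
    Returns True/False, like the [applied] column of a conditional update."""
    if row.get(column) != expected:
        return False  # someone else won the race; caller must re-read and retry
    row[column] = new
    return True

counter = {"value": 100}

# First writer read 100 and bumps the counter to 200: applied.
print(compare_and_set(counter, "value", 100, 200))  # True

# Second writer also read 100 but loses the race: not applied,
# so it must re-read and retry instead of silently overwriting.
print(compare_and_set(counter, "value", 100, 200))  # False
```

This is exactly the guarantee the counter-range scheme needs: two app servers racing to bump the shared counter can never both succeed with the same starting value.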
@@ThinkSoftware When generating a new mapping between short and big URLs, we have to both add it to the mapping table and update the URL count in the user table. Should these two write operations be done in one transaction?
Does the application server save the long URL to the DB as soon as it creates it (before returning to the user)? Otherwise, if app server 2 gets a request to fetch a long URL, it might think it does not exist because app server 1 hadn't saved it to the DB.
@@ThinkSoftware But what if app server 1 can create URLs 1-100 and app server 2 can create URLs 101-200? App server 1's 99th URL-creation request is for www.123.com, and the server creates the shortened URL. Some time later, app server 2 gets a request to shorten the exact same URL app server 1 did, and that request falls in app server 2's range. Now one long URL has two different short URLs. How does the system avoid creating many different short URLs for the same long URL without a URL-shortening service? Or are the system designers accepting the cost and overhead of having many different short URLs for the same long URL?
It all depends on the requirements, and the design changes if the requirements change. Here, our requirements did not include always having a single short URL for a given big URL. Instead, our very first functional requirement is to always return a unique short URL, if you check again.
Hi @ThinkSoftware, I have a question. Let's suppose we have 3 app servers and we try to deploy a new build. Obviously, we can't deploy the new build on all the servers at the same time, so if we go one by one, it's possible that a user accessing our service gets a response from a machine where the new build has been deployed, and on the next access gets a response from another machine where the new build is not there yet (as routed by the load balancer). Is this a valid case? How does this work in the real world?
@@ThinkSoftware I am not sure how this is possible in every case. I know Spring Cloud provides a way to keep our configuration in a central place, and we can use it to enable a feature after deployment. But suppose we are deploying a front-end app (server-side rendering) and my change was just to change the background color of a particular div/modal from yellow to blue. How can I enable the feature only after deployment on all the app servers?
It depends on what you are trying to achieve. For your specific example, you should come up with an intelligent way to update the background color without even needing a deployment. In reality, when we work on a cloud service, our feature work includes design, implementation, testing, and even the deployment strategy best suited for enabling that feature.
@@rahulbatheja7933 The answer to your question is blue-green deployment. In the cloud, you have the flexibility to add instances as you require. So while your blue server is catering to customer requests, you can do the deployment on the green server, and once the deployment is complete, the green server takes the place of the blue server; that transition is very smooth. This way you will not see any service disruption.
A 64-bit long integer has a long way to go if you do the math, and at that point the counter going out of range is the least of your worries 🙂
It is not a single machine; it is just a logical diagram of the data store. If we went inside that logical component, the discussion would turn to how to achieve all the non-functional requirements within that data store. Check my other video on the introduction to distributed systems, where I discuss this in a bit more detail.
If you check my course, I have discussed in detail there how this is achieved with this approach. The good thing is that the TinyURL system design is a free preview chapter.
Using base64 or base62 or base128 has nothing to do with using 6 or 7 characters. With 6 chars you can use base62 or base64 or any other base encoding, as long as those characters are valid in an HTTP request path.
At 27:29 you say something like "in one of the online resources the author has used + and / characters but these are invalid characters". True, but I would highly recommend not trying to belittle others in your videos. I can find several flaws in your videos too, and I am sure other YouTubers could as well, but the professional ones stick to the topic and don't take it as an opportunity to criticize others' work. I have seen some other TinyURL videos where the author explained the concepts much better than yours. By the way, I also purchased your paid course and I am regretting the purchase. It seems like you started it as a hobby project 3 years back; you are selling it online, but more recently you have lost interest in it and there is not much content. All I see are those mock interviews, which have little benefit. One shouldn't really be paying for that kind of content. There is plenty of free material online which is much better.
Thanks for the comment. I only mentioned that an online course used these characters and that using them is invalid. I didn't mention any name, so how am I belittling someone? I just highlighted a mistake to avoid. I also appreciate your feedback about my course. There are already 10 chapters on the design of 10 different systems, like TinyURL, Twitter, Netflix, Uber, Stripe, distributed queues, etc. I hope you did go through them, so I would like to know what you didn't like. I do plan to add a few more chapters on the design of object storage, a distributed DB, etc. before considering the course complete.
@@ThinkSoftware I completely agree with you, as you have not mentioned anybody by name; moreover, saying that something is incorrect is a good way to guide the wider community.
Thanks for watching this video. Please let me know your feedback :)
Great Video as always.
Thanks for the comment :)
First time watched your presentation. It was very intuitive and good explanation. Thanks for your support to educate us.
Hi, one question. If the app server gets, say, the 1-100 ID range (or the 1000-10000 ID range) to use, how will those IDs be converted to the 6-char alphanumeric key? If we do SHA-256 or anything like that, we again risk collisions.
Just like you convert a decimal number (base10) to a hex number (base16), the same is done to convert it to base62/base64, whichever you choose.
Thank you so much for the amazing content you share; the way you explain the concepts is so crisp and clear. I am preparing for my design interviews, bumped into your videos, and I am just glued here. It's a one-stop shop for all the amazing system design tips and tricks. Waiting for more videos like this!!
Thanks for the comment :)
I have covered almost all of your videos. Please try to upload videos once a week; they are really helpful for us.
Thanks for the comment. Will start uploading more videos soon.
You have a very nice way of explaining. I like the way you support your logic with the reason. Great work!!
Thanks for the comment 🙂
Great content as always, I love the aspect of comparing against other solutions out there. Thanks!
Two advantages of having a KGS imho are:
a. The ability to independently change the logic of key generation and store the keys and deploy that microservice.
b. It can generate random short URLs (as against predictable ones in the counter logic), ensure there is no collision and store them.
thanks for the comment :)
@@ThinkSoftware I bought your course and the depth of analysis for each design was impressive, gave a lot of confidence for my FAANG interviews. The volume of content, in terms of number of design questions, was on the lower side though.
@@ganzee6928 Thanks for buying the course. Yes, as far as the number of chapters is concerned, it is still far from my target number, which is why I am giving unlimited access to all my students right now. There are three chapters in the pipeline: 1. Stripe System Design, 2. Object Storage System Design, and 3. Design of a Distributed Datastore, with Stripe System Design coming very soon. As you can see, creating quality content is very time-consuming, especially since I am an architect at my current company and very busy with my primary job, but you have seen the quality of the content/chapters that I have written.
@@ThinkSoftware Completely understand the time constraint. Your course quality (specifically the depth and comparative analysis) is one of the best I have seen online. Looking forward to going through the new material when published. Thanks.
12:00 There's a standard CFT lemma which says that if there are 2f+1 total nodes in a distributed system, then that system can tolerate at most f node failures. So if changes are deployed to one server at a time (the smallest number of servers we can deploy to), and that server suffers some downtime during deployment, then tolerating f = 1 failure requires 2(1)+1 = 3 nodes to keep the system running. That's why we keep a minimum of three servers. This doc can be used for reference: en.wikipedia.org/wiki/Quorum_%28distributed_computing%29.
Thanks for the comment :)
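The 2f+1 arithmetic from the comment above can be sketched as a tiny helper (the class and method names are illustrative):

```java
// Hedged illustration of the 2f+1 rule: with n nodes, a crash-fault-tolerant
// majority-quorum system survives floor((n-1)/2) failures.
public class QuorumMath {
    // How many node failures a cluster of this size can tolerate.
    public static int tolerableFailures(int totalNodes) {
        return (totalNodes - 1) / 2;
    }

    // Smallest cluster that can lose f nodes and still keep a majority alive.
    public static int minNodesFor(int failures) {
        return 2 * failures + 1;
    }

    public static void main(String[] args) {
        System.out.println(tolerableFailures(3)); // prints 1
        System.out.println(minNodesFor(1));       // prints 3
    }
}
```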
Please make more videos. I recommend you introduce some good and important concepts (but not things already covered by other people) in your YouTube videos, so that junior SDEs can learn a lot from you. And later, they will buy your course if they want to know more details!
Your videos are the best. I am super willing to buy your course!
I will try my best
I really like the way you explain generating the counter from the database. Nice explanation, keep it up!
Thanks for the comment
Excellent pace. Thank you!
You're welcome!
like your channel, your bottom up design skills are unique. thank you!
Thanks for the comment 🙂
Excellent job in explaining the concept and tradeoffs in different approaches.
Thanks for the comment :)
Thank you! I love your videos. Very informative and helpful. Keep making great content please!
Thanks. I need comments like this to get more motivation for making more videos 😁
I am thankful to have found this channel. Amazing video.
Thanks for the comment 🙂
Thanks for creating these videos. I really like your approach of asking questions and making the audience think. Would it be possible for you to create a video on the system design of Google Sheets?
super like... well explained
Thanks 🙂
Very nice explanation, sir. Please add more design videos.
Thanks for the comment 🙂.
Explained in detail and clearly.
Thanks for the comment
@@ThinkSoftware github.com/Urunov/upcoding
I started coding every day; here is the TinyUrl source code.
Nice explanation, but the counter-based approach is not random, which is one of the functional requirements listed at the beginning.
Considering 3+ app servers, the way the LB routes requests to them, and the fact that you don't know how many requests are coming to the service from other users, it is somewhat random (although each value falls within a range).
Very nice video.Thank you!
Thanks for the comment 🙂
The approach of using a counter for generating short URLs would NOT be very optimal if the user is given a choice to pick a short URL of their choice. In those cases, let's say counter values 100-200 could contain already-used short URLs. The application server would then need to check the database whether a value between 100 and 200 is already taken, and use it only if it is free. Let me know your thoughts on tackling this extension of the problem.
I think I have discussed this in the course :)
Thanks for the video. Also Create a video on tinder system design.
Have you seen th-cam.com/video/XFQIW2R_Klk/w-d-xo.html
1. Think over the Functional requirements.
2. Think over the Non-functional requirements.
Thanks for the detailed explanation. How do you ensure that the same big URL is assigned the same TinyURL if it's submitted multiple times?
Thanks for the comment. This was one of the requirements we discussed: we are not going to return the same tiny URL for a big URL. Let's see what others will say if we change this requirement.
Excellent video.
Thanks for the comment
@@ThinkSoftware The pace you work at is very helpful for me. The way you approach the problems and the speed at which you work are both very beneficial while I study system design.
Thanks for the nice words 😊
Hi, thanks for the video! Can you explain the transition from 128 bits to 21 characters in the MD5 encoding approach?
For base64 encoding, each character encodes 6 bits. So if you encode every 6 bits of the 128-bit MD5 hash as a base64 character, you need 128/6 ≈ 21.3, i.e. about 21-22 characters in total.
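A quick way to sanity-check the character count from the reply above, using Java's standard MessageDigest and Base64 classes (the helper class and method names are made up for illustration):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;

// Hedged check: MD5 yields 16 bytes (128 bits); base64-encoding them needs
// ceil(128/6) = 22 significant characters, 24 with the standard "==" padding.
public class Md5Length {
    public static int encodedLength(String url) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(url.getBytes(StandardCharsets.UTF_8));
            String b64 = Base64.getEncoder().encodeToString(digest);
            return b64.length(); // always 24, of which 2 chars are '=' padding
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encodedLength("https://example.com/some/long/path")); // prints 24
    }
}
```

Dropping characters from this 22-character output down to 6 is exactly the bit-truncation that causes the collision risk discussed elsewhere in this thread.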
job well done !!
Thanks for the comment
Great system design. Regarding API design, I have been wondering whether we should use the "redirect" approach, where get_long_url returns a 302 response, or the approach you are suggesting in your design. I am leaning towards what you suggested, but it probably means we will need a layer of code to redirect after getting the long URL. Curious what your thoughts are on that.
In the non-functional requirements, you said strong consistency. Can you explain how different components (specifically the datastore/counter database) ensure strong consistency during component failures? Thanks!
Will make a future video on the design of a distributed datastore; will discuss these things there.
How are you linking generating a short URL with counters/integer values? I don't understand the algorithm/logic for that.
If you check I have explained it in other comments.
Thank you for the great content. I need some clarification on key generation.
Suppose the counter value is 65535. Base64-encoding the string "65535" gives "NjU1MzU".
Now, as you replied in a comment below, if the base64-converted value is less than 6 characters, we can add 0 or z as a prefix to make it 6 characters.
But here, for counter value 65535, the base64 value NjU1MzU is longer than 6 characters. How will we make it 6 characters? If we trim some characters, it will be the same issue as trimming the MD5 hash value.
Please help. Thanks in advance. :)
What you did is string encoding of "65535". What you need to do is first convert the integer 65535 (base 10) to its equivalent representation in base 62 or base 64 (e.g., its equivalent in base 16 is FFFF). Once you have the base62 or base64 representation of the integer, you encode it as a string.
Hi, thanks for the video. I have one doubt regarding the counter approach. You mentioned that the app server ranges will be 101-200, 301-400, or even 1001-2000, etc., but if we convert any number within these ranges to its base62 representation, it will generate a key of fewer than 6 characters, since you decided a key should be 6 characters long in your requirements. How are you going to resolve this problem?
100 can be considered 000100
@@ThinkSoftware I think even if you consider 100 as 000100 and then convert it to base 62, I don't think it will end up as a 6-character key. Someone please correct me if I am wrong.
This is after conversion to base62. Another example: let's say after conversion it is abc. This can be padded to 000abc or zzzabc, depending on which character represents zero in the base62 encoding (i.e., 0 or z).
@@ThinkSoftware First, thank you for the video; it's very well explained. Following up on the above question: assuming we have counters in the DB which are regular decimal numbers, and we then use base64 encoding on the ranges on the app side, what if the counter goes above, say, a million? Then the base64-encoded value of 1 million is longer than 6 characters, and taking a substring of it would be the same issue as with MD5, correct? Thanks again for the video.
Base64 with 6 characters can denote up to 64^6 values. You misunderstood how the base64 conversion is being done.
I think you can always skip auth APIs; internal applications integrate with internal identity brokers, and external applications integrate with external identity brokers. You will almost never implement authentication from scratch.
Thanks for the comment
I've watched some of your videos; thanks for the great content. One thing I noticed is that the voice quality is sometimes not good, too quiet or unclear. Maybe the microphone is too far away or something.
Thanks for the feedback. Let me know whether you find this issue in other videos as well, especially the newer ones. I am still learning to produce better videos day by day 🙂
What if the counter DB fails? It is not fault-tolerant in that case, right?
How do we convert an int value to a 6-character short URL?
The conversion can look like the following, considering base62:
0 -> 000000
1 -> 000001
9 -> 000009
10 -> 00000a
11 -> 00000b
35 -> 00000z
36 -> 00000A
37 -> 00000B
61 -> 00000Z
62 -> 000010
63 -> 000011
and so on.
Very well explained. This might not be a reasonable question, but it is possible: how do we scale it to some arbitrarily large number (billions) of transactions per day?
Thanks for the comment 🙂. We scale by horizontally scaling our app servers and sharding the database.
Very nice explanation; especially nice to see a comparison of the different answers available. I have one doubt: in the final approach, you said that we will keep a counter in the datastore, but since we have around 68 billion possible keys, won't the counter value reach 68 billion after some time? Wouldn't storing that huge number in the DB be a problem?
Thanks for the comment. Note that a 64-bit integer has 2^64 possible values.
@@ThinkSoftware Thanks, got it. 64^6 = 68,719,476,736, whereas 2^64 ≈ 1.8 × 10^19, which is very big, so a 68-billion counter can be accommodated easily.
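The arithmetic from this thread can be verified with a small sketch (the class and method names are illustrative):

```java
// Hedged check of the key-space math discussed above: base^length possible
// keys, all of which fit comfortably inside a 64-bit counter (2^64 values).
public class Capacity {
    public static long keySpace(int base, int length) {
        long total = 1;
        for (int i = 0; i < length; i++) {
            total *= base; // safe here: 64^6 is far below Long.MAX_VALUE
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(keySpace(64, 6)); // prints 68719476736 (~68.7 billion)
        System.out.println(keySpace(62, 6)); // prints 56800235584 (~56.8 billion)
    }
}
```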
Nice video. Question: how are spam and malicious links handled, besides the login requirement?
Thanks for the comment 🙂. This question will be handled in the upcoming book 😉 What do you think, how could these links be handled?
Why put the userToken in the API signature? It's usually in the header.
Thanks for the comment. Here I am passing the userToken in the API to emphasize the need to pass it. The API section describes the API at a high level; at the REST API level, the userToken will be set in the header.
As long as you take care of security/rate limiting and DDoS attacks, I don't think randomization is really needed.
let's see what others think
17 minutes into the video and all I can see is a generic approach to making a distributed system scalable and available, while the title speaks of URL shortener system design. Am I wrong to expect TinyURL-specific design throughout the video, as opposed to only the last part?
Thanks for the comment. This was my first system design video so I discussed several other concepts.
The TinyUrl problem is a very generic, very simple problem that introduces the viewer to how to think about scalability and how to handle availability. It's not a complicated problem by any means, unlike Twitter, Netflix, or Facebook. Most of the concepts used here will be taken for granted when you try to solve more complicated problems. However, in the more complicated problems you will face issues that require making a simple design more complicated. It's a good starter for system design videos.
Storing the user as a foreign key: does it not make data duplication high, since all 100 users can ask to create a tiny URL for the same big URL?
If a collision occurs, what is to be done (we are choosing the first 7 chars from MD5 hash + base64 + counter values)?
We can try using another hash algorithm, or choosing specific index chars in the output of (MD5 hash + base64 + counter values).
Issues:
1) For every request we need to check in the DB whether that short URL already exists.
2) Let's say we need the same short URL for every request for a long URL. In that case, once the first request has generated a short URL with the solution I mentioned above, every subsequent request for the same long URL has to perform that solution (extra steps) to generate the same URL.
Your opinion on this?
Thanks for the comment. I already discussed this issue and provided a better algorithm in the video. Maybe you didn't watch the video completely.
Instead of relying on an external data store for getting valid count ranges, why don't we just append a UUID to every input URL, hash it with MD5, and base62-encode it? This reduces the possibility of collisions and eliminates the network overhead. The individual app servers can generate UUIDs independently.
I have already discussed this in the course and in this video: the MD5 hash can't be used because, for a 6-character short URL, we have to drop bits from MD5, which will result in conflicts.
Hey, this video was great. You have made all the concepts abundantly clear. I had a quick question though. While sending the write request, we ask the user to provide the API key, right? Can't we just use that along with the counter variable, which keeps incrementing until a unique URL is generated? I'd think that might tick up the overall complexity.
Thanks for the comment 🙂. Are you suggesting different counters for different users? This will not work due to conflicts, and you would need to keep increasing the counter.
@@ThinkSoftware I was just thinking of using the user info and the provided URL to generate the unique URL. Not sure how that'd pan out.
@@ThinkSoftware Another quick question. I'm trying to design this system in Node.js, so I have created 3 Heroku accounts for 3 different servers and one for exposing the API service. I was pondering whether creating the global cache in the API server would be a good idea. We could expose the read operation for the global cache through the API: whenever a read request is sent to any of the servers from the API, it calls the cache endpoint on the API, and if no match is found, the server checks its own local cache and then ultimately resorts to accessing the database. What's your stance on this approach?
You will start seeing issues when you have lots of short URLs in the database.
I think I already discussed the pros and cons of a local vs. global cache in the video.
I don't really agree with having 3 servers running all the time just to anticipate maintenance or new releases. You can have 2 servers running, and when you need to release a new version you just spin up another instance; once it's successfully launched, you remove one of the old instances, and then you do the same with the other instance. It's a kind of blue-green deployment.
Thanks for the comment. It totally depends on what type of infrastructure support your service has. If the underlying infrastructure supports adding and removing hosts seamlessly, and you don't have to go modify various configuration settings, then you can go that route. Otherwise, it is less hassle to have at least three servers (and I have rarely seen anyone use the route you suggested in my 15+ years of experience). You also need to consider your CI/CD pipeline: we don't do deployment or maintenance once a month; nowadays the deployment frequency is much higher, if not daily then still once or twice a week. Of course, when you are starting out (think a startup of two or three people), you can still go with two servers, because you might be at a stage where you are the only person (or one of two or three) who knows when to perform the deployment. But I am talking in general, about working at a cloud company at the scale of Stripe, Lyft, Uber, or FAANG.
Which datastore will you use for maintaining the counter? Will that also be stored inside the NoSQL DB?
Any database that supports either transactions or database sequences/counters.
@@ThinkSoftware Okay, so just for this single-column, single-row value, do we have to deploy a distributed database like MySQL here? For the rest of the system you are recommending a NoSQL DB, which would not let you use sequences that easily...
@@sanketvictor You have many options. There are NoSQL databases that support (lightweight) transactions and DB counters as well; you can use those. As a last resort, you can have a separate SQL DB deployment for this purpose.
I had a question on the range of keys maintained on each app server. Would multiple threads not have to synchronize with each other to get the next available offset in this range, leading to higher latency? I am assuming this counter logic is needed only when we run into collisions.
We would use an atomic increment (e.g., InterlockedIncrement) to get the next counter value in the range.
Can we have separate database instances for reading and for writing, so that lookup and creation can happen at the same time?
This goes into a more detailed discussion of how replication happens between the DB instances (synchronous vs. asynchronous), and then what the consequences are for reads and writes if we use synchronous vs. asynchronous replication.
If you're bulk-pulling the IDs per web server, how are you safely sharing and incrementing that value between requests?
Thanks for the comment. No one is bulk-pulling IDs. There is just a counter at the DB level that you atomically increment by some threshold (e.g., 100 or 1000); when the app server gets the counter value, it can assume it may give away IDs in the range [counterValue, counterValue + threshold).
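A hedged sketch of the range scheme described in the reply above, simulating the DB-side counter with an AtomicLong; in a real system the increment would happen in the database via a transaction or sequence, and all names here are assumptions:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hedged sketch: the "database" hands out a counter value atomically bumped
// by a threshold; each app server then serves IDs from
// [counterValue, counterValue + threshold) entirely locally.
public class RangeAllocator {
    private static final long THRESHOLD = 100;
    // Stand-in for the DB-side atomic counter.
    private static final AtomicLong dbCounter = new AtomicLong(0);

    private long next;  // next ID this app server will hand out
    private long limit; // exclusive end of the currently claimed range

    public synchronized long nextId() {
        if (next >= limit) {                       // range exhausted (or first use):
            next = dbCounter.getAndAdd(THRESHOLD); // claim a fresh range from the "DB"
            limit = next + THRESHOLD;
        }
        return next++;
    }

    public static void main(String[] args) {
        RangeAllocator serverA = new RangeAllocator();
        RangeAllocator serverB = new RangeAllocator();
        System.out.println(serverA.nextId()); // prints 0
        System.out.println(serverB.nextId()); // prints 100 (disjoint range)
        System.out.println(serverA.nextId()); // prints 1
    }
}
```

Each ID would then be run through base62 conversion to produce the 6-character key; because the ranges are disjoint, no two servers can ever emit the same key.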
You mentioned that we want high availability and strong consistency. How can that be possible per the CAP theorem?
I think I have to make a video on this soon. People have misunderstood the CAP theorem. First of all, they don't understand in which scenarios the CAP theorem is applicable. Secondly, according to the CAP theorem, in the scenarios where it applies, it is only in the presence of a partition that you can't have both high availability and strong consistency and must give up one for the other. It does not mean that when there is no partition you still can't have both. And there is a third reason as well, which I have answered in other comments before.
What if there are too many write requests, so that lots of app servers need to fetch counter ranges, resulting in write failures and overload on the counter DB? To avoid this scenario, do you think that instead of a counter stored in the DB with app servers going to it directly, we could use a counter service that pushes counter ranges to a queue, and app servers just pick a range from it? It could keep pushing to the queue at regular intervals. This design is more complex and is only needed if there are too many write requests to handle.
We can always increase the range size if that happens. With 100 as the size, servers go to the DB only once every 100 IDs, and the scenario you mention would require all the app servers to somehow exhaust their ranges of 100 at the same time.
Thanks for your video. One question: when you mentioned the non-functional requirements (@5:32), you included high availability and strong consistency, but based on the CAP theorem we can only choose one of them. How can we keep high availability and strong consistency at the same time?
Thanks for the comment. As we discussed, the service has strong consistency in the sense that once we create and return a short URL to the user, if the user queries the big URL using that short URL, the service should be able to return the big URL; otherwise it would not be a good user experience. Now the question arises how a service can be available and consistent at the same time. It boils down to the fact that no service is 100% available. The question is whether we are targeting five 9s or four 9s of availability, and also whether we are talking about the availability of the whole service or some part of it. The service itself can be highly available (as a whole) and at the same time fulfill the consistency requirement as a whole (the keyword is "as a whole"); however, individual partitions (in the datastore) may have to prefer either availability or consistency. It is entirely possible that one partition is down due to a network partition without affecting the functionality of the other partitions, so the overall service is still considered highly available. And please note, I didn't discuss the design of the datastore, where we would discuss all these things (that will be the topic of a future video). I hope this clarifies.
@@ThinkSoftware Thanks a lot !~
I would like to add on to what has already been said: the question of choosing between availability and consistency arises only in the case of a partition. Otherwise, in a normal running scenario, a service can be both available and consistent. Now let's discuss the partition scenario: we could make the read service highly available while the write service becomes unavailable in the event of a partition, thus maintaining consistency. So different microservices of a system can be configured differently to favor availability or consistency! Hope this helps.
Thanks for extending my answer with great explanation 👍
@@ThinkSoftware I would think eventual consistency should be OK. Most of the time, newly created short URLs may not be immediately queried: the flow would be that a user creates a short URL and posts it to his followers, who may not click on it immediately. If we use a NoSQL database like Cassandra, we can improve read performance tremendously.
Thanks for the video, but I have a question for you.
How is this approach any different from the ZooKeeper approach? Your data source is still a single point of failure, whereas ZooKeeper, on the other hand, is considered more reliable and highly available.
The datastore is not a single point of failure. I didn't discuss this in this video, but it is already discussed in the course, which you can check out. Now, regarding using ZooKeeper to generate the counter values: I think people suggest that without even thinking about what it means. In an interview, you cannot just say you will use ZooKeeper to generate the counter values, and that ZooKeeper will magically handle generating this value while also providing high availability (i.e., no single point of failure). Generating a unique sequence is the crux of this tiny URL system design, and an interviewer will then ask how ZooKeeper achieves it. In practice, nodes/machines in ZooKeeper fail, and they fail all the time. When one server in the ZooKeeper cluster fails, the system somehow needs to make sure that the others don't return a duplicate range. The only way to do that is to make all servers agree on which ranges have been given out and which have not, and that is the distributed consensus problem; in fact, ZooKeeper uses a consensus protocol for this as well. Now, if you want to use ZooKeeper just to assign different counter ranges, that is a poor use of ZooKeeper, because you can achieve the same thing by providing counter ranges in some configuration form, where all the servers read their ranges from a file uploaded somewhere. You don't need to run and maintain a separate service for it. This was a long answer, as it was not easy to explain in a few words; I hope it clarifies your question.
Wouldn't it be a good idea to require the token in the GET request as well, to prevent attacks such as DDoS?
which token are you talking about?
Can we use a timestamp and node ID instead of maintaining the token counter?
What do you think the issues are with your approach?
Will the global cache also trigger invalidation of the app servers' caches?
Let's say that on app server 1, nothing was found for a mapping after going to the DB (there was no mapping yet), and that miss was cached.
Some time later, a write request for that same mapping arrives via app server 2, so after going to the DB the global cache now has the mapping.
Now, if a request from a different user arrives at app server 1, since it cached the miss for that same mapping, we will return wrong data.
So if any write happens on the global cache, should it update all app servers to invalidate their cache entries for that mapping?
We cannot use a local cache in that case. There is no way to invalidate the local cache when a write happens. I think I already discussed that in the video.
If one of the app servers goes down, won't we lose a range of usable values for URL generation? I would try to think of a way to reclaim a range.
It's a very small trade-off. 6 characters can generate 68B combinations, and we can easily afford to lose some chunks in case of server failures. Also, when the cron job runs to delete expired tiny URL mappings, we can add those tiny URLs back to the table of available URLs so that they can be reused.
Are you suggesting converting the long number value to bytes and base64-encoding those bytes?
If yes, then converting 67 billion = 67000000000 to a base64-encoded string returns "AAAAD5mC3gA=", which has length 12. How do you then shorten it to length 6 again?
Or, since a long in Java is a 64-bit number, base64 encoding with 2 bits of padding makes it 66 bits, and 66/6 = 11.
So the encoded string length can be 11.
import java.nio.ByteBuffer;
import java.util.Base64;
long n = 67000000000L;
final ByteBuffer allocate = ByteBuffer.allocate(8);
allocate.putLong(n);
final byte[] bytes = allocate.array();
final String s = Base64.getEncoder().encodeToString(bytes);
System.out.println(s); // prints AAAAD5mC3gA=
Both of the approaches you mentioned are not correct. Consider the base-10 number 15: when you convert it to base 16, it becomes F. 10000 in base 10 becomes 2710 in base 16. In the same way, you convert an integer to base62 or base64, whichever you choose. More information is in the course; there is a free preview chapter on the TinyURL design.
Question: in your approach to generating URLs using a counter maintained in the DB (discussed at 38-40 minutes in your video), I'm unable to understand one thing. Let's say we use counters to generate a number range, but we are limited to 6 characters; doesn't that mean we can only generate short links for counters from 0 to about 100000 before we exceed the 6-character limit? How can we get 62^6 combinations? Can you please explain with a sample URL how it will look?
6 characters hold a base62-encoded number, not a decimal (base 10) number. The 100000 you mentioned is base 10; we convert it to base62, which is much shorter.
@@ThinkSoftware Are you suggesting converting the long number value to bytes and base64-encoding those bytes?
If yes, then converting 67 billion = 67000000000 to a base64-encoded string returns "AAAAD5mC3gA=", which has length 12. How do you then shorten it to length 6 again?
Or, since a long in Java is a 64-bit number, base64 encoding with 2 bits of padding makes it 66 bits, and 66/6 = 11.
So the length can be 11.
Can't we just use the semaphore concept to keep the counter value, so as to avoid any conflict?
A semaphore/mutex works within a single codebase/process, but here we have multiple servers. If we used a mutex on the DB side or in the cache, the problem would be the same: that DB or cache would become a bottleneck. That's why he increments it by 100, to reduce the load on the DB.
You talked about updating the counter values inside a transaction construct, but then we're going to store this in a NoSQL datastore. Is this transaction construct possible in DynamoDB or Cassandra-like DBs? Maybe Spanner or CockroachDB offers this, but my question is in a generic sense. Thanks for the awesome work!
Thanks for the comment. Other databases provide similar constructs as well. E.g., Cassandra provides atomic writes and lightweight transactions for updating a single record, etc.
@@ThinkSoftware But between two tables, how will you achieve a transaction in Cassandra?
@@arunagiriswarane1155 Which scenario in our design requires updating two tables in a transaction?
@@ThinkSoftware When generating a new mapping between short and big URLs, we have to both add it to the mapping table and update the URL count in the user table. Should these two write operations be done in one transaction?
@@amandama6165 I agree this is a problem. Also, lightweight transactions require a quorum, so it's not exactly scalable.
Does the application server save the long URL to the DB as soon as it creates it (before returning to the user)? Otherwise, if app server 2 gets a request to fetch a long URL, it might think it does not exist, because app server 1 didn't save it to the DB.
Yes. I think I discussed this in the video as well.
@@ThinkSoftware But what if app server 1 can create URLs 1-100 and app server 2 can create URLs 101-200? App server 1's 99th URL-creation request is for www.123.com, and the server creates the shortened URL. Some time later, app server 2 gets a request to shorten the exact same URL; it is app server 2's 102nd request. Now one long URL has two different short URLs. How does the system avoid creating many different short URLs for the same long URL without a URL-shortening service? Or are the system designers accepting the cost and overhead of having many different short URLs for the same long URL?
It all depends on the requirements, and the design changes if the requirements change. Here, our requirements didn't include always having a single short URL for a big URL. Instead, our very first functional requirement is to always return a unique short URL, if you check again.
Hi @ThinkSoftware, I have a question. Let's suppose we have 3 app servers and we try to deploy a new build. Obviously, we can't deploy the new build on all the servers at the same time. So if we go one by one, it's possible that a user accessing our service gets a response from a machine where the new build has been deployed, and on a subsequent access gets a response from another machine where the new build is not there yet (as routed by the load balancer). Is this a valid case? How does this work in the real world?
Yes, this is always the case. So any new feature deployment needs to make sure that you don't enable the feature until the deployment is complete.
@@ThinkSoftware I am not sure how this is possible in every case. I know Spring Cloud provides a way to keep our configuration in a central place, and we can use it to enable a feature after deployment. But suppose we are deploying a front-end app (server-side rendering), and my change was just to change the background color of a particular div/modal from yellow to blue; how can I enable that feature only after deployment to all the app servers?
It depends on what you are trying to achieve. Considering just your example, you should come up with an intelligent way to update the background color without even needing a deployment. In reality, when we work on a cloud service, our feature work includes design, implementation, testing, and even the deployment strategy best suited for enabling that feature.
@@ThinkSoftware Got it, thanks a lot for this awesome video
@@rahulbatheja7933 The answer to your question is blue-green deployment. In the cloud you anyway have the ability to add instances as required. So while your blue server is serving customer requests, you do the deployment on the green server, and once the deployment is complete, the green server takes the place of the blue server; that transition is very smooth. This is called a rolling update, where you will not see any service disruption.
Can the counter value go out of range?
A 64-bit integer has a long way to go if you calculate it, and by that time the counter going out of range would be the least of your worries 🙂
Sorry, what is the algorithm for the short URL? How do you use the ID from the DB to get the short URL?
This is discussed in the video.
@@ThinkSoftware Could you please point me to the time in the video where this is discussed?
I would have to watch the video again to find the timestamp, as it has been a long time since I uploaded it.
If the datastore is a single machine, won't that defeat the purpose of high availability if it goes down?
Of course
@@ThinkSoftware In your example, is the datastore a single machine, or did you just use a single drawing to represent it?
It is not a single machine; it is just a logical diagram of the data store. If we had gone inside that logical component, the discussion could have covered how to achieve all the non-functional requirements in that data store. Check my other video on the introduction to distributed systems, where I discuss this in a bit more detail.
There is a single point of failure in using a single DB for generating keys.
Thanks for the comment. There are various ways to handle it; I will discuss them in the course.
In the counter example with the datastore, isn't it a single point of failure? How do you fix that in the case of a counter in the DB?
Thanks for the question. This has been discussed in the course. Short answer: master-slave architecture.
@@ThinkSoftware What if the master fails before the newly incremented value is replicated to the slave?
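One common mitigation (an assumption here, not necessarily the course's stated answer) is to hand out counter *ranges* in blocks rather than single values: the coordinator durably advances the high-water mark before returning a block, so a crash can only waste unissued ids, never hand out a duplicate. A minimal sketch:

```python
# Hypothetical block-based id allocator. In a real system next_start
# would be persisted (write-ahead) before a block is returned, so a
# failover resumes from the last durable high-water mark.

class RangeAllocator:
    def __init__(self, block_size: int = 1000):
        self.block_size = block_size
        self.next_start = 1          # durably stored in practice

    def lease_block(self) -> range:
        start = self.next_start
        self.next_start += self.block_size   # persist before returning
        return range(start, start + self.block_size)

alloc = RangeAllocator()
a = alloc.lease_block()   # ids 1..1000 for server A
b = alloc.lease_block()   # ids 1001..2000 for server B
assert set(a).isdisjoint(b)
```

The worst case after a failover is a gap of unused ids, which is harmless for a URL shortener since keys only need to be unique, not dense.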
One requirement will not be achieved using this approach, sir:
- the short URL should not be guessable.
If you check my course, I have discussed there in detail how this is achieved with this approach. The good news is that the TinyURL system design is a free preview chapter.
If we increase the size of the tiny URL to 7 characters, can we use base64 encoding, or do we have to use base128?
Using base62, base64, or base128 has nothing to do with using 6 or 7 characters. With 6 characters you can use base62, base64, or any other base encoding, as long as the characters are valid in a URL path (note that the standard base64 alphabet's `+` and `/` are not URL-safe; the URL-safe base64 variant replaces them with `-` and `_`).
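The base only changes the size of the key space for a given length, not the length itself. The arithmetic:

```python
# Key-space sizes for 6 vs 7 characters under different alphabets.
print(62**6)   # 56,800,235,584      -> ~57 billion keys
print(62**7)   # 3,521,614,606,208   -> ~3.5 trillion keys
print(64**6)   # 68,719,476,736      -> ~69 billion keys
```

So even base62 at 6 characters already covers tens of billions of URLs; moving to 7 characters buys a ~62x larger space regardless of which encoding you pick.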
At 27:29 you say something like "in one of the online resources the author has used + and / characters, but these are invalid characters." True, but I would highly recommend not trying to belittle others in your videos. I can find several flaws in your videos too, and I am sure other YouTubers can as well, but the professional ones stick to the topic and don't take it as an opportunity to criticize others' work. I have seen some other TinyURL videos where the author explained the concepts much better than yours. By the way, I also purchased your paid course and I am regretting the purchase. It seems like you started this as a hobby project three years back; you are selling it online, but more recently you have lost interest in it and there is not much content. All I see are those mock interviews, which have little benefit. One shouldn't really be paying for that kind of content. There is plenty of free material online which is much better.
Thanks for the comment. I only mentioned that an online course used those characters and that using them is invalid. I didn't mention any name, so how am I belittling someone? I just highlighted a mistake to avoid. I also appreciate your feedback about my course. There are already 10 chapters on the design of 10 different systems, such as TinyURL, Twitter, Netflix, Uber, Stripe, distributed queues, etc. I hope you did go through them, so I would like to know what you didn't like. I do plan to add a few more chapters on the design of object storage, a distributed DB, etc. before considering the course complete.
@@ThinkSoftware
I completely agree with you, as you have not mentioned anybody by name; moreover, pointing out that something is incorrect is a better way to guide the broader community.