Grokking the Uber System Design Interview - Ride Sharing Service Design | OLA System Design

แชร์
ฝัง
  • เผยแพร่เมื่อ 28 พ.ย. 2024

ความคิดเห็น • 217

  • @ThinkSoftware
    @ThinkSoftware  4 ปีที่แล้ว +19

    Thanks for watching the video.If you need to, you can play the video at 1.25x speed instead of skipping the video and in that case skip some important point. Please let me know in the comments below if you like this video. Thanks.

    • @winning_aadict
      @winning_aadict 4 ปีที่แล้ว

      This is awesome. Thank you sir

    • @motivationmentoring2298
      @motivationmentoring2298 4 ปีที่แล้ว

      Thank you it is amazing work keep it up

    • @vishalsarda9764
      @vishalsarda9764 4 ปีที่แล้ว

      Does the map service store some data? If yes, what data and in what format?

    • @ThinkSoftware
      @ThinkSoftware  4 ปีที่แล้ว

      Map Service stores map data. There are different formats of map data that you can find the details in the course :)

    • @namduong1580
      @namduong1580 3 ปีที่แล้ว

      Great tip, thank you!!

  • @neosapien247
    @neosapien247 4 ปีที่แล้ว +7

    This is the best video for this topic. A lot of videos describe how Uber is designed, not how you, the "candidate" would design Uber. They are not looking for your understanding of Uber's real architecture. They want to know how creative and logical YOU are if given the task. This is the perfect tutorial for that.

    • @ThinkSoftware
      @ThinkSoftware  4 ปีที่แล้ว +1

      Thanks for the comment 🙂

  • @snehalchavan5796
    @snehalchavan5796 4 ปีที่แล้ว +3

    The only reason I watched this video was because I wasn't satisfied with the solution in 'grokking the system design interview'.
    I must say that you have done a very good job at explaining the ride share system design and it'll definitely help me in my interviews.

    • @ThinkSoftware
      @ThinkSoftware  4 ปีที่แล้ว

      Thanks for the comment 🙂. You should also check my course on System Design Interviews Bible then.

  • @TarunPrasadKottary
    @TarunPrasadKottary 4 ปีที่แล้ว +7

    Finally, some fantastic videos on System design!

  • @bostonlights2749
    @bostonlights2749 4 ปีที่แล้ว +3

    For those who made it till the end,
    This is quality stuff

  • @zvezdalion
    @zvezdalion 2 ปีที่แล้ว +1

    Learning so much about how to think about system design. Your style of teaching really emphasizes that, rather than trying to cram all the specifics of how this or that software was made. Thank you!

  • @binitasinha2695
    @binitasinha2695 4 ปีที่แล้ว +1

    I have gone through many system design videos but only subscribed to your channel as I never have seen such an informative video ever. Thank you for sharing your knowledge. kudos

    • @ThinkSoftware
      @ThinkSoftware  4 ปีที่แล้ว

      Many thanks for the nice words 😊

  • @Brandonstiles
    @Brandonstiles 4 ปีที่แล้ว +4

    Great video! Good call on pointing out the need to pre-plan analytics and monitoring

  • @chanm2147
    @chanm2147 10 หลายเดือนก่อน +1

    Hello, what is the algorithm you mentioned at 53:28 to find the global optimized solution? Is it K-nearest neighbor?

  • @tarun29061990
    @tarun29061990 4 ปีที่แล้ว +1

    Wow!! Such a clear explanation about each and every service. Great work mate :) I have watched lot of videos on Uber system design but your video made it super clear. Great part is you compare the pros and cons of each approach. Please keep doing this in your future videos. Thanks :)

    • @ThinkSoftware
      @ThinkSoftware  4 ปีที่แล้ว

      Thanks for the comment :) More videos will be coming soon. Till now I was busy in releasing my course.

    • @tarun29061990
      @tarun29061990 4 ปีที่แล้ว

      Think Software share the course link as well

    • @ThinkSoftware
      @ThinkSoftware  4 ปีที่แล้ว

      @@tarun29061990 I have my latest video discussing the course with some discount coupons.

  • @sivakumaresan9602
    @sivakumaresan9602 3 ปีที่แล้ว +2

    Excellent video! Thank you for the detailed explanation.
    I do have one comment. At 59:30 you say that the 'Global index' method of maintaining secondary indexes would allow all records belonging to 1 driver to exist within 1 partition. That may not be the case since the data is partitioned based on the primary index, i.e., Trip ID. Hence, there is no way for us to enforce all data belonging to a driver to exist in the same partition.
    What makes 'global index' method different is that it would store the secondary indexes in a separate location (outside the data partitions) that would allow the queries to be routed to the exact partition(s) where the data exists. In other words, the mapping between each driver id and the location within the (Trip ID-based) partitions is stored separately as a Global index. Similarly there will be another global index for passenger ID. These global secondary indexes can be further partitioned based on it's values, i.e., based on Driver ID or Passenger IDs. I guess you were referring to my last point in your video, but it came out different. :) Also, note that 1 driver ID could exist across multiple (Trip ID-based) partitions. The global index for driver ID will let us know what partitions to look for, instead of doing a scatter-gather approach with Local indexes.

  • @shabnamkoosha
    @shabnamkoosha 3 ปีที่แล้ว +4

    By far the best design video I have watched. Very detailed and organized. Thank you.

  • @deepakchhabra5932
    @deepakchhabra5932 4 ปีที่แล้ว +1

    To answer your question in the end which Partition approach is preferable for secondary indexes: Local vs Global - then the obvious choice is global because if we want to query by tripId we already got a PK based partition, and when we need trips for driver or passengers then we should have Global index so that we don't do scatter gather, idea is to get to the details quickly. Now it also means my writes to trip table will be slower because of additional global indexes but Trip table is not so time critical like driver location or pricing etc, as long as trip creation is successful we can tolerate little bit of delay here, also, we picked relational DB as a choice for Trips because we want consistency so if we made that decision right then little bit of slowness is acceptable. I'm not sure if at DB level some kind of optimization is possible where secondary indexes could be built async- could be explored. Nevertheless, great content as always, tips you discussed are nice. Amazing work. Thank you.

  • @mohajeramir
    @mohajeramir 2 ปีที่แล้ว +1

    Really appreciate you doing this. There is a wealth of information here.

  • @RY-it3de
    @RY-it3de 3 ปีที่แล้ว +1

    LMAO the shade thrown at the Grokking course about the Driver Location service discussion. I too was confused about the QuadTree / HT suggestion

  • @dhwanip1000
    @dhwanip1000 2 ปีที่แล้ว +1

    Thanks for the wonderful video. My question here is what areas to focus on in actual interviews?. Suppose, I am asked this question in the interview, how would you suggest to start and dive deep into each part? Also, not everyone would know everything about these services, in that case, how would you advice a candidate to tackle that situation?

    • @ThinkSoftware
      @ThinkSoftware  2 ปีที่แล้ว +1

      I have discussed this in some of my other videos and course. You should also look at mock interview videos that I have uploaded in this channel and the course.

  • @dhiwakarna8509
    @dhiwakarna8509 ปีที่แล้ว +2

    Highly appreciate your content, learning tons! I have a question around the routing/proxy service. Shouldn't its responsibility be very high level? (like authenticating all requests, rate limiting and routing requests to appropriate services). Here it seems to do very specific things like create a user connection object and maintaining it which is needed for a specific use case (trip matching). can you shed more light on this?

    • @ThinkSoftware
      @ThinkSoftware  ปีที่แล้ว

      I believe this is discussed in the course. The routing service is nothing but an API gateway. It performs all the functionality of an API gateway. However, not all API gateway support websockets or long pull. In order to support them, the architecture discussed in routing service is required.

    • @dhiwakarna8509
      @dhiwakarna8509 ปีที่แล้ว

      @@ThinkSoftware That makes sense. In a way we are doing this. In order to maintain a websocket, the request needs to stick to a specific instance of the api gateway and that is where all this book keeping comes in. The api gateways are not stateless here.

  • @AkashJadhavIT
    @AkashJadhavIT 4 ปีที่แล้ว +3

    Thank you for all these amazing videos on systems design. We have learned great concepts from you sir.
    One request from my side. Would it possible for you to make a series on "object oriented design interview". eg. 'Designing the parking lot'.
    Any resources you would recommend for the same?

    • @ThinkSoftware
      @ThinkSoftware  4 ปีที่แล้ว +3

      Thanks for the comment. I didn't find any good single resource to cover all topics and that is why started this channel. Right now you can find the information spread all over internet and even not complete. I have also noted your request and will make video on object oriented design soon.

    • @AkashJadhavIT
      @AkashJadhavIT 4 ปีที่แล้ว +1

      @@ThinkSoftware thanks. looking forward to hear from you

  • @arpitamathur4859
    @arpitamathur4859 3 ปีที่แล้ว +2

    In the Grokking the system design course, i think it will get the initial positions of the drivers from the quad tree and would then subscribe the customer to those drivers. Hence later it can use the DriverLocation HT to get the updated locations.

    • @ThinkSoftware
      @ThinkSoftware  3 ปีที่แล้ว +1

      There are many loose ends. First of all the main purpose of keeping DHT was that it is slow to update/query the Quadtree. Then it means Quadtree may have stale data. Then secondly how the subscription will work? And then thirdly how to find nearby drivers to send trip requests?

    • @Funsiestype
      @Funsiestype 3 ปีที่แล้ว +1

      Initial set of drivers don’t matter. You can show some random fake cars. What’s the purpose of it other than exposing useless privacy info. You have to ask clarifying questions for requirements. Just because you can doesn’t mean you should.

    • @ThinkSoftware
      @ThinkSoftware  3 ปีที่แล้ว

      Exactly ^

  • @kaushalrjain6326
    @kaushalrjain6326 4 ปีที่แล้ว +2

    Fantastic Explanation. Was waiting for this. Really helpful. Thnx.

  • @wayneli7814
    @wayneli7814 4 ปีที่แล้ว +1

    Hi @Think Software, thanks for your great video! I want to share some of my thoughts about the drive location service, you make an assumption before you calculate the grid id from Long/lat, which is the grid size is fixed. But in practical, the grid size is dynamic(because the density of different area is different), so it should be very hard to get the gridId by long/lat in constant time. I would say the best way of the lookup service is still to transverse the quadtree, but what we can do is to find a way to optimize the update issue. What is your thoughts?

    • @ThinkSoftware
      @ThinkSoftware  4 ปีที่แล้ว

      Thanks for the comment. Usually the grid size will remain the same but the search size would be increased in that case and please note the grids which are empty will not have any record in the in-memory datastore/cache.

    • @wayneli7814
      @wayneli7814 4 ปีที่แล้ว

      Think Software I see, thanks for your reply. That makes sense, we can have the grid to cover a very small area, and don’t store the grids that locate in ocean. BTW, I just checked more info about google s2, looks like they are using the “Hilbert curve” algorithm to convert the coordinations into a 64 bit number, then they can find the nearby grids by just simply increment/decrement the number.

  • @MrManjax
    @MrManjax 2 ปีที่แล้ว +1

    Excellent design video with a number of useful details covered 👌 Thanks for your efforts.
    In regards to your question about partitioning Trip info, I would suggest your approach wherein you use a combination of {Trip id, driver id} to identify the shard/partition. Since it's more likely to query all trips related to a driver, it would enhance the locality of data lookup and have the potential of improving System throughput.
    As regards your other question about handling hot zones in a busy metropolitan area, it would be great to incorporate something like this -
    Define a certain threshold of the number of drivers within a certain area at the passenger's request. If the lookup discovers a greater number of drivers, it should automatically sub-divide that particular cell into a finer granularity (microcell?) in order to find the most optimal driver match to provide a response to the Dispatch service.
    BTW, would it be possible to publish a picture with all the services (along with their sub-services) called out in one place along with the cache/datastore they are tied to?

    • @ThinkSoftware
      @ThinkSoftware  2 ปีที่แล้ว

      Thanks for the comment 🙂. More details are in the course.

  • @navoriion
    @navoriion 4 ปีที่แล้ว +1

    Takes lot of effort & time to make videos. Keep up the good work. Do you read company blogs to understand the architecture of the system ?

    • @ThinkSoftware
      @ThinkSoftware  4 ปีที่แล้ว +1

      Yes making a video usually takes atleast 3-4 weeks.. 2/3 weeks for all the research. 1/2 weeks to come up with my own design and solution for the problem and then 1 week for video production/editing/upload etc.

  • @manju528
    @manju528 4 ปีที่แล้ว

    I liked your system design videos! They all are explained in very simple language. I yet to finish watching all of them though. Thanks for making these videos.

  • @sthirumalai
    @sthirumalai 4 ปีที่แล้ว

    Hi, The presentation and thought process was excellent.
    On scaling the driver location module wrt. the grid service. In case of hot nodes which are partitioned based on the grid id. I think we can go for the auto scaling on the compute layer where as on the data layer which serves the grid vs drivers key/val can't we shard the grid by adding more precision to the lay/long pairs? By that way the hot data will be distributed even to more shards and the app may scale well. Please correct me if I am wrong.

  • @npriyanka89shivu
    @npriyanka89shivu 4 ปีที่แล้ว +2

    Amazing explanation. Thank you!

  • @brgowd
    @brgowd ปีที่แล้ว +2

    Very informative

  • @pujabhandari5471
    @pujabhandari5471 ปีที่แล้ว +1

    Very informative and interesting video I really like it

  • @rng19
    @rng19 2 ปีที่แล้ว +1

    Thanks a lot. Your videos are very helpful.

  • @TheRohkan
    @TheRohkan 3 ปีที่แล้ว

    the right amount of depth .. thanks for this

  • @karanbhatia2834
    @karanbhatia2834 ปีที่แล้ว

    I have a question regarding the routing service.
    We have 2 types of servers: proxy servers and the dispatch servers. What is the reason for having these 2 types? What is the difference in their role that they are categorized separately> Can't we just have one layer of servers that basically talk to other services in the system?

    • @ThinkSoftware
      @ThinkSoftware  ปีที่แล้ว +2

      Proxy servers receive request from the client. They are sitting behind load balancer and forwards the request to the internal microservices based on the endpoint used in the API call. Dispatch servers, on the other hand, receive requests from the internal services (when internal services want to send message to some client app). A request from an internal service goes to a dispatch server behind load balancer (load balancer on the internal side facing internal services). The dispatch server need to dispatch the request to the right proxy server where a websocket connect or a long pull request is held for a particular client.

  • @sunny-14689
    @sunny-14689 2 ปีที่แล้ว

    Nice video, can a driver be assigned to multiple nearby gridids, may help avoid larger area query if none found in first attempt since a driver can belong to multiple lists thereby increasing chances of finding nearby driver.

  • @metallaser4409
    @metallaser4409 3 ปีที่แล้ว +1

    Nice video! Can you explain the difference between an API gateway and routing service? Would you still have an API gateway in your architecture to route requests to either map/user/routing service?

    • @ThinkSoftware
      @ThinkSoftware  3 ปีที่แล้ว +1

      You can consider them as same.

    • @metallaser4409
      @metallaser4409 3 ปีที่แล้ว

      @@ThinkSoftware Thanks for responding. If I were to use Amazon's API gateway, would that be able to handle sessions/websocket connections for me? I thought API gateways were stateless. Is that incorrect?

    • @ThinkSoftware
      @ThinkSoftware  3 ปีที่แล้ว

      It depends on the implementation. I think Amazon API gateway supports websockets. Also what do you mean by stateless here in case of API gateway? To me it seems like you have wrong understanding of what stateless mean.

  • @kamalsmusic
    @kamalsmusic 4 ปีที่แล้ว

    So regarding using the global in memory cache to store transient information... we would have to make the cache quite large to ensure that the data is not evicted a lot, right? If there are frequent evictions, then we may have to re-determine the drivers location from the client and then insert again into the cache?

    • @ThinkSoftware
      @ThinkSoftware  4 ปีที่แล้ว

      Yes. The other option would be store this in database but that would put a lot of pressure on the database. So basically, the distributed in memory case here is used to store transient information as there is no point storing them in the database and so the purpose of using cache here is not to remove pressure from DB as there is no DB but cache here itself is being used as transient data store.

  • @priyavenkatesan1804
    @priyavenkatesan1804 4 ปีที่แล้ว

    Excellent explanation! One small improvement I would do wrt to driver location service is to use slippy tiles for id calculation. No biggie. Learnt a lot. Thank you very much

    • @ThinkSoftware
      @ThinkSoftware  4 ปีที่แล้ว

      Thanks for the comment. How would you figure out the tile or grid Id if using slippy tiles? And how would you know the id's of neighboring tiles if using sleepy tiles?

    • @priyavenkatesan1804
      @priyavenkatesan1804 4 ปีที่แล้ว

      wiki.openstreetmap.org/wiki/Slippy_map_tilenames#Tile_servers.
      Key in the cache will be the tile number and value will be a list of drivers in that tile. Same as you have, except that the key is a number. Does that work?

    • @ThinkSoftware
      @ThinkSoftware  4 ปีที่แล้ว

      Is there a way to calculate tile Id from location without using some lookup table? Also i think all the tiles should be assumed to be at the same zoom level which is then nothing but equal to what I have suggested?

    • @priyavenkatesan1804
      @priyavenkatesan1804 4 ปีที่แล้ว +1

      Yes, we can write a small function that returns the tile id based on lat, lon. True, the zoom level will be the same. Like you say .7 * .7 sq m might map to zoom 15 or so. The approach you have is great. The main advantage of this calculation is that you can use a number as the key. May be this helps with partition. I can't tell. The other advantage is offers is in a situation where you have dense operation, you can dynamically recalculate the tiles based on zoom level. Say you decrease the zoom level, then you might have more drivers mapped to that area. The negative to this approach could be the time taken for dynamic recalculation. I haven't determined the cost in this case. However, in your approach you are quering for more neighboring tiles. This is not bad either. Just a different approach.

    • @ThinkSoftware
      @ThinkSoftware  4 ปีที่แล้ว +1

      OK thanks for clarification 🙂

  • @OmkarGhanekar
    @OmkarGhanekar 2 ปีที่แล้ว +1

    Once the websocket connection is established, for every new piece of data that the client is sending to the service, how does the LB ensure that the data gets sent to the specific server which is holding the websocket connection for the user? I'm assuming a dumb RoundRobin LB can't be used in this situation.

    • @ThinkSoftware
      @ThinkSoftware  2 ปีที่แล้ว +2

      Thanks for the question. Short answer: yes a LB need to support websocket. If a LB does not support websocket then it cannot be used in this scenario. Long answer: for this I may have to make a video :)

  • @SvetlanaTokarev
    @SvetlanaTokarev ปีที่แล้ว

    Thank you very much for your videos. Good explanations. Just One thing you are not discussing about selection type of db as part of design. Why?

    • @ThinkSoftware
      @ThinkSoftware  ปีที่แล้ว

      There are other videos discussing the DB selection. Thanks

  • @大盗江南
    @大盗江南 3 ปีที่แล้ว +1

    Buddy, keep doing these amazing videos plz on low level design too.
    A question plz, for sd1, i think those concepts covered by your video are enough, right? for sde 2, i will buy your course ^ ^

  • @GokhanArik
    @GokhanArik 3 ปีที่แล้ว

    I don't think they proposed distributed hash table to be used in the lookup. Their argument is the driver reports location every 3 seconds, but we update QuadTree every 15 seconds. 15 seconds is still a short time and drivers' location will not change much. To get nearby drivers we will still use QuadTree and hash table will only be used to keep QuadTree up to date. Correct me if I'm wrong

    • @ThinkSoftware
      @ThinkSoftware  3 ปีที่แล้ว +2

      I don't know if they have updated the text in between. The video is several months old now. Right now if they are storing in DHT to be update Quadtree every 15 sec later then it would be better to let the driver send the update every 15 sec directly to Quadtree. The reason for keeping to a DHT would be if they want to batch update the Quadtree but they didn't talk about batch update.

    • @MrAnshulji
      @MrAnshulji 3 ปีที่แล้ว

      Also the sharding for quadtree was based on driver_id (stored in leaf nodes), so it won't be like one node of the quadtree is on one server and the child node is on another. It was more like, the quadtree would exist in its totality on all servers, but some servers would have a subset of the driver ids at the leaf node. And eventually scatter-gather could be used to retrieve the drivers.
      In any case, that approach has been rendered inefficient by this grid-based solution that you described. I wonder why no one thought of this or suggested this C-Squares based approach in any of the other design videos.

  • @privacywanted434
    @privacywanted434 4 ปีที่แล้ว

    This has been a great explanation video!

  • @rishabhsharma1053
    @rishabhsharma1053 4 ปีที่แล้ว

    Awesome tutorial and explanation of the system!!! I have a doubt regarding the partitioning based on both driver & rider ids to be able to trip history for drivers & riders from a single partition. Will this implementation mean that entire trip history data would be duplicated across shards?

    • @ThinkSoftware
      @ThinkSoftware  4 ปีที่แล้ว

      Thanks for the comment. What do you think?

    • @andrearmstrong3715
      @andrearmstrong3715 4 ปีที่แล้ว

      It seems to me that there will indeed be replication, but that's totally fine. Its a classical case of time vs. space complexity. It should be straightforward for a user, be it a rider or passenger, to find all of his trip history within the same shard.
      That being said, another option is to shard by timestamp, which could prevent data duplication. However, shards holdinh newer trips would get most of the load, so it is not a great design.
      I would combine the first sharding approach (duplicating data) with a local LRU cache of trip history, which benefits users who check it out constantly.

  • @amit1114
    @amit1114 4 ปีที่แล้ว

    Awesome explanation of the system. Thank you for sharing your knowledge!
    Regarding Routing Service: How would you make routing service highly available if you are using long polling or websocket and your passenger/driver is connected to 1 App Server. If that app server were to go down, all passengers/drivers connected to that App Server will loose connection and the data in Global Cache will be incorrect.

    • @ThinkSoftware
      @ThinkSoftware  4 ปีที่แล้ว

      Thanks for the comment. Regarding your question: there are two things that will be happening. Once an app server goes down then the client app will see that the web socket or long poll connection has been terminated so in that case will retry to reconnect and this time the connection request will go to some other app server by the LB and that app server will update the cache. If the app does not come to connect then there will be a stale entry in the cache. When dispatch servers will try to use that entry to send notification to the client app, they will know that since the app server is down so the entry is stale and at that time entry will be removed from the cache.

    • @ankuj2004
      @ankuj2004 4 ปีที่แล้ว

      @@ThinkSoftware what happens in a scenario, where the proxy server connected to client dies. The entry in cache is stale. The dispatch server comes back before a new connection is established .

    • @ThinkSoftware
      @ThinkSoftware  4 ปีที่แล้ว

      when the dispatch server will contact the proxy server which has just come up then the proxy server will return connection not found for that client and then dispatch server will invalidate the cache entry if it is same as what it read before contacting the proxy server.

    • @ankuj2004
      @ankuj2004 4 ปีที่แล้ว

      @@ThinkSoftware but dispatch server needs to respond back to the client. Even if it invalidates the cache entry, how will it communicate back the request response to client. It cant keep holding the response till the time entry in the cache is updated

    • @ThinkSoftware
      @ThinkSoftware  4 ปีที่แล้ว +1

      In that case it will just send a push notification to the client app that there is new event.

  • @anuragkothare6181
    @anuragkothare6181 4 ปีที่แล้ว +1

    Good Explanation.

  • @iamavinashdaipule
    @iamavinashdaipule 4 ปีที่แล้ว

    Thanks for this amazing video. It answered so many of the questions I had about uber design or design in general. What happens in the driver location cache when the list of drivers is huge for id_latitue_longitude-->[list of the driver's id is 50,000 or more] does the cache have the capacity to store that many drivers ids in a field?

    • @ThinkSoftware
      @ThinkSoftware  4 ปีที่แล้ว +1

      Thanks for the comment. The grid size is way smaller to have 50k drivers in it.

    • @iamavinashdaipule
      @iamavinashdaipule 4 ปีที่แล้ว

      @@ThinkSoftware Thanks for the reply, yes 0.01 *0.01 is a small area.

  • @尼安德鲁-n6j
    @尼安德鲁-n6j 2 ปีที่แล้ว

    Nice! For the driver location service, when grid id is used as key, list of drivers used as value, it relies on driver client to remove driver from list if location changes grid, but if the driver client dies, we might get stale data i.e. return driver who might be off work. How to avoid this issue?

    • @ThinkSoftware
      @ThinkSoftware  2 ปีที่แล้ว +1

      We can have last heartbeat time and then if it is more than some threshold then we ignore and remove those driver entries.

  • @harrylin6282
    @harrylin6282 4 ปีที่แล้ว

    Hi Think Software, when the DB is partitioned by trip id and a driver wants to list the trip histories, it is slower since it needs to go to all the partitions to grab data. But I am a little bit confused why it is slower, we can create multiple threads and query each partition in parallel ( each thread can go to each partition ), so the query time should be the same as querying in one partition. is it correct?

    • @ThinkSoftware
      @ThinkSoftware  4 ปีที่แล้ว

      Thanks for the comment. There will not be a single customer query that you can parallelize. Think about thousands of customers querying their trips. Now if for all the customers you will scatter/gather on all partitions then it will be way slower than different customers queries going to different partitions.

  • @cbest3678
    @cbest3678 4 ปีที่แล้ว

    @Think software.. thanks for this video. I have few doubts about this:
    - So after this flow passenger -> trip dispatcher -> Driver location service ->trip dispatcher will apply matching -> will send those matched driver notification about the matcher passenger. How driver will get this notification about the matched passenger for the trip request ?

    • @ThinkSoftware
      @ThinkSoftware  4 ปีที่แล้ว

      This is discussed in the course. The short answer is that the Trip Dispatcher service will call dispatch servers in the Routing service that will look up the address of the host in the cache where the WebSocket connection is held for the particular driver and then forward the request to that proxy server. The proxy server will then forward the request via WebSocket.

    • @cbest3678
      @cbest3678 3 ปีที่แล้ว

      @@ThinkSoftware Thanks for the response.. wondering how all service will call dispatch service .. I mean dispatch is in middle layer..so you mentioned when any service done processing the request it will send that response to dispatch service and then the dispatch service will see the cache and will contact appropriate app server in routing server. why do we need extra dispatch service is it adding more complexity ? I mean how other service will call dispatch service to send the response back ?

  • @YegorChumakov
    @YegorChumakov 3 ปีที่แล้ว +1

    Great videos! Thank you. I've subscribed.

  • @taishancorp7720
    @taishancorp7720 3 ปีที่แล้ว

    To solve hot partition problem of grid key ranges, can we first shard by key ranges and then and have a mapping of partitions where this key range could land ? Majority of cases there will be single partition, but for hot partitions we may need to scan more than one.

    • @ThinkSoftware
      @ThinkSoftware  3 ปีที่แล้ว

      what advantage we would get with this?

    • @taishancorp7720
      @taishancorp7720 3 ปีที่แล้ว

      @@ThinkSoftware I mean for hot partitions , we can have finer precision shards on those. Key ranges don't need to be uniformly sized.

  • @xinlongzhang1187
    @xinlongzhang1187 4 ปีที่แล้ว +1

    Very nice video!

    • @ThinkSoftware
      @ThinkSoftware  4 ปีที่แล้ว

      Thanks. Appreciate your feedback.

  • @ChintanShah22
    @ChintanShah22 4 ปีที่แล้ว

    Excellent explanation, thank you ! Is it expected to come up with this level of detail in a 50 minute interview ? It seems if one is not already very familiar with a particular system, thinking and brainstorming itself will take up a majority of the time.

    • @ThinkSoftware
      @ThinkSoftware  4 ปีที่แล้ว

      In an interview, you might not be able to cover everything and it is expected but the interviewer can direct you to go deeper in the internal design of one or two components. This also becomes very important at higher engineering levels like L6+ or architects. These interviews at these levels can very quickly validate whether a candidate is suitable for L6+/architects or not.

    • @gdthegreat
      @gdthegreat 4 ปีที่แล้ว

      Will be this advantage if interviewee is with 1 year ex? For sde1 or L3?

    • @ThinkSoftware
      @ThinkSoftware  4 ปีที่แล้ว +2

      More knowledge is always useful.

  • @taishancorp7720
    @taishancorp7720 3 ปีที่แล้ว

    In 35:20, you mentioned that dispatch servers need to find app server in routing service before it can send data back . Couldn't those routing service's app servers just used a long-poll request on the dispatch server ? Why we need the additional complexity here ?

    • @ThinkSoftware
      @ThinkSoftware  3 ปีที่แล้ว +1

      Thanks for the comment. The driver apps even if using long poll they are waiting for trip requests and that request will be parked on a machine. How would you know in which machine the request is parked?

    • @taishancorp7720
      @taishancorp7720 3 ปีที่แล้ว

      ​@@ThinkSoftware Do you mean a request parked on a Dispatch server machine ? I feel this may be an implementation detail. From the HTTP frameworks I've worked with, when a HTTP request is received (here it's Dispatch server), it has the connection info in an object already. If it wants to support long-polling, it can "pause" the response for the connection and then when a driver has acknowledged (data is ready) "resume" the connection with the data in HTTP response. From client's point of view (Routing Service), I don't think it knows that this was a long-polled query versus a normal slow query.

  • @letstalkcareerwithradhika9028
    @letstalkcareerwithradhika9028 3 ปีที่แล้ว

    Could you help me understand the function of dispatch server farm in the routing service? I know you said that it will send the info from trip and other service to proxy farm. Why don’t have only proxy servers?

    • @ThinkSoftware
      @ThinkSoftware  3 ปีที่แล้ว

      This is discussed in more detail in the course

  • @amolnagotkar3037
    @amolnagotkar3037 2 ปีที่แล้ว +1

    thnx for this video

  • @RahulSingh-ln7bg
    @RahulSingh-ln7bg 4 ปีที่แล้ว

    Bro, R Tree is hierarchical. How will you apply that here in distributed system? Matching based on this would be inefficient? You need better data structure to do it in real-time.

    • @ThinkSoftware
      @ThinkSoftware  4 ปีที่แล้ว

      I didn't discuss using R-trees for this reason. There are other online resources that discuss using R-trees, quad trees etc. but I have already pointed out issues in that approach in this video.

  • @Claudius025
    @Claudius025 4 ปีที่แล้ว

    why would you use a double for the trip price and not a float? I have the same question for the latitude and longitude.

    • @ThinkSoftware
      @ThinkSoftware  4 ปีที่แล้ว

      Float can be used as well. Thanks for the comment.

  • @gouravgarg6756
    @gouravgarg6756 3 ปีที่แล้ว

    Great explanation!. Thank you.

  • @KangJangkrik
    @KangJangkrik 4 ปีที่แล้ว

    I would like to do correction for map service. This is simplified steps of finding closest route:
    1. Calculate angle of attack (AoA) from A to B with atan(y/x) * (180/PI) formula
    2. Calculate each reachable points' AoA
    3. Compare it by substracting each of those point's AoA with A-B AoA, results angle delta
    4. Go forward to point with smallest angle delta
    5. If there is no B point on current closest points, then repeat step 1
    In reality, we have to deal geometric equations unless we use Google's Route API which is costs a lot of money. Cheers!

    • @ThinkSoftware
      @ThinkSoftware  4 ปีที่แล้ว +1

      Thanks for the comment 🙂

    • @KangJangkrik
      @KangJangkrik 4 ปีที่แล้ว

      @@ThinkSoftware you're welcome! I found this channel useful, so +1 subscriber for you

  • @RaunakShrivastava
    @RaunakShrivastava 3 ปีที่แล้ว +1

    Why don't the passenger or Driver communicate to User and Map services via the Routing service?

    • @ThinkSoftware
      @ThinkSoftware  2 ปีที่แล้ว

      Both solutions are possible. I am using different approaches in different videos for curious minds like you to ponder upon :)

  • @uditagrawal6603
    @uditagrawal6603 3 ปีที่แล้ว

    Can we use geohash in place of quadtree?

    • @ThinkSoftware
      @ThinkSoftware  3 ปีที่แล้ว

      Yes. This is discussed in the course as well.

  • @Bommutiwariplayz0170
    @Bommutiwariplayz0170 4 ปีที่แล้ว

    Can you make video on map service and especially how type ahead location suggestion works

  • @ameyasawant5663
    @ameyasawant5663 3 ปีที่แล้ว

    For trip service, can we use bloom filter for searching for trips?

    • @ThinkSoftware
      @ThinkSoftware  3 ปีที่แล้ว

      Searching for which trips?

    • @ameyasawant5663
      @ameyasawant5663 3 ปีที่แล้ว

      @@ThinkSoftware I meant with approach 1 we can use bloom filter to figure out which partition might have the trip-id. Just a suggestion. Your videos are awsome!

    • @ThinkSoftware
      @ThinkSoftware  3 ปีที่แล้ว

      Please elaborate how?

  • @franciscog3378
    @franciscog3378 3 ปีที่แล้ว

    very helpful. Thank you

  • @winning_aadict
    @winning_aadict 4 ปีที่แล้ว

    Sir, can you also add a transcript of the video for people who prefer reading. Also, this would enable us to look up words that are new to us and may not be clear in the audio for various reasons.

    • @ThinkSoftware
      @ThinkSoftware  4 ปีที่แล้ว

      Thanks for the comment. I am going to soon launch a book which will have this and more.

  • @shikhil1
    @shikhil1 4 ปีที่แล้ว

    This is great video and explained the approach in depth, but you could improve/explain in depth about searching of location in 2D Grid using two decimal points and finding the near by grid id.

    • @ThinkSoftware
      @ThinkSoftware  4 ปีที่แล้ว

      Thanks for the comment

    • @shikhil1
      @shikhil1 4 ปีที่แล้ว

      @@ThinkSoftware can you please answer my query also? how your are going for finding location in 2d grid using decimal point and near by location?

  • @rahulsharma5030
    @rahulsharma5030 3 ปีที่แล้ว

    1. in design of routing service, once you upgrade http to websocket, after that let us say a client request, then the load balancer will forward to the backend server with which connection is established?
    2. If answer to 1 is yes, then will load balancer store this mapping, like which client is mapped to which backend node for web socket connection?
    3. Also i dont understand like if we upgraded to websocket, then what is the current type of connection between client and load balancer?If it is TCP or HTTP?As you mentioned in video i think its TCP, then how can we have millions of TCP with load balancer ,it is not possible.

    • @ThinkSoftware
      @ThinkSoftware  3 ปีที่แล้ว

      Load balancer is only used for initial connection setup when the client made the initial http request. Once the connection is established, the load balancer is out of the picture and client directly talk with webserver with which it has TCP connection

    • @rahulsharma5030
      @rahulsharma5030 3 ปีที่แล้ว

      @@ThinkSoftware thank you!!!:)

    • @rahulsharma5030
      @rahulsharma5030 3 ปีที่แล้ว

      @Think Software. If i understand correctly:
      Then client sends request to load balancer. Then load balancer will decide which server can handle request and then it sends server name back to client.Now client directly sends HTTP upgrade request to the server(that is returned by load balancer)?

    • @ThinkSoftware
      @ThinkSoftware  3 ปีที่แล้ว

      No. This is not correct.

    • @rahulsharma5030
      @rahulsharma5030 3 ปีที่แล้ว

      @@ThinkSoftware then what will be flow you tole me previously? The other way can be the client sends upgrade request to load balancer and then load balancer sends same request to server ?and then server response send back to client via loadbalancer?
      I cant think of any other way.Can you please explain?

  • @rahulsharma5030
    @rahulsharma5030 3 ปีที่แล้ว

    let us say that first time customer logsin it gets some list of drivers near it:
    1. How will we maintain info that when the driver sends new lat long then we have to send to some customers which can track driver on their app?
    2. How do we update that info when a new driver comes in same area or current driver leaves?

    • @ThinkSoftware
      @ThinkSoftware  3 ปีที่แล้ว

      The initial 4/5 drivers that Uber app shows in the map has nothing to do with the functionality of the Uber service. Uber can show random drivers there and for booking a trip it does not matter.

    • @rahulsharma5030
      @rahulsharma5030 3 ปีที่แล้ว

      @@ThinkSoftware it has to right?It tells me where the drivers on my screen moving,how much is their ETA to me.

    • @ThinkSoftware
      @ThinkSoftware  3 ปีที่แล้ว

      Are you talking about initial 4/5 drivers or the driver that is assigned to your trip? For the first don't care but for the later you will poll the status periodically.

    • @rahulsharma5030
      @rahulsharma5030 3 ปีที่แล้ว

      @@ThinkSoftware initial.When i open app,it shows me movement of nearby drivers on app(prior to booking also)

    • @ThinkSoftware
      @ThinkSoftware  3 ปีที่แล้ว

      There is no ETA associated with them. Servers can you an initial list of 4/5 drivers and then you can poll them periodically

  • @rmadhavmca1
    @rmadhavmca1 4 ปีที่แล้ว +1

    could you let me know the tree name again?

    • @ThinkSoftware
      @ThinkSoftware  4 ปีที่แล้ว

      Are you asking about Quad Tree, Kd-Tree etc?

  • @preetiduhan9937
    @preetiduhan9937 3 ปีที่แล้ว +2

    A big dislike to those who disliked the video.If you can't appreciate someone's effort atleast don't discourage them.

  • @mehranbehbahani3050
    @mehranbehbahani3050 2 ปีที่แล้ว +1

    There is absolutely no way that this level of complexity is expected in a 45-minute interview. This is a solution that a team of engineers would work on for a week at least. Correct me if I am wrong.

    • @ThinkSoftware
      @ThinkSoftware  2 ปีที่แล้ว +2

      For L6+ definitely. The interviewer though may not cover everything but could go in depth in one or two things.

    • @mehranbehbahani3050
      @mehranbehbahani3050 2 ปีที่แล้ว +3

      @@ThinkSoftware I see. You know what would be a killer video? Why don't you make a video that explains the complexity and details expected by the interviewer based on the level of the position? Like if you are interviewing for an l5, talk about this and that. If it's l6, talk about this and that in more detail. If l7, you need to have a deep understanding of topics A B and C for example.

  • @syedabdul1168
    @syedabdul1168 4 ปีที่แล้ว

    hi sir..u have mentioned that you like to mentor..please let me know is there anyway to contact you..

    • @ThinkSoftware
      @ThinkSoftware  4 ปีที่แล้ว

      Hi. You can contact me through the TH-cam website contact button. It will show my email address.

  • @rishabhdugar1180
    @rishabhdugar1180 3 ปีที่แล้ว +3

    rateUser me trip id nahi li ? kyun ?

    • @ThinkSoftware
      @ThinkSoftware  3 ปีที่แล้ว

      Why do you think rating a user require trip ID?

    • @rishabhdugar1180
      @rishabhdugar1180 3 ปีที่แล้ว

      because rating system works per trip not driver to rider or vice versa just once.

    • @ThinkSoftware
      @ThinkSoftware  3 ปีที่แล้ว

      It depends on the requirements. If the requirement is to derive the driver rating from trip rating and also store the trip rating then yes. Otherwise both driver and passenger can be provide option to rate each other at the end of the trip and may not require trip ID in that case.

    • @abhishekjain96
      @abhishekjain96 3 ปีที่แล้ว

      @@ThinkSoftware That would be correct from the UI perspective where user has to act as per what is available on the screen. But from API perspective, not taking a tripId in the request would mean that a user can rate any driver(and vice versa) even if they haven't done a trip together(which shouldn't be allowed). And also a user can rate the same driver again and again

  • @umer7936
    @umer7936 4 ปีที่แล้ว

    Can your company or someone who you may know develop a software like Uber for my rideshare company? please reach out to me.

  • @bostonlights2749
    @bostonlights2749 4 ปีที่แล้ว +1

    6th Sept 2020
    6,836 views
    1.87 K Subs

    • @ThinkSoftware
      @ThinkSoftware  4 ปีที่แล้ว +1

      are you trying to track the progress of this video? :D

    • @bostonlights2749
      @bostonlights2749 4 ปีที่แล้ว

      :)
      Brilliant Content btw, I've seen other channels which just post directly the content from the company's engineering blog . This was unique and from scratch..
      Wish you all the best!

    • @ThinkSoftware
      @ThinkSoftware  4 ปีที่แล้ว

      Thanks for the nice words 😊

    • @ThinkSoftware
      @ThinkSoftware  4 ปีที่แล้ว

      Thanks

  • @theghostwhowalk
    @theghostwhowalk 4 ปีที่แล้ว

    24.14 sec Q how can we use Dijkstra to find shortest path.
    We can compute S to D total straight line distance. Say it comes to 20Km, then we will have a threshold say 1.5, all the nodes which are 30Km from S or D will be considered and fed into Dijkstra to find the optimal path.
    At 46.35: about hot partitions:
    We can pre identify hot spot areas (say downtown or sport arena) and split those areas into even smaller grid for use. That way we can make sure limit is not exceeded. Similarly if there is some mountain or forest then we can combine multiple grids into one.
    At 59: it depends on if writes of trips are more or reads for driver ids are more. I would think #1 is more frequent. So writes may need to be fast. So prefer option 1 as slower reads for driver may be OK.
    Most Welcome for any comment to improve/correct me 😀

    • @ThinkSoftware
      @ThinkSoftware  4 ปีที่แล้ว

      Thanks for the comment 🙂

    • @theghostwhowalk
      @theghostwhowalk 4 ปีที่แล้ว

      Think Software thanks for all the great content! It’s precious. I can see years of experience and weeks of efforts to come with 1 hour content.
      Also If you think there is flaw in above, please do let me know. 😀

    • @ThinkSoftware
      @ThinkSoftware  4 ปีที่แล้ว

      Thanks for the nice words🙂

  • @Funsiestype
    @Funsiestype 3 ปีที่แล้ว +3

    Get map service? Get ready to be put on spot on the need for this service.
    You need to speak about the choice you made for a single user service service instead of driver vs rider users. Need to speak about trade offs and reasoning.
    In the high level diagram, be ready to be picked on by users talking to user service directly. Depict Load balancer. Again, most interviewers are a-holes.
    Rambling begins at 37:10 without setting the context for how the quad tree is sharded based on which this homie is crapping on Grokking course. Even before that , I challenge the assertion that quadtree is not highly available. Be ready to defend why we can’t have replicas for quad tree. (If you say yes, be ready to explain how would you ensure replication works)
    For love of god, I can’t decipher what you are saying from 40:40 to 41:24. 🤷🏽‍♂️ (next time I suggest you write better notes for you to refer when making videos. I notice you did that here. Just make sure you formulate your thoughts.
    Your design has grid ID’s pre broken partitioned by grid ID either by hash or range. You don’t really provide an answer after all that crapping on the other author. Slick. If hash partitioned by grid ID , won’t you need to scatter gather for neighbors?
    I’m some areas like Nebraska where it’s less dense it’s Inefficient .
    If partitions are hot, you need a way to re partition . Consistent hashing (add new nodes) and replication could be options.
    49:18 yup.. nice work. See you sound coherent when you check notes. Good job champ.
    49:28 after all the crapping on other author you come back to the original problem and mention that we need to add more replicas in hot partitions. Ohhh Bhai… Maro mujhe. Mujhe maaro. Mazaak ho Gaya yaar.
    You need to talk about what happens if cache servers crap out. How would you build the quadtree again during crash stop faults? Be ready to talk about this.
    I went and read the grokking’s design. I don’t think you understood what that authors proposal was clearly.
    Good job on talking about partition by document And partition by term indexing.
    I am not saying anything negative about your videos, your videos are great. Just trying to question as you requested. Bye boo.

    • @Funsiestype
      @Funsiestype 3 ปีที่แล้ว +1

      On a serious note, anyone who is watching this, my two cents, prepare well and make sure your fundamentals are strong. You need to discuss trade offs in an interview. Nothing is perfect and there are no silver bullets for problems at scale.

    • @ThinkSoftware
      @ThinkSoftware  3 ปีที่แล้ว +2

      Thanks for the comments.

    • @curiousspirit3947
      @curiousspirit3947 3 ปีที่แล้ว +3

      ​@@Funsiestype i think it's clear what he says from 40:40 to 41:24. If you shard your distributed cache by driver id it makes write easy, but the reads need to ask every node (scatter gather), which is inefficient.
      "Your design has grid ID’s pre broken partitioned by grid ID either by hash or range. You don’t really provide an answer after all that crapping on the other author. Slick. If hash partitioned by grid ID , won’t you need to scatter gather for neighbors? "
      => the difference is that now he only does a scatter gather for neighbors IF not enough drivers are found in the grid of the user. and the scatter gather is only on the neighbors, not all the shards.

  • @kiaradas8355
    @kiaradas8355 2 ปีที่แล้ว

    Does anyone know his name? :)

  • @박종선-v8i
    @박종선-v8i 4 ปีที่แล้ว

    Please enable the CC button.

    • @ThinkSoftware
      @ThinkSoftware  4 ปีที่แล้ว +1

      Thanks for the comment. I didn't do anything related to CC. May be TH-cam has made some updates

    • @박종선-v8i
      @박종선-v8i 4 ปีที่แล้ว

      @@ThinkSoftware Yup. There would be some setting that enable CC button I think.

  • @josersleal
    @josersleal ปีที่แล้ว +1

    WHY DO YOU TALK in front of the board?