System design : Design Autocomplete or Typeahead Suggestions for Google search

แชร์
ฝัง
  • เผยแพร่เมื่อ 1 ก.พ. 2025

ความคิดเห็น • 287

  • @ahmdalgendi
    @ahmdalgendi ปีที่แล้ว +1

    Thank you tushar, you were my number 1 guy to learn data structures and algorithms, now after graduating and working i am getting back to you to learn about system design
    Again thanks a lot

  • @biswajitsingh8790
    @biswajitsingh8790 7 ปีที่แล้ว +149

    the legend is back 😍😍

    • @CALVlNXD
      @CALVlNXD 7 ปีที่แล้ว +10

      i forgot what love felt like

    • @mohanish35
      @mohanish35 3 ปีที่แล้ว +3

      And he disappeared again

  • @kylcross
    @kylcross 7 ปีที่แล้ว +46

    I’m loving these new videos. You’re obviously an amazing human being. I’m a little saddened that you’re into the nfl but I think I’ll survive. Thanks Tushar!

    • @tusharroy2525
      @tusharroy2525  7 ปีที่แล้ว +11

      you are not nfl fan?

    • @kylcross
      @kylcross 7 ปีที่แล้ว +4

      You replied! I'm NBA/WNBA all the way. I have a hard time with how dangerous football is. But in Seattle you don't have NBA so I can't blame you too much! Thanks again Tushar!

    • @ulanbayaliev6831
      @ulanbayaliev6831 5 ปีที่แล้ว +3

      @@tusharroy2525 In my mind you went to the weekend game and started making 'system design' video right after you are home. Respect for engineers like you!

  • @KanagaveluSugumar
    @KanagaveluSugumar 6 ปีที่แล้ว +4

    You rock as usual, I can see your improvements from solving a problem from single machine to a distributed machines techniques, like zookeeper, load balancer, aggregators... Keep teach us these we are also forwarding with you! Again thanks for the TIPS like CDN and caching future items before the requests.

  • @rssatvikreddy
    @rssatvikreddy 7 ปีที่แล้ว +1

    hey tushar! i'm new to system design and have an interview tomorrow. This video saved me man. Very clear and precise. Keep up the good work!!!

    • @tusharroy2525
      @tusharroy2525  7 ปีที่แล้ว

      good luck with the interview.

  • @ksankitha
    @ksankitha 7 ปีที่แล้ว +23

    your videos are much awaited! Please keep them coming

  • @rembautimes8808
    @rembautimes8808 3 ปีที่แล้ว

    I watched several autocomplete tutorials and this was the most conceptually complete in terms of coverage e.g using time as weight.

  • @arunabhhejib1136
    @arunabhhejib1136 7 ปีที่แล้ว +34

    Hi Tushar, Could you please make design videos on some of the common questions like Elevator, bookmyshow, social media website like Facebook, etc. Looking for things like how do we process the request, class diagrams and the type of database, etc used. Also, I wanted to thank you for all your videos, have watched many of them and learnt a lot.

    • @tusharroy2525
      @tusharroy2525  7 ปีที่แล้ว +10

      +Arunabh Hejib I will try.

  • @syedghufranhassan3858
    @syedghufranhassan3858 4 ปีที่แล้ว

    This is my first time that I am watching you and you explanined whole design in very easy and convenient way. Count me in as your fan :)

  • @hindifunnycalls8081
    @hindifunnycalls8081 7 ปีที่แล้ว

    very good video about Design Autocomplete or Typeahead Suggestions for Google search I love it. The teaching quality is awesome!

  • @arthamsreenivas8858
    @arthamsreenivas8858 5 ปีที่แล้ว +1

    This is great video which covers both pre-processing data flow as well search, great job Tushar bhai!

  • @mayurmahajan1503
    @mayurmahajan1503 4 ปีที่แล้ว +2

    I used to come to Ashish sir's class in Seattle and you used to teach us there. There used to be another person teaching along with you, who kinda looked like an angry lad lol. I learned a lot from those weekly sessions. Please make more videos. Thanks for the great work!

  • @konstantinchernov1477
    @konstantinchernov1477 6 ปีที่แล้ว +1

    Thanks for all the great work you do Tushar, it's always a pleasure to see your amazing yet humble videos.

  • @wenbobu8314
    @wenbobu8314 6 ปีที่แล้ว +18

    Nice video!
    One suggestion: can we also talk about single point failure and possible solutions for that? This topic seems to help a lot on making design decisions and weighing tradeoffs.

  • @木漏れ日-v9n
    @木漏れ日-v9n 6 ปีที่แล้ว +5

    Since you're sharding your distributed trie, you could also shard/resource intelligently, for example have a larger pool (and larger instances) for frequently accessed tries, since the access patterns definitely won't be uniform.

    • @andreitolkachev8295
      @andreitolkachev8295 3 ปีที่แล้ว +1

      can't believe this doesn't have more upvotes! It's what we do too at work

  • @shirishtekale549
    @shirishtekale549 6 ปีที่แล้ว +1

    You really are giving a precious knowledge to all of us, i am greatly grateful to you on all your help and time.

  • @HXYZZZ
    @HXYZZZ 4 ปีที่แล้ว

    Excellent video. Thanks Tushar for sharing

  • @DigitalSageSocial
    @DigitalSageSocial 7 ปีที่แล้ว +23

    when you say trie nodes, are these created in the application layer or the back-end(lets say a relational database)? Can we store data in a trie data structure in the database?

    • @HarshitKumar-dj4ev
      @HarshitKumar-dj4ev 6 หลายเดือนก่อน

      u got an answer for the above?

    • @16Kailash
      @16Kailash 5 หลายเดือนก่อน

      it should be the backend which helps you to store and access the data in some specific structure way, using some hashing technique to access those nodes(address) where data is stored

    • @Yes-no1bl
      @Yes-no1bl หลายเดือนก่อน

      you can store a trie in MongoDB with its support for Tree based structures

  • @EbrimaTunkara
    @EbrimaTunkara 6 ปีที่แล้ว +5

    I think for distributed storage and cache, Redis could be useful in which the trie data can be partition into multiple shard redis servers . No need for zookeeper in maintaining the trie metadata which is bit a complicated and creates some overhead

  • @dibyajyoti_ghosal
    @dibyajyoti_ghosal 7 หลายเดือนก่อน

    Why do you not make any more videos? These videos are literally gems! I wish you had made more videos on the system design!

  • @divyeshgaur
    @divyeshgaur 6 ปีที่แล้ว +1

    Hi Tushar, I want to say thank you for all your beautiful work. Keep them coming.

  • @hlibpylypets1333
    @hlibpylypets1333 3 ปีที่แล้ว

    As always, you dig into the details with your explanation. Very helpful

  • @bishwajitpurkaystha3075
    @bishwajitpurkaystha3075 5 ปีที่แล้ว +13

    Hi Tushar dada,
    I liked your solution.
    One thing I wanted to mention is that the cache and the zookeeper are the single point of failures in the proposed design. For caches, we could cluster Redis (instead of memcached), and make them highly available by introducing redundancy.
    We could add more passive zookeepers as well.
    Let me know how you think of it.
    - Bishwa

  • @balakrishnan3725
    @balakrishnan3725 4 ปีที่แล้ว

    Nice video. Giving some tips to make the design better is even better. Thank you Tushar!

  • @anssha2643
    @anssha2643 7 ปีที่แล้ว

    Its really great to see you back with the design questions.

  • @benleung6331
    @benleung6331 6 ปีที่แล้ว +1

    A couple questions I have
    1) From what I understand N1, N2, ...are replicated server nodes to improve the scalability of the system and T1, T2, T3 are replicated trie data structure to implement the auto-complete feature, but do they occupy in two separate sets of physical machines or could each of the server node has its own of trie data structure. What is the pro and con of two different set up?
    2) You mention about optimizing the system by leveraging CDN, where would the CDN sit in the system diagram? Between the client machine and the load balancer or between the load balancer and the N1, N2..server nodes. If it is the latter, how is the CDN different from the server nodes themselves?

  • @indiansoftwareengineer4899
    @indiansoftwareengineer4899 6 ปีที่แล้ว +1

    thanks, tushar sir. you are doing great work. please give us more knowledge.

  • @anmoljawa8367
    @anmoljawa8367 4 ปีที่แล้ว +1

    More more more ... more videos on system design plzzzzzz !!

  • @mohamadazmath
    @mohamadazmath 7 ปีที่แล้ว +1

    watched your tiny url and this video. thanks for sharing knowledge. the kind of application and companies used elastic search. request goes to LB, then to app server, server pull data from solr/lucene/elasticsearch/sphinx. this worked for us in terms of scale, latency, availability. data get refreshed every few hours based on the need. internally all these works based on inverted index.

  • @InfogramTV
    @InfogramTV 2 ปีที่แล้ว

    Hi Tushar,
    Very neat explanation.
    Thanks.

  • @ST-nm7pr
    @ST-nm7pr 3 ปีที่แล้ว

    dont let this great content distracted you from the fact that settle should have ran the ball

  • @revanthkumar1183
    @revanthkumar1183 6 ปีที่แล้ว +3

    I would use a debounce in the search textbox handler and make calls only after a certain timeout. This network bandwidth and a lot of server cpu cycles.

  • @shivroh7678
    @shivroh7678 7 ปีที่แล้ว +1

    Awesome, in the next design videos please provide more details which no-sql db or cache to choose in this condition.
    I mean which cache/db could be beneficial to have in this scenario over any-other.

    • @tusharroy2525
      @tusharroy2525  7 ปีที่แล้ว

      +Shivam Rohilla feedback taken.

  • @kannadaHuduga12
    @kannadaHuduga12 4 ปีที่แล้ว

    Thanks Tushar. Appreciate your video.
    1. Rationale behind choosing the technologies is missing ex: Zookeeper (also list alternatives)
    2. How did you arrive that the data retrieval needs caching ? Also how is cache invalidated when trie is built is missing.
    3. Instead of T1,t2,t3, there could be one more load balancer ?
    4. Applier complexity needs to be explained better. Its not simple to query all the words from a data store and build a trie.
    5. How do you replace the trie on live nodes needs better explaination.

  • @pradiplamsal1403
    @pradiplamsal1403 3 ปีที่แล้ว

    Nice, simple explanation. Thank you very much.

  • @mehdikhosravi
    @mehdikhosravi 6 ปีที่แล้ว +1

    Truly amazing content. Watching your videos is significantly adding to my knowledge.

  • @rahilshaikh5181
    @rahilshaikh5181 6 ปีที่แล้ว +1

    Crisp and on point videos. Thanks for delivering high quality content :)

  • @priyanksaxena2196
    @priyanksaxena2196 7 ปีที่แล้ว +1

    This is awesome!! Thanks Tushhar for these system designing videos. I was waiting for them for long time. Are you from Seattle? Your shirt says that. 😊 I was in Redmond for 1.5 yrs. now moved to Portland. Preparing for interviews in Microsoft and Amazon. Want to come back there. So, thank you again for these wonderful tutorials.

  • @hurdurlemur1615
    @hurdurlemur1615 4 ปีที่แล้ว +1

    love this my dude
    thanks for your videos

  • @myhinditech8723
    @myhinditech8723 7 ปีที่แล้ว +2

    Very helpful videos and I love it and I appreciate it on the TH-cam!

  • @rahulagarwal841
    @rahulagarwal841 6 ปีที่แล้ว +11

    I was asked in Microsoft Design round: Design a system to allocate memory for queues. Operations like create, enqueue , dequeue should happen in O(1). Memory efficiency should over 90%. Similar question was asked in adobe. Please make a video on memory allocation. I could not find much content about memory allocation on ytube.

    • @NachiketJoshiTheBloodMage
      @NachiketJoshiTheBloodMage 6 ปีที่แล้ว

      hello Rahul,
      can you please tell me what other questions did they ask to you?
      What DS question and how did you approach for this questiona too?

    • @skootergofast123
      @skootergofast123 6 ปีที่แล้ว

      Linked List Implementation should do for that

    • @skootergofast123
      @skootergofast123 6 ปีที่แล้ว +1

      keeping two pointers head and tail....enqueue at head and dequeue at tail

    • @vector4067
      @vector4067 5 ปีที่แล้ว +1

      The proper way to handle this question is to ask more questions before you approach the problem. Are the required memory same size? e.g 512 k per query?
      Or there are different size queries?
      The basic under laying approach is not to allocate and release memory randomly.
      Memory allocation can be expensive. Bottom level data structure should be memory pool with fixed size chunk memory which you allocate at initialization. and chop up based on the query size. You manage the the allocate and release manually by maintain a pointer to the valid memory address.
      If there are 3 types of queries, you should have 3 memory pools chopping up with 3 different sizes.
      On the upper level logic, yes, you do round robin with a queue.

  • @nikkis8102
    @nikkis8102 7 ปีที่แล้ว

    You are back!!! YAY!!! Thank you!!!

  • @rashidaidun4880
    @rashidaidun4880 7 ปีที่แล้ว +5

    Great video using Trie structures. Thank you.
    In a real solution for companies using SOLR or Elastic Search, would they use the n-gram analyzer or pre-compute the best tries and use that as part of the index (instead of a custom trie structure and plumbing) ?

    • @tusharroy2525
      @tusharroy2525  7 ปีที่แล้ว +3

      You are right. Probably they are using SOLR or something like SOLR.

  • @ashwinipatil9590
    @ashwinipatil9590 5 ปีที่แล้ว +9

    Can you elaborate on why did you keep N1 , N2 different than T1, T2 etc? Are you considering Trie as a datastorage?

    • @IshantSharma29
      @IshantSharma29 4 ปีที่แล้ว

      Same question. If it's a data storage - what kind of data storage ?

    • @pandeybox2988
      @pandeybox2988 3 ปีที่แล้ว

      Those are application servers.

    • @gabrielxu2087
      @gabrielxu2087 3 ปีที่แล้ว

      Same question. I feel N1, N2, N3 are servers; while T1, T2, T3 are database storing the Trie?

    • @Reyna-yj6vo
      @Reyna-yj6vo 7 หลายเดือนก่อน

      @@gabrielxu2087 i think im late answering the question, Yes. they are the databases which will have serialized Trie Structure

  • @mhimanshu91
    @mhimanshu91 6 ปีที่แล้ว

    Great! Out of the box! Thanks Tushar 🤗

  • @sandeephalder7536
    @sandeephalder7536 6 ปีที่แล้ว

    Tushar rocks!!!
    The implementation with RNN will be smarter.

  • @aeon327
    @aeon327 6 ปีที่แล้ว

    Thank you for taking your time to break these things down. This is helping me a lot to prepare for design interviews and helping me learn how to design scalable systems.

  • @WS-lv4kk
    @WS-lv4kk 5 ปีที่แล้ว +3

    17:51 Why build a trie structure in the applier node? Why not just submit a list of phrases paired with top-k auto-completed phrases into the nodes containing the trie structure?

    • @skyfordTj
      @skyfordTj 5 ปีที่แล้ว

      I guess you could do in this way, but it seems would take of resource from the Tries nodes that are serving user requests.

  • @pathakshardul
    @pathakshardul 4 ปีที่แล้ว

    Thanks for the video! Question I have is what are you using for persistent storage?

  • @NachiketJoshiTheBloodMage
    @NachiketJoshiTheBloodMage 6 ปีที่แล้ว +3

    they asked me this question in one of the interviews,
    the emphasis was on,
    1. how will you store messages? data structure? I said tree of hashmaps.
    2. how will you manage scrolling up through the messages?
    3. how will you decide how much will be stored in users device and then on the cloud (if you want cloud)?
    4. how will you search in the past like messages from 2 years ago?
    all these questions are again unanswered after watching this video...
    can you please make a video or give a most correct answer?
    Thank you so much for all your videos...you have helped a lot of people.

    • @coolswatish
      @coolswatish 6 ปีที่แล้ว +1

      Shouldn't this be a comment on this video: th-cam.com/video/zKPNUMkwOJE/w-d-xo.html ?

    • @NachiketJoshiTheBloodMage
      @NachiketJoshiTheBloodMage 6 ปีที่แล้ว

      Yep, I was browsing many videos at a time and was too lazy to change the comment. I was also sure that I am not going to get any answer so did not bother changing at all...

  • @wondershow123
    @wondershow123 3 ปีที่แล้ว +2

    When you are explaining the "Request Flow", Google just had one big trie which could be fit into one machine (T1 T2 T3). And you used Zookeeper to store the mapping ("A-$"=>T1,T2,T3). Then you mentioned as the trie size grows, you need T4, T5, T6, then the mapping of Zookeeper changed to ("A-L"=>T1, T2, T3, "M-$"=>T4,T5,T6). My question is, how will the data of "M-$" on T1, T2, T3 before the expansion be migrated to T4, T5, T6? Thank you!

    • @pramalvi
      @pramalvi 2 ปีที่แล้ว

      I guess that should be done by applier as part of data collection flow.

  • @nitishkumar-yi9yi
    @nitishkumar-yi9yi 7 ปีที่แล้ว +1

    Hi Tushar. It's a great video you have come up with. A big thumbs up for this. I hope that the splitting of the data in zookeeper is done with help of indexing techniques like SOLR. It would be really great if you can do a video on indexing or suggest some good links for the same.

  • @kevinhan9307
    @kevinhan9307 7 ปีที่แล้ว +10

    Hi Tushar, can you please do a video on "Median of 2 Sorted Arrays" from Leetcode, where they are of lengths m and n? This is the hardest problem on the leetcode portal

    • @tusharroy2525
      @tusharroy2525  7 ปีที่แล้ว +9

      Soon.

    • @wulymammoth
      @wulymammoth 6 ปีที่แล้ว

      Did you ever figure this problem out? It took me a while, but the Geeks for Geeks piece here helped me out on my path: www.geeksforgeeks.org/median-two-sorted-arrays-different-sizes-ologminn-m/

    • @anilanil166
      @anilanil166 5 ปีที่แล้ว

      th-cam.com/video/LPFhl65R7ww/w-d-xo.html

  • @umangmalhotra1222
    @umangmalhotra1222 6 ปีที่แล้ว

    Hello Mr. Tushar. This is a satisfying video. Thanks.

  • @Gamabunta24345
    @Gamabunta24345 4 ปีที่แล้ว

    This is a really good solution

  • @MiztaHall
    @MiztaHall 3 ปีที่แล้ว +1

    I'm working at a autocomplete feature. But I'm here mainly for the jersey. GO HAWKS!

  • @NuncNuncNuncNunc
    @NuncNuncNuncNunc 7 ปีที่แล้ว +2

    Tries are covered, but how many real world systems are backed by tries compared to databases whether the be relational, NoSQL, or dedicated search like Solr or Elasticsearch? As an example, suppose you had a set of stories, one with the phrase, "Tushar went for a stroll" and another with "Tushar pushed the stroller." Both "tu" and "str" should match, but only if the stories start with those phrases will even "tu" match. The one narrow but practical use I see is if you want to match phrases that others have searched for, e.g. it's a nice day so everyone wants to see how you spent the day.

    • @tusharroy2525
      @tusharroy2525  7 ปีที่แล้ว +2

      you have valid point. If some company was building it they would probably use solr but remember this is an interview. And yes I have seen trie used in real world usecases.

  • @ThakurArjun247
    @ThakurArjun247 6 ปีที่แล้ว +1

    Cassandra stores timestamp for all the columns/rows (during insert/update) so basically we don't need to store the date time explicitly

  • @Avinashkumar-fo2bu
    @Avinashkumar-fo2bu 4 ปีที่แล้ว

    Hi Tushar bhaiya please upload all your data structure and algorithm video with test cases in hindi also.this will be useful for us for competitive examinations and interview

  • @QDem19
    @QDem19 4 ปีที่แล้ว +2

    Hey Tushar - Thank you for all you do!
    In this scenario, I understand how you are building the Trie to autocompete a word based on the prefix and top k words for a given prefix. But how are we adding words to a search phrase, i.e. with "ba", you return "bath" and "ball", but once "bath" is selected, how do you return the phrases "bath time" and say "bath towel"?
    Thank you.

    • @rahuljoshi456
      @rahuljoshi456 3 ปีที่แล้ว

      Hey, I think its out of scope of this problem however if you want to do this then you have to consider "bath towel" as a single word.

    • @nvntsingh
      @nvntsingh 2 ปีที่แล้ว

      Add space as a character and then the same solution will work

  • @Hesoyam1121
    @Hesoyam1121 7 ปีที่แล้ว +1

    Thanks, very informative. I just understood simple request flow and server distributed caching. 😊

  • @lchen7655
    @lchen7655 5 ปีที่แล้ว +6

    Thanks for the great video, Tushar. Could you talk a little about how phrase is implemented in this Trie module?
    Because in real world, ppl often search phrases like "batman movies", "batman and robin", etc instead of just one word "batman". When "batman" is searched in Trie, 'n' will be the leaf node and Trie search is terminated there. How is phrase handled by your design? Thanks.

    • @HXYZZZ
      @HXYZZZ 4 ปีที่แล้ว

      I had the similar question instantly after listening to this. But immediately realized Trie is built based on all searches done previously. Hence all trie paths are based on complete search sentence. The works stored at each char in Tree is based on weights given to subsequent words in the sentence user is searching for. Share your thoughts if that makes sense and kind of addresses your question.

  • @wenlaizhang9017
    @wenlaizhang9017 5 ปีที่แล้ว +2

    For the zookeeper, we can still use load balancer here, why we choose zookeeper, it has some benefits ?

  • @Ttj963
    @Ttj963 4 ปีที่แล้ว +1

    Can you explain how splitting of ranges works in little bit more depth , like how we are move range between nodes? Also how applier sitting on a diff node can move the in-memory Trie on to another node , say T1? .

  • @mvsmurty
    @mvsmurty 7 ปีที่แล้ว +1

    Very nice explanation Tushar. I just would like to understand how we can store trie in Database in Redis? or do we maintain the entire trie in memory by with the help of typical trie operations whenever there is new entry to be added to the trie? Is there any other blog or suggestion available? Hope to hear from you.

  • @nishitgoyal970
    @nishitgoyal970 5 ปีที่แล้ว

    Great video Tushar..Can you put some video which is covering authentication concepts too..

  • @yangmyfly
    @yangmyfly 5 ปีที่แล้ว

    100% better than
    Grokking the System Design Interview

  • @siangjin3456
    @siangjin3456 6 ปีที่แล้ว

    For the data collection session, I think it would be good to use Hadoop to do some batch processing, then we can have the words with frequencies, the build the trie tree database from there.

  • @sirajmemon8801
    @sirajmemon8801 ปีที่แล้ว

    For the aggregation, I would probably go ahead with something like HBase and run map reduce as it is a typical "word (phrase) count" problem and we can store the aggregated data into a relational DB or even in a NoSQL one like Cassandra with a TTL.

  • @CALVlNXD
    @CALVlNXD 7 ปีที่แล้ว +7

    i knew my man Tushar was a seahawks fan

  • @GDBthe1st
    @GDBthe1st 7 ปีที่แล้ว

    Thanks this is a great video and very nice ideas

  • @sudhanshurajvaidya1587
    @sudhanshurajvaidya1587 6 ปีที่แล้ว +4

    Would zookeeper be a bottleneck in this design since every query and applier has to go through and, given, its not good for high throughput applications? How about using a KV store like Redis instead of trie which provides prefix search of keys?

    • @khaledkaram1613
      @khaledkaram1613 6 ปีที่แล้ว

      etcd can be the same and it's distributed

    • @Yes-no1bl
      @Yes-no1bl หลายเดือนก่อน

      Redis can be used in case you need real time autocomplete - it has got support for sorted sets.

  • @naikkunjan
    @naikkunjan 6 ปีที่แล้ว

    Great insights and explanation

  • @pingqiu7318
    @pingqiu7318 7 ปีที่แล้ว

    Hi, Tushar, your videos are awesome. Would you make more on system design? Thanks

  • @ashutosh6060
    @ashutosh6060 5 ปีที่แล้ว

    I have to say your content is thorough and deep. I just don't like the way you spell words with 'T' :D but i can bear with that.

  • @michaelhuston
    @michaelhuston 7 ปีที่แล้ว

    Great video! Could you share a little bit more about how an Applier batch job takes the phrases and weights in its range and turns them into the top K terms for each phrase? Thanks!

  • @gabokings260388
    @gabokings260388 7 ปีที่แล้ว

    Good job 👍🏼, It would be great if you made videos about ML

  • @wij8044
    @wij8044 ปีที่แล้ว

    But how do you know the size of your trie? I fear this solution was too easy to assume we’re needing to implement sharding before considering whether the data can feasibly be incorporated in-memory.

  • @prashantmehta5217
    @prashantmehta5217 7 ปีที่แล้ว +1

    How the trie is implemented in the discussed architechture. Is it in memory?? or some Database. which database supports trie?

  • @aaronkatz9411
    @aaronkatz9411 4 ปีที่แล้ว

    Thanks, great explanation

  • @gdthangakumar
    @gdthangakumar 7 ปีที่แล้ว

    Thanks Tushar. You are videos helps a lot :)

  • @ppsanyal
    @ppsanyal 6 ปีที่แล้ว

    Tushar could you please post a video on the following design question " Design facebook live streaming". I have mostly found ideas and half-baked solutions elsewhere. Your videos are pretty comprehensive about how things should be explained in an interview so I would like to know your perspective about approaching this problem.

    • @MosheSiegel
      @MosheSiegel 6 ปีที่แล้ว

      I am just a beginner at system design, but I imagine the following points can be used for Facebook live streaming. 1) HTTPS so all livestreamed-data is secure. 2) Blob storage container such as AWS S3 to store video 3) CDN to optimize others watching the livestream. The CDN would need an algorithm to analyze the locations of the live-streamers audience. The algorithm would take into account number of audience members, their locations, the locations of CDN servers etc. to decide which CDN locations should get a copy. The algorithm could save its results in a database and use that data in the future instead of running the algorithm for each livestreamed video. When the CDN algorithm reruns itself to get new data will have to be based on another algorithm. 4) Load balancers, availability, consistency, etc. could be mentioned. If someone is watching the livestream on the other side of the globe, is it okay for their 'live' video to have a delay of several seconds? The answer to that depends on the business requirements. 5) Something needs to maintain the constant connection between the livestreamer's camera and the host server while the video is broadcasted. Web Sockets would be good for this. 6) The audience would need some way of receiving the video from the Blob storage container. I believe Web Sockets between the client and server would allow the audience to watch the video via a constant connection. But I do not know enough to say whether or not there are better solutions for this than web sockets.

  • @manojkumaranbalagan854
    @manojkumaranbalagan854 7 ปีที่แล้ว +1

    Nice video!! But how these Trie nodes are stored...whether they are stored in relational db or nosql db ?

    • @tusharroy2525
      @tusharroy2525  7 ปีที่แล้ว +1

      Since my source of truth is in Cassandra I can keep trie inmemory as I have 3 replicas.

    • @manojkumaranbalagan854
      @manojkumaranbalagan854 7 ปีที่แล้ว

      Storing that big data inmemory will not add more cost? And also are you talking about these inmemory databases be stored in CDN's ?

    • @tusharroy2525
      @tusharroy2525  7 ปีที่แล้ว +2

      Good questions
      1) yes I would want to store trie in memory to be super fast. Remember we can split trie as much as we want based on the algorithm I described in the video. So if trie is bigger than memory then split it. Memory these days are 200GB easy so I m not worried too much about keeping things in memory. But you are right in worst case trie could be serialized into hard drive if needed. I would store them as simple key value pair
      e.g b -> {bat, bit}, ba -> {bath, bat} and so on. Basically keeping data for each prefix combination.
      2) When I m keeping data in CDN I m not storing trie. Instead I m storing what users are asking for.
      e.g if user asked for "ba" I would store "ba" -> {bat, bath} in the CDN so that next prefix query for ba returns those autocomplete values.

    • @manojkumaranbalagan854
      @manojkumaranbalagan854 7 ปีที่แล้ว

      Thanks for the replies :)

    • @NandanKumar-if3fn
      @NandanKumar-if3fn 7 ปีที่แล้ว

      Tushar, once you have the data in cdn or in memcache, you would have to refresh that one with the latest phrases that have higher weight. If we don't refresh, it would end up showing the old results. How would that work? We might need background process to refresh that. Any thoughts around that?

  • @GRV1198
    @GRV1198 7 ปีที่แล้ว

    Superb ! thanks a ton Tushar. I had some dumb doubts on trie part. Where is the trie data structure defined and stored . Would that be in cassandra itself ?

    • @tusharroy2525
      @tusharroy2525  7 ปีที่แล้ว +1

      +gaurav srivastava Cassandra will store data in the format I described. Applier will take that data and create trie locally and push them to trie nodes. Trie can stay inmemory in trie nodes as they are not source of truth.

  • @surabhigoyal9504
    @surabhigoyal9504 5 ปีที่แล้ว

    Amazing Video!

  • @occo5877
    @occo5877 4 ปีที่แล้ว

    This was super useful thanks

  • @saikishorer
    @saikishorer 6 ปีที่แล้ว

    Nice video Tushar. Really helped a lot to me.
    I have a small question,Is Trie structure(T1,T2,T3....) will be stored in the DB as a pieces or replicas of original. Eg: Zoo Keepr has a distributed mapping like T1 ->a to f.... So, T1 DB holds only the combinations starts from a to f?

  • @sanke7982
    @sanke7982 7 ปีที่แล้ว

    Tushar, may we have videos on Saga transaction pattern, session storage/caching as well as api key storage/caching? Anyways great work as usual.

  • @NeerajDhariwal
    @NeerajDhariwal 5 ปีที่แล้ว

    Hi Tushar, You said we would split the trie across multiple machines. Would each of of these subparts reside in the main memory of these machines? In this case, wouldn't it require too much main memory for the scale of Google?

  • @vardaanbajaj3181
    @vardaanbajaj3181 5 ปีที่แล้ว

    Hi, thanks for the video. Can you please tell why we need to use distributed cache like redis in this case since the tries will also be in-memory only?

  • @md.abdullahal-alamin8059
    @md.abdullahal-alamin8059 6 ปีที่แล้ว

    Brilliant as always. thanks

  • @Chandan-io3jm
    @Chandan-io3jm 7 ปีที่แล้ว

    keep uploading videos they are great

  • @bishwajitpurkaystha3075
    @bishwajitpurkaystha3075 5 ปีที่แล้ว +1

    One more thing is missing in your lecture.
    You should categorize how you actually store a trie in the DB. What DB is used? A proper justification for the learners should have been followed.

    • @tsalaka
      @tsalaka 5 ปีที่แล้ว

      Bishwajit Purkaystha I think trie should be stored in memory. We have an Applier who can restore a trie in memory when all replicas died

  • @rahulsharma5030
    @rahulsharma5030 3 ปีที่แล้ว

    where will you store trie? It should be stored in memory data store i think as we want our results to be fast,and if thats case, we dont need any more in memory cache. Since we will store only 1 or two year data in trie so in memory store can accommodate it nicely

  • @infraops5184
    @infraops5184 6 ปีที่แล้ว

    Great Video

  • @janarthananrajendran8528
    @janarthananrajendran8528 7 ปีที่แล้ว +2

    Thanks for the nice explanation. One question. wouldn't dumping the entire data into DB cause performance issues for both read and write ? Just a thought maybe acquire locks at node level that needs to be updated and update only those and the relevant parents to that node (top K items) ?

  • @gleventhal
    @gleventhal 6 ปีที่แล้ว

    I probably need to rewatch, but where does he address the client side communication that results in updates with each new keystroke? I assume it's AJAX or Websockets, but where did he discuss that?

  • @p111calcutta1
    @p111calcutta1 7 ปีที่แล้ว +1

    Thanks Tushar for nice video. Had you thought about OOPS design questions like Design an elevator, chess game ?. Any suggestions for that

    • @tusharroy2525
      @tusharroy2525  7 ปีที่แล้ว +1

      +Piyush Tiwari Honestly not my kind of questions.

  • @iLikestolaugh
    @iLikestolaugh 5 ปีที่แล้ว

    Thank you so much for such a clear explanation!

  • @gauravkushwaha2009
    @gauravkushwaha2009 6 ปีที่แล้ว

    Amazing explaination.