6: Typeahead Suggestion + Google Search Bar | Systems Design Interview Questions With Ex-Google SWE

Comments • 30

  • @Xiaoxixi939
    @Xiaoxixi939 20 hours ago +1

    This is the best video I've seen explaining type ahead, thanks a lot for making great content!

  • @danielginovker1984
    @danielginovker1984 4 months ago +8

    My girlfriend no joke asked if you were gonna steal me from her because of how much I talk about your channel. Keep it up

  • @pavankhubani4191
    @pavankhubani4191 5 months ago +1

    This is a great video, thanks for taking the effort to explain everything in such depth.

  • @aneeshbhansali
    @aneeshbhansali 5 months ago +6

    At 12:55 you should push onto the heap before you pop; otherwise, if the value you push is smaller than the value you are popping, your result will be incorrect.
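
    A minimal Python sketch of what I mean, using a size-k min-heap (the (frequency, term) pairs and names here are just illustrative):

    ```python
    import heapq

    def top_k_suggestions(candidates, k):
        """Keep the k highest-frequency (frequency, term) pairs with a size-k min-heap."""
        heap = []  # min-heap ordered by frequency
        for freq, term in candidates:
            # Push first, then pop: if the new item is smaller than the current
            # minimum, it is the one that gets evicted, so the top-k stays correct.
            heapq.heappush(heap, (freq, term))
            if len(heap) > k:
                heapq.heappop(heap)
        return sorted(heap, reverse=True)

    # Example: top 2 of some made-up prefix candidates
    print(top_k_suggestions([(5, "apple"), (2, "apply"), (9, "app store")], 2))
    ```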

  • @almirdavletov535
    @almirdavletov535 2 months ago +1

    Great vid as always! Would be cool to see how sentence suggestions work, how words are connected to each other, etc.

  • @i_want_youtube_anonymity7099
    @i_want_youtube_anonymity7099 27 days ago +1

    Wow, a video where the capacity estimates actually matter. Really nice to see you compare these to the memory capacities of clients/servers.

    • @jordanhasnolife5163
      @jordanhasnolife5163 27 days ago +2

      I perform capacity estimates every weekend when figuring out how much late night food I should eat to not explode the next morning

  • @LawZist
    @LawZist 5 months ago +1

    Great video! Keep up the good work, I really enjoyed it.

  • @schan263
    @schan263 5 months ago +1

    Thanks for making the video. It was interesting and helpful.

  • @meenalgoyal8933
    @meenalgoyal8933 3 months ago +1

    Another great video! Thanks for making it.
    I am a bit confused about the update path.
    1. It looks like we are creating a new trie from the logs (containing search terms with frequencies in Kafka) instead of updating the existing trie. Let's say we want to account for the last few days of searches; then, to build the trie, shouldn't we feed a copy of the existing trie (along with the recent search logs) into HDFS to calculate the top suggestions for each prefix?
    2. Instead of the app server just getting the top suggestions for each prefix from HDFS, is it possible for us to compute the trie offline as well and then load it into the server? If yes, can you also please suggest tools for computing the trie offline and loading it from offline into server memory?

    • @jordanhasnolife5163
      @jordanhasnolife5163 3 months ago

      1) HDFS already has the last few days of data available. It doesn't have to delete that just because we computed another trie from it. You wouldn't have to send the existing trie.
      2) Considering that you can't really represent a trie in a text file like that, I'm not quite sure. I guess in theory you could compute it on one server from the HDFS data, then serialize it to JSON or something, and then send it out to all of the other servers. But even then, you're just building a trie from the JSON rather than from the frequencies, which frankly has a similar time complexity.
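
      Roughly what "building a trie from the frequencies" looks like, as a hedged Python sketch (the node layout, top-k precomputation, and names are illustrative assumptions, not the exact design from the video):

      ```python
      import heapq

      class TrieNode:
          def __init__(self):
              self.children = {}  # char -> TrieNode
              self.top = []       # candidate (frequency, term) pairs for this prefix

      def build_trie(term_frequencies, k=5):
          """Build a trie from {term: frequency} and keep the top-k terms at every prefix node."""
          root = TrieNode()
          for term, freq in term_frequencies.items():
              node = root
              for ch in term:
                  node = node.children.setdefault(ch, TrieNode())
                  node.top.append((freq, term))

          def trim(node):
              # Keep only each node's k most frequent candidates.
              node.top = heapq.nlargest(k, node.top)
              for child in node.children.values():
                  trim(child)

          trim(root)
          return root

      def suggest(root, prefix):
          node = root
          for ch in prefix:
              if ch not in node.children:
                  return []
              node = node.children[ch]
          return [term for _, term in node.top]

      # Made-up frequencies for illustration
      trie = build_trie({"app": 10, "apple": 7, "application": 3}, k=2)
      print(suggest(trie, "ap"))  # ['app', 'apple']
      ```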

  • @robinpan2245
    @robinpan2245 4 months ago +2

    Your jokes make the grind slightly less terrible :))

  • @ivanaslamov
    @ivanaslamov a month ago +1

    You should probably replace Flink with Spark Streaming since you're already planning on using Spark downstream.

    • @jordanhasnolife5163
      @jordanhasnolife5163 a month ago

      Yeah in reality I think that's reasonable, but for the sake of the systems design interview I like to be idealistic.

  • @levimatheri7682
    @levimatheri7682 a month ago +1

    Top tier videos! Can you do Design a Parking Lot?

  • @yrfvnihfcvhjikfjn
    @yrfvnihfcvhjikfjn 5 months ago +2

    Merry Christmas 🎄🎁

  • @tymurmaryokhin9767
    @tymurmaryokhin9767 a month ago +2

    Why not something like Elasticsearch for prefix searching with the same range-based partitioning?

    • @jordanhasnolife5163
      @jordanhasnolife5163 a month ago

      It's going to be slower: that's on disk, and now I have to perform a binary search for my word rather than just traversing down a trie

    • @firezdog
      @firezdog 8 days ago

      @jordanhasnolife5163 I'm not sure I agree. Elasticsearch supports term and phrase suggestions as special use cases, and it gives users control over general relevance features. I work on a search team, and our design for this feature is centered around an Elasticsearch cluster w/ special typeahead indices, an ETL from BQ to that cluster, and a service to query the cluster. I don't know if our design is the industry standard, and it depends on exactly what you're trying to do, but I think this is definitely one of the ES use cases. (Typeahead isn't just about popularity either; there could be many different heuristics you need to use to rate which suggestions are the best. There may be machine learning models involved to help determine that as well.)
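
      As a rough illustration, here is how a typeahead query against Elasticsearch's completion suggester might look from Python (the `suggestions` index name, `suggest` field mapping, and localhost URL are assumptions):

      ```python
      import requests

      # Assumes a "suggestions" index whose "suggest" field is mapped as type
      # "completion" (the suggester Elasticsearch keeps in memory as FSTs).
      query = {
          "suggest": {
              "typeahead": {
                  "prefix": "goog",
                  "completion": {"field": "suggest", "size": 5}
              }
          }
      }

      resp = requests.post("http://localhost:9200/suggestions/_search", json=query)
      for option in resp.json()["suggest"]["typeahead"][0]["options"]:
          print(option["text"], option["_score"])
      ```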

  • @rahullingala7311
    @rahullingala7311 4 months ago +1

    What are your thoughts on using GraphDBs like Neo4j to store the trie?

    • @jordanhasnolife5163
      @jordanhasnolife5163 4 months ago +1

      I think that if we can avoid storing this guy on disk, we should! It's a pretty inefficient operation to jump from random spot to random spot on disk.

  • @ariali2067
    @ariali2067 3 months ago +1

    Curious why we need stream processing (Kafka -> Flink -> HDFS) to upload newly entered words to HDFS? Why can't we upload them to HDFS directly?

    • @jordanhasnolife5163
      @jordanhasnolife5163 3 months ago +1

      HDFS stores full files, not individual strings of text. We need to aggregate the queries first.
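
      A simplified Python sketch of that aggregation step: count raw queries over a time window, then flush one aggregated file per window (the window size, file path, and the write_file stand-in for an HDFS client are assumptions):

      ```python
      import time
      from collections import Counter

      WINDOW_SECONDS = 600  # flush aggregated counts every 10 minutes

      def aggregate_and_flush(query_stream, write_file):
          """Count raw search queries per window and write one aggregated file per window.

          query_stream yields raw query strings (e.g. consumed from a Kafka topic);
          write_file(path, lines) stands in for an HDFS client write.
          """
          counts = Counter()
          window_start = time.time()
          for query in query_stream:
              counts[query] += 1
              if time.time() - window_start >= WINDOW_SECONDS:
                  path = f"/search_logs/counts_{int(window_start)}.tsv"
                  write_file(path, [f"{q}\t{c}" for q, c in counts.items()])
                  counts.clear()
                  window_start = time.time()
      ```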

    • @NakulMatta
      @NakulMatta 2 months ago

      @jordanhasnolife5163 Would it be better to use a Spark Streaming consumer instead of Flink here? We could do batching with it and write each batch to HDFS.

  • @aritchanda205
    @aritchanda205 3 months ago +1

    @ 21:08 a short is 2 bytes, so it's 16 bits, not 8, and hence we have ~65k terms (I have to point out this minor, insignificant mistake or I can't go to bed, since I'm the internet police).
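
    The arithmetic, for the record (a 2-byte short has 16 bits):

    ```python
    # 2 bytes = 16 bits, so the number of distinct values a short can hold is:
    print(2 ** 16)  # 65536, i.e. ~65k terms
    ```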