How does Google search one document among billions of documents quickly?

  • Published on 11 Nov 2024

Comments • 142

  • @csie123
    @csie123 4 years ago +28

    11:00
    18:17 noise removal
    26:35 indexMap[keyWord]=[ { which sentenseIdx, pos where it occurs in the string } , ]
    31:20 Conjunction = AND / Disjunction = OR 31:25 remove duplicates
    34:35 Union: for three consecutive words, the order has to be checked
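
The notes above describe a positional inverted index: each keyword maps to a list of (sentence index, position) pairs, with AND as set intersection and OR as set union. A minimal sketch in Python, assuming a hypothetical stop-word list for the noise-removal step and simple letter-only tokenization (neither is taken from the video):

```python
import re
from collections import defaultdict

# Hypothetical stop-word list used for "noise removal"; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "over", "in", "of"}

def tokenize(text):
    """Lowercase, split on non-letters, and keep each token's original position."""
    words = re.findall(r"[a-z]+", text.lower())
    return [(pos, w) for pos, w in enumerate(words) if w not in STOP_WORDS]

def build_index(sentences):
    """index[keyword] -> list of (sentence_idx, position) pairs."""
    index = defaultdict(list)
    for idx, sentence in enumerate(sentences):
        for pos, word in tokenize(sentence):
            index[word].append((idx, pos))
    return index

def docs_for(index, word):
    """Collapse a posting list to a set of sentence ids (this removes duplicates)."""
    return {idx for idx, _ in index.get(word, [])}

def search_and(index, words):
    """Conjunction: sentences containing every word."""
    sets = [docs_for(index, w) for w in words]
    return set.intersection(*sets) if sets else set()

def search_or(index, words):
    """Disjunction: sentences containing at least one word."""
    return set().union(*(docs_for(index, w) for w in words))

sentences = [
    "The quick brown fox jumped over the lazy dog",
    "Quick brown foxes leap over lazy dogs in summer",
    "The lazy dog sleeps",
]
index = build_index(sentences)
print(index["quick"])                          # [(0, 1), (1, 0)]
print(search_and(index, ["lazy", "dog"]))      # {0, 2}
print(search_or(index, ["fox", "sleeps"]))     # {0, 2}
```

Keeping the positions costs extra space, but it is what later allows order-sensitive (phrase) matching.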

  • @jheelparikh5365
    @jheelparikh5365 4 years ago +6

    You are excellent at explaining System Design. Watching your videos motivates me to learn in depth and makes my learning experience interesting. There is so much to learn from your videos, and they also show that you enjoy teaching and sharing your immense knowledge.

  • @Akashkumar-md6rg
    @Akashkumar-md6rg 4 years ago +3

    I can't find content like this anywhere on YouTube. These videos show the hard work behind them. You are really great.
    Thank you very much. 🙌🙌

  • @rohanvardhan2767
    @rohanvardhan2767 5 years ago +16

    Your channel is a one-stop shop for SDI prep. Thank you for another great video. Could you please share a system design for a Yelp-like service?
    Cheers!

  • @lalithak6323
    @lalithak6323 2 years ago

    I can't explain in words how useful this video is. Thanks a lot!!!

  • @ArvindChourasia
    @ArvindChourasia 2 years ago

    What an amazing simplified explanation of such a big concept. Thanks for sharing your knowledge.

  • @MargiPatel-o9h
    @MargiPatel-o9h 1 year ago

    Thank you for sharing! Very clear and learner centered. What an amazing educator!

  • @kumarc4853
    @kumarc4853 3 years ago

    Narendra Sir is a legendary YouTuber in system design. Very thorough and knowledgeable videos.

  • @phamquangvi4413
    @phamquangvi4413 2 years ago

    Thank you for your interesting video. Thanks a lot.

  • @iitgupta2010
    @iitgupta2010 5 years ago +12

    Great, it's good material on inverted index search. That's cool.
    But I was expecting a bit more on the actual design: system diagram, flow, etc.

  • @vivekrautela6928
    @vivekrautela6928 3 years ago +28

    B-tree indexing has a time complexity of O(log n) because it performs a binary search. Why did you say that it takes O(1)?

    • @cool1000nitin
      @cool1000nitin 3 years ago +7

      Correct. What was explained is caching, not indexing, but overall the video is very helpful.
      👍
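
The distinction raised in this thread can be checked with a small experiment: a lookup in an ordered index needs a number of probes that grows with log n, while a hash-based lookup takes O(1) time on average. The sketch below uses a sorted Python list as a stand-in for a B-tree (an assumption made for brevity; a real B-tree stores many keys per node, but the logarithmic growth is the same idea).

```python
def binary_search_steps(sorted_keys, target):
    """Return (found, number_of_probes) for a binary search over sorted_keys."""
    lo, hi, probes = 0, len(sorted_keys) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_keys[mid] == target:
            return True, probes
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return False, probes

keys = list(range(1_000_000))

# Ordered index: at most floor(log2(n)) + 1 = 20 probes for one million keys.
found, probes = binary_search_steps(keys, 999_999)

# Hash index: one hash computation and, on average, a constant number of probes.
hash_index = {k: True for k in keys}
in_hash = 999_999 in hash_index
```

The ordered structure is slower per lookup but supports range and prefix scans, which a plain hash map does not.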

  • @veenuvinod1
    @veenuvinod1 2 years ago

    Bro, you are doing such great work, and free of cost at that.

  • @weekendresearcher
    @weekendresearcher 5 years ago +38

    So basically, it is a great video!

  • @raviprakashagrawal9478
    @raviprakashagrawal9478 5 years ago +17

    This guy has done so much hard work to provide free education. Still, a few people disliked this video. Shame on them.

  • @rahulsadanandan5076
    @rahulsadanandan5076 4 years ago +13

    Once that "basically" sets in, it's hard to get over.

    • @RanjuRao
      @RanjuRao 3 years ago

      I literally counted how many times "basically" is used. I appreciate the fact that people like him are going out of their way to contribute to society, but the content would have been sounder if it had been planned and presented more methodically.

  • @brijeshshirodkar8784
    @brijeshshirodkar8784 6 months ago

    Really enjoyed the illustration of indexing.

  • @madhu9829
    @madhu9829 5 years ago +38

    How do you gather such in-depth details of these systems? Really great work and I'm enjoying it.

    • @TechDummiesNarendraL
      @TechDummiesNarendraL 5 years ago +19

      You can too, keep reading :)

    • @mmanuel6874
      @mmanuel6874 4 years ago

      @TechDummiesNarendraL What books and resources do you recommend?

    • @nabanitasen
      @nabanitasen 4 years ago +4

      @mmanuel6874 The best place is the papers published by these companies and their tech talks.

    • @techtea5911
      @techtea5911 2 years ago

      @mmanuel6874 Google!

  • @Kasatankit
    @Kasatankit 3 years ago

    Bhai you are awesome!!! I cleared so many interviews because of you!! Sending you more power and may all your wishes come true!

  • @ricardob.18
    @ricardob.18 1 year ago +1

    Amazing video!

  • @harishwarreddy9114
    @harishwarreddy9114 4 years ago

    While watching the video I can feel my knowledge increasing. What an explanation!

  • @anastasianaumko923
    @anastasianaumko923 1 year ago

    Very clear! Thank you so much for your work 😌

  • @akashjkhamkar
    @akashjkhamkar 2 years ago

    Really cool video man, keep such videos coming!

  • @NitishSarin
    @NitishSarin 5 years ago +8

    You sir, are a legend! 🙏🏻

  • @janabodu3392
    @janabodu3392 4 years ago +7

    Can you please increase the volume? It's hard to hear on my laptop. I noticed the same thing with your previous videos as well. But nice and informative videos.

  • @goutamkreddy
    @goutamkreddy 4 years ago

    Nice work laying out the basic building blocks of any search engine (especially the inverted index section)!

  • @subee128
    @subee128 3 months ago

    Thanks

  • @kambalavijay6800
    @kambalavijay6800 3 years ago +1

    @Tech Dummies Narendra L, it's OK to build a matrix for 3 documents. But what about a case where there are a billion docs to scan through to build an inverted index matrix? The matrix would be huge, right?

  • @shivampradhan6101
    @shivampradhan6101 5 months ago

    Can you make a new video? I think search has moved quite far ahead, from TF-IDF to gen-AI embeddings and RAG methods.

  • @大盗江南
    @大盗江南 3 years ago

    amazing.........just amazing......thank u narendra!!!

  • @optimizer_____2420
    @optimizer_____2420 5 years ago

    Very nice explanation sir 👍. Thank you very much.

  • @ganeshmain009
    @ganeshmain009 5 years ago +3

    Great content on this channel bro..keep up the good work

  • @ankitjain8255
    @ankitjain8255 3 years ago

    Such an amazing video, thank you so much Narendra.

  • @v.karikaran5973
    @v.karikaran5973 3 years ago

    How did you learn this? Can you please suggest a book to refer to
    on how search engines work and how to build one?

  • @AP-tz1ns
    @AP-tz1ns 1 year ago

    really good video

  • @adithyaks8584
    @adithyaks8584 3 years ago

    Very good channel I am enjoying every bit of learning here !

  • @mohamedamr9203
    @mohamedamr9203 4 years ago

    Thank you, it's a great video.

  • @prasadjayanti
    @prasadjayanti 3 years ago

    good explaining ...

  • @MrQuadraaa
    @MrQuadraaa 3 years ago

    Top content. Thank you!

  • @saurabhtyagi6963
    @saurabhtyagi6963 2 years ago

    Build a bit index between words and docs, and then it will be easy to do conjunction and disjunction.
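
One way to read the suggestion above is a bitmap index: each word gets a bit vector with one bit per document, so conjunction and disjunction become bitwise AND and OR. A minimal sketch, assuming Python integers as the bit vectors and whitespace tokenization (production systems typically use compressed bitmaps instead):

```python
def build_bitmaps(docs):
    """Map each word to an int whose bit i is set when document i contains the word."""
    bitmaps = {}
    for doc_id, text in enumerate(docs):
        for word in set(text.lower().split()):
            bitmaps[word] = bitmaps.get(word, 0) | (1 << doc_id)
    return bitmaps

def doc_ids(bits):
    """Decode a bitmap back into an ascending list of document ids."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]

docs = ["quick brown fox", "lazy brown dog", "quick lazy fox"]
bm = build_bitmaps(docs)

both = bm["quick"] & bm["fox"]      # conjunction (AND)
either = bm["brown"] | bm["dog"]    # disjunction (OR)

print(doc_ids(both))    # [0, 2]
print(doc_ids(either))  # [0, 1]
```

The trade-off is space: an uncompressed bitmap needs one bit per document for every word, including the many words that appear in very few documents.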

  • @524emon
    @524emon 3 years ago

    Great content ... great explanation ... it is difficult to find such content on YouTube.

  • @yzmashuai
    @yzmashuai 5 years ago

    Thanks, Narendra, for this wonderful episode.

  • @adamhughes9938
    @adamhughes9938 4 years ago

    So great

  • @ashish161087
    @ashish161087 4 years ago

    Wow..You are great..

  • @kumarch4027
    @kumarch4027 3 years ago

    How are millions of queries served from the reverse (inverted) index? How is concurrency achieved? Where do union and intersection happen, and how fast are they? Can a query go to the exact shards based on the query words and perform the union there?
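
One common answer to the sharding question above is a document-partitioned index: every shard indexes its own subset of documents, a query is sent to all shards in parallel, each shard intersects locally, and a coordinator merges the partial results. The sketch below is only an illustration under assumptions not stated in the video: `NUM_SHARDS`, the modulo partitioning rule, and the thread pool standing in for network calls are all hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NUM_SHARDS = 3  # hypothetical; chosen only for the example

def build_shards(docs):
    """Document-partitioned index: doc_id % NUM_SHARDS picks the shard."""
    shards = [defaultdict(set) for _ in range(NUM_SHARDS)]
    for doc_id, text in docs.items():
        for word in text.lower().split():
            shards[doc_id % NUM_SHARDS][word].add(doc_id)
    return shards

def search_shard(shard, words):
    """Intersect posting sets locally, inside one shard."""
    postings = [shard.get(w, set()) for w in words]
    return set.intersection(*postings) if postings else set()

def search(shards, words):
    """Scatter the query to every shard in parallel, then gather by union."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = pool.map(lambda s: search_shard(s, words), shards)
        return sorted(set().union(*partials))

docs = {
    0: "quick brown fox",
    1: "lazy dog",
    2: "quick fox",
    3: "brown fox quick",
    4: "fox",
}
shards = build_shards(docs)
print(search(shards, ["quick", "fox"]))  # [0, 2, 3]
```

Because each document lives in exactly one shard, intersections never cross shard boundaries; the alternative, partitioning by term, routes a query only to the shards that own its words but then has to ship posting lists between machines.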

  • @rahulsoni-lx5rb
    @rahulsoni-lx5rb 9 months ago

    🤩🤩🤩

  • @swaroopjin
    @swaroopjin 4 years ago +1

    It's a great video, Naren, but I would like to understand how refactoring or removal of an index entry or keyword happens. Also, does Google store the index across multiple regions, e.g. APAC, America? How big can the index be in Google's case?

  • @theultimaterelaxation6839
    @theultimaterelaxation6839 3 years ago

    Thank you so much.

  • @waelalghazouli8024
    @waelalghazouli8024 4 years ago

    Very helpful, thank you very much!

  • @priyaravindran4337
    @priyaravindran4337 4 years ago

    Awesome 👏

  • @anchaldubey4217
    @anchaldubey4217 5 years ago

    Very nice explanation sir

  • @kslsantosh
    @kslsantosh 5 years ago +2

    Nice explanation :) One small suggestion: a case where the order of the words matters is, for example, the query "Distance between Mumbai to Delhi". This video helped me brush up on information retrieval techniques.

    • @cliffmathew
      @cliffmathew 1 year ago

      When you type "Distance between Mumbai to Delhi", I guess the expectation is to get a number ("26 hr (1,419.7 km) via NH 48") as an answer, and not the documents containing those words. Essentially, we would like the search engine to understand the meaning of our question, and then respond with a factual answer. I suspect that is handled by different techniques like "Semantic search" or "extractive question answering".
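
Leaving semantic answers aside, the word-order point in this thread can be handled at the index level by storing word positions and checking that the query words occur consecutively. A minimal sketch of such a phrase query, assuming whitespace tokenization and exact word matches (the function names are made up for the example):

```python
from collections import defaultdict

def build_positional_index(docs):
    """index[word][doc_id] -> list of positions of word in that document."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in enumerate(docs):
        for pos, word in enumerate(text.lower().split()):
            index[word][doc_id].append(pos)
    return index

def phrase_search(index, phrase):
    """Return ids of documents containing the words consecutively, in order."""
    words = phrase.lower().split()
    if not words or any(w not in index for w in words):
        return []
    # Candidate documents must contain every word of the phrase.
    candidates = set(index[words[0]])
    for w in words[1:]:
        candidates &= set(index[w])
    hits = []
    for doc_id in sorted(candidates):
        # A start position matches when word k appears at position start + k.
        if any(all(start + k in index[w][doc_id] for k, w in enumerate(words))
               for start in index[words[0]][doc_id]):
            hits.append(doc_id)
    return hits

docs = [
    "distance between mumbai to delhi",
    "distance between delhi to mumbai",
    "mumbai delhi flights",
]
index = build_positional_index(docs)
print(phrase_search(index, "mumbai to delhi"))  # [0]
print(phrase_search(index, "delhi to mumbai"))  # [1]
```

Both documents 0 and 1 contain the same set of words, so a plain AND query cannot tell them apart; only the position check does.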

  • @bvsivakrishna
    @bvsivakrishna 2 years ago +2

    "Basically" I got your point brother! 😂

  • @sagartyagi2450
    @sagartyagi2450 3 years ago

    How does fuzzy search work? What if I write Quock instead of Quick? How does that come up as a result?
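
One standard building block for the fuzzy-search question above is edit (Levenshtein) distance: the query term is compared against indexed terms, and terms within a small number of edits are treated as matches. The sketch below scans the whole vocabulary, which is an assumption made for clarity; real engines usually narrow the candidates first (for example with character n-grams) rather than comparing against every term.

```python
def edit_distance(a, b):
    """Levenshtein distance via dynamic programming, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def fuzzy_lookup(term, vocabulary, max_distance=1):
    """Return indexed words within max_distance edits of the query term."""
    term = term.lower()
    return sorted(w for w in vocabulary if edit_distance(term, w) <= max_distance)

vocabulary = {"quick", "brown", "fox", "quack", "lazy"}
print(fuzzy_lookup("Quock", vocabulary))  # ['quack', 'quick']
```

"Quock" is one substitution away from both "quick" and "quack", so ranking (for example by term frequency) still has to decide which suggestion to show first.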

  • @rahulsharma5030
    @rahulsharma5030 3 years ago

    How can we update this table when new documents are added? Should we store it in Redis or another cache to speed things up?

  • @prabhatkumarsahu3115
    @prabhatkumarsahu3115 4 years ago

    Great super explanation...

  • @sanjeebkumargouda1471
    @sanjeebkumargouda1471 3 years ago

    Great Lecture sir !!!
    I enjoyed listening and learning😇😇

  • @MoonsuKang
    @MoonsuKang 5 years ago +1

    Great content. Thanks for your video.

  • @piotrszkiecin8357
    @piotrszkiecin8357 5 years ago

    Your videos are great!

  • @rahulsrivastava1603
    @rahulsrivastava1603 1 year ago

    You earned a sub Sir

  • @lokesh4585
    @lokesh4585 3 years ago

    What if we search for "Jumped"? Do we expect the search engine to lemmatize the search keyword?
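
The usual approach to the question above is to apply the same normalization to indexed words and to query words, so that "Jumped", "jumps", and "jumping" all land on one index key. The sketch below uses a deliberately crude suffix stripper as a stand-in; it is not a real stemmer or lemmatizer, and the suffix list and minimum stem length are assumptions chosen for the example.

```python
SUFFIXES = ("ing", "ed", "es", "s")  # illustrative only, not a real stemmer

def crude_stem(word):
    """Strip one common suffix, keeping at least three letters of the stem."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def build_index(docs):
    """Index documents under the stemmed form of each word."""
    index = {}
    for doc_id, text in enumerate(docs):
        for word in text.split():
            index.setdefault(crude_stem(word), set()).add(doc_id)
    return index

docs = ["the fox jumps", "the fox jumped", "foxes are jumping"]
index = build_index(docs)

# The query term goes through the same function before the lookup.
print(index[crude_stem("Jumped")])  # {0, 1, 2}
```

A rule-based stemmer like this one mangles irregular forms (it cannot map "ran" to "run"), which is where dictionary-based lemmatization comes in.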

  • @poonamgoel8993
    @poonamgoel8993 4 years ago +1

    Very informative, thanks for posting this video. One question: is this how all search engines work, e.g. Windows file system search, Outlook/Gmail search, web search?

  • @chaosu2755
    @chaosu2755 3 years ago +1

    If millions of documents contain these words, you cannot run the intersection in memory. That is one critical part we need to focus on. How do we make it work?
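
A common way to approach the memory concern above is to keep each posting list sorted by document id and intersect the lists as streams, so that only the current element of each list has to be held in memory. A minimal sketch, assuming the posting lists arrive in ascending order; the `range` objects are stand-ins for lists read sequentially from disk.

```python
def intersect_streams(a, b):
    """Yield doc ids present in both ascending iterables, one step at a time."""
    it_a, it_b = iter(a), iter(b)
    x, y = next(it_a, None), next(it_b, None)
    while x is not None and y is not None:
        if x == y:
            yield x
            x, y = next(it_a, None), next(it_b, None)
        elif x < y:
            x = next(it_a, None)   # advance the list that is behind
        else:
            y = next(it_b, None)

quick = range(0, 3_000_000, 2)   # stand-in for the posting list of "quick"
brown = range(0, 3_000_000, 3)   # stand-in for the posting list of "brown"

# Only the first few results are needed for a results page, so stop early.
first_five = []
for doc_id in intersect_streams(quick, brown):
    first_five.append(doc_id)
    if len(first_five) == 5:
        break

print(first_five)  # [0, 6, 12, 18, 24]
```

The merge runs in time proportional to the combined list lengths, and combining it with sharding splits that work across machines.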

  • @w.maximilliandejohnsonbour725
    @w.maximilliandejohnsonbour725 5 years ago

    Nice info...!!!!!.

  • @vamshiabhilash
    @vamshiabhilash 4 years ago +1

    This is really informative, and thank you for the knowledge. This will really be helpful for me as I'm into digital marketing. Thank you so much, sir. I really loved the way you explained it.

  • @abhaytiwari6411
    @abhaytiwari6411 4 years ago

    Very good, guru.

  • @panprasanta
    @panprasanta 3 years ago

    Can you shed some light on how intersections are performed when you have billions of docs?

  • @sushantdev4997
    @sushantdev4997 2 years ago

    Can't we use a trie here?
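
A trie is indeed one option for the term dictionary, and it makes prefix queries such as jum* natural. A minimal sketch, assuming each node that ends a word stores the ids of the documents containing it (the class and method names are made up for the example):

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.doc_ids = set()  # non-empty only where an indexed word ends

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word, doc_id):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.doc_ids.add(doc_id)

    def with_prefix(self, prefix):
        """Return {word: doc_ids} for every indexed word starting with prefix."""
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return {}
            node = node.children[ch]
        # Walk the subtree below the prefix and collect complete words.
        found, stack = {}, [(prefix, node)]
        while stack:
            word, current = stack.pop()
            if current.doc_ids:
                found[word] = set(current.doc_ids)
            for ch, child in current.children.items():
                stack.append((word + ch, child))
        return found

trie = Trie()
for doc_id, text in enumerate(["fox jumped", "fox jumps high", "dog jumbo"]):
    for word in text.split():
        trie.insert(word, doc_id)

print(trie.with_prefix("jum"))
```

Lookup cost depends on the length of the word rather than on the number of indexed terms, at the price of more memory per term than a flat hash map.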

  • @roshankumar0911
    @roshankumar0911 4 years ago

    Awesome explanation :-)

  • @faisalmorensya4936
    @faisalmorensya4936 4 years ago

    great video man!

  • @sandeepbatchu487
    @sandeepbatchu487 5 years ago +1

    What is the point of storing frequencies?

    • @carrotcarrorfarm
      @carrotcarrorfarm 4 years ago

      It is relevant to search result ranking. The TF-IDF and BM25 concepts will help you.
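
To make the reply above concrete: stored frequencies feed a relevance score such as TF-IDF, where a term counts for more when it is frequent in a document and rare across the collection. The sketch below uses one common textbook variant, tf = count / document length and idf = ln(N / df); other weightings exist, and BM25 adds length normalization and saturation on top of the same idea.

```python
import math
from collections import Counter

def tf_idf_scores(docs, query):
    """Score each document as the sum of tf * idf over the query terms."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if df[term] == 0:
                continue  # term not in the collection at all
            tf = counts[term] / (len(tokens) or 1)
            idf = math.log(n_docs / df[term])
            score += tf * idf
        scores.append(score)
    return scores

docs = [
    "the quick brown fox",
    "the quick quick quick rabbit",
    "the lazy dog",
]
scores = tf_idf_scores(docs, "quick")
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [1, 0, 2]
```

A word such as "the" appears in every document, so its idf is ln(3/3) = 0 and it contributes nothing to the ranking.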

  • @20frieza
    @20frieza 5 years ago

    Great video again, Narendra! I am not sure what the purpose of the frequency in this table is. It looks like we will never use it when querying. Also, why can't the data structure that stores the index be a Map? That way you can have very fast lookups. But I agree you can't do regex-type matching in it.

    • @cliffmathew
      @cliffmathew 1 year ago +1

      I suspect the frequency is used to understand the semantics. A word and words that appear together or frequently around it are all useful information to gauge the relevance of a document.

  • @vijaykumar-yq7sf
    @vijaykumar-yq7sf 5 years ago +1

    Great

  • @esakkisundar
    @esakkisundar 3 years ago

    Off topic: have you ever wondered about information retrieval in your brain? You can bring back incidents, emotions, and moments in milliseconds, and the brain is not power-intensive at all. What an amazing design our brain has.

    • @JasonBechervaise
      @JasonBechervaise 3 years ago

      The brain operates at about 20 watts; this is about 33% more than a Raspberry Pi 4B at theoretical consumption (5 V @ 3 A = 15 W).

  • @GopalRoy-nn6ft
    @GopalRoy-nn6ft 5 years ago +2

    I don't think you have included the technical concepts and the technology they use.

  • @bharathpreetham2840
    @bharathpreetham2840 5 years ago +1

    nice explanation😊

  • @eugnsp
    @eugnsp 4 years ago +2

    Suppose that words "quick", "brown" and "fox" are associated with millions of documents. How would you find an intersection of them in a fraction of a second?

    • @ers-br
      @ers-br 2 years ago

      Just compare the 'bits' ...

  • @vrushtijoshi92
    @vrushtijoshi92 5 years ago +1

    This was really helpful. Thank you so much. :)

  • @shubhambansal5487
    @shubhambansal5487 5 years ago +1

    Great job!!
    One important suggestion: use an external microphone to cut out background noise and improve sound quality.
    Otherwise your hard work will not pay off.

  • @pravaskumar7078
    @pravaskumar7078 5 years ago

    awesome ....

  • @chessmaster856
    @chessmaster856 21 days ago

    Nothing is searched when requested. They show you what they have. Then go and refine indexes.

  • @sujataroychowdhury178
    @sujataroychowdhury178 5 years ago

    Great video for explaining the nitty-gritty details of inverted search. If you could at least show the full flow of the design, it would be much appreciated.

  • @mityabor
    @mityabor 4 years ago +1

    Nice explanation of TF-IDF preprocessing. But it is not clear how ranking and ML are organized.

  • @dataguy7013
    @dataguy7013 4 years ago

    @Narendra, thanks for the detailed explanation. Nice work. So if there is a new document with the word brown, how do you update the index? Is updating the index more expensive? Can you do an incremental load of the index?
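
For a simple in-memory inverted index, an incremental update only touches the posting lists of the words in the new document; nothing else is rebuilt. The sketch below illustrates that under assumptions not taken from the video: the `InvertedIndex` class is hypothetical, and it keeps a per-document word list so that removals are cheap as well. Large-scale systems typically batch such updates into separate index segments and merge them later.

```python
from collections import defaultdict

class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(set)   # word -> set of doc ids
        self.doc_words = {}                # doc id -> words, kept for removal

    def add_document(self, doc_id, text):
        """Touch only the posting lists of the words in the new document."""
        if doc_id in self.doc_words:
            self.remove_document(doc_id)   # treat re-adding as an update
        words = set(text.lower().split())
        self.doc_words[doc_id] = words
        for word in words:
            self.postings[word].add(doc_id)

    def remove_document(self, doc_id):
        """Drop the document from every posting list it appears in."""
        for word in self.doc_words.pop(doc_id, set()):
            self.postings[word].discard(doc_id)
            if not self.postings[word]:
                del self.postings[word]

    def lookup(self, word):
        return set(self.postings.get(word.lower(), set()))

index = InvertedIndex()
index.add_document(1, "quick brown fox")
index.add_document(2, "lazy dog")
before = index.lookup("brown")           # {1}
index.add_document(3, "brown bear")      # incremental: docs 1 and 2 are not re-read
after = index.lookup("brown")            # {1, 3}
index.remove_document(1)
print(before, after, index.lookup("brown"))
```

The cost of one update is proportional to the number of distinct words in the document being added or removed, not to the size of the index.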

  • @raselahmedb
    @raselahmedb 5 years ago

    Best explanation.
    Please upload a video on a POS system.

  • @keshavyadav
    @keshavyadav 3 years ago

    Please improve the audio quality 👏

  • @adamhughes9938
    @adamhughes9938 4 years ago +1

    If you put a donation link on your videos I would send you money at this point, your vids have helped me so thoroughly!

  • @chitthiaayeehai
    @chitthiaayeehai 5 years ago +1

    Noise removal got slipped ... But it's ok I figured it out... Nice lecture dude.

    • @helloworld6679
      @helloworld6679 5 years ago

      He did at 17:50

    • @chitthiaayeehai
      @chitthiaayeehai 5 years ago

      @helloworld6679 Yes yes yes, you are right dude .... Perfect ...

  • @deep-path-DP
    @deep-path-DP 5 years ago

    Can we have a video on the Dream11 architecture?

  • @rahulsharma5030
    @rahulsharma5030 4 years ago

    Awesome stuff. One doubt:
    In the prefix search in the last few minutes, I think storing the keys in sorted order makes insertion complex: you have to binary-search the table and then insert. B-trees or tries can be used, but they are not as efficient to search as a hash. What we can do instead is search jum(a), jum(b), and so on up to z. Since there are only 26 letters and a limited number of other characters, this is more efficient than maintaining a sorted list.

    • @realgabreal
      @realgabreal 2 years ago

      jum* doesn't tell us how long the word would be, so just searching for jum(a), jum(b), and so on would not work, I think.
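
The objection above is the reason prefix queries are usually answered from a sorted term list: every term starting with jum, whatever its length, sits in one contiguous range that two binary searches can locate. A minimal sketch using Python's `bisect` module; the upper bound `prefix + "\uffff"` assumes the terms contain no characters at or above U+FFFF, which is an assumption made for the example.

```python
import bisect

def prefix_range(sorted_terms, prefix):
    """Return all terms starting with prefix, using two binary searches."""
    lo = bisect.bisect_left(sorted_terms, prefix)
    # "\uffff" sorts after ordinary characters, so this bounds the range from above.
    hi = bisect.bisect_right(sorted_terms, prefix + "\uffff")
    return sorted_terms[lo:hi]

terms = sorted(["jug", "jum", "jump", "jumped", "jumper", "jumps", "jungle", "just"])
print(prefix_range(terms, "jum"))
# ['jum', 'jump', 'jumped', 'jumper', 'jumps']
```

Insertion into a sorted array is indeed O(n), which is why dictionaries that change often tend to use a B-tree or a trie instead of a flat sorted list.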

  • @chiragkataria4161
    @chiragkataria4161 3 years ago

    Great content man, well explained. My only suggestion is to work on the audio quality; even at max volume it's not properly audible.

  • @somerandomguy000
    @somerandomguy000 3 years ago

    Basically this is a video on how to use basically in all sentences you say

  • @soniajain7822
    @soniajain7822 4 years ago

    How do you extend this to also show related searches?

    • @karthik5240
      @karthik5240 4 years ago

      k-means clustering

  • @balakrish3387
    @balakrish3387 4 years ago

    Great content... Voice not so clear

  • @stevemew6955
    @stevemew6955 3 years ago

    Great video! A bit of trivia - "the quick brown fox jumped over the lazy dog" should have an 's' in it -> "the quick brown fox jumps over the lazy dog". It's a typing exercise to hit all 26 letters. en.wikipedia.org/wiki/The_quick_brown_fox_jumps_over_the_lazy_dog. ;)
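
The trivia above is easy to verify with a few lines of Python: compute which letters of the alphabet are absent from each version of the sentence.

```python
import string

def missing_letters(sentence):
    """Return the lowercase ASCII letters that do not occur in the sentence."""
    return sorted(set(string.ascii_lowercase) - set(sentence.lower()))

jumped = missing_letters("the quick brown fox jumped over the lazy dog")
jumps = missing_letters("the quick brown fox jumps over the lazy dog")

print(jumped)  # ['s']
print(jumps)   # []
```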

  • @AbhishekSharma-bg7ez
    @AbhishekSharma-bg7ez 3 years ago

    Thanks for the great video and explanation. There is an issue in the term-document table. The word "Quick" appears twice in that table, at row 0 (as Quick) and row 8 (as quick), but with different values. This is a bit confusing, as I am not sure if I am missing something.

    • @atulsinghrajput9932
      @atulsinghrajput9932 2 years ago

      Yes, you can apply toLowerCase(). It was a mistake; you can ignore that.
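
The fix suggested in this reply, folding case before counting, can be sketched as follows. The example is in Python, where `str.lower()` plays the role of `toLowerCase()`; the `term_counts` helper is made up for the illustration.

```python
from collections import Counter

def term_counts(text, normalize=True):
    """Count terms, optionally folding case so 'Quick' and 'quick' share a row."""
    words = text.split()
    if normalize:
        words = [w.lower() for w in words]
    return Counter(words)

text = "Quick brown fox and quick brown dog"
raw = term_counts(text, normalize=False)
folded = term_counts(text)

print(raw["Quick"], raw["quick"])  # 1 1
print(folded["quick"])             # 2
```

Without normalization the two spellings get separate rows with separate counts, which is the inconsistency noticed in the table.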

  • @arunachalamk1145
    @arunachalamk1145 5 years ago

    Can you make an in-depth video about networking?

  • @rajparekh08
    @rajparekh08 3 years ago

    Thanks for taking the time to make this video. Basically 😂 explained very well.

  • @EricEric2004
    @EricEric2004 4 years ago +1

    Basically, the basicality of the basic is quite basical.

    • @TechDummiesNarendraL
      @TechDummiesNarendraL 4 years ago +1

      You got me

    • @EricEric2004
      @EricEric2004 4 years ago

      Tech Dummies great videos. I love them and subscribed. Thank you 🙏

  • @tanapaul654
    @tanapaul654 5 years ago

    Naren sir, please help me. How can I contact you? I really need the design, please.

  • @preety202
    @preety202 4 years ago

    You need better lighting, it's too dark