What's ElasticSearch Used For? | Search Indexes | Systems Design Interview 0 to 1 with Ex-Google SWE

  • Published Dec 15, 2024

Comments • 35

  • @slow2steady
    @slow2steady 1 year ago +27

    This is like a complete show… you learn and laugh

  • @adrian333dev
    @adrian333dev 7 months ago +11

    My average experience while watching Jordan's content: he starts explaining, and within the first minute he mentions something from a previous video. I pause and start watching that video, where he again mentions something from his past content, and this loop goes on until I say enough is enough.

    • @jordanhasnolife5163
      @jordanhasnolife5163  7 months ago +2

      My average answer: start from #1

    • @adrian333dev
      @adrian333dev 7 months ago

      @jordanhasnolife5163 Saw this coming, but starting a 60-video series after finishing two system design courses on Udemy feels intimidating.

    • @pakkunhatake
      @pakkunhatake 6 months ago +1

      same

    • @shreesharao7261
      @shreesharao7261 months ago +1

      @jordanhasnolife5163 You should incorporate a quick recap, like TV shows do, so that every video is self-contained, unless your intention is to drive views to all your videos ;)

    • @jordanhasnolife5163
      @jordanhasnolife5163  months ago

      @shreesharao7261 haha, it depends; there can be a lot of recap

  • @Unstoppable_gaur
    @Unstoppable_gaur 1 year ago +3

    Great content, would like some more of this kind.
    I appreciate the effort and dedication you put into making these system design videos; they are helpful. These videos really made me fall in love with system design, and I keep reading blogs and looking out for your new videos to build this knowledge.

  • @andyborch9886
    @andyborch9886 4 months ago +2

    Man I normally don't laugh at your jokes but this one actually made me laugh, I think it was mainly due to your stare at the end of the intro! 😆

  • @guoard
    @guoard 4 days ago +1

    Great video!

  • @mayankchhabra3070
    @mayankchhabra3070 27 days ago +1

    If we take the example of building search on top of chats and partition by chat_id, won't that lead to an uneven distribution of data? Elasticsearch has shards and tries to distribute data evenly across all of them, but if we explicitly route our data to a specific shard (using chat_id in our example), the shards can become unbalanced, since one chat might be very active while another is dormant.
    Just thinking out loud about how we would solve this :P (probably by distributing evenly with some composite key, but that would defeat the purpose of searching a chat from just one partition)

    • @jordanhasnolife5163
      @jordanhasnolife5163  23 days ago +1

      I believe using many small partitions and balancing them appropriately tends to be the preferred approach here.
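A minimal sketch of the "many small partitions" idea, assuming each chat is routed by a stable hash of `chat_id` to one of a fixed, fairly large number of logical shards, and shards are then assigned to nodes greedily by observed load so a hot chat's shard ends up next to cold ones. `NUM_SHARDS`, the node names, and the load numbers are hypothetical, and this is an illustration of the balancing idea rather than Elasticsearch's own shard allocator.

```python
import hashlib
import heapq

NUM_SHARDS = 16  # hypothetical: many more shards than nodes


def shard_for(chat_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Route every message of a chat to the same shard via a stable hash."""
    digest = hashlib.md5(chat_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def assign_shards(shard_load: dict[int, int], nodes: list[str]) -> dict[str, list[int]]:
    """Greedy balancing: place the heaviest shards first, each onto the
    node with the smallest total load so far."""
    heap = [(0, node) for node in nodes]  # (total load, node name)
    heapq.heapify(heap)
    placement: dict[str, list[int]] = {node: [] for node in nodes}
    for shard, load in sorted(shard_load.items(), key=lambda kv: -kv[1]):
        total, node = heapq.heappop(heap)
        placement[node].append(shard)
        heapq.heappush(heap, (total + load, node))
    return placement


if __name__ == "__main__":
    # Hypothetical per-shard message counts: shard 0 holds one very active chat.
    load = {0: 900, 1: 100, 2: 120, 3: 80, 4: 300, 5: 50, 6: 60, 7: 90}
    placement = assign_shards(load, ["node-a", "node-b", "node-c"])
    for node, shards in placement.items():
        print(node, shards, sum(load[s] for s in shards))
```

With these numbers the hot shard gets a node to itself while the many small shards fill up the other two; a chat is still searched on one shard, but no single node has to carry every busy chat.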

  • @ryan-bo2xi
    @ryan-bo2xi 1 year ago +1

    Great job sir !!

  • @NghiaPham-o7x
    @NghiaPham-o7x 1 year ago +1

    Hi Jordan, great job, I've learned a lot from you!
    One thing I don't understand: you mention that the global index might be inefficient because we might need to send the document to many partitions. Why do we need to send the document to many partitions?
    What I'm thinking is that when a query comes in, the node handling it gathers the document lists from the indexes, merges them into a set of document ids, and then queries those documents from the partitions. Or am I missing something?

    • @jordanhasnolife5163
      @jordanhasnolife5163  1 year ago +2

      Hey! I'm saying that when we upload a document, we have to write to multiple partitions.
      This is because the document has many words in it!
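A minimal sketch of why a write fans out in a term-partitioned (global) index, assuming a whitespace tokenizer and a hypothetical fixed partition count: each term's posting list lives on the partition its term hashes to, so a document with several distinct words typically touches several partitions, while a single-term lookup hits only one.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical


def term_partition(term: str) -> int:
    # crc32 is stable across runs, unlike Python's built-in hash() for str.
    return zlib.crc32(term.encode("utf-8")) % NUM_PARTITIONS


class GlobalIndex:
    """Term-partitioned inverted index: partition p owns the posting
    lists of every term that hashes to p."""

    def __init__(self) -> None:
        self.partitions = [defaultdict(set) for _ in range(NUM_PARTITIONS)]

    def add(self, doc_id: int, text: str) -> set[int]:
        """Index a document; return the set of partitions that were written."""
        touched: set[int] = set()
        for term in set(text.lower().split()):
            p = term_partition(term)
            self.partitions[p][term].add(doc_id)  # a write on partition p
            touched.add(p)
        return touched

    def lookup(self, term: str) -> set[int]:
        # A single-term read goes to exactly one partition: no scatter/gather.
        return self.partitions[term_partition(term)].get(term, set())


if __name__ == "__main__":
    idx = GlobalIndex()
    touched = idx.add(47, "cherry apple banana grape")
    print("partitions written:", sorted(touched))
    print("docs for 'cherry':", idx.lookup("cherry"))
```

Which partitions get written depends on the hash values, but the number of writes grows with the number of distinct terms in the document, which is the cost described in the reply.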

    • @NghiaPham-o7x
      @NghiaPham-o7x 1 year ago +1

      @jordanhasnolife5163
      I see, so you're talking about the write path. Out of curiosity, I'd like to discuss the options more, since with a global index we have other approaches:
      1. Write the same document to multiple partitions -> as you said, this makes partitioning meaningless
      2. Save the document in one partition and update the global index on the other partitions with a distributed transaction (e.g. 2PC)
      Is there any flaw in the second approach, or can it be used in a real system? The second approach is slow on writes, so it might be bad for a write-heavy system like logging, but I think it would benefit a light-write, heavy-read system.
      What do you think?

    • @jordanhasnolife5163
      @jordanhasnolife5163  11 months ago +1

      @NghiaPham-o7x The second approach seems doable, but just consider what happens when we have to write the document to 10 partitions instead of just two, haha
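A toy two-phase commit sketch for the second approach in this thread, assuming in-memory participants that can vote "no". It makes the cost in the reply concrete: every partition the document's terms land on must vote "yes" in the prepare phase before anything commits, so one unhealthy partition out of 10 aborts the whole write. `Participant` and `two_phase_commit` are hypothetical names, and real 2PC also needs durable logs, timeouts, and coordinator recovery, all omitted here.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """One index partition taking part in the transaction (hypothetical)."""
    name: str
    healthy: bool = True
    staged: dict = field(default_factory=dict)     # txn_id -> pending writes
    committed: dict = field(default_factory=dict)  # term -> posting data

    def prepare(self, txn_id: str, writes: dict) -> bool:
        if not self.healthy:
            return False  # vote "no"
        self.staged[txn_id] = writes
        return True  # vote "yes"

    def commit(self, txn_id: str) -> None:
        self.committed.update(self.staged.pop(txn_id))

    def abort(self, txn_id: str) -> None:
        self.staged.pop(txn_id, None)


def two_phase_commit(txn_id: str, plan: list[tuple[Participant, dict]]) -> bool:
    """plan pairs each participant with the writes it must apply."""
    prepared: list[Participant] = []
    # Phase 1 (prepare): every participant must vote yes.
    for participant, writes in plan:
        if participant.prepare(txn_id, writes):
            prepared.append(participant)
        else:
            for p in prepared:  # roll back everyone who already staged writes
                p.abort(txn_id)
            return False
    # Phase 2 (commit): all voted yes, so apply everywhere.
    for participant in prepared:
        participant.commit(txn_id)
    return True


if __name__ == "__main__":
    parts = [Participant(f"p{i}") for i in range(10)]
    print(two_phase_commit("t1", [(p, {"cherry": [47]}) for p in parts]))
    parts[7].healthy = False
    print(two_phase_commit("t2", [(p, {"apple": [48]}) for p in parts]))
```

The first transaction commits on all 10 partitions; the second aborts everywhere because a single partition voted no, and the write latency is bounded by the slowest of the 10.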

  • @Spyrie
    @Spyrie 23 days ago +2

    Didn't know Kylo Ren does tech stuff

    • @jordanhasnolife5163
      @jordanhasnolife5163  20 days ago

      Surprisingly this is not the first Adam Driver comment I've gotten

  • @sahilguleria6976
    @sahilguleria6976 3 months ago +1

    Elasticsearch partitioning section: in partition 1 we have cherry: 47, 39, so this partition holds these documents in memory. Do the two documents 47 and 39 stay only in partition 1? If so, is this how we prevent duplication?
    Also, do all the other tokens in 47 and 39 reside in the same partition?

    • @jordanhasnolife5163
      @jordanhasnolife5163  3 months ago

      Yes, those documents just stay in that partition, as do the other tokens in 47 and 39. I'm confused about what you mean; the same document will always be hashed to the same partition, ideally.
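A minimal sketch of the document-partitioned (local) index described in this reply, assuming a hypothetical partition count, modulo-by-id placement, and a whitespace tokenizer: each document is hashed to exactly one partition, which stores the document itself plus an inverted index over only its own documents. A write touches one partition, but a term query has to scatter to every partition and gather the results.

```python
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical


class LocalPartition:
    def __init__(self) -> None:
        self.docs: dict[int, str] = {}  # the full document is stored locally
        self.postings: dict[str, set[int]] = defaultdict(set)

    def add(self, doc_id: int, text: str) -> None:
        self.docs[doc_id] = text
        for term in set(text.lower().split()):
            self.postings[term].add(doc_id)  # all of this doc's tokens stay here

    def search(self, term: str) -> dict[int, str]:
        return {d: self.docs[d] for d in self.postings.get(term, set())}


class LocalIndex:
    def __init__(self) -> None:
        self.partitions = [LocalPartition() for _ in range(NUM_PARTITIONS)]

    def partition_for(self, doc_id: int) -> int:
        return doc_id % NUM_PARTITIONS  # same id -> same partition, every time

    def add(self, doc_id: int, text: str) -> None:
        # One write to one partition, however many terms the document has.
        self.partitions[self.partition_for(doc_id)].add(doc_id, text)

    def search(self, term: str) -> dict[int, str]:
        # Scatter the query to every partition, then gather and merge.
        results: dict[int, str] = {}
        for partition in self.partitions:
            results.update(partition.search(term))
        return results


if __name__ == "__main__":
    idx = LocalIndex()
    idx.add(47, "cherry pie")
    idx.add(39, "cherry tart")
    idx.add(12, "apple pie")
    print(idx.search("cherry"))
```

Each document lives on exactly one partition, so there is no duplication across partitions; the price is the scatter/gather on reads.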

  • @msebrahim-007
    @msebrahim-007 4 months ago +3

    I'm not really understanding the difference between using a local index and a global index. It sounds like the reason not to use a global index is that a document may be duplicated to multiple partitions, so a local index is used instead, with a pointer to the document in memory.
    This is where my confusion lies. It doesn't sound like a local index addresses the issue of the document being duplicated onto multiple partitions; it just references the document locally (though it is potentially on multiple partitions) using a pointer.
    If in both cases the document will be duplicated to multiple partitions, why not just use a pointer in the global index case? That way no scatter/gather is required for a particular word.

    • @jordanhasnolife5163
      @jordanhasnolife5163  4 months ago +1

      To be clear, we're denormalizing the documents. It's not a pointer to the document in the local index case, you're actually storing the document itself there. Otherwise I'd agree with you.

    • @sahilguleria6976
      @sahilguleria6976 3 months ago +1

      @jordanhasnolife5163 Can you please explain what denormalizing the documents means here?

    • @jordanhasnolife5163
      @jordanhasnolife5163  3 months ago +1

      @sahilguleria6976 I'm not just holding a document id in the search index; I'm holding a decent amount of document data in it.
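A small sketch contrasting the two designs in this exchange, assuming a hypothetical `DOC_STORE` that stands in for a separate source-of-truth database. A normalized index holds only document ids, so every hit needs a second lookup (possibly on another machine); a denormalized index stores a copy of the fields needed to render results, so one hop is enough, at the cost of extra storage and copies that must be updated when the source changes.

```python
from collections import defaultdict

# Hypothetical source-of-truth store, e.g. a separate database.
DOC_STORE: dict[int, dict[str, str]] = {
    47: {"title": "Cherry pie", "author": "alice"},
    39: {"title": "Cherry tart", "author": "bob"},
}

# Normalized: the index holds only document ids.
normalized: dict[str, set[int]] = defaultdict(set)
# Denormalized: the index holds a copy of the document data itself.
denormalized: dict[str, dict[int, dict[str, str]]] = defaultdict(dict)

for doc_id, doc in DOC_STORE.items():
    for term in doc["title"].lower().split():
        normalized[term].add(doc_id)
        denormalized[term][doc_id] = dict(doc)  # a copy, not a pointer


def search_normalized(term: str) -> dict[int, dict[str, str]]:
    # Second hop: every hit is fetched from DOC_STORE.
    return {d: DOC_STORE[d] for d in normalized.get(term, set())}


def search_denormalized(term: str) -> dict[int, dict[str, str]]:
    # One hop: the index entry already contains what we return.
    return dict(denormalized.get(term, {}))


if __name__ == "__main__":
    print(search_denormalized("pie"))
```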

  • @GANJIMAN123
    @GANJIMAN123 4 months ago +3

    Not clear: does Elasticsearch use a local index or a global index?

  • @Summer-qs7rq
    @Summer-qs7rq 10 months ago +1

    Amazing video. Thanks for these informative videos.
    However, I have a question about Elasticsearch. Given that scatter/gather is difficult to avoid in Elasticsearch, how much data can it scale to? For example, if I want to build search for Twitter, where the data grows at a rapid pace, will it be okay to store all the tweets in Elasticsearch? Or, if we need a retention policy, what happens to the tweets that are no longer in Elasticsearch? Could you please help answer these questions?

    • @jordanhasnolife5163
      @jordanhasnolife5163  10 months ago +1

      I think that for Twitter, for example, they would index data by timestamp. That way, when you search for something in Elasticsearch, it'll mainly hit the indexes for the last couple of days of data, so there are fewer posts and fewer things to perform a "scatter/gather" over.
      You basically just have to be clever about how you shard your data.
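A minimal sketch of indexing by timestamp, assuming one index per day with hypothetical names like `tweets-2024-12-15` and a plain dict standing in for each index's term lookups. A recent-first search only fans out to the few daily indexes inside its window, and one possible policy (an assumption, not something stated in the reply) is to widen to older indexes only until enough hits are found.

```python
from datetime import date, timedelta


def daily_index_name(day: date, prefix: str = "tweets") -> str:
    return f"{prefix}-{day.isoformat()}"  # hypothetical naming scheme


def indexes_to_search(today: date, lookback_days: int = 3, prefix: str = "tweets") -> list[str]:
    """Newest-first list of the daily indexes covering the lookback window."""
    return [daily_index_name(today - timedelta(days=i), prefix) for i in range(lookback_days)]


def search_recent(term: str, today: date, index_data: dict[str, dict[str, list[int]]],
                  want: int = 10, max_days: int = 30, prefix: str = "tweets") -> list[int]:
    """Query daily indexes newest-first, stopping once `want` hits are found.

    index_data maps index name -> term -> tweet ids (a stand-in for real indexes).
    """
    hits: list[int] = []
    for name in indexes_to_search(today, max_days, prefix):
        hits.extend(index_data.get(name, {}).get(term, []))
        if len(hits) >= want:
            break  # older indexes are never touched
    return hits[:want]


if __name__ == "__main__":
    data = {
        "tweets-2024-12-15": {"cherry": [1, 2]},
        "tweets-2024-12-13": {"cherry": [3, 4, 5]},
    }
    print(indexes_to_search(date(2024, 12, 15)))
    print(search_recent("cherry", date(2024, 12, 15), data, want=3))
```

Popular terms are satisfied by the newest one or two indexes; only rare terms pay for scanning further back, and retention can be handled by simply dropping daily indexes older than the cutoff.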

    • @Summer-qs7rq
      @Summer-qs7rq 10 months ago

      @jordanhasnolife5163 In the timestamp case, are you suggesting we search for the keyword in the latest index and, if it's not found, then look in a different timestamp index? Wouldn't that be more time-consuming?

  • @theblobinc
    @theblobinc 2 months ago +1

    I too have no life; that's probably why I find myself here learning about Elasticsearch...

  • @nicotomomate
    @nicotomomate months ago +1

    Thanks man

  • @yrfvnihfcvhjikfjn
    @yrfvnihfcvhjikfjn 1 year ago +2

    Hi