SPLADE: the first search model to beat BM25

  • Published on 15 Sep 2024

Comments • 58

  • @jamesbriggs
    @jamesbriggs 1 year ago +13

    To install the Naver Labs SPLADE library you need `pip install git+https://github.com/naver/splade.git`

  • @JulianHarris
    @JulianHarris 1 year ago +16

    Came here curious about SPLADE, discovered a super understandable introduction to transformers and attention networks. Thank you!

    • @jamesbriggs
      @jamesbriggs 1 year ago +2

      I really wanted to get the point across about SPLADE but there was a lot of foundational stuff to cover from sparse vs. dense, transformers, etc - so I'm glad the extra info helped :)

    • @zazouille2264
      @zazouille2264 1 year ago

      Agreed. Great video. Nicely layered.
      Thank you OP

    • @magicofjafo
      @magicofjafo 1 year ago

      I agree!

  • @shamaldesilva9533
    @shamaldesilva9533 1 year ago +3

    dude you are a gold mine when it comes to these topics 😍😍

    • @jamesbriggs
      @jamesbriggs 1 year ago +1

      thanks man it's appreciated!

  • @ArnavJaitly
    @ArnavJaitly 1 year ago +1

    James, this is awesome and very relevant to my current project! Thank you for your efforts in putting this together and sharing it, much appreciated!

  • @ttharita
    @ttharita several months ago

    Super informative. Thank you so much!!!

  • @kevon217
    @kevon217 1 year ago

    Great tutorial as always. Your slide animations are next level!

  • @IamalwaysOK
    @IamalwaysOK 1 year ago +1

    Hey James, as usual, thanks a ton for your awesome videos! I've got a quick question for you. Have you ever thought about using a knowledge graph alongside SPLADE to expand terms? And is there any way we can embed that knowledge into sparse vectors using transformers? Curious to hear your thoughts on this!

  • @user-ih1dx6wn9c
    @user-ih1dx6wn9c 1 year ago +1

    Thank you! When using embeddings and asking the gpt-3.5 model a question like "write me some code that uses this and that", does the model automatically search the embeddings too in order to give the answer?

    • @jamesbriggs
      @jamesbriggs 1 year ago

      gpt 3 doesn't, you need to add a knowledge base to do this, like I do here th-cam.com/video/rrAChpbwygE/w-d-xo.html

  • @abhayr
    @abhayr 1 year ago

    Amazing explanation. Thx for sharing

  • @lutune
    @lutune 1 year ago +2

    Have you built any of these apps? Your content is so great, as you get into more media, some development of those apps could really help with putting this into a visual space

    • @jamesbriggs
      @jamesbriggs 1 year ago +2

      started building some demos and testing splade a couple months ago, will be sharing more soon - it's really cool though and I intend on making it a big part of my "go to toolkit" in the future

    • @lutune
      @lutune 1 year ago

      @jamesbriggs Your DC seems to be getting a lot of new people! I'll get some things updated on there today for ya

  • @MaheshJha-y3j
    @MaheshJha-y3j 1 year ago

    Hello James, the pinned method above for installing splade with pip is not working and gives an error like "error: subprocess-exited-with-error". Can you please let me know what the issue is, or what alternative we can use instead?

  • @salesgurupro
    @salesgurupro 1 year ago

    Amazing. Thanks for such a great explanation 😊

  • @snack711
    @snack711 1 year ago +1

    i am surprised how "orangutans" got split into tokens. i thought "orangutan" surely had to be a token itself.
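
    For context, BERT-style models use WordPiece, which greedily splits a word into the longest pieces found in a fixed vocabulary, so a word that is not in the vocabulary as a whole gets broken up. A minimal sketch of that matching rule, assuming a hypothetical toy vocabulary (`toy_vocab` is made up and is not the real BERT vocabulary; `wordpiece_tokenize` is an illustrative name, not a library function):

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into subword pieces."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, shrinking until a match is found.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a ## prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No piece matched at this position: the whole word becomes unknown.
            return [unk]
        tokens.append(piece)
        start = end
    return tokens


# Hypothetical vocabulary for illustration only (assumption, not BERT's).
toy_vocab = {"orang", "##utan", "##s", "the"}

print(wordpiece_tokenize("orangutans", toy_vocab))
```

    With `"orangutan"` added to the toy vocabulary, the same call would instead match the whole word first and only split off the plural ending.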

  • @danrosher6658
    @danrosher6658 1 year ago

    Great talk, thanks James ... Would an alternative to using cosine similarity to compare query and doc be to index the tokens and weights for the docs (from the SPLADE model outputs), convert the query to tokens (and weights) as well, then return the docs containing the query tokens where the doc weight > query token weight for each token? Would this work?
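
    A rough sketch of the indexing half of this idea, assuming the SPLADE outputs are already available as token-to-weight dictionaries. The scoring here is a plain dot product over shared tokens, not the weight-threshold rule proposed in the comment; `build_inverted_index`, `score`, and the toy weights are all made up for illustration:

```python
from collections import defaultdict


def build_inverted_index(doc_vectors):
    """Map each token to the list of (doc_id, weight) pairs that contain it."""
    index = defaultdict(list)
    for doc_id, vec in doc_vectors.items():
        for token, weight in vec.items():
            index[token].append((doc_id, weight))
    return index


def score(query_vec, index):
    """Rank docs by the dot product of query and doc weights over shared tokens."""
    scores = defaultdict(float)
    for token, q_weight in query_vec.items():
        # Only docs that actually contain the token are touched.
        for doc_id, d_weight in index.get(token, []):
            scores[doc_id] += q_weight * d_weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy sparse vectors (assumed values, not real SPLADE outputs).
docs = {
    "d1": {"orang": 1.2, "ape": 0.8},
    "d2": {"ape": 0.5, "jungle": 1.0},
}
query = {"ape": 1.0, "orang": 0.5}

index = build_inverted_index(docs)
ranked = score(query, index)
print(ranked)
```

    Because only the posting lists of the query's tokens are read, the work per query scales with the number of non-zero query tokens rather than with the vocabulary size.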

  • @Sky-ec9eu
    @Sky-ec9eu 1 year ago

    This is incredible. Thanks James!

  • @avidlearner8117
    @avidlearner8117 1 year ago

    Fantastic content! Especially since I'm building an app and need to find a proper solution for data retrieval....

  • @leventk.1611
    @leventk.1611 11 months ago

    13:02: low proximity = high semantic similarity. Not high proximity. :D

  • @alivecoding4995
    @alivecoding4995 1 year ago

    Which graphics library do you use for these Transformer illustrations? Are these pre-built assets?

  • @gorgolyt
    @gorgolyt 9 months ago

    Great video. But you should link to the SPLADE paper(s). Are you just talking about the original paper here?

  • @thedude9270
    @thedude9270 1 year ago

    Thanks for the tutorial! Is it possible that you could also share a colab or video explaining what would then be upserted as a Pinecone vector?

  • @johannamenges3095
    @johannamenges3095 1 year ago

    But is Faiss still a solid solution for a semantic search engine? I ask because I'm currently working on a search engine built with Faiss.

  • @aurkom
    @aurkom 10 months ago

    Really enjoyed this one.

  • @ylazerson
    @ylazerson 1 year ago

    very fascinating - thanks!

  • @AnonymousIguana
    @AnonymousIguana 9 months ago

    So, SPLADE vector generation is just as computationally intensive as dense vector generation? My understanding is that SPLADE requires real-time inference from a sophisticated model like BERT at query time. Isn't that very problematic?

    • @RatafakRatafak
      @RatafakRatafak 8 months ago

      Looks like it. Sentence-BERT is just as computationally intensive as SPLADE.

  • @SinanAkkoyun
    @SinanAkkoyun 1 year ago

    How does this compare to the new OpenAI embeddings?

  • @biaoliu9297
    @biaoliu9297 1 year ago

    Is there a multi-language version model?

  • @nhatpham4053
    @nhatpham4053 1 year ago

    awesome works

  • @abhinavkulkarni6390
    @abhinavkulkarni6390 1 year ago

    Hey James,
    Can you please compare SPLADE with ColBERTv2? Both are designed to alleviate the problems of dense passage retrievers.

    • @jamesbriggs
      @jamesbriggs 1 year ago

      I haven't read into the colbert models, I understood them to not be hugely scalable? I can look into it if they're of interest

  • @EkShunya
    @EkShunya 1 year ago

    what tool do you use to make the diagrams?

  • @kayalvizhi8174
    @kayalvizhi8174 2 months ago

    How have the results of SPLADE been? Has it been proven to be effective?

  • @jeffsteyn7174
    @jeffsteyn7174 1 year ago

    That's interesting. What does pinecone use, sparse or dense?

    • @jamesbriggs
      @jamesbriggs 1 year ago +2

      now it can use both, I'll talk about it in the coming days or you can refer to here github.com/pinecone-io/examples/blob/master/search/hybrid-search/medical-qa/pubmed-splade.ipynb - for an example

    • @sndrstpnv8419
      @sndrstpnv8419 9 months ago

      @jamesbriggs the code at pubmed-splade.ipynb has been deleted

    • @RatafakRatafak
      @RatafakRatafak 8 months ago

      Is it important? If you use cosine similarity for both dense and sparse embeddings, it should work in any case.
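
      A small sketch of what this reply suggests, assuming sparse vectors are stored as index-to-weight dictionaries and dense vectors as plain lists. The function names and values are illustrative only and do not reflect how Pinecone scores vectors internally:

```python
import math


def cosine_dense(a, b):
    """Cosine similarity between two equal-length lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cosine_sparse(a, b):
    """Cosine similarity between two {index: weight} dicts; missing entries are zero."""
    # Only indices present in both vectors contribute to the dot product.
    dot = sum(w * b[i] for i, w in a.items() if i in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# The same two vectors written in both representations (toy values).
sparse_q, sparse_d = {0: 1.0, 3: 2.0}, {0: 2.0, 2: 1.0}
dense_q, dense_d = [1.0, 0.0, 0.0, 2.0], [2.0, 0.0, 1.0, 0.0]

print(cosine_sparse(sparse_q, sparse_d), cosine_dense(dense_q, dense_d))
```

      The two functions return the same value for the same underlying vector; the sparse version simply skips the zero entries.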

  • @BuFu1O1
    @BuFu1O1 10 months ago

    vocabulary mismatch can be fixed with sub-embeddings

  • @ramsescoraspe
    @ramsescoraspe 1 year ago

    Multilingual??

    • @jamesbriggs
      @jamesbriggs 1 year ago +2

      I don't think there's a multilingual splade *yet*

    • @RubenAlvarezMtz
      @RubenAlvarezMtz 1 year ago

      My thoughts exactly

  • @klammer75
    @klammer75 1 year ago +1

    Keywords and page rank are dead! The information landscape is undergoing a seismic shift and everyone better put a helmet on!!!🤔🤪😉🤖

    • @jamesbriggs
      @jamesbriggs 1 year ago +1

      things are moving so fast rn

    • @klammer75
      @klammer75 1 year ago

      @jamesbriggs seems we’re getting closer and closer to the inflection point of the exponential… next stop, ludicrous speed!🤯🚀

  • @hoangphanhuy1992
    @hoangphanhuy1992 9 months ago

    I thought CLIP doesn't need fine-tuning, so why is the need to fine-tune listed as a con of dense retrieval, sir? @jamesbriggs