RAG But Better: Rerankers with Cohere AI

  • Published on 8 Jul 2024
  • Rerankers have been a common component of retrieval pipelines for many years. They add a final "reranking" step to a retrieval pipeline, such as the one in Retrieval Augmented Generation (RAG), that can dramatically improve its accuracy.
    In this video we'll learn about rerankers, how they compare to the more common embedding-only retrieval setup, and how we can create retrieval pipelines with reranking using Cohere AI's reranking model. We'll also be using the (more typical) OpenAI text-embedding-ada-002 model with the Pinecone Vector Database.
    📌 Code (08:32):
    github.com/pinecone-io/exampl...
    📚 Article:
    www.pinecone.io/learn/series/...
    🌲 Subscribe for Latest Articles and Videos:
    www.pinecone.io/newsletter-si...
    👋🏼 AI Consulting:
    aurelio.ai
    👾 Discord:
    / discord
    Twitter: / jamescalam
    LinkedIn: / jamescalam
    00:00 RAG and Rerankers
    01:25 Problems of Retrieval Only
    04:32 How Embedding Models Work
    06:34 How Rerankers Work
    08:20 Implementing Reranking in Python
    13:11 Testing Retrieval without Reranking
    15:21 Retrieval with Cohere Reranking
    21:54 Tips for Reranking
    #artificialintelligence #nlp #ai #openai
  • Science & Technology

Comments • 100

  • @slayermm1122
    @slayermm1122 2 months ago +6

    cohere just released rerank3 and it worked incredibly well with openai's embedding 3 model; thanks for your kind intro

  • @gitmaxd
    @gitmaxd 8 months ago +5

    Each video gets better! Thank you for your work!

  • @jantuitman
    @jantuitman 8 months ago +11

    I learned a lot from this, thank you. You say you plan a series, and you were talking about other topics for the series, but these other topics you mentioned were not about rerankers. I noted that this video treats rerankers as black boxes, so you could even expand the series. I for sure would be interested in: what are the most recent reranking models, how do rerankers work, is it feasible to make a reranker yourself or does this require, just like a transformer, that you scrape the entire language / internet? In other words, this video was very interesting, but now that I know about rerankers I have lots and lots of questions about rerankers.

  • @adityavd97
    @adityavd97 2 months ago

    Thanks man. You are improving my hobby projects in real time.

  • @justinwlin
    @justinwlin 8 months ago +7

    You’ve got top notch editing + technical explanations and none of that is easy. The amt of work to create a 20 min video, and be cohesive on such a topic is amazing. Thanks! 🔥 all ur videos are so helpful and just interesting to watch and learn

    • @jamesbriggs
      @jamesbriggs 8 months ago

      that's awesome to hear, thanks :)

  • @Cdaprod
    @Cdaprod 8 months ago

    My god, thank you 🙏 as someone that only rebuilds the wheel, your content is very much appreciated.

    • @jamesbriggs
      @jamesbriggs 8 months ago

      Happy to hear it!

  • @Shaunmcdonogh-shaunsurfing
    @Shaunmcdonogh-shaunsurfing 8 months ago

    Thank you for making this. Fascinating.

  • @samwilletts9390
    @samwilletts9390 8 months ago +1

    Great video, looking forward to more on this!

  • @narayangopalmaharjan
    @narayangopalmaharjan 8 months ago +1

    Thank you for this video. I've been stuck in the RAG realm with llama index and not satisfied. I had thought of something similar to reranking, but done manually; I will try cohere today instead

  • @Jandodev
    @Jandodev 8 months ago

    I've been doing this with transformers. I think there's a lot to think about with doing this efficiently, but it does get the best results!

  • @real-ethan
    @real-ethan 5 months ago +2

    My approach is letting the LLM summarize the user's input first, the prompt could be written as: "The summary of the user's request to semantically search relevant documents in English." The output of the LLM's summarization can then be used to query the vector database after embedding. This approach may potentially increase the accuracy of retrieval.

    • @edwardmitchell6842
      @edwardmitchell6842 4 months ago

      How would an LLM know how to optimize for semantic search? I would expect an expansion on the ideas would be better.
      I need to finish my LLM Ops to get the answer.

    • @real-ethan
      @real-ethan 4 months ago

      @edwardmitchell6842
      The typical process we currently observe for Retrieval-Augmented Generation (RAG) is as follows:
      1. After segmenting the document, embedding is directly performed, and the results are stored in the vector database.
      2. For user inputs or queries, embedding is directly carried out, and a similarity search is conducted within the vector database.
      It is a peculiar solution to rely solely on mathematical similarity matching for a single question and multiple factual paragraphs. As humans, when addressing a customer's request or solving a problem, we are unlikely to simply copy and paste the original question into Google's search box and hit enter. Instead, we tend to abstract and summarize the question and requirements before conducting a Google search.
      To achieve more precise content retrieval, my personal approach involves:
      1. After segmenting the document, each paragraph undergoes summarization using a Language Model (LLM). This summary is then embedded and treated as a comprehensive index for the original document. Together with the original document paragraphs, this summary index is stored in the vector database. During subsequent retrieval and matching processes, the focus is solely on this summarization index, streamlining the matching operation.
      2. For user input/questions, a summary is generated using the LLM, embedded, and then queried against the vector database to match the summary index of each document paragraph. After retrieving multiple documents, the results are then evaluated and selected by the LLM, ultimately producing the answer.
      The current RAG demos and new services, such as the rerank services provided by the community, are all efforts focused on QUERYING. Perhaps we should explore more in terms of how to store, organize, and index documents.
      An additional hidden benefit of this approach is that, assuming our documents are in English, if a user inputs a Chinese question, direct embedding would inevitably fail to retrieve any content. Through summarization and subsequent embedding, we can translate the original input into English before processing.😄
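The summary-index idea described above can be sketched roughly as follows. This is a toy illustration, not the commenter's actual code: `summarize` is a hypothetical stand-in for an LLM summarization call (here it just keeps the first sentence), and `embed` is a bag-of-words stand-in for a real embedding model. Only the summary is embedded and searched, while the original paragraph is what gets returned.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding' (assumption: stands in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def summarize(text):
    """Hypothetical stand-in for an LLM summarization call: keeps the first sentence."""
    return text.split(".")[0]


def build_index(paragraphs):
    # The summary embedding is the search key; the original paragraph is the payload.
    return [(embed(summarize(p)), p) for p in paragraphs]


def search(index, question, top_k=1):
    # The question is summarized and embedded the same way before matching.
    q = embed(summarize(question))
    ranked = sorted(index, key=lambda item: cosine(q, item[0]), reverse=True)
    return [paragraph for _, paragraph in ranked[:top_k]]


paragraphs = [
    "Rerankers score query and document together. They are slower than embeddings.",
    "Pinecone stores vectors for similarity search. It scales to many records.",
]
index = build_index(paragraphs)
results = search(index, "how do rerankers score a document")
print(results)
```

In a real pipeline the retrieved paragraphs would then be passed to the LLM (or a reranker) for the final selection step the comment describes.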

  • @matteomarjanovic
    @matteomarjanovic 8 months ago +6

    Hi, thank you so much for that content! Do you think that parameters like document chunks size and overlapping are important for RAG accuracy? Should we fine-tune them in some way?

  • @alejandrovelez2083
    @alejandrovelez2083 8 months ago

    Great content man!!! I have learned so much from you

  • @timwarren4332
    @timwarren4332 8 months ago

    Great stuff! Thank you!

  • @TheOfficialWoover
    @TheOfficialWoover 8 months ago

    That's GREAT! thank you!

  • @jellederijke
    @jellederijke 8 months ago +3

    Top notch material, James. Much appreciated 🎉🎉 Really curious to see what kind of difference this makes in my projects. Thanks!

    • @anandteerthrparvatikar5359
      @anandteerthrparvatikar5359 8 months ago

      Totally

    • @jamesbriggs
      @jamesbriggs 8 months ago +4

      Glad it helps - may want to try retrieval + reranker system for improved name recall 😅

    • @anandteerthrparvatikar5359
      @anandteerthrparvatikar5359 8 months ago

      You are literally educating corporate people. Waiting for next session, Thanks for the efforts

    • @jellederijke
      @jellederijke 8 months ago

      @jamesbriggs hah, really sorry James 😂 I have just reranked the names in my RAG

  • @bonadio60
    @bonadio60 8 months ago

    Very good content!! I will definitely try it. Thanks

  • @andikunar7183
    @andikunar7183 8 months ago

    Great content, thanks a lot!

  • @Shaunmcdonogh-shaunsurfing
    @Shaunmcdonogh-shaunsurfing 8 months ago +2

    You mentioned some better approaches than reranking. Any hints as to what those might be? (Curious to know if it involves fine-tuning the LLM with the data too.)

  • @SanjeevKumar-hj1fb
    @SanjeevKumar-hj1fb 7 months ago

    Thanks for the great videos. How does embedding work on numeric fields? Shall we use embeddings for non text fields?

  • @tushaar9027
    @tushaar9027 8 months ago

    Hi James, great video, learned a lot. I was actually using a multi-query retriever in my approach and was seeing slow inference speed because of overstuffing, as you mentioned. Can you give more info on reranking models? Are there any free ones we can use in our projects?

  • @vinsentparamanantham5756
    @vinsentparamanantham5756 8 months ago

    Hi James, can you give us an example with openapi? Since we have compliance issues, we need to run against locally hosted llama models and also a locally hosted vector database.
    thank you

  • @pythontok4192
    @pythontok4192 8 months ago

    Thank you for this James. I found that when I return more things in the context, the LLM also tries to make up answers that are an amalgamation of several sources' context. Any ways around this, from your experience?

  • @RedCloudServices
    @RedCloudServices 8 months ago

    James, thanks as always. I hope I am asking these questions with clarity. (1) You used a different encoder model, ada 002, with the Cohere LLM as the vector response model? (2) Huggingface has rankings for encoding models and rankings for LLMs, but are there rankings for encoding:response LLM pairs?

  • @ackiamm
    @ackiamm 8 months ago

    thanks sir

  • @harisjaved1379
    @harisjaved1379 6 months ago

    Have been doing this for a few years now. Good video, but you should cover bi-encoders vs cross-encoders, as this is one of the best reranking techniques, and also talk a bit about FAISS.

  • @shaheerzaman620
    @shaheerzaman620 8 months ago

    awesome stuff!

    • @jamesbriggs
      @jamesbriggs 8 months ago

      thanks Shaheer :)

  • @timkoehler86
    @timkoehler86 5 months ago

    Great video!
    Btw what software are you actually using to show/explain the concept? I really like the look of it.

  • @ganj0rm0n
    @ganj0rm0n 8 months ago

    Awesome! I wonder if there is a way to use a re-ranker with low code tools like flowise.

  • @LoVeRSaMa
    @LoVeRSaMa 7 months ago +1

    Any chance you can show examples with OpenSource re ranking like: JinaAI-v2-base-en
    for example?

  • @Ishant875
    @Ishant875 months ago +1

    Can anyone explain this? Why are we using a reranker to rank? Isn't that the work of the retriever (to rank on the basis of cosine similarity or something else, and return the relevant chunks)?

  • @frazuppi4897
    @frazuppi4897 8 months ago +10

    any benchmark? otherwise it's kind of very empirical and only seems like a video sponsored by Cohere

    • @matteomarjanovic
      @matteomarjanovic 8 months ago +1

      Good point. Do you know about any possibly useful metric/benchmark?

    • @jamesbriggs
      @jamesbriggs 8 months ago +2

      no sponsor from Cohere, I'm sharing what I do in production to make search better

    • @jamesbriggs
      @jamesbriggs 8 months ago +3

      there are many benchmarks comparing bi-encoders (embedding models) to crossencoders (rerankers), but I'm not aware of any for Cohere's model compared directly to ada-002. Nonetheless you can read here txt.cohere.com/rerank/ (it only shows comparison with elastic)

    • @frazuppi4897
      @frazuppi4897 8 months ago +1

      @jamesbriggs thanks a lot for the reply, so in production you push things after an empirical evaluation? Would it be possible to have a link to some benchmarks? Thanks a lot again

    • @jamesbriggs
      @jamesbriggs 8 months ago +4

      @frazuppi4897 yeah it's a lot of fast moving projects for me at the moment, so we do a lot of empirical assessments, for reranking you can see benchmarks for some of the best performing embedding + reranker models here huggingface.co/BAAI/bge-reranker-large#baai-embedding
      They unfortunately don't compare reranker to encoder directly beyond a few statements on rerankers being more accurate - they do explain in better language than I the reason for this though

  • @elrecreoadan878
    @elrecreoadan878 8 months ago

    Hi, what approach would you suggest for a hotel or restaurant customer service bot? Maybe botpress + plugin like vectora + chatgpt?

  • @GiridharReddy-hb5nv
    @GiridharReddy-hb5nv 8 months ago

    is there any open source way to do the reranking ?
    The content was great!

  • @RabeeQasem
    @RabeeQasem 8 months ago

    can you do a tut on how to use falcon to chat with your data and use different data loaders (txt, pdf, json)?
    love your content

  • @hughesadam87
    @hughesadam87 8 months ago

    Have you done any videos on ETL or suggestions for getting data into RAG systems?
    I'd really love to start with an open-source project that is more opinionated and ready-made for RAG than just langchain. LLMware looks promising. Do you have any suggestions? Some framework that would have opinionated, deployable RAG systems that solve hard problems like: auth, reranking, doc ingestion/scrubbing etc...
    Something I can just fire up in k8s and start fiddling w/ ? Does this exist to your knowledge?
    Thanks for the great video

  • @tiagoc9754
    @tiagoc9754 5 months ago

    How much does the vector store affect the accuracy of RAG responses?

  • @da-bb2up
    @da-bb2up 4 months ago

    Great video, James! :) What do you think is better if you compare optimization strategies? 1. Fine-tuning the embedding model on your domain-specific language, 2. using a hybrid search, which combines dense and sparse retrieval, 3. reranking? Or 4. could you maybe combine the three optimization strategies? Thank you in advance for your answer. :) And another question: is reranking not pretty much the same as hybrid search? (Because it also uses two search strategies, but in a slightly different way: first it searches for the data chunk candidates and then it searches within those candidates.)
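For readers wondering about the same distinction: hybrid search fuses the scores of two first-stage retrievers over the same documents, whereas reranking re-scores an already retrieved candidate list with a second model. Below is a minimal sketch of the fusion step only, assuming a simple convex combination with weight `alpha` and min-max normalization. Both choices are illustrative assumptions, and the score dictionaries are made-up inputs.

```python
def min_max(scores):
    """Rescale a {doc_id: score} mapping to the [0, 1] range."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 0.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_scores(dense, sparse, alpha=0.5):
    """Blend dense and sparse scores: alpha * dense + (1 - alpha) * sparse."""
    dense_n, sparse_n = min_max(dense), min_max(sparse)
    doc_ids = set(dense) | set(sparse)
    return {
        doc_id: alpha * dense_n.get(doc_id, 0.0) + (1 - alpha) * sparse_n.get(doc_id, 0.0)
        for doc_id in doc_ids
    }


# Made-up scores: cosine similarities (dense) and keyword-match scores (sparse).
dense = {"a": 0.9, "b": 0.7, "c": 0.1}
sparse = {"a": 1.0, "b": 8.0, "c": 2.0}

fused = hybrid_scores(dense, sparse, alpha=0.5)
ranking = sorted(fused, key=fused.get, reverse=True)
print(ranking)
```

Nothing stops the strategies from being stacked: the fused ranking can itself be the candidate list that a reranker re-scores.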

  • @nishanthk7048
    @nishanthk7048 8 months ago

    Hey, can anyone answer my question?
    When reranking, it calculates the relevance score again. When calculating that score, does Cohere run inference on an LLM or use an algorithm?

    • @bastabey2652
      @bastabey2652 months ago

      I believe Cohere ReRank doesn't use a separate LLM model.. it relies on its algorithm/model

  • @enkiube
    @enkiube 3 months ago

    Hey James, great great series on Retrieval Augmented Generation... One question, having looked at the notebook and the video, why don't we avoid vector embedding and have cohere's rerank to do the job for us? I did test the idea over a group of pdf documents and it seems like the performance was significantly better particularly considering that we pass the entire text altogether to cohere API instead of breaking them down into chunks.
    I understand there can be cost implications involved, but considering the free cohere pricing, isn't that a better approach? After all, any reranking you perform on top of results from pinecone is somewhat at the mercy of how well you retrieve the original vectors.
    Would appreciate your thoughts.

  • @matthewpublikum3114
    @matthewpublikum3114 8 months ago

    Are the reranking models specifically trained for the task, or are they decoder or encoder portion of an LLM?

    • @jamesbriggs
      @jamesbriggs 8 months ago

      yes they're fine-tuned specifically for calculating similarity scores - you take a pretrained transformer model, add 1-2 linear layers on to the top of the output logits of the model, and fine-tune on a dataset that would contain records like [sentence A, sentence B, similarity score]

  • @victorhenriquecollasanta4740
    @victorhenriquecollasanta4740 5 months ago

    amazing! can you make more entrepreneurial videos, maybe on how to apply this knowledge to build a business

  • @sanchaythalnerkar9736
    @sanchaythalnerkar9736 8 months ago +2

    Can we use llama index to improve the efficiency?

    • @jamesbriggs
      @jamesbriggs 8 months ago +5

      llama-index have a lot of great retrieval tooling - I haven't been able to dive too deeply into it yet but from what I've seen they (1) do support this type of retrieval (ie with reranking), and (2) can likely improve accuracy, but I don't think you can get much faster than what we do here

    • @sanchaythalnerkar9736
      @sanchaythalnerkar9736 8 months ago

      @jamesbriggs Why is the embedding taking so long?

  • @edvinbeqari7551
    @edvinbeqari7551 8 months ago

    I didn't understand how you get a similarity score from one transformer. What's the hint?

  • @dato007
    @dato007 7 months ago

    I don't understand how re-ranking is adding anything. You're giving it the same query again, and the documents have already been matched with a vector similarity. What additional information is it using to improve the ranking? Thx!

  • @GeigenAkademie
    @GeigenAkademie 5 months ago

    Nowadays the context length can be ~32k, so why rerank if I could put all possible matches into the final answering step/LLM? The answering step itself then does a kind of reranking of the contexts

  • @torstenkarstadt9785
    @torstenkarstadt9785 7 months ago

    Does Canopy support this reranking approach?

    • @jamesbriggs
      @jamesbriggs 7 months ago

      I know it's on the roadmap, but it's not in there yet

  • @frazuppi4897
    @frazuppi4897 8 months ago

    well the reranker will prob use a [CLS] token, so still one vector, so I don't get why you say that in the normal embedding we lose info but in the reranker we don't - weird. If you are sending the two documents together, each token will be able to attend to the other; this could mean the info is more accurate

    • @jamesbriggs
      @jamesbriggs 8 months ago +2

      you can read about bi-encoders (embedding models) and crossencoders (reranker), there is information compression with the bi-encoder approach as you are encoding generic embeddings, with the crossencoder you are feeding the query and original text, the transformer must then decide, on that specific query, how relevant the document is
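To make the structural difference in this reply concrete, here is a toy sketch. Neither function is a real model: `embed` is a bag-of-words stand-in for a bi-encoder, and `cross_encoder_score` is a made-up overlap scorer standing in for a cross-encoder. The point is only the shape of the computation. The bi-encoder compresses each document into one query-independent vector ahead of time, while the cross-encoder sees the query and the document together for every pair.

```python
import math
from collections import Counter


def tokens(text):
    return text.lower().split()


def embed(text):
    """Toy bi-encoder: one vector per text, computed without seeing the query."""
    return Counter(tokens(text))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cross_encoder_score(query, doc):
    """Toy cross-encoder: scores the (query, doc) pair jointly.

    Here: the fraction of query terms found in the document.
    """
    q, d = set(tokens(query)), set(tokens(doc))
    return len(q & d) / len(q)


docs = [
    "rerank models read the query and the document together and output one relevance score",
    "relevance score",
]
query = "rerank relevance score"

# Bi-encoder: document vectors can be precomputed and stored in a vector database.
doc_vecs = [embed(d) for d in docs]
q_vec = embed(query)
bi_ranking = sorted(range(len(docs)), key=lambda i: cosine(q_vec, doc_vecs[i]), reverse=True)

# Cross-encoder: must be run once per (query, document) pair at query time.
cross_ranking = sorted(range(len(docs)), key=lambda i: cross_encoder_score(query, docs[i]), reverse=True)

print(bi_ranking, cross_ranking)
```

In this contrived example the long document's single vector is diluted by its other content, so the two approaches order the documents differently. Real models are far more capable than these stand-ins; the sketch only mirrors the compression argument made above.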

    • @frazuppi4897
      @frazuppi4897 8 months ago

      @jamesbriggs copy that - thanks a lot!

  • @hetnon
    @hetnon 2 months ago

    I don't get the part that you feed both documents in the same transformer. If your transformer output is only 1 array, what are you comparing to? You have only 1 array to compare to... nothing? What did I miss?

  • @bastabey2652
    @bastabey2652 months ago

    how does the ReRanker know it needs to return 3 documents with relevant information to the user's query?

    • @jamesbriggs
      @jamesbriggs months ago

      we set the `top_n` parameter to `3`; logically the reranker scores every document, then we take the top 3 scoring docs
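The "score everything, keep the top 3" logic in this reply can be sketched in a few lines. This is a generic illustration, not Cohere's implementation: `rerank` and `overlap_score` are hypothetical names, and the scorer is a toy word-overlap function standing in for a real reranking model.

```python
def rerank(query, docs, score_fn, top_n=3):
    """Score every document against the query, then keep the top_n highest scoring."""
    scored = [(score_fn(query, doc), idx) for idx, doc in enumerate(docs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(idx, score) for score, idx in scored[:top_n]]


def overlap_score(query, doc):
    """Toy relevance score: fraction of query words that appear in the document."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q)


docs = [
    "cats sleep a lot",
    "rerankers reorder retrieved documents",
    "embedding models encode documents",
    "how do rerankers reorder documents by relevance",
    "pinecone is a vector database",
]

top = rerank("how do rerankers reorder documents", docs, overlap_score, top_n=3)
for idx, score in top:
    print(idx, round(score, 2), docs[idx])
```

Every one of the five documents is scored, but only three come back, which is why the number of results is fixed by `top_n` rather than decided by the model.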

  • @thedoctor5478
    @thedoctor5478 8 months ago

    I believe we have better than openai embeddings now. The leaderboard says so anyway. Also, backoff library is better for retries.
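On the retry point: the general pattern is to retry a failing API call with exponentially growing delays. The sketch below is a standard-library-only illustration of that pattern, not the code from the video and not the `backoff` library itself; `retry_with_backoff` and `flaky_embed` are hypothetical names, and the failing call is simulated.

```python
import functools
import random
import time


def retry_with_backoff(max_tries=5, base_delay=1.0, exceptions=(Exception,), sleep=time.sleep):
    """Retry the wrapped function with exponential backoff and jitter."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_tries):
                try:
                    return fn(*args, **kwargs)
                except exceptions:
                    if attempt == max_tries - 1:
                        raise  # out of attempts: surface the last error
                    # Delay doubles each attempt, scaled by a random factor in [0.5, 1.0).
                    sleep(base_delay * (2 ** attempt) * (0.5 + random.random() / 2))
        return wrapper
    return decorator


calls = {"count": 0}


@retry_with_backoff(max_tries=4, base_delay=0.0, exceptions=(ConnectionError,))
def flaky_embed(text):
    """Simulated embedding call that fails twice before succeeding."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated rate limit")
    return [0.0, 1.0]


result = flaky_embed("hello")
print(result, calls["count"])
```

`base_delay=0.0` is used here only so the example runs instantly; a real client would use a non-zero delay.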

    • @heywrandom8924
      @heywrandom8924 8 months ago

      I did not watch the video but I am interested in knowing what leaderboard you are referring to

    • @thedoctor5478
      @thedoctor5478 8 months ago

      @heywrandom8924 on huggingface /spaces/mteb/leaderboard

    • @jamesbriggs
      @jamesbriggs 8 months ago

      @heywrandom8924 probably this one huggingface.co/spaces/mteb/leaderboard - I'll be talking about other embedding models in an upcoming video, but yes it's true, ada-002 is far from the best performing

    • @heywrandom8924
      @heywrandom8924 8 months ago

      @jamesbriggs thank you (:.
      I didn't watch that specific video as I am not sure what the keywords in the title mean and I am not sure it's relevant to me. I just checked it out a bit and the video looks cool (:.

  • @nicholasliu-sontag1585
    @nicholasliu-sontag1585 8 months ago

    You describe the re-ranker transformer as more accurate because it doesn't encode the documents into vectors - but don't all transformers work off vectors to begin with? Isn't it still working with the same vectors that are used to calculate the similarity score?

    • @jamesbriggs
      @jamesbriggs 8 months ago

      I probably could have phrased better, there are two parts:
      1. Embedding models encode the full sequence into a single vector, transformers work with vectors but they contain a single vector for each token - but these will be compressed through a single layer before producing the similarity score, so there is still compression into a single vector happening, but...
      2. Reranker models have the full context, ie they see both the query and the document that they must compute similarity for. An embedding model must produce a single vector embedding for every possible query
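The compression in point 1 can be illustrated with mean pooling, which is one common way to collapse per-token vectors into a single sequence vector. Mean pooling is an assumption here, since the reply does not say which pooling a given model uses, and the token vectors are made-up numbers.

```python
def mean_pool(token_vectors):
    """Average per-token vectors into one fixed-size vector for the whole sequence."""
    dim = len(token_vectors[0])
    n = len(token_vectors)
    return [sum(vec[i] for vec in token_vectors) / n for i in range(dim)]


# Three made-up 2-dimensional token vectors for a three-token sequence.
token_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
sentence_vector = mean_pool(token_vectors)
print(sentence_vector)
```

However many tokens go in, a single vector of the same size comes out, and that one vector has to serve every query that might later be compared against it.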

    • @nicholasliu-sontag1585
      @nicholasliu-sontag1585 8 months ago

      @jamesbriggs makes sense. thanks for explaining!

    • @masssurfski
      @masssurfski 6 months ago

      My first impression was that the need to rerank means that the rank was not optimal to begin with. Your explanation above helped me better understand this. Ultimately this capability should be integrated and not require a different tool.

  • @da-bb2up
    @da-bb2up 4 months ago

    th-cam.com/video/Uh9bYiVrW_s/w-d-xo.html Isn't the similarity score at this point also calculated at the end with cosine similarity, just like without the retriever? Or how exactly is the similarity score calculated?

  • @HazemAzim
    @HazemAzim 8 months ago

    Great . Any other open source reranking transformer on Hugging Face ? other than cohere which is closed source ?

    • @haristan1960
      @haristan1960 8 months ago

      Sentence Transformers has cross-encoder models on Hugging Face; you can try them, but they're quite old

    • @HazemAzim
      @HazemAzim 8 months ago

      @haristan1960 Yes true .. I found many cross encoders on SBERT and hugging face . Thanks

    • @jamesbriggs
      @jamesbriggs 8 months ago

      check out bge-reranker huggingface.co/BAAI/bge-reranker-large/tree/main

  • @Data_scientist_t3rmi
    @Data_scientist_t3rmi 8 months ago

    Do you have any videos about scalability? I mean, for 1000 pdfs it could be a good thing, but for 100000 documents the pre-processing time is difficult. Thanks again for the video; you were the person who most introduced me to Transformers

  • @bastabey2652
    @bastabey2652 months ago

    the moment showing how LLM reads the scraped concatenated text is impressive
    th-cam.com/video/Uh9bYiVrW_s/w-d-xo.html

    • @jamesbriggs
      @jamesbriggs months ago

      it's pretty wild

  • @michaeldausmann6066
    @michaeldausmann6066 5 months ago +1

    cool, ok. but... how does it work? what does it do? you just give it a query and it reranks for you.......wtf what is the magic sauce I want to understand the technique.

    • @nirbhaykumar4906
      @nirbhaykumar4906 months ago

      Yes some more detail into working of reranker would be useful.

  • @vanerk_
    @vanerk_ 6 months ago

    you have explained the high-level idea of a reranker, whereas an explanation of the reranker architecture was expected, dislike.

    • @jamesbriggs
      @jamesbriggs 6 months ago

      see here th-cam.com/video/WS1uVMGhlWQ/w-d-xo.html

  • @narutocole
    @narutocole 8 months ago +3

    Do you have any thoughts or recommendations for open-source re-rankers? I've used 'cross-encoder/mmarco-mMiniLMv2-L12-H384-v1' for re-ranking. But I'm curious as to whether anyone has been using some of the recent LLMs and modifying them to work for re-ranking, similar to how SGPT modified EleutherAI/gpt-neo-125M

    • @jamesbriggs
      @jamesbriggs 8 months ago

      Hey Jordan! I haven't tested the open-source cross-encoders/rerankers for a long time - so I'm not sure - they generally get less attention than the encoder models but I'm sure there must be some good rerankers out there

    • @lachlanholland7157
      @lachlanholland7157 8 months ago

      I’m looking into using bge reranker large, however haven’t gotten it to work yet.