Diffbot
Trying to make LLMs less stubborn in RAG (DSPy optimizer tested with knowledge graphs)
RAG (retrieval-augmented generation) has been recognized as a method to reduce hallucinations in LLMs, but is it really as reliable as many of us think it is?
The timely research "How faithful are RAG models? Quantifying the tug-of-war between RAG and LLMs' internal prior" resonated with our struggles when LLMs don't always follow external knowledge in RAG systems, even when ground truth (from a knowledge graph) is provided.
One interesting takeaway is that, like humans, different language models have varying degrees of "stubbornness." Some models are more likely to fall back on their internal knowledge when external information contrasts with what they have been trained on. Essentially, these models struggle to adjust to new values or patterns that differ significantly from their existing "belief system."
We also found that a knowledge graph-based method, the entity linker, can enhance the correctness of answers by filtering out erroneous information that doesn't match the ground truth in knowledge graphs. Check out how the entity linker solved the funny hallucination where "PayPal" was recognized as a co-founder of SpaceX (at 4:21 in the video), with the RAG pipeline using an entity linker outperforming the one without it.
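The entity-type check behind that fix can be sketched in a few lines of Python. This is a hypothetical illustration (the names `kg_entities` and `validate_answer` are ours, not from the notebook): keep only those answer candidates whose linked entity type matches what the question asks for.

```python
# Hypothetical sketch of entity-type validation against a knowledge graph.
# kg_entities is a hand-coded stand-in for the output of an entity linker.
kg_entities = {
    "Elon Musk": "Person",
    "SpaceX": "Organization",
    "PayPal": "Organization",
}

def validate_answer(candidates, expected_type):
    """Keep only candidates that the KG links to the expected entity type."""
    return [c for c in candidates if kg_entities.get(c) == expected_type]

# The LLM hallucinated "PayPal" into a Person-typed slot (co-founder):
llm_answer = ["Elon Musk", "PayPal"]
print(validate_answer(llm_answer, "Person"))  # -> ['Elon Musk']
```

A real implementation would resolve mentions to typed KG entities with an entity linker (such as Diffbot's Natural Language API) instead of a hand-coded dictionary.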
0:00 hallucination seen in our RAG system
0:36 how faithful are RAG models? (research)
2:02 recap of DSPy
2:27 entity linker in knowledge graph
3:56 integrating entity linker in LLM pipeline
4:21 different LLM-based outcomes with entity linker
5:10 recap of DSPy w/ KG pipeline
5:41 setting metrics for DSPy optimizer
7:00 test effectiveness of optimized DSPy pipeline (gpt4)
8:04 testing the other 2 optimized programs
9:25 back to tweaking prompts
10:49 some thoughts on DSPy
Get your free Diffbot token at: app.diffbot.com/get-started
Sample Entity Linker: github.com/leannchen86/entity-linker-diffbot-nl-api/blob/main/entity_linker_dspy_rag.ipynb
Quickstart guide with Diffbot's Natural Language API: github.com/leannchen86/Diffbot-natural-language-api-demo/blob/main/diffbot_natural_language_api.ipynb
DiffbotGraphTransformer plugin on LangChain: python.langchain.com/v0.1/docs/integrations/graphs/diffbot/
code for this video: TBD
Views: 1,296

Videos

Things you should check before using Llama3 with DSPy.
3.1K views · 28 days ago
No, comparing individually the performance of different language models and embedding models is not enough. To further investigate the hallucination issues we saw in our DSPy RAG pipeline in our last video, we tested pairing Llama3:70B with both nomic embedding (a local, open-source embedding model) and ada-002 (one of OpenAI's embeddings), while using gpt-3.5 with ada-002 as the baseline for our c...
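The pairing idea above amounts to a small grid search over (LLM, embedder) combinations rather than ranking each model in isolation. A minimal sketch of that harness, with the evaluation stubbed out (all names here are placeholders, not the video's code):

```python
# Hypothetical harness: evaluate every (LLM, embedder) pairing, since a
# model that wins in isolation may lose when paired with a given embedder.
from itertools import product

llms = ["llama3-70b", "gpt-3.5-turbo"]
embedders = ["nomic-embed-text", "text-embedding-ada-002"]

def run_rag_eval(llm, embedder):
    # Stand-in for a real RAG evaluation run over a question set;
    # accuracy is left as None in this sketch.
    return {"llm": llm, "embedder": embedder, "accuracy": None}

results = [run_rag_eval(l, e) for l, e in product(llms, embedders)]
print(len(results))  # 4 pairings, not 2 isolated comparisons
```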
DSPy with Knowledge Graphs Tested (non-canned examples)
6K views · 1 month ago
The DSPy (Declarative Self-improving Language Programs in Python) framework has excited the developer community with its ability to automatically optimize and enhance language model pipelines, which may reduce the need to manually fine-tune prompt templates. We designed a custom DSPy pipeline integrating with knowledge graphs. The reason? One of the main strengths of knowledge graphs is their a...
Diffbot is making ____ intelligence possible.
376 views · 1 month ago
What's beyond just artificial intelligence? Hint: The answer is at the very end of the video.
Is Tree-based RAG Struggling? Not with Knowledge Graphs!
39K views · 2 months ago
Long-Context models such as Google Gemini Pro 1.5 or Large World Model are probably changing the way we think about RAG (retrieval-augmented generation). Some are starting to explore the potential application of “Long-Context RAG”. One example is RAPTOR (Recursive Abstractive Processing for Tree-Organized Retrieval): by clustering and summarizing documents, this method lets language models gras...
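RAPTOR's tree construction can be caricatured in a few lines: cluster the chunks, summarize each cluster, and recurse until a single root summary remains. The sketch below stubs out both steps (real RAPTOR clusters on embeddings with Gaussian mixtures and summarizes with an LLM), so treat it as the shape of the algorithm, not an implementation:

```python
# Toy sketch of RAPTOR's core loop. Clustering and summarization are
# stand-ins: fixed-size grouping instead of embedding-based clustering,
# and string concatenation instead of LLM-generated summaries.
def cluster(chunks, size=2):
    return [chunks[i:i + size] for i in range(0, len(chunks), size)]

def summarize(group):
    return " / ".join(group)

def build_tree(chunks):
    levels = [chunks]
    while len(levels[-1]) > 1:
        levels.append([summarize(g) for g in cluster(levels[-1])])
    return levels  # leaf chunks at levels[0], root summary at levels[-1][0]

tree = build_tree(["c1", "c2", "c3", "c4"])
print(len(tree))  # 3 levels: 4 leaves -> 2 summaries -> 1 root
```

Retrieval can then match a query against nodes at any level of the tree, which is what lets the model answer both detail questions and whole-document questions.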
Building less wrong RAG with Corrective RAG?
2.6K views · 2 months ago
Building a basic retrieval-augmented generation (RAG) system is becoming easier, but the harder part is often making it work correctly. For example, if wrong information is selected early on in the retrieval process, the quality of the generated answer is obviously going to be bad. To address this issue, Corrective RAG is being explored to more carefully evaluate the quality o...
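The Corrective RAG loop boils down to: grade what the retriever returned, generate only from documents that pass, and otherwise fall back (for example, rewrite the query and search the web). A toy sketch, with a keyword-overlap grader standing in for CRAG's trained evaluator (all identifiers here are hypothetical):

```python
# Minimal sketch of the Corrective RAG idea: grade each retrieved document
# before generation; if nothing passes, trigger a fallback step.
def grade(question, doc):
    # Stand-in grader: keyword overlap instead of a trained/LLM evaluator.
    return "correct" if any(w in doc.lower() for w in question.lower().split()) else "incorrect"

def corrective_retrieve(question, docs):
    kept = [d for d in docs if grade(question, d) == "correct"]
    if not kept:
        return ["<fallback: rewrite query and search the web>"]
    return kept

docs = ["SpaceX was founded by Elon Musk in 2002.",
        "Bananas are rich in potassium."]
print(corrective_retrieve("Who founded SpaceX?", docs))
```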
Extract 5 Lists in 2 Minutes
1.4K views · 2 years ago
Our biggest update to Diffbot Extract EVER - Extract any type of list on any website into JSON or CSV with no rules or scripts. Diffbot Extract reads websites like a human so you don't have to. Stop scraping, start extracting. List API Documentation: docs.diffbot.com/docs/en/api-list MORE ABOUT DIFFBOT Access a trillion connected facts across the web, or extract them on demand with Diffbot - th...
Diffbot's Knowledge Graph In Three Minutes
2.3K views · 2 years ago
The world's largest Knowledge Graph contains billions of organizations, articles, and people. But where do you get started? Here's our quick start video meant to be consumed alongside our Knowledge Graph Get Started Guide at: docs.diffbot.com/docs/en/dql-quickstart
Building a Better Quality Internet with Factmata
269 views · 2 years ago
Factmata helps monitor internet content and analyse its risks and threats. Their technology can automatically extract relevant claims, arguments and opinions, and identify threatening, growing narratives about any issue, brand, or product. Their tools save time for online media analysts, finding new opportunities, risks and threats. MORE ABOUT DIFFBOT Access a trillion connected facts across th...
10 New Market Intelligence Queries From Diffbot's Knowledge Graph [Webinar]
352 views · 2 years ago
In this weekly webinar we look at 10 new(ish) and innovative ways to use the world's largest Knowledge Graph to explore linked data on people, organizations, articles, and more. Diffbot's Knowledge Graph takes entities, facts, and relationships extracted from the public web and structures them into a queryable database.
Eight Ways Web-Reading Bots Revolutionize Market Intelligence [Webinar]
245 views · 2 years ago
Eight Ways Web-Reading Bots Revolutionize Market Intelligence [Webinar]
Best Practices: Using External Data To Enrich Internal Databases [Webinar]
224 views · 2 years ago
Data decays at an average of 30% a year, and dated or incorrect data can be more harmful than not having data coverage at all. In this webinar we explain the basics of data enrichment as well as work through hands-on ways in which you can use the world's largest Knowledge Graph to pull in millions of facts related to organizations you care about. Resources: Bulk Enhance API Google Collab Walkth...
Diffbot For Demand and Lead Generation [Webinar]
310 views · 2 years ago
The public internet is chock full of useful information for demand and lead generation. But manual fact accumulation just isn't scalable. See how Diffbot's products including the world's largest Knowledge Graph, our Natural Language API, and our Automatic Extraction APIs can structure public web data at scale for demand and lead generation in this 30 minute webinar!
[Webinar] Informal Dashboard Building With Diffbot's Excel and Google Sheets Integrations
130 views · 2 years ago
[Webinar] Informal Dashboard Building With Diffbot's Excel and Google Sheets Integrations
[Webinar] Knowledge Graph Techniques For Global News Monitoring
332 views · 2 years ago
[Webinar] Knowledge Graph Techniques For Global News Monitoring
[Webinar] Competitor, Vendor, And Customer Data From Across The Web With Diffbot's Knowledge Graph
151 views · 2 years ago
[Webinar] Competitor, Vendor, And Customer Data From Across The Web With Diffbot's Knowledge Graph
Diffbot The Web-Reading Robot: Explainer Video
472 views · 2 years ago
Diffbot The Web-Reading Robot: Explainer Video
What's Rule-Less Web Scraping and How Is it Different Than Rule-Based Web Data Extraction? [Webinar]
275 views · 3 years ago
What's Rule-Less Web Scraping and How Is it Different Than Rule-Based Web Data Extraction? [Webinar]
Knowledge Graph Basics: Data Enrichment
521 views · 3 years ago
Knowledge Graph Basics: Data Enrichment
Crawlbot Basics - Choosing The Right Web Data Extraction API For Crawling
334 views · 3 years ago
Crawlbot Basics - Choosing The Right Web Data Extraction API For Crawling
The Ultimate Guide To Natural Language API Products
2.7K views · 3 years ago
The Ultimate Guide To Natural Language API Products
Knowledge Graph Basics: Data Provenance
1.8K views · 3 years ago
Knowledge Graph Basics: Data Provenance
NLP Fundamentals: Entities, Sentiment, Facts
583 views · 3 years ago
NLP Fundamentals: Entities, Sentiment, Facts
Knowledge Graph Basics: Faceting
403 views · 3 years ago
Knowledge Graph Basics: Faceting
Knowledge Graph Basics - Searching For Orgs Or Articles
389 views · 3 years ago
Knowledge Graph Basics - Searching For Orgs Or Articles
Knowledge Graph Basics: Entity Types
1.2K views · 3 years ago
Knowledge Graph Basics: Entity Types
How to Track Market Indicators Using Knowledge Graph News Monitoring Scheduling
612 views · 3 years ago
How to Track Market Indicators Using Knowledge Graph News Monitoring Scheduling
Advanced Crawlbot Tutorial - Crawling Web Pages Behind Logins
963 views · 3 years ago
Advanced Crawlbot Tutorial - Crawling Web Pages Behind Logins
Diffbot Crawlbot Web Crawler Tutorial (2021) - Scrape Ecommerce Pages Quickly
2.9K views · 3 years ago
Diffbot Crawlbot Web Crawler Tutorial (2021) - Scrape Ecommerce Pages Quickly
Automate SAAS ABM (Account Based Marketing) List Building in 5 Minutes
215 views · 3 years ago
Automate SAAS ABM (Account Based Marketing) List Building in 5 Minutes

Comments

  • @NicolasEmbleton
    @NicolasEmbleton 1 day ago

    Haha. Brilliant. Love the behind-the-scene piece at the end. Very instructive. Thanks 🙏🏻

  • @mbrochh82
    @mbrochh82 4 days ago

    Here's a ChatGPT summary:
    - Retrieval-augmented generation (RAG) is effective in reducing hallucinations in large language models (LLMs).
    - Despite providing correct context and ground truth, LLMs often do not incorporate external knowledge correctly.
    - Viewer comment suggests LLMs may prioritize their internal knowledge over external information.
    - Research compares GPT-4, GPT-3.5, and Mistral-7B in balancing external information with internal knowledge.
    - GPT-4 is the most reliable model when using external information, followed by GPT-3.5 and Mistral-7B.
    - All models tend to stick to their internal knowledge if they believe external knowledge is less correct.
    - RAG can enhance accuracy, but its effectiveness depends on the model's confidence and prompting technique.
    - Different combinations of language models and embedding models can lead to varied results.
    - The study highlights the influence of different prompting techniques on how LLMs follow external knowledge.
    - The DSPy framework for auto-tuning prompts can improve LLMs' adherence to external knowledge.
    - DSPy uses bootstrapping to create and refine examples, improving prompts based on specific metrics.
    - Entity linking can prevent incorrect answers by mapping words in text to entities in a knowledge graph.
    - The Diffbot Knowledge Graph is used for validation due to its extensive network of verified information sources.
    - The entity linker helps filter made-up information when LLMs hallucinate.
    - The DSPy RAG pipeline updated with an entity-type validity check improves output accuracy.
    - A custom DSPy RAG pipeline integrates knowledge graph data to refine questions and retrieve relevant information.
    - Two metrics for the DSPy optimizer: entity type check and alignment with knowledge graph context.
    - Knowledge graph context ensures final answers align with ground truth.
    - Enhanced output incorporates specific passages and relationships from the knowledge graph.
    - Example shows the knowledge graph confirming Elon Musk as the sole founder of SpaceX.
    - The optimized program sometimes fails to make LLMs stick to external knowledge.
    - Manual prompt tweaking may be necessary to ensure LLMs follow external knowledge strictly.
    - The DSPy framework has a steep learning curve but can yield better results for experienced programmers.
    - Main message: integrating knowledge graphs and entity linking with LLMs can improve accuracy, but manual prompt customization may still be necessary to ensure adherence to external knowledge.

  • @TheRealAfroRick
    @TheRealAfroRick 4 days ago

    This is the way...

  • @ScottzPlaylists
    @ScottzPlaylists 4 days ago

    I haven't seen any good examples of the Self-improving part of DSPy yet. Is it ready for mainstream use❓

  • @pedromoya9127
    @pedromoya9127 4 days ago

    thanks great video!

  • @victoriamartindelcampo7827
    @victoriamartindelcampo7827 4 days ago

    girl, u rock thank you so much for this!! Where can I follow you?

  • @JeffreyWang-hh4ss
    @JeffreyWang-hh4ss 5 days ago

    Love this kind of RAG comparison, would be better if the background looks less like a spa room.😅

    • @diffbot4864
      @diffbot4864 5 days ago

      Leann here, I literally filmed it in my room. Which parts of the video suggest spa room features?

    • @JeffreyWang-hh4ss
      @JeffreyWang-hh4ss 4 days ago

      @@diffbot4864 oops, didn't think you would reply personally… maybe just the nice bed… very different from all the other tech influencers, haha, keep up the good work Leann

    • @diffbot4864
      @diffbot4864 4 days ago

      @@JeffreyWang-hh4ss It's Leann again :) Well, NeetCode also films a lot in his room: www.youtube.com/@NeetCode Currently, my room is the only place where I can get the best voice quality. The most important thing I hope is that the content itself delivers value. Thank you for the feedback!

  • @wadejohnson4542
    @wadejohnson4542 5 days ago

    Until I saw this, I was starting to think that there was something wrong with me not being able to achieve magical improvements in results by using DSPy over meticulously hand-crafted prompts targeted at the observed quirkiness of specific LLMs. Thank you for restoring my self confidence. And now I'm also going to incorporate graph databases into my RAG pipelines after watching a couple of your videos.

  • @pedromoya9127
    @pedromoya9127 5 days ago

    great video! thanks

  • @plattenschieber
    @plattenschieber 9 days ago

    Hey @lckgllm, could you also upload the missing `dspy-requirements-2.txt` in the repo? 🤗

  • @PoGGiE06
    @PoGGiE06 10 days ago

    Very interesting, thanks! But Musk wasn't a co-founder of Tesla either.

  • @jonyfrany1319
    @jonyfrany1319 13 days ago

    Thanks 🙏🏽

  • @codelinx
    @codelinx 14 days ago

    Great info and content.

  • @googleyoutubechannel8554
    @googleyoutubechannel8554 14 days ago

    Impressive a 996 worker can find time to put this together, keep it up! Ah yeah, RAG doesn't work, in fundamental ways... it can't.

  • @ronifintech9434
    @ronifintech9434 15 days ago

    Love it! Finally Neo4j has good usage!

  • @kefanyou9928
    @kefanyou9928 16 days ago

    Great video~ Very interested in KG adoption in LLMs. Kind reminder: hide your API key in the video😉

  • @paneeryo
    @paneeryo 16 days ago

    Music is too annoying. Please dial it down

    • @diffbot4864
      @diffbot4864 16 days ago

      Will be more aware of the volume! Thanks for the feedback

  • @MrKrtek00
    @MrKrtek00 25 days ago

    It is so funny how tech people do not understand why ChatGPT was a hit: exactly because you can use it without programming it.

  • @RoshanKumar-hk5ij
    @RoshanKumar-hk5ij 26 days ago

    How good are LLMs at generating Cypher queries?

  • @nju415
    @nju415 27 days ago

    how can a person be both smart and pretty

  • @marc-io
    @marc-io 28 days ago

    Is this parody? Considering the title… and the result

    • @diffbot4864
      @diffbot4864 28 days ago

      Just go check the code 😉

  • @cchance
    @cchance 28 days ago

    I'm sorry, are the headers wrong? The GPT and Llama headers are swapped on the nomic tests in your paper, so either the header is wrong or what you said was all opposite.

    • @diffbot4864
      @diffbot4864 28 days ago

      I guess you're referring to 3:07. In that scene, please ignore the header text. You see that is because I opened the same notebook but placed it side by side and I'm sorry that confused you.

  • @shbfy
    @shbfy 28 days ago

    Hey I’d love to understand how you can improve agents at using tools. Thanks 🙏!

    • @diffbot4864
      @diffbot4864 27 days ago

      Can you say more about which type of agents and the problems you’re trying to solve?

    • @shbfy
      @shbfy 27 days ago

      @@diffbot4864 Sure, I’m trying to build two agents. One which navigates websites using the Playwright browser tool and the other which uses shell commands from the shell tool. I’m using Langchain to build the agent and a react flow through the ‘structured_chat’ function. The agent needs to be able to correctly call the commands available in each tool and I find that sometimes it doesn’t do this correctly (the agents gets carried away and sends invalid commands). Any tips / tricks / knowledge on how this can be improved would be greatly appreciated. At this stage I’m thinking of fine tuning an open source model. Thank you for considering!

  • @real-ethan
    @real-ethan 28 days ago

    "Don't look at me" is an instruction no language model could ever follow.

    • @diffbot4864
      @diffbot4864 28 days ago

      That’s a good one! 🤣

  • @vishalverma-wx7eo
    @vishalverma-wx7eo 28 days ago

    Lovely 🌹

  • @BackTiVi
    @BackTiVi 28 days ago

    Very interesting, it seems we're not too far from getting a robust retrieval. Would explicitly asking in the system prompt to prioritize the given context over any prior knowledge change this behavior?

  • @HomeEngineer-wm5fg
    @HomeEngineer-wm5fg 29 days ago

    You got a topic I exactly was looking into.. Now subscribed....I will follow on X.

    • @diffbot4864
      @diffbot4864 28 days ago

      That’s very kind of you! Thanks! May I know if you’re more interested in knowledge graphs or DSPy with knowledge graphs? Would appreciate your feedback 😊

    • @HomeEngineer-wm5fg
      @HomeEngineer-wm5fg 28 days ago

      @@diffbot4864 I'm in industry. A middle weight engineer trying to early adapt machine learning in a production environment. I see the same thing you are, but you are well ahead of me. Use case for integrating AI with BI. RAG is the natural progression and KG is a latecomer in my industry. Digital Thread things....

  • @user-mo7wm8ny5e
    @user-mo7wm8ny5e 1 month ago

    Thanks for the video! I tried this but with convert_to_graph_documents I kept getting empty graphs. While debugging I found an error message at an intermediate step, it said "Your plan doesn't have this feature", apparently it won't work with the free version of Diffbot? $299 per month would be a bit much for a small student project... Do you have any other advice for getting a solid Neo4J graph from text data? Crucially, I'd like metadata to be included with the relations, which didn't happen with the LLM Graph Transformer

    • @diffbot4864
      @diffbot4864 1 month ago

      Thanks for the question and this sounds like an interesting problem! The Diffbot knowledge graph and APIs have become free! So, it shouldn’t cause the problem. Is there a good way to reach you or if you’re open to a vid chat, you can shoot me an email at: leannchen86@gmail.com

    • @user-mo7wm8ny5e
      @user-mo7wm8ny5e 29 days ago

      @@diffbot4864 Thanks so much, I'll email you!

  • @fromjavatohaskell909
    @fromjavatohaskell909 1 month ago

    10:38 Hypothesis: what if providing additional data from the KG does not override knowledge ("false or hallucinated facts") already inherently present in the LLM? I wonder what would happen if you changed the labels of the knowledge graph to abstract names like Person1, Person2, Company1, Company2, etc. and ran the exact same program. Would it dramatically change the result?

    • @nk8143
      @nk8143 5 days ago

      I agree on that. Because misconception "everyone knows that Elon co founded every company" was most likely present in training data.

  • @jantuitman
    @jantuitman 1 month ago

    Your question number 87 in your Jupyter notebook has the casing of Joe Hisaishi's name different from the documents; it is all lower case in the question. So it will be tokenized differently. Could this have influenced the quality of the answer perhaps?

    • @diffbot4864
      @diffbot4864 29 days ago

      Good question. But that normally doesn’t make a difference, as language models don’t read text like we as humans do - words are transitioned into tokens. The easiest way to test is using ChatGPT, either misspell a famous person’s name or use all lower cases, and it would still return the correct person.

    • @jantuitman
      @jantuitman 29 days ago

      @@diffbot4864 yes, i am familiar with the workings of a transformer. I think the casing will produce a different token, so you are at the mercy of the training data size / transformer size / context size to have a capitalized word return the same response. In general my experience is also that it does not make a difference, but you were using here a short context and a small model. That is why i thought it might be an explanation.

  • @kayhal
    @kayhal 1 month ago

    Do you think the same depth and accuracy could be achieved with a metadata-filtered vector search followed by a reranker? I worry about having to maintain two data representations that are prone to drift

  • @bioshazard
    @bioshazard 1 month ago

    Which language model did you use? Llama 3 and Sonnet might offer improvements to recall over RAG context.

    • @diffbot4864
      @diffbot4864 1 month ago

      Next video is about testing llama3. coming out soon 😉

  • @danilotavares77
    @danilotavares77 1 month ago

    You are the best! Congrats!

    • @diffbot4864
      @diffbot4864 28 days ago

      Thanks for the kind words! 🫶

  • @diffbot4864
    @diffbot4864 1 month ago

    ANNOUNCEMENT: Diffbot's APIs and the Diffbot Knowledge Graph have become FREE! You can freely access them by signing up at: app.diffbot.com No more free trials because it's free 🤩

  • @stanTrX
    @stanTrX 1 month ago

    I now see why I don't get good results with RAGs.

  • @aldotanca9430
    @aldotanca9430 1 month ago

    Interesting, but professional accounts only. I guess I will stick with Neo4j.

    • @diffbot4864
      @diffbot4864 1 month ago

      Friendly reminder: Neo4j is a graph database, so it can’t construct knowledge graphs for you. You’ll still need NLP techniques to extract entities and relationships first for knowledge graph data and later store in a graph database like Neo4j.

    • @aldotanca9430
      @aldotanca9430 1 month ago

      Very true, I guess non-profit and passion projects will have to deal with that in other ways.

    • @diffbot4864
      @diffbot4864 1 month ago

      @@aldotanca9430 Actually all Diffbot's APIs become FREE this week, including Diffbot Natural Language API (being used in this video), which can help extract entities/relationships, entity resolution, etc. You can freely get access at: app.diffbot.com

    • @aldotanca9430
      @aldotanca9430 1 month ago

      @@diffbot4864 Thanks! But I think I would need to sign up with a company email? The system will not accept a personal email, github account or similar. It seems set to only allow a "work email". Of course, many people will have a work email and could use that even if their project is unconnected with their job, and there are probably 'creative' ways to get around that. I was just pointing out that diffbot accounts seem to be meant for business users only.

  • @NadaaTaiyab
    @NadaaTaiyab 1 month ago

    Great video! I need to try this.

  • @daniel_tenner
    @daniel_tenner 1 month ago

    Definitely learned a lot, and now keen to see some kind of tutorial on how to implement the knowledge graph backed RAG. I’m a Ruby dev so something language agnostic would be particularly helpful.

    • @diffbot4864
      @diffbot4864 1 month ago

      Thanks for the feedback! Sure, we will keep this in mind and make the content more relatable and approachable. Appreciate you contributing your thoughts here 😊

    • @10Rmorais
      @10Rmorais 15 days ago

      Agreed. Appreciate the vid though!

  • @CrypticPulsar
    @CrypticPulsar 1 month ago

    This is amazing!! Thank you! I already felt much pain with managing the token size in early days of LangChain.. this was truly eye-opening!

    • @diffbot4864
      @diffbot4864 1 month ago

      Thanks for the kind words! 😊

  • @vbywrde
    @vbywrde 1 month ago

    As I'm reviewing this video again, I notice that at 3:54 you show the prompt for assessing relevance. I've tried this kind of prompting before and I always find it hit or miss. It depends a lot on the model and how well it can assess the information and assign the correct value. But if we look at the prompt, ultimately we're asking it to check whether key words from the original question show up in the results. If not, we give it a "yes" or "no", but then ask it to bundle the binary score into JSON with "score" as the key.
    I have two thoughts on this. One, we're asking it to follow a lot of instructions here. For models like GPT4 it probably won't be a problem, but for other models, well, some yes, some no. There's a reasonable chance that models not trained on JSON may foul that up. Even some that have been trained on JSON can foul it up sometimes (especially if the temperature you set is too high). The other issue is that where it says "binary" and then "yes" or "no", some models may get confused by what is being asked for, because binary can also mean returning 1 (yes) or 0 (no). So you may wind up with some hit-or-miss results because of that. Another way to handle it is to simply ask for a boolean answer, True or False, and not ask it to assign it to JSON; just have it return the result as a boolean value.
    But more importantly: why ask the model to do this at all? This is just the kind of spot that causes hiccups. I'm thinking that with this kind of operation, wouldn't it be safer to simply interrogate each page's content for the key words using a Python search through the text? You could do something like this:

        # Define the keywords to search for
        keywords = ["Computer", "Document", "Batman"]

        # Define the text to search through
        page_content = ("This page contains information about computers and documents. "
                        "Batman is a popular superhero.")

        # Collect the keywords that appear in the page content (case-insensitive)
        found_keywords = [k for k in keywords if k.lower() in page_content.lower()]
        keyword_count = len(found_keywords)

        # Print the found keywords
        if found_keywords:
            print("The following keywords were found in the page content:")
            for keyword in found_keywords:
                print(keyword)

    While I admit this approach is not as cool as using the LLM, it would definitely be more reliable. You could then use the keyword count as a way of scoring the document, and include it if the keywords meet a certain threshold. In other words, I think it's important to factor in that LLMs make mistakes (like all the time), so when we design programs that use LLMs we want to carefully select when we're going to use the LLM, and choose not to use it whenever possible. If we can get a result without using the LLM, not only are we more assured of accurate results, but it will also cost far less to run the program. The fewer calls to the LLMs the better.
    Anyway, just some thoughts. Not sure if that actually makes sense, but... there you have it. As a general rule, though, I think the point I'm making here is pretty solid: use the LLMs as sparingly as possible, and only for those things that you can't achieve without the LLM. Yah?

  • @nas8318
    @nas8318 1 month ago

    Long context <=> RAG Pen Pineapple Apple Pen

    • @diffbot4864
      @diffbot4864 1 month ago

      You cited it better than I did 😂

  • @amarnamarpan
    @amarnamarpan 1 month ago

    The most valuable contents on youtube are the least liked.. Often..

    • @lckgllm
      @lckgllm 1 month ago

      Very flattered! Our main purpose for content creation is providing values to the audience, especially educational value. So while likes/view counts definitely are seen as part of the metrics to assess content quality, whether the viewers learn something new or enjoy the video is the more important focus for us. Thanks for the encouragement!

  • @vbywrde
    @vbywrde 1 month ago

    Another great video that brings up so many vital questions. I wish we had more time to dig into the details of what you just presented, and try to find solutions to the bottlenecks.
    Overall, I think the bottom line problem is that we are looking for ways to use LLMs to do what they are not actually good at because of the nature of their construct. LLMs are stochastic by nature. Therefore, because they deal in probabilities, you are not likely to get a 100% correct answer. So if you use RAG, the LLM that is doing the inferencing may wind up with invalid, or less probable responses. You can set the temperature to 0, and get the most probable based on the data, but as you point out, sometimes the data is the problem, and so you want the LLM to reason... which means you want a higher temperature, even at the risk of getting lower probability responses. It's a problem of having the fox guarding the hen house. You can't quite trust the LLM to validate its own assumptions. Of course, you could bring in more LLMs into the task and have them validate each other, but that introduces the probability that one or more of them get it wrong, throwing the effort into a tailspin. And if you go to the IntArWebZ to get information, well, as you pointed out, there's a LOT of junk out there. Totally unreliable. So where are your reliable data sources? And how can we trust LLMs to evaluate them?
    We've entered into a strange new world with LLMs. They seem to be great at processing language, but they're really kind of awful at finding facts, and doing reasoning (because even if they were good at reasoning, the deficit of a facts-system would still throw them off).
    What I imagine is a system (program) by which we have numerous dials and meters that we watch and manipulate as we explore the Sea of Information. When the objects we are looking at appear bright and solid, we are being told the system thinks they have a high probability of being factually correct, or correctly reasoned. So we swim towards them, but along the way, some moderately translucent information catches our eyes, and we wonder... is that lower probability thing worth looking at? So we swim towards it, and then we find out, why yes, inside there was a nugget of gold, hidden by the LLMs incorrect evaluation of its probability. And so we store that in our vector Knowledge-Tank, and keep swimming as we search for interesting ideas, and the facts to support them. I see a sort of 3D universe of vector graphs with visual properties that correspond to different aspects of what we need to know in order to understand what we're looking at. Super Sci-Fi stuff, I know. But that's where my mind wanders to when watching your videos. Very thought-provoking!
    Anyway, enough of that. Thanks again for sharing your explorations. Looking forward to your future videos.

    • @diffbot4864
      @diffbot4864 1 month ago

      Leann here! Not sure why my reply seemed to be removed likely by TH-cam :( There's just so much gold in your comment and I definitely need to re-read a couple more times to fully digest them. I totally agree with your perspectives on the probabilistic nature of LLMs, and I've also started doing more research on how to better guide LLMs to follow the ground truth in knowledge graphs as they sometimes are stubborn and prefer sticking to their existing patterns in the training data. Also, really love your idea on a more interactive and visualized landscape to navigate through more factual information. I think it would be such a cool video if things can be visualized (but ofc we need someone advanced being able to build this first 😂). I learn so much every time reading through your comments. Really appreciate you contributing your perspectives/ideas here! I hit the "Back my campaign" on: about.me/vbwyrde hoping to support, but I'm not sure if that helps!

    • @vbywrde
      @vbywrde several months ago

      @@diffbot4864 Thanks! That's very kind of you. I find myself watching your videos multiple times in order to drill down into the details of what you're looking at, and think it through. Since I'm really new to this domain it takes me a number of passes before I can grasp certain points. I am really excited about your RAG-Graph approach, and even more so about your methodology of exploration. I admire your scientific approach, and the honesty you bring to the table. You're not hyping anything - you're exploring it, with all the pros and cons. That's completely awesome. My primary goal right now is to learn as much as I can, so please accept my thanks for your videos. Each one is both helpful and entertaining. I like your jokes! You are like the Emma Peel of computer science, imo. What I would like to work on is a generic tool that creates the pipeline necessary to build out the RAG-Graph methodology using DSPy. There are several pieces to this puzzle, but I'm working on one piece now that I think may prove helpful. Since I'm really new to Python, and to DSPy (I have been programming in .NET for ages, but never touched Python until a couple of months ago), it's taking me some effort to climb the learning curve. But I'm making solid progress, I think. Here's my DSPy repo github.com/vbwyrde/DSPY_VBWyrde ... please take a look at DSPY12.py and DSPY12_Out_3.md when you have a chance to see what I'm currently up to. DSPY12 is the current iteration of work on an LLM code generator that I hope will become a building block for many future implementations. I'd really like to combine your best practices and methods with what I've got, and derive a utility for streamlining the process of selecting a topic, building the training data, the RAG, and the Graph, and getting accurate information from our LLMs. Then I can work on the Information Visualizer I was dreaming about in my previous post. 
With this, there would be numerous "visual keys" that indicate the nature of the data we are exploring. This idea of Swimming through the Sea of Information is one that I find really fascinating. Let's create a new modality of exploration through the Realm of Idea... Sophianauts. :) And thanks for checking out my about me. I think you're the first person to actually ever press that dusty old button! haha! Thank you!! Not that it actually does anything more than bring you to my Elthos RPG website, but still - thank you!! Very much appreciate your support!

  • @CaptTerrific
    @CaptTerrific several months ago

    What a great demo, and idea for a comparison of techniques! As for why you got those weird answers about companies Musk co-founded... You chose a very interesting question, because even humans with full knowledge of the subject would have a hard time answering :) For example, while Musk bought Tesla as a pre-existing company (and thus did not co-found it), part of his purchase agreement was that he could legally call himself a co-founder. So is he or isn't he a co-founder? Murky legal/marketing vs. normal understanding, polluted article/web knowledge set, etc.

  • @chrisogonas
    @chrisogonas several months ago

    Very well illustrated! Thanks

  • @SimonMariusGalyan
    @SimonMariusGalyan several months ago

    Thank you for your explanation of integrating KG to RAG

  • @Fluffy_Cat_Owner
    @Fluffy_Cat_Owner several months ago

    That explanation was awesome

    • @diffbot4864
      @diffbot4864 several months ago

      Thank you for the kind words! 😊

  • @paulcailly7515
    @paulcailly7515 several months ago

    great video thx

  • @vbywrde
    @vbywrde several months ago

    Thank you for this video! Really great. I'm also a bit new to DSPy, but am having a great time learning it. This is really the right way to explore, imo. You set up comparative tests and then look carefully at the results to think about what might be going on in order to find the best methodology. Yep. That's the way to do it. Some thoughts come to mind. 1) Take the same code and question, and try it with different models for the embedding. What I've noticed is that the selection of models can have a significant influence on the outcomes. 2) Perhaps try creating a validator function for when you take the data and convert it into English, as a way to have the LLM double-check the results to make sure they are accurate. I've been doing that with a code generator I'm working on, and it seems pretty helpful. If the LLM determines the generated code doesn't match the requirements, then it recursively tries again until it gets it (I send the rationale of the failure back in on each pass to help it find its way -- up to five times max). 3) I'll be curious to see how much optimization influences the outcome! Anyway, fun to follow you on your journey. Thanks for posting!

    • @lckgllm
      @lckgllm several months ago

      This is such well-rounded and constructive feedback! Thank you so much! 🫶 You're right that I can set up some more testing and validation mechanisms, which are great suggestions that I'm taking to improve the pipeline. Really appreciate your effort writing this down, and I just learn so much more from the community with you all :) Thanks for the encouragement too!

  • @adennyh
    @adennyh several months ago

    Thanks for the awesome summary about RAPTOR