Excellent video! Thanks Leann, I had no idea about Diffbot, i'll be checking that out for sure.
Best of luck on your GenAI Journey
I've just completed some experiments using Microsoft's GraphRAG and your description describes my results exactly, "garbage-in-garbage-out". Without a consistently solid knowledge graph there's not much an LLM can have knowledge of. Thanks for sharing your project. I'll take a look at that.
It's nice to see my old co-worker Michelle randomly popping up in a video. I hope you were able to meet her. She is great!
I did meet her and was able to talk to her personally! It was definitely great. Joining the session she presented with Amy Hodler will make you realize you don't want to miss another one! 😊
KGs are key for providing context to RAG. Still, I see the OWL/RDFS path outperforming LPG as it enables the user to explicitly define semantics and infer knowledge.
It can be done by hand, but automating this human skill is impressive. Good video!
This is very insightful, Leann. Cheers from South Africa!
Thanks for the encouragement! 😊
I feel that for most RAG use cases, a vector DB is good enough to retrieve information for the LLM. But I agree that a KG is better when you need the LLM to answer complex questions with precise and explainable answers.
Use a VectorDB when 'good enough' is acceptable. Give your VectorDB a brain by combining it with a KG if you need it to be accurate, timely, or safe
Yes, to physically present yourself in multiple locations at the same time is quite challenging. My understanding is that it requires you to achieve presence on the fourth dimension. Once there, you can then enter multiple three-dimensional spaces at the same time. I wish I could do that, though I suspect it would be really disorienting at first! Best wishes!
Also, I learn something new with every one of your videos! Thank you! I really like your approach!
I'm surprised and also thrilled that finally someone takes my not-so-funny joke in the video seriously! 😂 Love your concise and scientific explanation of the multi-dimensional space, which makes me dream more about having that superpower. 😉 Thanks for the encouragement once again. I'm learning a lot from you guys too and have been enjoying the journey with you all!
@@lckgllm Oh good. When you make it to the fourth dimension, please give a holler! :) It would be fun to see you in two places at the same time. Three even! XD.
In the meantime, please keep us posted as to your coding progress. I find your videos really helpful. Thanks!
Excellent video, clear explanation. Please do post more in the GenAI and knowledge graph space!
Thank you for the amazing short video. I am eagerly waiting for you to make a video on how to convert CSV data into knowledge graphs and answer questions on the CSV files.
Amazing video - hope to see more. This was very informative and inspiring to learn about knowledge graphs.
Great! Waiting for more of your videos!
Hello, thank you very much for posting the video. I am very interested in the part where you also show the graph within the chatbot. What Python package is that, please? (My apologies if the question is redundant, I couldn't find it in other comments.)
Very cool! I have been working on building a client-side profiling & Hyperthymesia second-brain graph RAG kind of thing and really struggled with the bill for GPT-based graph construction! Thanks!
Sounds like a cool project! Let's chat 🙂
Is there any technique to evaluate the knowledge graph quality? There were some incorrect entities.
Thanks Leann. I'm going to have to give it a try for a deeper dive with Diffbot.
Can you please share the code for the application you built to visualize the knowledge graph?
This was an interesting video. I was more focused on the process, and thinking behind using this process to organize and visualize data.
Interesting post, Leann. Keep it up! It would be interesting to explore from a procedural perspective how graphs could supplement vector databases in RAG doc retrieval and relevancy evaluation.
Thank you for your feedback! That gives me an awesome video idea. See you in the next one 😊
You betcha. Subscribed!@@lckgllm
How do you return a Neo4j subgraph image in Streamlit's response?
Hey Leann, first of all great explanation with some insights (especially on Diffbot). You got a new subscriber 👍
I'm going to work on a RAG-based project which will use Neo4j as a graph database.
I've gone through other comments and your answers to them, but still wanted to know a few things:
1. Here you took the example of speakers and what they have spoken about (and their interests/expertise etc.), which is working fine. But what if I have some PDF docs of roughly 50-70 pages with rules and regulations and want to use them as a custom knowledge base for my RAG project? Is a knowledge graph database a good choice? Why not a simple vector DB (such as Milvus)?
2. Assuming I must use a graph database, how do I efficiently chunk the PDFs and store them as graph nodes and relationships, so that when users ask a query they get the correct answer?
3. If the docs are related to rules and regulations, then what would the nodes and relationships between them be? Because here in your example, the nodes were speakers, their expertise, etc.
I understand that you might not have a perfect answer for all of the above, but I'd like to get your point of view. Hope you find my comment and reply once you get time. Thanks for reading and for your time. 😊
This is exactly something I'm looking for as well. Were you able to figure out the answers and how it all came together? Or are you still in the implementation process? I would truly, truly appreciate a response from you. Thank you.
Are you creating embeddings on top of the knowledge graph for RAG??
Have you tried using a local LLM such as Mistral? It'll take a bit longer, but it's considerably cheaper.
I think I didn't get my point across clearly, and sorry about that. In terms of constructing a knowledge graph, from my experience, Diffbot's Natural Language API currently has the best performance for Named Entity Recognition (NER) and Relationship Extraction (RE) compared to GPT-4 (so far the best LLM) or spacy-llm, as I have tried them both. Frankly speaking, large language models are not inherently optimized for tasks like entity/relationship extraction, and we should think again about whether LLMs are the best option for every single task.
@@lckgllm Everything's a nail if you only wield an LLM =D
🤣@@SlykeThePhoxenix
Hi, since you mentioned pricing, especially how expensive GPT-4 is, is using the Diffbot API free? Thanks.
Diffbot sets a pretty high bar for entry to this project; any thought/plan to utilise an open-source project instead? Thanks!
Yes, I have previously used spacy-llm in my last video: th-cam.com/video/mVNMrgexxoM/w-d-xo.html
However, from the results generated by spacy-llm in my GitHub, you can see that there are still errors in the output, and I need to further pass the results to ChatGPT-4 for refining: github.com/leannchen86/openai-knowledge-graph-streamlit-app/blob/main/openaiKG.ipynb
I hope future LLMs (regardless of closed source and open source) will enable us to see the confidence score for the output as I experienced with Diffbot's APIs.
thank you@@lckgllm ! I will have a look @ the video and the notebook. Might come back for discussion again. have a good one!
Great video! However, I would completely replace DiffBot with an open source solution. There are many NER models, SpanMarkerNER to name one, since most of the entities you showed in the video are Person, Location, and Org, which libraries like SpaCy and setFit are pretty good for them. Using LLM with few shot learning would be another option. Overall, very nice video.
Thanks for the feedback! I have previously used spacy-llm in my last video: th-cam.com/video/mVNMrgexxoM/w-d-xo.html
However, from the results generated by spacy-llm in my GitHub, you can see that there are still errors in the output even if examples are included in the prompts, and I needed to further pass the results on to ChatGPT-4 for refinement: github.com/leannchen86/openai-knowledge-graph-streamlit-app/blob/main/openaiKG.ipynb
I hope future LLMs (regardless of closed source and open source) will enable us to see the confidence score for the output as I experienced with Diffbot's APIs.
@@lckgllm If you'd like to have confidence scores using LLMs, a simple hack is to ask for them in the prompt, so the LLM returns the results with scores. :)
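In case a concrete example helps, here is a minimal sketch of that prompt hack, assuming the OpenAI Python client; the model name, sample text, and JSON format are illustrative choices, not something from the video. Keep in mind that self-reported scores are not calibrated the way Diffbot's are.

```python
# Sketch only: ask the LLM to attach a confidence score to each extracted triple.
# Assumes OPENAI_API_KEY is set in the environment; model choice is arbitrary.
from openai import OpenAI

client = OpenAI()

text = "Amy has a love for science history and a fascination for complexity studies."

prompt = f"""Extract (subject, relation, object) triples from the text below.
Return JSON: a list of objects with keys "subject", "relation", "object",
and "confidence" (a number between 0 and 1 reflecting how certain you are).

Text: {text}"""

response = client.chat.completions.create(
    model="gpt-4o-mini",  # any chat model works; this pick is an assumption
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```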
Can you make a video on how to generate knowledge graphs for PDF books like the DSM-5?
I guess you will continue posting KG content with Diffbot only in the future?
I'm interested in creating a Little Logical Model based upon the command structure of an application, and then using agents to take voice to text and text to commands, maybe with a corresponding graph view, updated with currently available information, in another window on another display screen.
I don't understand how knowledge graphs are being used in RAG? What's the differences between a KG-RAG and a normal RAG?
Good question @shingyanyuen3420! Sorry for not making it clear in the video; I will improve my explanation next time.
Typical RAG applications chunk documents into smaller parts and convert them into embeddings, which are lists of numeric values. The LLM then retrieves information based on semantic similarity to the question.
However, the information retrieval process can become challenging as document sizes increase, potentially causing the model to lose the overall context. This is where knowledge graphs can be useful. Knowledge graphs explicitly define the relationships between entities, offering a more straightforward path for the LLM to find the answer while staying context-aware - improving accuracy for the retrieval process.
Hopefully this article is helpful:
ai.plainenglish.io/knowledge-graphs-achieve-superior-reasoning-versus-vector-search-for-retrieval-augmentation-ec0b37b12c49
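For readers who want to see the contrast in code, here is a minimal sketch assuming a local Neo4j instance and some existing chunk-embedding index; the connection details, labels, and relationship names are assumptions for illustration, not the exact setup from the video.

```python
# Sketch only: contrasts similarity-based retrieval with an explicit graph lookup.
from neo4j import GraphDatabase

# 1) Typical RAG: chunks are embedded and ranked by similarity to the question
#    (pseudocode - depends on whichever vector store you use):
#    top_chunks = vector_index.similarity_search("What is Amy interested in?", k=4)

# 2) Graph RAG: the relationship is stored explicitly, so retrieval is a traversal.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
cypher = """
MATCH (p:Person {name: $name})-[r]->(t)
RETURN p.name AS person, type(r) AS relation, t.name AS target
"""
with driver.session() as session:
    for record in session.run(cypher, name="Amy"):
        print(record["person"], record["relation"], record["target"])
driver.close()
```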
Can RAGs become efficient enough to do data analysis over text tables and csvs? I'm planning to build one so wanted to know if this is possible.
Yeah I think so! That's a great idea for a new video :)
@@lckgllm Yes. I would be glad to collaborate on such a project.
Very insightful, would love to see more such videos.
I'm a newbie on the matters discussed here, but I really do appreciate the way your MyGraph RAG AI Assistant works, responding with text AND a graph. Can you tell me a bit more about how you accomplished this? (I'm especially interested in the graph that got generated!) Hope that's not a stupid question.
Definitely a great question! I didn't include the process in this video and plan to make another video about this, but let me show you the details via email :)
@lm Would be highly appreciated! 🙏 Didn't get it yet though...
@@lckgllm Hi Leann, I had the same question. Isn't this just an implementation of streamlit-agraph? Is there any reason why you left this out of the GitHub repo you shared? It would be incredibly helpful/instructive to see the implementation.
Hi Leann. First off great video, I have been following your content for a while. I have some quick doubts:
1. What is your opinion on using knowledge graph RAG at production level compared to vector search?
2. I have tried different methods to extract entities and relationships from unstructured data. What I am looking for is leveraging LLM capabilities to extract implicit and explicit entities and relationships from data so as to reduce manual effort/errors. So far I have tried the following methods:
i) Using the REBEL model to extract entities - not good for large sets of data.
ii) Directly using GPT-4 - too much cost and a lot of prompting.
iii) spacy-llm from your video - OK, but when it comes to large data there are still many errors.
What do you think would be the best and most optimized approach here for a production application? We have thousands of files and I am looking for a structured method which is cost-efficient and effective in extracting entities and relationships from large unstructured data. Would love your opinion on this.
Thank You
good question
Great question! To be honest I'm not yet an expert on this subject, but I'll do my best to provide a balanced view. For your 1st question, while it's a big one to answer, I'd say it heavily depends on what your data looks like and what the expectations for your RAG system are. What's the main problem your RAG app is trying to solve?
If your app needs to be highly context-aware and able to draw/highlight the relationships between entities (e.g. find the shortest path between A & B), building a quality knowledge graph is essential for it to perform well. However, if scalability and speed are more of your focus, vector search may deserve a higher priority. And it doesn't need to be "either/or": there are also examples where knowledge graphs and vector search are combined, which of course will be a more challenging task in terms of designing the roadmap.
Speaking of production-level development, I'm curious if you already have your benchmarks and evaluations ready? Evals on LLMs are a very active research area. At least from what I've seen (I can be wrong), there are still lots of unknowns around how to define metrics and evaluate an LLM's performance. Unless you have your metrics ready to accurately measure the performance of your RAG system, I'd say it's still early to think about production-level issues.
To your 2nd question, thanks for watching my previous video on spacy-llm. 😊 It's true that results from spacy-llm are not perfect yet as the model is still powered by LLMs such as GPT-4, which so far still sucks at identifying and labeling entities/relationships. That's why I tried Diffbot and actually found that the performance is better via their Natural Language API, even though it currently comes with a price for enterprise. If the organization you work at has the budget, I think it's worth trying. *Note: this video is not sponsored by Diffbot, so I'm not trying to talk you into buying their product. I purely share my experience and the process of building this project (see the description).
I hope this information is helpful to you!
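As a small illustration of the "shortest path between A & B" kind of question mentioned above, which pure vector search struggles with but a graph database handles directly, here is a hedged sketch via the Neo4j Python driver; the URI, credentials, labels, and names are made up for the example.

```python
# Sketch only: a shortest-path query is a one-liner in Cypher, whereas embedding
# similarity alone cannot answer "how are A and B connected?"
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

cypher = """
MATCH (a:Person {name: $start}), (b:Person {name: $end}),
      p = shortestPath((a)-[*..6]-(b))
RETURN [n IN nodes(p) | n.name] AS path
"""
with driver.session() as session:
    record = session.run(cypher, start="Amy", end="Michelle").single()
    print(record["path"] if record else "No path found")
driver.close()
```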
@@lckgllm
Hi Leann. First of all thank you for the reply.
Is it OK if I reply on LinkedIn directly to you?
@lcgdsllm Hi Leann, I have messaged you on LinkedIn 😀
@@Manu-m8w6m Even I'm looking for something similar.
So far I've found `Universal-NER/UniNER-7B-all`, which is good at identifying entities although it can get only one entity at a time.
And `Tostino/Inkbot-13B-8k-0.2` seems promising, although I haven't tried it out yet.
Can you share if you have found a better way to extract entities and relationships from unstructured data?
Very comprehensive. I am following you 643
Thank you ♥
Thank you so much for this incredible tutorial! I've discovered that "GenAI" is my newfound passion, and I hadn't even heard of the term until I watched your video. I look forward to your next video.
Thank you so much for the encouragement! I’ll continue working hard to bring better content. Stay passionate with GenAI 💪🏻❤️
You are an excellent presenter. Thank you. We do however need to find you a better background music. It's giving pharmaceutical commercial and the levels are a little too high over your voice. Still great though. You have excellent stage presence and a clear voice.
Totally agree with you :) I have since upgraded to Epidemic Sound for music and am more mindful that the music volume should not distract the viewer when I'm speaking. I'm trying to learn and become better after every video, so I really appreciate seeing feedback like this for improvement!
Love this!
Thanks for the awesome video!
I was trying to reproduce your code but got an error because the "text_for_kg()" function was not defined. Any chance you can help me understand where this function comes from?
Great content and great editing!
Thank you
Same problem for me. Trying to implement text_for_kg.
Hello! Sorry for the late reply, been busy with work. I just realized that text_for_kg() somehow was deleted from the notebook, but it should be the same thing as diffbot_nlp.nlp_request(). I just updated the notebook in the GitHub repo. Let me know if it doesn't work. I'll do my best to fix it. Thanks for pointing this issue out! @souzajvp
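For anyone else hitting the same error before the notebook update, here is a rough guess at what the missing helper might look like, based only on the reply above (it is just a thin wrapper around diffbot_nlp.nlp_request()); the import path and API-key handling are assumptions about the notebook's setup, so treat the updated repo as the authoritative version.

```python
# Hypothetical reconstruction of text_for_kg(); import path and env var are assumed.
import os
from langchain_experimental.graph_transformers.diffbot import DiffbotGraphTransformer

diffbot_nlp = DiffbotGraphTransformer(diffbot_api_key=os.environ["DIFFBOT_API_KEY"])

def text_for_kg(text: str):
    """Send raw text to Diffbot's Natural Language API and return its response."""
    return diffbot_nlp.nlp_request(text)
```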
nice sharing
Thanks.
Great video!
Excellent tutorial! 👍
Very good video! Thanks!
Thanks. Can I ask a question: "What would be the first and second steps to make a medical chatbot using an LLM with my own style and persona?" Thanks.
This is a big question to answer, but what's important and fundamental is having a high-quality dataset. Do you already have your domain-specific data?
@@lckgllm thanks very much
Hi Leann, a wonderful video. I was looking for something like this. How can I reach out to you?
Thank you! You can find me through either LinkedIn: Leann Chen.
Or email:
leannchen86@gmail.com
Look forward to connecting soon!
6:26 It's not a great answer. 🙁 The graph DB has effectively acted as a bottleneck for the data. I.e. The answer is based purely on nodes + edges.
I'd be curious whether the graph DB could essentially act as an index for the original content.
I.e. Still use a graph query to return the relevant nodes/edges, but pass the source text corresponding to them as a RAG response.
Good question, although I'd defend that the answer at 6:26 is good enough for my use case 😂, as my purpose was converting unstructured text data into structured knowledge graphs, which served as the ground truth for the LLM to find the answer. 6:26 showed exactly the context from my knowledge graph.
I think what you're asking "whether the graph DB could essentially act as an index for the original content" is a different use case, where the documents themselves are classified as nodes and edges would be appended to the nodes based on their semantic similarities. I'd probably make another video particularly for this use case, which is different from what you saw in this current video.
@@lckgllm not the entire document, but rather the section of it that corresponds to the creation of that node/edge.
E.g.
*Graph response:* [Amy] interested_in [science history]
*Source text:* "Amy has a love for science history and a fascination for complexity studies"
If it was possible to store the source text as an attribute of each relationship and return that rather than the edge names then you'd probably get a higher quality answer.
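If it helps anyone trying this, here is a small sketch of the idea, assuming Neo4j and its Python driver; the connection details, labels, relationship type, and property name are placeholders, not the video's actual schema.

```python
# Sketch only: keep the originating sentence as a property on the relationship,
# then return it at query time instead of (or alongside) the edge name.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # Store the source sentence on the edge at graph-construction time.
    session.run(
        """
        MERGE (a:Person {name: $person})
        MERGE (t:Topic {name: $topic})
        MERGE (a)-[r:INTERESTED_IN]->(t)
        SET r.source_text = $source
        """,
        person="Amy",
        topic="science history",
        source="Amy has a love for science history and a fascination for complexity studies",
    )
    # At answer time, hand the stored sentence to the LLM, not just the edge name.
    result = session.run(
        "MATCH (:Person {name: $person})-[r:INTERESTED_IN]->(t) "
        "RETURN t.name, r.source_text",
        person="Amy",
    )
    for record in result:
        print(record["t.name"], "->", record["r.source_text"])
driver.close()
```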
Ohh!! I like this idea; yes, it would be more concrete and reliable. Thanks for sharing! Let me try to improve this feature and maybe make a video about it ;) Really appreciate your feedback, thanks so much ❤ @@AndrewNeeson-vp7ti
why do you switch rooms?
Because I was traveling between different locations, and this video actually took me like 5 days to film lol
Subbed for Mr. Beast counting 💗
You just did not mention the price of Diffbot.
Yep, because it's not a sponsored video (see the description). It's a tutorial video about a graph RAG application.
go to the presentation by Amy Hodler (and tell her I said hello)
I have tested a few critical documents to get some answers using standard RAG and, to be honest, didn't enjoy the performance so much.
Thanks for sharing your experience! Are you referring to standard RAG (purely vector-based) or graph-based? Purely vector-based RAG is not great, while graph-based RAG could return more reliable results. But I also have to be honest that prototypes are generally cute and are very far from production-ready. That's why we need a lot of testing/evaluations, and I'm currently gearing towards making videos containing production-oriented testing :)
Here's a video that I did some testing: th-cam.com/video/mHREErgLmi0/w-d-xo.html
Awesome.
Subscribed and following.
Great video, I like that, but please stop using large language models; for NER there are way better-performing and cheaper alternatives :)
Hey Daniel, what are the better-performing and cheaper alternatives for NER?
Good question! I want to know too :) @danielneu9136
@@lckgllm Use small open-source encoder models for NER, e.g. fine-tuned versions of ALBERT. Simply import the right model from the Hugging Face transformers library and let it label your dataset. They often perform better than LLMs on the tasks they are trained for, and most of them even work with the Colab free tier :)
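To make that suggestion concrete, here is a minimal sketch using the transformers NER pipeline; the model name is just a well-known example of a small fine-tuned encoder, not necessarily the one the commenter has in mind, and the sample sentence is made up.

```python
# Sketch only: a small fine-tuned encoder doing NER via the transformers pipeline.
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="dslim/bert-base-NER",    # example of a small fine-tuned NER model (assumption)
    aggregation_strategy="simple",  # merge word pieces into whole entity spans
)

text = "Amy Hodler presented a session on graph analytics at the Neo4j conference in London."
for entity in ner(text):
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))
```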
Content was good but found face filters visually distracting.
What face filters? I literally talked in front of my MacBook Pro 14. I did have makeup on which I admit.
@@lckgllm You're fine, do not worry about it. However, the audio sounds like it has a low bitrate.
Definitely going to get a mic for better voice quality. Thanks for the feedback, folks! @@Armoredcody
Cool! New sub from me 😊
Simply using OpenAI tools is not interesting
Ur cute
Not down with the closed-source stuff for local LLMs, but cool info regardless.
Thanks for the feedback and you actually just gave me a great idea on future videos!
Very nice content! support support 🇹🇼