A Helping Hand for LLMs (Retrieval Augmented Generation) - Computerphile

  • Published 21 Nov 2024

Comments • 149

  • @shutton
    @shutton 2 months ago +271

    I made a program last year that uses RAG to help me study for an exam in refrigeration. I had 4 textbooks that were each 500+ pages. Finding the right page for a topic in my course to learn a specific concept was a nightmare; the textbooks' chapters were all over the place. I converted the books to embeddings and stored them in a database. I would then ask a question about the refrigeration concept I wanted to learn, create an embedding for my query, and using a comparison algorithm retrieve the 10 or 15 most mathematically similar textbook pages. After retrieving the textbook pages, I fed the text from those pages and my query into an LLM, and an answer would be spat out for me. It was a great way to learn my niche subject of refrigeration and helped me pass my exam. Asking the same question to the LLM alone, without the retrieved textbook pages to assist in the context, was not giving me reliable answers.
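
    A minimal sketch of the pipeline described above, assuming the sentence-transformers and ollama Python packages and a locally pulled Llama model; the page list, model names, and prompt wording are illustrative stand-ins, not the commenter's actual code.

        # Hypothetical sketch: embed textbook pages, retrieve the most similar, ask an LLM.
        import numpy as np
        from sentence_transformers import SentenceTransformer
        import ollama

        embedder = SentenceTransformer("all-MiniLM-L6-v2")   # small local embedding model

        pages = ["...text of page 1...", "...text of page 2..."]  # one string per textbook page
        page_vecs = embedder.encode(pages, normalize_embeddings=True)

        query = "How does a thermostatic expansion valve regulate superheat?"
        q_vec = embedder.encode([query], normalize_embeddings=True)[0]

        # With normalized vectors, cosine similarity is just a dot product.
        scores = page_vecs @ q_vec
        top = np.argsort(scores)[::-1][:10]                  # the 10 most similar pages
        context = "\n\n".join(pages[i] for i in top)

        reply = ollama.chat(
            model="llama3",
            messages=[{"role": "user",
                       "content": f"Using this context:\n{context}\n\nAnswer: {query}"}],
        )
        print(reply["message"]["content"])  # .message.content on newer ollama-python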

    • @arcanernz
      @arcanernz 2 months ago +11

      That’s cool thanks for sharing.

    • @MrBoubource
      @MrBoubource 2 months ago +49

      Are you sure you want to make refrigeration your career? 😂 Impressive!

    • @arcanernz
      @arcanernz 2 months ago +11

      @@MrBoubource Trades are not a bad career, especially when tech is so saturated. Besides, you can always switch later if the opportunity arises.

    • @DamianReloaded
      @DamianReloaded 2 months ago +16

      This could have been your PhD project for your computer science degree ^_^

    • @Tomyb15
      @Tomyb15 2 months ago +2

      Did you do all of that on your own computer? Do you have a really high end graphics card?
      Also, that's really cool, but it also sounds like the RAG part was what fixed most of your problem, with the LLM just being there for some comfort (but sacrificing accuracy).

  • @mikoaj1321
    @mikoaj1321 2 months ago +73

    The presented example wasn't quite RAG; you're just putting more text into the context window. This method quickly falls short if you need to process a big set of reference data, like an entire PDF of documentation. Real RAG is a bit more complicated and involves an additional step of converting the reference data to embeddings that can be stored; then during inference you first embed the query, find the best matches among the stored data, and use that search to pull excerpts from the original data to feed into your final inference window.
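
    A sketch of that two-phase split, separating offline indexing from inference-time retrieval. The file names, chunking scheme, and model are hypothetical placeholders, not from the video.

        # Phase 1 (offline): chunk the reference data, embed it, persist the index.
        import numpy as np
        from sentence_transformers import SentenceTransformer

        embedder = SentenceTransformer("all-MiniLM-L6-v2")

        def chunk(text, size=1000, overlap=200):
            # Naive fixed-size character chunking with overlap.
            return [text[i:i + size] for i in range(0, len(text), size - overlap)]

        chunks = chunk(open("docs.txt").read())   # in practice, store the chunks too
        np.save("index_vecs.npy", embedder.encode(chunks, normalize_embeddings=True))

        # Phase 2 (inference): embed the query, match against stored vectors,
        # and pull the winning excerpts into the final prompt.
        vecs = np.load("index_vecs.npy")
        q = embedder.encode(["How do I configure the widget?"], normalize_embeddings=True)[0]
        best = np.argsort(vecs @ q)[::-1][:5]
        excerpts = [chunks[i] for i in best]      # goes into the final inference window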

    • @ZandarKoad
      @ZandarKoad 2 months ago +3

      I see your point. Typically RAG uses an existing 'library', if you will, of vectorized documents, and that's not what he did here. ... Or maybe he did? Under the hood, Google definitely uses some form of vectorized approach to web keyword relevance search. So in this case the vectorized library of documents is essentially Google's index. He performed a vectorized search across the reference data (albeit through a third party, Google). Does RAG need to include a first-party implementation of the reference data vectorization in order to be considered RAG? Probably not...

    • @lakshitdagar
      @lakshitdagar 1 month ago +1

      Vectorisation is not the point. The point is to provide context alongside the query for better results.
      Though I will point out that the tricky part in all of this is finding the relevant context, and that is where the whole "matching the vectorised query against the vectorised document store" part comes in.

    • @tomthetitan101
      @tomthetitan101 1 month ago +2

      Yeah, as lakshitdagar said, it's still RAG; he just did the "retrieval" manually to get the additional context haha. This is so much harder when you need to query for this information, which is where vectors are helpful, but they don't always give you the results you need.

  • @mscotty910
    @mscotty910 1 month ago

    Out of all the people they have, Mike is the best (IMO). It would be awesome to do a segment with him on how models like Stable Video Diffusion Image-to-Video work.

  • @dukestt5436
    @dukestt5436 2 months ago +89

    The word "Strawberry" actually has two R's. I apologize for any confusion caused earlier. - ChatGPT

    • @metacob
      @metacob 2 months ago +16

      In 2025: "Let me search the internet first..." (internet: full of LLM-generated slop) "...no, I was right, it's two Rs"

    • @christopherdessources
      @christopherdessources 2 months ago +5

      Lol yeah, it always has trouble counting the letters in a word.

    • @Yobleck
      @Yobleck 2 months ago +2

      LLMs are also really bad at anagrams and crossword puzzles.

    • @scbtripwire
      @scbtripwire 2 months ago

      Sounds legit. Let's add that fact to Wikipedia.

    • @braineaterzombie3981
      @braineaterzombie3981 2 months ago +3

      Sorry for confusion but I suppose there is no 'r' in stawbey

  • @alastairzotos
    @alastairzotos 2 months ago +18

    I worked on a RAG system to make product recommendations, but eventually I was supplying it with too much data as context and it wouldn't work.
    I settled on a neat solution: use GPT's ability to call functions and tell it something like, "when the user asks for a recommendation, call the get_recommendations function with a summary of the user's query". It's useful that it gave me a summary, because the embedding of a summary is much better than that of a whole sentence or paragraph. So I could take that embedding and look up products by semantic similarity to the user's query while it was still generating a response, and then pass the top 10 back to GPT for it to show the user.
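
    A rough sketch of that flow with the OpenAI Python SDK's tool-calling interface; the tool schema, model names, and the product index are guesses at the commenter's setup, not their actual code.

        # Hypothetical sketch: have the model summarize the request via a tool call,
        # embed the summary, then rank products by similarity.
        import json
        import numpy as np
        from openai import OpenAI

        client = OpenAI()

        tools = [{
            "type": "function",
            "function": {
                "name": "get_recommendations",
                "description": "Look up products matching a summary of the user's request.",
                "parameters": {
                    "type": "object",
                    "properties": {"summary": {"type": "string"}},
                    "required": ["summary"],
                },
            },
        }]

        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": "I need a quiet fan for a small bedroom"}],
            tools=tools,
            tool_choice={"type": "function", "function": {"name": "get_recommendations"}},
        )
        call = resp.choices[0].message.tool_calls[0]
        summary = json.loads(call.function.arguments)["summary"]

        # Embed the summary; product_vecs would be a normalized index built offline.
        emb = client.embeddings.create(model="text-embedding-3-small", input=summary)
        q = np.array(emb.data[0].embedding)
        product_vecs = np.load("product_vecs.npy")     # placeholder index file
        top10 = np.argsort(product_vecs @ q)[::-1][:10]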

  • @penfold-55
    @penfold-55 2 months ago +44

    The problem with RAG and LLMs is the same: the risk is that the user takes what is said at face value.
    Where RAG really can improve the situation is if the source is provided.
    If you have a group of formal documents (such as documents for company procedure) then you should always state the source of that document.
    This not only improves trust in the model, but also narrows down where the user needs to look.
    If it is just a black box, it can be hard for the user to know whether the RAG worked or whether it was hallucinating.
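
    A small sketch of that idea: tag each retrieved chunk with its source and instruct the model to cite it. The document names and prompt wording are invented for illustration.

        # Hypothetical sketch: attach source metadata so answers can cite documents.
        docs = [
            ("procedures/expenses.pdf, p. 3", "Claims must be filed within 30 days..."),
            ("procedures/travel.pdf, p. 12", "Business-class travel requires approval..."),
        ]

        context = "\n\n".join(f"[source: {src}]\n{text}" for src, text in docs)
        prompt = (
            "Answer using only the context below, and cite the [source: ...] tag "
            "for every claim so the reader can verify it.\n\n"
            f"{context}\n\nQuestion: How do I claim travel expenses?"
        )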

    • @SentientTurtle
      @SentientTurtle 2 months ago +7

      The issue here is that if you always have to check the sources... then the tool should just provide *only* the sources, as a conventional search engine would.

    • @robglenn4844
      @robglenn4844 2 months ago +4

      I had a first-hand experience with that a couple of months ago with a Google search. I asked what the crew positions were on a B-29 bomber, and what Bing came back with was mostly sensible, but it had some weird ones. It listed "Radar Operator" and "Flight Surgeon" as two of the positions. Now perhaps some later B-29s may have had radar, but flight surgeon is such a specialized medical profession that it's clearly ridiculous. Not to Bing, though, apparently.

    • @intptointp
      @intptointp 2 months ago

      I personally view this as a feature, not a bug.
      The problem might be with humans always taking what is said at face value.

    • @p-niddy
      @p-niddy 2 months ago

      Do you mean state the source document? Not "of that document"?

  • @KylerChin
    @KylerChin 2 months ago +76

    Feels illegal to be this early to Prof. Pound's lectures

    • @shiroyasha_007
      @shiroyasha_007 2 months ago +1

      Heh😢

    • @thejeffmorrison
      @thejeffmorrison 2 months ago +10

      Early access to pound town is an honor

    • @UnemployMan396-xd7ov
      @UnemployMan396-xd7ov 2 months ago +1

      Me like being criminal 🥷

    • @BlastinRope
      @BlastinRope 2 months ago +3

      Arriving early for Prof. Pound's lecture sounds like a title for a video on a different site

  • @mokopa
    @mokopa 2 months ago +14

    8:12 "Langchain does a lot of other stuff that I'm not using"...langchain in a nutshell

  • @Tomyb15
    @Tomyb15 2 months ago +28

    It's surprisingly bare-bones as an approach. I was expecting something more sophisticated than just sticking the context into the prompt and literally telling the model to use it in the answer. Reminds me of "prompt engineers" sticking a _"and please don't lie"_ at the end of a prompt to decrease hallucinations 😂

    • @simon5007
      @simon5007 2 months ago +15

      This is the most basic version of RAG that is shown here. The most common version also uses a vector database in order to actually find the correct, or most likely correct, information (which allows questions to be a lot more indirect than in our Wikipedia example, where you grab a specific article); a sketch of that variant follows below.
      Then, over the last 6 months or so, we have seen a rise in what are most commonly called "Tools" (sometimes called Function Modules or similar). That's when you tell the LLM what kind of data it can request (and actions it can take), and let the LLM ask for data rather than just getting fed the data passively.
      To me, the main takeaway is that the next leaps in improvement for LLMs are unlikely to be about the LLMs themselves, but rather about us getting better at feeding them the correct data at the correct time. Simply put, AI data orchestration.
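
      A minimal sketch of the vector-database variant mentioned above, here using the chromadb client with its default embedding function; the collection name, documents, and query are invented for illustration.

        # Hypothetical sketch: index documents in a vector DB, then retrieve by meaning.
        import chromadb

        client = chromadb.Client()                    # in-memory instance
        col = client.create_collection(name="articles")

        col.add(
            ids=["a1", "a2"],
            documents=[
                "Honey keeps almost indefinitely because of its low moisture content.",
                "The James Webb telescope observes mainly in the infrared spectrum.",
            ],
        )

        # The query is indirect; embedding similarity still finds the right document.
        hits = col.query(query_texts=["why doesn't honey go bad?"], n_results=1)
        context = hits["documents"][0][0]             # top match, ready for the prompt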

    • @peterdonnelly1074
      @peterdonnelly1074 2 months ago +2

      There are more elaborate versions that use vector embeddings, and actually that's what I was expecting

    • @stt.9433
      @stt.9433 2 months ago +1

      RAG is not prompt engineering; it's search and retrieval, indexing, and NLP. Plugging things into the prompt is more so the final output of the whole RAG architecture.

    • @Tomyb15
      @Tomyb15 2 months ago

      @simon5007
      Oh, I see. That's more like what I was expecting, and makes more sense.

    • @chandlercampbell5392
      @chandlercampbell5392 2 months ago +2

      Even vector databases are not terribly complex. In my opinion as someone who has implemented RAG, companies scrambling to make money off the AI gold rush are playing up both how complicated it is and how effective it is. The technique itself is extremely straightforward, but the implementation is a bit tricky to get right. Marketing departments would have you believe it's a silver bullet, and it's just not.

  • @amrelmohamady
    @amrelmohamady 2 months ago +21

    Now we need a video on fine tuning!

    • @TimHayward
      @TimHayward 2 months ago

      And adversarial tuning.

  • @i1abnrk
    @i1abnrk 2 months ago +1

    I remember having a whole box of the green printer paper. A family friend worked at the state and gave it to me for drawing, etc. Some of it had phone numbers and addresses. Long ago in the city dump now.

  • @neongensis1
    @neongensis1 2 months ago

    Mike, when you've finished your career in academia, if you find yourself bored, please consider starting a YouTube channel explaining literally anything vaguely related to computing!

  • @frankbucciantini388
    @frankbucciantini388 2 months ago +8

    I'm a simple person.
    I see Mike Pound, I click on the video.

  • @BytebroUK
    @BytebroUK 2 months ago +3

    More than a few people have been saying recently that yes, most of the LLM output for generic questions is pretty rubbish, but now imagine what happens when most of what they are trained upon is also LLM output. Almost certainly the quality of any results is going to get exponentially worse, no?

    • @jeffreycordova9082
      @jeffreycordova9082 2 months ago +3

      As more content becomes generated, LLMs are more often going to be using self provided information as training data.

    • @HelloThere-xs8ss
      @HelloThere-xs8ss 2 months ago +1

      I think that's called bias, right?

    • @jeffreycordova9082
      @jeffreycordova9082 2 months ago +2

      @@HelloThere-xs8ss Yes, or a feedback loop, since over time it will use more and more of its own degrading content. Hopefully there is some way to prevent that though; I would like it to get better, not worse!

    • @ZandarKoad
      @ZandarKoad 2 months ago

      Expertise will still remain at the level of first-hand experimentation with real-world data. So whatever it is you are trying to learn/model/understand/improve upon, at some point you MUST come up with scenarios in which you can test your understanding/models/theories against the physical world (aka REALITY) to validate all the pseudo-knowledge contained in the LLMs and vectorized libraries out there. These testing scenarios ideally should be rigorous enough to construct complete confusion matrices that fully characterize the level of precision and recall in your understanding/models/theories.

  • @garcipat
    @garcipat 2 months ago

    Funny, I just had to do this in a hackathon last week :)

  • @Xjaychax9
    @Xjaychax9 1 month ago

    The fact this channel doesn't have daily uploads is sad af

  • @jimjones7980
    @jimjones7980 2 months ago +2

    Doesn't RAG make an LLM more susceptible to prompt injection hijacking? If I can get the LLM to grab data that I control, and that data itself includes prompt injection attacks, then RAG is giving me a way to possibly bypass some of the prompt sanitization that is built into the LLM's general interface. The hacker WunderWuzzi seems to leverage these edge cases in a lot of his recent AI security research.

    • @IceMetalPunk
      @IceMetalPunk 2 months ago +1

      I mean, yes? But also, if a malicious user is able to intercept the data gathering, they'd probably have an easier time just prompt engineering an attack anyway. You may be surprised at how easy it is to bypass prompt-level sanitization by just changing up the formatting away from what's expected...

  • @thecompanioncube4211
    @thecompanioncube4211 2 months ago

    I am JUST now studying this. This is very uncanny, Computerphile

    • @stoppls1709
      @stoppls1709 1 month ago

      they're in your walls

  • @jameshiggins-thomas9617
    @jameshiggins-thomas9617 2 months ago

    It would be interesting to get a sense of how much the context helps. What -would- the answer have been without it? If I really did have context about things the model itself could not have learned, how does it do?

  • @Imperial_Squid
    @Imperial_Squid 2 months ago +4

    Given that LLMs hold knowledge in their weights but also take in knowledge through RAG, I wonder how those things interact if the sources of knowledge conflict... Specifically, what happens if an LLM's learned knowledge falls behind compared to stuff like Wikipedia that's updated constantly? Or on the opposite end, can you poison a model using RAG by deliberately feeding in bad knowledge as part of that additional context...

    • @AnnCatsanndra
      @AnnCatsanndra 2 months ago +1

      Depends on the model. In my experience, the search-focused ones will favor the retrieved data, but chat-focused models are more likely to point out a conflict with what they know, or simply say "according to the data, ..." and not mention anything that wasn't included in it until receiving a follow-up prompt.

  • @amcluesent
    @amcluesent 2 months ago +20

    Always trust someone who writes on green fanfold paper

  • @ianthehunter3532
    @ianthehunter3532 2 months ago +2

    What's used for the animation at 6:57?

  • @CheesyAceGameplay
    @CheesyAceGameplay 2 months ago +1

    Jane Street? OCAML MENTIONEDDD

  • @Abhishek.Rana.
    @Abhishek.Rana. 2 months ago

    Prof. Pound looks younger by the day

  • @ardenthebibliophile
    @ardenthebibliophile 2 months ago

    @4:19 is "inverted commas" the same in UK English as "quotes" in US English? I.e. that previous sentence used inverted commas twice

  • @IceMetalPunk
    @IceMetalPunk 2 months ago +2

    "ChatGPT is more than double the size." Yeah, much more; like 21 times the size 😂
    Also, you have 33GB of VRAM on a laptop? I'm envious 😅 My *desktop* only has 8GB and it's limited my ability to play with local models.

    • @fensoxx
      @fensoxx 2 months ago +1

      He said it's running on their servers, I believe.

    • @ZandarKoad
      @ZandarKoad 2 months ago

      There is no GPU (laptop or desktop) that has 33GB of VRAM. The max is 24 GB.
      Unless you want to spend $5,000 or more on a workstation card that can have 48GB. And that's not going in any laptop.

  • @richardconway6425
    @richardconway6425 2 months ago +1

    A little bit of trivia for you:
    Jane Street was where Sam Bankman-Fried and Caroline Ellison honed their quantitative trading 'skills' before leaving to run Sam's newly founded company, Alameda Research, which was a complete disaster.

  • @HeroOfHyla
    @HeroOfHyla 2 months ago +2

    9:22 Something about writing the initial prompt to the LLM in the second person has always rubbed me the wrong way. Wouldn't it play a lot better to the strengths of an LLM to write a prompt like "below is a transcript of a conversation where a chatbot successfully answers a user's question" rather than a prompt like "You are an AI assistant who answers questions"?
    I understand that instruction-tuned models are tuned to handle these second-person prompts, but it seems like a weird stopgap.
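
    A side-by-side sketch of the two framings being contrasted; both prompt strings are invented examples, not from the video.

        # Hypothetical sketch: second-person vs. transcript-style prompt framing.
        question = "What is retrieval augmented generation?"

        # Instruction-tuned framing: address the model directly.
        second_person = (
            "You are an AI assistant who answers questions accurately.\n"
            f"User: {question}\nAssistant:"
        )

        # Completion-style framing: describe a document for the model to continue.
        transcript_style = (
            "Below is a transcript of a conversation in which a chatbot "
            "successfully answers a user's question.\n"
            f"User: {question}\nChatbot:"
        )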

    • @drdca8263
      @drdca8263 2 months ago +1

      Apple apparently uses a third person prompt.

    • @ZandarKoad
      @ZandarKoad 2 months ago

      The LLMs have been fine-tuned to act as 2nd person assistants, because this is a paradigm many people already understand. It is very natural and easy for users to treat the system as another person. It didn't (and doesn't) need to be this way, but if you know the LLM model you are working with has been trained in this fashion, it's best to lean into it.

    • @drdca8263
      @drdca8263 2 months ago

      @@ZandarKoad -Ok but this was using stable diffusion, right? Which doesn’t have an assistant part as a user-interface?- edit: thought this was from a different thread, nvm

  • @froop2393
    @froop2393 2 months ago +14

    RAG without mentioning Vector Embeddings, Cosine Similarity and Principal Component Analysis feels so incomplete.
    Ollama can be used directly by creating a new model based on Llama from the command line.
    Langchain is nothing more than smoke and mirrors hiding the important stuff...
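
    For reference, a small sketch of the cosine-similarity piece mentioned above; the vectors are made-up toy examples.

        # Cosine similarity: dot product divided by the product of the norms.
        import numpy as np

        def cosine_similarity(a, b):
            return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

        query_vec = np.array([0.1, 0.8, 0.3])
        doc_vec = np.array([0.2, 0.7, 0.1])
        print(cosine_similarity(query_vec, doc_vec))  # close to 1.0 means similar direction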

    • @little_cal_bear
      @little_cal_bear 2 months ago +1

      How is principal component analysis relevant?

    • @froop2393
      @froop2393 2 months ago +4

      @@little_cal_bear Embeddings are high-dimensional. For fast searches you need an index. PCA reduces dimensions and can be used for creating indexes or clustering vectors.
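
      A quick sketch of that dimensionality reduction using scikit-learn's PCA; the array shapes are arbitrary examples.

        # Hypothetical sketch: project 384-dim embeddings down to 64 dims before indexing.
        import numpy as np
        from sklearn.decomposition import PCA

        embeddings = np.random.rand(10_000, 384)   # stand-in for real document embeddings

        pca = PCA(n_components=64)
        reduced = pca.fit_transform(embeddings)    # shape (10000, 64), cheaper to search

        # A new query must go through the same projection before comparison.
        query_reduced = pca.transform(np.random.rand(1, 384))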

    • @little_cal_bear
      @little_cal_bear 2 months ago

      🎉 @@froop2393

  • @americo9999
    @americo9999 2 months ago

    Maybe this is not the right place to ask, but what courses or materials do you recommend to learn more about generative AI? In particular, for creating applications such as Midjourney or any other generative AI that produces an output, which could be an image, music, etc.

    • @ZandarKoad
      @ZandarKoad 2 months ago

      If you try to learn from courses or classes, you'll always be at least 1-2 years behind current tech.

  • @riceandbeansandandoil
    @riceandbeansandandoil 1 month ago

    Can you do Retrieval Interleaved Generation?

  • @sjbuttonsb
    @sjbuttonsb 2 months ago

    I saw a great AI talk about RAG last week.

  • @combardus9309
    @combardus9309 2 months ago

    Can you please make a video on LLM agents?

  • @macoson
    @macoson 2 months ago

    What they talk about in this video is not RAG

  • @danielrhouck
    @danielrhouck 2 months ago

    Completely not the main topic, but what font are you using in Sublime?

  • @zigmundo
    @zigmundo 2 months ago +1

    Jane Street only covers expenses?

    • @bobbyshen7826
      @bobbyshen7826 2 months ago

      I believe they cover all expenses

    • @zigmundo
      @zigmundo 2 months ago

      @@bobbyshen7826 but no wage??

  • @vinayakjaiswal9976
    @vinayakjaiswal9976 2 months ago

    Sir, we need a series on diffusion models.
    Lots of love from India 🇮🇳🇮🇳

  • @damagejacked
    @damagejacked 2 months ago +1

    Hallucination would seem to be an endogenous issue of the various models.
    Painting with a broad brush and not capturing esoterica would seem to be an exogenous issue.
    One test of the latter would be whether a model can tell the difference between a Kangol beret from the film “Jackie Brown,” the red felt beret of NYC’s Guardian Angels, and Prince’s “Raspberry Beret.” If a model gets that wrong, it’s on the wrong side, potentially offensively on the wrong side, of long-standing real-world social issues. Ultimately, that and other seemingly subtle semiotic fails are catastrophic fails.

    • @tomasg920
      @tomasg920 2 months ago

      Are you a bot?

    • @damagejacked
      @damagejacked 2 months ago

      @@tomasg920 beep boop.

  • @lgl_137noname6
    @lgl_137noname6 2 months ago

    Would the Python code for this demo be available?
    Please advise.
    Thanks

  • @benway123
    @benway123 2 months ago +1

    Michael Jackson and U2? Wonder if kids know who those are. Still room for improvement.

    • @TheFulcrum2000
      @TheFulcrum2000 2 months ago +3

      Room for improvement for the kids not knowing these names, yes.

  • @carlborgen
    @carlborgen 2 months ago

    Is it possible to simply create this as a manual prompt instead?

    • @charabango
      @charabango 2 months ago +3

      You can do this with simple f-strings in Python; see the sketch below. I would actually recommend against using langchain, it creates unnecessary abstractions imo
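
      A tiny sketch of the f-string approach; the context and question are placeholders.

        # Hypothetical sketch: a manual RAG prompt with a plain f-string, no framework.
        context = "The library closes at 8pm on weekdays and 5pm on weekends."
        question = "When does the library close on Saturday?"

        prompt = (
            f"Use the following context to answer the question.\n\n"
            f"Context:\n{context}\n\n"
            f"Question: {question}\nAnswer:"
        )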

    • @IceMetalPunk
      @IceMetalPunk 2 months ago

      Yes. It's how my Synthia Nova framework works. Every piece of information is gathered in a separate prompt/thread, with info from the other responses interpolated into it. And a manually crafted RAG approach handles the entirety of its memory system (encouraging variety in imagined memories, detecting contradictions between memories, and recalling memories relevant to the current topic).
      I've never used LangChain, but I've heard of it. Learning that it seems to do basically the same thing with some syntactic sugar makes me think it's not as useful as the hype would suggest.

  • @plague9301
    @plague9301 2 months ago +1

    7 year old me watching this:😮

  • @dzhiurgis
    @dzhiurgis 2 months ago

    Can it understand geographic data?

  • @sahalshah2913
    @sahalshah2913 2 months ago +2

    Great, can you explain the diffusion transformer?

  • @VoltLover00
    @VoltLover00 2 months ago +3

    RAG is a band-aid for the awfulness of LLMs

    • @IceMetalPunk
      @IceMetalPunk 2 months ago +2

      I wouldn't say that. I'd say it's a middle ground until LLMs have true continual learning.
      Every time you Google a question, check Wikipedia, or find answers in a book, you're doing the human equivalent of RAG. And none of those are considered band-aids, they're valid ways of making your information more accurate.

  • @zzzaphod8507
    @zzzaphod8507 1 month ago

    3 videos posted here in the last 5 months--is everything ok?

  • @gz6616
    @gz6616 2 months ago

    Is he using a Dell XPS 13 laptop?

    • @salmiakki5638
      @salmiakki5638 2 months ago

      *15" looks like

    • @michaelwilson5742
      @michaelwilson5742 2 months ago

      13, Mike is much smaller than people think

  • @topperthehorse
    @topperthehorse 2 months ago +1

    Llama isn't open source. It's more like an academic license.

  • @jamiequernsify
    @jamiequernsify 2 months ago

    👍

  • @celari7965
    @celari7965 2 months ago +2

    This seems like it's trying to solve a problem that doesn't exist. If you need to inject a source to get more accuracy, why not just present the source itself? Scrolling to find the part you want isn't hard. That's what the "find" function is for.

  • @TaliZorahVasectory
    @TaliZorahVasectory 2 months ago

    *left shoulder tug*

  • @some_random_loser
    @some_random_loser 2 months ago +1

    wait, it's just… chaining prompts with huge slabs of text from an external repository? with templates? that's it? that's the thing that they're doing?
    wow, that's… that's not a lot, conceptually.

  • @User_36557
    @User_36557 2 months ago

    GCM

  • @MichaelOfRohan
    @MichaelOfRohan 2 months ago

    I'm not first, but I'm not such a lazy sod that I just clicked a recent emoji to win the game.. >_>

  • @BooleanDisorder
    @BooleanDisorder 2 months ago

    🤯

  • @Alchemetica
    @Alchemetica 2 months ago +2

    Does the Prof make the code available so we can learn from it? Your fan base is calling for the code. PS I am not interested in football.

  • @lotus-chain
    @lotus-chain 2 months ago

    Like LotusAI search engine! 🔎🔎

  • @griffinmelchior6872
    @griffinmelchior6872 2 months ago

    0 hour

  • @sabatoge17
    @sabatoge17 2 months ago

    That’s a g. 😂

  • @FusionC6
    @FusionC6 2 months ago +1

    stopped the video when he said langchain

    • @gustafsundberg
      @gustafsundberg 2 months ago

      Did you stop to get a snack, or do you not like langchain? If so, can you elaborate?
      Best regards.

    • @FusionC6
      @FusionC6 2 months ago +1

      @@gustafsundberg langchain is overly complicated and completely unnecessary. I was told to ditch it like the plague by many folks in the industry when I started my local AI project.

    • @ZandarKoad
      @ZandarKoad 2 months ago +2

      Yeah, langchain had its day in the spotlight, but it's just too bloated. Just build what you need for your project; nothing inside langchain is magic. It is an unnecessary dependency in most cases.

  • @greenstonegecko
    @greenstonegecko 2 months ago +1

    Lots of people say LLMs are reaching their limit. I don't think so.
    Techniques like these compensate for LLMs' flaws. Of course LLMs can't apply logic yet... but we can get very close. I like to stay up to date with this potential next-gen LLM technology.

    • @SentientTurtle
      @SentientTurtle 2 months ago +2

      The main concern with RAG is what Mike touches upon right at the start: the AI is being fed by some retrieval, but it's universally better for the user to just get those retrieval results directly, without the AI summary, because the summary can't be trusted. The AI might hallucinate, and the RAG might have pulled up utter nonsense as a source (e.g. spam websites).

  • @darthvader4899
    @darthvader4899 2 months ago

    Early

  • @anubisai
    @anubisai 2 months ago

    Lots of half-truths and assumptions in this explanation...

  • @JNET_Reloaded
    @JNET_Reloaded 2 months ago

    Can you share the code used?