I made a program last year that uses RAG to help me study for an exam in refrigeration. I had 4 textbooks that were each 500+ pages. Finding the right page for a topic in my course to learn a specific concept was a nightmare. the textbooks chapters were all over the place. I converted the books to embeddings and stored them in a database. I then would ask a question about my refrigeration concept that I wanted to learn, create an embedding for my query and using a comparison algorithm I would retrieve 10 or 15 of the most mathematically similar textbook pages. After retrieving the textbook pages, I fed the text from those pages and my query into an LLM and an answer would spit out for me. It was a great way to learn my niche subject of refrigeration and helped me pass my exam. asking the same question to the LLM alone without the retrieved textbook pages to assist in the context was not giving me reliable answers.
Did you do all of that on your own computer? Do you have a really high end graphics card? Also, that's really cool, but it also sounds like the RAG part was what fixed most of your problem, with the LLM just being there for some comfort (but sacrificing accuracy).
The presented example wasn't quite RAG. You're just putting more text into the context window. This method quickly falls short if you need to process a big set of reference data, like an entire PDF documentation. Real RAG is a bit more complicated and involves an additional step of converting the reference data to tokens that can be stored, then during inference you first convert the query to tokens, then find best matches with stored data, then use that search to generate excerpts from the original data to feed into your final inference window.
I see you point. Typically RAG uses an existing 'library' if you will of vectorized documents, and that's not what he did here. ... Or maybe he did? Under the hood, Google definitely uses some form of vectorized approached to web keyword relevance search. So in this case the vectorized library of documents is essentially Google's index. He performed a vectorized search across reference data tokens (albeit through a third party, Google). Does RAG need to include 1st party implementation of the reference data tokenization in order to be considered RAG? Probably not...
Vectorisation is not the point. The point is to provide context alongside the query for better results. Though I will point out that the tricky part in all of this is finding the relevant context and that is where the whole “matching the vectorised query with vectorised document store comes in.”
Yeah as lakshitdagar said it's still RAG, but he just did the "retrieval" manually to get the additional context haha. This is so much harder when you need to query for this information, which is where vectors are helpful, but don't always give you the results you need.
Out of all the people they have Mike is the best (IMO) it would be awesome to do a segment with him on how models like Stable Video Diffusion Image-to-Video work
I worked on a RAG to make product recommendations, but eventually I was supplying it with too much data as context and it wouldn't work. I settled on a neat solution: use GPT's ability to call functions and tell it something like, "when the user asks for a recommendation, call the get_recommendations function with a summary of the user's query". It's cool that it gave me a summary because the embeddings are much better than those of a whole sentence or paragraph. So I could take that embedding and look up products based on semantic similarity to the user's query, while it was still generating a response, and then pass the top 10 back to GPT for it to show the user
The problem with RAG and LLM's are the same. The risk is that the user takes what is said at face value. Where RAG really can improve the situation is if the source is provided. If you have a group of formal documents (such as documents for company procedure) then you should always state the source of that document. This not only improves the trust of the model, but also narrows down where the user needs to look. If it is just a black box, it can be hard for the user to know whether the RAG worked or whether it was hallucinating.
The issue here is that if you always have to check the sources ... the tool should just provide *only* the sources; Such as a conventional search engine would.
I had a first-hand experience with that a couple of months ago with a google search. I asked what the crew positions were on a B-29 bomber, and what Bing came back with were mostly sensible, but it had some weird ones. It listed "Radar Operator" and "Flight Surgeon" as two of the positions. Now perhaps some later B-29's may have had a radar, but flight surgeon is such a specialized medical profession that it's clearly ridiculous. Not to Bing, though, apparently.
It's surprisingly bare bones as an approach. I was expecting something more sophisticated than just sticking the context as part of a promt and literally telling the model to use it in the answer. Reminds me of "promp engineers" sticking a _"and please don't lie"_ at the end of a prompt to decrease hallucinations 😂
This is the most basic version of RAG that is shown here. Most common version of this also uses a vector database in order to actually find the correct, or most likely correct information (which allows for questions to be a lot more indirect than in our Wikipedia example where you grab a specific article). Then, over the last 6 months or so we have seen a rise in what is most commonly called "Tools" (sometimes called Function Modules or similar). Thats when you tell the LLM what kind of data (and actions) it can take, and let the LLM request data rather than just getting fed the data passively. To me, the main thing of this is that the next leaps in improvements for LLM's are unlikely to be about the LLM's themselves, but rather us getting better at feeding it the correct data at the correct time. Simply put, AI Data Orchestration.
RAG is not Prompt Engineer, it's Search and Retrievel, Indexing and NLP, the plugging in the prompt aspect is moreso the final output of the whole rag archi.
Even vector databases are not terribly complex. In my opinion as someone who has implemented RAG, companies scrambling to make money off the AI gold rush are playing up both how complicated it is and how effective it is. The technique itself is extremely straightforward, but the implementation is a bit tricky to get right. Marketing departments would have you believe it's a silver bullet, and it's just not.
I remember having a whole box of the green printer paper. A family friend worked at the state and gave it to me for drawing, etc. some of it had phone numbers and addresses. Long ago in the city dump now.
Mike, if when you're finished your career in academia and if you find yourself bored, please considering starting a TH-cam channel explaining literally anything vaguely related to computing!
More than a few people have been saying recently that yes, most of the LLM output for generic questions is pretty rubbish, but now imagine what happens when most of what they are trained upon is also LLM output. Almost certainly the quality of any results is going to get exponentially worse, no?
@@HelloThere-xs8ss Yes, or a negative feedback loop. Since over time it will use more and more of it's own degrading content. Hopefully there is some way to prevent that though, I would like it to get better not worse!
Expertise will still remain and exist at the level of first-hand experimentation with real-world data. So whatever it is you are trying to learn/model/understand/improve upon, at some point you MUST come up with scenario(s) in which you can test your understanding/models/theories against the physical world (aka REALITY) to validate all the pseudo-knowledge contained in the LLMs and vectorized libraries out there. These testing scenarios ideally should be rigorous enough to construct complete confusion matrixes that fully characterize the level of precision and recall in your understanding/models/theories.
Doesn't RAG make an LLM more susceptible to prompt injection hijacking. If I can get an LLM to grab data that I control that itself includes prompt injection attacks, then RAG is giving me a way to possibly bypass some of the prompt sanitation that is built into the LLM general interface. The hacker WunderWuzzi seems to leverage these edge cases in a lot of his recent AI security research.
I mean, yes? But also, if a malicious user is able to intercept the data gathering, they'd probably have an easier time just prompt engineering an attack anyway. You may be surprised at how easy it is to bypass prompt-level sanitization by just changing up the formatting away from what's expected...
It would be interesting to get a sense of how much the context helps. What -would- the answer have been without it? If I really did have context about things the model itself could not have learned, how does it do?
Given that LLMs hold knowledge in their weights, but are also taking in knowledge through RAG I wonder how those things interact if the sources of knowledge conflict... Specifically what happens if an LLMs learned knowledge falls behind compared to stuff like Wikipedia that's updated constantly? Or on the opposite end, can you poison a model using RAG by deliberately feeding in bad knowledge as part of that additional context...
Depends on the model. In my experience, the search - focused ones will favor the retrieved data, but chat-focused models are more likely to point out a conflict in what it knows, or simply say that "according to the data,...) and not mention anything that wasn't included in it until receiving a follow up prompt.
"ChatGPT is more than double the size." Yeah, much more; like 21 times the size 😂 Also, you have 33GB of VRAM on a laptop? I'm envious 😅 My *desktop* only has 8GB and it's limited my ability to play with local models.
There is no GPU (laptop or desktop) that has 33GB of VRAM. The max is 24 GB. Unless you want to spend $5,000 or more on a workstation card that can have 48GB. And that's not going in any laptop.
a little bit of trivia for you: Jane Street was where Sam Bankman-Fried and Caroline Ellison honed their quantitative trading 'skills', before leaving to run Sam's newly founded company, Alameda Research, which was a complete disaster.
9:22 Something about writing the initial prompt to the LLM in second person has always rubbed me the wrong way. Wouldn't it play a lot better to the strengths of an LLM to write a prompt like "below is a transcript of a conversation where a chatbot successfully answers a user's question" rather than a prompt like "You are an AI assistant who answers questions"? I understand that instruction-tuned models are tuned to handle these second-person prompts, but it seems like a weird stopgap.
The LLMs have been fine-tuned to act as 2nd person assistants, because this is a paradigm many people already understand. It is very natural and easy for users to treat the system as another person. It didn't (and doesn't) need to be this way, but if you know the LLM model you are working with has been trained in this fashion, it's best to lean into it.
@@ZandarKoad -Ok but this was using stable diffusion, right? Which doesn’t have an assistant part as a user-interface?- edit: thought this was from a different thread, nvm
RAG without mentioning Vector Embeddings, Cosine Similarity and Principal Component Analysis feels so incomplete. Ollama can be used directly by creating a new model based on Llama from the command line. Langchain is nothing more than smoke and mirrors hiding the important stuff...
@@little_cal_bear Embeddings are high dimensional. For fast searches you need an index. PCA reduces dimensions and can be used for creating indexes or clustering vectors.
Maybe this is not the right place to ask for this but what courses or materials do you recommend to learn more about Gen AI, in particular, creating applications such as Midjourney or any other Gen AI that generates an output, could be an image, music, etc.?
Hallucination would seem to be an endogenous issue of the various models. Painting broad brush and not capturing esoterica would seem to be an exogenous issue. One test of the latter would be whether a model can tell the difference between a Kangol beret from the film “Jackie Brown,” the red felt beret of NYC’s Guardian Angels, and Prince’s “Raspberry Beret.” If a model gets that wrong, it’s on the wrong side, potentially offensively on the wrong side, of long-standing real-world social issues. Ultimately, that and other seemingly subtle semiotic fails are catastrophic fails.
Yes. It's how my Synthia Nova framework works. Every piece of information is gathered in a separate prompt/thread, with info from the other responses interpolated into it. And a manually crafted RAG approach handles the entirety of its memory system (encouraging variety in imagined memories, detecting contradictions between memories, and recalling memories relevant to the current topic). I've never used LangChain, but I've heard of it. Learning that it seems to do basically the same thing with some syntactic sugar makes me think it's not as useful as the hype would suggest.
I wouldn't say that. I'd say it's a middle ground until LLMs have true continual learning. Every time you Google a question, check Wikipedia, or find answers in a book, you're doing the human equivalent of RAG. And none of those are considered band-aids, they're valid ways of making your information more accurate.
This seems like it's trying to solve a problem that doesn't exist. If you need to inject a source to get more accuracy, why not just present the source itself? Scrolling to find the part you want isn't hard. That's what the "find" function is for.
wait, it's just… chaining prompts with huge slabs of text from an external repository? with templates? that's it? that's the thing that they're doing? wow, that's… that's not a lot, conceptually.
@@gustafsundberg langchain is overly complicated and completely unnecessary. i was told to ditch it like the plague by many folks in the industry when i started my local ai project.
Yeah, langchain had it's day in the spotlight. But it's just too bloated. Just build what you need for your project. Nothing inside langchain is magic. It is an unnecessary dependency in most cases.
Lots of people say LLM's are reaching their limit. I don't think so. Techniques like these aid LLM's in their flaws. Ofcourse LLM's can't apply logic yet... but we can get very close. I like to stay up to date with these potential next-gen LLM technology.
The main concern with RAG is what Mike touches upon right at the start; The AI is being fed by some retrieval, but it's universally better for the user to just get those retrieval search results directly without the AI summary, because the summary can't be trusted; The AI might hallucinate. The RAG might have pulled up utter nonsense as a source. (e.g. spam websites)
I made a program last year that uses RAG to help me study for an exam in refrigeration. I had 4 textbooks that were each 500+ pages. Finding the right page for a topic in my course to learn a specific concept was a nightmare. the textbooks chapters were all over the place. I converted the books to embeddings and stored them in a database. I then would ask a question about my refrigeration concept that I wanted to learn, create an embedding for my query and using a comparison algorithm I would retrieve 10 or 15 of the most mathematically similar textbook pages. After retrieving the textbook pages, I fed the text from those pages and my query into an LLM and an answer would spit out for me. It was a great way to learn my niche subject of refrigeration and helped me pass my exam. asking the same question to the LLM alone without the retrieved textbook pages to assist in the context was not giving me reliable answers.
That’s cool thanks for sharing.
Are you sure you want to make refrigeration your career ? 😂 Impressive !
@@MrBoubource trades are not a bad career especially when tech is so saturated. Besides you can always switch later if the opportunity arises.
This could have been your phd project for you computer science degree ^_^
Did you do all of that on your own computer? Do you have a really high end graphics card?
Also, that's really cool, but it also sounds like the RAG part was what fixed most of your problem, with the LLM just being there for some comfort (but sacrificing accuracy).
The presented example wasn't quite RAG. You're just putting more text into the context window. This method quickly falls short if you need to process a big set of reference data, like an entire PDF documentation. Real RAG is a bit more complicated and involves an additional step of converting the reference data to tokens that can be stored, then during inference you first convert the query to tokens, then find best matches with stored data, then use that search to generate excerpts from the original data to feed into your final inference window.
I see you point. Typically RAG uses an existing 'library' if you will of vectorized documents, and that's not what he did here. ... Or maybe he did? Under the hood, Google definitely uses some form of vectorized approached to web keyword relevance search. So in this case the vectorized library of documents is essentially Google's index. He performed a vectorized search across reference data tokens (albeit through a third party, Google). Does RAG need to include 1st party implementation of the reference data tokenization in order to be considered RAG? Probably not...
Vectorisation is not the point. The point is to provide context alongside the query for better results.
Though I will point out that the tricky part in all of this is finding the relevant context and that is where the whole “matching the vectorised query with vectorised document store comes in.”
Yeah as lakshitdagar said it's still RAG, but he just did the "retrieval" manually to get the additional context haha. This is so much harder when you need to query for this information, which is where vectors are helpful, but don't always give you the results you need.
Out of all the people they have Mike is the best (IMO) it would be awesome to do a segment with him on how models like Stable Video Diffusion Image-to-Video work
The word "Strawberry" actually has two R's. I apologize for any confusion caused earlier. - Chat GPT
In 2025: "Let me search the internet first..." (internet: full of LLM generated slop) "...no I was right, it's two Rs"
Lol yea it always has trouble counting the letters in a word.
LLM's are also really bad at anagrams and crossword puzzles.
Sounds legit. Let's add that fact to Wikipedia.
Sorry for confusion but i suppose there is no 'r' in stawbey
I worked on a RAG to make product recommendations, but eventually I was supplying it with too much data as context and it wouldn't work.
I settled on a neat solution: use GPT's ability to call functions and tell it something like, "when the user asks for a recommendation, call the get_recommendations function with a summary of the user's query". It's cool that it gave me a summary because the embeddings are much better than those of a whole sentence or paragraph. So I could take that embedding and look up products based on semantic similarity to the user's query, while it was still generating a response, and then pass the top 10 back to GPT for it to show the user
The problem with RAG and LLM's are the same. The risk is that the user takes what is said at face value.
Where RAG really can improve the situation is if the source is provided.
If you have a group of formal documents (such as documents for company procedure) then you should always state the source of that document.
This not only improves the trust of the model, but also narrows down where the user needs to look.
If it is just a black box, it can be hard for the user to know whether the RAG worked or whether it was hallucinating.
The issue here is that if you always have to check the sources ... the tool should just provide *only* the sources; Such as a conventional search engine would.
I had a first-hand experience with that a couple of months ago with a google search. I asked what the crew positions were on a B-29 bomber, and what Bing came back with were mostly sensible, but it had some weird ones. It listed "Radar Operator" and "Flight Surgeon" as two of the positions. Now perhaps some later B-29's may have had a radar, but flight surgeon is such a specialized medical profession that it's clearly ridiculous. Not to Bing, though, apparently.
I personally view this as a feature, not a bug.
The problem might be with humans always taking what is said at face value.
Do you mean state the source document? Not "of that document"?
Feels illegal to be this early to Prof. Pound's lectures
Heh😢
Early access to pound town is an honor
Me like being criminal 🥷
Arriving early for Prof. Pounds lecture sounds like a title for a video on a different site
8:12 "Langchain does a lot of other stuff that I'm not using"...langchain in a nutshell
It's surprisingly bare bones as an approach. I was expecting something more sophisticated than just sticking the context as part of a promt and literally telling the model to use it in the answer. Reminds me of "promp engineers" sticking a _"and please don't lie"_ at the end of a prompt to decrease hallucinations 😂
This is the most basic version of RAG that is shown here. Most common version of this also uses a vector database in order to actually find the correct, or most likely correct information (which allows for questions to be a lot more indirect than in our Wikipedia example where you grab a specific article).
Then, over the last 6 months or so we have seen a rise in what is most commonly called "Tools" (sometimes called Function Modules or similar). Thats when you tell the LLM what kind of data (and actions) it can take, and let the LLM request data rather than just getting fed the data passively.
To me, the main thing of this is that the next leaps in improvements for LLM's are unlikely to be about the LLM's themselves, but rather us getting better at feeding it the correct data at the correct time. Simply put, AI Data Orchestration.
There are more elaborate versions that use vector embeddings, and actually that's what i was expecting
RAG is not Prompt Engineer, it's Search and Retrievel, Indexing and NLP, the plugging in the prompt aspect is moreso the final output of the whole rag archi.
@simon5007
Oh, I see. That's more like what I was expecting, and makes more sense.
Even vector databases are not terribly complex. In my opinion as someone who has implemented RAG, companies scrambling to make money off the AI gold rush are playing up both how complicated it is and how effective it is. The technique itself is extremely straightforward, but the implementation is a bit tricky to get right. Marketing departments would have you believe it's a silver bullet, and it's just not.
Now we need a video on fine tuning!
And adversarial tuning.
I remember having a whole box of the green printer paper. A family friend worked at the state and gave it to me for drawing, etc. some of it had phone numbers and addresses. Long ago in the city dump now.
Mike, if when you're finished your career in academia and if you find yourself bored, please considering starting a TH-cam channel explaining literally anything vaguely related to computing!
I'm a simple person.
I see Mike Pound, I click on the video.
More than a few people have been saying recently that yes, most of the LLM output for generic questions is pretty rubbish, but now imagine what happens when most of what they are trained upon is also LLM output. Almost certainly the quality of any results is going to get exponentially worse, no?
As more content becomes generated, LLMs are more often going to be using self provided information as training data.
I think that's called bias right?
@@HelloThere-xs8ss Yes, or a negative feedback loop. Since over time it will use more and more of it's own degrading content. Hopefully there is some way to prevent that though, I would like it to get better not worse!
Expertise will still remain and exist at the level of first-hand experimentation with real-world data. So whatever it is you are trying to learn/model/understand/improve upon, at some point you MUST come up with scenario(s) in which you can test your understanding/models/theories against the physical world (aka REALITY) to validate all the pseudo-knowledge contained in the LLMs and vectorized libraries out there. These testing scenarios ideally should be rigorous enough to construct complete confusion matrixes that fully characterize the level of precision and recall in your understanding/models/theories.
Funny. just had to do this in a hackathon last week :)
The fact this channel doesnt have daily uploads is sad af
Doesn't RAG make an LLM more susceptible to prompt injection hijacking. If I can get an LLM to grab data that I control that itself includes prompt injection attacks, then RAG is giving me a way to possibly bypass some of the prompt sanitation that is built into the LLM general interface. The hacker WunderWuzzi seems to leverage these edge cases in a lot of his recent AI security research.
I mean, yes? But also, if a malicious user is able to intercept the data gathering, they'd probably have an easier time just prompt engineering an attack anyway. You may be surprised at how easy it is to bypass prompt-level sanitization by just changing up the formatting away from what's expected...
I am JUST now studying this. This very uncanny computerphile
they're in your walls
It would be interesting to get a sense of how much the context helps. What -would- the answer have been without it? If I really did have context about things the model itself could not have learned, how does it do?
Given that LLMs hold knowledge in their weights, but are also taking in knowledge through RAG I wonder how those things interact if the sources of knowledge conflict... Specifically what happens if an LLMs learned knowledge falls behind compared to stuff like Wikipedia that's updated constantly? Or on the opposite end, can you poison a model using RAG by deliberately feeding in bad knowledge as part of that additional context...
Depends on the model. In my experience, the search - focused ones will favor the retrieved data, but chat-focused models are more likely to point out a conflict in what it knows, or simply say that "according to the data,...) and not mention anything that wasn't included in it until receiving a follow up prompt.
Always trust someone who writes on green fanfold paper
What's used for the animation at 6:57?
Jane street? OCAML MENTIONEDDD
Prof. Pound looks younger by the day
@4:19 is "inverted commas" the same in UK English as "quotes" in US English? I.e. that previous sentence used inverted commas twice
"ChatGPT is more than double the size." Yeah, much more; like 21 times the size 😂
Also, you have 33GB of VRAM on a laptop? I'm envious 😅 My *desktop* only has 8GB and it's limited my ability to play with local models.
He said it’s running on their servers I believe.
There is no GPU (laptop or desktop) that has 33GB of VRAM. The max is 24 GB.
Unless you want to spend $5,000 or more on a workstation card that can have 48GB. And that's not going in any laptop.
a little bit of trivia for you:
Jane Street was where Sam Bankman-Fried and Caroline Ellison honed their quantitative trading 'skills', before leaving to run Sam's newly founded company, Alameda Research, which was a complete disaster.
9:22 Something about writing the initial prompt to the LLM in second person has always rubbed me the wrong way. Wouldn't it play a lot better to the strengths of an LLM to write a prompt like "below is a transcript of a conversation where a chatbot successfully answers a user's question" rather than a prompt like "You are an AI assistant who answers questions"?
I understand that instruction-tuned models are tuned to handle these second-person prompts, but it seems like a weird stopgap.
Apple apparently uses a third person prompt.
The LLMs have been fine-tuned to act as 2nd person assistants, because this is a paradigm many people already understand. It is very natural and easy for users to treat the system as another person. It didn't (and doesn't) need to be this way, but if you know the LLM model you are working with has been trained in this fashion, it's best to lean into it.
@@ZandarKoad -Ok but this was using stable diffusion, right? Which doesn’t have an assistant part as a user-interface?- edit: thought this was from a different thread, nvm
RAG without mentioning Vector Embeddings, Cosine Similarity and Principal Component Analysis feels so incomplete.
Ollama can be used directly by creating a new model based on Llama from the command line.
Langchain is nothing more than smoke and mirrors hiding the important stuff...
How is principal component analysis relevant?
@@little_cal_bear Embeddings are high dimensional. For fast searches you need an index. PCA reduces dimensions and can be used for creating indexes or clustering vectors.
🎉@@froop2393
Maybe this is not the right place to ask for this but what courses or materials do you recommend to learn more about Gen AI, in particular, creating applications such as Midjourney or any other Gen AI that generates an output, could be an image, music, etc.?
If you try to learn from courses or classes, you'll always be at least 1-2 years behind current tech.
Can you do Retrieval Interleaved Generation
I saw a great AI talk about RAG last week.
Can you please make a video on llm agents?
What they talk about in this video is not a RAG
Completely not the main topic, but what font are you using in Sublime?
Jane street only cover expenses?
I believe they cover all expenses
@@bobbyshen7826 but no wage??
Sir , We need a series on diffusion model
Lots of love from india 🇮🇳🇮🇳
Hallucination would seem to be an endogenous issue of the various models.
Painting broad brush and not capturing esoterica would seem to be an exogenous issue.
One test of the latter would be whether a model can tell the difference between a Kangol beret from the film “Jackie Brown,” the red felt beret of NYC’s Guardian Angels, and Prince’s “Raspberry Beret.” If a model gets that wrong, it’s on the wrong side, potentially offensively on the wrong side, of long-standing real-world social issues. Ultimately, that and other seemingly subtle semiotic fails are catastrophic fails.
Are you a bot?
@@tomasg920beep boop.
Would the python code for this demo be available ?
please advise .
thanks
Michael Jackson and U2? Wonder if kids know who those are. Still room for improvement.
Room for improvement for the kids not knowing these names yes.
Is it possible to simply create this as a manual prompt instead?
You can do this with simple f-strings in python. I would actually recommend against using langchain, it creates unnecessary abstractions imo
Yes. It's how my Synthia Nova framework works. Every piece of information is gathered in a separate prompt/thread, with info from the other responses interpolated into it. And a manually crafted RAG approach handles the entirety of its memory system (encouraging variety in imagined memories, detecting contradictions between memories, and recalling memories relevant to the current topic).
I've never used LangChain, but I've heard of it. Learning that it seems to do basically the same thing with some syntactic sugar makes me think it's not as useful as the hype would suggest.
7 year old me watching this:😮
can it understand geographic data?
Great can you explain the diffusion transformer?
RAG is a band aid for the awfulness of LLMs
I wouldn't say that. I'd say it's a middle ground until LLMs have true continual learning.
Every time you Google a question, check Wikipedia, or find answers in a book, you're doing the human equivalent of RAG. And none of those are considered band-aids, they're valid ways of making your information more accurate.
3 videos posted here in the last 5 months--is everything ok?
Is he using a Dell xps13 laptop?
*15" looks like
13, Mike is much smaller than people think
Llama isn't open source. It's more like an academic license.
👍
This seems like it's trying to solve a problem that doesn't exist. If you need to inject a source to get more accuracy, why not just present the source itself? Scrolling to find the part you want isn't hard. That's what the "find" function is for.
*left shoulder tug*
wait, it's just… chaining prompts with huge slabs of text from an external repository? with templates? that's it? that's the thing that they're doing?
wow, that's… that's not a lot, conceptually.
GCM
Im not first, but im not such a lazy sod that I just clicked a recent emoji to win the game.. >_>
🤯
Does the Prof make the code available so we can learn from it? Your fan base is calling for the code. PS I am not interested in football.
Like LotusAI search engine! 🔎🔎
0 hour
That’s a g. 😂
stopped the video when he said langchain
You stopped to get a snack or you do not like langchain, if so can you elaborate?
Best regards.
@@gustafsundberg langchain is overly complicated and completely unnecessary. i was told to ditch it like the plague by many folks in the industry when i started my local ai project.
Yeah, langchain had it's day in the spotlight. But it's just too bloated. Just build what you need for your project. Nothing inside langchain is magic. It is an unnecessary dependency in most cases.
Lots of people say LLM's are reaching their limit. I don't think so.
Techniques like these aid LLM's in their flaws. Ofcourse LLM's can't apply logic yet... but we can get very close. I like to stay up to date with these potential next-gen LLM technology.
The main concern with RAG is what Mike touches upon right at the start; The AI is being fed by some retrieval, but it's universally better for the user to just get those retrieval search results directly without the AI summary, because the summary can't be trusted; The AI might hallucinate. The RAG might have pulled up utter nonsense as a source. (e.g. spam websites)
Early
Lot of half truths and assumptions in this explanation...
can you share the code used?