Hey Matt, great content, thanks. But what if I want to do RAG with a SQL database (for structured data assistance, for example, stock market prices)? Let's say I already have a local SQL RDBMS and stored procedures whose results can be injected into the model in order to produce analytic results. Can an AI model execute those stored procedures and use the results as input?
Hi Matt. Could you consider a video where you take this local RAG script you've made here and redo it using Langchain, to demonstrate the process and whether you think the abstraction approach is efficient or helpful for 1) new coders and/or 2) experienced coders?
Hi Matt, I have one question. I am building a chatbot using langchain + chainlit + Ollama + ChromaDB. The only problem is I have JSON data. And when I search I don't get relevant results. In your opinion which embedding model would be best to achieve decent results? Thanks in Advance!
Any embed model should be fine. Nomic embed text is my favorite. My one recommendation is to get rid of langchain. For simple apps like that there is no value.
Hi Matt, further to the idea of chunks, with code being the data you put into the RAG: how would you think about the context of related functions? I'm thinking the retrieval could miss the important interdependence between functions.
Hello Matt, thank you for this great video. I tried to implement your solution but I am facing a "Connection refused" issue when using the ollama library. Are the embedding model and LLM dynamically downloaded from a website by your code, or should we download them ourselves before using it?
Say, we have a database like this, which includes medical criteria for different conditions, examples of cases etc. and we want to use it as a context for LLM. Now we provide a description of a new case and we prompt the model to compare the provided information with the database and suggest a proper diagnosis. Is RAG a good choice in this scenario? RAG + prompt engineering? No-RAG solution? What would be your suggestion?
Hi Matt, I've cloned the project repo for this video, and I'm trying to play along, but I'm running my Ollama service on a separate machine, and I can't figure out where/how I'd specify that in either the config file or the individual ollama.embeddings() and ollama.generate() invocations. Sorry if I've missed something obvious. I have zero experience with Python.
Great video, Matt. This is so cool. One small suggestion: at 6:00, could you please use syntax highlighting in your code? The all-white font makes it hard to follow which functions you're importing from 3rd-party libraries vs UDFs. I think a color scheme similar to what VS Code uses in its default theme would help readability. Thanks again for the excellent videos.
Are there enough folks talking about it? What's enough? Does it ever come up in any of my circles? A lot of stuff comes up that never does anymore... 8 months ago the topic came up a bit (not this msft version, but other previous implementations), and it looks like a slight improvement over previous graph RAG approaches from others over the last few months. Maybe if it progresses a bit further...
Thanks - very good explanations! Would there be any advantage to building an Ollama RAG application using Langchain or LlamaIndex?
I have a few doubts here: 1. The model always responds to a question, which means that if I ask something outside the vector database, the LLM will respond using the knowledge on which it was trained. Is there any way to handle this? 2. How do I identify a model suitable for RAG? I have tried multiple models: some are extremely slow, some are fast with low-quality output. I'm unable to find the right model for a large enterprise application. 3. Is RAG also good for document summarisation?
If you don't want the model to respond when nothing was found in the db, then don't ask the model if there are no results. Easy. Most models can respond well, but it's easy to get the chunk size wrong. Too big or too small will result in bad output. Document summarization isn't really something RAG can help with.
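A minimal sketch of that guard, with hypothetical search_db and ask_model callables standing in for your own retrieval and Ollama calls (the threshold is an arbitrary knob, not from the video):

```python
def answer(question, search_db, ask_model, threshold=0.5):
    """Only send the question to the model when retrieval found something relevant."""
    results = search_db(question)  # list of (chunk, score) pairs from your vector db
    relevant = [chunk for chunk, score in results if score >= threshold]
    if not relevant:
        # Nothing useful retrieved: skip the model entirely.
        return "I don't know - nothing relevant in the documents."
    context = "\n\n".join(relevant)
    return ask_model(f"Answer using only this context:\n{context}\n\nQuestion: {question}")
```

Tune the threshold against your own data; a good cutoff depends on the embedding model.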
Hello Matt, I know this video is already 3 months old, but I hope you can read and answer my comment. I'm building a software to extract information from documents, sometimes using OCR. What sets my software apart is how I use the extracted information. After I get the info in JSON format, I have an AI categorize each document based on a fixed list of categories. However, I often encounter grammatical errors in the text or incorrect category selections. I can manually correct these errors, but I want the model to "learn" from past mistakes and improve over time, mimicking user corrections. My idea is to create a database filled with mistakes and their corrections and somehow give the model access to this data for a pseudo-learning feature. I was considering using Retrieval-Augmented Generation (RAG) for this, though I have no experience with it. Given that the database will be constantly updated with new data, sometimes needing to override old data, and potentially facing other challenges, what approach would you recommend? I'm also open to entirely new approaches. I hope my English is clear enough to convey my message.
@@technovangelist Oh, you mean the source document file. I thought you meant a processed file ready to print. I see what you mean. It can also be latex or asciidoc.
After playing around with RAG I have several questions: * Which vector DB is the best option? * Multi-agent? CrewAI? * Which orchestrator is best: Langchain, LlamaIndex? * Which open source model is best? * What is the ideal workflow? Goal = reliable answers and reduced hallucinations.
Well, keep watching. For RAG, orchestrators add complexity without benefit. Which model is best depends on your needs, and only you can decide. Workflow, again, is all about you.
I think I watched you say you now use Obsidian. How about a video where you write some Python to ingest your Obsidian Vault to RAG for Ollama LLM access to the content?
Hi Matt, I have chromadb running in one terminal, and in another terminal I run: python3 import.py. However... Exception: {"error":"ValueError('Collection buildragwithpython does not exist.')"}
I added

try:
    chroma.delete_collection("buildragwithpython")
except Exception as e:
    print("An error occurred:", e)

in import.py and now I am seeing:

/import.py", line 23, in
    chunks = chunk_text_by_sentences(source_text=text, sentences_per_chunk=7, overlap=0)
Resource punkt not found. Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('punkt')
Doh, you can't delete something that doesn't exist, and it won't exist till you run the app, which you can't run till the thing exists, which it won't until you run it... fixed. Thanks for pointing that out.
Great work and explanation, sir. Thanks for sparing your valuable time and the code, but could you please add folders for documents and the DB in the code, where we can add our own files? Sorry, I am not a SW guy; I just copy open source code and try/run it. Thanks.
The RAG app in this video is designed to work with Ollama and ChromaDB. While the setup focuses on these tools, you can definitely integrate SQLite for training or data retrieval. You would need to write a script to extract data from the SQLite database and then pass it into ChromaDB for indexing.
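A rough sketch of that extraction step, using a hypothetical notes table; the resulting (id, text) pairs are what you would hand to ChromaDB for indexing:

```python
import sqlite3

def rows_to_documents(conn, query):
    """Pull (id, text) rows out of SQLite, ready to pass to a vector db."""
    return [(str(row_id), body) for row_id, body in conn.execute(query)]

# Demo with an in-memory database standing in for your real file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany("INSERT INTO notes (body) VALUES (?)", [("first note",), ("second note",)])
docs = rows_to_documents(conn, "SELECT id, body FROM notes")
# Each pair would then go to collection.add(ids=[...], documents=[...]) in ChromaDB.
```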
@@technovangelist Thanks. How do I update the list in the sourcedocs.txt file, please? I tried to just add a url and save it, but received a 500 Server Error when I ran import.py. Do you know how to fix this?
On your machine is ideal. But I have another video that shows one option called Brev.dev. See Unlocking The Power Of GPUs For Ollama Made Simple! th-cam.com/video/QRot1WtivqI/w-d-xo.html
Hi, great video, and I will try to use it for a project here at the Metropole de Nice. Source documents are .docx. A separate question: can you explain how to get copilot-like behaviour? Meaning: I ask ollama for "a summary of the three top news items from cnn and corriere.it, in Italian and in a radio-newscast style"; it performs a google search on the two websites, puts everything in the prompt (or maybe builds embeddings? not clear), and gives me the answer.
Docx is a bit better than pdf. Change the extension to zip and unzip it and you have a bunch of xml files and they are much better to pull text out of. Not sure what you mean by copilot behavior. I have used that in vscode but I am not a windows user.
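A rough sketch of that unzip-and-strip approach - a .docx is a zip, and the body text lives in word/document.xml. This builds a tiny stand-in file so it runs anywhere; real .docx files have more parts, so treat it as a starting point:

```python
import re
import zipfile

def docx_to_text(path):
    """Read word/document.xml out of the .docx zip and strip the XML tags."""
    with zipfile.ZipFile(path) as z:
        xml = z.read("word/document.xml").decode("utf-8")
    xml = xml.replace("</w:p>", "\n")  # Word closes paragraphs with </w:p>
    return re.sub(r"<[^>]+>", "", xml).strip()

# Build a minimal stand-in .docx so the function can be demonstrated.
with zipfile.ZipFile("demo.docx", "w") as z:
    z.writestr("word/document.xml",
               "<w:document><w:p><w:r><w:t>Hello from Nice</w:t></w:r></w:p></w:document>")

text = docx_to_text("demo.docx")
```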
Hey Matt - I know I'm just an internet stranger in an endless ocean of internet noise, but I just wanted to drop you a comment to let you know I've really enjoyed your videos. You have a casual approach to production, or maybe it could better be described as a thorough planning and preparation process that results in a casual vibe for the viewer, and I dig the nuggets of wisdom I've gleaned over the past few months I've been watching your content. I work in tech (professionally) myself, and I mostly recreate in tech as well. I have been all over the place with my inspiration and dabbling over the past year and change. Image and video generation, LLMs, open source architectures and paid platforms, basically whatever I stumble across that looks nifty. I've recently been seeking some new inspiration, and your videos have been a breath of fresh air to watch and get the gears turning for me as I consider what I could dig into for a next project. I'm not a "developer" either but I've had great fun with Python and Ollama, and you explain how to use these tools in a manner that is very approachable. Keep up the great work.
The most amazing video I have watched in over a year. Your style of educating users is amazing. Great videography and editing. My only wish is that your channel shoots the moon 🚀and you get rewarded from the YT algorithm and compensated.
It’s growing faster than I expected so all good. Thanks so much for the comment
PDFs are a nightmare. I started building a semantic search platform for PDFs back in 2017. It needed to process thousands of PDFs a day, all within a very short time frame of half an hour. Then I added some early AI to recognize entities and sentiment. Now I am tasked with adding LLMs to the mix. The result is a massive project with many moving parts; it's polyglot: Java, Python, Rust, JS. It uses old and new NLP and AI technologies. I hate it, though there is no other way to pull off what I need to do without all of these moving parts. It's also too expensive to run all of this in a cloud. I feel for anyone tasked with a large RAG project, and I cringe a bit when I see people trying to throw a bunch of PDFs into an LLM. Don't do it! Ollama has been very helpful as this system progresses. Thank you!
I work on a project to build a RAG from PDFs using PHP. We tested some tools separately and they work.
@@BelgranoK may I ask for the list of tools?
Wow, I can only imagine what things will be like 5 years in the future, thanks for all you do.
Yeah, I wonder if it will be a big change or incremental. A lot of the science behind what's going on started being researched 60-70 years ago, and the core concepts of language models and how they work are from the early 1990s. There was the transformers paper about 8 years ago, which was evolutionary rather than revolutionary, but it is the big change that got us here. So I could see it going both ways. Maybe, as things are moving fast, the mythical AGI is only a decade away, or maybe it's much further. Who knows. Exciting time to be making videos.
Tashi Delek! I figured it out from the picture hanging on your side wall. Great video. I've recently been using ollama at work and I am loving it.
Ahh. I picked that up on one of my trips to Nepal. My sister used to run a free healthcare clinic in a small town called Jiri east of Kathmandu.
I really like your clear, measured, logical presentation style. This is a great, informative video that will get anyone up and running with RAG and chroma db quickly, without getting bogged down in langchain, which does not seem necessary for this task and yet is often lazily used, along with openAI.
My questions would be:
(i) Why not use e.g. llama index? Is that not a more powerful approach, especially if one is smart/targeted in the way one constructs indices for particular subsets of the data?
(ii) Should one finetune the embeddings model for the use case? E.g. specific tuned embedding models for extracting particular information from annual reports: one model to retrieve corporate compensation/options data, another for segmental/divisional data, and another for analysing notes to accounts/accounting standards etc.
(iii) Pre-processing data, e.g. using pdfplumber to extract all tables from an annual report, locating and extracting each relevant section (management report, notes etc.), and then querying the relevant section for the information sought.
(iv) Agentic use of models, possibly also fine-tuned to the specific data retrieval tasks above. In particular, using several prompts asking the same question differently and passing the e.g. 3 responses back to another model for summarisation, to 'iterate' to the best response.
(v) Optimal use of different llms for different tasks. E.g. could one finetune tinydolphin and use that for targeted, faster information retrieval tasks, and then use e.g. mistral for the final combination of responses into a summary?
(vi) Basic reasoning applied to data sets. For example, I have my own custom in-house financial data set: say I want to compare the leverage between different competitors in the same subindustry; what model might be best to do that? Should I fine-tune the model with examples of my particular analyses and the conclusions that I would like to see? Or even, using multiple considerations, e.g. company valuation, 'quality' metrics, growth, competitive positioning, and scenario analysis, it should be possible to construct a simple, reasoned investment thesis.
Re: (i), I think that you recently did a vid on this. But I have seen a number of seemingly knowledgeable people saying the best approach is to just fine-tune BERT, as it uses encoding and is the best starting point. Apologies if that sounds confused: it probably is, I am new to this area.
Great stuff. Here’s the issue: Most of the data I want to review like contracts, zoning laws…etc are in PDFs. So, the RAG apps I want to build will be for getting data out of PDFs. So, anything you can do on that front would be great.
Your best bet is to find the source documents with the full text. PDFs are never the source. The amount of cleanup required to get good info out of a PDF will often take longer than finding the original. In some cases you may get lucky.
If you're on a Mac with homebrew, try installing the "ghostscript" package (brew install ghostscript). Similar on Linux, using whatever package manager is appropriate. Ghostscript provides the "ps2ascii" tool - just call it with the input PDF filename and an output (text) filename as arguments and it will perform the translation. If your PDF is mostly just text, the output is usually pretty good. If there are lots of "design elements" within the PDF - not so much. For your type of content, it may do pretty well. You can script this with zsh/bash to convert whole folders of PDF files to text quickly. Good luck.
It is unfortunate that you need to go through hoops like that. I hope to find a better way that doesn't require such a horrible approach.
I have a shell script called summarize_pdf which is

pdf2text $1 | ollama run mistral "summarize this in 2000 words or less"

pdf2text is a python program which is:

#!/usr/bin/env python3
import sys
from PyPDF2 import PdfReader  # Use `PdfReader` instead of `PdfFileReader` for more recent versions

def extract_text(pdf_path):
    with open(pdf_path, 'rb') as f:
        reader = PdfReader(f)
        num_pages = len(reader.pages)  # Get the total number of pages in the PDF document
        text = ""
        for page_num in range(num_pages):  # Iterate over each page
            page = reader.pages[page_num]
            text += page.extract_text() + "\n"  # Append this page's text, followed by a newline
    return num_pages, text

def main():
    if len(sys.argv) != 2:
        print("Usage: pdf2text <pdf filename>")
        return
    filename = sys.argv[1]  # Get the PDF filename from the command line arguments
    num_pages, text = extract_text(filename)  # Extract text from the file
    print("Total pages:", num_pages)
    print("Extracted Text:\n", text)  # Print out the extracted text

if __name__ == "__main__":
    main()

Not elegant but it gets the job done.
Unfortunately it uses pypdf which does a terrible job for most PDFs. Sometimes it works ok, but way too often the text is jumbled up. Since many PDFs are simply images, an OCR step is often needed. I think most who think pypdf works don't actually look at the resulting text.
That awkward silence at the end 😅. Thanks a lot for the insights 🎉🎉❤
You deserve a subscription just for this vid. 9:41 minutes well spent.
Great video! Any suggestions regarding which embeddings to use if my RAG app is to consume PDFs or any document in Spanish? I have tried nomic-text, fastembeddings and all-minilm for sentence transformers, but all of them fail to retrieve a good answer from chroma using search, similarity_search or similarity_search with relevance score. I have tried using only English-language PDFs and it works fairly OK.
Hi @Matt, can you update your repo? So that we can have a full working one? Some steps are missing.
Thanks.
I launch ChromaDB in a separate terminal within VS Code, then I run the import.py script in a different terminal. When I run the script, I receive an Errno 61 (connection refused); however, when I look at the logs of the localhost port 8000 ChromaDB server, I see multiple 200 API responses. Is there any troubleshooting as to why it would generate 200 responses while still erroring in the "for index, chunk in enumerate(chunks):" loop?
I am facing the same issue as of now. Have you managed to find a solution?
I am having the same issue. Were you able to find the fix? Thanks!
Thanks for your video; I think I understand the process of embedding.
Is there a way to use the embedded docs with an API call? I want to write a WinForms app in C#, and therefore an API call would come in handy.
The full api is documented in the docs: https://github.com/ollama/ollama
@@technovangelist Yes, I know, but there is only a small chapter for generating embeddings, but not on how to use them with the API.
You wouldn't use them directly. You can generate the embedding, but then you need to put it somewhere; that is often a vector db. The model can't do anything with the embedding itself. You use the embedding to do a similarity search and then use the source text in the model.
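A toy sketch of that flow, with hand-made vectors standing in for real model embeddings (a vector db just does this lookup at scale):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# Pretend store mapping embeddings to their source chunks.
store = [
    ([0.9, 0.1, 0.0], "Ollama runs models locally."),
    ([0.0, 0.2, 0.9], "Chroma stores embeddings."),
]

def retrieve(query_embedding):
    """Similarity search: return the chunk whose embedding is closest to the query."""
    return max(store, key=lambda item: cosine(item[0], query_embedding))[1]

best = retrieve([1.0, 0.0, 0.1])  # this chunk then goes into the model's prompt
```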
@@technovangelist Ok, my question was misleading: I now have your scripts running, and I have the documents in ChromaDB.
Is there a way to use the Ollama API to talk to my documents in the DB instead of using a python script to do so?
I wrote a small WinForms app in C# to talk to my models via the Ollama API, but I don't see a way to use the API to support this kind of chat with my documents in the DB.
Hello Matt, wouldn't chunking knowledge into logical units like paragraphs or chapters be better than chopping after a set number of sentences? You could use an LLM and instruct it to do the chopping more intelligently, or use NLP software for that. Did you consider this?
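For what it's worth, paragraph-level chunking doesn't need an LLM; splitting on blank lines gets most of the way there. A minimal sketch (the parameter names are made up):

```python
def chunk_by_paragraphs(text: str, paragraphs_per_chunk: int = 3, overlap: int = 1) -> list[str]:
    """Split text on blank lines and group paragraphs into overlapping chunks."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    step = max(paragraphs_per_chunk - overlap, 1)
    chunks = []
    for i in range(0, len(paragraphs), step):
        group = paragraphs[i:i + paragraphs_per_chunk]
        if group:
            chunks.append("\n\n".join(group))
        # Stop once the last paragraph has been included in a chunk.
        if i + paragraphs_per_chunk >= len(paragraphs):
            break
    return chunks
```

Chapter-level chunking would work the same way with a different split marker (e.g. headings).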
Would you use RAG for text categorization? For instance, would you create embeddings of all Twitter posts and then categorize each? And how would you retrieve the main categories of each day?
Yes, RAG can be a powerful tool for text categorization. For example, you could create embeddings of all Twitter posts using a model like Ollama or other embedding models. Then, you can store these embeddings in ChromaDB. To categorize, you could use similarity searches to cluster similar tweets into categories.
To retrieve the main categories of each day, you could perform a temporal query on your database (filtering by date) and then run clustering algorithms or use a pre-defined set of categories to match tweets for that day. Let me know if you'd like me to make a detailed video or write some sample code on this.
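A toy sketch of that nearest-category idea; the two-dimensional vectors below stand in for real embeddings, which would come from a model like nomic-embed-text:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def categorize(tweet_embedding: list[float], category_embeddings: dict) -> str:
    """Assign a tweet to the category whose embedding it is closest to."""
    return max(category_embeddings, key=lambda c: cosine(tweet_embedding, category_embeddings[c]))
```

With real embeddings you would embed one representative description per category and run each day's tweets through `categorize`, then count per category.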
Matt, you have very good content.
I first saw the web UI video and then came to this one. Question: so what you do here can be done via Open WebUI > Documents, correct?
Hey Matt. Love your work. This one took me a while to get up and running, but with just the right amount of cursing I was able to get it up. When I looked at the data in the database, there were a lot of blank lines in between the text. I assume it would be preferable to strip that out before chunking?
super useful and clear! Subscribed.
Hi Matt, how would you approach dealing with code as the data you want to put into the vector store? I am thinking that sentence chunks might be function chunks?
Hey, I have a deeply nested database which is very large. It can contain any level of nesting depending on user needs. I want to convert it to vector embeddings such that each nested part can be accessed with a query like "show me the users where the task assigned to the user is cleaning, the task is completed, and the user has a birthday after 03.04.1990." I am confused about how to convert it to a vector DB such that the main objects can be found. Every video I have seen on YouTube works with a single-level dataset, but what if we have a multilevel database?
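One common approach (not from the video, just a sketch): flatten each nested record into embeddable text plus flat metadata, let the vector search find candidates, and push exact constraints like dates into a metadata `where` filter (which Chroma supports). A minimal sketch:

```python
def flatten(record: dict, prefix: str = "") -> dict:
    """Flatten arbitrarily nested dicts into dotted key/value pairs."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat

def to_document(record: dict) -> tuple[str, dict]:
    """Turn a nested record into (text to embed, metadata for exact filtering)."""
    flat = flatten(record)
    text = "; ".join(f"{k} = {v}" for k, v in flat.items())
    return text, flat
```

The text goes through the embedding model; the flat metadata is stored alongside it so queries like "task.type == cleaning AND task.done == True" stay exact instead of fuzzy.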
Hey Matt, great content, thanks. But what if I want to do RAG with a SQL database (for structured data assistance, for example, stock market prices)? Let's say I already have a local SQL RDBMS and stored procedures whose results can be injected into the model in order to produce analytic results. Can an AI model execute those stored procedures and use the results as input?
The AI never executes any queries. But if the content is better suited to SQL, i.e. it isn't text and is already numerical, then great.
enjoying your videos and Ollama - looking forward to the TS version of this one!
Hi Matt. Could you consider a video where you take this local RAG script you've made here and redo it using LangChain, to demonstrate the process and whether you think the abstraction approach is efficient or helpful for 1) new coders and/or 2) experienced coders?
LangChain only complicates things, especially in such a simple app. I don't want to create videos about the wrong way to do something.
Hi Matt, I have one question. I am building a chatbot using LangChain + Chainlit + Ollama + ChromaDB. The only problem is I have JSON data, and when I search I don't get relevant results. In your opinion, which embedding model would be best to achieve decent results?
Thanks in Advance!
Any embed model should be fine. Nomic embed text is my favorite. My one recommendation is to get rid of langchain. For simple apps like that there is no value.
Please do a STT (Speech to Text) / TTS (Text to Speech) integration!
This dude is so underrated
Hi Matt, further to the idea of chunks and the use case of code as input to the RAG, how would you think about the context of related functions? I'm thinking that retrieval could miss the important interdependence of functions.
Yes that is interesting. I was purely looking at English. I’m not sure how to look at code for this
Hello Matt, thank you for this great video. I tried to implement your solution, but I am facing a "Connection refused" issue when using the ollama library. Are the embedding model and LLM dynamically downloaded from a website by your code, or should we download them ourselves before using it?
If you are getting some sort of error when running ollama pull, look into your network connection.
Say, we have a database like this, which includes medical criteria for different conditions, examples of cases etc. and we want to use it as a context for LLM. Now we provide a description of a new case and we prompt the model to compare the provided information with the database and suggest a proper diagnosis. Is RAG a good choice in this scenario? RAG + prompt engineering? No-RAG solution? What would be your suggestion?
I don’t know. Best way to find out is to try it.
One idea for a next video could be a guide on how to create a chatbot+RAG with Knowledge Graphs(Neo4J).
Yes please!
Link to the TypeScript version? Couldn't find it looking through the channel videos.
That's pure gold!
Hi Matt, I've cloned the project repo for this video, and I'm trying to play along, but I'm running my Ollama service on a separate machine, and I can't figure out where/how I'd specify that in either the config file or the individual ollama.embeddings() and ollama.generate() invocations. Sorry if I've missed something obvious. I have zero experience with Python.
Solved: I needed to create a "custom client". I should have RTFM for the Python SDK more carefully. Guess I glossed over that the first time.
Great video, Matt. This is so cool.
One small suggestion: at 6:00, could you please use syntax highlighting in your code? The all-white font makes it hard to follow which functions you're importing from 3rd-party libraries vs UDFs. I think a color scheme similar to what VS Code uses in its default theme would help readability.
Thanks again for the excellent videos.
very cool tutorial! are you planning to add graphrag, now that it's opensourced?
Who knows. Maybe if it gets any traction in the future
@@technovangelist would you mind sharing how you measure "traction on a specific topic"?
Are there enough folks talking about it? What's enough? Does it ever come up in any of my circles? A lot of stuff comes up that never does anymore... 8 months ago the topic came up a bit (not this MSFT version, but other previous implementations), and it looks like it's a slight improvement over previous graph RAG approaches from others over the last few months... maybe if it progresses a bit further...
Thanks - very good explanations! Would there be any advantage to building an Ollama RAG application using LangChain or LlamaIndex?
Not for this. A much more complicated app might benefit but I haven’t seen it.
I have a few doubts here:
1. The model always responds to a question, which means if I ask something outside the vector database, the LLM will respond using the knowledge it was trained on. Is there any way to handle this?
2. How do I identify a model suitable for RAG? I have tried multiple models; some are extremely slow, some are fast with low-quality output. I am unable to find the right model that can work for a large enterprise application.
3. Is RAG also good for document summarisation?
If you don't want the model to respond when nothing was found in the DB, then don't ask the model if there are no results. Easy. Most models can respond well, but it's easy to get the chunk size wrong: too big or too small will result in bad output. Document summarization isn't really something RAG can help with.
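That gating logic can be sketched like this; `ask_model` is passed in as a callable, so you could plug in `ollama.generate` or anything else:

```python
def gated_answer(results: list[str], question: str, ask_model, min_results: int = 1) -> str:
    """Only call the model when retrieval actually found something."""
    if len(results) < min_results:
        # Skip the model entirely: no context means no grounded answer.
        return "I couldn't find anything about that in the documents."
    context = "\n".join(results)
    return ask_model(f"Answer from this context only:\n{context}\n\nQuestion: {question}")
```

You could also raise `min_results` or add a similarity-score threshold if the DB returns weak matches.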
Hello Matt, I know this video is already 3 months old, but I hope you can read and answer my comment.
I'm building a software to extract information from documents, sometimes using OCR. What sets my software apart is how I use the extracted information. After I get the info in JSON format, I have an AI categorize each document based on a fixed list of categories. However, I often encounter grammatical errors in the text or incorrect category selections.
I can manually correct these errors, but I want the model to "learn" from past mistakes and improve over time, mimicking user corrections. My idea is to create a database filled with mistakes and their corrections and somehow give the model access to this data for a pseudo-learning feature.
I was considering using Retrieval-Augmented Generation (RAG) for this, though I have no experience with it. Given that the database will be constantly updated with new data, sometimes needing to override old data, and potentially facing other challenges, what approach would you recommend? I'm also open to entirely new approaches.
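One way to sketch the corrections-database idea: store each (mistake, correction) pair, retrieve the ones most similar to the new document (e.g. via a vector DB), and inject them into the categorization prompt as few-shot guidance. All names below are hypothetical, not from the video:

```python
def correction_examples(similar_corrections: list[dict]) -> str:
    """Format retrieved (mistake, correction) pairs as few-shot guidance."""
    lines = ["Past corrections made by the user:"]
    for item in similar_corrections:
        lines.append(
            f'- Model said "{item["mistake"]}" but the correct answer was "{item["correction"]}".'
        )
    return "\n".join(lines)

def categorize_prompt(document_text: str, categories: list[str],
                      similar_corrections: list[dict]) -> str:
    """Build a categorization prompt that learns from retrieved past mistakes."""
    return (
        correction_examples(similar_corrections)
        + f"\n\nCategories: {', '.join(categories)}"
        + f"\n\nDocument:\n{document_text}\n\nPick one category."
    )
```

Because each prompt is built fresh from whatever is in the corrections store, updating or overriding old corrections is just a database write, with no retraining.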
I hope my English is clear enough to convey my message.
Thank you, Matt.
What about HTML files? Strip the tags and use the text as is?
I've created a Python 3.11 venv, activated it, and installed the requirements, but it doesn't work. Too bad 😢
The goal isn’t to give you a working solution but to show you what to do.
@@technovangelist I understand but I was hoping to customize your code to develop my RAG
I suppose that you prefer EPUB to PDF for printable file format. Right?
Well ideally txt or md. Even a docx which is a zipped xml is better.
@@technovangelist Oh, you mean the source document file. I thought you meant a processed file ready to print. I see what you mean. It can also be latex or asciidoc.
Well any format where the text is accessible as is. PDF obfuscates it
After playing around with RAG I have several questions
* What Vector DB is the best option?
* Multi-Agent? CrewAI??
* What orchestrator is the best? LangChain, LlamaIndex?
* Which open source model is the best?
* What is ideal workflow?
Goal = reliable answers and reduce hallucinations
Well, keep watching. For RAG, orchestrators add complexity without benefit. Which model is best depends on what your needs are, and only you can decide. Workflow, again, is all about you.
the ending 😆💯
I think I watched you say you now use Obsidian. How about a video where you write some Python to ingest your Obsidian Vault to RAG for Ollama LLM access to the content?
Obsidian already has an ollama plugin. No need.
Hi Matt,
i have the chromadb running on 1 terminal, and on another terminal, I run:
python3 import.py
however ...
Exception: {"error":"ValueError('Collection buildragwithpython does not exist.')"}
I added
"""
try:
    chroma.delete_collection("buildragwithpython")
except Exception as e:
    print("An error occurred:", e)
"""
in import.py
and now I am seeing:
"""
/import.py", line 23, in
chunks = chunk_text_by_sentences(source_text=text, sentences_per_chunk=7, overlap=0 )
Resource punkt not found.
Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('punkt')
"""
# for nltk
/Applications/Python\ 3.12/Install\ Certificates.command
Doh, you can't delete something that doesn't exist, and it won't exist till you run the app, which you can't run till the thing exists, which it won't until you run it...
fixed. thanks for pointing that out
the certificates thing is weird. definitely didn't need that. I wonder if that’s a windows thing
Awesome content ❤
Thank you 🙌
Thanks for this video
Great work and explanation, sir. Thanks for sparing your valuable time and the code, but could you please add folders for documents and the DB in the code where we can add our own files? Sorry, I am not a SW guy; I just copy open source code and try/run it. THX.
Just add paths to the source docs file
Good explanation
Can it access SQLite databases for training?
The RAG app in this video is designed to work with Ollama and ChromaDB. While the setup focuses on these tools, you can definitely integrate SQLite for training or data retrieval. You would need to write a script to extract data from the SQLite database and then pass it into ChromaDB for indexing.
TypeScript 👌 please, and a PDF tutorial too 👌
The TypeScript version will be published on Monday, and then I'll look at PDF in the next few weeks.
@@technovangelist I really appreciate what you're doing man, I'm acting as an advocate to use JS/TS with AI and your videos help me a lot. Success!
@@technovangelist I put some Python code in the comments for a quick and dirty system.
Where is your Discord link, pls?
I don't have a Discord, but the Discord for Ollama is Discord.gg/Ollama
@@technovangelist Thanks. How do I update the list in the sourcedocs.txt file, please? I tried to just add a URL and save it, but received a 500 Server Error when I ran import.py. Do you know how to fix this?
If it’s a 500 it’s probably not a valid url to a real server
Plus it’s meant as a code sample so you can start building your own
You’re breathtaking 😘
Waiting for the typescript and pdf videos
🌟
So now the old .chm format of older digital books is getting its revenge.
Maybe if enough people rightly trash PDF, it will stop being a dominant format for document distribution? I can dream, can’t I?
I am just happy you know it pdf's suck...
Where should I host Ollama without spending millions of dollars? Hahaha
On your machine is ideal. But I have another video that shows one option called Brev.dev. See Unlocking The Power Of GPUs For Ollama Made Simple!
th-cam.com/video/QRot1WtivqI/w-d-xo.html
"useless tools like ... PyMuPDF"
Hard disagree.
Hi, great video, and I will try to use it for a project here at the Métropole de Nice. Source documents are .docx.
A separate question: can you explain how to get copilot-like behaviour?
Meaning: I ask Ollama for "a summary of the three top news stories from CNN and corriere.it, in Italian and in a radio-newscast style";
it performs a Google search on the two websites, puts everything in the prompt (or maybe builds embeddings? not clear), and gives me the answer.
Docx is a bit better than PDF. Change the extension to .zip and unzip it and you have a bunch of XML files, which are much better to pull text out of. Not sure what you mean by copilot behavior. I have used that in VS Code, but I am not a Windows user.