Multimodal RAG: Chat with PDFs (Images & Tables) [latest version]

  • Published 24 Nov 2024

Comments • 36

  • @ZevUhuru · 3 days ago

    Bro I literally came back to get your old video on PDFs and you already have an update. Thank You!

  • @algatra6942 · 12 days ago +4

    Idk, I just finally found the most understandable AI explanation content. Thank you Alejandro

    • @alejandro_ao · 11 days ago

      glad to hear this :)

    • @argl1995 · 11 days ago

      @alejandro_ao I want to create a multi-LLM chatbot for telecommunications. Is there a way to connect with you apart from YouTube so that I can share the problem statement with you?

  • @whouchekin · 12 days ago +2

    the best touch is when you add front-end
    good job

    • @alejandro_ao · 11 days ago +3

      hey! i'll add a ui for this in a coming tutorial 🤓

  • @onkie.ponkie · 12 days ago +2

    i was about to learn from the previous video. But you, brother, just bring more gold.

    • @alejandro_ao · 12 days ago

      you’re the best

  • @jaimeperezpazo · 6 days ago

    Excellent!!!! Thank you Alejandro

  • @muhammadadilnaeem · 12 days ago +1

    Amazing Tutorial

  • @SidewaysCat · 11 days ago +3

    Hey dude, what are you using to screen record? Mouse sizing and movement look super smooth. I'd like to create a similar style when giving tutorials

    • @alejandro_ao · 11 days ago

      hey there, that's the Screen Studio app for mac, developed by the awesome Adam Pietrasiak @pie6k. check it out :)

  • @ahmadsawal2956 · 12 hours ago

    thanks for the great content. how can we modify this to use a local LLM, e.g. Llama 3.2 and Llama 3.2 Vision via Ollama?

  • @olexiypukhov-KT · 12 days ago +2

    You should look into llamaparse rather than unstructured. The amount of content I've indexed into the vector db would take 15 days with unstructured, vs. with llamaparse it only takes a few hours. Plus, you can make the api calls async as well.

    • @alejandro_ao · 11 days ago +1

      i LOVE llamaparse. i'll make a video about it this month

    • @daniellopez8078 · 3 days ago

      Do you know if unstructured is open-source (meaning free)? Do you know any other free alternative to unstructured?

    • @olexiypukhov-KT · 3 days ago

      @daniellopez8078 Unstructured is free, but it's slow. It's the default with langchain. Llamaparse offers a free plan, which gives 1000 free pages to parse daily.

    • @olexiypukhov-KT · 2 days ago

      @daniellopez8078 Unstructured is free. They have an open-source version and a proprietary version. The proprietary version is paid and apparently offers better quality. The free unstructured is slow. Llamaparse is fast, and it gives you 1000 pages free per day.
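For readers weighing the two parsers, the LlamaParse path mentioned in this thread looks roughly like the sketch below. This is a hedged example, not code from the video: it assumes `pip install llama-parse` and a Llama Cloud API key exposed via the `LLAMA_CLOUD_API_KEY` environment variable.

```python
def parse_pdf_with_llamaparse(path: str):
    """Parse a PDF with LlamaParse instead of unstructured (sketch)."""
    # Deferred import: llama-parse is an optional, separately installed package.
    from llama_parse import LlamaParse

    # result_type="markdown" keeps tables readable in the parsed output;
    # the client picks up LLAMA_CLOUD_API_KEY from the environment.
    parser = LlamaParse(result_type="markdown")
    return parser.load_data(path)
```

The returned documents can then be chunked and indexed the same way as the unstructured output in the video.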

  • @ronnie333333 · 10 days ago

    Thank you for the video. Just curious, how do we go about persisting the multivector database? What data sources are available that cater to such requirements? Also, how do we go about getting an image as input from the user, so the language model can relate it to the documents and predict an answer?

  • @GowthamRaghavanR · 6 days ago

    Good one!! Did you see any open-source alternatives, like Marker?

  • @duanxn · 12 days ago

    Great tutorial, very detailed. Just one question: any options to link the text chunk that describes the image as context for the image, to create a more accurate summary of the image?

    • @alejandro_ao · 11 days ago

      beautiful question. totally. as you can see, the image is actually one of the `orig_elements` inside a `CompositeElement`. and the `CompositeElement` object has a property called `text`, which contains the raw text of the entire chunk. this means that instead of just extracting the image alone like i did here, you can extract the image alongside the text in its parent `CompositeElement` and send that along with the image when generating the summary. great idea 💪
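The extraction described in that reply can be sketched as a small helper. This is an illustrative assumption-laden sketch, not the video's exact code: it relies on the attribute names used by unstructured's chunking output (`chunk.metadata.orig_elements`, `chunk.text`, `element.metadata.image_base64`) and duck-types the element classes by name.

```python
def images_with_context(chunks):
    """For each image found inside a CompositeElement-style chunk, pair
    its base64 payload with the raw text of the whole parent chunk, so
    the summarizing LLM sees the surrounding prose as well."""
    pairs = []
    for chunk in chunks:
        # orig_elements holds the raw elements that were merged into this chunk
        for el in getattr(chunk.metadata, "orig_elements", None) or []:
            if "Image" in type(el).__name__:
                pairs.append({
                    "image_base64": el.metadata.image_base64,
                    "context_text": chunk.text,  # full text of the parent chunk
                })
    return pairs
```

Each returned pair can then be fed into the image-summary prompt, with `context_text` included alongside the base64 image.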

  • @Pman-i3c · 12 days ago +1

    Very nice. Is it possible to do this with a local LLM, like an Ollama model?

    • @alejandro_ao · 12 days ago +2

      Yes, absolutely. just use the langchain ollama integration and change the line of code where i use ChatOpenAI or ChatGroq. Be sure to select multimodal models when dealing with images though
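The swap described in that reply might look like the following sketch. Only the chat-model construction changes; the rest of the chain stays the same. The model names (`llama3.2`, `llama3.2-vision`) are assumptions here: use whatever you have pulled locally with `ollama pull`, and note that `langchain_ollama` needs a running Ollama server.

```python
def build_chat_model(use_local: bool, with_images: bool = False):
    """Return either a local Ollama chat model or a hosted OpenAI one."""
    if use_local:
        # pip install langchain-ollama
        from langchain_ollama import ChatOllama
        # prompts that include images need a vision-capable model
        model = "llama3.2-vision" if with_images else "llama3.2"
        return ChatOllama(model=model)
    # pip install langchain-openai
    from langchain_openai import ChatOpenAI
    return ChatOpenAI(model="gpt-4o-mini")
```

The returned object drops into the same LCEL chain used in the video, e.g. `prompt | build_chat_model(use_local=True) | StrOutputParser()`.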

  • @alexramos587 · 12 days ago +1

    Nice

  • @Diego_UG · 10 days ago

    What do you recommend for automating the conversion of a PDF of images (scanned text) to text? The problem is that traditional OCR does not always do the job well, but ChatGPT can handle difficult images.

  • @julianomoraisbarbosa · 11 days ago

    # til
    thanks for your video.
    is it possible to use crewAI in the same example?

  • @AkashL-y9q · 12 days ago +1

    Hi Bro, can you create a video for Multimodal RAG: Chat with video visuals and dialogues?

    • @alejandro_ao · 12 days ago +3

      this sounds cool! i’ll make a video about it!

    • @AkashL-y9q · 12 days ago

      Thanks @alejandro_ao

  • @blakchos · 4 days ago

    any idea how to install poppler, tesseract and libmagic on a Windows machine?
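One possible answer, offered as untested assumptions about Windows package names rather than instructions from the video: Chocolatey covers the first two, `python-magic-bin` ships libmagic DLLs for Python, and conda-forge can provide all three at once.

```shell
# From an elevated PowerShell, with Chocolatey installed:
choco install poppler tesseract

# libmagic has no reliable Chocolatey package; this pip wheel bundles the DLLs:
pip install python-magic-bin

# Alternative: get all three native libraries from conda-forge in one go:
conda install -c conda-forge poppler tesseract libmagic
```

After installing, make sure the poppler and tesseract `bin` directories are on `PATH` so unstructured can find them.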

  • @karansingh-ce8yy · 12 days ago

    what about mathematical equations?

    • @alejandro_ao · 11 days ago +3

      in this example, i embedded them with the rest of the text. if you want to process them separately, you can always extract them from the `CompositeElement` like i did here with the images. then you can maybe have an LLM explain the equation and vectorize that explanation (like we did with the descriptions of the images). in my case, i just put them with the rest of the text; i feel like that gives the LLM enough context.

    • @karansingh-ce8yy · 11 days ago +1

      @alejandro_ao thanks for the context, i was stuck on this for a week