AI Summarize HUGE Documents Locally! (Langchain + Ollama + Python)
- Published Feb 8, 2025
- Today we are looking at a way to efficiently summarize huge PDF (or any other text) documents using a clustering method with HuggingFace embeddings, the LangChain Python framework, and the Ollama Llama 3.1 model.
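For reference, a minimal sketch of the pipeline described above. The file name, chunk sizes, and the specific BGE model name are illustrative assumptions, not the repo's exact code:

from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceBgeEmbeddings
from langchain_community.document_transformers import EmbeddingsClusteringFilter
from langchain_community.chat_models import ChatOllama

# Load the PDF and split it into chunks
docs = PyPDFLoader("huge_document.pdf").load()
texts = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)

# Embed the chunks, cluster them with k-means, and keep the chunk
# closest to each cluster center as a representative of that part of the document
embeddings = HuggingFaceBgeEmbeddings(model_name="BAAI/bge-small-en-v1.5")
clustering_filter = EmbeddingsClusteringFilter(embeddings=embeddings, num_clusters=10, num_closest=1)
representatives = clustering_filter.transform_documents(documents=texts)

# Summarize only the representative chunks with a local Llama 3.1 via Ollama
llm = ChatOllama(model="llama3.1")
context = "\n".join(doc.page_content for doc in representatives)
print(llm.invoke(f"Summarize these document excerpts:\n{context}").content)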
I can't tell you how grateful I am that you made this video. This saved me so much effort. I had been trying to solve this problem for the past 2 days. Thank you!!!!!!!!!!
One of the best videos I have ever seen. I just want to say thank you and good job.
Source code
github.com/debugverse/debugverse-youtube/tree/main/summarize_huge_documents_kmeans
Damn man, great video! Would you mind if I use this in my own project and make a video about it? Will sure give you credit for it!!!!
What if there are images of tables and equations in the document? What happens in that case?
Nice video - thanks for sharing that
thx
Thank you, very informative!
Very cool! Do you mind providing an example of how to filter the data like you mentioned in closing?
I looked at this. Basically, you use the results to provide your source pages, and then use that as the context. For example:
from langchain_community.document_transformers import EmbeddingsClusteringFilter

# Keep the 3 chunks closest to each of the 10 cluster centers
filter = EmbeddingsClusteringFilter(embeddings=embeddings, num_clusters=10, num_closest=3)
result = filter.transform_documents(documents=texts)

# Convert the selected pages into a single text blob by combining them
context = ""
for doc in result:
    context += f"{doc.page_content}\n"

prompt = f"Ask your question here... use the context within triple backticks ```{context}```"
response = llm.invoke(prompt)
print(response)
However... this is not a replacement for RAG, because remember that much of the document has been discarded, so you're unlikely to find your answer. k-means is basically just collating similar pages, not necessarily the one with the unique information you need. K-means is therefore great for summarisation, but not necessarily good for specific questions. So, if your specific question relates to something summary-like, it should be more relevant.
Maybe I've missed something here, but that's my conclusion from playing with it.
Why do you use the HuggingFaceBgeEmbeddings and not OllamaEmbeddings?
😎
Will this work for a procedurally generated file containing a conversation? Or should I look at another method?
Using gemini vision to describe the video?? Nice technique
Excellent, thank you! A very clever strategy for large documents. However, I am a little at a loss in my search for a good embedding model for texts in Spanish. I am not sure whether the BGE models are the best option for these. Can you suggest one that could be integrated seamlessly within your code?
Hi, for Spanish take a look at jinaai/jina-embeddings-v2-base-es. In your code, simply replace the model_name variable and everything should work.
@DebugVerseTutorials Thank you very much for your kind answer. I'll do that 😊🤗🤗
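For concreteness, the swap suggested above might look like this. The model_kwargs line is an assumption: jina models typically require trust_remote_code=True when loaded through sentence-transformers.

from langchain_community.embeddings import HuggingFaceBgeEmbeddings

# Same pipeline, Spanish-capable model: only the model name changes.
# trust_remote_code=True is usually needed for jina models.
embeddings = HuggingFaceBgeEmbeddings(
    model_name="jinaai/jina-embeddings-v2-base-es",
    model_kwargs={"trust_remote_code": True},
)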
@DebugVerseTutorials Hi, if I wanted to use an Ollama model, how can I know the exact name to put in model_name?
@igorcastilhos Run ollama list to see the models available and copy the name.
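If you go the Ollama route, a minimal sketch; the model name nomic-embed-text is an illustrative assumption, use whatever ollama list shows on your machine:

from langchain_community.embeddings import OllamaEmbeddings

# Use an embedding model served by a local Ollama instance;
# the name must match an entry from `ollama list`
embeddings = OllamaEmbeddings(model="nomic-embed-text")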
You can use the latest jina-embeddings-v3, as it is multilingual.
I think the latest vision models will make RAG obsolete
Hi, I am working on a company project. Can this help me extract the required data from a PDF?
I receive a monthly PDF that includes all our company clients' monthly statements. I need to extract the 'Brought Forward' and 'Realized Loss/Profit Amount' from the PDF, which is nearly a thousand pages long. I will need to perform this process monthly.
I have worked on a similar task with both vision LLMs and pdfminer, so I would recommend those tools.
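A rough sketch of the pdfminer route for the two fields mentioned above. The file name and regex patterns are assumptions and must be adapted to the actual statement layout:

import re
from pdfminer.high_level import extract_text

# Extract raw text from the statements PDF; for ~1000 pages this can
# be slow, so consider the page_numbers argument to process in batches
text = extract_text("monthly_statements.pdf")

# Hypothetical patterns for the two fields; adjust to the real layout
brought_forward = re.findall(r"Brought Forward[:\s]+([-\d,.]+)", text)
realized = re.findall(r"Realized Loss/Profit Amount[:\s]+([-\d,.]+)", text)

print(brought_forward)
print(realized)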