AI Summarize HUGE Documents Locally! (Langchain + Ollama + Python)
- Published Feb 8, 2025
- Today we are looking at a way to efficiently summarize huge PDF (or any other text) documents using a clustering method with HuggingFace embeddings, the LangChain Python framework, and the Ollama Llama 3.1 model.
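For reference, a minimal sketch of the pipeline described above. The file name, chunk sizes, and the specific BGE model name are illustrative assumptions, not the repo's exact code:

from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceBgeEmbeddings
from langchain_community.document_transformers import EmbeddingsClusteringFilter
from langchain_community.chat_models import ChatOllama

# Load the PDF and split it into chunks
docs = PyPDFLoader("huge_document.pdf").load()
texts = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)

# Embed the chunks, cluster them with k-means, and keep the chunk
# closest to each cluster center as a representative of that part of the document
embeddings = HuggingFaceBgeEmbeddings(model_name="BAAI/bge-small-en-v1.5")
clustering_filter = EmbeddingsClusteringFilter(embeddings=embeddings, num_clusters=10, num_closest=1)
representatives = clustering_filter.transform_documents(documents=texts)

# Summarize only the representative chunks with a local Llama 3.1 via Ollama
llm = ChatOllama(model="llama3.1")
context = "\n".join(doc.page_content for doc in representatives)
print(llm.invoke(f"Summarize these document excerpts:\n{context}").content)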
I can't tell you how grateful I am that you made this video. This saved me so much effort. I had been trying to solve this problem for the past 2 days. Thank you!!!!!!!!!!
One of the best videos I have ever seen. I just want to say thank you and good job.
Source code
github.com/debugverse/debugverse-youtube/tree/main/summarize_huge_documents_kmeans
Damn man, great video! Would you mind if I use this in my own project and make a video about it? Will sure give you credit for it!!!!
What if there are images of tables and equations in the document? What happens in that case?
Nice video - thanks for sharing that
thx
Thank you, very informative!
Very cool! Do you mind providing an example of how to filter the data like you mentioned in closing?
I looked at this. Basically, you use the results to provide your source pages, and then use that as the context. For example:
from langchain_community.document_transformers import EmbeddingsClusteringFilter

# Keep the 3 chunks closest to each of the 10 cluster centers
filter = EmbeddingsClusteringFilter(embeddings=embeddings, num_clusters=10, num_closest=3)
result = filter.transform_documents(documents=texts)

# Convert the selected pages into a single text blob by combining them
context = ""
for doc in result:
    context += f"{doc.page_content}\n"

prompt = f"Ask your question here... use the context within triple backticks ```{context}```"
response = llm.invoke(prompt)
print(response)
However... this is not a replacement for RAG, because remember that much of the document has been discarded, so you're unlikely to find your answer. k-means is basically just collating similar pages, not necessarily the one with the unique information you need. K-means is therefore great for summarisation, but not necessarily good for specific questions. So, if your specific question relates to something summary-like, it should be more relevant.
Maybe I've missed something here, but that's my conclusion from playing with it.
Why do you use the HuggingFaceBgeEmbeddings and not OllamaEmbeddings?
😎
Will this work for a procedurally generated file containing a conversation? Or should I look at another method?
Using gemini vision to describe the video?? Nice technique
Excellent, thank you! A very clever strategy for large documents. However, I am a little at a loss in my search for a good embedding model for texts in Spanish. I am not sure whether the BGE models are the best option for these. Can you suggest one that could be integrated seamlessly within your code?
Hi, for Spanish take a look at jinaai/jina-embeddings-v2-base-es. In your code, simply replace the model_name variable and everything should work.
@DebugVerseTutorials Thank you very much for your kind answer. I'll do that 😊🤗🤗
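For concreteness, the swap suggested above might look like this. The model_kwargs line is an assumption: jina models typically require trust_remote_code=True when loaded through sentence-transformers.

from langchain_community.embeddings import HuggingFaceBgeEmbeddings

# Same pipeline, Spanish-capable model: only the model name changes.
# trust_remote_code=True is usually needed for jina models.
embeddings = HuggingFaceBgeEmbeddings(
    model_name="jinaai/jina-embeddings-v2-base-es",
    model_kwargs={"trust_remote_code": True},
)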
@DebugVerseTutorials Hi, if I wanted to use an Ollama model, how can I know the exact name to put in model_name?
@igorcastilhos Run ollama list to see the models available and copy the name.
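If you go the Ollama route, a minimal sketch; the model name nomic-embed-text is an illustrative assumption, use whatever ollama list shows on your machine:

from langchain_community.embeddings import OllamaEmbeddings

# Use an embedding model served by a local Ollama instance;
# the name must match an entry from `ollama list`
embeddings = OllamaEmbeddings(model="nomic-embed-text")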
You can use the latest jina-embeddings-v3, as it is multilingual.
I think the latest vision models will make RAG obsolete
Hi, I am working on a company project. Can this help me extract the required data from a PDF?
I receive a monthly PDF that includes all our company clients' monthly statements. I need to extract the 'Brought Forward' and 'Realized Loss/Profit Amount' from the PDF, which is nearly a thousand pages long. I will need to perform this process monthly.
I have worked on a similar task with both vision LLMs and pdfminer, so I would recommend those tools.
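A rough sketch of the pdfminer route for the two fields mentioned above. The file name and regex patterns are assumptions and must be adapted to the actual statement layout:

import re
from pdfminer.high_level import extract_text

# Extract raw text from the statements PDF; for ~1000 pages this can
# be slow, so consider the page_numbers argument to process in batches
text = extract_text("monthly_statements.pdf")

# Hypothetical patterns for the two fields; adjust to the real layout
brought_forward = re.findall(r"Brought Forward[:\s]+([-\d,.]+)", text)
realized = re.findall(r"Realized Loss/Profit Amount[:\s]+([-\d,.]+)", text)

print(brought_forward)
print(realized)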