Waiting for this 🙏. Best video. Thank you so much. I'd humbly request you to also create a video on extracting data from complex HTML web pages for building multimodal RAG pipelines.
Thank you :) Sure. Could you send me an example HTML page if you have one?
How does it compare with the PDF library Marker?
I haven't tried that, so I can't compare them. Have you tried it?
Thank you @genieincodebottle for this valuable content. Could you please provide a sample resume focused on Generative AI (GenAI)?
Thanks 🙏. Will provide that. How many years of experience do you have?
Great video. Just out of curiosity, could we build a more reliable/efficient QnA application using NLP, perhaps by training on the same PDF that we want to use RAG for?
I think training a model for single-type PDF use cases is overkill. Though we can do that if the format and pattern of the data are similar across PDFs, like company-specific vocabulary that doesn't change over time.
@genieincodebottle what if different terminologies are used for the same type of words in the PDF(s)? In that case, many of the LLM's prompts would have to be changed, right?
See, fine-tuning basically tunes the behaviour of the model to your requirements, and the trained parameters capture statistical patterns. Similar meanings expressed in different terminology don't need different prompts.
Thanks Rajesh for the video. I was looking for something similar; it's very helpful 😊
I want to ask one question: can I use the same approach for invoice data extraction? In that scenario, do I still need to use a vector DB, or will directly using the invoice markdown data work? I have tried it for a few invoices and it worked, but I want to know the best practice I can use in prod 😊
Glad you liked it… You don't need a vector DB unless you are dealing with larger invoices and facing latency issues in a chatbot-type app. If your app requires quick responses but extracting content takes time, consider using a vector DB with RAG. With larger context windows and faster models, you can dump all the content into the model to get results. However, this approach might not be ideal for real-time requests, chatbots, or when token costs become too high due to frequent invoice processing. I suggest processing invoices in batches by dumping their contents into the LLM, retrieving the response, and storing it either in a vector DB or as CSV data in an RDBMS for future use.
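The batch flow described above (dump invoice markdown into the LLM, collect the structured response, persist it as CSV for an RDBMS) could be sketched roughly like this. Note this is only a minimal illustration: `call_llm` is a hypothetical stand-in for your actual model endpoint (e.g., an on-prem Llama deployment), and here it returns a canned JSON response so the sketch is self-contained, and the two extracted fields are assumed for the example.

```python
import csv
import io
import json

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for your real LLM call (e.g., an on-prem
    # Llama 3.3 70B endpoint). Returns canned JSON so the sketch runs.
    return json.dumps({"invoice_no": "INV-001", "total": "120.50"})

def process_invoices(invoice_markdowns):
    """Batch-process invoice markdown and return extracted rows (dicts)."""
    rows = []
    for md in invoice_markdowns:
        prompt = (
            "Extract invoice_no and total as a JSON object "
            "from this invoice:\n" + md
        )
        # One LLM call per invoice; parse the structured JSON reply.
        rows.append(json.loads(call_llm(prompt)))
    return rows

def rows_to_csv(rows):
    """Serialize extracted rows to CSV text for storage in an RDBMS."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["invoice_no", "total"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

In practice you would add retries, JSON-validation of the model output, and swap the CSV step for direct inserts into your database.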
@genieincodebottle I will be using an on-prem hosted LLM, Llama 3.3 70B, for this task.
Which tool did you use to create the flowchart?
It's Lucidchart.
Waiting for this
Thank you :) Recorded in a rush due to time constraints. Let me know if any other explanation is needed; I will try to address it in the next video.
Is it possible to fine-tune the Docling model to identify key sections of a scientific article, such as the abstract, introduction, references, and keywords? How can this be achieved?
Docling is not a model as such. It uses different PDF parsing libraries together with a few of its own models for OCR, so fine-tuning it directly may not be possible. You can check its internals. Alternatively, you can fine-tune any open multimodal or vision model, such as the Llama 3.2 11B/90B Vision models.
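Since Docling is a pipeline rather than a single fine-tunable model, a lighter alternative for the section-identification use case is to post-process the markdown it exports and split on headings. A minimal sketch, assuming the document's sections come out as markdown `#` headings (which is the typical shape of `export_to_markdown()` output, though real papers may need fuzzier matching):

```python
import re

def split_sections(markdown_text):
    """Split markdown exported from a PDF into {heading: body} sections.

    Assumes section titles appear as markdown '#' heading lines;
    heading names are lowercased to ease lookup of 'abstract',
    'introduction', 'references', etc.
    """
    sections = {}
    current = "preamble"  # text before the first heading
    buf = []
    for line in markdown_text.splitlines():
        m = re.match(r"#+\s+(.*)", line)
        if m:
            # Close out the previous section and start a new one.
            sections[current] = "\n".join(buf).strip()
            current = m.group(1).strip().lower()
            buf = []
        else:
            buf.append(line)
    sections[current] = "\n".join(buf).strip()
    return sections
```

For example, feeding it a paper's markdown would let you pull out `sections["abstract"]` or `sections["references"]` directly, with no model training involved.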