PDF Parsing with Scanned Images, Tables, Text with Docling, Claude 3.5, GPT 4, Llama 3.2

  • Published on Feb 9, 2025

Comments • 21

  • @bodasuman1851
    2 months ago +2

    I was waiting for this 🙏. Best video. Thank you so much. I humbly request that you also create a video on extracting data from complex HTML web pages for building multimodal RAG pipelines.

    • @genieincodebottle
      2 months ago

      Thank you :) Sure. Could you send me an example HTML page if you have one?

  • @reserseAI
    a month ago +2

    How does it compare with pdf marker?

    • @genieincodebottle
      27 days ago

      I haven't tried that, so I can't compare. Have you tried it?

  • @Srb0002
    2 months ago +2

    Thank you @genieincodebottle for this valuable content. Could you please provide a sample resume focused on Generative AI (GenAI)?

    • @genieincodebottle
      2 months ago

      Thanks 🙏 Will provide that. How many years of experience do you have?

  • @abdulkhadeer3586
    2 months ago +1

    Great video. Just out of curiosity, could we build a more reliable or efficient Q&A application using NLP, perhaps by training on the same PDF that we want to use RAG for?

    • @genieincodebottle
      2 months ago +1

      I think training a model for a single type of PDF use case is overkill. It can make sense, though, if the format and data patterns are similar across PDFs, for example company-specific vocabulary that doesn't change over time.

    • @abdulkhadeer3586
      2 months ago

      @genieincodebottle What if different terminology is used for the same kind of thing across the PDF(s)? In that case, many of the LLM prompts would have to be changed, right?

    • @genieincodebottle
      2 months ago

      Fine-tuning basically tunes the behaviour of the model to your requirements, and its trained parameters capture statistical patterns. Different terminology with a similar meaning doesn't need different prompts.

  • @bharatguptag1994bharat
    a month ago +1

    Thanks, Rajesh, for the video. I was looking for something similar, and it's very helpful 😊
    One question: can I use the same approach for invoice data extraction? In that scenario, do I still need a vector DB, or will using the invoice Markdown data directly work? I tried it on a few invoices and it worked, but I want to know the best practice for production 😊

    • @genieincodebottle
      a month ago +1

      Glad you liked it. You don't need a vector DB unless you are dealing with larger invoices and facing latency issues in a chatbot-type app. If your app requires quick responses but extracting content takes time, consider using a vector DB with RAG. With a larger context window and a faster model, you can pass all the content to the model directly. However, this approach might not be ideal for real-time requests, chatbots, or when token costs become too high due to frequent invoice processing. I suggest processing invoices in batches: pass their contents to the LLM, retrieve the response, and store it either in a vector DB or as CSV data in an RDBMS for future use.

    • @bharatguptag1994bharat
      a month ago

      @genieincodebottle I will be using an on-prem hosted LLM, Llama 3.3 70B, for this task.
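
The batch approach suggested in this thread (pass each invoice's Markdown to the LLM, collect the response, store it as CSV) could look roughly like the sketch below. It is only an illustration under stated assumptions: the field names in `FIELDS`, the prompt wording, and the `call_llm` callable are hypothetical placeholders, not something shown in the video. `call_llm` stands for whatever client you use, for example a wrapper around an on-prem Llama endpoint.

```python
import csv
import io
import json
from typing import Callable, Dict, List

# Hypothetical fields to pull from each invoice; adjust to your own schema.
FIELDS = ["invoice_number", "vendor", "total"]


def build_prompt(markdown: str) -> str:
    """Ask the model for one JSON object containing the wanted fields."""
    keys = ", ".join(FIELDS)
    return (
        "Extract these fields from the invoice below and reply with a single "
        f"JSON object using exactly these keys: {keys}.\n\n{markdown}"
    )


def extract_batch(
    invoices: Dict[str, str], call_llm: Callable[[str], str]
) -> List[Dict[str, str]]:
    """Run every invoice through the LLM and return one row per invoice.

    `invoices` maps a source name to its Markdown text.
    `call_llm` is an assumed callable: prompt string in, reply string out.
    """
    rows = []
    for name, markdown in invoices.items():
        reply = call_llm(build_prompt(markdown))
        try:
            data = json.loads(reply)
        except json.JSONDecodeError:
            data = {}  # keep the row, leave fields empty for later review
        if not isinstance(data, dict):
            data = {}
        row = {"source": name}
        for field in FIELDS:
            row[field] = str(data.get(field, ""))
        rows.append(row)
    return rows


def rows_to_csv(rows: List[Dict[str, str]]) -> str:
    """Serialize the rows as CSV text, ready to write to a file or load into a DB."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["source", *FIELDS])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Because the model call is injected, the same loop works with a hosted API or a local model, and a reply that is not valid JSON produces an empty row instead of stopping the whole batch.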

  • @VVs-z6h
    a month ago +1

    Which tool did you use to create the flowchart?

  • @msk9182
    2 months ago +1

    Waiting for this

    • @genieincodebottle
      2 months ago

      Thank you :) I recorded it in a rush due to time constraints. Let me know if any further explanation is needed. I will try to address it in the next video.

  • @mohammadkhalifeh6038
    2 months ago

    Is it possible to fine-tune the Docling model to identify key sections of a scientific article, such as the abstract, introduction, references, and keywords? How can this be achieved?

    • @genieincodebottle
      2 months ago +1

      Docling is not a model as such. It uses different PDF parsing libraries along with a few of its own models for OCR, so fine-tuning it may not be possible. You can check its internals. You can, however, fine-tune any open multimodal or vision model, such as Llama 3.2 11B/90B Vision.
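
For the section question in this thread, one option that needs no fine-tuning is to work on the parsed output. The sketch below is a simple heuristic, not Docling's own method: it assumes the article has already been exported to Markdown (for example with Docling's `export_to_markdown()`) and that section titles come out as `#` heading lines. The `WANTED` names and the `split_sections` helper are illustrative.

```python
import re
from typing import Dict, List

# Section titles to look for; compared case-insensitively after stripping numbering.
WANTED = ("abstract", "introduction", "keywords", "references")


def split_sections(markdown: str) -> Dict[str, str]:
    """Return the text under each wanted heading, keyed by lowercase title."""
    sections: Dict[str, List[str]] = {}
    current = None  # title of the wanted section being collected, if any
    for line in markdown.splitlines():
        heading = re.match(r"^#{1,6}\s+(.*)$", line)
        if heading:
            # Drop leading numbering such as "1." or "2.3 " before comparing.
            title = re.sub(r"^[\d.\s]+", "", heading.group(1)).strip().lower()
            current = title if title in WANTED else None
            if current is not None:
                sections.setdefault(current, [])
            continue
        if current is not None:
            sections[current].append(line)
    return {title: "\n".join(body).strip() for title, body in sections.items()}
```

Headings that do not match exactly (for instance "Related Work and Introduction") are skipped, so for messier papers you would likely need fuzzy matching or a vision model as suggested above.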