GPT PDF & Image Data Extraction (Power Automate)

Tyler Kolota

มุมมอง 15 264

เพิ่มลงใน
- เพลย์ลิสต์ของฉัน
- ดูภายหลัง
แชร์

แชร์

ฝัง

ขนาดวิดีโอ:

แสดงแผงควบคุมโปรแกรมเล่น

เล่นอัตโนมัติ

เล่นใหม่

เผยแพร่เมื่อ 17 ธ.ค. 2024

ความคิดเห็น • 62

@tylerkolota ปีที่แล้ว
A Version 2 is now available that is 2X faster & that uses 1/7th the action api calls.
This should make it even better for real-time scenarios like loading data to a Power App screen when a user uploads a document or processing many hundreds of documents a day.
@dmvogan ปีที่แล้ว ⁺¹
I'm not able to import your flows, I get an error. Can you briefly describe how you optimized them?
@tylerkolota ปีที่แล้ว
@@dmvogan If the standard import of the flow-only packages below do not work for you, you can also try importing the flows through a Power Apps solution package here: powerusers.microsoft.com/t5/Power-Automate-Cookbook/Extract-Data-From-PDFs-and-Images-With-GPT/m-p/2201670/highlight/true#M1637
@japanlove8249 2 หลายเดือนก่อน ⁺¹
Tyler thank you for the amazing video, this does exactly what I want!!!
@tylerkolota ปีที่แล้ว ⁺¹
A Microsoft Staff member just confirmed that the Create text with GPT action has been updated to use a 16k token model. So this template should now be able to work on 4x as many pages at once!
@ManouchehrNorouzi-gd5xg 9 หลายเดือนก่อน ⁺¹
Hi and very thanks to this flow and document. I need to extract information from scanned pdf files related to different kinds of contracts. I would like to ask for a suugestion on how can I improve this flow for this reason?
@tylerkolota 9 หลายเดือนก่อน
Hello,
You will mostly want to customize the prompt going to GPT to specify what you want to extract.
Now are you saying you have contracts with different formats but you want the same information from them, or different contracts you want different information from depending on their type of contract?
If it is the latter, & your use-case doesn't require faster processing speeds, then maybe you would want to split things out to two steps. One with a model, text parsing, or GPT prompt to categorize which type of contract it is, & then a switch action where depending on the contract type you send the text to different instances of the GPT action with a different prompt for each type of contract.
@ManouchehrNorouzi-gd5xg 9 หลายเดือนก่อน ⁺¹
I have different kinds of contracts, but I need to extract fixed information such as service type, service provider, service reciever, startdate and enddate of contract, and so on.
@tylerkolota 9 หลายเดือนก่อน
@@ManouchehrNorouzi-gd5xg Okay, then you should be able to adjust the example JSON fields in the prompt to extract those pieces of data.
@tylerbrooks17 7 หลายเดือนก่อน ⁺¹
Hi Tyler, great video and very helpful. What is the benefit with this flow using GPT vs AI Builder invoice reading? Also should a business be concerned with vendor data flowing through GPT services? Thanks!
@tylerkolota 7 หลายเดือนก่อน
GPT is much more flexible to different file formats/styles which is especially helpful when one may have files coming in from numerous sources like many different suppliers.
There are also times where tagging each thing in AI Builder may not be feasible if there are numerous possible instances of said things in the same file.
Also GPT prompts are much more customizable. They can interpret data & can transform the data during the extraction.
And here is the MS doc on data privacy of Azure GPT. The data is not shared or used for training or anything.
learn.microsoft.com/en-us/legal/cognitive-services/openai/data-privacy
@emilypierce5944 7 หลายเดือนก่อน
Also massively cheaper IIRC. And for SMBs like myself more "pay as you go" rather than the huge blocks of $500 for a million credits you have to do for AI builder.
@tylerkolota 7 หลายเดือนก่อน
Yeah pricing is also another thing.
If you have the 5000 AI builder credits per month from the $15/mo premium power automate license then you can process a decent number of pages without needing to upgrade to the $500/mo AI Builder package.
Also if you wanted to make things more pay as you go, you could set up an HTTP action to call an Azure instance of GPT instead of using an AI Builder prompt action. That would enable you to pay like $.001 per page for the GPT prompt & use the 5000 AI credits for just the OCR action.
I also may set something up to do all this with the GPT4 Vision model in Logic Apps for a true pay as you go set-up.
@monching6919 9 หลายเดือนก่อน ⁺¹
nice content really helpful I have a project that needs to extract info from contracts. I can use this one for automation but may I know if those consumes ai credits with this workflow?
another question since its using ocr can it transcribe hadwritten text like dates after a signature?
@tylerkolota 9 หลายเดือนก่อน
Yes & Yes.
It consumes AI credits both for the OCR & for the GPT prompt.
It also does capture handwritten text.
@swarnpriyaswarn 9 หลายเดือนก่อน ⁺¹
Hey..thanks for this amazing tutorial.
Just wanted to know how you make connection with OpExOptimization...after importing it to the power automate. I am kind of stuck over there....plz do help
@tylerkolota 9 หลายเดือนก่อน ⁺¹
Are you stuck on the import screen where you add connections or after the flow actions have loaded?
@itrmendoza 11 หลายเดือนก่อน ⁺¹
@tylerkolta, as a use case scenario, how would it handle a checkmark next to text? I have pdfs with a check marks Id like to pull into the flow.
@tylerkolota 11 หลายเดือนก่อน ⁺¹
I’m not sure it would pick up a checkmark.
It does often pick up hand written signatures though, so it may do better with any x in the checkbox.
@tylerkolota 6 หลายเดือนก่อน
There are now ways to do this with GPT4o
powerusers.microsoft.com/t5/Power-Automate-Cookbook/Extract-PDF-Data-With-GPT4o/m-p/2805514#M2882
@rachellim4147 6 หลายเดือนก่อน ⁺¹
Hi Tyler ,thank you for sharing this amazing video. I encountered an errot with the package version 1.7, which gives the error message: "The 'Create text with GPT' action doesn't have a content approval action after it." I did not add anything to the package. Could you please advise what might be going wrong?
@tylerkolota 6 หลายเดือนก่อน
Microsoft later added a requirement for an approval action after that preview GPT action.
Please use a more recent version where an approval action has been added after the GPT action with a static result or where the new Create text with prompt action is used.
@brandonvelasquez3530 ปีที่แล้ว ⁺¹
I am working one extracting data from medical insurance claims. 2 pages out of a potentially 20 page pdf might have info that i need. Can this still process that many pages in a file?
The other potentially 18 pages has a bunch of disclaimer stuff that is boiler plate and comes with every claim. If i stick that text file output inside the gpt action won't that go over the token limit for input?
@tylerkolota ปีที่แล้ว
Hello Brandon,
On newer versions of the flow I added some actions after the OCR read where you can set what page numbers you want it to process.
@brandonvelasquez3530 ปีที่แล้ว
@tylerkolota the thing is I don't ever know what page numbers I want it to process. Sometimes it might be 2 and 5, maybe another 3 and 9, maybe another 10 and 15. Sometimes, there might only be 5 pages total and some times 20 total. I tried using unstructured document extraction custom model and am leveraging the multi page table field, but that only works on consecutive pages and some times it's not always consecutive. Any thoughts?
@tylerkolota ปีที่แล้ว
@@brandonvelasquez3530 Well you can try hacking something together to cut out some of the material on the non-relevant pages. Otherwise if I were you, I might be waiting for GPT4Turbo to come out on Azure. Even if MS doesn't immediately include image/document support in Power Automate for it, it is still possible to set up an LLM service with it that you could call in a flow & pass the text to its much larger context window.
I already tested & set something like that up for GPT3.5Turbo incase MS started charging larger AI Builder credit fees for it.
@brandonvelasquez3530 ปีที่แล้ว
@@tylerkolota I only have 2 weeks left in this engagement with the customer. So I will just leverage what I have done so far and five this advice to them on how to make it better. There is not much I can do based on the situation this engagement finds itself in. I may try to build something like that on my own though. And do my best to make it a reusable solution because I can imagine this being a common issue companies find themselves looking to solve.
@tylerkolota ปีที่แล้ว
@@brandonvelasquez3530 I mean, is the information you’re looking for on these pages usually in a specific part of the page? Like top left/right or something?
Because there are ways to use Filter array action(s) to limit the extracted text outputs to just the text in a specific part of each page.
@rameshbabuc5981 9 หลายเดือนก่อน ⁺¹
Thanks Tyler, One quick query - Is it possible to read table rows content continuing from Page 1 to page 2. My use case is below
I need to extract information in tabular format from order confirmation pdfs received. Each pdf has multiple items and each item will have a Name, description, Vendor and delivery date.
So the table will have four columns: Name , description, Vendor, Delivery Date with each row representing an item.
The problem arises when some details for an item are present at the bottom of one page and the remaining details are on the next page. Example : Description in the table continuing in the page 2 from page 1 bottom , So unable to tag these rows which is continuing from page 1 to page 2.

For example: if this is the pdf
-----some text--------------------------------------------
-----some text---------------------------------------------
code: 1
description: this is first item
Vendor: XYZ1
delivery date: 12.01.2024

code: 102
description: this is second item
Vendor: XYZ2
delivery date: 13.01.2024

code: 103
description: this is third item

-------page 1 ends here---------

-------page 2 begins here--------
description(Continuing from Page): this is third item Continuing
Vendor: XYZ3
delivery date: 14.01.2024

code: 104
description: this is fourth item
Vendor: XYZ4
delivery date: 15.01.2024

code: 105
description: this is fifth item
Vendor: XYZ5
delivery date: 16.01.2024

---------some text here--------------------------------
------------------------------page 2 ends----------------------
------------------------------pdf ends----------------------------

The document cannot be tagged correctly using custom model when page 1 content - Description is continuing on Page 2 . For the above document, the tagged tables look like this
Code Description Vendor Delivery Date
101 this is first item XYZ1 11.01.2024
102 this is second item XYZ2 12.01.2024
103 this is third item XYZ3 13.01.2024

Code Description Vendor Delivery Date
Some text are
continuing
from page 1
104 this is fourth item XYZ4 14.01.2024
105 this is fifth item XYZ5 15.01.2024
@tylerkolota 9 หลายเดือนก่อน
This is a common use-case for this set-up because the GPT prompts generally do a better job determining that text before & after a page break belong to the same item.
Feel free to set it up & test it on your files.
@rameshbabuc5981 9 หลายเดือนก่อน
@@tylerkolota Thanks Tyler , i will look into GPT prompts , if you have such reference could you please provide more details on the GPT prompts.
@tylerkolota 9 หลายเดือนก่อน
@@rameshbabuc5981 Yes that would be this video that you are commenting on and its associated thread / download page where you can get the template: powerusers.microsoft.com/t5/Power-Automate-Cookbook/Extract-Data-From-PDFs-and-Images-With-GPT/td-p/2201345
@madhavilatha7881 4 หลายเดือนก่อน
Hi @tylerkolota , Thank you so much for this solution, it is so helpful. Here I am looking for sorting the text by top property along with the coordinates. My pdf documents are scanned tilted which is causing these not coming as expected, especially the tables. I appreciate your input on this.
@tylerkolota 4 หลายเดือนก่อน
@@madhavilatha7881 The template orders the text replica based on wherever the center of the text boxes are. I don’t have any further adjustments to help with a significantly tilted page on this template.
However if you want you can try using a different method with premium HTTP actions to call GPT4o Mini’s image/vision component to extract from documents community.powerplatform.com/galleries/gallery-posts/?postid=73cdb790-11c9-45b7-80d0-b991d1f43f34
@madhavilatha7881 4 หลายเดือนก่อน
@@tylerkolota Thank you for the input. I checked the above approach but I may not be able to go with this approach because of the premium actions and Azure functions logic.
@brentallard2087 ปีที่แล้ว ⁺¹
How would you configure it to work on many pages at once? I'm struggling with the SharePoint Connector to Get file Metadata and Get file Content to pass the File Content to the AI. Any help would be greatly appreciated. Great Template!
@tylerkolota ปีที่แล้ว
It automatically works on many pages if the PDF file content you pass it has many PDF pages. If you have multiple PDF files you want to work on at once, then you may need to combine them beforehand or maybe after a txt conversion on each.
What error are you getting?
@brentallard2087 ปีที่แล้ว ⁺¹
@@tylerkolota totally makes sense about the many pages in a PDF.
The error I am getting is - {"operationStatus":"Error","error":{"type":"Error","code":"InvalidPredictionInput","message":"Input prompt length cannot exceed 15788 characters or 4097 tokens. Please try again with a shorter prompt","properties":{"BackendErrorCode":"InvalidInferenceInput","DependencyHttpStatusCode":"400"},"innerErrors":[{"scope":"Generic","target":null,"code":"TooManyInputTokens","type":"Error","properties":{"maxCharacters":"15788","MlIssueCode":"TooManyInputTokens"}}]},"predictionId":null}
It would appear that the (SharePoint) Get file Content is not extracting the PDF content in the same way in which the (OneDrive) Get file Content to pass to the AI hence why it the error sees too many token.
@tylerkolota ปีที่แล้ว ⁺¹
Yes, it’s going over the token / character limit for prompts.
If you only need select pages for your workflow, I added a new version 1.8 that allows you to customize which page numbers go to the prompt.
@tylerkolota ปีที่แล้ว ⁺¹
@@brentallard2087 A Microsoft Staff member just confirmed that the Create text with GPT action has been updated to use a 16k token model. So this template should now be able to work on 4x as many pages at once!
@brentallard2087 ปีที่แล้ว ⁺¹
How cool is that! Thanks for the update.
@error.muskann ปีที่แล้ว ⁺¹
Hi, Is this able to extract QR code information from any document (pdf or something)?
@tylerkolota ปีที่แล้ว ⁺¹
This only pulls text data. It doesn’t copy QR codes.
@error.muskann ปีที่แล้ว
@@tylerkolota thanks for the quick reply
@saidajimenez2159 ปีที่แล้ว ⁺¹
Hello,
I am trying to create a flow so that when I receive CVs in my email, it automatically saves them in a share point folder.
Up to this point I have a clear flow, there is no major problem.
My problem comes when I want to extract the text found in the PDF of the CV, all the content is saved in a variable but I don't know how to send it back to a sharepoint list in this way to be able to make requests to gpt
Could someone tell me if they can think of how to do it?
@tylerkolota ปีที่แล้ว
Could you explain more about what the content is in the variable & what you mean by sending it to SharePoint?
Are you using this template & the content is the text output? Do you have a multiline text column in SharePoint to send it to?
And why are you trying to save the text to SP instead of going directly to the GPT action?
@MsKaryn ปีที่แล้ว ⁺¹
Is there a way to extract only specific images from a PDF (not text) and classify those images?
@tylerkolota ปีที่แล้ว
Hello Karyn,
This template is mainly for extracting text data from pages of a PDF, but if you want to just extract entire images from within a PDF, then there are 3rd party connectors for that & some AI Builder models can help classify them.
support.encodian.com/hc/en-gb/articles/360006998058-Extract-Images-from-PDF
@saidajimenez2159 ปีที่แล้ว ⁺¹
Hola podrias subir como podria ser para un cv?
@saidajimenez2159 ปีที่แล้ว ⁺¹
Hello, in my country the gpt chat function has not yet been implemented, therefore I have to make an HTTP request to GPT4 chat, I'll tell you.
I have to take a CV, send it to GPT chat and have it return me according to a list of jobs so that three jobs are qualified. That person must also return me first name, last name, address, training, experience and languages spoken of the person.
It returns a json with all this information within a message, therefore it returns a string array, and I have managed to separate all this message within the array but now I need to get the different values and I don't know how to do it
@tylerkolota ปีที่แล้ว
You could manually parse it with expressions or use a Parse JSON action th-cam.com/video/e0dzMqoJGtY/w-d-xo.htmlsi=9lONmcJMMdmH41RS
@antoniocgonzalez8013 ปีที่แล้ว ⁺¹
OMG this is awesome, you sir are a genius. Do you do freelance? how can I contact you?
@tylerkolota ปีที่แล้ว
Thank You!
You can reach out at takolota@gmail.com or on LinkedIn at www.linkedin.com/in/kolota?
@manifesttthat 10 หลายเดือนก่อน
This is some amazing stuff
@tylerkolota 10 หลายเดือนก่อน
Thank you!
It will probably soon be replaced by GPT 4 Turbo’s built in image & pdf functionality, but this did:
1. Get PDF reading capability to people ~1 year sooner.
2. Ensure we all have a way to use less expensive models on PDFs, especially if Microsoft tries to charge extra for it on GPT4.
@tylerkolota 9 หลายเดือนก่อน
Anyone concerned about the amount of pages they can feed the model may want to check a new template using GPT 4 Turbo & Retrieval Augmented Generation (RAG) to expand querying to just about any length document here: powerusers.microsoft.com/t5/Power-Automate-Cookbook/Query-Large-PDFs-With-GPT-RAG/td-p/2650178
@suryagvs9296 ปีที่แล้ว ⁺¹
Hi @tylerkolota9031
I did not find "create text with GPT" in my power automate action list , to get this should i need to activate any feature or prerequests for this.
i'm able to see "GPTPromtengineeringmodel" in predict action, in this action i have given some prompt but i'm getting below error,
{"operationStatus":"Error","error":{"type":"Error","code":"InvalidPredictionInput","message":"Parameters JSON string could not be properly deserialized","properties":{"BackendErrorCode":"InvalidInferenceInput","DependencyHttpStatusCode":"400"},"innerErrors":[{"scope":"Generic","target":null,"code":"InvalidModelParameters","type":"Error","properties":{"MlIssueCode":"InvalidModelParameters"}}]},"predictionId":null}, please guide me on this.
@tylerkolota ปีที่แล้ว
Hello,
The Create text with GPT action is not yet available in all regions.
However if you want to try setting something up before the general availability of the action, and if you are not dealing with sensitive data, then you can try requesting access & setting up an OpenAI API connection so you can send HTTP requests to GPT.
th-cam.com/video/zfbVNUFxVhw/w-d-xo.htmlsi=SfkP2gGts6WaQVIP
learn.microsoft.com/en-us/azure/ai-services/openai/chatgpt-quickstart
techcommunity.microsoft.com/t5/azure-ai-services-blog/working-with-gpt-4-and-chatgpt-models-on-azure-preview/ba-p/3773595
platform.openai.com/docs/guides/gpt
@suryagvs9296 ปีที่แล้ว ⁺¹
Thanks for quick reply
@rameshn1195 ปีที่แล้ว
@tylerkolota9031 I am facing error like this in AI biluder Predict action "{"operationStatus":"Error","error":{"type":"Error","code":"InvalidPredictionInput","message":"Parameters JSON string could not be properly deserialized","properties":{"BackendErrorCode":"InvalidInferenceInput","DependencyHttpStatusCode":"400"},"innerErrors":[{"scope":"Generic","target":null,"code":"InvalidModelParameters","type":"Error","properties":{"MlIssueCode":"InvalidModelParameters"}}]},"predictionId":null}" Could you please assist?

ต่อไป

เล่นอัตโนมัติ

3 Ways to Extract Data from PDFs with Microsoft - ChatGPT, AI Builder, Syntex