I have updated the workflow by adding the Train Classifier Scope. This allows you to train keyword-based classifiers in cases where they are unable to classify the document.
any video explanation?
thank you for this very informative video
Very Nice
Amazing explanation, congratulations!
Glad you liked it!
Great explanations! Thanks a million
Superb explanation, it is clear and clean explanation. Thank you.
Excellent video! Very detailed. It took the complexities out of document understanding. I learnt a lot from this video. Thanks heaps!
Great tutorial, thank you for posting
Excellent Explanation
Glad you liked it
Hey Anurag, awesome work you are putting in 😊👍💪 Kind regards, Anders
Thanks Anders! :)
oh my gosh. i hope you can upload more videos.
Amazing content. Looking forward to more videos which will help us.
Thanks
I am glad :)
Beautiful video, please make more such videos.
Hello, awesome video. Many thanks. I am from Brazil and this was really helpful. Looking forward to more videos.
Thanks for the feedback Mateus Lyra. Please comment if there is a specific video you are looking for.
Very informative!!
Glad it was helpful!
Thanks for the good job you're doing. Please how did you get the endpoint or is it general for everyone?
31:09 I don't see a "due date" on that invoice, yet you seem to have configured a custom area and edited that process out. Seems to me like a mistake.
Where do you get this endpoint when using the machine learning extractor? I don't understand that point, can you elaborate on it more?
Thank you
Hi, good day. Please I can't seem to download those packages you mentioned. Do you have any idea how I can work it out?
great video. I had a question, why does the message box pop up twice with the outputDT string?
Hello Swati!
The dataset that is created from the export extraction result is a collection of DataTables. This collection has two DataTables: *Simple Field* and *Simple Field Formatted*. This is the reason you are getting two message boxes. To check the names of the DataTables yourself, you can add a message box in the for each loop with "table.TableName".
@@botbotgo4902 Thanks for the response. I did implement the for each loop with "table.TableName" and saw formatted and unformatted output. But if I unchecked the ‘FormatValuesIfPossible’ option in data extraction scope, then there will be duplication of data. How can I get only one set of data here?
A question: how do I make it stop showing the percentage or the "Validation Station" screen? Every time it asks me to confirm the area at 96%, and it always picks it up correctly. I mean "Present Validation Station".
If you are that confident about the extraction confidence percentage, then there is no need to use the Present Validation Station activity in the flow. You can check the export result directly in Excel.
Hey will u do a video on regex based extraction
Could you share the whole project? I want to study it carefully.
Anurag, can you please tell me how to extract line items from the invoice along with these details. I want to write it to excel preferably for a case when each document might have different number of line items
Hi Prema, You can go with Form based extractor in order to extract the line items from table.
Thanks for a great informative video. Just had a question in mind. If we have to define a keyword.json file for document classification then what is the use of taxonomy.json?
The taxonomy identifies the fields that need to be extracted; the bot then extracts those fields using Intelligent OCR.
Great video. What is not apparently clear for me is the documentPath variable you specified in the digitize document activity. I do not think you showed how you set that up, though I assume it is a variable that has the path of the file, correct ? If yes, alternatively we could also specify the file path directly in the Document path without creating the variable documentPath ? Thank you
Yes, your understanding is correct. Either we can specify the path directly in the Document path field, or we can create a variable for it and pass that in.
Hi Bro, I am not able to select 5 pieces of information on Page 1. I am only able to select one. Are you using the Shift or Ctrl key to select 5? I am working on a 2-page PDF. Please suggest. I am waiting.
What if we have lots of PDFs, each with lots of pages? Can it still extract specific data?
I made a process based on your video, but I got an error in one place:
Data Extraction Scope: Index was outside the bounds of the array.
And I can't fix it.
Can you help me?
when defining the keywords, make sure that you typed correctly "invoice" , "receipt", "walmart"
Hi, this is a good video. Actually I have a question related to the Intelligent OCR activities license. Are the activities free or must we pay for licenses? Thank you
Hello @botBotGo,
That was a great explanation. Currently I am able to extract a single page with specific extraction fields, so how do I loop through all pages in a PDF file with similar invoices?
If you are using the Community version then you can only process documents with a maximum of 2 pages at once.
One workaround would be to use some UiPath PDF activities to break your single PDF file into multiple PDF files and then loop through them.
Can you do it with handwritten documents? It would be helpful for everyone. Thank you
Hello Sushant!
For handwritten documents with fixed formats (for example, a bank account opening form), you can use the Intelligent Form Extractor.
@@botbotgo4902 Thank you
@@botbotgo4902 is intelligent form extractor the extractor we used in this video? thanks a lot
Great video, thanks. One question: if you are satisfied with the results, can you remove the "Present Validation Station" command so it does not prompt the user everytime ? I have dozens of invoices to be processed in an unattended machine.
yes you can remove the present validation station in order to skip the human in loop.
Hey Anurag. Thanks for the video.
How to perform this on multiple pdf at time?
You can use a For Each loop and provide the path of the folder that contains the PDF files.
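To make the folder loop concrete, here is a minimal Python sketch of the same idea. It is not UiPath code: `list_pdfs`, `process_folder` and the `process_one` callback are hypothetical names, and `process_one` stands in for whatever digitize/classify/extract steps you run per document.

```python
from pathlib import Path


def list_pdfs(folder):
    """Return the PDF files directly inside `folder`, sorted by name."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def process_folder(folder, process_one):
    """Call `process_one(path)` for every PDF and collect results by file name.

    `process_one` is a placeholder for the per-document steps; it is an
    assumption of this sketch, not part of any library.
    """
    return {pdf.name: process_one(pdf) for pdf in list_pdfs(folder)}
```

In a workflow the equivalent would be a For Each over the files in the folder, with the per-document steps inside the loop body.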
Hey Year down!
Sorry for replying late - I have made a video where I solve an RPA challenge by extracting data from multiple PDFs - th-cam.com/video/56AOiixQPKY/w-d-xo.html
Let me know if this is what you were looking for.
@@botbotgo4902 data extraction scope index was outside the bounds of the array. I am facing this error
@@patilrc data extraction scope index was outside the bounds of the array.I am facing this issue
@Year Down this mainly happens because the classifier is not able to classify your document. You have to check whether the classification worked, and if it did not, you need to extract the data manually in the Present Validation Station. To check whether classification worked:
1. After the classification scope activity, add an *IF Activity*
2. In the *IF Activity*, check the condition *classificationResult.Any* is True
3. In the true section, move your *data extraction scope*
4. In the false section, add an *assign activity* and assign extractionResults = Nothing
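The four steps above boil down to a simple guard. A minimal Python sketch of that logic follows; it is not UiPath code, and `extract_if_classified` and `run_extraction` are hypothetical names used only for illustration.

```python
def extract_if_classified(classification_results, run_extraction):
    """Run extraction only when classification produced at least one result."""
    if len(classification_results) > 0:
        # Corresponds to the true branch of the IF (classificationResult.Any).
        return run_extraction(classification_results[0])
    # Corresponds to the false branch: extractionResults = Nothing.
    return None
```

Indexing into an empty result list is what triggers the "Index was outside the bounds of the array" error, so the guard has to come before the extraction step.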
Hi may I know why you used both Form Extractor and ML Extractor? And also why does the workflow produce 2 sets of the same data table? What do i do if i only need 1.
A DataSet is a collection of DataTables. You can try Dataset.Tables(0) and check.
Hey Sorry for replying late!
1. *Why I used both extractors* - I wanted to show that it is possible to combine extractors. It can happen that some attributes cannot be extracted by one of the extractors, and in that case the other extractor is used. The extractors are tried from left to right: only if the leftmost extractor cannot get a particular attribute (or its confidence score is below the set threshold) is the next extractor used. With Configure Extractors you can also decide which attributes are handled by which extractor.
2. You get a list of tables (also known as a *DataSet*); always take the first one from the list. -> *Dataset.tables(0)*
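The left-to-right fallback in point 1 can be sketched in a few lines of Python. This only illustrates the ordering logic and is not how UiPath implements it: each extractor is modelled as a hypothetical function returning a `(value, confidence)` pair or `None`, paired with its own minimum confidence.

```python
def extract_field(field, extractors):
    """Try each (extractor, min_confidence) pair in order, left to right.

    An extractor is any callable that returns (value, confidence) or None.
    The first result that meets its own threshold wins; if none qualifies,
    None is returned and a human would have to fill the field in.
    """
    for extractor, min_confidence in extractors:
        result = extractor(field)
        if result is not None and result[1] >= min_confidence:
            return result[0]
    return None


# Hypothetical stand-ins for a form extractor and an ML extractor.
form_results = {"total": ("12.50", 0.95), "date": ("5", 0.40)}
ml_results = {"date": ("26 May", 0.90)}
chain = [(form_results.get, 0.80), (ml_results.get, 0.80)]
```

With this chain, `total` comes from the first extractor, `date` falls through to the second because the first result is below the threshold, and a field neither extractor knows stays empty.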
I see. Thanks for the help
Sir, can you please do it for PAN card and Aadhaar JPG files? I have tried many times but could not get it to work, and when I give the whole folder path it shows an error and I don't know why. Please help me with this task: we have a folder per candidate, each with that candidate's own PAN and Aadhaar card images, and from those I need to extract particular fields like Aadhaar no. and PAN no. and store them in a file. If you can store them in MySQL that would be very good for me. Please, sir, can you do it for a whole folder passed in the documentPath variable, where each candidate has their own Aadhaar and PAN card? Please, I need it.
Hi, while setting up the form extractor, you manually specify the location of the second image (choose 2.jpg), but at the end you change the document path from 2.jpg to 3.jpg.
If we are specifying the location manually, how does the form extractor fetch the correct information?
hey,
The file that you upload in the form extractor is just for generating a template. So no matter which document you read, it will still work as long as the structure and the positions of the various elements in the document remain the same.
Having said that, if you try to extract data from an invoice with a different structure, the extraction won't work.
@@botbotgo4902 Ok got it
One more query: when I tried the same with Create Document Validation Action and Wait for Document Validation Action, with Present Validation Station commented out, it gives me this error:
"An extension of type 'UiPath.Activities.Contracts.Persistence.IPersistenceBookmarks' must be configured in order to run this workflow."
( I have created the storage bucket in orchestrator)
hey anurag,
thank you for the wonderful explanation.
I have one issue with the invoice date: it's not coming out properly in the CSV file.
It comes out like: Key,Value "Month","5".
The actual date in my PDF is: May 26/20.
It would be great if you could help with this.
Can you please check whether the date is present in the text coming out of the Digitize Document activity?
If not, then no extractor will be able to extract the date, and you might have to try other OCR engines.
If the date is present, then you need to do some trial and error with different extractor activities.
Hello sir,
I also want to extract the items along with its specified cost in the excel file. Can i do that?
Please help
You can use the form-based extractor to extract the line items in the table.
How can we use the Intelligent Keyword Classifier in the Classify Document Scope?
Intelligent Keyword Classifier for handwritten documents not for unstructured documents
Hi Anuraag .This video is very important for RPA beginners. Thank you for this. But I was facing some issue while creating a template after a custom supply to the keyword I'm extracting after configure I can see a long red color error. That even we cannot read.
Hello Sourav!
I am sorry but I cannot understand what you mean
@@botbotgo4902 that's cool Anuraag, I could solve the error. Is this solution applicable to image invoices also?
@@botbotgo4902 Hello Anurag Actually I'm using your instructed workflow but, It is not extracting the values always.
@@souravsingh4305 hello saurav!
So where are you facing problems? I mean with which extractor are you working?
@@botbotgo4902 I'm working with the form extractor. But I have invoices of all types: PDFs, receipts, images, scanned PDF invoices, etc. Which extractors should I use to get the values from all of these?
Man link for invoice file download
Can you share me a slide ? Video is interesting and helpful. Thank you !!!
I am not getting OmniPage OCR in my activity panel
Hello Shalini,
You need to install this package before you can use it.
To install it (shown at 06:37):
1. go to Manage packages in Studio
2. click on All Packages
3. Search for UiPath.OmniPage.Activities
4. Install it
@@botbotgo4902 Yes, It is. Thanks for your prompt response. :)
Hello!! Thanks for the video. This is Rohit S. Lanjewar. Please help me understand how I can change the confidence percentage of each invoice field in the Present Validation Station when using the Intelligent Form Extractor in UiPath Document Understanding.
Hello Sample Demo!!
Sorry for replying late. The confidence score is something that you set per extractor: if an attribute that this extractor should extract scores below it, that field is not extracted. In such cases you can try a combination of extractors, so that if one extractor fails the next one is used. And if all of them fail, the user has to enter the value explicitly.
Did I answer your question?
Page 1 has less than 5 selected words as Page Matching Information. Please select at least 5 words.
This notification appears on the screen when I am creating the template. It pops up again and again, even after extracting the elements.
Hello Shalini!
You need to select 5 keywords on the page.
Please watch from 30:00
@@botbotgo4902 I did it the same way. Let me recheck whether I am making any mistake.
Also... seems like a mistake to ASSUME that classification result will only match 1 document type. You never check how many matches it got, and *assume* it's always classificationResult(0)
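One way to avoid that assumption, sketched in Python rather than as UiPath code: look at every match and pick the most confident one instead of blindly taking index 0. Here each classification result is modelled as a hypothetical `(document_type, confidence)` tuple, and `best_match` is an illustrative name.

```python
def best_match(classification_results):
    """Return the highest-confidence (document_type, confidence) pair, or None.

    Handles zero matches, one match and several matches alike.
    """
    if not classification_results:
        return None
    return max(classification_results, key=lambda r: r[1])
```

Checking `len(classification_results)` first also tells you whether the document was ambiguous and may deserve manual review.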
You overcomplicated the classification keywords by using "Add a new set" instead of just typing the right syntax into the first set to add multiple keywords. No need to have more than 1 set in these examples.
Hello! I followed your tutorial. I am trying to extract data from the receipt using ML extractor. I used "du.uipath.com/ie/receipts" as the end point but I am not getting the dropdown under the ML extractor while defining the attributes of the document to be extracted. Can you please help me solve this?
Thanks for a great informative video. Just had a question in mind. If we have to define a keyword.json file for document classification then what is the use of taxonomy.json?