Training UiPath Document Understanding ML Models - Data Manager - Part 1 | RPA

  • Published on 1 Jul 2024
  • Here comes the long-awaited series on how to train the Machine Learning models for Document Understanding.
    In this part, we mainly focus on how to use the Data Manager to label documents and train a model for invoices. I know the invoice model is already available, but doing this gives you an idea of how to train a model for any type of document (by using the generic DocumentUnderstanding model) beyond what is readily available in AI Center.
    ▬ Contents of this video ▬▬▬▬▬▬▬▬▬▬
    0:00 - Introduction
    1:09 - Explaining Out-of-the-Box DU Models and Creating the First Package
    04:05 - Explaining Model Schema and Creating ML Skill
    07:02 - Setting up Data Labeling Session
    09:28 - Showcasing Fields Used in Taxonomy and How They Relate to ML Skill Fields
    14:55 - Configuring Data Labeling Session (Data Manager)
    28:39 - Import Documents in Data Manager
    30:55 - Labeling Fields in Documents in Data Manager
    45:01 - Exporting Labelled Data from Data Manager
    50:22 - Creating Training Pipelines to Train Model Based on Labeled Data
    In the next video, we will look at how we can use the validations done in the DU workflow to easily train the document understanding models by uploading the validated data into Data Manager.
    Additional Reading:
    docs.uipath.com/document-unde...
    docs.uipath.com/document-unde...
    #UiPath #MachineLearning #DocumentUnderstanding #ArtificialIntelligence #RPA #UiPathCommunity
  • Science & Technology

Comments • 117

  • @MukeshKala
    @MukeshKala 3 years ago +2

    Finally, added to my watchlist for the weekend!

    • @LahiruFernando
      @LahiruFernando 3 years ago

      haha.. awesome!! Share your feedback too!! :)

  • @himanshusonawane9204
    @himanshusonawane9204 3 years ago +2

    Very helpful. Waiting for the rest of the series too.

    • @LahiruFernando
      @LahiruFernando 3 years ago +1

      Thank you so much!! I will upload the next parts soon!! Stay tuned :)

  • @PadurariuDragosh
    @PadurariuDragosh 3 years ago +1

    Best ML Models video so far!

  • @theerthapadman4397
    @theerthapadman4397 2 years ago +1

    very informative. thanks a lot

  • @cognitiveautomationwithuip9159
    @cognitiveautomationwithuip9159 3 years ago +1

    Thank you Lahiru for this detailed session. Waiting for the rest of the sessions in this series...

  • @CelticDarkArrow
    @CelticDarkArrow 3 years ago +1

    EXCELLENT!!!! Looking forward to the next videos!

    • @LahiruFernando
      @LahiruFernando 3 years ago +1

      Thank you so much for the amazing words!!! Means a lot!

    • @CelticDarkArrow
      @CelticDarkArrow 3 years ago +1

      @@LahiruFernando I'm making my first custom ML model right now following your steps ;-)
      I have a few questions concerning parallel processing and Excel file creation using Document Understanding actions... May I have your contact details so I can send you an email? Thank you very much in advance

    • @LahiruFernando
      @LahiruFernando 3 years ago

      @@CelticDarkArrow
      Yea sure..
      Let me share my email here, and once you are there, I will share my number so we can connect to discuss :)
      My email: lahirufernando90@gmail.com

  • @utdps123
    @utdps123 2 years ago +1

    Nicely explained! Thank you so much Lahiru!

    • @LahiruFernando
      @LahiruFernando 2 years ago

      Thank you so much for the awesome feedback my friend

  • @vijaymakwana9287
    @vijaymakwana9287 3 years ago +1

    Thank you for clear explanation 🙏🙏

    • @LahiruFernando
      @LahiruFernando 3 years ago

      Thank you so much for the feedback Vijay! Really means a lot!

  • @divyakolekar5744
    @divyakolekar5744 3 years ago +2

    Thank you, Sir... This is very helpful to us. The way you explain is really good.

    • @LahiruFernando
      @LahiruFernando 3 years ago +1

      Hi Divya,
      Thank you so much for your awesome thoughts!!! This really means a lot!
      Thanks again..
      Also, please feel free to tell me if you feel that I should cover any specific topic apart from what you see in the channel :)
      Would be more than happy to take it up

  • @KunalKumar-tt4bt
    @KunalKumar-tt4bt 2 years ago +1

    Impressive, sir. Really loved working on this.

  • @Artech.Ranjit
    @Artech.Ranjit 1 year ago +1

    great explanations..

  • @balduin_544
    @balduin_544 4 months ago +1

    Hello sir
    I am glad that I found your videos. This helps me a ton with my bachelor thesis. I know this video is older and some things within UiPath and the AI Center look a bit different now, but I still managed to do it. Very helpful video. Thank you!❤

    • @LahiruFernando
      @LahiruFernando 4 months ago

      Hi,
      Thank you so much for sharing your thoughts. This means a lot.
      Yes, the video is a bit old now and things are slightly different. But still, the concept we follow is the same. That’s the main thing we need to learn.. Great to hear you still find it helpful. Thanks once again for sharing

  • @mahantesh8058
    @mahantesh8058 2 years ago +1

    Thank you sir

  • @rachitachauhan2650
    @rachitachauhan2650 3 years ago +1

    When will you upload the second part? Please upload it soon....Thank you so much for all the wonderful tutorials.

    • @LahiruFernando
      @LahiruFernando 3 years ago

      Hi Rachita,
      I will try to upload it within this weekend 😀

  • @geetishreerao6033
    @geetishreerao6033 3 years ago +2

    Hi Lahiru,
    Hope you are doing good.
    A wonderful, detailed session. Your sessions are always informative and logical. They have been very helpful to me in understanding UiPath Document Understanding (the new AI ML models), especially training a Document Understanding ML model (classifier and ML Skill training).
    Please keep posting such beautiful sessions.
    I have downloaded some sample invoices while trying out this session. It would be really helpful if you could share the sample invoice file links so I can try training with them as well.
    Thanks!!
    P.S.: Thanks for sharing your knowledge.

    • @LahiruFernando
      @LahiruFernando 3 years ago +1

      Hey Geetishree
      Thank you so much for your amazing words.. I'm really glad that I was able to help, and it means a lot!
      About the sample invoices, sometimes it can be difficult to find samples online. But you can always create your own simple invoices through free websites.. I created some through the links below
      invoice-generator.com/#/2
      www.zoho.com/invoice/free-invoice-generator.html
      There are many sites like this which you can try out..
      Just search for "online invoice generator"

    • @geetishreerao6033
      @geetishreerao6033 3 years ago +1

      @@LahiruFernando Thank you for sharing the links... :) Searching for samples is quite difficult.. Will use these links to generate them.

  • @saileshtiwari8242
    @saileshtiwari8242 3 months ago

    Can we add an extra field that is not in the schema.json file?

  • @jacobchiengsh2210
    @jacobchiengsh2210 2 years ago +1

    Hi Lahiru, thanks a lot for the amazing video tutorial on machine learning. I have looked at many videos on YouTube and found this one to be the most exciting, helpful and detailed.
    I have a few questions about the machine learning part:
    1. I want to extract more detail from the invoice than the schema provides. For example, the schema has no fields such as section activity and ticket number. Can I add them to the schema manually and upload it to AI Center, or do I need to create the Document Understanding model from scratch? If I can add them to the JSON file manually, are there any extra steps I need to do?
    2. If the above is allowed, can I replace the schema that was previously uploaded to AI Center? If not, what is your recommendation for adding more fields?
    3. Based on your experience, how many documents are needed to train the ML model well? I understand that AI Center requires at least 10 documents. 🙂

    • @LahiruFernando
      @LahiruFernando 2 years ago

      Hello Jacob,
      Sorry for my late reply. I was on vacation for last 2 weeks :)
      Happy New Year!!
      Regarding your questions, below are my answers :)
      1. Yes, you can add more fields to the existing schema through Data Manager. However, since the model does not know how to pick values for the new fields that you are adding, you will need to train the model through Data Manager.
      2. As mentioned, the additional step is training the model. So you will need to do the following:
      - Create a Data Manager session
      - Import the new schema with your new fields
      - Import documents and perform data labeling
      - Export the labeled dataset
      - Run the training pipeline (always set the minor version in the training pipeline to 0)
      - Update the Skill to the latest version once the pipeline is successful
      3. It depends on your scenario. You can start with the following: for each layout, try to get at least 10 documents. That would be a great start :)

  • @nishitkasbi5033
    @nishitkasbi5033 1 year ago +1

    Hi Lahiru,
    Amazing video as always. I have been trying to learn UiPath from your videos and it has been extremely helpful.
    I have a question regarding unlabelled pages when uploading documents in the data labelling session.
    The document from which I need to extract data is 10 pages long, but I only need the information from pages 1, 3 and 5, which have 2 extractions each. Can I just upload these 3 pages in the data labelling session instead of uploading all 10 per document?

    • @LahiruFernando
      @LahiruFernando 1 year ago +1

      Hey Nishit
      Thank you so much for your feedback. Really means a lot and happy to hear that what I shared was helpful. 😀
      About the question:
      I think it's best if you upload only the pages (1, 3 and 5) separately (as three separate documents) and do the labeling on those. There is no need to upload the full document because we only focus on specific things..
      Hope this helps

  • @keerthikumar1869
    @keerthikumar1869 1 year ago +1

    I have a scenario where the same address is used as both billing and shipping. How should I set the fields when labeling in Document Understanding?

    • @LahiruFernando
      @LahiruFernando 1 year ago

      Hi.. Sorry for the late reply. I was out of town and just came back yesterday.
      Is this the case for all the documents you have? Or is it happening only for a selected set of documents coming from a specific vendor?
      You might want to label the address to either one of those fields, then in the validation step assign the same value to the other field for those document types. We have done something similar in one of our projects
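
The idea in this reply (label the address into one field, then copy it to the other after extraction) could be sketched roughly as below. This is only an illustration in Python, not UiPath code; in a real project the same logic would live in the workflow's validation step. The field names `billing-address` and `shipping-address` are hypothetical.

```python
def mirror_missing_address(fields: dict) -> dict:
    """Copy the extracted address into whichever address field is empty.

    The field names used here are hypothetical examples.
    """
    result = dict(fields)  # work on a copy, leave the input untouched
    billing = (result.get("billing-address") or "").strip()
    shipping = (result.get("shipping-address") or "").strip()
    if billing and not shipping:
        result["shipping-address"] = billing
    elif shipping and not billing:
        result["billing-address"] = shipping
    return result
```

If both fields are already filled, or both are empty, the values are returned unchanged, so the step is safe to run on every document.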

  • @archanak3577
    @archanak3577 2 years ago +1

    Thanks Lahiru for this video. In data manager, while labelling values to the fields, once any field value is selected it gets locked. Can we change the selected value?

    • @LahiruFernando
      @LahiruFernando 2 years ago

      Hi Archana,
      Yes it is possible to change. All you need to do is, select the new value that you need to assign to the field, and press the shortcut key. It will overwrite the value in the field

  • @lorenzoreybuenan3360
    @lorenzoreybuenan3360 1 year ago +1

    Hi Lahiru,
    Thank you so much for this detailed session. May I know where I can find sample invoice files, or do you have a link to the input files that you used for this demo? Thanks a lot in advance

    • @LahiruFernando
      @LahiruFernando 1 year ago

      Hello Lorenzo,
      I'm sorry for the late reply. I actually created these invoices myself for the demo. You can try some of the free online invoice-creating websites to generate some samples. That is very easy.

  • @kirankumar-tn8om
    @kirankumar-tn8om 11 months ago +1

    What if I want to use the training pipeline for a new dataset that was created as a new folder in the initial dataset?

    • @LahiruFernando
      @LahiruFernando 11 months ago

      You can add the new files to the same labeling session and export everything

  • @rachitachauhan2650
    @rachitachauhan2650 3 years ago +1

    I am having an issue with the DU. We are extracting a table from the PO invoices but we are having trouble with Line items. They are not getting extracted in the Output data set. I have written this code: out_OutputDataSet.Tables("Simple Fields - Formatted")

    • @LahiruFernando
      @LahiruFernando 3 years ago

      Hi Rachita,
      Can you try the following:
      - The Export Extraction Results activity gives you a DataSet
      - Use a For Each loop over DataSetVariable.Tables, with the TypeArgument property set to DataTable
      - In the body of the loop, use a Write Range with the following properties:
        - File Path: (give your file path)
        - Range: ""
        - SheetName: item.TableName
      Try this to see whether you get line items written to the Excel file..
      Let me know if it is not working
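
The point of the steps above is to loop over every table in the result instead of reading a single table by name. A rough analogue in Python is shown below. It is a sketch only, not UiPath code: the dataset is modeled as a plain dict of table name to rows, each table is written to its own CSV file rather than an Excel sheet, and the table name "Line Items" is a hypothetical example.

```python
import csv
from pathlib import Path


def write_all_tables(tables: dict, out_dir: str) -> list:
    """Write every table to its own CSV file, named after the table."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for name, rows in tables.items():  # one pass per table, like For Each over .Tables
        path = out / f"{name}.csv"
        with path.open("w", newline="", encoding="utf-8") as handle:
            csv.writer(handle).writerows(rows)
        written.append(path.name)
    return written
```

Listing the files that were written makes it easy to see whether a line-items table exists in the output at all, or whether only the simple-fields table was produced.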

  • @prakashr9493
    @prakashr9493 2 years ago +1

    Hello Sir! Can you please show us how to do an evaluation run with the trained package that shows us the confidence score please?

    • @LahiruFernando
      @LahiruFernando 2 years ago

      Hello Prakash.. sure bro... I can create a video on this.. I will prepare one and publish soon 😀

  • @michaelgrade8188
    @michaelgrade8188 3 years ago +1

    Thanks Lahiru for this video! Great job!
    One question:
    I have one document type but with different classes (let's say from different suppliers). Would you advise creating one separate Data Labeling Session for each supplier (since the same field /information may be in a table for one, but not in a table (but rather regular field) for another)?
    Because creating one common layout at the start (defining regular fields, column fields,...) for all classes/suppliers seems difficult (because they differ from class to class). My guess is that for the ML model later, differentiating makes more sense (even though it's the same basic information)...

    • @LahiruFernando
      @LahiruFernando 3 years ago +4

      Hi Michael,
      This is a great question.
      Let me answer it in this way...
      So, taking your first question, (item may be in a table, and sometimes as a regular field)
      In this scenario, if it is in a table, does it repeat itself for each line? or is it just one line always even though it resides inside a table?
      If it is always just one value, you can create a regular field for both scenarios and train it accordingly. We can only use Column fields for the items that always repeat themselves in the document as rows.
      Second question:
      We can create multiple ML models for different vendors. But imagine a scenario where we have hundreds of vendors. In such a case, we would end up with so many ML models that it defeats the purpose :)
      So, ideally, it's good to go with one model for all vendors.
      For the fields that do not appear in the documents all the time:
      During the initial training, we have to make sure we use a set of documents that contain all the fields we need to extract and train. Later, once the model is trained and working fine, you can decide to leave out the documents that do not have everything and use the rest for the training.
      Hope this makes sense?
      Let me know if anything is not clear :)

    • @michaelgrade8188
      @michaelgrade8188 3 years ago

      @@LahiruFernando Overall, yes, makes sense. Understood - thank you!
      "item may be in a table, and sometimes as a regular field" - Yes, unfortunately, one time the information is in a table and repeats itself each line (e.g. think about units that are different for each row)...and another time the unit is the same for each row and thus is actually a header (in the column)...
      So you see that in one document a table (column field) would be right, in another document a regular field would be right...this was the issue I was having...
      I guess what I could do is, as your general answer suggests, make a 150% version and add a column field AND a regular field for the same information... and in one class the column is filled and in another class the regular field is filled...

    • @caamisccb
      @caamisccb 1 year ago +1

      @@michaelgrade8188 I got the same situation! Did it work for you to have both types of field (regular and column for the same information?)

    • @caamisccb
      @caamisccb 1 year ago +1

      @@LahiruFernando What is your recommendation for the situation below?
      I guess what I could do is, as your general answer suggests, make a 150% version and add a column field AND a regular field for the same information... and in one class the column is filled and in another class the regular field is filled...

    • @LahiruFernando
      @LahiruFernando 1 year ago +1

      Hi @@caamisccb
      You asked this question on LinkedIn too, but I couldn't reply to you. Got held up with some work...
      So, about this question: yes, as recommended, we can handle this kind of scenario in the same data labeling session. I have come across this too in one project I did that had 1000+ suppliers.
      We had a single data labeling session, but included fields as both regular and column fields to handle those variations (the same information as a regular field for one supplier, and as a column for another).
      Then, we wrote validation logic in the workflow to detect where the value is and get it from there.
      For example, in my scenario, the tax information in the invoice was sometimes in the regular field, sometimes in the column field, and sometimes in both. So we created fields to handle both, and got the value through the validation logic.
      In order to train the models to capture those properly, have a good number of samples for each layout, and train them accordingly. This way the model learns where to capture from based on the layout
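
The validation logic described in this reply (find out which of the two fields the model filled, then read the value from there) might look roughly like the sketch below. It is written in Python purely for illustration; the function name and the return labels are hypothetical, and a real implementation would sit in the UiPath workflow after extraction.

```python
from typing import List, Optional, Sequence, Tuple


def locate_value(
    regular_value: Optional[str],
    column_values: Sequence[Optional[str]],
) -> Tuple[str, List[str]]:
    """Report where a value was extracted and return the cleaned values.

    The first element is one of "regular", "column", "both" or "missing".
    """
    regular = (regular_value or "").strip()
    columns = [v.strip() for v in column_values if v and v.strip()]
    if regular and columns:
        return "both", [regular] + columns
    if regular:
        return "regular", [regular]
    if columns:
        return "column", columns
    return "missing", []
```

What to do in the "both" case (for example, which value wins when they disagree) is a business rule the sketch leaves open on purpose.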

  • @khooshbujani4949
    @khooshbujani4949 2 years ago +1

    Thanks Lahiru. Lahiru I have a question. Do you give online training? If yes, can you please give me the information about that?

    • @LahiruFernando
      @LahiruFernando 2 years ago

      Hi.. Yes I do.. We can connect on email or WhatsApp so we can discuss the training you need. Let's plan a time so I can do the training for you :)
      Ping me on: lahirufernando90@gmail.com

  • @thanuthomas7003
    @thanuthomas7003 2 years ago +1

    Hi Lahiru, is this training in Data Manager required for out-of-the-box packages like invoicesindia or invoicesaustralia? Is it good enough to train from the workflow itself using a human-in-the-loop approach?

    • @LahiruFernando
      @LahiruFernando 2 years ago

      Hello Thanu,
      A very good question. The out-of-the-box packages such as Invoices and Purchase Orders are pre-trained by UiPath. Hence, they will do their best to predict the values during execution. However, you can still train them to better suit your requirements through Data Manager. Further, these pre-trained models might sometimes not reach high accuracy on our invoices, as they may not have been trained on that specific kind of document. Hence, it's good to use Data Manager and do another round of training.
      When it comes to the initial training, it is recommended to use the Data Manager, as it captures a lot more data than the Action Center. Once you have the initial model, you can use the Action Center for fine-tuning the model.
      You can also refer to this video as it explains this concept:
      th-cam.com/video/DWAa6XPeOrE/w-d-xo.html

    • @thanuthomas7003
      @thanuthomas7003 2 years ago +1

      Thank you, Lahiru. Your videos are helping me a lot! I have a few more queries.
      I have more than 100 different invoice formats to train.
      1. How many documents of each format do you think are required for initial training?
      2. Is it a good approach to train on a smaller set of different invoice formats (let's say 5 different formats) and add other formats gradually?
      3. Is it required to use the same OCR in the workflow as the one used in document manager?
      4. When you do labelling in data manager, if the data repeats on the same page or across multiple pages, is it required to label all the repeated data?

    • @LahiruFernando
      @LahiruFernando 2 years ago

      @@thanuthomas7003 Thank you so much for your thoughts about the videos. Means a lot..
      Great set of questions.. let's pick them one by one..
      1. For each format, try to get at least 10 documents so that you can train for it. So let's say you have 10 different formats. You get 10 from each (which comes to 100) and do the labeling for all of those..
      2. It is okay to add different layouts gradually. But try to include as many as possible up front so you don't need to do multiple retrainings..
      3. Not specifically. You can use any OCR.
      4. It is always a good idea to try to capture all those possible scenarios, as it may help the extraction process.
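
The "at least 10 documents per format" rule of thumb from point 1 is easy to check before starting a labeling session. The sketch below is one possible way to do it in Python, assuming you keep your own mapping of file name to layout; that mapping and the layout names are hypothetical, and nothing here is part of UiPath.

```python
from collections import Counter

MIN_DOCS_PER_LAYOUT = 10  # rule of thumb from the reply above


def layouts_needing_more(doc_layouts: dict, minimum: int = MIN_DOCS_PER_LAYOUT) -> dict:
    """Map each under-represented layout to the number of documents still needed.

    doc_layouts maps a document name to the layout (format) it belongs to.
    """
    counts = Counter(doc_layouts.values())
    return {
        layout: minimum - count
        for layout, count in sorted(counts.items())
        if count < minimum
    }
```

For example, with 10 documents from one vendor and 4 from another, only the second vendor is reported, with 6 documents still to collect.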

    • @thanuthomas7003
      @thanuthomas7003 2 years ago

      @@LahiruFernando Awesome!! Thank you!

  • @yashobantadash6670
    @yashobantadash6670 2 years ago +1

    Great bro! What are some use cases where you have used DU?

    • @LahiruFernando
      @LahiruFernando 2 years ago +1

      Hey.. sorry for the late reply. I have worked on several use cases.
      1. Invoice processing for finance and accounting teams. This use case covered many different countries, many languages, and complex business rules for each country.
      2. A government-sector project. This process required classifying more than 150 document types and extracting data from the documents to update downstream applications. This was one of the most complicated projects, as the documents were structured, semi-structured and, largely, unstructured

    • @yashobantadash6670
      @yashobantadash6670 2 years ago +1

      @@LahiruFernando Wow! Can you share it by email as a zip, so that I can practice? If it is not classified, that is 😊

    • @LahiruFernando
      @LahiruFernando 2 years ago +1

      @Yashobanta Dash Cannot share that because it is specific to the client.
      However, when it comes to invoices, you can easily find invoices of different languages online so you can practice. I can also guide you on this..

    • @yashobantadash6670
      @yashobantadash6670 2 years ago

      @@LahiruFernando i will try! thanks for your guidance.😊

  • @divyakolekar5744
    @divyakolekar5744 3 years ago +1

    Hello Sir, how much time does it take for the pipeline status to change from Running to Successful?

    • @LahiruFernando
      @LahiruFernando 3 years ago

      Hi Divya, it depends on the amount of training data you have given to the pipeline. Usually a pipeline run takes several hours, and it can take up to 2 days..

    • @divyakolekar5744
      @divyakolekar5744 3 years ago +1

      @@LahiruFernando Thank you, Sir

  • @ajaydahiya1977
    @ajaydahiya1977 2 years ago +1

    Hi Lahiru Sir, my package version is showing as 9.0 in AI Center. I have just upgraded to the Enterprise edition, but the new version is still not showing. Can you tell me how to upgrade AI Center to get version 11.0?

    • @LahiruFernando
      @LahiruFernando 2 years ago +1

      Hey Ajay,
      You are using the cloud version right?
      hmm... you should get the latest version automatically. Can you see which version it shows in the AI Center on the bottom left once you are in the AI Center screen?
      My one is showing as 21.11.2

    • @ajaydahiya1977
      @ajaydahiya1977 2 years ago +1

      @@LahiruFernando Sir, it is the same, 21.11.2. Even when I am using Invoice India, it shows version 8.0, whereas my colleague has the latest version

    • @LahiruFernando
      @LahiruFernando 2 years ago +1

      @@ajaydahiya1977 That's weird.. You'll probably need to create a ticket with UiPath. You will get their tech team's support because you have the enterprise license.
      Let me know if you need any help doing that

    • @ajaydahiya1977
      @ajaydahiya1977 2 years ago +1

      @@LahiruFernando Sir, I am using the Enterprise trial version for a POC.. I am not able to find what to fill in the 18-digit license box while raising a ticket.. Can you please help me raise a ticket to upgrade AI Center?

    • @LahiruFernando
      @LahiruFernando 2 years ago

      @@ajaydahiya1977 Hey.. sorry for the late reply.. You can use the Support ID for that field..
      You can get it in the Automation Cloud, under Admin -> Organization Settings section.

  • @TharaRaman
    @TharaRaman 3 years ago +1

    Hello Sir, I am using a new Dell system with a Core i7 processor. The Taxonomy window is blank when it is opened. Only when I minimize and maximize it is all the data updated and visible on the screen. What's the problem behind it? I am not able to use Taxonomy or the Present Validation screen. This issue has existed since I started using Taxonomy. Kindly assist on this. Is it related to .NET packages or to the system configuration?

    • @LahiruFernando
      @LahiruFernando 3 years ago

      Hi Thara,
      I have heard about this issue from many people in the community. I think this is a problem with the system resolution settings. I believe you have a new machine that has a resolution higher than 1920 x 1080?
      If so, this could happen to you.. I think it is already reported to UiPath and they will soon fix it :)

    • @TharaRaman
      @TharaRaman 3 years ago +1

      @@LahiruFernando Exactly, I tried a lower resolution setting as well. No benefit. This was raised in the forum a long time ago but is still not resolved :( Since this is an environmental issue, I am unable to proceed further. For now I create the Taxonomy.JSON file on another system and load it into my workflow. Thanks

    • @LahiruFernando
      @LahiruFernando 3 years ago

      @@TharaRaman Yeah.. I got the same comments from many.. What they do is maximize and restore to normal size again.. it is bit annoying.. but for now I don't think there is another solution...

    • @TharaRaman
      @TharaRaman 2 years ago +1

      @@LahiruFernando Just to share with you: once I updated my graphics driver, it started working fine for me. Thanks

    • @LahiruFernando
      @LahiruFernando 2 years ago

      @@TharaRaman Thank you so much for sharing the solution :)
      I'm sure this will help many.. Thanks again!

  • @khooshbujani4949
    @khooshbujani4949 2 years ago +1

    How do we work with a format that is not covered by the out-of-the-box DU models?

    • @LahiruFernando
      @LahiruFernando 2 years ago

      A very good question. For such cases, we can use the generic "DocumentUnderstanding" model that we have under the out-of-the-box DU models. This one is not trained, and we can use the Data Manager to train it for any kind of document that we have.
      The steps are:
      - Create the DU package
      - Create a data labeling session
      - Start labeling and export the data as explained in the videos
      - Perform a training run to train the model
      - Deploy the skill

    • @satyajeetkumar8816
      @satyajeetkumar8816 2 years ago +1

      @@LahiruFernando Thanks for the excellent video. Have you created a video which covers this use case?

    • @LahiruFernando
      @LahiruFernando 2 years ago +1

      @Satyajeet Kumar Hi..
      Yes.. there is not much of a difference in the way we label data for a pre-trained model like invoices and for the generic DU model. The only difference is that you need to create the generic DU package, start a labeling session, define the fields and train it.
      The same flow as shown in this video 😀

    • @satyajeetkumar8816
      @satyajeetkumar8816 2 years ago

      @@LahiruFernando Thanks for your quick response. I found that video. Will try and let you know how it goes. Thanks again.

  • @anandhavallisankaranarayan5424
    @anandhavallisankaranarayan5424 2 years ago +1

    Hi Sir, I am getting a failure while creating the pipeline. I followed all the steps as you have mentioned. Can you let me know how to fix it? I have tried it thrice.

    • @LahiruFernando
      @LahiruFernando 2 years ago

      Hi.. Can you check the log messages in the pipeline and tell me the reason for the failure? The log contains some info describing the reason... That could help us figure out what needs to be done..

    • @anandhavallisankaranarayan5424
      @anandhavallisankaranarayan5424 1 year ago +1

      ​@@LahiruFernando ​
      “/workspace/model/microservice/train.py”, line 69, in process_data
      self.trainer.preprocess_dataset()
      File “”, line 49, in preprocess_dataset
      Exception: Dataset preprocess Failed
      2022-07-06 00:59:47,362 - uipath_core.trainer_run:main:73 - INFO: Starting training job…
      2022-07-06 00:59:50,793 - matplotlib:_get_config_or_cache_dir:484 - WARNING: Matplotlib created a temporary config/cache directory at /tmp/matplotlib-2d6j2p4z because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
      2022-07-06 00:59:51,569 - matplotlib.font_manager:_load_fontmanager:1443 - INFO: generated new fontManager
      2022-07-06 00:59:52,830 - uipath_core.logs.upload_log_service:upload_logs_file:87 - INFO: Retry Training Triggered:
      2022-07-06 00:59:53,952 - uipath_core.storage.azure_storage_client:download:112 - INFO: Dataset from bucket folder training-49aa85ea-2b71-479f-aae1-db1d6c2f3371/9877f90d-d8f6-4f8b-95b6-872d6e8ef9b7/ca150876-0c82-455d-81d1-9e082800ce2b/export/invoice_healthcare_1_22-07-05T212300 with size 44 downloaded successfully
      2022-07-06 00:59:53,953 - uipath_core.training_plugin:train_model:114 - INFO: Start model training…
      2022-07-06 00:59:53,953 - uipath_core.training_plugin:initialize_model:108 - INFO: Start model initialization…
      2022-07-06 00:59:53,954 - root:initialize_package:145 - INFO: Using package type provided by runtime argument with value: invoices
      2022-07-06 00:59:53,954 - root:initialize_package:154 - INFO: Initializing invoices package options …
      2022-07-06 00:59:53,955 - root:configure_options:158 - INFO: Document type invoices language: en
      2022-07-06 00:59:53,955 - root:initialize_tokenizer:49 - INFO: Loading cached BERT tokenizer at /workspace/model/microservice/bert-base-multilingual-uncased_tokenizer
      2022-07-06 00:59:54,028 - root:initialize_package:159 - INFO: System-Level Configuration:
      2022-07-06 00:59:54,029 - root:initialize_package:160 - INFO: ATen/Parallel:
      at::get_num_threads() : 3
      at::get_num_interop_threads() : 2
      OpenMP 201511 (a.k.a. OpenMP 4.5)
      omp_get_max_threads() : 3
      Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
      mkl_get_max_threads() : 3
      Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
      std::thread::hardware_concurrency() : 4
      Environment variables:
      OMP_NUM_THREADS : 3
      MKL_NUM_THREADS : [not set]
      ATen parallel backend: OpenMP
      2022-07-06 00:59:54,030 - root:configure_options:158 - INFO: Document type invoices language: en
      2022-07-06 00:59:54,030 - uipath_core.training_plugin:initialize_model:111 - INFO: Model initialized successfully
      2022-07-06 00:59:54,030 - root:log_data_version_info:13 - INFO: =========Data version information=========
      2022-07-06 00:59:54,044 - root:log_data_version_info:17 - WARNING: Unknown data version:
      2022-07-06 00:59:54,044 - root:log_data_version_info:17 - INFO: ==========================================
      2022-07-06 00:59:54,045 - root:preprocess_data:603 - INFO: Creating dataset for document type invoices…
      2022-07-06 00:59:54,277 - root:preprocess_data:605 - INFO: Doctype invoices Statistics:
      2022-07-06 00:59:54,277 - root:preprocess_data:608 - INFO:
      Extraction fields:
      tag = 9129
      tag[billing-name] = 50
      tag[invoice-no] = 10
      tag[total] = 10
      tag[due-date] = 10
      Subsets:
      subset[TEST] = 10
      2022-07-06 01:00:09,418 - root:preprocess_data:676 - INFO: train: (0, 15) pages
      2022-07-06 01:00:09,418 - root:preprocess_data:677 - INFO: test: (0, 15) pages
      2022-07-06 01:00:09,418 - root:preprocess_dataset:49 - ERROR: Dataset preprocess Failed
      Traceback (most recent call last):
      File "", line 48, in preprocess_dataset
      File "", line 143, in __init__
      File "", line 31, in __init__
      File "", line 678, in preprocess_data
      AssertionError: Training and / or validation set is empty, verify that training / validation split is correctly set
      2022-07-06 01:00:09,422 - uipath_core.training_plugin:model_run:150 - ERROR: Training failed for pipeline type: TRAIN_ONLY, error: Dataset preprocess Failed
      2022-07-06 01:00:09,427 - uipath_core.trainer_run:main:90 - ERROR: Training Job failed, error: Dataset preprocess Failed
      Traceback (most recent call last):
      File "", line 48, in preprocess_dataset
      File "", line 143, in __init__
      File "", line 31, in __init__
      File "", line 678, in preprocess_data
      AssertionError: Training and / or validation set is empty, verify that training / validation split is correctly set
      During handling of the above exception, another exception occurred:
      Traceback (most recent call last):
      File "/model/bin/uipath_core/trainer_run.py", line 85, in main
      wrapper.run()
      File "/workspace/model/microservice/training_wrapper.py", line 64, in run
      return self.training_plugin.model_run()
      File "/model/bin/uipath_core/training_plugin.py", line 151, in model_run
      raise e
      File "/model/bin/uipath_core/training_plugin.py", line 143, in model_run
      self.run_train_only()
      File "/model/bin/uipath_core/training_plugin.py", line 212, in run_train_only
      self.train_model(self.local_dataset_directory)
      File "/model/bin/uipath_core/training_plugin.py", line 116, in train_model
      self.model.train(directory)
      File "/workspace/model/microservice/train.py", line 36, in train
      self.process_data()
      File "/workspace/model/microservice/train.py", line 69, in process_data
      self.trainer.preprocess_dataset()
      File "", line 49, in preprocess_dataset
      Exception: Dataset preprocess Failed

    • @LahiruFernando
      @LahiruFernando  1 year ago

      @@anandhavallisankaranarayan5424 Hi. Sorry for my late reply.
      I checked your error message. It looks like you are using the Train and Evaluate option. When you select both, you need datasets for both training and evaluation, in two different folders. From the error message, it seems that one of those datasets is empty.
      Try running only the training part first and see whether it works without any errors. When creating the pipeline, select the "Training Run" option, provide the dataset you generated from the Document Manager, and see if it works.
      Let me know if you still get any errors. I'm happy to help. Sorry again for the late reply; I got held up with some personal issues.

    • @anandhavallisankaranarayan5424
      @anandhavallisankaranarayan5424 1 year ago +1

      @@LahiruFernando 2022-07-20 12:25:58,099 - uipath_core.training_plugin:model_run:150 - ERROR: Training failed for pipeline type: TRAIN_ONLY, error: Dataset preprocess Failed
      2022-07-20 12:25:58,121 - uipath_core.trainer_run:main:90 - ERROR: Training Job failed, error: Dataset preprocess Failed
      Traceback (most recent call last):
      File "", line 48, in preprocess_dataset
      File "", line 143, in __init__
      File "", line 31, in __init__
      File "", line 678, in preprocess_data
      AssertionError: Training and / or validation set is empty, verify that training / validation split is correctly set
      I used the Train_only option, but the error still persists.

    • @LahiruFernando
      @LahiruFernando  1 year ago

      @@anandhavallisankaranarayan5424 It seems like the dataset is empty. Did you export the training data from Document Manager before running the pipeline?
      Maybe we can continue this over email so we can share screenshots. Can you email me? You can find my email in the "About" section of the channel.
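
One way to look into this from the pipeline log itself is to read the `Subsets:` statistics printed during preprocessing. In the first log above, the only subset line is `subset[TEST] = 10`, which may be worth checking against how the documents were split. Below is a minimal Python sketch, assuming the log keeps the `subset[NAME] = count` format shown above. The function name `subset_counts` is hypothetical and is not part of any UiPath tooling.

```python
import re

# Matches lines such as "subset[TEST] = 10" from the preprocessing statistics.
SUBSET_LINE = re.compile(r"subset\[([^\]]+)\]\s*=\s*(\d+)")


def subset_counts(log_text: str) -> dict[str, int]:
    """Return {subset name: count} for every subset line found in the log text."""
    return {name: int(count) for name, count in SUBSET_LINE.findall(log_text)}


if __name__ == "__main__":
    # Excerpt copied from the log pasted earlier in this thread.
    excerpt = """
    Subsets:
    subset[TEST] = 10
    """
    counts = subset_counts(excerpt)
    for name, count in counts.items():
        print(f"{name}: {count}")
    if len(counts) < 2:
        print("Fewer than two subsets are listed; check how the documents were split.")
```

The sketch only reports what the log already states; it does not change the dataset or the pipeline.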

  • @jyothivadlamani4843
    @jyothivadlamani4843 1 year ago

    Hi, I am trying to upload PDFs, but it shows me the error "failed to load document".

    • @LahiruFernando
      @LahiruFernando  1 year ago

      Hi Jyothi,
      Does it say anything else that might point to the reason for the failure?
      I hope none of your PDF files are password-protected or corrupted in some way...
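
A quick local check for those two cases can be sketched in Python. This is only a heuristic under stated assumptions: a PDF normally starts with a `%PDF-` header and ends with an `%%EOF` marker, and an encrypted PDF normally carries an `/Encrypt` entry. The function name `quick_pdf_check` is made up for illustration, and a file that passes can still fail to load for other reasons.

```python
def quick_pdf_check(data: bytes) -> list[str]:
    """Return a list of warnings for raw PDF bytes (heuristic only)."""
    warnings = []
    # The header is expected at the start of the file; look in the first 1024 bytes.
    if b"%PDF-" not in data[:1024]:
        warnings.append("missing %PDF- header (file may be corrupted or not a PDF)")
    # Assumption: encrypted files reference an /Encrypt dictionary somewhere in the file.
    if b"/Encrypt" in data:
        warnings.append("contains /Encrypt (file may be password protected)")
    # The end-of-file marker is expected near the end of the file.
    if b"%%EOF" not in data[-1024:]:
        warnings.append("missing %%EOF marker (file may be truncated)")
    return warnings


if __name__ == "__main__":
    import sys

    # Usage: python check_pdf.py file1.pdf file2.pdf
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, quick_pdf_check(f.read()) or "no obvious problems")
```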

    • @jyothivadlamani4843
      @jyothivadlamani4843 1 year ago

      @@LahiruFernando No. Actually, I created the invoices for practice, but they are not loading at all. Is there any way I can share with you the PDF invoice that I created?

    • @LahiruFernando
      @LahiruFernando  1 year ago

      @@jyothivadlamani4843 hmm... that's weird... Yes, you can share via my email below..
      email: lahirufernando90@gmail.com

  • @rajeenasuresh
    @rajeenasuresh 2 years ago +1

    Hi
    For me, the data labelling deployment keeps failing.
    Thanks in advance
    Rajeena

    • @LahiruFernando
      @LahiruFernando  2 years ago

      Hello Rajeena,
      Can you tell me what error you are getting? You can check this in the logs section.

    • @rajeenasuresh
      @rajeenasuresh 2 years ago +1

      @@LahiruFernando I can't see any error in the logs; they only have info saying the ML package was created successfully.
      The data labelling session shows its status as failed.

    • @LahiruFernando
      @LahiruFernando  2 years ago

      @@rajeenasuresh Hi.. Sorry for the late reply. Can you share a screenshot of this with me via my email? Not exactly sure what you are referring to..
      My email is: lahirufernando90@gmail.com

    • @rajeenasuresh
      @rajeenasuresh 2 years ago +1

      @@LahiruFernando Thank you so much. I have sent the email describing the issues

    • @LahiruFernando
      @LahiruFernando  2 years ago +1

      @@rajeenasuresh Sure let me reply..

  • @CaracolillosWTF
    @CaracolillosWTF 2 years ago +1

    Could you divide this video into parts? I am not able to find what I need and I can't spend 1 hour watching the video (sorry)

    • @LahiruFernando
      @LahiruFernando  2 years ago +1

      Hi.. Thank you for the feedback. Of course, I will try this.
      I will add chapters to the video so that the individual sections are easier to find.
      While I do that, is there anything that I can do to help locate what you are looking for?

    • @LahiruFernando
      @LahiruFernando  2 years ago +1

      Hi again, I added the chapters to the video. Now when you hover the mouse over the video, you can see the separate sections that cover the different topics. What is covered is also listed in the video description with timestamps. I hope this helps you locate what you are looking for.
      Let me know if you need additional help from my side. Happy to help anytime.
      Thanks again for the valuable feedback. Means a lot!
      Have a great day!

    • @CaracolillosWTF
      @CaracolillosWTF 2 years ago +1

      @@LahiruFernando Thank you very very much!