Liked your Vertex AI series. Nicely presented, with exactly the right level of detail. Hope you do some Feature Store, Metadata, and Matching Engine videos as well.
Thanks for the suggestions, I might take a look into that!
Hey, thanks for this tutorial. It's great! I uploaded the dataset, but I can't seem to use the data splits to train the model when using the API. When I fill in the filter_splits in the TextDataset.run() method, I receive the error "Provided data item filter is not valid." I used "training", "validation", and "test" in the three filter_splits parameters, matching the ml_use values recorded in my JSONL input file. Any idea why? Thanks again!!
Hello, thank you for your message, I am glad that you enjoyed the tutorial! Could you point me to the documentation for this TextDataset.run() method?
I can't find it on the official Google documentation page: cloud.google.com/python/docs/reference/aiplatform/1.22.0/google.cloud.aiplatform.TextDataset - but I am happy to think about potential solutions to your problem once you give me more details :)
I have a feeling that in order to train your model you simply need to run the training job via AutoMLTextTrainingJob.run() and provide your dataset. If you added a split into those three sets during dataset creation, it should be applied automatically.
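To make that concrete, here is a minimal sketch of what I mean, using the aiplatform SDK. All display names are illustrative, and the sentiment_max value is an assumption you should match to your own data; since the ml_use labels in the import file already define the split, no filter arguments are passed:

```python
def train_sentiment_model(dataset_id: str, project: str, location: str):
    """Sketch: run an AutoML text training job against an existing dataset.

    If the JSONL import file tagged each row with an ml_use label
    (training/validation/test), Vertex AI applies that split automatically,
    so no filter_splits-style argument is needed here.
    """
    from google.cloud import aiplatform  # pip install google-cloud-aiplatform

    aiplatform.init(project=project, location=location)
    dataset = aiplatform.TextDataset(dataset_id)
    job = aiplatform.AutoMLTextTrainingJob(
        display_name="sentiment-job",       # illustrative name
        prediction_type="sentiment",
        sentiment_max=2,                    # assumption: sentiment scale 0..2
    )
    # The predefined ml_use split from the import file is used automatically.
    return job.run(dataset=dataset, model_display_name="sentiment-model")
```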
I want to create a text dataset, but all of my text is in PDF form. How would I go about doing that?
Same here, did you ever find a resolution for this?
@August-m8l I didn't create a text dataset - I just used the PDFs as is. I created a GCS bucket with the PDFs, and I used Gemini multimodal to process the text within the files. Hope that helps!
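For anyone wanting to try this approach, a rough sketch of the Gemini step might look like the following. The model name and prompt are illustrative assumptions, not what the commenter necessarily used:

```python
def extract_pdf_text(gcs_pdf_uri: str) -> str:
    """Sketch: ask a Gemini multimodal model to transcribe a PDF stored in GCS.

    Assumes vertexai.init(project=..., location=...) has been called;
    model name and prompt are illustrative.
    """
    from vertexai.generative_models import GenerativeModel, Part

    model = GenerativeModel("gemini-1.5-flash")
    pdf = Part.from_uri(gcs_pdf_uri, mime_type="application/pdf")
    response = model.generate_content(
        [pdf, "Extract the plain text from this document."]
    )
    return response.text
```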
@bullmilkers thanks!
Could you please also attach the JSONL input file for sentiment analysis?
The file is in the linked GitHub repository: github.com/rafaello9472/c4ds/blob/main/Create%20text%20dataset%20in%20Vertex%20AI/input_file_sentiment_analysis.jsonl
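For readers who want to build such a file themselves, here is a small sketch that writes a JSONL import file. The example rows are made up, and the field names follow the Vertex AI text sentiment import schema as I understand it, so verify them against the repo file above:

```python
import json

# Hypothetical example rows; the ml_use label controls the data split.
rows = [
    {"textContent": "Great tutorial, thanks!",
     "sentimentAnnotation": {"sentiment": 2, "sentimentMax": 2},
     "dataItemResourceLabels": {"aiplatform.googleapis.com/ml_use": "training"}},
    {"textContent": "It did not work for me.",
     "sentimentAnnotation": {"sentiment": 0, "sentimentMax": 2},
     "dataItemResourceLabels": {"aiplatform.googleapis.com/ml_use": "test"}},
]

# One JSON object per line, as the dataset import expects.
with open("input_file_sentiment_analysis.jsonl", "w") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")
```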
How do I create my own text classification dataset?
Hey, I am not quite sure about what you want to know, could you explain your question a little bit more?