We've edited the video to remove the part where you paste your GCP Service Account directly into a Kestra Workflow. This workflow was only intended to be a helper workflow to add everything to the KV Store for you and not to be committed to GitHub. Instead, you can go to Namespaces -> zoomcamp -> KV Store and add everything there from the UI. This will not get committed and pushed to git, only the workflows are.
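For reference, here's a minimal sketch of how a flow can read those values back from the KV Store instead of hard-coding the service account; the key names and the CreateBucket task are just examples and may not match the course flows exactly:

```yaml
id: gcp_kv_example
namespace: zoomcamp

tasks:
  # Pulls the credentials and project settings from the namespace KV Store
  # via the kv() function, so no secret ever lands in the committed YAML.
  - id: create_gcs_bucket
    type: io.kestra.plugin.gcp.gcs.CreateBucket
    serviceAccount: "{{ kv('GCP_CREDS') }}"
    projectId: "{{ kv('GCP_PROJECT_ID') }}"
    name: "{{ kv('GCP_BUCKET_NAME') }}"
    ifExists: SKIP
```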
Thank God, I almost freaked out. You guys got me good right there 😌
Pro tip: when setting the name for the dataset, make sure you use underscores, not hyphens, otherwise you get an invalid dataset ID error. See the sketch below.
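For example, a hypothetical dataset-creation task (task type and names assumed here, not taken from the video):

```yaml
- id: create_bq_dataset
  type: io.kestra.plugin.gcp.bigquery.CreateDataset
  name: zoomcamp_dataset      # OK: letters, digits and underscores only
  # name: zoomcamp-dataset    # rejected by BigQuery as an invalid dataset ID
  ifExists: SKIP
```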
I would say it's inaccurate to say that the data is loaded into the external table. With an external table, only the table metadata is stored in BigQuery; the data itself stays external to BigQuery. It reads from GCS in this case, which is far more efficient than loading it just for temporary purposes.
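As a rough sketch of what that looks like (project, dataset and file names are placeholders, not the exact course flow), the DDL only registers metadata that points at GCS:

```yaml
- id: bq_external_table
  type: io.kestra.plugin.gcp.bigquery.Query
  serviceAccount: "{{ kv('GCP_CREDS') }}"
  sql: |
    -- BigQuery stores only the table definition; the rows stay in GCS
    CREATE OR REPLACE EXTERNAL TABLE `my_project.my_dataset.yellow_tripdata_ext`
    OPTIONS (
      format = 'CSV',
      uris = ['gs://my_bucket/yellow_tripdata_2019-01.csv']
    );
```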
Good spot! Add it to the community notes on GitHub
Thanks for the module !
4:47, what should the key be here? And should we paste the whole JSON file as the value? And what do you mean by "get the key correct, so it works correct"?
Make sure you copy and paste the entire content of the JSON file. One thing to watch for is making sure you don't accidentally add extra whitespace or new lines at the start.
@@kestra-io That doesn't answer the question. OK for the value, but what key should you add?
@@alexandreterrand-jeanne117 GCP_CREDS
@@alexandreterrand-jeanne117 Check 5:23 where he shows the KV Store, or look at how it is used later in the taxi data flow. He uses GCP_CREDS. You can use whatever you like, just make sure it matches whatever you put for serviceAccount in the plugin config of your flow.
Very helpful, thank you
Is there a video on ETL pipelines in Kestra using Azure services? I want to understand ETL pipelines with Azure services in Kestra.
We’ve got it on the backlog!
I don't know what is wrong, but I am unable to do the setup on GCP through Kestra. The error given to me is that my service account does not have permission to create the bucket.
The service account has the Storage Admin and BigQuery Admin roles. I thought something was wrong with the key, but I typed it directly into the key value and still had the same issue.
I switched the order in the flow to see if it would create the dataset first, but that didn't help. The names for the objects are globally unique, and I also gave the Storage Object Admin role to the service account, which didn't help either.
I used the gcloud CLI to authenticate as the same service account and was able to create the bucket from there, so I guess the problem isn't the service account.
Any clues what it might be?
Have you correctly added the GCP project ID to Kestra? The project ID and project name can look similar but are not the same, so do check!
@@kestra-io Yep, that was the issue, oh well... 😢
Thanks for the help, you've been replying fairly quickly here, I appreciate it!
@@kestra-io This solved my problem, thanks for the pointer.
I am facing the same kind of issue: Kestra indicates success at all steps, but I couldn't find any change in GCP. Kindly help.
Hey Will, apologies, this is such a basic question, but I've got myself into a muddle and possibly missed this in the other videos. I'm using a GCP VM and have been able to extract and load data into Postgres just to test the flow between the two, but I haven't loaded all the data yet.
My question is, do I need to import all the data from CSV into Postgres (as per 2.2.3), or should I just pick up the process from here and load the data straight into BigQuery?
Thanks again for all your help, I'm learning this from scratch and it's been a really good experience so far :)
Ignore me, I have just continued to work through the 2.2.3 videos and imported all the data :)
When I pasted my GCP_CREDS I got an error message: Illegal flow yaml: Duplicate field 'type' at [Source: (StringReader); line: 12
Try pasting it directly in the KV Store from the UI. Namespaces -> zoomcamp -> KV Store
Thanks a lot for not spending hours in the videos showing us how to write code 🙃
Very informative and great examples. This creates a good image for Kestra.