Your videos are awesome... we wish for more complex data ETL pipeline projects from you, please!
Sure
In terms of automation, how can this data transfer process be automated (Pub/Sub --> Dataflow job --> BigQuery)? The moment data arrives in GCP Pub/Sub, the pipeline job should trigger automatically and store the data in BigQuery.
Hey, thanks for teaching, great explanation. I want to ask a stupid question > <
Why not send data directly to BigQuery? (only 1 step)
Sending to Pub/Sub => Dataflow => BigQuery is 3 steps... Thanks!!!
This is a valid question for sure. A use case can be implemented in different ways, but as professionals we always try to provide an efficient, optimized solution.
1. Why not send data directly to BigQuery? (only 1 step)
Ans: If I do this, it's not streaming, it's just batch processing or a one-time migration; for that I can use the BigQuery Data Transfer Service or a Cloud Storage transfer. But to implement any streaming use case we need a streaming service like Pub/Sub (or a relevant third-party service) in GCP.
2. Send to Pub/Sub => Dataflow => BigQuery
Ans: I can answer this in two ways:
a. I can use Pub/Sub with a BigQuery subscription, where I stream data from the topic and push it to a BQ table through the subscription. This is one way of doing it.
(or)
b. I can do it as shown in the video, where my Pub/Sub topic doesn't have its own subscription; the messages are pushed from the topic to BQ via Dataflow's predefined template.
My objective in this video is to show how to use the predefined templates provided by GCP in Cloud Dataflow to build a streaming pipeline from a Pub/Sub topic to BQ (there's a small launch sketch after this reply).
The same use case can also be implemented with other services like Dataproc or Data Fusion, or with a simple Python script.
I hope it makes sense now. Please let me know if you have any questions 😀
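For anyone who wants to try option (b) programmatically, here's a minimal sketch of launching the predefined Pub/Sub-to-BigQuery classic template through the Dataflow API from Python. The project, topic, and table names are placeholders, and the exact template path and parameters should be double-checked against the current GCP documentation:

from googleapiclient.discovery import build  # pip install google-api-python-client

PROJECT = "my-project"      # placeholder
REGION = "us-central1"      # placeholder

dataflow = build("dataflow", "v1b3")

# Launch the Google-provided classic streaming template that reads from a
# Pub/Sub topic and writes each message to a BigQuery table.
response = dataflow.projects().locations().templates().launch(
    projectId=PROJECT,
    location=REGION,
    gcsPath="gs://dataflow-templates/latest/PubSub_to_BigQuery",
    body={
        "jobName": "pubsub-to-bq-stream",
        "parameters": {
            "inputTopic": f"projects/{PROJECT}/topics/my-topic",      # placeholder topic
            "outputTableSpec": f"{PROJECT}:my_dataset.my_table",      # placeholder table
        },
    },
).execute()

print(response)

Once launched, the job runs as a streaming pipeline, so new messages arriving on the topic are written to BigQuery continuously; nothing has to be re-triggered per message.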
Amazing content! Thanks
Glad it was helpful!
Awesome explanation, I really wanted to know this. If I migrate the Control-M workload automation tool to GCP, how will I connect Control-M to Pub/Sub?
Thanks. I'm not really sure.
Hey bro, thanks for the video. I have an ETL process running on a VM, using Docker and Kafka, and the data gets stored in BigQuery as soon as I run the producer and consumer manually. I wanted to use Cloud Composer to automate this (e.g., whenever I log in to my VM the ETL process starts automatically), but I couldn't. Can you tell me if it's possible to do this with Dataflow? I'm having trouble setting it up.
Please drop a mail to cloudaianalytics@gmail.com
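In the meantime, a rough sketch of how a Cloud Composer (Airflow) DAG could start the same Pub/Sub-to-BigQuery Dataflow template. The operator comes from the apache-airflow-providers-google package, and every project/topic/table name below is a placeholder:

from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.dataflow import (
    DataflowTemplatedJobStartOperator,
)

PROJECT = "my-project"      # placeholder
REGION = "us-central1"      # placeholder

with DAG(
    dag_id="pubsub_to_bq_dataflow",
    start_date=datetime(2024, 1, 1),
    schedule_interval=None,   # trigger manually, or set a cron/interval
    catchup=False,
) as dag:
    start_streaming_job = DataflowTemplatedJobStartOperator(
        task_id="start_pubsub_to_bq",
        template="gs://dataflow-templates/latest/PubSub_to_BigQuery",
        project_id=PROJECT,
        location=REGION,
        job_name="pubsub-to-bq-stream",
        parameters={
            "inputTopic": f"projects/{PROJECT}/topics/my-topic",    # placeholder
            "outputTableSpec": f"{PROJECT}:my_dataset.my_table",    # placeholder
        },
    )

Note this only starts the Dataflow job; the existing Kafka producer/consumer on the VM would still need to be orchestrated separately (or replaced by publishing to Pub/Sub).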
Thanks for your videos, I find them helpful. I could get a message published by a Python script to Pub/Sub written into the data column of a BigQuery table simply by creating a subscription on the same topic that writes to BigQuery, without using Dataflow. Since Pub/Sub is schemaless, it receives whatever schema the Python script publishes. My question is: is there a way to update a BigQuery table using the same schema received in Pub/Sub?
Yes, you can, and Pub/Sub is not schemaless: you can define a schema on a Pub/Sub topic.
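A minimal sketch of that idea with the google-cloud-pubsub client (all names are placeholders): define an Avro schema, attach it to the topic, and then create a BigQuery subscription with use_topic_schema so each message field lands in the matching table column instead of the single data column:

from google.cloud.pubsub import PublisherClient, SchemaServiceClient, SubscriberClient
from google.pubsub_v1.types import BigQueryConfig, Encoding, Schema

project_id = "my-project"          # placeholders throughout
schema_id = "orders-schema"
topic_id = "orders-topic"
subscription_id = "orders-bq-sub"
bigquery_table = "my-project.my_dataset.orders"   # columns must match the schema fields

# 1. Define an Avro schema for the messages.
schema_client = SchemaServiceClient()
schema_path = schema_client.schema_path(project_id, schema_id)
avsc = """
{"type": "record", "name": "Order",
 "fields": [{"name": "order_id", "type": "string"},
            {"name": "amount", "type": "double"}]}
"""
schema_client.create_schema(
    request={
        "parent": f"projects/{project_id}",
        "schema_id": schema_id,
        "schema": Schema(name=schema_path, type_=Schema.Type.AVRO, definition=avsc),
    }
)

# 2. Create the topic with that schema attached (JSON-encoded messages).
publisher = PublisherClient()
topic_path = publisher.topic_path(project_id, topic_id)
publisher.create_topic(
    request={
        "name": topic_path,
        "schema_settings": {"schema": schema_path, "encoding": Encoding.JSON},
    }
)

# 3. Create a BigQuery subscription that writes each schema field to the
#    matching table column instead of dumping everything into the data column.
subscriber = SubscriberClient()
subscription_path = subscriber.subscription_path(project_id, subscription_id)
subscriber.create_subscription(
    request={
        "name": subscription_path,
        "topic": topic_path,
        "bigquery_config": BigQueryConfig(table=bigquery_table, use_topic_schema=True),
    }
)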
Hi, thanks for the great, informative video. Can you explain the flow when the data source is a REST API? Can we configure Dataflow to extract from a REST API into BigQuery without involving Cloud Functions or Apache Beam scripts? Thanks a lot in advance.
Sure... I'll make a video on this.