Real-time Streaming Data from Pub/Sub to BigQuery Using Dataflow in GCP

  • Published 11 Dec 2024

Comments • 17

  • @naren06938
    @naren06938 months ago

    Your videos are awesome... we'd love more complex data ETL pipeline projects from you, please

  • @subhankarb100
    @subhankarb100 3 days ago

    In terms of automation, how can this data transfer process (Pub/Sub --> Dataflow job --> BigQuery) be automated? The moment data arrives in Pub/Sub, the pipeline job should trigger automatically and store the data in BigQuery.
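
[Editor's note] With the Google-provided streaming template used in the video, no per-message trigger is needed: the Dataflow job, once launched, runs continuously and writes each Pub/Sub message to BigQuery as it arrives. A minimal launch sketch with the gcloud CLI, where the project, bucket, topic, and table names are placeholders:

```shell
# Launch a streaming Dataflow job from the Google-provided
# Pub/Sub-to-BigQuery classic template. All resource names below
# (my-project, my-bucket, my-topic, my_dataset.my_table) are placeholders.
gcloud dataflow jobs run pubsub-to-bq-stream \
  --project=my-project \
  --region=us-central1 \
  --gcs-location=gs://dataflow-templates-us-central1/latest/PubSub_to_BigQuery \
  --staging-location=gs://my-bucket/temp \
  --parameters=inputTopic=projects/my-project/topics/my-topic,outputTableSpec=my-project:my_dataset.my_table
```

The job stays in a running state until you drain or cancel it, so "automation" here is simply keeping the streaming job alive.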

  • @KyouKo-x7g
    @KyouKo-x7g a year ago +1

    Hey, thx for teaching, good explanation. I want to ask a stupid question > <
    Why not send data directly to BigQuery?? (only 1 step)
    Send to Pub/Sub => Dataflow => BigQuery (3 steps....) Thx!!!

    • @cloudaianalytics6242
      @cloudaianalytics6242  a year ago +3

      This is a valid question for sure. A use case can be implemented in different ways, but as professionals we always aim to provide an efficient, optimized solution.
      1. Why not send data directly to BigQuery?? (only 1 step)
      Ans: If I do this, it's not a streaming service; it's just batch processing or a migration, for which I could use BQ Data Transfer or a Cloud Storage transfer. But to implement any streaming use case we need a streaming service like Pub/Sub or a relevant third-party service on GCP.
      2. Send to Pub/Sub => Dataflow => BigQuery
      Ans: I can answer this in two ways.
      a. I can use Pub/Sub with a subscription and BigQuery, streaming data from the topic and pushing it into the BQ table through the subscription. This is one way of doing it.
      (or)
      b. I can do it as shown in the video, where my Pub/Sub topic has no subscription; it publishes the messages to BQ via Dataflow's predefined template.
      My objective in this video is to show how to use the predefined templates provided by GCP in Cloud Dataflow to build a streaming pipeline from a Pub/Sub topic to BQ.
      The same use case could be implemented using other services like Dataproc or Data Fusion, or with a simple Python script.
      I hope it makes sense now. Please let me know if you have any questions😀
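
[Editor's note] Option (a) in the reply above (topic → BigQuery subscription, no Dataflow in between) can be sketched with the gcloud CLI; the topic, subscription, and table names are placeholders:

```shell
# Create a BigQuery subscription: Pub/Sub writes each message on the
# topic straight into the table, no Dataflow job required.
# my-topic, my-bq-sub, my-project, my_dataset.my_table are placeholders.
gcloud pubsub subscriptions create my-bq-sub \
  --topic=my-topic \
  --bigquery-table=my-project:my_dataset.my_table \
  --use-topic-schema
```

With `--use-topic-schema`, fields from the topic's schema are mapped to same-named table columns; without it, the raw payload lands in a single `data` column, as described in a comment below.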

  • @riyanshigupta950
    @riyanshigupta950 7 months ago

    Amazing content! Thanks

  • @ainvondegraff5233
    @ainvondegraff5233 8 months ago

    Awesome explanation, really wanted to know this. If I migrate the Control-M workload automation tool to GCP, how will I connect Control-M to Pub/Sub?

  • @Rajdeep6452
    @Rajdeep6452 9 months ago

    Hey bro, thanks for the video. I have an ETL process running on a VM using Docker and Kafka, and the data gets stored in BigQuery as soon as I run the producer and consumer manually. I wanted to use Cloud Composer to automate this (so the ETL process starts automatically, e.g. whenever I log in to my VM), but I couldn't. Can you tell me if it's possible to do this with Dataflow? I'm having trouble setting it up.

    • @cloudaianalytics6242
      @cloudaianalytics6242  months ago

      Please drop a mail to cloudaianalytics@gmail.com

  • @ushasribhogaraju8895
    @ushasribhogaraju8895 8 months ago

    Thanks for your videos, I find them helpful. I managed to get a message published by a Python script to Pub/Sub written into the data column of a BigQuery table, simply by creating a subscription on the same topic that writes to BigQuery, without using Dataflow. Since Pub/Sub is schemaless, it accepts whatever schema the Python script publishes. My question is: is there a way to update a BigQuery table using the same schema received in Pub/Sub?

    • @cloudaianalytics6242
      @cloudaianalytics6242  months ago

      Yes, you can, and Pub/Sub is not schemaless: you can define a schema on a Pub/Sub topic.
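
[Editor's note] A Pub/Sub topic can have an Avro (or Protobuf) schema attached, and a BigQuery subscription created with `--use-topic-schema` then maps schema fields to table columns. A minimal local sketch in Python, assuming a hypothetical `SensorReading` schema; only the JSON payload shaping is shown here, the actual publish call would use the google-cloud-pubsub client:

```python
import json

# Hypothetical Avro schema you might register as a Pub/Sub schema resource,
# so a BigQuery subscription (--use-topic-schema) can map fields to columns.
AVRO_SCHEMA = {
    "type": "record",
    "name": "SensorReading",
    "fields": [
        {"name": "device_id", "type": "string"},
        {"name": "temperature", "type": "double"},
        {"name": "ts", "type": "string"},
    ],
}

def conforms(record: dict) -> bool:
    """Cheap local check: record keys exactly match the schema's field names."""
    expected = {f["name"] for f in AVRO_SCHEMA["fields"]}
    return set(record) == expected

# Example message a Python publisher script might send to the topic.
msg = {"device_id": "dev-42", "temperature": 21.5, "ts": "2024-12-11T10:00:00Z"}
payload = json.dumps(msg).encode("utf-8")  # bytes you would pass to publish()
print(conforms(msg))
```

Validating locally like this before publishing mirrors what Pub/Sub does server-side when a schema is attached: messages that don't conform are rejected at publish time.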

  • @zzzmd11
    @zzzmd11 8 months ago

    Hi, thanks for the great informative video. Can you explain the flow if the data source is a REST API? Can we configure Dataflow to extract from a REST API into BigQuery without involving Cloud Functions or Apache Beam scripts? Thanks a lot in advance.