Data Warehousing With BigQuery: Best Practices (Cloud Next '19)

  • Published on Jul 8, 2024
  • Take an in-depth look at modern data warehousing using BigQuery and how to operate your data warehouse in the cloud. During this session, we'll give lessons learned and best practices from prior implementations to give you the playbook for implementing your own modern data warehouse.
    BigQuery ML → bit.ly/2Khkqoq
    BigQuery for data warehouse practitioners → bit.ly/2TWh6P9
    Watch more:
    Next '19 Data Analytics Sessions here → bit.ly/Next19DataAnalytics
    Next ‘19 All Sessions playlist → bit.ly/Next19AllSessions
    Subscribe to the GCP Channel → bit.ly/GCloudPlatform
    Speaker(s): Ryan McDowell, Alban Perillat-Merceroz
    Session ID: DA307
    product:BigQuery; fullname:Ryan McDowell; event: Google Cloud Next 2019; re_ty: Publish; product: Cloud - Data Analytics - BigQuery; fullname: Ryan McDowell;
  • Science & Technology

Comments • 21

  • @kunfang2457 3 years ago +4

    The best 'best practice' sharing session I've ever had.

  • @anandakumarsanthinathan4740 2 years ago +1

    Beautifully explained. Many thanks. Now, why wouldn't anybody want to move to the sunny side of France and land a job at Teads !! Both of the presenters did an excellent job.

  • @3rdinnings326 1 year ago +1

    Excellent session!

  • @TheMitali_ 4 years ago +2

    How do I sum column values in BigQuery when the number of columns is not fixed? At runtime we need to decide how many columns to include in the sum, depending on user input.

  • @ThisIsAli_Off 1 year ago

    Very useful!

  • @jennwng 9 months ago +3

    At 13:36, why do we need to load into GCS first for batch loads? Can't we use Dataflow / BigQuery batch load / Data Fusion directly from GoldenGate?

  • @krishnabg1350 4 years ago +5

    Can you share slides?

  • @LuckyHongTJ 8 months ago +2

    At 13:35, what is the benefit of loading data from Oracle into GCS first and then BigQuery (i.e. won't directly loading into BigQuery from Oracle be faster)? I know GCS can serve as a staging area and we get another copy of the data for fault tolerance, but is there any other benefit? Thanks :)

  • @amieewright4417 3 years ago

    H thanks u so much love you amiee Wright and Bella daws Xxxxx

  • @mzamanmintu3694 1 year ago

    Congratulations

  • @jennwng 9 months ago +2

    At 11:58, why can Dataflow / Data Fusion speed up pipelines? The choice is essentially whether to do the transformation in BigQuery or in Data Fusion. Even if the query is complex, won't it still be faster to do the transformation directly in BigQuery, rather than connecting to Data Fusion and transforming the data there?

    • @vishnureddys4801 3 months ago

      It is because they can bill you for both services: BigQuery and also Dataflow.

  • @Thiago280690 1 year ago

    At 6:30 a ten-buck note shows up out of nowhere, hahaha, loved it

  • @Vinch157 5 years ago +15

    Hello, will you also share the slides? Thank you

  • @JOPINC 3 years ago

    Dedicated to JMB!!!! :)

  • @LouisChiaki 3 years ago +3

    How is this different from Spark?

    • @jean4j_ 2 years ago

      Well it's auto-managed and much simpler to work with.
      That's just SQL. You don't need to fine-tune the configuration like you would for a Spark job I think.

  • @sunilpipara 2 years ago +2

    Kafka to S3 and S3 to Cassandra via Spark is the wrong approach.

    • @anandakumarsanthinathan4740 2 years ago +1

      Yes, @Sunil Jain. I feel the same too. The source application would directly write it out to S3 for batch processing. Kafka would be required only for the streaming jobs.

  • @GreenPower4ever 3 years ago

    P