Data & Cloud by GT
What is a data architecture blueprint?
What is a data architecture blueprint?
What are the key design considerations?
What are common mistakes to avoid?
Watch this video to get answers to all these questions.
Topmate link:
topmate.io/gauravthalpati
My book on lakehouse architecture:
www.oreilly.com/library/view/practical-lakehouse-architecture/9781098153007/
Views: 7

Videos

Databricks Compute Options
Views: 10 · 14 hours ago
A quick primer on the various compute options that Databricks provides and when to use each for your data engineering use cases. For any further questions, you can connect with me on topmate.io/gauravthalpati
AWS Compute Services for Lakehouse
Views: 25 · 2 months ago
A point of view (PoV) on selecting various AWS compute services while implementing a data lakehouse architecture. In case you want to discuss further, you can book a session with me: topmate.io/gauravthalpati
Azure Compute Services for Lakehouse
Views: 36 · 2 months ago
Are you trying to select the right compute engine for your Azure-based lakehouse platform? This lecture will help you understand some of the key considerations. In case you want to discuss further, you can book a session with me: topmate.io/gauravthalpati
Storage & Compute | Why we should decouple them
Views: 40 · 2 months ago
Why do modern data platforms have loosely coupled storage and compute layers? What advantages does this provide? In case you want to discuss further, you can book a session with me: topmate.io/gauravthalpati
Amazon Athena for querying and updating Apache Iceberg tables
Views: 190 · 9 months ago
A quick demo showing how to query, update, and perform time travel using Amazon Athena with Apache Iceberg tables. In case you want to discuss further, you can book a session with me: topmate.io/gauravthalpati
Create Apache Iceberg Tables using AWS Glue
Views: 2.7K · 10 months ago
This demo will show you how to use AWS Glue ETL to create Apache Iceberg files and tables. In case you want to discuss further, you can book a session with me : topmate.io/gauravthalpati
AWS DMS S3 Catalog Support
Views: 69 · 1 year ago
New feature: AWS DMS now also generates an AWS Glue Data Catalog when migrating to Amazon S3. You can now migrate the data and immediately start running queries in Athena to explore and analyse this data. Reference: aws.amazon.com/about-aws/whats-new/2023/03/aws-database-migration-service-aws-glue-data-catalog-amazon-s3/
Understanding Lakehouse
Views: 98 · 1 year ago
In this lecture, we discuss lakehouse & its advantages. For a more detailed video, you can also have a look at th-cam.com/video/xLuBniDqpBU/w-d-xo.html In case you want to discuss further, you can book a session with me : topmate.io/gauravthalpati
Design Document | Part 12 -- The Conclusion
Views: 36 · 1 year ago
Welcome to this lecture series on Design Document Guidelines for Data & Analytics Platforms. This lecture series is based on my Medium blog post: medium.com/@gauravthalpati/design-document-for-data-platforms-932df482692d
Design Document | Part 7 -- Security & Governance Guidelines
Views: 110 · 1 year ago
Welcome to this lecture series on Design Document Guidelines for Data & Analytics Platforms. This lecture series is based on my Medium blog post: medium.com/@gauravthalpati/design-document-for-data-platforms-932df482692d
Design Document | Part 11 -- Appendix
Views: 23 · 1 year ago
Welcome to this lecture series on Design Document Guidelines for Data & Analytics Platforms. This lecture series is based on my Medium blog post: medium.com/@gauravthalpati/design-document-for-data-platforms-932df482692d
Design Document | Part 10 -- References
Views: 9 · 1 year ago
Welcome to this lecture series on Design Document Guidelines for Data & Analytics Platforms. This lecture series is based on my Medium blog post: medium.com/@gauravthalpati/design-document-for-data-platforms-932df482692d
Design Document | Part 9 -- Testing Guidelines
Views: 62 · 1 year ago
Welcome to this lecture series on Design Document Guidelines for Data & Analytics Platforms. This lecture series is based on my Medium blog post: medium.com/@gauravthalpati/design-document-for-data-platforms-932df482692d
Design Document | Part 8 -- Non Functional Requirements.
Views: 33 · 1 year ago
Welcome to this lecture series on Design Document Guidelines for Data & Analytics Platforms. This lecture series is based on my Medium blog post: medium.com/@gauravthalpati/design-document-for-data-platforms-932df482692d
Design Document | Part 6 -- Tech Solution
Views: 61 · 1 year ago
Design Document | Part 6 Tech Solution
Design Document | Part 5 -- Design Principles
Views: 33 · 1 year ago
Design Document | Part 5 Design Principles
Design Document | Part 4 -- Design Considerations
Views: 91 · 1 year ago
Design Document | Part 4 Design Considerations
Design Document | Part 2 - Background & Overview
Views: 79 · 1 year ago
Design Document | Part 2 - Background & Overview
Design Document | Part 3 -- Requirements
Views: 62 · 1 year ago
Design Document | Part 3 Requirements
Design Document | Part 1 - The Beginning
Views: 426 · 1 year ago
Design Document | Part 1 - The Beginning
Data Transformations - ETL vs ELT Approach Selection
Views: 119 · 1 year ago
Data Transformations - ETL vs ELT Approach Selection
Data Storage Options
Views: 81 · 1 year ago
Data Storage Options
Understanding Source Systems
Views: 70 · 1 year ago
Understanding Source Systems
Data Analytics Platform | Building Blocks
Views: 85 · 2 years ago
Data Analytics Platform | Building Blocks
GCP Data & Analytics | Overview in 8 minutes.
Views: 73 · 2 years ago
GCP Data & Analytics | Overview in 8 minutes.
Why GCP?
Views: 135 · 2 years ago
Why GCP?
Data Architecture Series | How to design the data access controls in data lake
Views: 104 · 2 years ago
Data Architecture Series | How to design the data access controls in data lake
Data Architecture Series | How to ingest data from On-premise DB to Cloud
Views: 130 · 2 years ago
Data Architecture Series | How to ingest data from On-premise DB to Cloud
Modern Data Fundamentals | Lecture 05 | Cloud-Native vs Managed vs External Services
Views: 94 · 2 years ago
Modern Data Fundamentals | Lecture 05 | Cloud-Native vs Managed vs External Services

Comments

  • @abhirajbhadane · 1 day ago

    Nice overview GT👍

  • @VedantBopardikar · 12 days ago

    How can we enable schema evolution in Glue scripts?

  • @Learn2Share786 · 2 months ago

    Thank you for the great explanation. Looking forward to more in the series.

  • @user-ni7mu5hi7g · 2 months ago

    thank u

  • @viewermm1588 · 2 months ago

    Hey, I have created a very simple Glue job to convert a JSON file to Iceberg. I used a folder from S3 (it has multiple folders within) as the source, with JSON as the data format, and S3 as the target, with format Apache Iceberg and compression type = GZIP. I selected a database and entered a table name for the Glue Data Catalog, but the job fails. Is there anything specific I should be looking at or changing?

    • @gauravthalpati · 2 months ago

      What's the error you are getting? Is it related to access permission for creating tables or anything else?

    • @viewermm1588 · 2 months ago

      @gauravthalpati It doesn't seem to be a permission issue, as I have used Glue to run CSV and other formats. I believe this is specific to using a JSON file in the source S3 bucket and converting the file to Iceberg. The error is: "unable to read cluserID from var /job/.....json , unable to read clusterif from local host , no custom resources configured for spark.driver"

    • @gauravthalpati · 2 months ago

      @viewermm1588 OK, maybe I'll try it out when I get some time.

  • @abhilashabaphna252 · 3 months ago

    Which IAM role do I need to create? What roles do I need to make?

  • @singhvineetkumar98 · 4 months ago

  • @NinjaSports007 · 4 months ago

    Nicely explained, but how do I create an Iceberg table with Glue using partition keys?

  • @CHiRaStar1 · 7 months ago

    I have scheduled a meeting, but it seems it is too late to get your slot. I am getting a Lake Formation issue: ‘default service AwSGlue status code 400’

  • @CHiRaStar1 · 7 months ago

    Great explanation 👌. Is there any chance I can get an opportunity to talk with you?

    • @gauravthalpati · 7 months ago

      Sure, you can book a call at - topmate.io/gauravthalpati

  • @soumyakantarath4551 · 8 months ago

    Hi, can I send you some design questions, and could you help me solve them? If yes, please let me know the medium where I can share them.

    • @gauravthalpati · 8 months ago

      You can book my time on topmate. Here is the link: topmate.io/gauravthalpati

  • @soumyakantarath4551 · 8 months ago

    Awesome - this is exactly what I was looking for.😊

  • @gauravthalpati · 10 months ago

    For more details, you can read this post - gauravthalpati.substack.com/p/vol-11-what-are-open-table-formats

  • @garydiaz8886 · 10 months ago

    Does this feature support partitioned tables in S3?

  • @ramadevi574swarna5 · 1 year ago

    Nice explanation sir

  • @yuvip153 · 1 year ago

    Nicely explained, thank you 😊👍

  • @iamuditmittal · 1 year ago

    Nice.

  • @IndranilChampati · 1 year ago

    Very nice Thank you ♾️

    • @gauravthalpati · 1 year ago

      Thanks for your feedback, glad you liked it!

  • @gowthamsagarkurapati9388 · 1 year ago

    Awesome explanation!

  • @ezeefied · 1 year ago

    It's such a critical document, yet teams often do not give much importance to keeping it up to date. Well said that consumers are different, so everyone has different expectations.

    • @gauravthalpati · 1 year ago

      Thanks for your input Saurabh. Yes - maintaining the design documents is very important yet neglected most of the time!

  • @ezeefied · 1 year ago

    Well explained!

  • @jyothipriyanka5000 · 1 year ago

    Simple and easy to understand...thank you

  • @niteshalaagh4107 · 1 year ago

    insightful

  • @ezeefied · 2 years ago

    This is really good 👍🏽 explanation with great diagrams

  • @ezeefied · 2 years ago

    Very helpful video for anyone picking up GCP data analytics

  • @ezeefied · 2 years ago

    Amazing presentation, I can see all the hard work done :) More power to you 🙌🏽

  • @ezeefied · 2 years ago

    Very well explained! I am trying to learn GCP at the moment, and the migration strategy that you explained for different table sizes is something I could relate to. Thank you for posting!

    • @gauravthalpati · 2 years ago

      Thanks Saurabh! Yes, this migration strategy can be applied to any cloud data movement: AWS, GCP, or Azure. A similar approach can also be used for any data migration, even on-prem to on-prem with just tech modernization. BTW, I've also started learning GCP and am enjoying it!

  • @richard8199 · 2 years ago

    Hi Gaurav, thanks for these lectures. I have one query; please help me resolve it. I am currently moving data from an on-premise DB to Snowflake via DataStage by implementing CDC Type 2. The CDC job is working fine, but I am facing issues with the upsert job, which is supposed to "Update Then Insert" the records. In the Snowflake Connector stage (in DataStage), there is no write mode like "Update Then Insert". Could you please tell me how to resolve this?

    • @gauravthalpati · 2 years ago

      @richard8199 Thanks for your comment. It's been a decade since I used DataStage! I am not very sure about the connector that you mentioned, but there are other ways you can try. Try the ELT method: load the data into Snowflake first and then do the CDC in Snowflake. You can also try using Snowflake Streams for CDC, if that helps.
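
The "load first, then apply CDC inside the warehouse" idea from the reply above can be illustrated with a minimal sketch of the "Update Then Insert" (Type 2) logic. This is plain in-memory Python, not DataStage or Snowflake code; the names `DimRow` and `apply_scd2` and the single tracked `value` column are hypothetical and chosen only for illustration. In practice the same two steps would typically run as SQL against a staging table after the load.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class DimRow:
    key: str                          # business key of the record
    value: str                        # tracked attribute (hypothetical single column)
    valid_from: date
    valid_to: Optional[date] = None   # None marks the current version


def apply_scd2(dim: List[DimRow], staged: Dict[str, str], load_date: date) -> List[DimRow]:
    """Apply staged rows to a Type 2 dimension: update (expire) first, then insert.

    Note: rows from `dim` are expired in place; the returned list also
    contains the newly inserted versions.
    """
    current = {row.key: row for row in dim if row.valid_to is None}
    result = list(dim)
    for key, value in staged.items():
        existing = current.get(key)
        if existing is not None and existing.value == value:
            continue  # unchanged record: nothing to update or insert
        if existing is not None:
            existing.valid_to = load_date  # "update" step: close the old version
        result.append(DimRow(key, value, load_date))  # "insert" step: new current version
    return result


if __name__ == "__main__":
    dim = [DimRow("c1", "Pune", date(2023, 1, 1))]
    staged = {"c1": "Mumbai", "c2": "Delhi"}  # already loaded into staging (the "L" in ELT)
    for row in apply_scd2(dim, staged, date(2023, 6, 1)):
        print(row)
```

Keeping the expire and insert steps together after the load is what removes the need for an "Update Then Insert" write mode in the loading tool itself.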

  • @ezeefied · 2 years ago

    great comparative analysis 👌

  • @pranavgodbole5688 · 2 years ago

    Good content!

  • @rohankarnataki · 2 years ago

    Very insightful !

  • @ezeefied · 2 years ago

    Another great video. It describes not just the processes but also gives a good overview of various tools and the use cases for those tools 👍🏽

  • @ezeefied · 2 years ago

    Interesting ! Looking forward to this one

    • @gauravthalpati · 2 years ago

      Sure, I will be adding more in the next few weeks. Do let me know if you want anything specific covered as well!

  • @ezeefied · 2 years ago

    Great effort Gaurav with the new series ! Let me add some unstructured data in the form of this comment here :)

    • @gauravthalpati · 2 years ago

      Thanks Saurabh ! ....haha ...yes pls...need more comments here .....analyzing comments data to extract sentiment analysis is one of the most popular use cases :)

  • @ezeefied · 2 years ago

    Great video Gaurav! I was also planning to explore more on Snowflake in the future

  • @gauravthalpati · 2 years ago

    For more details, you can read my blog on Medium at the link below: medium.com/@gauravthalpati/10-key-considerations-for-cloud-data-lakes-835bb364aab9

  • @Buzzingfact · 2 years ago

    Nice video. What should we learn along with Snowflake, SQL, and Python for better career growth? Any suggestions?

    • @gauravthalpati · 2 years ago

      Thanks! If you are starting your journey as a data engineer, then SQL & Python. In case you already have experience in these, you can learn one of the cloud platforms; there is a lot of content around AWS Data Services to learn from. In addition, Snowflake & Databricks will be a huge plus. Cloud + Data + AI is the current trend for better career growth in the data world :)