Centralised Data Sharing using Analytics Hub

  • Published on 14 Oct 2024
  • Sharing data in a medium-to-large organisation has always been a big challenge.
    In today's talk I describe some of the data sharing challenges I have seen over the past years in different organisations, and how the new Google Cloud product Analytics Hub can potentially solve them in a much easier and more user-friendly way for the analytics community.
    01:50 - Data Sharing challenges
    04:59 - What is Analytics Hub
    08:48 - A quick demo
    16:25 - Centralise data sharing using Analytics Hub
    21:41 - Data Clean Room
    24:16 - The trend to remove ETL on data sharing
    26:41 - Summary
    Link to the slide: docs.google.co...

Comments • 16

  • @nishantmiglani7021 4 months ago +1

    Thanks a lot, Richard He, for creating this insightful video on Analytics Hub.

  • @rudytrisaputra2301 11 months ago +1

    Thank you for sharing, Richard! I am truly interested in exploring the concept of a Data Clean Room, as we want to facilitate data sharing for transformation without the need for data movement processes. So is the better way to do data sharing with Analytics Hub to create a new project in which to deploy Analytics Hub, so that this project centralises the sharing process?

    • @practicalgcp2780 11 months ago

      Thank you for the comment! In my opinion, it’s a good model to have a centralised project for creating exchanges, where you can centralise who owns them and who can publish, and enforce consistent naming conventions, so it doesn’t become a mess.
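One way to picture the "central project plus consistent naming" idea is a small helper that rejects exchange IDs that break the convention and builds the Analytics Hub resource names. This is only a sketch: the `<domain>_<purpose>` convention, the project ID `central-sharing`, and the function names are assumptions made up for illustration. Only the `projects/.../locations/.../dataExchanges/.../listings/...` name layout follows the Analytics Hub API.

```python
import re

# Hypothetical convention (an assumption, not an Analytics Hub rule):
# <domain>_<purpose>, lowercase letters, digits and underscores only.
EXCHANGE_ID_PATTERN = re.compile(r"[a-z][a-z0-9]*_[a-z0-9_]+")


def exchange_name(project: str, location: str, exchange_id: str) -> str:
    """Build the resource name of an exchange in the central project."""
    if not EXCHANGE_ID_PATTERN.fullmatch(exchange_id):
        raise ValueError(
            f"exchange id {exchange_id!r} does not follow <domain>_<purpose>"
        )
    return f"projects/{project}/locations/{location}/dataExchanges/{exchange_id}"


def listing_name(exchange: str, listing_id: str) -> str:
    """Build the resource name of a listing inside an exchange."""
    return f"{exchange}/listings/{listing_id}"


if __name__ == "__main__":
    # "central-sharing" is a placeholder for the platform team's project.
    exchange = exchange_name("central-sharing", "eu", "finance_billing")
    print(exchange)
    print(listing_name(exchange, "monthly_invoices"))
```

A check like this could run in CI before any exchange is created, so every team publishes through the same project and the same naming scheme.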

  • @WiktorJurek 1 year ago

    This is a pretty cool breakdown - where do you see the Analytics Hub configuration sitting? In the data generator project, or in a project of its own?

    • @practicalgcp2780 1 year ago +1

      Thank you! I am not sure what the best design is, but in my opinion it would be better to keep all the exchanges in a single separate project managed by the data platform team. That way you can apply governance and privacy controls much more easily. If you keep them in the source projects, you could still end up with each team doing whatever it likes, and it’s more difficult to monitor as well.

  • @andrzejmaj3190 1 year ago

    Thank you for that. One question - if I'm not mistaken, Analytics Hub won't assist when querying tables located across multiple regions, like the US and EU, without some form of replication. Is that correct?

    • @practicalgcp2780 1 year ago +1

      Hi there, no it won’t. But Google just announced dataset replication in preview; check it out here: cloud.google.com/bigquery/docs/data-replication

    • @practicalgcp2780 1 year ago

      Actually, I think I may have misunderstood the purpose of data replication. I think it is intended more for a primary/replica disaster recovery use case, or for data migrations between regions, not for querying the data from a separate region, which I think is what you are trying to achieve.

  • @mohdabbas7794 8 months ago +1

    Sir, please make a video on the same topic with VPC SC.

    • @practicalgcp2780 6 months ago

      Can you give a bit more detail on what problem you are trying to solve with VPC SC?

    • @practicalgcp2780 6 months ago +1

      Our team published something a while back that you might find useful: medium.com/@vmo2techteam/how-we-secured-our-data-on-the-cloud-341d4ac394b9

    • @mohdabbas7794 6 months ago +1

      @practicalgcp2780 The problem statement is something like this:
      Let's say we have a VPC SC restricted environment where Project A is a centralised data sharing project for BigQuery. We create exchanges and listings in Project A to share data with the other projects that consume it. To allow communication between the centralised Project A and those consuming projects, what should the VPC SC service perimeter policies be? For example, which ingress and egress policies?

    • @practicalgcp2780 6 months ago

      It really depends on how you set things up in your org. Typically you may not want too many perimeters in the same org, because the overhead may be too much. A single perimeter for the whole org is also a valid setup: it prevents risks from outside the org, while within the org no whitelisting is required.
      I haven’t done this for Analytics Hub, but I believe it’s the same: you need to whitelist both ingress and egress rules, as you are trying to access data from outside your org.
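To make the ingress and egress idea concrete, the sketch below builds rule objects in the general shape that VPC Service Controls perimeters use (`ingressFrom`/`ingressTo` and `egressFrom`/`egressTo`). It is an untested illustration, not a verified policy for Analytics Hub. The project numbers, the service account, and the choice to allow every method of both `analyticshub.googleapis.com` and `bigquery.googleapis.com` are all assumptions that would need tightening and checking against the official documentation.

```python
import json

# Assumption: both services are involved when subscribing to and
# querying a listing; check which ones your setup actually needs.
SERVICES = ("analyticshub.googleapis.com", "bigquery.googleapis.com")


def operations(services=SERVICES):
    """Allow every method of each service (deliberately broad for the sketch)."""
    return [
        {"serviceName": name, "methodSelectors": [{"method": "*"}]}
        for name in services
    ]


def ingress_rule(identities, source_project, target_project):
    """Let identities calling from source_project reach target_project inside the perimeter."""
    return {
        "ingressFrom": {
            "identities": identities,
            "sources": [{"resource": source_project}],
        },
        "ingressTo": {
            "operations": operations(),
            "resources": [target_project],
        },
    }


def egress_rule(identities, external_project):
    """Let identities inside the perimeter reach external_project outside it."""
    return {
        "egressFrom": {"identities": identities},
        "egressTo": {
            "operations": operations(),
            "resources": [external_project],
        },
    }


if __name__ == "__main__":
    # Placeholder values: projects/111 stands for the central sharing
    # project (Project A), projects/222 for a consuming project.
    consumer = ["serviceAccount:consumer@example.iam.gserviceaccount.com"]
    print(json.dumps(ingress_rule(consumer, "projects/222", "projects/111"), indent=2))
    print(json.dumps(egress_rule(consumer, "projects/111"), indent=2))
```

If everything sits in a single org-wide perimeter, as suggested above, rules like these should only be needed for projects outside that perimeter.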

  • @harshchoudhary6069 6 months ago

    How can we share an authorized view using Analytics Hub?

    • @practicalgcp2780 6 months ago

      It makes no difference when using authorised views, because authorised view permissions are managed the same way as tables, unlike normal views. However, authorised views have some trade-offs, a key one being the loss of metadata such as column descriptions, which isn’t great for data consumers. They do have an advantage if you don’t want to duplicate data models or increase latency.