Dan Young
  • 17
  • 4 523
Looker to Omni Analytics - part 3
In this video, I demonstrate at a high level how to convert LookML content into Omni Analytics using a robust and scalable architecture in Google Cloud Platform (GCP). The solution leverages Cloud Run for serverless containerized application hosting, Cloud Build for continuous integration and deployment, and Artifact Registry for managing container images. Infrastructure-as-code is handled with Terraform, streamlining the deployment process while ensuring reliability. At the heart of this setup is a Clojure application, efficiently deployed and executed via Cloud Run, showcasing a modern and cost-effective cloud-native approach to analytics conversion. This video covers the architecture, tools, and workflow in detail to inspire cloud developers, data engineers, and analytics enthusiasts working on similar migrations or transformations.
Views: 10

Videos

Looker to Omni Analytics - part 2
5 views · 4 hours ago
Just an update: there are no drones of interest here, move along......
Migrating LookML to Omni Analytics - part 1
2 views · 8 days ago
In this video, I will demonstrate how I leverage context-free grammars in Clojure to effectively parse LookML code and transform it into a format suitable for Omni Analytics. LookML, the modeling language for Looker, can be complex, and parsing it accurately is crucial for data analysis and visualization. By utilizing Clojure's powerful parsing libraries, particularly Instaparse, I can define g...
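To make the parse-and-transform idea concrete, here is a minimal hand-rolled sketch in JavaScript, not the Instaparse grammar from the video, that parses a tiny LookML-like subset into a plain object tree; the sample view and every name in it are hypothetical.

// A simplified stand-in for a grammar-driven parser: tokenize, then recursively
// collect `key: value` pairs and nested `name { ... }` blocks into an object tree.
function tokenize(src) {
  const tokens = [];
  const re = /\s*([A-Za-z_]\w*|\{|\}|:|[^\s{}]+)/g;
  let match;
  while ((match = re.exec(src)) !== null) tokens.push(match[1]);
  return tokens;
}

function parseBlock(tokens, pos) {
  const node = {};
  while (pos < tokens.length && tokens[pos] !== "}") {
    const key = tokens[pos++];                 // e.g. "view", "dimension", "type"
    if (tokens[pos++] !== ":") throw new Error(`expected ':' after ${key}`);
    const value = tokens[pos++];               // e.g. "orders", "order_id", "number"
    if (tokens[pos] === "{") {                 // named nested block
      const [child, next] = parseBlock(tokens, pos + 1);
      pos = next + 1;                          // skip the closing "}"
      (node[key] = node[key] || {})[value] = child;
    } else {
      node[key] = value;                       // simple property
    }
  }
  return [node, pos];
}

const parseLookML = src => parseBlock(tokenize(src), 0)[0];

// Hypothetical input:
const lookml = `
view: orders {
  dimension: order_id { type: number }
  dimension: status { type: string }
}`;
console.log(JSON.stringify(parseLookML(lookml), null, 2));

From an object tree like this, emitting the target model format is essentially a tree walk, which is the overall shape of the conversion whether the parser is Instaparse or hand-rolled.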
Container overrides with Google Cloud Run Jobs/Workflows
116 views · 3 months ago
In this video I demonstrate how to use container overrides with Cloud Workflows and Cloud Run Jobs. Google Cloud Run Jobs is a fully managed service on Google Cloud that enables the execution of containerized batch jobs without the need to manage infrastructure. Unlike Cloud Run services, which handle HTTP-based requests, Cloud Run Jobs are designed for workloads that run to completion, such as...
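For readers curious what such an override looks like at the API level, below is an illustrative Node.js sketch of the Cloud Run Admin API v2 jobs.run call with container overrides, which is the call a Cloud Workflows step ultimately drives; the project, region, job name, args, and environment values are all placeholders.

// npm install google-auth-library; relies on Application Default Credentials.
const { GoogleAuth } = require("google-auth-library");

async function runJobWithOverrides() {
  const auth = new GoogleAuth({ scopes: "https://www.googleapis.com/auth/cloud-platform" });
  const client = await auth.getClient();
  const job = "projects/my-project/locations/us-central1/jobs/my-batch-job"; // placeholder

  const { data: operation } = await client.request({
    url: `https://run.googleapis.com/v2/${job}:run`,
    method: "POST",
    data: {
      overrides: {
        containerOverrides: [
          {
            args: ["--date", "2024-01-01"],                           // per-execution args
            env: [{ name: "TARGET_TABLE", value: "staging.orders" }], // per-execution env vars
          },
        ],
        taskCount: 1,
      },
    },
  });
  console.log("Started execution operation:", operation.name);
}

runJobWithOverrides().catch(console.error);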
Using variables in Google Cloud Dataform for dynamic SQL workflows
727 views · 7 months ago
In this video I demonstrate how to use variables in Dataform and orchestrate running your transformation via Google Cloud Workflows. In Google Cloud Dataform, you can pass variables to your Dataform projects via compilation overrides and use them within your SQL and JavaScript transformations. This feature enables dynamic behavior in your data pipelines, allowing you to parameterize aspects of ...
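As a small, hypothetical illustration of the pattern described above: compilation variables declared in dataform.json can be read in JavaScript through dataform.projectConfig.vars and shared across SQLX files (the file and variable names below are made up).

// includes/settings.js (hypothetical file and variable names)
// Reads compilation variables; the defaults come from dataform.json unless a
// compilation override (or `dataform compile --vars=...`) replaces them.
const env = dataform.projectConfig.vars.env || "dev";
const startDate = dataform.projectConfig.vars.start_date;

module.exports = { env, startDate };

// In a SQLX file the values can then be referenced as, for example:
//   SELECT * FROM my_table WHERE event_date >= "${settings.startDate}"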
Using Google Data Loss Prevention (DLP) to detect and redact data in log files
40 views · 8 months ago
In this video I demonstrate how to use Google Data Loss Prevention (DLP) to detect email addresses in log files. DLP allows for detection and redaction of sensitive personally identifiable information (PII) and offers several advantages: Accuracy: Google DLP leverages advanced machine learning algorithms to accurately identify and classify sensitive data, including PII. This helps ensure that no ...
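To give a rough sense of what the detect-and-redact call looks like, here is an illustrative Node.js sketch using the @google-cloud/dlp client to replace email addresses in a log line with the info type name; the project ID and sample log line are placeholders, and the video's own implementation may differ.

// npm install @google-cloud/dlp; uses Application Default Credentials.
const { DlpServiceClient } = require("@google-cloud/dlp");
const dlp = new DlpServiceClient();

async function redactEmails(projectId, logLine) {
  const [response] = await dlp.deidentifyContent({
    parent: `projects/${projectId}/locations/global`,
    inspectConfig: { infoTypes: [{ name: "EMAIL_ADDRESS" }] },
    deidentifyConfig: {
      infoTypeTransformations: {
        transformations: [
          // Replace each detected value with its info type, e.g. [EMAIL_ADDRESS]
          { primitiveTransformation: { replaceWithInfoTypeConfig: {} } },
        ],
      },
    },
    item: { value: logLine },
  });
  return response.item.value;
}

redactEmails("my-project", "2024-01-01 INFO user jane.doe@example.com logged in")
  .then(console.log)    // "2024-01-01 INFO user [EMAIL_ADDRESS] logged in"
  .catch(console.error);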
Converting LookML to OmniML
93 views · 10 months ago
In this video I go over a little project I've been working on to convert LookML to OmniML. I reach into the tool-bag for a trusted Swiss-army-knife tool: context-free grammars (CFGs). CFGs offer a structured and powerful method for parsing content, allowing for the analysis of complex linguistic structures in natural language or syntactic patterns in programming languages. Unlike regular ...
Using Google Cloud Dataform for SQL workflows
1.6K views · 11 months ago
In this video I cover in more detail how to use Google Cloud Dataform for orchestrating SQL workflows. Google Cloud Workflows, Cloud Run, and Dataform seamlessly come together to empower organizations in orchestrating scalable and efficient data pipelines within the Google Cloud Platform (GCP) ecosystem. This powerful combination enables the automation of complex workflows, the deployment...
How to leverage Cloud Workflows, Cloud Run Jobs, and Dataform for data pipeline orchestration
382 views · 11 months ago
In this video I cover at a high level how we leverage Google Cloud Workflows, Cloud Run, and Dataform for orchestrating scalable and efficient data pipelines within the Google Cloud Platform (GCP) ecosystem. This powerful combination enables the automation of complex workflows, the deployment of containerized applications, and the management of data transformations for enhanced data analytics an...
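To ground the Dataform piece of this orchestration, here is an illustrative sketch of the two API calls such a pipeline typically makes, compiling with variable overrides and then invoking by tag, written as plain REST from Node.js; the project, repository, variable, and tag names are hypothetical, and a Cloud Workflows step would make the same calls with http.post.

// npm install google-auth-library; uses Application Default Credentials.
const { GoogleAuth } = require("google-auth-library");

const REPO = "projects/my-project/locations/us-central1/repositories/analytics"; // placeholder

async function runDataform() {
  const auth = new GoogleAuth({ scopes: "https://www.googleapis.com/auth/cloud-platform" });
  const client = await auth.getClient();
  const base = `https://dataform.googleapis.com/v1beta1/${REPO}`;

  // 1. Create a compilation result, passing compilation variables (overrides).
  const { data: compilation } = await client.request({
    url: `${base}/compilationResults`,
    method: "POST",
    data: {
      gitCommitish: "main",
      codeCompilationConfig: { vars: { env: "prod", start_date: "2024-01-01" } },
    },
  });

  // 2. Invoke only the actions tagged "daily" from that compilation.
  const { data: invocation } = await client.request({
    url: `${base}/workflowInvocations`,
    method: "POST",
    data: {
      compilationResult: compilation.name,
      invocationConfig: { includedTags: ["daily"] },
    },
  });
  console.log("Started workflow invocation:", invocation.name);
}

runDataform().catch(console.error);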
Google Duet AI for SQL analysis
60 views · 1 year ago
In this video we roll our sleeves up and write some SQL without knowing SQL! AI is here to stay and getting more powerful every day; let's lean into this advancement and leverage its power to make us smarter and more productive. #sql #duet #ai
Using Numaflow SideInputs with Duckdb
178 views · 1 year ago
Numaflow is a powerful Kubernetes-native tool that allows you to implement a streaming data processing pipeline quickly. In this demo I'll illustrate the use of Duckdb and Numaflow to enrich streaming data using an external source. #numaflow #duckdb #googlecloud #kubernetes #streaming #dataengineers
Numaflow stream processing
279 views · 1 year ago
Demo using Numaflow for Map and Reduce operations, including using a fixed window to calculate an average.
Argo Workflows: Orchestrating Salesforce Bulk API Exports
111 views · 1 year ago
Short follow-up demonstration on using Argo Workflows to orchestrate the submission of Bulk API exports via SOQL queries.
Use Argo Workflows for a robust solution to orchestrate Salesforce Bulk Extracts
53 views · 1 year ago
Argo Workflows provides a robust and efficient solution for orchestrating data pipelines, particularly when integrating with the Salesforce Bulk Extract API. Leveraging Argo Workflows in this context enables the seamless coordination of complex data extraction processes from Salesforce. With Argo's declarative YAML syntax, users can define clear and reproducible workflows, specifying the order ...
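For context on the Salesforce side, here is an illustrative Node.js sketch of the Bulk API 2.0 query-job lifecycle (submit a SOQL query, poll until complete, download the CSV) that an Argo Workflows step could wrap in a container; the instance URL, API version, token handling, and query are placeholders rather than the setup shown in the video.

// Requires Node 18+ for global fetch; an OAuth access token is assumed to exist already.
const INSTANCE = "https://my-org.my.salesforce.com";   // placeholder instance URL
const TOKEN = process.env.SF_ACCESS_TOKEN;
const HEADERS = { Authorization: `Bearer ${TOKEN}`, "Content-Type": "application/json" };

async function submitBulkQuery(soql) {
  const res = await fetch(`${INSTANCE}/services/data/v58.0/jobs/query`, {
    method: "POST",
    headers: HEADERS,
    body: JSON.stringify({ operation: "query", query: soql }),
  });
  const job = await res.json();   // contains the job id and initial state
  return job.id;
}

async function fetchResults(jobId) {
  // Poll until Salesforce marks the job complete, then download the first page of CSV
  // results (result paging via the Sforce-Locator header is omitted for brevity).
  for (;;) {
    const status = await (
      await fetch(`${INSTANCE}/services/data/v58.0/jobs/query/${jobId}`, { headers: HEADERS })
    ).json();
    if (status.state === "JobComplete") break;
    if (status.state === "Failed") throw new Error(status.errorMessage || "Bulk query job failed");
    await new Promise(resolve => setTimeout(resolve, 5000));
  }
  const res = await fetch(`${INSTANCE}/services/data/v58.0/jobs/query/${jobId}/results`, {
    headers: HEADERS,
  });
  return res.text();
}

submitBulkQuery("SELECT Id, Name FROM Account")
  .then(fetchResults)
  .then(csv => console.log(`exported ${csv.trim().split("\n").length - 1} rows`))
  .catch(console.error);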
Google Data Loss Prevention (DLP) and Clojurescript
52 views · 1 year ago
Google's Data Loss Prevention (DLP) API offers a robust solution for scanning and alerting on sensitive and personally identifiable information (PII). Leveraging advanced machine learning algorithms, the API can analyze vast amounts of data to identify and categorize sensitive content such as credit card numbers, social security numbers, and other personal information. Through its sophistica...
Argo Workflows - Zendesk
148 views · 2 years ago
Using Argo Workflows orchestrating parallel jobs and tasks on Kubernetes
651 views · 2 years ago

Comments

  • @TheMatheus180993 · a month ago

    Hi Dan, great video! I want to create repositories by domain, where each domain contains data layers divided into different BigQuery projects. How can I use variables so that when the project is in dev, Dataform creates the tables in dev, but when the release is in prod, it uses the database declaration from each SQLX file's config?

    • @danoyoung · a month ago

      To clarify, are you wanting "dev" to go into a dev dataset and "prod" to go into a prod dataset, but all within the same GCP project?

    • @TheMatheus180993 · a month ago

      @danoyoung Not exactly. The data layers live in 4 different projects (raw, bronze, silver and gold) and the Dataform repositories are per domain (sales, marketing, ...). The problem is that each Dataform repository needs to save tables into 4 different projects, so the release overrides don't work; we have to set the database manually inside the config. I tried to create a variable for that: vars: df_environment: "dev" database_dev: dev_database, and then selecting the database based on the env: database: "{{ dataform.projectConfig.vars.df_environment == 'production' ? 'prodmanual_database' : default_database }}". This last part, the condition inside the config{}, doesn't work.
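      One way to express that kind of environment-driven target, sketched here with hypothetical project and table names and building on the df_environment variable above (not verified against this exact setup), is to declare the action in a JS file, where the config object can be computed freely instead of inside config{}:

        // definitions/orders.js (hypothetical): pick the BigQuery project per environment.
        const isProd = dataform.projectConfig.vars.df_environment === "production";

        publish("orders", {
          type: "table",
          database: isProd ? "gold-prod-project" : "gold-dev-project",  // placeholder project IDs
          schema: "sales",
        }).query(ctx => `SELECT * FROM ${ctx.ref("stg_orders")}`);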

  • @udayverma6120 · 3 months ago

    Where can I find the code for this?

  • @Satenc0 · 7 months ago

    Nice, I had no idea you could do it that way with Workflows; no other tutorial shows it. A couple of questions: 1) I notice at 4:18 you mention that the tag is what targets the specific actions to execute, but I guess it's also possible to specify the action names without using tags? 2) At 0:20 I see the syntax to access the vars with dataform.projectConfig.vars. Do you know how to access the vars in a js block? I haven't figured out the syntax for that. Great video, thanks mate.

    • @danoyoung · 7 months ago

      I think I know what you're trying to do as it relates to question #2. You can create a JS file that exports function(s) returning the config data you want, for example an fns.js file:

        const funcs = {
          start_date: () => `${dataform.projectConfig.vars.my_start_date}`,
          end_date: () => `${dataform.projectConfig.vars.my_end_date}`,
        };
        module.exports = funcs;

      In your SQLX file you can then reference the fns calls as follows:

        SELECT * FROM my_table
        WHERE my_date_column BETWEEN "${fns.start_date()}" AND "${fns.end_date()}";

      This will take the defaults set in the dataform.json vars section, and if you use a compilation override it will use those values instead. As for #1, you might want to check out includedTargets via the InvocationConfig. I have a working example of this if you want to see it in action, LMK.

    • @Satenc0 · 7 months ago

      @danoyoung Thanks mate, the reason I wasn't able to reference the variables in the js blocks was that I was missing the backticks around them (lol)... and for #1, of course, I'd very much appreciate seeing that. I was thinking that maybe there is another property you could use in the .yml instead of includedTags, maybe one that lets you directly list the names of the actions/files to execute.

  • @Satenc0 · 7 months ago

    Hello, do you know how to use variables in Dataform? For example, if I have a query that selects records between specific timestamps/dates, can the date be a variable? Or, if I have a table with response status codes, can I use a variable in the query so it returns records with status code = variable, which could be 200, 404, etc.?

    • @danoyoung · 7 months ago

      I believe this is possible by defining your variable in your config.js, and then you can use the dataform CLI run/compile with your vars, e.g. dataform compile --vars=my_status_code=200

    • @danoyoung · 7 months ago

      This works, I just created a quick demo, I'll go ahead and get it posted ....

    • @Satenc0 · 7 months ago

      @danoyoung Nice, yes it also works by using the "compilation overrides" in the Dataform interface when creating scheduled compilations. Please share the link to the post, I'd like to see that way too.

    • @danoyoung · 7 months ago

      @Satenc0 Yes, this is what I've covered in the new video that's being uploaded, check back!

  • @asearson · 10 months ago

    Definitely going to use this!

    • @danoyoung · 10 months ago

      Let's sync up next week and review this, see what we should focus on next

  • @asearson · 10 months ago

    This is awesome!!

  • @EnoEbiong · 11 months ago

    Thanks once again for doing your best to show everything else about Dataform and related topics.

  • @EnoEbiong · 11 months ago

    I appreciate your lesson; it helped my understanding of Dataform-based SQL workflows.

  • @asearson · 11 months ago

    This is pretty rad! Is there a dependency UI or do you know if Google is planning on adding something like that?

    • @danoyoung · 11 months ago

      Yes, if you click on the Compiled Graph link you'll get the entire dependency chain

  • @zacharyyoung6118 · 1 year ago

    Super Swag Man!!

  • @asearson · 1 year ago

    SFDC DUMMY!!!

  • @marcalanstreeter · 1 year ago

    Really enjoyed the walk through. Do you happen to have a demo repo to try this out locally?

    • @danoyoung · 1 year ago

      I have it in a private git repo, but please reach out if you have any questions!

  • @pancakez7022 · 1 year ago

    It seems like a mini Apache NiFi, but running in Kubernetes.

    • @danoyoung · 1 year ago

      Thank you for the comment. I see it as a competitor to NiFi and, more closely, to GCP Dataflow/Apache Beam streaming.

  • @wjava · 1 year ago

    At 5:40, the "out" vertex pod was autoscaled down to 0 due to no traffic.

  • @asearson · 1 year ago

    Pretty cool stuff. What would also be cool is if you could pass a salt/key and hash the address before the '@'. Then provide an option to un-hash the values on request.

    • @danoyoung · 1 year ago

      Yes, that's doable

  • @asearson · 2 years ago

    ETL’ing like a Boss Man!!

    • @danoyoung · 1 year ago

      skate to where the puck is going!