Mastering Data Validation🔍 : Building a Metadata-Driven Framework for Error-Free Data🎯

  • Published Nov 2, 2024

Comments • 8

  • @KnowledgeAmplifier1
    @KnowledgeAmplifier1  1 year ago +1

    Snowflake SQL Query:
    --------------------------------------
    drop database if exists ramu;
    use role accountadmin;
    --Database Creation
    create database if not exists ramu;
    --Specify the active/current database for the session.
    use ramu;
    create or replace TABLE RAMU.PUBLIC.dummy_table (
    a VARCHAR(16777216),
    b int,
    c int
    );
    INSERT INTO dummy_table VALUES ('foo' ,1, 5),
    ('bar' ,2, 6),
    ('baz' ,3, null);
    INSERT INTO dummy_table VALUES ('foo' ,10, 5);
    select count(*) from dummy_table where C is null;
    select min(b) from dummy_table;
    create or replace TABLE RAMU.PUBLIC.dq_check (
    table_name VARCHAR(16777216),
    description VARCHAR(16777216),
    sql_query_1 VARCHAR(16777216),
    sql_query_2 VARCHAR(16777216),
    comparison_type VARCHAR(16777216)
    );
    INSERT INTO dq_check VALUES ('dummy_table','Check if all the values of c are not null','select count(*) from dummy_table where C is null','select 0','equals');
    INSERT INTO dq_check VALUES ('dummy_table','Check if all the values of a is unique','with a_dups as (
    select a,count(*) as count_value from dummy_table group by a having(count_value>1))
    select count(*) from a_dups','select 0','equals');
    INSERT INTO dq_check VALUES ('dummy_table','Check if the minimum value of b column is 2','select min(b) from dummy_table','select 2','greater_equals');
    select * from dq_check where table_name='dummy_table';
    Python Code:
    -----------------------
    #pip install snowflake-connector-python
    #pip install "snowflake-connector-python[pandas]" -t .
    #pip install pandas -t .
    from snowflake.connector import connect
    import pandas as pd
    import os

    def run_query(conn, query):
        cursor = conn.cursor()
        cursor.execute(query)
        cursor.close()

    def run_query1(conn, query):
        cursor = conn.cursor()
        cursor.execute(query)
        records = cursor.fetchone()[0]
        cursor.close()
        return records

    def execute_test(db_conn, script_1, script_2, comp_operator):
        print("1st SQL Query : ", script_1)
        result_1 = run_query1(db_conn, script_1)
        print("2nd SQL Query : ", script_2)
        result_2 = run_query1(db_conn, script_2)
        print("result 1 = " + str(result_1))
        print("result 2 = " + str(result_2))
        # compare values based on the comp_operator
        if comp_operator == "equals":
            return result_1 == result_2
        elif comp_operator == "greater_equals":
            return result_1 >= result_2
        elif comp_operator == "greater":
            return result_1 > result_2
        elif comp_operator == "less_equals":
            return result_1 <= result_2
        return False
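
    To see the comparison logic end to end without a Snowflake account, the flow above can be exercised with a stub connection that returns canned scalar results. This is a minimal sketch: the StubConnection/StubCursor classes and the hard-coded query results are assumptions for illustration, not part of the original framework; with a real deployment you would pass the snowflake.connector connection instead and fetch the check rows from the dq_check table.

    ```python
    class StubCursor:
        """Pretends to run a query and hands back a canned scalar result."""
        def __init__(self, results):
            self._results = results
        def execute(self, query):
            self._value = self._results[query]
        def fetchone(self):
            return (self._value,)
        def close(self):
            pass

    class StubConnection:
        def __init__(self, results):
            self._results = results
        def cursor(self):
            return StubCursor(self._results)

    def run_query1(conn, query):
        cursor = conn.cursor()
        cursor.execute(query)
        record = cursor.fetchone()[0]
        cursor.close()
        return record

    def execute_test(db_conn, script_1, script_2, comp_operator):
        result_1 = run_query1(db_conn, script_1)
        result_2 = run_query1(db_conn, script_2)
        if comp_operator == "equals":
            return result_1 == result_2
        elif comp_operator == "greater_equals":
            return result_1 >= result_2
        elif comp_operator == "greater":
            return result_1 > result_2
        elif comp_operator == "less_equals":
            return result_1 <= result_2
        return False

    # Canned results mirroring the sample data: one row has c = null,
    # and the minimum of b is 1 after the inserts above.
    conn = StubConnection({
        "select count(*) from dummy_table where C is null": 1,
        "select 0": 0,
        "select min(b) from dummy_table": 1,
        "select 2": 2,
    })
    # Two of the dq_check rows: (sql_query_1, sql_query_2, comparison_type)
    checks = [
        ("select count(*) from dummy_table where C is null", "select 0", "equals"),
        ("select min(b) from dummy_table", "select 2", "greater_equals"),
    ]
    results = [execute_test(conn, s1, s2, op) for s1, s2, op in checks]
    print(results)  # [False, False] -- both checks fail on the sample data
    ```

    Driving the checks from rows in dq_check rather than hard-coding them is the point of the metadata-driven design: adding a new validation is an INSERT, not a code change.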

  • @mangesh4231
    @mangesh4231 11 months ago +1

    Very detailed contents and brief explanation, thanks a lot for all the efforts.

  • @madhubhardwaj4512
    @madhubhardwaj4512 1 year ago +3

    You are the most underrated YouTuber for data engineering.. thanks a lot for your time and effort on the videos.
    Can you please make some videos on spark streaming ? Thank you in advance.

    • @KnowledgeAmplifier1
      @KnowledgeAmplifier1  1 year ago +1

      Thank you so much for your kind words and support madhu bhardwaj! I truly appreciate your feedback 😊 Spark Streaming is definitely a topic I have in mind for future videos. I'll make sure to include it in my upcoming content roadmap. Stay tuned for more exciting videos on data engineering, including Spark Streaming. Thank you again for your encouragement and continued support! 🙏🎥

  • @aghulus111
    @aghulus111 1 year ago +1

    Thank u so much🎉🎉

  • @lifelearningneo
    @lifelearningneo 1 year ago +2