5. Count rows in each column where NULLs are present | Top 10 PySpark Scenario-Based Interview Questions

  • Published Dec 25, 2024

Comments • 8

  • @exploreyouremotions • several months ago

    With the latest version of PySpark, null values are automatically excluded from the count. There are two scenarios: if you do count(columnName), the nulls are excluded and only non-null rows are counted, but if you do count(*), it will include the null rows as well. Anyway, all the videos are super helpful, thanks!
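
That behavior matches standard SQL COUNT semantics in Spark. A minimal sketch to verify it (the tiny DataFrame here is illustrative, not from the video):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import count

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1, "A"), (2, None), (3, None)], ["id", "name"])

    # count("name") skips nulls; count("*") counts every row
    df.select(count("name").alias("non_null_names"), count("*").alias("total_rows")).show()
    # +--------------+----------+
    # |non_null_names|total_rows|
    # +--------------+----------+
    # |             1|         3|
    # +--------------+----------+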

  • @2412_Sujoy_Das • 1 year ago +1

    Sagar sir... my solution in Spark SQL:

    df_1 = spark.read.csv("dbfs:/FileStore/tables/Spark_Practise_1.csv", header=True)
    df_1.createOrReplaceTempView("Sujoy_1")

    %sql
    SELECT SUM(CASE WHEN ID LIKE 'null' THEN 1 ELSE 0 END) AS ID,
           SUM(CASE WHEN Name LIKE 'null' THEN 1 ELSE 0 END) AS Name,
           SUM(CASE WHEN Age LIKE 'null' THEN 1 ELSE 0 END) AS Age
    FROM Sujoy_1
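
This works because the CSV is read without a nullValue option, so missing values arrive as the literal string 'null' rather than real NULLs. A rough DataFrame-API equivalent of the same conditional aggregation (assuming the df_1 from this comment, with the same literal-'null' caveat):

    from pyspark.sql.functions import col, when, sum as spark_sum

    # one pass over the data: add 1 for every 'null' string in each column
    df_1.select(
        [spark_sum(when(col(c) == "null", 1).otherwise(0)).alias(c) for c in df_1.columns]
    ).show()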

  • @biramdevpawar9902 • 1 year ago +1

    from pyspark.sql.functions import col

    column_counts = {}
    for c in df1.columns:
        # filter down to the null rows of this column, then count them
        column_counts[c] = df1.filter(col(c).isNull()).count()
    print(column_counts)
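
Worth noting: this loop triggers a separate Spark job per column (one count() action each), whereas the single-select solutions in the other comments compute every column's null count in one pass.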

  • @tanushreenagar3116 • 10 months ago

    nice

  • @surbhinabira3514 • 8 months ago +1

    df = spark.read.option("nullValue", "null").csv("dbfs:/FileStore/testing.csv", header=True)
    df.createOrReplaceTempView("temp")
    display(spark.sql("""
        SELECT count(*) - count(id)   AS nullcount_for_id,
               count(*) - count(name) AS nullcount_for_name,
               count(*) - count(age)  AS nullcount_for_age
        FROM temp
    """))
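
This leans on the same COUNT semantics noted in the first comment: count(col) skips NULLs while count(*) does not, so their difference is exactly the null count. The nullValue option is what makes the literal 'null' strings in the CSV parse as real NULLs in the first place.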

  • @syedahamed3728 • 5 months ago

    from pyspark.sql.functions import col, sum as spark_sum

    # cast each column's isNull() boolean to 0/1 and sum it per column
    df1 = df.select([spark_sum(col(c).isNull().cast('int')).alias(c) for c in df.columns])
    df1.show()
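
isNull() yields a boolean column, the cast to int turns it into 0/1, and sum() then counts the nulls; doing this inside a single select covers all columns in one pass.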

  • @manjulagulabal2923 • 1 year ago +3

    from pyspark.sql.functions import count

    data = [
    (1, "A", 23),
    (2, "B", None),
    (3, "C", 56),
    (4, None, None),
    (5, None, None)
    ]
    data_schema = ['ID', 'Name', 'Age']
    df = spark.createDataFrame(data, data_schema)
    # total rows minus non-null count = null count per column
    df1 = df.select([(df.count() - count(i)).alias(i) for i in df.columns])
    df1.show()
    +---+----+---+
    | ID|Name|Age|
    +---+----+---+
    |  0|   2|  3|
    +---+----+---+
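
One caveat: df.count() sits inside the list comprehension, so it fires a separate count job for every column. Computing the total once up front avoids that (same logic, minor rework):

    total = df.count()  # run the full-row count once, not once per column
    df1 = df.select([(total - count(i)).alias(i) for i in df.columns])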