07 Spark Streaming Read from Files | Flatten JSON data

  • Published on 13 Jan 2025

Comments • 12

  • @gautamKumar-dg3ss
    @gautamKumar-dg3ss 8 months ago +2

    Very informative. Please make more projects on streaming.

  • @revathinp4551
    @revathinp4551 7 months ago +1

    For me, clearSource and sourceArchive are not working: files are not getting archived, and the archive folder is not getting created. What could be the issue?

    • @easewithdata
      @easewithdata 6 months ago

      Please check this link to confirm that you are setting all the required parameters: spark.apache.org/docs/latest/structured-streaming-programming-guide.html
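One requirement in the guide linked above, as I read it, is that `sourceArchiveDir` must not match the source pattern up to the shallower of the two path depths, so that archived files are never picked up again as new input. The helper below is a hypothetical pure-Python pre-check that approximates that rule with `fnmatch`. It is not Spark's own validation code, the function name is made up for illustration, and it assumes absolute POSIX paths.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def archive_dir_conflicts(source_pattern: str, archive_dir: str) -> bool:
    """Hypothetical pre-check approximating the documented cleanSource rule.

    Returns True when archive_dir matches source_pattern component by
    component up to the shallower of the two depths (directories counted
    from the root), i.e. when archived files could be read again as input.
    Assumes absolute POSIX paths; this is not Spark's own validation code.
    """
    src_path, arc_path = PurePosixPath(source_pattern), PurePosixPath(archive_dir)
    if not (src_path.is_absolute() and arc_path.is_absolute()):
        raise ValueError("both paths must be absolute")
    src_parts = src_path.parts[1:]  # drop the leading "/"
    arc_parts = arc_path.parts[1:]
    depth = min(len(src_parts), len(arc_parts))
    # Source components may contain glob characters such as "?" or "*".
    return all(
        fnmatchcase(arc, src)
        for src, arc in zip(src_parts[:depth], arc_parts[:depth])
    )


source = "/home/jovyan/spark-streaming/data/input/device_files/"
print(archive_dir_conflicts(source, source + "archive"))  # True: inside the source dir
print(archive_dir_conflicts(source, "/home/jovyan/spark-streaming/data/archive"))  # False
```

A sibling directory such as `.../data/archive` passes the check, while a folder nested inside the input directory does not. A relative value like `archive_der` would have to be resolved to an absolute path before a check like this applies.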

  • @kamilstolarz7017
    @kamilstolarz7017 9 months ago

    Hi, in my example I had to set a schema for the streaming input files. I figured it out, but was that a mistake on my part, or is your environment configured differently so that it allows streaming without setting a schema?

    • @easewithdata
      @easewithdata 9 months ago

      We specify a schema for streaming data to make sure the events are not malformed. If you still want to infer the schema at run time, you can set spark.sql.streaming.schemaInference to true.

    • @kamilstolarz7017
      @kamilstolarz7017 9 months ago +1

      @easewithdata Oh, sorry, I watched the video again and now I see your comment about schemaInference. Anyway, thanks for the reply, and keep going, because you are doing a good job!

    • @jayantmeshram7370
      @jayantmeshram7370 4 months ago +1

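      from pyspark.sql import SparkSession

      spark = SparkSession.builder.getOrCreate()
      input_path = "/home/jovyan/spark-streaming/data/input/device_files/"

      # Read the existing files once as a batch job to infer their schema
      static_df = spark.read.json(input_path)
      inferred_schema = static_df.schema
      # Print the inferred schema
      static_df.printSchema()

      # Config keys are case-sensitive ("schemaInference", not "SchemaInference").
      # This setting only matters when no explicit schema is given to readStream.
      spark.conf.set("spark.sql.streaming.schemaInference", "true")

      streaming_df = (
          spark
          .readStream
          .schema(inferred_schema)
          .option("cleanSource", "archive")
          .option("sourceArchiveDir", "archive_der")
          .option("maxFilesPerTrigger", 1)
          .format("json")
          .load(input_path)
      )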
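A related detail from the same programming guide: when archiving does take effect, Spark moves each processed file under `sourceArchiveDir` while keeping the file's own path, so `/a/b/dataset.txt` archived to `/archived/here` lands at `/archived/here/a/b/dataset.txt`. The hypothetical helper below only computes that destination, which can help when checking whether archived files exist but sit deeper than expected. It assumes absolute paths; how Spark resolves a relative value such as `archive_der` is not modeled here.

```python
from pathlib import PurePosixPath


def archived_location(source_file: str, archive_dir: str) -> str:
    """Hypothetical helper: destination of a processed file under
    cleanSource=archive, following the documented rule that Spark keeps the
    file's own path underneath sourceArchiveDir. Computes the path only; it
    does not move anything. Assumes absolute POSIX paths.
    """
    src, arc = PurePosixPath(source_file), PurePosixPath(archive_dir)
    if not (src.is_absolute() and arc.is_absolute()):
        raise ValueError("both paths must be absolute")
    # Re-root the full source path (minus its leading "/") under the archive dir.
    return str(arc.joinpath(*src.parts[1:]))


print(archived_location("/a/b/dataset.txt", "/archived/here"))
# /archived/here/a/b/dataset.txt
```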

  • @user-eg1ss7im6q
    @user-eg1ss7im6q 7 months ago

    Thanks very much for the clip, it was very helpful, but I have two questions. My Jupyter notebook doesn't show the directory in the left panel, and the write stream seemed to take forever, even though it did write to the CSV file. How can I solve this?

    • @easewithdata
      @easewithdata 6 months ago

      Thanks ❤️ If you like my content, please make sure to share it with your LinkedIn network 🛜
      As for the write stream taking forever, can you share the code?

  • @vishalalagh1031
    @vishalalagh1031 7 months ago

    clearSource and sourceArchiveDir are not working: the files are not archived from the input folder and are still there, and no archive folder is created when I run the same code, even though the streaming itself works perfectly fine. What could be the possible reasons?
    As for the content: really helpful and to the point, with an actual use case. Thanks for putting up such informative content.

    • @easewithdata
      @easewithdata 7 months ago

      If possible, can you paste your code here?