Spark Installation | PySpark Installation | Windows 10 / 11 | Step by Step

  • Published Dec 18, 2024

Comments • 51

  • @DEwithDhairy
    @DEwithDhairy  11 months ago +1

    If you don't want to use a virtual environment's Python,
    add the environment variable below.
    Variable Name : PYSPARK_PYTHON
    Variable Value : C:\Users\{your_user_name}\AppData\Local\Programs\Python\{YOUR_PYTHON_VERSION}\python.exe
    If you add the "PYSPARK_PYTHON" variable, you will not need to set the os.environ variables in the code.
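
    A minimal sketch of the in-code alternative this comment describes (i.e. what
    setting the system-level PYSPARK_PYTHON variable lets you skip):

        # Point Spark's worker and driver Python at the interpreter running
        # this script -- the in-code equivalent of the variable above.
        import os
        import sys

        os.environ["PYSPARK_PYTHON"] = sys.executable
        os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable

        from pyspark.sql import SparkSession
        spark = SparkSession.builder.getOrCreate()
        spark.range(3).show()  # quick smoke test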

  • @pramilaj78
    @pramilaj78 9 months ago

    Really, you are a great tutor.
    I literally struggled googling the errors from running PySpark files. Finally your video helped me. Many thanks!

  • @BiswajitSibun-n4b
    @BiswajitSibun-n4b 4 months ago

    The best video about this topic I found on YT

    • @DEwithDhairy
      @DEwithDhairy  4 months ago

      Glad you found it helpful.

  • @ithisrinu9593
    @ithisrinu9593 4 months ago

    I really appreciate you, brother. I was encountering many issues that I could not figure out, but this video resolved all the errors. Thank you.

    • @DEwithDhairy
      @DEwithDhairy  4 months ago

      Glad you found it useful.

  • @nagumallasravansai249
    @nagumallasravansai249 3 months ago

    you are awesome buddy!

  • @ПавелГорюнов-п3в
    @ПавелГорюнов-п3в 13 days ago

    Thank you so much!!!! Very useful stuff.

    • @ПавелГорюнов-п3в
      @ПавелГорюнов-п3в 13 days ago

      Additionally, I needed to install py4j through pip install py4j. After that it worked.
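
      py4j is normally pulled in automatically as a pyspark dependency, so needing a
      separate pip install py4j usually hints at a broken or mixed Python environment.
      A minimal sketch to confirm which py4j the interpreter actually sees:

          # Confirm py4j is importable from the same interpreter that runs
          # Spark, and show where it was loaded from.
          from importlib.metadata import version
          import py4j

          print(version("py4j"), "->", py4j.__file__)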

    • @DEwithDhairy
      @DEwithDhairy  13 days ago

      Glad you liked it.

  • @satheeshkumarak6708
    @satheeshkumarak6708 10 months ago

    You are a Life Saver!!!

  • @g.suresh430
    @g.suresh430 11 months ago

    Hi, nice explanation. Thanks for making the video. I request you to make a video on how to write a df to a CSV file.

    • @DEwithDhairy
      @DEwithDhairy  11 months ago

      Thanks.
      Sure, will make one.

    • @DEwithDhairy
      @DEwithDhairy  11 months ago

      Do check out the other playlists as well.

    • @g.suresh430
      @g.suresh430 11 months ago

      @@DEwithDhairy Sure

  • @MuekeMwangangi
    @MuekeMwangangi 1 month ago

    This helped me

    • @DEwithDhairy
      @DEwithDhairy  1 month ago

      Glad it helped you.

  • @nsreeabburi2292
    @nsreeabburi2292 2 months ago

    Thanks Dhairy. I am trying to run this via a notebook, and when I execute the code I get a Py4JJavaError. Also, how can I get a PySpark kernel in the notebook? Do you have any idea about it?
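
    For notebook users hitting a Py4JJavaError at startup, one common workaround is
    the findspark package (an assumption here, installed separately with
    pip install findspark), which wires an existing Spark install into a plain
    Jupyter kernel -- a minimal sketch:

        # findspark reads SPARK_HOME and adds pyspark to sys.path,
        # so a regular Python kernel can start a local Spark session.
        import findspark
        findspark.init()

        from pyspark.sql import SparkSession
        spark = SparkSession.builder.master("local[*]").getOrCreate()
        spark.range(3).show()  # if this prints, the notebook can reach Spark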

  • @srinivasn2646
    @srinivasn2646 4 months ago

    Thanks, man!

    • @DEwithDhairy
      @DEwithDhairy  4 months ago

      Glad you liked it.

  • @ashokraj-g5o
    @ashokraj-g5o 11 months ago

    Is Java 17 incompatible with Hadoop 3.3.5?

    • @DEwithDhairy
      @DEwithDhairy  10 months ago

      Can't recollect.
      Check the documentation.

  • @Rayudu_Alapati
    @Rayudu_Alapati 11 months ago

    Hi. When I'm running pyspark in Command Prompt it is showing an error. And when I'm initializing a variable like x = sc.textFile("Readme"),
    it is giving the error that sc is not defined. Please help.

    • @DEwithDhairy
      @DEwithDhairy  11 months ago

      Please send the entire code,
      and check whether you have installed all the tools successfully.

    • @Rayudu_Alapati
      @Rayudu_Alapati 11 months ago

      Resolved by installing a Python version in line with Spark. Thank you. BTW, the video is so helpful.
      @@DEwithDhairy

    • @DEwithDhairy
      @DEwithDhairy  11 months ago +1

      @@Rayudu_Alapati Glad you found it helpful.
      Do check out the other playlists.
      And share in your network 😀.
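
      For anyone hitting the same "sc is not defined" error: the pyspark shell
      predefines spark and sc, but a standalone script has to create them itself.
      A minimal sketch, assuming a file named Readme exists in the working directory:

          from pyspark.sql import SparkSession

          # Build the session (and from it the SparkContext) explicitly --
          # only the interactive pyspark shell does this for you.
          spark = SparkSession.builder.master("local[*]").getOrCreate()
          sc = spark.sparkContext

          x = sc.textFile("Readme")  # assumes a Readme file in the cwd
          print(x.count())           # line count; forces the read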

  • @SandhyaRani-eu7tn
    @SandhyaRani-eu7tn 11 months ago

    Hello @Dhairy Gupta, I followed the same steps you described, but I'm getting an error for spark and pyspark: "is not recognized as an internal or external command,
    operable program or batch file." Could you please tell me what I have to do?

    • @DEwithDhairy
      @DEwithDhairy  11 months ago

      Hi SandhyaRani,
      it's a very generic error.
      Can you paste the entire error, or send me a screenshot of the error on LinkedIn?

    • @SandhyaRani-eu7tn
      @SandhyaRani-eu7tn 11 months ago +1

      @@DEwithDhairy Thank you for responding. This is the error when I run it in cmd -> C:\Users\USER>spark-shell
      'spark-shell' is not recognized as an internal or external command,
      operable program or batch file.

    • @DEwithDhairy
      @DEwithDhairy  11 months ago

      @@SandhyaRani-eu7tn Got it.
      It seems like you have missed the environment variable setup for one of the following:
      Hadoop, Java, or Spark.
      So check that first,
      and also check that Java and Python are working properly,
      by running
      java --version
      python --version

    • @pawanmishra56
      @pawanmishra56 11 months ago

      You need to set up JAVA_HOME, HADOOP_HOME, and SPARK_HOME by giving each install path. Afterwards, make sure to add %JAVA_HOME%\bin, %HADOOP_HOME%\bin, and %SPARK_HOME%\bin to PATH (see the diagnostic sketch below).

    • @pawanmishra56
      @pawanmishra56 11 months ago

      Also, make sure to run cmd as Administrator the first time.
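
      A quick diagnostic sketch for the "not recognized" error in this thread --
      plain Python, no Spark required, printing what the shell can actually see:

          import os
          import shutil

          # The three variables the replies above set up.
          for var in ("JAVA_HOME", "HADOOP_HOME", "SPARK_HOME"):
              print(f"{var} = {os.environ.get(var, '<not set>')}")

          # None here means the matching %..._HOME%\bin entry is missing
          # from PATH (shutil.which honors PATHEXT, so it finds .cmd files too).
          for exe in ("java", "python", "spark-shell"):
              print(f"{exe} -> {shutil.which(exe)}")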

  • @ChandanDeveloper
    @ChandanDeveloper 2 months ago

    When I try to install Spark on Windows Home, I'm getting an error.

  • @yashrammalhotra229
    @yashrammalhotra229 9 months ago

    Hello, help me, I am getting a crash error.

    • @DEwithDhairy
      @DEwithDhairy  9 months ago

      Try cross-checking your setup against mine.

  • @En-Coder-kp6qn
    @En-Coder-kp6qn 6 months ago

    Thanks bro!

  • @g.suresh430
    @g.suresh430 11 months ago

    I am getting an error. Please help.
    from pyspark.sql import SparkSession
    from datetime import datetime, date
    from pyspark.sql import Row
    import os
    import sys
    os.environ['PYSPARK_PYTHON'] = sys.executable
    os.environ['PYSPARK_DRIVER_PYTHON'] = sys.executable
    print(sys.executable)
    spark = SparkSession.builder.getOrCreate()
    data = [(1, 'A'), (2, 'B')]
    schema = ['id', 'name']
    df = spark.createDataFrame(data, schema)
    # df.show()
    df.write.csv(path='D:/Practice/PySpark/Files', header=True, mode='overwrite')

    • @DEwithDhairy
      @DEwithDhairy  11 months ago

      What error are you getting?

    • @g.suresh430
      @g.suresh430 11 months ago

      Setting default log level to "WARN".
      To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
      24/01/18 11:05:45 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
      24/01/18 11:06:20 ERROR FileFormatWriter: Aborting job d9f52533-0dd7-4058-8832-d49e12cd6773.
      java.lang.UnsatisfiedLinkError: 'boolean org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(java.lang.String, int)'
      at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)
      at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:793)
      at org.apache.hadoop.fs.FileUtil.canRead(FileUtil.java:1249)
      at org.apache.hadoop.fs.FileUtil.list(FileUtil.java:1454)
      at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:601)
      at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1972)
      at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:2014)
      at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:761)
      at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1972)
      at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:2014)
      at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.getAllCommittedTaskPaths(FileOutputCommitter.java:334)
      at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJobInternal(FileOutputCommitter.java:404)
      at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:377)
      at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.commitJob(HadoopMapReduceCommitProtocol.scala:192)
      at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$writeAndCommit$3(FileFormatWriter.scala:275)
      at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
      at org.apache.spark.util.Utils$.timeTakenMs(Utils.scala:640)
      at org.apache.spark.sql.execution.datasources.FileFormatWriter$.writeAndCommit(FileFormatWriter.scala:275)
      at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeWrite(FileFormatWriter.scala:304)
      @@DEwithDhairy

    • @DEwithDhairy
      @DEwithDhairy  11 months ago

      @@g.suresh430 It seems like
      you have not configured the Hadoop winutils properly.
      Go through the video once again
      and compare your environment variables with mine.
      That will solve the issue.

    • @g.suresh430
      @g.suresh430 11 months ago

      I am able to read data from a CSV file but I am unable to write into a CSV file.
      I added the below paths also:
      JAVA_HOME - C:\Program Files\Java\jdk-17
      HADOOP_HOME - C:\spark\spark-3.4.2-bin-hadoop3\hadoop
      SPARK_HOME - C:\spark\spark-3.4.2-bin-hadoop3
      %JAVA_HOME%\bin
      %SPARK_HOME%\bin
      %HADOOP_HOME%\bin
      %path%
      @@DEwithDhairy
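
      The UnsatisfiedLinkError above (NativeIO$Windows.access0) almost always means
      winutils.exe and hadoop.dll are missing from %HADOOP_HOME%\bin -- and note the
      HADOOP_HOME listed above points inside the Spark download, which typically does
      not ship them. A minimal check, assuming HADOOP_HOME is already set:

          import os

          # Hadoop's native Windows IO (NativeIO$Windows) loads winutils.exe
          # and hadoop.dll from %HADOOP_HOME%\bin; reads can sometimes succeed
          # without them, but writes fail exactly as in the trace above.
          hadoop_bin = os.path.join(os.environ.get("HADOOP_HOME", ""), "bin")
          for name in ("winutils.exe", "hadoop.dll"):
              path = os.path.join(hadoop_bin, name)
              print(name, "->", "found" if os.path.exists(path) else "MISSING", "in", hadoop_bin)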

  • @debajyotijana8997
    @debajyotijana8997 9 months ago

    While running the code, this error occurred: 24/03/12 11:52:23 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
    org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:601)
    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:583)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38)
    at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:772)
    at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:749)
    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:514)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:388)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:888)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:888)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)

    • @DEwithDhairy
      @DEwithDhairy  9 months ago

      Try cross-checking your setup against mine again.
