Apache Spark-01- Setup your environment

  • Published on 26 Dec 2024

Comments • 92

  • @ScholarNest
    @ScholarNest 3 years ago +1

    Want to learn more about Big Data technology? You can get lifetime access to our courses on the Udemy platform. Visit the link below for discounts and coupon codes.
    www.learningjournal.guru/courses/

  • @MohammadShahid85
    @MohammadShahid85 7 years ago +14

    We are waiting for more Spark videos. Your tutorials are the best on the planet.

  • @spyl42
    @spyl42 6 years ago +1

    One of the best Spark 101 video series I've viewed! And you have great delivery- a very lucid presentation and engaging voice! Thank you for helping me along my journey to learn the basics of Spark!

  • @KoushikPaulliveandletlive
    @KoushikPaulliveandletlive 6 years ago +7

    Liked, subscribed, shared, and starred the GitHub repo. That was the least I could do for such a structured, to-the-point tutorial. Many thanks.

  • @jamestanaka686
    @jamestanaka686 5 years ago

    I have a question. You have 1 master node and 3 worker nodes in your cluster. When you run spark-submit, 5 executors are allocated on the 3 worker nodes. Why is the number of available executors 5? I know you have a total of 6 cores across the 3 worker nodes. Shouldn't it be 6 executors on 3 worker nodes (1 executor per core), and then 1 driver on the master node?

  • @ajanasathian8192
    @ajanasathian8192 4 years ago

    It looks like the "wget" command no longer works like that. You need to create an Oracle account in order to download the JDK rpm. Just an update. As a workaround, I ended up using FileZilla after I downloaded the JDK rpm. Is there any way we can still use the wget command?

  • @sushmashamsundar8219
    @sushmashamsundar8219 5 years ago

    Very nice explanation. Thanks for the videos.

  • @MohammadShahid85
    @MohammadShahid85 7 years ago

    Sir, kindly include an introduction to Apache Spark. A person like me who is totally new to Spark can learn it better from you than from other available tutorials because of your awesome teaching techniques. Thanks.

  • @RamSharma-XpressionsUnLimited
    @RamSharma-XpressionsUnLimited 6 years ago

    Tremendous, Gurudev, I bow to you in full prostration!! For the first time, I am left completely stunned! Namaskar, Guruji. Today I have truly received initiation! I intentionally wrote this in Hindi to express my feelings openly and loudly. Love you, Sir.

  • @ghoshsandipan
    @ghoshsandipan 5 years ago

    Subscribed, and also checked out all the Spark videos. Really awesome. Could you please add some external Spark videos as well, like your Kafka tutorial?

  • @naveen3046
    @naveen3046 4 years ago

    I'm a beginner in big data. Can I start from here, or...? I have no idea.

  • @rgaur9
    @rgaur9 5 years ago

    Best tutorial I have ever seen. Thanks.

  • @robind999
    @robind999 7 years ago +1

    Thank you for this new video. Another great demo.

  • @shivamgupta9841
    @shivamgupta9841 4 years ago

    Thank you, sir, for such a simple and elegant tutorial. I am facing one issue: when I try to open the Jupyter notebook through my local browser, it throws a "this site can't be reached" error. I have tried multiple times but have not been able to resolve it. I request you to guide me.
    Thank you.

  • @ajinkya112
    @ajinkya112 4 years ago

    Sir, great videos. Very well explained. Content is also excellent.
    I wanted to know if you can help a student in interview preparation with mock interviews.

  • @subramanianchenniappan4059
    @subramanianchenniappan4059 5 years ago

    Thanks bro. I am a Java developer.

  • @jshaaan
    @jshaaan 5 years ago

    Hi Prashant, I tried to write to a MySQL table from standalone Spark using the write.format option. But as it took a long time for 3M records, I cancelled the job. However, when I used the normal MySQL load method, the data loaded quickly. Do I need to configure something in Spark, such as executors, memory, etc.?

  • @patilyogesh
    @patilyogesh 6 years ago

    Wonderful videos. Are you planning to cover unit testing and debugging Spark applications in some video?

  • @wandicui8516
    @wandicui8516 6 years ago +1

    Hi Sir, I am using Google Cloud. When I started Jupyter following your method, the browser returned "cannot reach xxxx". What should I do?
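
Editor's note: a "cannot reach" error in the browser can have several causes, for example a firewall rule that does not allow the notebook port, or a server that listens only on localhost. A minimal sketch, assuming Python is available on the machine you browse from, is to test whether the port accepts TCP connections at all. The host and port below are placeholders, not values from the video.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    # Hypothetical values: replace with the VM's external IP and notebook port.
    print(can_connect("127.0.0.1", 8888))
```

If this prints False from your own machine but True when run on the VM itself against 127.0.0.1, the server is running and the problem is likely between the browser and the VM rather than in Jupyter.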

  • @rmaleshri
    @rmaleshri 5 years ago

    excellent tutorial

  • @souvikjup
    @souvikjup 5 years ago

    Very good 👍, it really helped me a lot

  • @mohammedsufiyankhan8631
    @mohammedsufiyankhan8631 6 years ago

    What are the admin configurations for Spark on YARN?

  • @simanchalmaharana5870
    @simanchalmaharana5870 7 years ago

    Hello Sir, I had been searching for a Spark tutorial video series for a long time. This one is really awesome. I salute you. I completed the Kafka series and am continuing with Spark now. But this series contains only 7 videos. Are you uploading them one by one? Could you please upload the rest of the videos to this playlist? Thank you, sir.

    • @ScholarNest
      @ScholarNest 7 years ago

      +Simanchal Maharana This playlist is still under development. I am adding videos as and when they are ready to upload.

  • @DeepakGupta-sc7hh
    @DeepakGupta-sc7hh 4 years ago

    1. I installed Toree directly with the command below:
    pip3 install --upgrade toree
    2. Then I ran the command:
    jupyter toree install --spark_home=$SPARK_HOME --interpreters=Scala,PySpark,SQL --user
    The 2nd command produced the following error line:
    [ToreeInstall] ERROR | Unknown interpreter PySpark. Skipping installation of PySpark interpreter
    Please suggest how to resolve it.

  • @KoushikPaulliveandletlive
    @KoushikPaulliveandletlive 5 years ago

    I installed Toree, and it's working fine for Scala but not for SQL. Am I missing anything?

  • @SuperSazzad2010
    @SuperSazzad2010 5 years ago

    Multiple-delimiter problem.
    Hi, can you please suggest how to handle a file with multiple delimiters that needs to be loaded into a DataFrame? I mean, how to clean or format it.
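
Editor's note: one possible approach, sketched here in plain Python, is to normalize every line to a single delimiter before loading the file. The delimiter set and sample rows are hypothetical, and this simple split does not handle quoted fields that contain a delimiter.

```python
import re

# Hypothetical delimiter set; adjust it to whatever the real file mixes.
DELIMITERS = re.compile(r"[|;,\t]")


def split_fields(line):
    """Split one raw line on any of the delimiters and trim whitespace."""
    return [field.strip() for field in DELIMITERS.split(line.rstrip("\n"))]


def normalize_lines(lines, out_delimiter=","):
    """Yield each non-blank line rewritten with one consistent delimiter."""
    for line in lines:
        if line.strip():  # skip blank lines
            yield out_delimiter.join(split_fields(line))


sample = ["1|Alice;30\n", "2,Bob\t25\n"]
cleaned = list(normalize_lines(sample))
print(cleaned)  # ['1,Alice,30', '2,Bob,25']
```

The cleaned lines can be written back to a file and then read with an ordinary single-delimiter CSV reader.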

  • @prabhubentick7165
    @prabhubentick7165 6 years ago

    Awesome explanation !!! Thanks a ton !!!

  • @Arif6087
    @Arif6087 6 years ago

    Dear sir, can you please start some videos on Spark machine learning?

  • @runridefit
    @runridefit 6 years ago

    Toree is no longer available on Git, and the downloadable version seems incompatible.
    Please review this part; everything else is working.

    • @rathodnagendrasinh9077
      @rathodnagendrasinh9077 6 years ago

      You can install the old version as shown in the video and then upgrade, i.e. pip install --upgrade toree. It will get you version 0.3.0.

  • @DeepakSharma_youtube
    @DeepakSharma_youtube 7 years ago

    Great tutorial as always, thanks!
    How does Jupyter compare with Zeppelin ? I've used Zeppelin a little bit (their demo) to get an idea, and it looks similar to Jupyter.

    • @ScholarNest
      @ScholarNest 7 years ago

      Both are similar things. I will cover Zeppelin as well.

  • @paraconscious790
    @paraconscious790 7 years ago

    Thanks finally, long awaited 👍!!

  • @avinashbasetty
    @avinashbasetty 6 years ago

    What is the use of Toree? Can I start running without the Toree setup?

    • @ScholarNest
      @ScholarNest 6 years ago

      Yes, you can, if you don't want a Jupyter notebook.

  • @SantoshSingh-ki8bx
    @SantoshSingh-ki8bx 7 years ago

    Sir, while executing "pip install toree-0.2.0.dev1.tar.gz", I am getting an exception:
    Traceback (most recent call last):
    File "/usr/local/lib/python2.7/dist-packages/pip-9.0.1-py2.7.egg/pip/basecommand.py", line 215, in main
    status = self.run(options, args)
    File "/usr/local/lib/python2.7/dist-packages/pip-9.0.1-py2.7.egg/pip/commands/install.py", line 342, in run
    prefix=options.prefix_path
    Kindly help me.

    • @ScholarNest
      @ScholarNest 7 years ago

      +Santosh Singh You are getting a permission denied error. Check your credentials.

  • @veerukbr1184
    @veerukbr1184 7 years ago

    Where did you get the people.json file that you are using?

  • @KoushikPaulliveandletlive
    @KoushikPaulliveandletlive 6 years ago

    simple and elegant

  • @SantoshSingh-ki8bx
    @SantoshSingh-ki8bx 6 years ago

    Sir, while installing JDK 8, I am getting
    "Cannot open: jdk-8u162-linux-x64.rpm. Skipping.
    Nothing to do". I am using a CentOS VM instance.

    • @ScholarNest
      @ScholarNest 6 years ago

      Santosh, the JDK link that I used in my demo is not working. Oracle has changed the URL for that version of the JDK.
      You have to download the Oracle JDK rpm from the OTN website, then upload it to your CentOS machine. Then use yum localinstall your-jdk-rpm-file-name
      Let me know if you still have a problem.

    • @SantoshSingh-ki8bx
      @SantoshSingh-ki8bx 6 years ago

      Thank you, sir.

    • @yek3879
      @yek3879 6 years ago

      Hello Santosh,
      If you work on a local machine or cluster, you can also use OpenJDK, which has fewer installation issues. openjdk.java.net/install/

    • @grrrajareddy
      @grrrajareddy 5 years ago

      @ScholarNest On a GCP Debian Linux machine, I am trying to download JDK 1.8 using wget, but it gives the error message "Username/Password Authentication Failed." I then provided the username and password of my OTN sign-in, but that is not working either.

  • @selvaganapathi9708
    @selvaganapathi9708 5 years ago

    Hi Sir,
    How can I contact you? I have some PySpark topics to discuss.

  • @radhika6179
    @radhika6179 6 years ago

    Hello Sir,
    When you specified the Spark path in vi .bash_profile, you missed including the spark folder. Thanks.

  • @rgaur9
    @rgaur9 5 years ago

    Please make some videos on Hadoop admin if possible. Thanks a lot

  • @pragyashukla8383
    @pragyashukla8383 6 years ago

    Hello Sir, awesome videos. Thanks for all your effort. I come from a data analysis and DBA (Oracle SQL) background. What exactly do you think I should learn in Spark from your videos? I don't have a Java or development background.

    • @ScholarNest
      @ScholarNest 6 years ago

      Spark SQL and how Spark works

  • @HimanshuSingh-id4xl
    @HimanshuSingh-id4xl 7 years ago

    Awesome tutorial on Spark.
    While doing the hands-on, I ran into an issue with the yum installation.
    How do I install yum on a Linux machine? I created a Linux machine on Google Cloud, but yum is not available by default.
    I tried some online resources but was unable to install yum.
    Can you please share the steps for installing yum on a Google Cloud Linux machine instance?

    • @ScholarNest
      @ScholarNest 7 years ago

      Yum is the default package manager for Red Hat, Fedora, and CentOS. Did you choose the Linux image type? I think the default in GCP is Debian, which doesn't come with Yum.

  • @a2zmovies771
    @a2zmovies771 5 years ago

    Hi sir, such good videos, Sir.

  • @ahmedmjhool4822
    @ahmedmjhool4822 6 years ago

    you are the best man!

  • @suyatnoyatno2487
    @suyatnoyatno2487 6 years ago

    Thanks for sharing the video.

  • @tushibhaque863
    @tushibhaque863 7 years ago

    Sir, please make a video about the different tools that a Hadoop admin vs. a Hadoop developer should know.

    • @ScholarNest
      @ScholarNest 7 years ago

      We are living in the era of DevOps :-). My take is to learn as much as you can without thinking of Dev or Admin.

  • @rajkumarp7784
    @rajkumarp7784 7 years ago

    Sir, I am not able to install Java using yum. I selected the CentOS 6 image. The error is: cannot open: jdk144.rpm. Skipping. Nothing to do.
    Please help.

    • @ScholarNest
      @ScholarNest 7 years ago

      +rajkumar p, your rpm may be corrupt. Download it again and specify the full path with yum localinstall.

    • @rajkumarp7784
      @rajkumarp7784 7 years ago

      Learning Journal Hi Sir, I tried the same and it is still showing the same error. Is there any way you can create a public image and share it with us? If possible, please do so.

    • @ScholarNest
      @ScholarNest 7 years ago

      +rajkumar p We are not allowed to create public images. However, we can create an image and share it with specific people. Is there anyone willing to help?

    • @ScholarNest
      @ScholarNest 7 years ago

      +rajkumar p, I can try to configure the VM for you. Drop me an email for instructions (check out my contact at www.learningjournal.guru )

    • @SantoshSingh-ki8bx
      @SantoshSingh-ki8bx 6 years ago

      +rajkumar p, could you please help me? I am also getting the same issue here.

  • @pranav966
    @pranav966 7 years ago

    Awesome !

  • @veerukbr1184
    @veerukbr1184 7 years ago

    Thank you so much..........

  • @ThanhNguyen-ox6zu
    @ThanhNguyen-ox6zu 5 years ago

    Hello,
    when I run
    jupyter toree install --spark_home=$SPARK_HOME --interpreters=Scala,PySpark,SQL --user
    the result is
    Error executing Jupyter command 'toree': [Errno 2] No such file or directory
    Help me!

    • @ThanhNguyen-ox6zu
      @ThanhNguyen-ox6zu 5 years ago

      I installed Toree with pip3 install --upgrade toree

  • @bicepjai
    @bicepjai 7 years ago +3

    If anyone wants a Docker image for this working environment, you could use this Dockerfile I created for myself. github.com/bicepjai/docker_files/tree/master/learn-spark

    • @rajkumarp7784
      @rajkumarp7784 7 years ago

      Jayaram Prabhu Durairaj bro, can you please help me set up the Spark VM?

  • @dineshchebolu166
    @dineshchebolu166 7 years ago

    How to integrate Kafka with Spark?

    • @ScholarNest
      @ScholarNest 7 years ago

      +Dinesh Chebolu I will cover that as soon as I reach Spark Streaming.

  • @rameshthamizhselvan2458
    @rameshthamizhselvan2458 6 years ago

    excellent

  • @gopinathGopiRebel
    @gopinathGopiRebel 6 years ago

    How to configure this on a Mac? Any reference, please.

    • @ScholarNest
      @ScholarNest 6 years ago +1

      Yes. You can also try Zeppelin. That appears to be a better option compared to jupyter for Scala.

  • @veerukbr1184
    @veerukbr1184 7 years ago

    Dear sir, when can I expect the Spark 02 video? Please make the video as soon as possible.

    • @ScholarNest
      @ScholarNest 7 years ago

      Soon. I am working on it.

  • @pc0riginal870
    @pc0riginal870 5 years ago

    Sir, Thank You so much

  • @Drivebyeasy
    @Drivebyeasy 7 years ago

    I think there is no need to install Toree.
    You can just add some lines to .bashrc:
    export SPARK_HOME=/Users/akashsoni/spark
    export PATH=$SPARK_HOME/bin:$PATH
    export PYSPARK_DRIVER_PYTHON="jupyter"
    export PYSPARK_DRIVER_PYTHON_OPTS='notebook'
    or some lines to spark-env.sh:
    export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_151.jdk/Contents/Ho$
    export SPARK_HOME=/Users/akashsoni/spark
    export PYSPARK_PYTHON=/anaconda3/bin/python
    with the Java path set as above.

    • @ScholarNest
      @ScholarNest 7 years ago

      That's pyspark. What about Scala Notebook?

    • @Drivebyeasy
      @Drivebyeasy 7 years ago

      P.S. I have tried it for PySpark.

  • @lakshminarayanag6601
    @lakshminarayanag6601 5 years ago

    Liked your videos. I am not sure how to get a free Google Cloud account. Can you please help? Can I practice your sessions in Cloudera?

    • @ScholarNest
      @ScholarNest 5 years ago

      You can use Cloudera. If that looks too heavy, then download Zeppelin. It comes with embedded Spark :-)

  • @veerukbr1184
    @veerukbr1184 7 years ago

    I am getting the error below when I try to extract in GCP:
    tar (child): spark-2.2.0-bin-hadoop2.6.tgz: Cannot open: No such file or directory
    tar (child): Error is not recoverable: exiting now
    tar: Child returned status 2
    tar: Error is not recoverable: exiting now

    • @chsam4795
      @chsam4795 6 years ago

      Same error. Has any solution been provided?

    • @ripudamansingh5866
      @ripudamansingh5866 6 years ago

      Go to the Downloads folder, right-click on the tar file, and click "Extract Here".
      It will work.

  • @ajanasathian8192
    @ajanasathian8192 4 years ago

    The wget command for Apache Toree 0.2.0 also gave me a 404 error. It looks like that link has been removed from Git. Please see here:
    dist.apache.org/repos/dist/dev/incubator/toree/

    • @ajanasathian8192
      @ajanasathian8192 4 years ago

      Actually, I found the link on the Anaconda site. pypi.anaconda.org/hyoon/simple/toree/

  • @durgak6717
    @durgak6717 6 years ago

    [root@localhost ~]# pyspark
    /root/spark/spark-2.2.0-bin-hadoop2.7/bin/spark-class: line 71: /usr/java/jdk1.8.0_171/bin/java: No such file or directory
    I am hitting the above error after installing Spark and the JDK on the Linux machine. Does this have something to do with setting the path variables? By the way, you have done a fabulous job with your videos. Thanks a ton.
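
Editor's note: the error above shows Spark's launcher looking for java under /usr/java/jdk1.8.0_171, which suggests JAVA_HOME may point to a directory that does not exist on that machine. A minimal sketch, assuming Python is available, that checks whether the current JAVA_HOME actually contains an executable bin/java (the helper name find_java is made up for illustration):

```python
import os


def find_java(java_home):
    """Return the path to bin/java under java_home, or None if it is missing."""
    candidate = os.path.join(java_home, "bin", "java")
    if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
        return candidate
    return None


if __name__ == "__main__":
    java_home = os.environ.get("JAVA_HOME", "")
    found = find_java(java_home) if java_home else None
    if found:
        print("java found at", found)
    else:
        print("JAVA_HOME is unset or does not contain bin/java:", repr(java_home))
```

If the check fails, listing the directory where the JDK was installed and updating JAVA_HOME in .bash_profile to the folder that really exists is a reasonable next step.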