Want to learn more Big Data technology courses? You can get lifetime access to our courses on the Udemy platform. Visit the link below for discounts and the coupon code.
www.learningjournal.guru/courses/
We are waiting for more Spark videos. Your tutorials are the best on the planet.
One of the best Spark 101 video series I've viewed! And you have great delivery: a very lucid presentation and an engaging voice! Thank you for helping me along my journey to learn the basics of Spark!
Liked, subscribed, shared, and starred the GitHub repo; that was the least I could do for such a structured, to-the-point tutorial. Many thanks!
I have a question. You have 1 master node and 3 worker nodes in your cluster. When you spark-submit, 5 executors are allocated on the 3 worker nodes. Why is the number of available executors 5? I know you have a total of 6 cores across the 3 worker nodes. Shouldn't it be 6 executors on 3 worker nodes (1 executor per core), and then 1 driver on the master node?
It looks like the wget command no longer works like that; you now need to create an Oracle account in order to download the JDK RPM. Just an update. As a workaround, I ended up using FileZilla to upload the JDK RPM after I downloaded it. Is there any way we can still use the wget command?
Very nice explanation. Thanks for the videos.
Sir, kindly include an introduction to Apache Spark. Someone like me who is totally new to Spark can learn it better here than from other available tutorials because of your awesome teaching technique. Thanks.
Amazing, Gurudev, my humble prostrations!! For the first time, I am completely awestruck! Greetings, Guruji; truly, I have received initiation today! (I intentionally wrote this in Hindi to express my feelings openly and loudly.) Love you, Sir.
Subscribed, and also checked out all the Spark videos. Really awesome. Could you please add some more Spark videos as well, like your Kafka tutorial?
I'm a beginner in big data. Can I start from here, or...? I have no idea.
Best tutorial I have ever seen. Thanks.
Thank you for this new video. Another great demo.
Thank you, sir, for such a simple and elegant tutorial. I am facing one issue: while trying to open the Jupyter notebook through my local browser, it throws a "this site can't be reached" error. I have tried multiple times but have not been able to resolve it. I request you to guide me. Thank you.
Sir, great videos. Very well explained. Content is also excellent.
I wanted to know if you can help a student with interview preparation through mock interviews.
Thanks, bro. I am a Java developer.
Hi Prashant, I tried to write to a MySQL table from standalone Spark using the write.format option. But since it was taking a long time for 3M records, I cancelled the job. However, when I used the normal MySQL load method, it loaded quickly. Do I need to configure something in Spark, such as executors, memory, etc.?
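There is no reply in the thread, but for reference, slow Spark JDBC writes are usually tuned with repartitioning plus the batchsize option and MySQL's rewriteBatchedStatements flag. The Scala sketch below is only illustrative: the JDBC URL, table, user, password, and source path are placeholders, and it assumes the MySQL connector jar is already on the Spark classpath.

// Hypothetical sketch: tune a Spark-to-MySQL write that is too slow.
// All connection details are placeholders; adjust the partition count to your cluster.
val df = spark.read.parquet("/data/source_records")      // placeholder for the DataFrame holding the 3M rows

df.repartition(8)                                        // write with several parallel connections
  .write
  .format("jdbc")
  .option("url", "jdbc:mysql://dbhost:3306/mydb?rewriteBatchedStatements=true")
  .option("driver", "com.mysql.jdbc.Driver")
  .option("dbtable", "my_table")
  .option("user", "myuser")
  .option("password", "mypassword")
  .option("batchsize", 10000)                            // larger JDBC batches per round trip
  .mode("append")
  .save()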
Wonderful videos. Are you planning to cover unit testing and debugging of Spark applications in some video?
Hi Sir, I am using Google Cloud. When I started Jupyter following your method, the browser returned "cannot reach xxxx". What should I do?
excellent tutorial
Very good 👍, it really helped me a lot
What are the admin configurations for Spark on YARN?
Hello Sir, for a long time I have been searching for a series of Spark tutorial videos. This one is really awesome; I salute you. I completed the Kafka series and am continuing with Spark now. But this series contains only 7 videos. Are you uploading them one by one? Could you please upload the rest of the videos in this playlist? Thank you, sir.
+Simanchal Maharana This playlist is still under development. I am adding videos as and when they are ready to upload.
1. I installed Toree directly with the command below:
pip3 install --upgrade toree
2. Then I ran this command:
jupyter toree install --spark_home=$SPARK_HOME --interpreters=Scala,PySpark,SQL --user
The second command gave me the following error line:
[ToreeInstall] ERROR | Unknown interpreter PySpark. Skipping installation of PySpark interpreter
Please suggest how to resolve it.
I installed Toree, and it's working fine for Scala but not for SQL. Am I missing anything?
Multiple delimiter problem. Hi, can you please suggest how to handle a file that has multiple delimiters and needs to be loaded into a DataFrame? I mean, how do I clean or format it?
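No answer was posted in the thread, but one common approach is to read the file as plain text and split each line with a regular expression that covers all the delimiters. The Scala sketch below is only a hedged illustration: the file path, delimiter set, and column names are made-up placeholders.

// Hypothetical sketch: fields separated by commas, pipes, or tabs in the same file.
import spark.implicits._

val raw = spark.read.textFile("/data/messy_file.txt")      // placeholder path

val parsed = raw.map { line =>
    val fields = line.split("[,|\\t]")                      // regex covering all expected delimiters
    (fields(0), fields(1), fields(2))                       // assumes at least 3 fields per line
  }.toDF("col1", "col2", "col3")

parsed.show()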
Awesome explanation !!! Thanks a ton !!!
Dear sir, can you please start some videos on the Spark machine learning topic?
Toree is no longer available on Git, and the downloadable version seems incompatible. Please review this part; everything else is working.
You can install the old version as shown in the video and then upgrade, i.e., pip install --upgrade toree. That will get you version 0.3.0.
Great tutorial as always, thanks!
How does Jupyter compare with Zeppelin ? I've used Zeppelin a little bit (their demo) to get an idea, and it looks similar to Jupyter.
Both are similar tools. I will cover Zeppelin as well.
Thanks finally, long awaited 👍!!
What is the use of Toree? Can I start running Spark without the Toree setup?
Yes, you can, if you don't want the Jupyter notebook.
Sir, while executing "pip install toree-0.2.0.dev1.tar.gz" I am getting an exception:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/pip-9.0.1-py2.7.egg/pip/basecommand.py", line 215, in main
    status = self.run(options, args)
  File "/usr/local/lib/python2.7/dist-packages/pip-9.0.1-py2.7.egg/pip/commands/install.py", line 342, in run
    prefix=options.prefix_path
Kindly help me.
+Santosh Singh You are getting a permission-denied error. Check your credentials.
Where did you get the people.json file that you are using?
simple and elegant
Sir, while installing JDK 8 I am getting "Cannot open: jdk-8u162-linux-x64.rpm. Skipping. Nothing to do". I am using a CentOS VM instance.
Santosh, the JDK link that I used in my demo is not working; Oracle has changed the URL for that version of the JDK. You have to download the Oracle JDK RPM from the OTN website, upload it to your CentOS machine, and then run yum localinstall your-jdk-rpm-file-name. Let me know if you still have a problem.
Thank you, sir.
Hello Santosh, if you work on a local machine or cluster, you can also use OpenJDK, which has fewer issues during installation: openjdk.java.net/install/
@@ScholarNest On a GCP Debian Linux machine, I am trying to download JDK 1.8 using wget, but it gives the error message "Username/Password Authentication Failed." After that, I provided the username and password of my OTN sign-in, but that is not working either.
Hi Sir, how can I contact you? I have some topics in PySpark to discuss.
Hello Sir, when specifying the Spark path in vi .bash_profile, you missed including the spark folder. Thanks.
Please make some videos on Hadoop admin if possible. Thanks a lot
Hello Sir, awesome videos. Thanks for all your effort. I come from a data analysis / DBA (Oracle SQL) background. What exactly do you think I should learn in Spark from your videos? I don't have a Java or development background.
Spark SQL and how Spark works
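As a rough illustration of that answer for someone with a SQL/DBA background (this is not code from the video): Spark lets you register a DataFrame as a temporary view and query it with ordinary SQL. The sketch assumes the people.json sample file that ships under the Spark installation's examples/src/main/resources directory.

// Illustrative only: query a DataFrame with plain SQL, which is where a SQL background pays off.
val people = spark.read.json("examples/src/main/resources/people.json")  // sample file bundled with Spark
people.createOrReplaceTempView("people")

spark.sql("SELECT name, age FROM people WHERE age > 20").show()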
Awesome tutorial on Spark. While doing the hands-on, I am getting an issue with the yum installation. How do I install yum on a Linux machine? I created a Linux machine on Google Cloud, but yum is not available by default. I tried some online resources but was unable to install yum. Can you please share the steps for installing yum on a Google Cloud Linux machine instance?
Yum is the default package manager for Red Hat, Fedora, and CentOS. Which Linux image type did you choose? I think the default in GCP is Debian, which doesn't come with yum.
Hi Sir, such good videos, Sir!
you are the best man!
Thanks for sharing your video.
Sir, please make a video about the different tools and knowledge a Hadoop admin vs. a Hadoop developer should have.
We are living in the era of DevOps :-). My take is to learn as much as you can without thinking of Dev or Admin.
Sir, I am not able to install Java using yum. I selected the CentOS 6 image. The error is "cannot open: jdk144.rpm, skipping. Nothing to do". Please help.
+rajkumar p, your RPM may be corrupt. Download it again and specify the full path with yum localinstall.
Learning Journal Hi Sir, I tried the same and it still shows the same error. Is there any way you can create a public image and share it with us? If possible, please do so.
+rajkumar p We are not allowed to create public images. However, we can create an image and share it with specific people. Is anyone willing to help?
+rajkumar p, I can try to configure the VM for you. Drop me an email for instructions (check out my contact details at www.learningjournal.guru).
+rajkumar p, could you please help me? I am also getting the same issue here.
Awesome !
Thank you so much..........
Hello, when I run
jupyter toree install --spark_home=$SPARK_HOME --interpreters=Scala,PySpark,SQL --user
the result is:
Error executing Jupyter command 'toree': [Errno 2] No such file or directory
Please help! I installed Toree with pip3 install --upgrade toree.
If anyone wants a Docker image for this working environment, you could use this Dockerfile I created for myself: github.com/bicepjai/docker_files/tree/master/learn-spark
Jayaram Prabhu Durairaj, bro, can you please help me set up the Spark VM?
How do I integrate Kafka with Spark?
+Dinesh Chebolu I will cover that as soon as I reach Spark Streaming.
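Until that video is available, the usual route in Spark 2.x is Structured Streaming's Kafka source. The Scala sketch below is only a hedged outline, not the author's method: the broker address and topic name are placeholders, and it assumes the spark-sql-kafka package has been added to the classpath (for example via spark-shell --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0).

// Hypothetical sketch: read a Kafka topic as a stream and print it to the console.
val kafkaStream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")   // placeholder broker
  .option("subscribe", "my-topic")                        // placeholder topic
  .load()

val messages = kafkaStream.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

messages.writeStream
  .format("console")
  .outputMode("append")
  .start()
  .awaitTermination()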
excellent
How do I configure this on a Mac? Any reference, please?
Yes. You can also try Zeppelin. That appears to be a better option than Jupyter for Scala.
Dear sir, when can I expect the Spark 02 video? Please make the video as soon as possible.
Soon. I am working on it.
Sir, Thank You so much
I think there is no need to install Toree. You just add some lines to .bashrc:
export SPARK_HOME=/Users/akashsoni/spark
export PATH=$SPARK_HOME/bin:$PATH
export PYSPARK_DRIVER_PYTHON="jupyter"
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'
or some lines in spark-env.sh:
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_151.jdk/Contents/Ho$
export SPARK_HOME=/Users/akashsoni/spark
export PYSPARK_PYTHON=/anaconda3/bin/python
along with the Java path.
That's PySpark. What about a Scala notebook? P.S. I have tried it for PySpark.
Liked your videos. I am not sure how to get a free Google Cloud account. Can you please help? Can I practice your sessions on Cloudera?
You can use Cloudera. If that looks too heavy, then download Zeppelin. It comes with embedded Spark :-)
I am getting the error below when I try to extract the archive in GCP:
tar (child): spark-2.2.0-bin-hadoop2.6.tgz: Cannot open: No such file or directory
tar (child): Error is not recoverable: exiting now
tar: Child returned status 2
tar: Error is not recoverable: exiting now
Same error... any solution provided?
Go to the Downloads folder, right-click on the tar file, and click "Extract Here". It will work.
The wget command for Apache Toree 0.2.0 also gave me a 404 error. It looks like that link has been removed from Git. Please see here:
dist.apache.org/repos/dist/dev/incubator/toree/
Actually, I found the link on the Anaconda site: pypi.anaconda.org/hyoon/simple/toree/
[root@localhost ~]# pyspark
/root/spark/spark-2.2.0-bin-hadoop2.7/bin/spark-class: line 71: /usr/java/jdk1.8.0_171/bin/java: No such file or directory
I am hitting the above error after installing Spark and the JDK on the Linux machine. Does this have something to do with setting the path variables? By the way, you have done a fabulous job with your videos. Thanks a ton!