Mayank Malhotra
India
Joined Dec 4, 2011
"Be the Senior you needed when you were a Junior" said by a wise man.
I faced many difficulties while learning big data technologies during my early years, and I want to make that knowledge available on this channel.
I will post the following content on this channel:
- Interview Questions - Real questions that were asked in actual interviews.
- Real-life Projects - These are very important and not available on other channels (even paid ones). I will give you real-life projects and try to simulate the environment as well. It is a huge task and I am already working in that direction; projects will be available very soon, so stay tuned and subscribed.
- Fundamentals - These are essential; nearly all interviews (almost 99.99%) test fundamentals first.
Always feel free to connect with me on LinkedIn for suggestions and collaboration.
www.linkedin.com/in/mayank-malhotra-9b8987a6/
Setup Unity Catalog in Azure Databricks
In this video, we discuss how to set up Unity Catalog in Azure Databricks. Prerequisites:
1. You should have an Azure account set up; you can sign up for free. Azure provides a $200 credit for the first 30 days to explore and learn its services.
2. You should have basic knowledge of data engineering.
If you have not watched my "What is Unity Catalog" video, watch that first to understand the basics.
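Once the metastore is attached to the workspace, everything follows Unity Catalog's three-level namespace. A minimal sketch of the kind of commands you run after setup, from a Databricks notebook (where `spark` is predefined); the catalog, schema, and table names are hypothetical examples, not from the video:

spark.sql("CREATE CATALOG IF NOT EXISTS demo_catalog")
spark.sql("CREATE SCHEMA IF NOT EXISTS demo_catalog.bronze")
spark.sql("CREATE TABLE IF NOT EXISTS demo_catalog.bronze.events (id INT, name STRING)")
spark.sql("SHOW CATALOGS").show()  # verify the new catalog is visible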
361 views
Videos
Unity Catalog In Databricks | Benefits and Limitations
183 views · 6 months ago
Unity Catalog is one of the main components of Databricks; let's discuss the benefits, hierarchy, and limitations of Unity Catalog. #databricks #unitycatalog The complete list of limitations can be found at the following link. Credits: docs.databricks.com/en/data-governance/unity-catalog/index.html
AutoLoader Vs DLT(Delta Live Tables) - Databricks Interview Questions
929 views · 7 months ago
In this video we discuss Auto Loader vs DLT (Delta Live Tables) in Databricks, which are generally used to load/ingest data into the Delta Lake and in the further processing of data across the different layers in Databricks.
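As a reference point, a minimal Auto Loader sketch in PySpark (a Databricks notebook with a predefined `spark` is assumed; the paths, file format, and target table name are hypothetical):

# Auto Loader incrementally picks up new files landing in cloud storage.
df = (spark.readStream
      .format("cloudFiles")                                  # Auto Loader source
      .option("cloudFiles.format", "json")                   # format of the incoming files
      .option("cloudFiles.schemaLocation", "/tmp/schemas/events")
      .load("/mnt/raw/events"))                              # hypothetical landing path

(df.writeStream
   .option("checkpointLocation", "/tmp/checkpoints/events")  # exactly-once bookkeeping
   .trigger(availableNow=True)                               # process available files, then stop
   .toTable("bronze.events"))                                # hypothetical Delta target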
Python For Data Engineers | Python Interview Questions
62 views · 7 months ago
In this video, I am solving one of the interview questions asked to me during the Python round of interviews for a data engineering position. This question was asked in a service-based company interview. Problem: find the repeating characters in a given input string. Data engineering generally revolves around 4 topics: Python, SQL, Spark, and one cloud (AWS/Azure/GCP). It can have more topics, but these are the most asked topics.
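One possible solution sketch (not necessarily the exact approach shown in the video):

from collections import Counter

def repeating_chars(s: str):
    # Count every character, then keep the ones that occur more than once.
    counts = Counter(s)
    return [ch for ch, n in counts.items() if n > 1]

print(repeating_chars("programming"))  # ['r', 'g', 'm']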
Cognizant Data Engineering Interview Question | Python Interview Questions
393 views · 7 months ago
In this video, I am solving one of the interview questions asked to me during the Python round of interviews for a data engineering position. This one is from a service-based company. Problem: find the second largest number in a list. Data engineering generally revolves around 4 topics: Python, SQL, Spark, and one cloud (AWS/Azure/GCP). It can have more topics, but these are the most asked topics. I am creating a playlist for all 4 topics on my channel.
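One possible single-pass solution sketch (the video's own solution may differ):

def second_largest(nums):
    # Track the two largest distinct values in one pass, no sorting needed.
    first = second = float("-inf")
    for n in nums:
        if n > first:
            first, second = n, first
        elif first > n > second:
            second = n
    return second if second != float("-inf") else None

print(second_largest([4, 1, 9, 7, 9]))  # 7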
Ques2 | Python for Data Engineering | Data Engineering Interview Question
90 views · 7 months ago
List-based Python interview questions for the data engineering role. This question was asked to me in one of my interviews. It is quite straightforward, but I would suggest pausing the video, thinking about the solution for at least 15 minutes, and then looking at the solution.
Python For Data Engineers | Python Interview Questions
246 views · 7 months ago
In this video, I am solving one of the interview questions asked to me during the Python round of interviews for a data engineering position. Data engineering generally revolves around 4 topics: Python, SQL, Spark, and one cloud (AWS/Azure/GCP). It can have more topics, but these are the most asked topics. I am creating a playlist for all 4 topics on my channel. Feel free to suggest topics to me which...
SQL Interview Questions| InEqui Join | Pepsico Data Engineering Question
163 views · 7 months ago
This video is about a SQL question that was asked in a Pepsico data engineering interview. I would suggest you look at the problem and then pause the video without looking at the solution. Try it on your own for at least 30 minutes, and then come back to the solution. This will help you more in preparing for data engineering interviews.
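The exact Pepsico problem is not reproduced in this description, but for reference, an inequi (non-equi) join matches rows on an inequality instead of equality. A PySpark sketch with hypothetical salary-band data, assuming an active SparkSession named `spark`:

salaries = spark.createDataFrame([(1, 4500), (2, 9000)], ["emp_id", "salary"])
bands = spark.createDataFrame([(0, 5000, "low"), (5000, 10000, "high")],
                              ["lo", "hi", "band"])

# The join condition uses >= and < rather than ==, so each salary falls into its band.
result = salaries.join(bands, (salaries.salary >= bands.lo) & (salaries.salary < bands.hi))
result.show()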
Pyspark Interview Question With Databricks | Cricket Tournament
175 views · 8 months ago
This video has a PySpark interview question which I am solving via a hands-on Databricks exercise. The problem can be solved in both SQL and PySpark; here we solve it with PySpark on Databricks, and we will solve it the SQL way in another video.
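The exact problem statement is in the video; a common variant of this kind of tournament question is generating every fixture between teams, sketched here with hypothetical team names (assuming an active SparkSession `spark`):

teams = spark.createDataFrame([("IND",), ("AUS",), ("ENG",)], ["team"])
home = teams.withColumnRenamed("team", "home")
away = teams.withColumnRenamed("team", "away")

# Cross join every team against every other, keeping each pairing once.
matches = home.crossJoin(away).filter("home < away")
matches.show()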
How I Optimized Spark Jobs| Sharing My 6 Years Learning
1.4K views · 3 years ago
In this video, I am going to share the learnings from my 6 years of optimizing Spark jobs. Optimizing Spark jobs is one of the critical tasks of a data engineer, and there are many properties to control. I have provided all the best possible scenarios for optimizing Spark jobs. Let me know if you have encountered any other issues. Let me know how you find this video, and feel free to give feedback so that I can improve and create more meaningful and useful content for you.
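The full list is in the video; as an illustrative sketch (not necessarily the video's exact tips), a few settings and patterns commonly used when tuning Spark jobs:

from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.master("local[*]").appName("Tuning").getOrCreate()

spark.conf.set("spark.sql.adaptive.enabled", "true")   # adaptive query execution (Spark 3+)
spark.conf.set("spark.sql.shuffle.partitions", "200")  # tune shuffle parallelism to data size

# Broadcasting a small dimension table avoids a shuffle join; tables are hypothetical.
facts = spark.range(1_000_000).withColumnRenamed("id", "key")
dims = spark.range(100).withColumnRenamed("id", "key")
joined = facts.join(broadcast(dims), "key")
joined.explain()  # the plan should show a BroadcastHashJoin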
Spark Streaming Application with Flat File | Business Use Case
83 views · 3 years ago
In my second Spark Streaming application, data is read from flat files and stored back as JSON after some transformation. I tried to illustrate this using a business use case. This is a very basic, beginner-level use case; we will cover many complex use cases in future videos. If you like my videos, please like, share, and subscribe. If you have not set up Spark on your machine, watch this video first: th-cam.com/video/ADvacZcnYic/w-d-xo.html
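A minimal sketch of the pattern (the directory names and columns are hypothetical): read a directory of flat files as a stream, apply a transformation, and write JSON out.

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

spark = SparkSession.builder.master("local[*]").appName("FileStream").getOrCreate()

# File sources require an explicit schema up front.
schema = StructType([StructField("id", IntegerType()),
                     StructField("name", StringType())])

stream = spark.readStream.schema(schema).csv("input_dir/")  # watch a folder of flat files

query = (stream.filter("id IS NOT NULL")                    # a simple transformation
         .writeStream
         .format("json")
         .option("path", "output_dir/")
         .option("checkpointLocation", "chk/")
         .start())
query.awaitTermination()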
Spark Streaming Word Count| First Spark Streaming Application
450 views · 3 years ago
Create your first Spark Streaming word-count application with this code. Install netcat from this link: nmap.org/download.html. Send messages from netcat, process them with Spark Streaming, and print the counts on the console. If you have not set up Spark on your machine, watch this video first: th-cam.com/video/ADvacZcnYic/w-d-xo.html. Let me know how you find this video, and feel free to give feedback so that I can improve and create more meaningful and useful content for you.
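For reference, the classic Structured Streaming word count against a netcat socket looks like this sketch (run `ncat -lk 9999` in another terminal first):

from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession.builder.master("local[*]").appName("WordCount").getOrCreate()

# Each line typed into netcat arrives as a row with a `value` column.
lines = (spark.readStream
         .format("socket")
         .option("host", "localhost")
         .option("port", 9999)
         .load())

words = lines.select(explode(split(lines.value, " ")).alias("word"))
counts = words.groupBy("word").count()

query = (counts.writeStream
         .outputMode("complete")  # re-emit the full counts table every trigger
         .format("console")
         .start())
query.awaitTermination()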
Setup Pycharm for Spark 3 on Windows
150 views · 3 years ago
Install PyCharm on your Windows machine for Apache Spark 3 in less than 5 minutes. I know there are already many tutorials on setting up PyCharm, but they are old and not for Spark 3, so I thought of uploading one from my new Windows machine. Let me know how you find this video, and feel free to give feedback so that I can improve and create more meaningful and useful content for you.
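A quick smoke test you can run once setup is done, to confirm PyCharm's interpreter can see Spark:

from pyspark.sql import SparkSession

# If this prints a version, PyCharm's interpreter and Spark are wired up correctly.
spark = SparkSession.builder.master("local[*]").appName("SmokeTest").getOrCreate()
print(spark.version)
spark.stop()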
Spark 3 Complete Installation on Windows in Just 10 mins
616 views · 3 years ago
The complete installation of Spark 3 on Windows is done in this video. I installed Apache Spark 3 on my new Windows machine in just 10 minutes and am good to go for all Spark 3 projects. Five steps are involved (a small verification snippet follows the list):
Step 1: Install JDK
Step 2: Install Hadoop winutils
Step 3: Download Spark binaries
Step 4: Install Python (Anaconda)
Step 5: Configure environment variables
Let me know how you find this video, and feel free to give feedback so that I can improve and create more meaningful and useful content for you.
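A small check runnable after step 5, to confirm the environment variables the installation relies on are visible (the variable names are the standard ones for a Windows Spark setup):

import os

# JAVA_HOME, HADOOP_HOME (for winutils), and SPARK_HOME should all resolve.
for var in ("JAVA_HOME", "HADOOP_HOME", "SPARK_HOME"):
    print(var, "=", os.environ.get(var, "NOT SET"))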
Spark Session Vs Spark Context | Interview Questions
3.3K views · 3 years ago
This video explains the difference between SparkSession and SparkContext with examples, in the easiest language. Watch this short 5-minute video, which will clear all your doubts about SparkSession and SparkContext. Let me know how you find this video, and feel free to give feedback so that I can improve and create more meaningful and useful content for you. My goal is to bring precise...
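In short: since Spark 2.x, SparkSession is the unified entry point, and the older SparkContext hangs off it. A minimal sketch:

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("Demo").getOrCreate()
sc = spark.sparkContext                 # the SparkContext, still there for RDD APIs

df = spark.range(5)                     # DataFrame API via SparkSession
rdd = sc.parallelize([1, 2, 3])         # RDD API via SparkContext
print(df.count(), rdd.sum())            # 5 6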
How I Optimized File Validation in Spark
2.7K views · 3 years ago
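The listing carries no description for this one; as context, a common Spark file-validation pattern reads with an explicit schema in FAILFAST mode (the file name and columns below are hypothetical, not the video's exact approach):

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

spark = SparkSession.builder.master("local[*]").appName("Validate").getOrCreate()

expected = StructType([StructField("id", IntegerType()),
                       StructField("name", StringType())])

# FAILFAST aborts the job on the first malformed row instead of silently nulling it.
df = (spark.read
      .option("header", "true")
      .option("mode", "FAILFAST")
      .schema(expected)
      .csv("data.csv"))

# Column name/order check against the file's own header.
header_cols = spark.read.option("header", "true").csv("data.csv").columns
assert header_cols == expected.fieldNames(), f"Schema mismatch: {header_cols}"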
DAG Vs Lineage Practically Explained With UI| Spark Interview Questions
3.4K views · 3 years ago
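A quick way to see lineage versus the DAG yourself, in a minimal local sketch:

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("Lineage").getOrCreate()
sc = spark.sparkContext

rdd = sc.parallelize(range(10)).map(lambda x: x * 2).filter(lambda x: x > 5)
print(rdd.toDebugString().decode())  # lineage: the recipe to recompute this RDD
rdd.collect()                        # running an action builds the stage DAG in the Spark UI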
Path Of Non CSE Background To Data Engineering In 6 Months.
113 views · 3 years ago
Rdd Vs Dataframe Easily Explained| Apache Spark Interview Questions
369 views · 3 years ago
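A side-by-side sketch of the two APIs:

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("RddVsDf").getOrCreate()

data = [("a", 1), ("b", 2), ("a", 3)]
rdd = spark.sparkContext.parallelize(data)          # untyped records, manual transformations
df = spark.createDataFrame(data, ["key", "value"])  # schema-aware, optimized by Catalyst

print(rdd.map(lambda kv: kv[1]).sum())              # 6
df.groupBy("key").sum("value").show()               # same aggregate, declaratively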
Cache VS Persist With Spark UI: Spark Interview Questions
760 views · 3 years ago
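The core difference in code form, as a minimal sketch: cache() is persist() with the default storage level, while persist() lets you pick one.

from pyspark import StorageLevel
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("CacheVsPersist").getOrCreate()
df = spark.range(1_000_000)

df.cache()                                # persist() with the default storage level
# df.persist(StorageLevel.DISK_ONLY)      # persist() lets you choose the level explicitly
df.count()                                # caching is lazy; an action materializes it
df.unpersist()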
Narrow VS Wide Transformation in Apache Spark
4.3K views · 3 years ago
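The distinction in one sketch: a narrow transformation works within each partition, while a wide one forces a shuffle.

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("NarrowWide").getOrCreate()
rdd = spark.sparkContext.parallelize([("a", 1), ("b", 2), ("a", 3)])

narrow = rdd.mapValues(lambda v: v * 10)       # narrow: no data moves between partitions
wide = narrow.reduceByKey(lambda a, b: a + b)  # wide: rows with the same key must shuffle together
print(wide.collect())                          # [('a', 40), ('b', 20)] (order may vary)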
Repartition Vs Coalesce: Apache Spark Interview Questions
2K views · 4 years ago
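The difference in a minimal sketch: repartition shuffles to any partition count, while coalesce only merges existing partitions downward.

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("RepartitionCoalesce").getOrCreate()
df = spark.range(1_000_000)

print(df.rdd.getNumPartitions())   # whatever the default parallelism gives
more = df.repartition(16)          # full shuffle; can go up or down, evens out sizes
fewer = df.coalesce(2)             # no full shuffle; merges partitions, decrease only
print(more.rdd.getNumPartitions(), fewer.rdd.getNumPartitions())  # 16 2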
Good explanation. Hope to see more videos on databricks. Please cover how to give table/view access to new users in unity catalog.
Is it mandatory to create a workspace in a pay-as-you-go subscription? I'm not able to find the account console in the Databricks workspace.
Nice knowledgeable content 👍 Try to make more videos on Interview based python questions for data engineer
nice explanation bhai
try to first explain the algorithm and then do the code
Sure, noted. Thanks
Can you please start your data engineering use-case videos once again? Nowadays so many people are searching for this but failing to find relevant videos.
Starting again; hope you will find content regularly. Let me know your ideas on which topics you want me to create videos.
Nice sir 👌
thanks a lot bro
It is only data type validation using schema
not good
Bro you really explained well now I am subscribing your channel. thank you
Great explanation, Sir... Couldn't find data skewness video... Can you please create one video...
Sir, This is Great... Yes it will be very very helpful if you make a video about using this or that...
Your videos on spark are good . Make some more
Please make a complete video on Apache Spark architecture.
How to do schema (Column Name and Position) validation of file?
great explanation brother. Can you make a video on salting.
How did you install spark in jupyter?
Very nice explanation.... Please create more interview question video series. It will be helpful for all.
Good explanation bro
Can you attach the csv file?
Hi, amazing, this is exactly what I was looking for, thanks. But on my end, when I type "ncat -lk 9999" and press enter, nothing I type shows on the console... I'd really appreciate it if you could help me. I hope you reply; I need this for one of my final assessments.
Brother, you are going too fast; my install did not work. PySpark is giving an error in cmd.
Please provide the link in the description.
Pls do more videos... Not getting videos from you... Keep doing... Good content...
Needed bro
Very good explanation, thanks bro
Much useful
Super bro
great great great!!
Wonderful explanation...👍
Hi. One of the executors is taking a lot of time; it has fewer tasks than the other executors but is still taking a lot of time.
Check the amount of data it has. It may have more data with fewer tasks.
finally my search for the difference is over :). THANK YOU.
Thank you 😊❤️ I have a small doubt: from one main DataFrame loaded using the JSON API, I'm creating 5-6 DataFrames, each independent of one another. Should I cache the main DataFrame for performance gains, as it will be used 5-6 times in the following code?
Yes, you should cache it, but make sure it fits in memory.
Nice explanation, Please do video on partitioning vs bucketing and other spark interview questions
Sure.
Great tip! Thanks for sharing :)
Very good video, thanks for your effort
C:\Users\akash>spark-shell
Exception in thread "main" java.lang.ExceptionInInitializerError
    at org.apache.spark.unsafe.array.ByteArrayMethods.<clinit>(ByteArrayMethods.java:54)
    at org.apache.spark.internal.config.package$.<init>(package.scala:1095)
    at org.apache.spark.internal.config.package$.<clinit>(package.scala)
    at org.apache.spark.deploy.SparkSubmitArguments.$anonfun$loadEnvironmentArguments$3(SparkSubmitArguments.scala:157)
    at scala.Option.orElse(Option.scala:447)
    at org.apache.spark.deploy.SparkSubmitArguments.loadEnvironmentArguments(SparkSubmitArguments.scala:157)
    at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:115)
    at org.apache.spark.deploy.SparkSubmit$$anon$2$$anon$3.<init>(SparkSubmit.scala:1022)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.parseArguments(SparkSubmit.scala:1022)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:85)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.reflect.InaccessibleObjectException: Unable to make private java.nio.DirectByteBuffer(long,int) accessible: module java.base does not "opens java.nio" to unnamed module @5c671d7f
    at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:357)
    at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:297)
    at java.base/java.lang.reflect.Constructor.checkCanSetAccessible(Constructor.java:188)
    at java.base/java.lang.reflect.Constructor.setAccessible(Constructor.java:181)
    at org.apache.spark.unsafe.Platform.<clinit>(Platform.java:56)
    ... 13 more
thanks.
More videos like these please.
Thanks Gotham. Now I will try to publish videos every tues, thur and sat without fail.
@@mayankmalhotra4672 How do I display unique keys along with mismatched records? I can see they are populated with null. Can you help me compare two DataFrames? I found one video on comparison, but the primary key got populated with null, so we are unable to identify which primary keys have mismatches.
Lots of efforts were saved by this trick, thanks a lot for such useful info.
Thanks for appreciating content :)
Why did you stop making videos? Your explanation is very nice; please continue the Spark series.
Thanks for the motivation. You will see videos every Tue and Saturday now without any miss. You can suggest topics you want to see here.
@@mayankmalhotra4672 please Try to complete Spark and Pyspark Playlists with Small Projects or Practical Understanding... Theory + practical
Sure its on the list. you will be seeing it very soon.
Thank you ! very well explained pls do cover more topics on spark infact if its possible prepare a series over it !! thanks again man
Pls post video on narrow vs wide transformations
Sure.
Repartition always gives rise to new partitions. Coalesce always makes use of existing partitions and adjusts data into those partitions. Right?
Correct
@@mayankmalhotra4672 thanks a lot...I saw 3 videos before yours today and none of those mentioned this important point.
Feel free to reach out to me for other clarifications
Coalesce will always distribute data evenly.
Coalesce will merge partitions as is, without worrying about the size of each partition.
You are delivering content-oriented videos. Expecting more videos from you... Thanks in Advance...!
More to come!
Nice initiative; subscribed. Looking forward to more such interview prep videos related to big data. Kudos to you :)
Thanks a ton