Mayank Malhotra
  • 23
  • 22 738
Setup Unity Catalog in Azure Databricks
In this video, we discussed how to create a Unity Catalog in Azure Databricks. Prerequisites:
1. You should have an Azure account set up; you can sign up for free. Azure provides $200 of free credit for the first 30 days to explore and learn the services.
2. You should have basic knowledge of data engineering.
If you have not watched my "What is Unity Catalog" video, go and check that video first to understand the basics.
Views: 361

Videos

Unity Catalog In Databricks | Benefits and Limitations
Views: 183 · 6 months ago
Unity Catalog is one of the main components of Databricks; let's discuss the benefits, hierarchy, and limitations of Unity Catalog. #databricks #unitycatalog The complete list of limitations can be found in the following link. Credits: docs.databricks.com/en/data-governance/unity-catalog/index.html#:~:text=Unity Catalog also captures lineage,help data consumers find data.
AutoLoader Vs DLT(Delta Live Tables) - Databricks Interview Questions
Views: 929 · 7 months ago
In this video, we discussed Autoloader vs DLT (Delta Live Tables) in Databricks, which are generally used to load/ingest data into the Delta Lake and in further processing of data across different layers in Databricks.
Python For Data Engineers | Python Interview Questions
Views: 62 · 7 months ago
In this video, I am solving one of the interview questions I was asked during the Python round of interviews for a data engineering position. This question was asked in a service-based company interview. Problem: Find the repeating character in a given input string. Data Engineering generally revolves around 4 topics: Python, SQL, Spark, and one cloud (AWS/Azure/GCP). It can have more topics, but these are the most asked topics.
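One common way to solve the stated problem, as a minimal sketch; the function name and the single-pass set-based approach are my own and not necessarily the solution shown in the video:

```python
def first_repeating_char(s):
    """Return the first character that appears twice in s, or None."""
    seen = set()
    for ch in s:
        if ch in seen:
            return ch
        seen.add(ch)
    return None

print(first_repeating_char("abcab"))  # a
```

This runs in O(n) time with O(n) extra space, versus the O(n^2) nested-loop approach often given as a first answer.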
Cognizant Data Engineering Interview Question | Python Interview Questions
Views: 393 · 7 months ago
In this video, I am solving one of the interview questions I was asked during the Python round of interviews for a data engineering position. This is for a service-based company. Problem: Find the second largest number in a list. Data Engineering generally revolves around 4 topics: Python, SQL, Spark, and one cloud (AWS/Azure/GCP). It can have more topics, but these are the most asked topics. I am creating a playlist for all 4 topics on my channel.
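A minimal single-pass sketch of the stated problem; the function name and the treatment of duplicates (second largest distinct value) are my own assumptions, not necessarily the video's:

```python
def second_largest(nums):
    """Return the second largest distinct value in nums, or None if it doesn't exist."""
    largest = second = None
    for n in nums:
        if largest is None or n > largest:
            second, largest = largest, n
        elif n != largest and (second is None or n > second):
            second = n
    return second

print(second_largest([5, 1, 9, 9, 7]))  # 7
```

Interviewers often disallow `sorted(set(nums))[-2]`, so the single pass above is the usual follow-up answer.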
Ques2 | Python for Data Engineering | Data Engineering Interview Question
Views: 90 · 7 months ago
List-based Python interview questions for the Data Engineering role. This question was asked in one of my interviews. Quite straightforward, but I would suggest pausing the video, thinking about the solution for at least 15 minutes, and then looking at the solution.
Python For Data Engineers | Python Interview Questions
Views: 246 · 7 months ago
In this video, I am solving one of the interview questions I was asked during the Python round of interviews for a data engineering position. Data Engineering generally revolves around 4 topics: Python, SQL, Spark, and one cloud (AWS/Azure/GCP). It can have more topics, but these are the most asked topics. I am creating a playlist for all 4 topics on my channel. Feel free to suggest topics to me.
SQL Interview Questions| InEqui Join | Pepsico Data Engineering Question
Views: 163 · 7 months ago
This video is about a SQL question that was asked in a Pepsico data engineering interview. I would suggest that you read the problem and then pause the video without looking at the solution. Try it on your own for at least 30 minutes, and then come back to the solution. This will help you more in preparing for data engineering interviews.
Pyspark Interview Question With Databricks | Cricket Tournament
Views: 175 · 8 months ago
This video has a PySpark interview question, which I solve via a hands-on Databricks exercise. This problem can be solved with both SQL and PySpark; here we solve it with PySpark and Databricks, and we will solve it the SQL way in another video.
How I Optimized Spark Jobs| Sharing My 6 Years Learning
Views: 1.4K · 3 years ago
In this video, I share the learnings of 6 years that I used to optimize Spark jobs. Optimizing Spark jobs is one of the critical tasks of data engineers, and Spark has many properties to control. I have provided all the best possible scenarios for optimizing Spark jobs. Let me know if you have encountered any other issues. Let me know how you find this video, and feel free to give feedback so that I can improve and create more meaningful and useful content for you.
Spark Streaming Application with Flat File | Business Use Case
Views: 83 · 3 years ago
In my second Spark Streaming application, data is read from flat files and written back to JSON after some transformation. I tried to illustrate this using a business use case. This is a very basic, beginner-level use case; we will cover many complex use cases in future videos. If you like my videos, please like, share, and subscribe. If you have not set up Spark on your machine, watch this video first.
Spark Streaming Word Count| First Spark Streaming Application
Views: 450 · 3 years ago
Create your first Spark Streaming word-count application with this code. Install netcat from this link: nmap.org/download.html. Send messages from netcat, process them with Spark Streaming, and print the counts on the console. If you have not set up Spark on your machine, watch this video first: th-cam.com/video/ADvacZcnYic/w-d-xo.html. Let me know how you find this video, and feel free to give feedback so that I can improve and create more meaningful and useful content for you.
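Setting the Spark and netcat plumbing aside, the counting step such a word-count job applies to each batch of lines can be sketched in plain Python; this is only the per-batch logic, not the Spark API, and the function name is my own:

```python
from collections import Counter

def word_counts(lines):
    """Count words across a batch of input lines, splitting on whitespace,
    roughly what the streaming job computes per micro-batch."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return dict(counts)

print(word_counts(["hello world", "hello spark"]))  # {'hello': 2, 'world': 1, 'spark': 1}
```

In the actual streaming application, this same split-and-count is expressed as transformations over the lines received on the netcat socket.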
Setup Pycharm for Spark 3 on Windows
Views: 150 · 3 years ago
Install PyCharm on your Windows machine for Apache Spark 3 in less than 5 minutes. I know there are already many tutorials available for setting up PyCharm, but those are old and not for Spark 3, so I thought of uploading one on my new Windows machine. Let me know how you find this video, and feel free to give feedback so that I can improve and create more meaningful and useful content for you.
Spark 3 Complete Installation on Windows in Just 10 mins
Views: 616 · 3 years ago
The complete installation of Spark 3 on Windows is done in this video. I installed Apache Spark 3 on my new Windows machine in just 10 minutes, and I am good to go with all Spark 3 projects. Five steps are involved: Step 1: Install the JDK. Step 2: Install Hadoop winutils. Step 3: Download the Spark binaries. Step 4: Install Python (Anaconda). Step 5: Configure the environment variables. Let me know how you find this video, and feel free to give feedback so that I can improve and create more meaningful and useful content for you.
Spark Session Vs Spark Context | Interview Questions
Views: 3.3K · 3 years ago
This video explains the difference between SparkSession and SparkContext, with examples, in the easiest language. Watch this short 5-minute video, which will clear all your doubts about SparkSession and SparkContext. Let me know how you find this video, and feel free to give feedback so that I can improve and create more meaningful and useful content for you.
How I Optimized File Validation in Spark
Views: 2.7K · 3 years ago
DAG Vs Lineage Practically Explained With UI| Spark Interview Questions
Views: 3.4K · 3 years ago
Path Of Non CSE Background To Data Engineering In 6 Months.
Views: 113 · 3 years ago
Rdd Vs Dataframe Easily Explained| Apache Spark Interview Questions
Views: 369 · 3 years ago
Cache VS Persist With Spark UI: Spark Interview Questions
Views: 760 · 3 years ago
Narrow VS Wide Transformation in Apache Spark
Views: 4.3K · 3 years ago
Apache Spark Scala Vs Python Vs Java
Views: 546 · 4 years ago
Repartition Vs Coalesce: Apache Spark Interview Questions
Views: 2K · 4 years ago

Comments

  • @Manikanta-n5v
    @Manikanta-n5v a month ago

    Good explanation. Hope to see more videos on Databricks. Please cover how to give table/view access to new users in Unity Catalog.

  • @ajithshetty1684
    @ajithshetty1684 a month ago

    Is it mandatory to create the workspace in a pay-as-you-go subscription? I'm not able to find the account console in the Databricks workspace.

  • @srinivasanshettiyar5847
    @srinivasanshettiyar5847 4 months ago

    Nice, knowledgeable content 👍 Try to make more videos on interview-based Python questions for data engineers.

  • @rakeshgupta-xb6qh
    @rakeshgupta-xb6qh 5 months ago

    Nice explanation, brother.

  • @manugupta3322
    @manugupta3322 7 months ago

    Try to first explain the algorithm and then do the code.

  • @stockmarketadda5509
    @stockmarketadda5509 a year ago

    Can you please start your data engineering use-case videos once again? Nowadays so many people are searching for this but failing to find relevant videos.

    • @mayankmalhotra4672
      @mayankmalhotra4672 8 months ago

      Starting again; hope you will find content regularly. Let me know which topics you want me to create videos on.

  • @tanushreenagar3116
    @tanushreenagar3116 a year ago

    Nice sir 👌

  • @meghasyam427
    @meghasyam427 a year ago

    thanks a lot bro

  • @SohelKhan-vf3dt
    @SohelKhan-vf3dt a year ago

    It is only data type validation using schema

  • @shankrukulkarni3234
    @shankrukulkarni3234 a year ago

    not good

  • @shankrukulkarni3234
    @shankrukulkarni3234 a year ago

    Bro, you really explained it well; now I am subscribing to your channel. Thank you.

  • @gurumoorthysivakolunthu9878
    @gurumoorthysivakolunthu9878 a year ago

    Great explanation, sir... I couldn't find the data skewness video... Can you please create one?

  • @gurumoorthysivakolunthu9878
    @gurumoorthysivakolunthu9878 a year ago

    Sir, this is great... Yes, it will be very, very helpful if you make a video about using this or that...

  • @akshaykadam1260
    @akshaykadam1260 a year ago

    Your videos on Spark are good. Make some more.

  • @engineerbaaniya4846
    @engineerbaaniya4846 2 years ago

    Please make a complete video on Apache Spark architecture.

  • @krishnanandgupta5471
    @krishnanandgupta5471 2 years ago

    How to do schema (column name and position) validation of a file?

  • @mogilipuripraveen5589
    @mogilipuripraveen5589 2 years ago

    Great explanation, brother. Can you make a video on salting?

  • @psyche5184
    @psyche5184 2 years ago

    How did you install Spark in Jupyter?

  • @Shivamsingh-uc1ou
    @Shivamsingh-uc1ou 2 years ago

    Very nice explanation... Please create more interview question video series. It will be helpful for all.

  • @kalyanababuk905
    @kalyanababuk905 2 years ago

    Good explanation bro

  • @MIDHUNVLOGS
    @MIDHUNVLOGS 2 years ago

    Can you attach the CSV file?

  • @SUFItech-rq4be
    @SUFItech-rq4be 2 years ago

    Hi, amazing! This is exactly what I was looking for, thanks. But at my end, when I type "ncat -lk 9999" and press enter, even if I type anything it doesn't show on the console... I'd really appreciate it if you could help me. I hope you reply; I need this for one of my final assessments.

  • @Wasim_Raza
    @Wasim_Raza 2 years ago

    Brother, you are going too fast. My install didn't work; PySpark is giving an error in cmd.

  • @mathankumars896
    @mathankumars896 3 years ago

    Please provide the link in the description.

  • @sivagssri
    @sivagssri 3 years ago

    Please do more videos... Not getting videos from you... Keep going... Good content...

  • @RangaSwamyleela
    @RangaSwamyleela 3 years ago

    Needed bro

  • @karthikk-ok5mh
    @karthikk-ok5mh 3 years ago

    Very good explanation, thanks bro

  • @RangaSwamyleela
    @RangaSwamyleela 3 years ago

    Very useful

  • @RangaSwamyleela
    @RangaSwamyleela 3 years ago

    Super bro

  • @sudippandit9855
    @sudippandit9855 3 years ago

    great great great!!

  • @praveenyadam2617
    @praveenyadam2617 3 years ago

    Wonderful explanation...👍

  • @snehakalra9146
    @snehakalra9146 3 years ago

    Hi. One of the executors is taking a lot of time; it has fewer tasks than the other executors but is still taking a lot of time.

    • @mayankmalhotra4672
      @mayankmalhotra4672 3 years ago

      Check the amount of data it has. It may have more data with fewer tasks.

  • @AWSoptimization
    @AWSoptimization 3 years ago

    finally my search for the difference is over :). THANK YOU.

  • @thedarkknight579
    @thedarkknight579 3 years ago

    Thank you 😊❤️ I have a small doubt. From one main DataFrame loaded using the JSON API, I'm creating 5-6 DataFrames, each independent of one another. Should I cache the main DataFrame for performance gains? It will be used 5-6 times in the following code.

    • @mayankmalhotra4672
      @mayankmalhotra4672 3 years ago

      Yes, you should cache it, but make sure it fits in memory.

  • @shidramayyash7392
    @shidramayyash7392 3 years ago

    Nice explanation. Please do a video on partitioning vs bucketing and other Spark interview questions.

  • @danhorus
    @danhorus 3 years ago

    Great tip! Thanks for sharing :)

  • @chintudg5367
    @chintudg5367 3 years ago

    Very good video, thanks for your effort

  • @Akashkumar-ei5tm
    @Akashkumar-ei5tm 3 years ago

    C:\Users\akash>spark-shell
    Exception in thread "main" java.lang.ExceptionInInitializerError
        at org.apache.spark.unsafe.array.ByteArrayMethods.<clinit>(ByteArrayMethods.java:54)
        at org.apache.spark.internal.config.package$.<init>(package.scala:1095)
        at org.apache.spark.internal.config.package$.<clinit>(package.scala)
        at org.apache.spark.deploy.SparkSubmitArguments.$anonfun$loadEnvironmentArguments$3(SparkSubmitArguments.scala:157)
        at scala.Option.orElse(Option.scala:447)
        at org.apache.spark.deploy.SparkSubmitArguments.loadEnvironmentArguments(SparkSubmitArguments.scala:157)
        at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:115)
        at org.apache.spark.deploy.SparkSubmit$$anon$2$$anon$3.<init>(SparkSubmit.scala:1022)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.parseArguments(SparkSubmit.scala:1022)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:85)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    Caused by: java.lang.reflect.InaccessibleObjectException: Unable to make private java.nio.DirectByteBuffer(long,int) accessible: module java.base does not "opens java.nio" to unnamed module @5c671d7f
        at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:357)
        at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:297)
        at java.base/java.lang.reflect.Constructor.checkCanSetAccessible(Constructor.java:188)
        at java.base/java.lang.reflect.Constructor.setAccessible(Constructor.java:181)
        at org.apache.spark.unsafe.Platform.<clinit>(Platform.java:56)
        ... 13 more

  • @user-co8oc1rm5w
    @user-co8oc1rm5w 3 years ago

    thanks.

  • @gothams1195
    @gothams1195 3 years ago

    More videos like these please.

    • @mayankmalhotra4672
      @mayankmalhotra4672 3 years ago

      Thanks, Gotham. Now I will try to publish videos every Tuesday, Thursday, and Saturday without fail.

    • @vinodhkoneti4473
      @vinodhkoneti4473 2 years ago

      @mayankmalhotra4672 How to display unique keys along with mismatched records? I can see they are populated with null. Can you help me compare two DataFrames? I found one video for comparison, but the primary key got populated with null, so we are unable to identify which primary keys have mismatches.

  • @salmansayyad4522
    @salmansayyad4522 3 years ago

    Lots of effort was saved by this trick; thanks a lot for such useful info.

  • @suman3316
    @suman3316 3 years ago

    WHY DID YOU STOP MAKING VIDEOS...YOUR EXPLANATION IS VERY NICE PLEASE CONTINUE THE SPARK SERIES..

    • @mayankmalhotra4672
      @mayankmalhotra4672 3 years ago

      Thanks for the motivation. You will see videos every Tuesday and Saturday now without any miss. You can suggest topics you want to see here.

    • @suman3316
      @suman3316 3 years ago

      @mayankmalhotra4672 Please try to complete the Spark and PySpark playlists with small projects or practical understanding... theory + practical.

    • @mayankmalhotra4672
      @mayankmalhotra4672 3 years ago

      Sure, it's on the list. You will be seeing it very soon.

  • @samhans18
    @samhans18 3 years ago

    Thank you! Very well explained. Please cover more topics on Spark; in fact, if possible, prepare a series on it! Thanks again, man.

  • @madhu1987ful
    @madhu1987ful 3 years ago

    Please post a video on narrow vs wide transformations.

  • @madhu1987ful
    @madhu1987ful 3 years ago

    Repartition always gives rise to new partitions. Coalesce always makes use of existing partitions and adjusts data into these partitions. Right?

    • @mayankmalhotra4672
      @mayankmalhotra4672 3 years ago

      Correct

    • @madhu1987ful
      @madhu1987ful 3 years ago

      @mayankmalhotra4672 Thanks a lot... I saw 3 videos before yours today, and none of them mentioned this important point.

    • @mayankmalhotra4672
      @mayankmalhotra4672 3 years ago

      Feel free to reach out to me for other clarifications

    • @GeekCoders
      @GeekCoders 3 years ago

      Coalesce will always distribute data evenly

    • @mayankmalhotra4672
      @mayankmalhotra4672 3 years ago

      Coalesce will merge partitions as they are, without worrying about the size of the partitions.
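The distinction discussed in this thread can be illustrated with a plain-Python sketch (not the Spark API; function names and data are my own): coalesce merges existing partitions without rebalancing rows, while repartition pools all rows and deals them out evenly.

```python
def coalesce(partitions, n):
    """Merge existing partitions into n groups without rebalancing rows,
    roughly how Spark's coalesce reduces partition count without a full shuffle."""
    groups = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        groups[i % n].extend(part)
    return groups

def repartition(partitions, n):
    """Full shuffle: pool all rows and deal them out evenly across n partitions."""
    rows = [r for part in partitions for r in part]
    return [rows[i::n] for i in range(n)]

parts = [[1, 2, 3, 4], [5], [6]]
print(coalesce(parts, 2))     # [[1, 2, 3, 4, 6], [5]]  (uneven sizes kept)
print(repartition(parts, 2))  # [[1, 3, 5], [2, 4, 6]]  (evenly redistributed)
```

This is why coalesce can leave skewed partition sizes, as the reply above notes, while repartition pays a shuffle to balance them.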

  • @riyazkhanpatan4602
    @riyazkhanpatan4602 3 years ago

    You are delivering content-oriented videos. Expecting more videos from you... Thanks in advance!

  • @venkatasridharp6410
    @venkatasridharp6410 4 years ago

    Nice initiative; subscribed. Looking forward to more such interview prep videos related to big data. Kudos to you :)