Data Talkies
Joined Sep 25, 2021
Learn Big Data Concepts with me
What is RDD? Unveiled: let's learn it in a unique way
Let's dive into the world of RDDs in Spark with a clear, visual walkthrough backed by code examples. RDD, or Resilient Distributed Dataset, is the backbone data structure of Spark. Unlike traditional data structures that store data on a single machine, RDD takes it a step further, organizing and storing data across multiple machines. Join this tutorial for a neat and memorable exploration of RDDs, unlocking their power for large-scale distributed computing tasks in Spark. After this, you'll have a solid grasp of RDDs that you won't forget!
Views: 92
Videos
How to see the content of Parquet File MAC
5K views · 1 year ago
Steps to install parquet-cli:
1) Install brew if not already installed: brew.sh/
2) brew install parquet-cli
3) parquet head path_to_filename
Question of the week: How to run Python Spark code using spark-submit locally on a Mac.
703 views · 1 year ago
In this video I will walk you through the process of running Python files using spark-submit. Here is the code which I am using in the .py file:

import sys
from pyspark.sql import SparkSession
from pyspark.sql.functions import count

def process_sample_data():
    spark = (SparkSession
             .builder
             .appName("PythonsampleCount")
             .getOrCreate())
    # Get the data set filename from the command-line arguments
    sampl...
Explanation of DRIVER, EXECUTORS, CLUSTER MANAGER with a real-scenario example (CLIENT MODE)
269 views · 1 year ago
This video explains Spark jargon like Driver, Executors, and Cluster Manager in layman's terms, using a real-scenario comparison. Please note: this video assumes the client deployment mode.
Introduction to Apache Spark in a very simple way.
295 views · 2 years ago
This video covers what Apache Spark is and why we need it in data pipelines. It gives you the background on why Spark came into the picture, starting with clear definitions of vertical vs. horizontal scaling and distributed environments, then covering Hadoop's components and its main drawback, and finally introducing Spark to help you understand its real purpose.
Spark Installation on Mac and run Pyspark locally.
14K views · 2 years ago
Downloading Apache Spark on a laptop (Mac):
- Install Java (prerequisite for running PySpark)
- Run pyspark and check the Web UI
Link to download Spark: spark.apache.org/downloads.html
Link to download Java: www.oracle.com/java/technologies/downloads/
NOTE: Spark itself is written in Scala, but it runs on the Java Virtual Machine, so you need Java version 8 or newer on your laptop before running pyspark.