I would love to see an example of the salting side that is missing
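Since the salting example was cut off, here is a minimal sketch of the idea (my own illustration, not code from the talk): spread a skewed key across N buckets by adding a random salt, aggregate on (key, salt), then drop the salt and aggregate again. The salt count of 10 is an assumption to tune against the observed skew.
```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder.master("local[*]").appName("salting-sketch").getOrCreate()
import spark.implicits._

// Skewed input: almost every row carries the same hot key.
val df = Seq.fill(100000)(("hot_key", 1)).toDF("key", "value")

val numSalts = 10 // assumed bucket count; tune to the observed skew

// Stage 1: add a random salt so the hot key is spread across numSalts reducers.
val partial = df
  .withColumn("salt", (rand() * numSalts).cast("int"))
  .groupBy("key", "salt")
  .agg(sum($"value").as("partial_sum"))

// Stage 2: drop the salt and combine the partial aggregates per real key.
val result = partial
  .groupBy("key")
  .agg(sum($"partial_sum").as("total"))

result.show()
```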
Thanks for superbly breaking down the mistakes and their solutions. Excellent presentation!
At 6:21 it should say divide by 1 + 0.07, not multiply by 1 - 0.07. Also, in more recent versions of Spark the overhead has gone up from 7% to 10%.
Absolutely agree, the division is correct.
Thanks for clarification.
Excellent. Best wishes.
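To make the 6:21 correction concrete, here is the arithmetic, using the slide's example figure as I recall it (21 GB available per executor). The overhead is carved out of the same container as the heap, which is why division is the right operation:
```scala
// executorMemory + 0.07 * executorMemory <= available
// => executorMemory <= available / 1.07
val availableGb = 21.0
val slideValue  = availableGb * (1 - 0.07) // 19.53 GB: what the slide computes
val corrected   = availableGb / (1 + 0.07) // 19.63 GB: solving the inequality
val newerSpark  = availableGb / (1 + 0.10) // 19.09 GB: overhead is 10% in newer Spark
```
Both happen to round to ~19 GB here, but the gap grows with larger containers.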
I am new to Spark and after viewing this presentation I see there's a lot to learn. I liked it a lot, thanks!
5 cores per executor did not work for us. The best number for us was 3 on-prem and 2 on EMR; anything larger gave us IO exceptions. You need to adjust case by case.
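For reference, this is where those knobs live. A hedged sketch using standard Spark config property names; the values are just the ones reported in the comment above, not recommendations:
```scala
import org.apache.spark.sql.SparkSession

// Executor sizing expressed as Spark config (adjust per cluster).
val spark = SparkSession.builder
  .appName("executor-sizing")
  .config("spark.executor.instances", "17")
  .config("spark.executor.cores", "3")   // 3 worked on-prem, 2 on EMR per the comment
  .config("spark.executor.memory", "19g")
  .getOrCreate()
```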
Anyone else notice Sameer Farooqui taking photos when the Q&A started?
Awesome guys, all of them!
Hi Mark, awesome explanation of the executor and executor-memory calculations. But this covers how to use the maximum number of cores/executors in a given environment to achieve maximum parallelism. I'd add one more point: if we have a heavy memory load to deal with, we may have to trade off the number of executors/cores for executor memory. That is, under a massive memory load we may have to go with fewer executors (fewer than 17) and more memory per executor (more than 19 GB). Please correct me if I'm wrong. Thanks!
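That matches my understanding. A hedged sketch of the tradeoff, reusing the talk's cluster numbers (6 nodes, 63 GB usable per node after the OS reservation, 7% overhead):
```scala
val usableGbPerNode = 63.0

// Talk's layout: 3 executors per node -> 17 executors total, ~19 GB heap each.
val balancedHeapGb    = usableGbPerNode / 3 / 1.07 // ~19.6 GB

// Memory-heavy layout: 1 executor per node -> ~6 executors (one slot goes to
// the AM), but each executor gets roughly three times the heap.
val memoryHeavyHeapGb = usableGbPerNode / 1 / 1.07 // ~58.9 GB
// Fewer executors means less parallelism, so this only pays off when
// individual tasks genuinely need the extra memory.
```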
Great
Damn, 5 years ago... I absolutely loved the presentation.
Being engaging is a difficult job... you did great.
Also, is it just me, or do these two faces look really familiar by the time the video ends?
Awesome sharing, thanks a lot!
Thank you, guys! You did a great job.
The data quality check article mentioned at 22:52 can be found here: web.archive.org/web/20181116232422/blog.cloudera.com/blog/2015/07/how-to-do-data-quality-checks-using-apache-spark-dataframes/
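In case the archive link dies too, the core idea is easy to reproduce. A minimal hedged sketch of a DataFrame-based quality check (my own, not the article's code): count rows that violate a rule and fail the job if any do.
```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local[*]").appName("dq-check").getOrCreate()
import spark.implicits._

val orders = Seq((1, 20.0), (2, -5.0), (3, 12.5)).toDF("order_id", "amount")

// Rule: amounts must be non-negative and order_id must be non-null.
val violations = orders.filter($"amount" < 0 || $"order_id".isNull).count()

// Fail fast instead of letting bad rows flow downstream.
if (violations > 0)
  sys.error(s"Data quality check failed: $violations bad row(s)")
```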
Why can't they just let them speak and finish their presentation, for God's sake? Was it that big of a problem to let them cover their last two mistakes? lol... The last one (caching vs. persisting) was very interesting.
it's awesome, thanks a lot!
Great topic, Great explanation!
Thanks a lot. Very helpful!
But what do you do if you only have a 7-node cluster with 4 cores and 8 GB of RAM per node?
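Applying the talk's recipe to that cluster (hedged, my arithmetic): reserve a core and ~1 GB per node for the OS and daemons, then size executors from what's left.
```scala
// 7 nodes, 4 cores / 8 GB RAM each, following the talk's recipe.
val nodes       = 7
val usableCores = 4 - 1        // leave 1 core per node for OS/daemons
val usableGb    = 8.0 - 1.0    // leave ~1 GB per node as well

// With 3 cores per executor that's 1 executor per node (integer division);
// drop one executor for the application master.
val executors = nodes * (usableCores / 3) - 1 // 6 executors
val heapGb    = (usableGb / 1) / 1.07         // ~6.5 GB -> round down to ~6g
// So roughly: --num-executors 6 --executor-cores 3 --executor-memory 6g
```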
What does Cloudera know about Spark applications? They don't even update their versions.
What was the tool he was talking about for Spark unit testing?
I think he said JUnit.
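If it helps, here's a minimal hedged sketch of what a JUnit-based Spark test can look like: plain JUnit 4 plus a local SparkSession (Holden Karau's spark-testing-base library is another common option). The class and test names are mine.
```scala
import org.apache.spark.sql.SparkSession
import org.junit.{After, Before, Test}
import org.junit.Assert.assertEquals

class WordCountTest {
  private var spark: SparkSession = _

  @Before def setUp(): Unit =
    spark = SparkSession.builder.master("local[2]").appName("test").getOrCreate()

  @After def tearDown(): Unit = spark.stop()

  @Test def countsWords(): Unit = {
    // Run the logic under test on a tiny in-memory dataset.
    val counts = spark.sparkContext
      .parallelize(Seq("a", "b", "a"))
      .map(w => (w, 1))
      .reduceByKey(_ + _)
      .collectAsMap()
    assertEquals(2, counts("a"))
  }
}
```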
What is the solution to the 2 GB Spark shuffle block size limit?
Limit the partition size.
Resize the partitions.
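Right: the 2 GB limit applies per shuffle block, so the fix is more, smaller partitions. A hedged sketch (the partition count of 2000 is an arbitrary example; size it so each partition stays well under 2 GB):
```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local[*]").appName("shuffle-fix").getOrCreate()
import spark.implicits._

// Raise the SQL shuffle partition count (the default is 200).
spark.conf.set("spark.sql.shuffle.partitions", "2000")

// Or repartition explicitly before the heavy shuffle stage.
val df = Seq(("a", 1), ("b", 2)).toDF("key", "value")
val resized = df.repartition(2000, $"key")
```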
What about loading small files?
Very cool :) ..!
These are also the top reasons Spark is still relatively unpopular :-/
Really? I thought it was already popular in 2020. If not, what else is gaining attention instead?
awesome
What is that special collection for doing ETL?
I have the same question... Until now I've been doing ETL using DataFrames only; I've never used any custom collections.
Where are the slides?
Spark, by itself, is not intended to handle CPU-intensive operations on your data. If a process against the data requires a lot of CPU or memory and/or is consuming CPU time, move that process into a microservice or a competing-consumers pattern. Otherwise it will bog down your data handling and prevent you from using Spark effectively.
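A hedged sketch of what that hand-off can look like (the path, topic, and broker address are mine): the Spark job stays I/O-shaped and just enqueues work items, while a separate pool of competing consumers (an ordinary Kafka consumer group) does the CPU-heavy processing.
```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("enqueue-work").getOrCreate()

// Serialize each row to JSON; the Kafka sink expects a "value" column.
val workItems = spark.read.parquet("/data/incoming") // hypothetical input path
  .selectExpr("to_json(struct(*)) AS value")

// Requires the spark-sql-kafka connector on the classpath.
workItems.write
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("topic", "cpu-heavy-work")
  .save()
```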
How does each node get 3 executors at th-cam.com/video/WyfHUNnMutg/w-d-xo.html?
I can't understand what he is saying!!