Correcting Skewed Data with Scipy and Numpy

Christopher Pulliam, PhD

7,160 views

Published 29 Aug 2024
Skewed data can adversely affect your analysis and machine learning models. In this video, I demonstrate five methods for correcting skewed data using the NumPy and SciPy libraries: the square root, cube root, fourth root, log, and Yeo-Johnson transforms. I also compare the effectiveness of each method with a bar plot summarizing the skewness of the data after each transformation.
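A minimal sketch of the five transforms named above, assuming a hypothetical lognormal sample (not the data used in the video) and using `scipy.stats.skew` as the summary statistic. This is an illustration, not the code shown in the video; the bar plot is left out, but `skew_summary` could be plotted as a bar chart with matplotlib.

```python
import numpy as np
from scipy import stats

# Hypothetical right-skewed sample (assumption: lognormal, chosen for illustration)
rng = np.random.default_rng(seed=0)
data = rng.lognormal(mean=0.0, sigma=1.0, size=5_000)

# The five transforms from the description, plus the untransformed data for reference.
# The roots and the log all require non-negative (log: strictly positive) input.
transforms = {
    "original": lambda x: x,
    "square root": np.sqrt,
    "cube root": np.cbrt,
    "fourth root": lambda x: np.power(x, 0.25),
    "log": np.log,
    # yeojohnson returns (transformed_data, fitted_lambda) when lmbda is not given
    "Yeo-Johnson": lambda x: stats.yeojohnson(x)[0],
}

# Skewness after each transform: values near 0 indicate a more symmetric distribution
skew_summary = {name: stats.skew(f(data)) for name, f in transforms.items()}

for name, s in skew_summary.items():
    print(f"{name:>12}: skewness = {s:.3f}")
```

For this particular sample the log transform is the natural fit (the log of lognormal data is normal); with real data the best choice depends on the distribution, which is why comparing the skewness of every candidate is useful.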

Comments • 31

@marcom5873 · months ago
First time I have seen your videos. This is genuinely a very good video. Very well explained and clear. I am subscribing.
The music wasn't off-putting either!
@CJP3 · months ago
Thank you so much!!! I really appreciate it. If there's anything you'd like to see, just let me know!
@marcom5873 · months ago
@CJP3 Sent you an invite on LinkedIn!
@officialscience101 · 1 year ago
The on-screen text is a great addition, Dr. P!
@CJP3 · 1 year ago
🙏🏽 I'll incorporate more in upcoming videos! Thanks for the feedback!
@metinunlu_ · 7 months ago
Thank you for the video, subscribed! YouTube needs more quality content like this.
@user-gx6tn4wk7r · 5 months ago
Bro, this is data science ASMR 🤤
@CJP3 · 5 months ago
Hahaha, I didn't mean for it to be, but glad you enjoyed it (I hope) 😂
@user-mm2uc6ye9x · 3 months ago
Amazing video!
I like its structure: motivation, overview with examples, practical advice.
Thanks!
@CJP3 · 3 months ago
Thanks for the feedback! I'll do more of this style!
@mushinart · 11 months ago
Outstanding explanation, professor!
@CJP3 · 11 months ago
Thank you so much!
@undertaker7523 · months ago
What if we standardized using z-scores instead? It seems like that would have largely the same impact, wouldn't it?
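A quick numerical check of this question, as a sketch with a hypothetical exponential sample: z-scoring subtracts the mean and divides by the standard deviation, which is a linear (affine) rescaling. Skewness is invariant under such rescalings, so standardizing changes location and scale but not the shape of the distribution; only a non-linear transform such as a root or log changes the skewness.

```python
import numpy as np
from scipy import stats

# Hypothetical right-skewed sample (assumption: exponential data, for illustration)
rng = np.random.default_rng(seed=1)
data = rng.exponential(scale=2.0, size=1_000)

# z-score: (x - mean) / std, an affine transform
z = stats.zscore(data)

print(f"skew of raw data:      {stats.skew(data):.4f}")
print(f"skew after z-scoring:  {stats.skew(z):.4f}")  # same value up to floating-point error
print(f"skew after sqrt:       {stats.skew(np.sqrt(data)):.4f}")  # a non-linear transform does change it
```

So z-scoring is still useful for putting features on a common scale, but it is not a substitute for the skew-correcting transforms in the video.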
@nicolaslpf · 1 year ago
Amazing video! I was writing a function to measure the same thing. You forgot to mention log1p, which is log(x + 1), really useful for right-skewed data with values less than 1.
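A small sketch of the log1p suggestion, using hypothetical values that include a zero: `np.log` returns `-inf` at 0, while `np.log1p` computes log(1 + x), maps 0 to 0, and stays accurate for very small x. `np.expm1` is its inverse, which is handy for converting model outputs back to the original scale.

```python
import numpy as np

# Hypothetical values, including an exact zero and a very small positive value
x = np.array([0.0, 1e-10, 0.5, 3.0, 250.0])

with np.errstate(divide="ignore"):  # silence the divide-by-zero warning from log(0)
    plain_log = np.log(x)           # log(0) -> -inf, which breaks most downstream statistics
shifted_log = np.log1p(x)           # log(1 + x): 0 maps to 0

print(plain_log)
print(shifted_log)
print(np.expm1(shifted_log))        # expm1 inverts log1p, recovering the original values
```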
@pabloagogo1 · 2 months ago
This is interesting. If one corrects the original skewed data with these kinds of transformations in the context of linear or multiple linear regression, won't that change the interpretation of the original data? Curious to know.
@CJP3 · 2 months ago
Perhaps, but that change may be for the better. I'd say it's worth considering these transformations if you know you have skewed data. Many models, especially linear models, assume normally distributed variables (strictly, normally distributed residuals).
I usually build models with and without significant preprocessing and feature scaling/engineering.
@dannybee9068 · 1 year ago
Thank you! That was helpful!
So we can basically take a root of any order? Is there a drawback to exploiting it, like continually increasing n when raising a feature to the power of 1/n?
@CJP3 · 1 year ago
Hi Danny! Context definitely matters. For analytical chemistry, 1/n scaling is usually OK. A few downsides are that it makes models less sensitive to potential outliers, and it's not suitable for certain distributions. Lastly, because 1/n scaling is non-linear, it can make data interpretation more difficult.
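Danny's question can also be checked numerically. Because x^(1/n) = exp(ln(x)/n) ≈ 1 + ln(x)/n for large n, and skewness ignores shifts and rescaling, the skewness of x^(1/n) levels off at the skewness of the log transform instead of falling indefinitely, while the values bunch ever more tightly around 1. A sketch with a hypothetical lognormal sample:

```python
import numpy as np
from scipy import stats

# Hypothetical right-skewed sample (assumption: lognormal data, for illustration)
rng = np.random.default_rng(seed=2)
data = rng.lognormal(mean=0.0, sigma=1.0, size=5_000)

root_skews = {}
for n in (1, 2, 3, 4, 8, 16, 64):
    rooted = np.power(data, 1.0 / n)  # the 1/n-th root of each value
    root_skews[n] = stats.skew(rooted)
    # The shrinking standard deviation shows the values collapsing toward 1
    print(f"n={n:>2}: skew={root_skews[n]:6.3f}, std={rooted.std():.4f}")

# Large-n limit: the same skewness as the log transform
log_skew = stats.skew(np.log(data))
print(f"log : skew={log_skew:6.3f}")
```

In other words, pushing n higher buys little beyond what a log transform already does, at the cost of squeezing the data into an ever narrower range.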
@AyahuascaDataScientist · 3 months ago
Skewness doesn't necessarily matter if you're using XGBoost, correct? For classification or regression, that is.
@CJP3 · 3 months ago
Exactly! Skewed data doesn't impact all model frameworks.
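One way to see why: tree-based models such as XGBoost split a feature at thresholds, and (at least with exact split finding) a split depends only on the ordering of the values. Strictly increasing transforms like the square root and log leave that ordering unchanged, so every threshold on the raw feature has an equivalent threshold on the transformed one. A sketch of that rank-order check with a hypothetical skewed feature:

```python
import numpy as np
from scipy import stats

# Hypothetical skewed feature (assumption: lognormal, for illustration)
rng = np.random.default_rng(seed=4)
feature = rng.lognormal(mean=0.0, sigma=1.5, size=1_000)

order_preserved = {}
for name, transformed in {"sqrt": np.sqrt(feature), "log": np.log(feature)}.items():
    # Identical ranks mean every threshold split on `feature` has an equivalent split on `transformed`
    order_preserved[name] = np.array_equal(stats.rankdata(feature), stats.rankdata(transformed))
    print(f"{name}: rank order preserved = {order_preserved[name]}")
```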
@thoniasenna2330 · 5 months ago
SUBSCRIBED! What should one do first? In other words, what's the correct order: treating outliers, imputing missing values, or correcting symmetry? Thanks, Dr. P!
@CJP3 · 5 months ago
You're not going to like the answer 😂… it depends a lot on the application. It's best to first be aware that they exist and then evaluate their impact on your outcome. For example, if you're trying to identify outlier samples, then outlier measurements wouldn't be so bad... maybe. Or missing values could be useful depending on the application, so instead of imputing them, maybe you engineer a new feature.
@CJP3 · 5 months ago
Don't unsubscribe after my answer! 😂 🤣
@pewkaboo · 1 year ago
What if my data contains a lot of useful '0' values?
@CJP3 · 1 year ago
Howdy! Can you explain more about the 0's?
@pewkaboo · 1 year ago
@CJP3 It's expenditure data where the budget column contains a lot of '0' (not null) values.
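For a column like this, the plain log and the Box-Cox transform are undefined at zero, while `np.log1p` and the Yeo-Johnson transform accept zeros (Yeo-Johnson also accepts negative values). A sketch with a hypothetical budget column: the 40% share of zeros and the lognormal spending are assumptions for illustration, not the commenter's data.

```python
import numpy as np
from scipy import stats

# Hypothetical expenditure column: about 40% exact zeros, the rest right-skewed spending
rng = np.random.default_rng(seed=3)
budget = np.where(rng.random(2_000) < 0.4, 0.0, rng.lognormal(mean=6.0, sigma=1.0, size=2_000))

boxcox_failed = False
try:
    stats.boxcox(budget)
except ValueError as err:  # Box-Cox requires strictly positive data
    boxcox_failed = True
    print("Box-Cox failed:", err)

# Yeo-Johnson is defined at zero; it returns the transformed data and the fitted lambda
yj, lmbda = stats.yeojohnson(budget)
print(f"lambda={lmbda:.3f}, skew before={stats.skew(budget):.3f}, after={stats.skew(yj):.3f}")

# Zeros map to exactly zero for any lambda, so the spike at 0 survives the transform.
# A hypothetical indicator column keeps the "no spending" information as its own feature.
is_zero = (budget == 0).astype(int)
```

No transform can spread out a spike of identical values, so when the zeros are meaningful, keeping an explicit indicator like `is_zero` alongside the transformed column is often worth considering.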
@prathambhatnagar8653 · 5 months ago
Please don't add background music.
@CJP3 · 5 months ago
Thanks for the feedback. Most of the newer coding tutorials don't have background music. Have a great day!
@AyahuascaDataScientist · 3 months ago
I like it. Don't listen to this hater!
@mouhsineelqesry9446 · 3 months ago
Bro, you explain the concepts, but do you need the music?! It's distracting.
@CJP3 · 3 months ago
I 💯 understand. The newer videos don't have the music, and the audio has a better EQ :)