To identify the columns with only a single value, would it have been easier just to check the variance of that column? A variance of 0 indicates that the column contains only a single value.
Also, scikit-learn provides a library function to automate this: sklearn.feature_selection.VarianceThreshold(). However, using this function could mess up the column indexes.
Very true! That's a good idea. Definitely easier than typing out that whole dictionary comprehension. Thanks for the tip! :)
I would avoid using VarianceThreshold unless performing feature selection. What you could do instead is get the variance of each column with df.var() and then check which are equal to zero:
single_valued_columns = df.columns[df.var() == 0]
df = df.drop(single_valued_columns, axis=1)
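A minimal runnable sketch of that idea (the toy DataFrame is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [7, 7, 7],        # single-valued column: zero variance
    "c": [0.1, 0.2, 0.3],
})

# Zero variance means the column holds only one value.
# numeric_only=True guards against non-numeric columns, where var()
# is undefined; for those, a check like df.nunique() == 1 works instead.
single_valued_columns = df.columns[df.var(numeric_only=True) == 0]
df = df.drop(columns=single_valued_columns)
```

Note that unlike VarianceThreshold (which returns a bare array), this keeps the DataFrame's column labels intact, which is the "messed up column indexes" concern above.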
Is it necessary to have 50% of both target values? Can't we just oversample the minority class and undersample the majority class so that the target column has a 70:30 or 60:40 ratio? I think that would give better results. Correct me if I am wrong, I don't have much practical experience 😅 Btw, love your videos, I have started watching your every upload ❤️
I highly recommend trying it out and seeing! It may be so. The results should dictate which approach to use.
In theory, we want both classes to have equal representation so that the model is used to seeing both kinds of training examples in equal quantity, but theory does not always rule. Practice will reveal the truth.
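If you want to experiment with that, here is a minimal sketch of oversampling the minority class to an arbitrary target ratio with plain pandas (the toy dataset and the 60:40 split are made-up assumptions; for 50:50, just sample up to the majority count):

```python
import pandas as pd

# Hypothetical imbalanced dataset: 8 negatives, 2 positives.
df = pd.DataFrame({"target": [0] * 8 + [1] * 2})

majority = df[df["target"] == 0]
minority = df[df["target"] == 1]

# Oversample the minority with replacement until the split is roughly
# 60:40, i.e. the minority count is 40/60 of the majority count.
target_size = int(len(majority) * 40 / 60)
minority_up = minority.sample(n=target_size, replace=True, random_state=0)

balanced = pd.concat([majority, minority_up])
```

Trying a few ratios this way and comparing validation scores is exactly the "practice will reveal the truth" experiment.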
If you were to find out which features are most important to predict fails, how would you do it?
Great question! There are many ways to do this.
One way is to use a model with interpretability built in.
For example, with logistic regression, you get to see the actual feature contributions just by looking at the weights learned by the model (since there is only one weight per feature).
Another interpretable model would be the decision tree. Once you've built the tree, you can look at its structure to see how the model arrives at its predictions.
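To make the logistic regression case concrete, here is a sketch using a standard scikit-learn dataset (the dataset choice is illustrative, not from the thread); scaling the features first puts the learned weights on a common footing so they can be compared as importance scores:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Example dataset with 30 named features.
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_scaled = StandardScaler().fit_transform(X)

model = LogisticRegression(max_iter=1000).fit(X_scaled, y)

# One weight per feature: a larger |weight| means a larger
# contribution to the prediction.
importance = sorted(zip(X.columns, model.coef_[0]),
                    key=lambda pair: abs(pair[1]), reverse=True)
```

`importance[0]` is then the feature the model leans on most.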
Another way you can gauge feature importance is by using explanation metrics such as LIME or Shapley values.
LIME (Local Interpretable Model-Agnostic Explanations) is a way to get a sense of how your model is making its predictions by building a linear (interpretable) model that approximates your model.
Shapley values measure the marginal contributions of each feature with respect to the final output.
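LIME and SHAP live in their own packages (`lime` and `shap`), but a closely related model-agnostic option ships with scikit-learn itself: permutation importance, which shuffles one feature at a time and measures how much the score drops. A sketch (the random forest and dataset are illustrative assumptions):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each feature on held-out data and record the score drop;
# a big drop means the model depends heavily on that feature.
result = permutation_importance(model, X_test, y_test,
                                n_repeats=5, random_state=0)
# result.importances_mean[i] is the average score drop for feature i.
```

Like Shapley values, this works for any fitted model, though it attributes importance per feature rather than per prediction.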
If any of this is confusing, please let me know! :)
@gcdatkin how would you decide which to use between those methods?
Well it depends on what you are looking for in a model.
If accuracy is extremely important to you, you would be better off using a non-interpretable model, because, generally, models with high interpretability have lower accuracy relative to other models.
For example, neural networks and random forests are really accurate, but it is very difficult, if not impossible, to interpret their results directly.
In this case, using explanation metrics would better suit your needs.
If accuracy is secondary and not as important as interpretability, then opting for a simple linear/logistic regression or a decision tree might be your best option.