Encoding Categorical Values in Pandas for Keras (2.2)

Jeff Heaton

Views: 12,715

Published on 3 Feb 2025

Comments • 19

@japedr 3 years ago ⁺¹
12:29 Shouldn't "df" be passed as an argument of the function (or replace "df" by "df1")? Otherwise, the function depends on the definition of "df" out of its scope, which I think was not intended.
Thanks for the awesome work.
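
One way to address this is to pass the frame in explicitly. The sketch below is not the notebook's code: the signature, the `demo` data, and the `weight=5` value are assumptions chosen for illustration. It uses the common smoothed-mean formula, blending each category's mean with the global mean in proportion to the category's count.

```python
import pandas as pd

def calc_smooth_mean(df, cat_name, target, weight):
    """Smoothed target encoding that depends only on its arguments."""
    # Mean of the target over the whole frame
    global_mean = df[target].mean()
    # Per-category row count and target mean
    agg = df.groupby(cat_name)[target].agg(["count", "mean"])
    # Blend category mean with global mean; larger weight pulls
    # small categories harder toward the global mean
    smooth = (agg["count"] * agg["mean"] + weight * global_mean) / (agg["count"] + weight)
    # Replace each category label by its smoothed mean
    return df[cat_name].map(smooth)

# Hypothetical data, not taken from the video
demo = pd.DataFrame({"pet": ["cat", "cat", "dog", "dog", "dog"],
                     "y": [0, 1, 1, 1, 0]})
encoded = calc_smooth_mean(demo, "pet", "y", weight=5)
```

Here `weight` is a tunable smoothing strength rather than a value derived from the data, so it is usually picked by validation.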
@obsidiansiriusblackheart 5 years ago ⁺²
6:44 What if you had 10 categories? Or 100? Would you still have to create these dummies, or is there some more efficient way to convert the categories to numerical values?
@HeatonResearch 5 years ago ⁺¹
If you can find a way to order them, then you can reduce it to a single index number. You can also see if some of the dummy variables are not important (by creating them and running a feature importance report) and drop the unimportant ones.
@obsidiansiriusblackheart 5 years ago
@HeatonResearch thank you
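
The "single index number" idea from the reply above can be sketched with an ordered pandas categorical. The column name, values, and ordering are hypothetical and only illustrate the case where a natural order exists.

```python
import pandas as pd

# Hypothetical column whose categories have a natural order
df = pd.DataFrame({"size": ["small", "large", "medium", "small"]})
order = ["small", "medium", "large"]

# Each category is replaced by its position in `order`,
# giving one integer column instead of one dummy column per category
df["size_idx"] = pd.Categorical(df["size"], categories=order, ordered=True).codes
```

This only makes sense when the order is meaningful; for unordered categories the integer codes would imply a ranking that does not exist.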
@leassis91 2 years ago
Hi, in the calc_smooth_mean function, how do you define weight? Did you choose it arbitrarily? Thanks in advance.
@Kanakapallianurag 4 years ago
For the dogs and cats at 10:16, we got the value {'cat': 0.2, 'dog': 0.8}. Is that because {'cat': ((no. of cats with y = 1) * 2) / (len(index) + 1), 'dog': ((no. of dogs with y = 1) * 2) / (len(index) + 1)}? Please do reply.
@bobdowling6932 4 years ago
A question about Z-scores: Is there any advantage in using the mean and standard deviation over, say, the median and inter-quartile range? Would your networks be vulnerable if one or more columns of initial data came from a distribution with an ill-defined mean and standard deviation? (Classic maths example is p.d.f.(x) = (1/π)/(1+x²).)
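
For comparison, the two scalings mentioned in this question can be written side by side. This is a minimal sketch with hypothetical helper names, not code from the video; it shows how a single extreme value affects each scaling.

```python
import numpy as np

def zscore_scale(x):
    """Center on the mean and divide by the (population) standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def robust_scale(x):
    """Center on the median and divide by the inter-quartile range."""
    x = np.asarray(x, dtype=float)
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    return (x - med) / (q3 - q1)

# Hypothetical sample with one extreme value
sample = [1, 2, 3, 4, 100]
z = zscore_scale(sample)
r = robust_scale(sample)
```

With the median and IQR, the four ordinary values keep a spread of about one unit, whereas the mean and standard deviation are both dominated by the outlier.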
@hanserj169 5 years ago
What if I have just a label with 2 categories? Should I use dummy variables or set 0 and 1 directly in the dataset in 1 simple column (binary)? Sorry if the question is too basic, but I'm a beginner. Could I use LabelEncoder() or pd.factorize?
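
As a small illustration of the `pd.factorize` option raised here, a two-category label can be turned into a single 0/1 column. The label values are hypothetical.

```python
import pandas as pd

# Hypothetical two-category label
labels = pd.Series(["yes", "no", "yes", "no"])

# factorize assigns integer codes in order of first appearance
# and also returns the unique values the codes refer to
codes, uniques = pd.factorize(labels)
```

With only two categories, this single column carries the same information as a pair of dummy columns.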
@hellaxxable 5 years ago ⁺¹
Great video Jeff, thank you! One question about target encoding: is it possible to use target encoding when your target value is a multiclass feature? Let's say it's not only 1's and 0's but there are some 2's as well. If yes, would this change how the encoding is applied?
@HeatonResearch 5 years ago
There are several ways to do that, but the potential for target leakage is so much higher. I've tried a couple of different techniques on my own but always got fairly bad overfitting. If I ever have a case where I push it deeper I may post something on target encoding for a categorical target.
@nobodyeverybody8437 4 years ago
Dear Jeff, shouldn't we set ddof in the zscore function to 1? The default value is 0 and I think it affects the results a little bit. What would you suggest?
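
The size of the effect can be checked directly. The sketch below computes z-scores by hand with NumPy for both settings on a hypothetical four-value sample; it does not reproduce the notebook's code.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])  # hypothetical sample

# ddof=0: population standard deviation (divide by n)
z_pop = (x - x.mean()) / x.std(ddof=0)
# ddof=1: sample standard deviation (divide by n - 1)
z_samp = (x - x.mean()) / x.std(ddof=1)

# The two results differ only by the constant factor sqrt(n / (n - 1))
ratio = z_pop / z_samp
```

Because the factor is the same for every value in a column and approaches 1 as n grows, the choice matters little for large datasets.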
@AlokKumar-jh8wp 5 years ago
Jeff, could you suggest resources for a foundation in statistics and how to apply stats in machine learning?
@HeatonResearch 5 years ago
For a book, I really like this one: openstax.org/details/books/introductory-statistics
@HeatonResearch 5 years ago
Not free, but pretty much a bridge between stats and ML: www.amazon.com/Elements-Statistical-Learning-Prediction-Statistics/dp/0387848576
@liweigao4755 5 years ago
Great video, just one question: how do you handle the case when too many dummy variables get created? For example, if the column is for phone models, there might be thousands of them, which consumes lots of memory. Thanks!
@paulchristian1244 4 years ago
Look at sklearn hashing and binary encoding.
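
The hashing idea can be sketched without scikit-learn. The code below is a plain standard-library illustration of the hashing trick, not sklearn's implementation; the function names, the bucket count, and the phone-model strings are all hypothetical.

```python
import hashlib

def hash_bucket(value, n_buckets=8):
    """Map a category string to a stable bucket index."""
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets

def hash_encode(values, n_buckets=8):
    """One fixed-width indicator row per value, however many categories exist."""
    rows = []
    for v in values:
        row = [0] * n_buckets
        row[hash_bucket(v, n_buckets)] = 1
        rows.append(row)
    return rows

# Hypothetical phone-model values
models = ["phone-a", "phone-b", "phone-a", "phone-c"]
encoded_rows = hash_encode(models)
```

The column count stays fixed at `n_buckets`, at the cost that different categories can collide in the same bucket.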
@forvm2051 5 years ago ⁺¹
I got lost starting at "Target Encoding for Categoricals" at 8:38; I don't know what the code is doing.
@HeatonResearch 5 years ago
That code is basically just building up a test data set. I could have also just generated the test dataset as a CSV and had the Jupyter notebook load that, but this keeps it all in one compact notebook.
@8eck 3 years ago
I lost the point somewhere between the cars and the dogs and cats... Very confusing explanations, going over important steps too quickly. Dropping so many columns, what's the point of that data then?
