Understanding Your Data | Day 19 | 100 Days of Machine Learning

CampusX

67,978 views


Published on 23 Jul 2024
Quality data is fundamental to any data science engagement. To gain actionable insights, the appropriate data must be sourced and cleansed. Understanding your data is the foundational step of any data analysis: it involves exploring the data's characteristics, patterns, and relationships before drawing conclusions from it.
At the beginning of a project, it is important to consider the potential harms your tool could cause. These harms can arise from designing for only a narrow group of users, from insufficient representation of sub-populations, or from human labelers favoring a privileged group.
Machine learning discovers and generalizes patterns in the data and can therefore replicate bias. If a group is under-represented, the model has fewer examples to learn from, resulting in reduced accuracy for members of that group.
When these models are deployed at scale, this can produce a large number of biased decisions that harm many people. Make sure you have evaluated these risks and have techniques in place to mitigate them.
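One simple way to act on the representation point above is to measure how many rows each sub-population contributes before training. This is a minimal sketch on hypothetical toy data: the `group` and `label` columns and the 15% threshold are assumptions for illustration, not something taken from the video.

```python
import pandas as pd

# Hypothetical toy data: "group" stands in for a sub-population attribute,
# "label" for a binary target. Swap in your own column names.
df = pd.DataFrame({
    "group": ["A", "A", "A", "A", "A", "A", "B", "B", "C", "A"],
    "label": [1, 0, 1, 1, 0, 1, 0, 1, 0, 1],
})

# Fraction of rows contributed by each group.
group_share = df["group"].value_counts(normalize=True)
print(group_share)

# Groups below an arbitrary 15% cut-off (an assumption; tune per project).
under_represented = group_share[group_share < 0.15].index.tolist()
print("Under-represented groups:", under_represented)

# Positive-label rate per group: a strong skew here can also bias the model.
print(df.groupby("group")["label"].mean())
```

These counts only flag a possible problem; what counts as "too few" examples depends on the task and on how the model will be used.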
============================
Do you want to learn from me?
Check out my affordable mentorship program at: learnwith.campusx.in/s/store
============================
📱 Grow with us:
CampusX's LinkedIn: / campusx-official
CampusX on Instagram for daily tips: / campusx.official
My LinkedIn: / nitish-singh-03412789
Discord: / discord
Instagram: / campusx.official
E-mail us at support@campusx.in
⌚Time Stamps⌚
00:00 - Intro
00:27 - Understanding your data
00:53 - Asking Basic Questions to your Data
03:00 - How big is the data?
03:35 - What does the data look like?
05:02 - What are the data types of the columns?
06:50 - Are there any missing values?
08:39 - What does the data look like mathematically?
10:25 - Are there any duplicate values?
11:27 - What is the correlation between the columns?

Comments • 69

@chemistryman6053 1 year ago ⁺¹¹
Anyone getting a ValueError from df.corr(): try setting the numeric_only=True parameter in df.corr(). It is False by default in newer pandas versions.
@fayazkhan3404 1 year ago
thank you
@learngerman773 6 months ago
Thanks for the update. You saved my time :)
@ishabhagat6020 3 months ago
thank you
@shivangpandey3507 2 months ago
So what should we use?
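A minimal sketch of the fix discussed in this thread, using a small hypothetical Titanic-like frame (the values are illustrative, not the real dataset). On pandas 2.0 and later, `df.corr()` on a frame containing a text column raises `ValueError`; passing `numeric_only=True` makes `corr()` skip the non-numeric columns.

```python
import pandas as pd

# Small Titanic-like frame (hypothetical values) mixing text and numeric columns.
df = pd.DataFrame({
    "Name": ["Braund, Mr. Owen Harris", "Cumings, Mrs. John Bradley",
             "Heikkinen, Miss. Laina", "Futrelle, Mrs. Jacques Heath"],
    "Age": [22.0, 38.0, 26.0, 35.0],
    "Fare": [7.25, 71.28, 7.93, 53.10],
    "Survived": [0, 1, 1, 1],
})

# In pandas >= 2.0, numeric_only defaults to False, so df.corr() tries to
# convert "Name" to float and raises ValueError.
try:
    df.corr()
except ValueError as err:
    print("df.corr() failed:", err)

# Restrict the computation to numeric columns instead.
corr = df.corr(numeric_only=True)
print(corr)
```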
@msgupta07 2 years ago ⁺²⁸
--> Understanding your data (7 questions to ask your data)
1. 2:52 How big is the data? - df.shape
2. 3:31 What does the data look like? - df.head() and df.sample()
3. 5:01 What is the data type of each column? - df.info()
4. 6:47 Are there any missing values? - df.isnull().sum()
5. 8:37 What does the data look like mathematically? - df.describe()
6. 10:24 Are there any duplicate values? - df.duplicated().sum()
7. 11:23 What is the correlation between the columns? - df.corr() (use df.corr(numeric_only=True) on pandas 2.0+ if there are text columns)
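The seven checks in the list above can be run together as a quick first pass. This is a sketch on a tiny hypothetical stand-in for the Titanic data (the last row deliberately duplicates the fourth, and one `Age` is missing); with the real file you would load it with `pd.read_csv` instead. The last call uses `numeric_only=True`, which pandas 2.0+ needs when text columns are present.

```python
import pandas as pd

# Hypothetical toy data standing in for the Titanic CSV used in the video;
# in practice you would load it with pd.read_csv("<your file>.csv").
df = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 4],
    "Survived": [0, 1, 1, 1, 1],
    "Pclass": [3, 1, 3, 1, 1],
    "Name": ["Braund, Mr. Owen Harris", "Cumings, Mrs. John Bradley",
             "Heikkinen, Miss. Laina", "Futrelle, Mrs. Jacques Heath",
             "Futrelle, Mrs. Jacques Heath"],
    "Age": [22.0, 38.0, None, 35.0, 35.0],
    "Fare": [7.25, 71.28, 7.93, 53.10, 53.10],
})

print(df.shape)                      # 1. How big is the data? (rows, columns)
print(df.head())                     # 2. What does it look like? First rows...
print(df.sample(2, random_state=0))  #    ...and a random sample
df.info()                            # 3. Data type and non-null count per column
print(df.isnull().sum())             # 4. Missing values per column
print(df.describe())                 # 5. Summary statistics of numeric columns
print(df.duplicated().sum())         # 6. Number of fully duplicated rows
print(df.corr(numeric_only=True))    # 7. Pairwise correlation of numeric columns
```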
@MrZeekkhan 1 year ago ⁺²
Thanks Gupta :)
@Naeem2460 1 year ago ⁺⁶
Best series for learning machine learning, for beginners and even for experienced people who want to solidify their skills further. Totally worth the time.
@shekharkausalye 2 years ago
Really important points to consider when starting any project. This video clarified very important aspects and guides you through them very well, in detail.
@niranjanpawar20 2 years ago ⁺⁸
sirrrrrr, i love you sir, this is the bestest tutorial i have found, soooo simple and tooooooo effective
@infology3 7 months ago
Totally nailed it, sir. Amazing series. Your way of teaching is absolutely amazing.
@AbdurRahman-lv9ec 7 months ago
Kudos, Nitish! Your teaching style is phenomenal, and I've learned so much from your Python and Machine Learning tutorials. As I delve deeper into my passion for data engineering, I can't help but wish for a mentor like you in this field. Your guidance would be invaluable. Any plans to explore data engineering topics?
@narendraparmar1631 7 months ago
Thanks Sir ji for this good explanation.
@user-qn7zv2in2r months ago ⁺¹
How can you be so underrated?
How is it even possible?
@dheerajkumar1502 2 months ago ⁺²
This course is better than Coding Ninjas (ML).
@rawatbobby8883 9 months ago
You are a good father of this industry. I really understand everything in one go with your videos. Always spitting the truth and having such calm humanity, that's what makes you a legend. Always fly, I'm with you ❤🥰
@BiswajitDas-lk7pp months ago ⁺¹
Amazing Explanation
@arshad1781 3 years ago ⁺²
Thanks
@illusions8101 11 months ago
Thank you ❤️
@zkhan2023 3 years ago ⁺²
Thanks sir
@talkswithRishabh 2 years ago
Really good explanation
@AnilKumar-yk9xq 1 year ago
Great!
@grandson_f_phixis9480 months ago
Thank you very much sir
@youtubekumar8590 1 year ago
Thank you bhaiya, sir
@bishnuprasadsharma7160 1 year ago ⁺¹
You can run describe on all columns, not just the numeric ones, with data.describe(include="all")
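To illustrate the tip above, here is a sketch on hypothetical toy data comparing the default `describe()`, which summarizes numeric columns only, with `describe(include="all")`, which also summarizes text columns.

```python
import pandas as pd

# Hypothetical toy data with two text columns and one numeric column.
df = pd.DataFrame({
    "Name": ["Braund, Mr. Owen Harris", "Cumings, Mrs. John Bradley",
             "Heikkinen, Miss. Laina"],
    "Sex": ["male", "female", "female"],
    "Age": [22.0, 38.0, 26.0],
})

# Default: numeric columns only (count, mean, std, min, quartiles, max).
print(df.describe())

# include="all": text columns also get count, unique, top, freq;
# statistics that do not apply to a column are shown as NaN.
summary = df.describe(include="all")
print(summary)
```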
@MuhammadJunaid-yr8jd 1 year ago
best of best video
@niteshbutola8753 2 years ago
awesome
@arman_shekh97 3 years ago ⁺²
Sir, I love every day of this machine learning tutorial. I can't believe this.
@Surya_Kiran_K months ago
Hey hi bro,
You must have graduated by now.
Did this course help you learn ML?
I'm in 2nd year of AI/ML
and trying to complete this series.
Any advice will be highly appreciated 🙏🏽❤
@sandipansarkar9211 1 year ago
finished watching
@yashikagupta5144 1 year ago ⁺¹
Great video.
Sir, one doubt: how do we determine whether our dataset is labelled or not, so that we can choose the algorithm?
@662adnan 1 year ago
When I use the same code, it runs but does not fetch any data;
the result comes out blank.
df.shape shows (0, 7). Why?
@SatyanjaySahoo 8 months ago
In Day 18, how did you define the header? Is there any trick to find it? Thanks, Satyanjay
@priyankbagad2832 1 year ago
Sir, can you please share the OneNote notebook you are using?
@vijaynage3257 2 years ago
👌
@imranroshan8758 7 months ago
Great
@imranroshan8758 7 months ago
This is the first time I have seen this kind of video about understanding what the data actually is and how we can understand it ❤ love this lecture
@dhananjayyeole7076 1 year ago
500th like from Dhananjay Yeole
@rakibnsajib 7 months ago ⁺³
Changed in version 2.0.0: The default value of numeric_only is now False.
df.corr(numeric_only=True)
@ADITYAKUMAR-nq1um 4 months ago
thanks
@Paul-ij6zj 3 months ago
Thank You, brother
@ajitkumarpatel2048 1 year ago
🙏🙏🙏🙏
@jinks3669 2 years ago ⁺³
Sir, one doubt: how do we fetch the exact rows for which duplicated() returned True?
For example, if there are duplicate values in our dataset, how do we find out which ones to drop? The function only returns the count of duplicates, not which rows are duplicated.
@campusx-official 2 years ago ⁺²
drop_duplicates
@jinks3669 2 years ago
@campusx-official Thanks Sir.
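The reply above points to `drop_duplicates`; to see *which* rows are duplicates before dropping them, you can use the boolean mask returned by `duplicated()` to index the frame. A sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data: row 2 is an exact copy of row 0.
df = pd.DataFrame({
    "Name": ["Braund", "Cumings", "Braund", "Heikkinen"],
    "Age": [22, 38, 22, 26],
})

# duplicated() marks the 2nd and later occurrences as True (keep="first").
print(df.duplicated())

# Boolean-index with the mask to see the actual duplicate rows.
print(df[df.duplicated()])

# keep=False marks every copy, so you can inspect all members of each group.
all_copies = df[df.duplicated(keep=False)]
print(all_copies)

# drop_duplicates() keeps the first occurrence and removes the rest.
deduped = df.drop_duplicates()
print(deduped)
```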
@divynashupathak9008 9 months ago
What does "duplicate rows" mean: are all columns the same for those rows, or only some columns?
Can anyone help?
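To answer the question: by default, `duplicated()` flags a row only when every column matches an earlier row; the `subset` parameter restricts the comparison to the columns you list. A sketch on hypothetical toy data, where the two "Braund" rows differ only in `Age`:

```python
import pandas as pd

# Hypothetical toy data: rows 0 and 1 match on Name and Pclass, not on Age.
df = pd.DataFrame({
    "Name": ["Braund", "Braund", "Cumings"],
    "Age": [22, 23, 38],
    "Pclass": [3, 3, 1],
})

# Default: a row counts as a duplicate only if ALL columns match an earlier row.
full_dupes = df.duplicated().sum()

# subset=...: compare only the listed columns.
name_dupes = df.duplicated(subset=["Name", "Pclass"]).sum()

print(full_dupes, name_dupes)  # prints: 0 1
```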
@MRAgundli 3 months ago
Sir, can you provide this dataset?
@krishnakanthmacherla4431 2 years ago
Done
@MRAgundli 3 months ago
done
@devendraverma9147 3 months ago
Getting an error when running the df.corr() function: ValueError: could not convert string to float: 'Braund, Mr. Owen Harris'
Please help
@carliones4293 14 days ago
Just write df.corr(numeric_only=True)
@aayushtiwari8139 14 days ago
Where do I get this dataset from?
@shivamrawat966 10 days ago
From Kaggle
@Dipenparmar12 7 months ago
Hi there,
just to let you know, there is an error in
`
# is there correlation between cols
df.corr()
`
`ValueError: could not convert string to float: 'Braund, Mr. Owen Harris'`
@atharva__soni 4 months ago ⁺¹
Yes, same error for me too.
@atharva__soni 4 months ago ⁺¹
Try setting the numeric_only=True parameter in df.corr(); it is False by default in newer versions.
@ashulohar8948 1 year ago
Do you have an English channel?
@mohdmaaz9037 4 months ago
Sir, df.corr() is showing the error "ValueError: could not convert string to float: 'Braund, Mr. Owen Harris'"
@instrumentalmusic27 4 months ago ⁺¹
Run df.corr(numeric_only=True) and you will get the desired result.
@carliones4293 14 days ago
Just write df.corr(numeric_only=True)
@PyMLHub months ago
Sir, the code for point no. 7, df.corr(), is not working; it gives an error. But if we use this code, it works?
numeric_df = df.select_dtypes(include=['number'])
correlation_matrix = numeric_df.corr()
print(correlation_matrix)
@carliones4293 14 days ago
Just write df.corr(numeric_only=True)
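Both approaches in this thread should give the same matrix: selecting numeric columns with `select_dtypes` first, or letting `corr()` skip non-numeric columns via `numeric_only=True`. A sketch on hypothetical toy data that checks this with `pandas.testing.assert_frame_equal`:

```python
import pandas as pd

# Hypothetical toy data: one text column plus three numeric columns.
df = pd.DataFrame({
    "Name": ["Braund", "Cumings", "Heikkinen", "Futrelle"],
    "Age": [22.0, 38.0, 26.0, 35.0],
    "Fare": [7.25, 71.28, 7.93, 53.10],
    "Survived": [0, 1, 1, 1],
})

# Approach from the comment: keep numeric columns first, then correlate.
via_select = df.select_dtypes(include=["number"]).corr()

# One-step alternative: let corr() drop the non-numeric columns itself.
via_flag = df.corr(numeric_only=True)

# For this frame the two matrices match (raises AssertionError otherwise).
pd.testing.assert_frame_equal(via_select, via_flag)
print(via_flag)
```

The `numeric_only=True` form is shorter; the `select_dtypes` form is handy when you also want to reuse the numeric-only frame for other steps.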
@ribhubose8535 2 years ago
Sir, you're doing great
@Star-xk5jp 6 months ago
day1:date:9/1/2024
@heetbhatt4511 10 months ago
Thank you sir
