Understanding Your Data | Day 19 | 100 Days of Machine Learning
- Published on 23 Jul 2024
- Quality data is fundamental to any data science engagement. To gain actionable insights, the appropriate data must be sourced and cleansed. Understanding Your Data is the foundational step in any data analysis, involving exploring data characteristics, patterns, and relationships to gain insights.
It is important at the beginning of a project to consider potential harms from your tool. These harms can be caused by designing for only a narrow group of users, having insufficient representation of sub-populations, or human labelers favoring a privileged group.
Machine learning discovers and generalizes patterns in the data and can therefore replicate any bias the data contains. If a group is under-represented, the model has fewer examples to learn from, which reduces accuracy for individuals in that group.
Deployed at scale, such a model can make a large number of biased decisions and harm a large number of people. Ensure you have evaluated the risks and have techniques in place to mitigate them.
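One quick way to look for under-representation is to check each group's share of the rows. The sketch below is only an illustration: the toy data, the column name `group`, and the 25% threshold are all assumptions, not something taken from the video.

```python
import pandas as pd

# Hypothetical toy data; the column name "group" is an assumption for illustration.
df = pd.DataFrame({
    "group": ["A", "A", "A", "A", "A", "A", "A", "A", "B", "B"],
    "label": [1, 0, 1, 1, 0, 1, 0, 1, 0, 1],
})

# Share of rows per group: a small share means fewer examples to learn from.
group_share = df["group"].value_counts(normalize=True)
print(group_share)

# Flag groups below an arbitrary, illustrative threshold of 25% of the rows.
under_represented = group_share[group_share < 0.25].index.tolist()
print(under_represented)
```

A low share does not prove the model will be biased, but it is a cheap signal that per-group evaluation is worth doing.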
============================
Do you want to learn from me?
Check my affordable mentorship program at : learnwith.campusx.in/s/store
============================
📱 Grow with us:
CampusX's LinkedIn: / campusx-official
CampusX on Instagram for daily tips: / campusx.official
My LinkedIn: / nitish-singh-03412789
Discord: / discord
Instagram: / campusx.official
E-mail us at support@campusx.in
⌚Time Stamps⌚
00:00 - Intro
00:27 - Understanding your data
00:53 - Asking Basic Questions to your Data
03:00 - How big is the data?
03:35 - What does the data look like?
05:02 - What is the data type of columns?
06:50 - Are there any missing values?
08:39 - How does the data look mathematically?
10:25 - Are there any duplicate values?
11:27 - How is the Correlation between the columns?
Anyone getting a ValueError from df.corr(): try passing numeric_only=True to df.corr(). The parameter defaults to False in newer pandas versions.
thank you
Thanks for the update. You saved my time:)
thank you
So what should we use
--> Understanding your data (7 questions to ask your data)
1. 2:52 How big is the data? - df.shape
2. 3:31 What does the data look like? - df.head() and df.sample()
3. 5:01 What is the data type of each column? - df.info()
4. 6:47 Are there any missing values? - df.isnull().sum()
5. 8:37 How does the data look mathematically? - df.describe()
6. 10:24 Are there duplicate values? - df.duplicated().sum()
7. 11:23 How is the correlation between the columns? - df.corr()
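The seven checks above can be sketched as one short script. This is a minimal example on a made-up frame (the columns `name`, `age`, and `fare` are assumptions, not the dataset from the video), and `numeric_only=True` is passed to `df.corr()` so that the text column is skipped.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the dataset used in the video.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid", "Bob"],
    "age": [22.0, 35.0, np.nan, 35.0],
    "fare": [7.25, 71.28, 8.05, 71.28],
})

print(df.shape)                    # 1. how big is the data? (rows, columns)
print(df.head())                   # 2. first rows; df.sample(n) returns random rows
df.info()                          # 3. dtype and non-null count of each column
print(df.isnull().sum())           # 4. missing values per column
print(df.describe())               # 5. summary statistics for numeric columns
print(df.duplicated().sum())       # 6. number of fully duplicated rows
print(df.corr(numeric_only=True))  # 7. pairwise correlation of numeric columns
```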
Thanks Gupta :)
Best series for learning machine learning, both for beginners and for experienced people who want to solidify their skills further. Totally worth the time.
Really important points to cover when starting any project. This video cleared up key aspects and explains them in good detail.
sirrrrrr, I love you sir, this is the bestest tutorial I have found, soooo simple and tooooooo effective
Totally nailed it sir. Amazing series. Your way of teaching is absolutely amazing.
Kudos, Nitish ! Your teaching style is phenomenal, and I've learned so much from your Python and Machine Learning tutorials. As I delve deeper into my passion for data engineering, I can't help but wish for a mentor like you in this field. Your guidance would be invaluable. Any plans to explore data engineering topics?
Thanks Sir ji for this good explanation.
How can you be so underrated
How is it even possible
this course is better than coding ninjas (ML)
You are a Good Father Of this Industry, i really understand With One Click With Your Videos, Spitting the truth always and having so Calm humanity thats makes You Legend Alaways Fly im With you ❤🥰
Amazing Explanation
Thanks
Thank you ❤️
Thanks sir
Really good explanation
Great!
Thank you very much sir
Thanku bhaiya,, sir
You can run describe on all columns, not just the numeric ones, with data.describe(include="all")
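A small sketch of the difference, on a made-up frame (the column names are assumptions for illustration): the default call summarizes numeric columns only, while `include="all"` also reports count, unique, top, and freq for non-numeric columns.

```python
import pandas as pd

# Hypothetical toy frame with one text column and one numeric column.
df = pd.DataFrame({
    "sex": ["male", "female", "female", "male"],
    "age": [22, 38, 26, 35],
})

numeric_summary = df.describe()            # numeric columns only
full_summary = df.describe(include="all")  # every column, text ones included

print(numeric_summary)
print(full_summary)
```

Statistics that do not apply to a column (for example the mean of a text column) show up as NaN in the combined table.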
best of best video
awesome
Sir, I love every day of this machine learning tutorial. I can't believe this.
Hey, hi bro
You must have graduated by now
Did this course help you learn ML?
I'm in 2nd year in AI ML
and trying to complete this series
Any advice will be highly appreciated 🙏🏽❤
finished watching
Great video..
Sir, one doubt: how do we determine whether our dataset is labelled, so that we can choose the algorithm?
When I use the same code it runs but does not fetch any data
It comes back blank
df.shape shows (0, 7). Why?
In day 18: how do you define the header? Are there any tricks to find that? Thanks, Satyanjay
Sir, can you please share the OneNote notebook you are using?
👌
Great
First time I have seen this type of video about understanding what actually the data is and how we can understand this ❤ love this lecture
500th like from Dhananjay Yeole
Changed in version 2.0.0: The default value of numeric_only is now False.
df.corr(numeric_only=True)
thanks
Thank You, brother
🙏🙏🙏🙏
Sir, one doubt: how do we fetch the exact rows for which the duplicated function returned True?
If there are duplicate values in our dataset, how do we find out which ones to drop? The function only returns the count of duplicates, not which rows are duplicated.
drop_duplicates
@@campusx-official Thanks Sir .
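To see which rows are duplicated rather than just how many, the boolean result of `duplicated()` can be used as a filter. A minimal sketch on made-up data (column names are assumptions):

```python
import pandas as pd

# Hypothetical toy frame; row 3 repeats row 1.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid", "Bob"],
    "age": [22, 35, 41, 35],
})

# duplicated() marks repeats after the first occurrence by default.
repeats = df[df.duplicated()]

# keep=False marks every copy, including the first occurrence.
all_copies = df[df.duplicated(keep=False)]

# drop_duplicates() removes the repeats and keeps the first occurrence.
deduped = df.drop_duplicates()

print(repeats)
print(all_copies)
print(deduped)
```

Inspecting `all_copies` before calling `drop_duplicates()` makes it easier to decide whether the repeated rows are genuine duplicates or legitimate repeated observations.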
What do duplicate rows mean? Are all columns the same for those rows, or only some columns?
Can anyone help?
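By default, pandas counts a row as a duplicate only when every column matches an earlier row; the `subset` parameter restricts the comparison to chosen columns. A small sketch on made-up data (the column names are assumptions):

```python
import pandas as pd

# Hypothetical toy frame: two rows share a name but differ in age.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Bob"],
    "age": [22, 35, 36],
})

# Default: all columns must match, so nothing is flagged here.
full_row_dupes = int(df.duplicated().sum())

# subset: only the "name" column is compared, so the second "Bob" is flagged.
name_dupes = int(df.duplicated(subset=["name"]).sum())

print(full_row_dupes, name_dupes)
```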
sir can you provide this dataset ?
Done
done
getting error when running df.corr() function. ValueError: could not convert string to float: 'Braund, Mr. Owen Harris'
Please help
just write df.corr(numeric_only=True)
where do i get this dataset from?
From kaggle
Hi there,
just to let you know, there is an error in
`
# is there correlation between cols
df.corr()
`
`ValueError: could not convert string to float: 'Braund, Mr. Owen Harris'`
yes same error for me also
Try passing numeric_only=True to df.corr(). The parameter defaults to False in newer pandas versions.
Do you have an English channel?
Sir in df.corr() error is showing "ValueError: could not convert string to float: 'Braund, Mr. Owen Harris'"
df.corr(numeric_only=True): run this code and you will get the desired result
just write df.corr(numeric_only=True)
Sir, the code for point no. 7, df.corr(), is not working; it gives an error. But it works if we use this code:
numeric_df = df.select_dtypes(include=['number'])
correlation_matrix = numeric_df.corr()
print(correlation_matrix)
just write df.corr(numeric_only=True)
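Both fixes suggested in this thread should give the same matrix, since each restricts the calculation to numeric columns. A minimal sketch on a made-up frame (the values are illustrative, not the real dataset):

```python
import pandas as pd

# Hypothetical toy frame with a text column that cannot be converted to float.
df = pd.DataFrame({
    "name": ["Braund", "Cumings", "Heikkinen", "Futrelle"],
    "age": [22, 38, 26, 35],
    "fare": [7.25, 71.28, 7.92, 53.10],
})

# Option 1: let corr() skip non-numeric columns.
corr_a = df.corr(numeric_only=True)

# Option 2: select the numeric columns first, then correlate.
corr_b = df.select_dtypes(include=["number"]).corr()

print(corr_a)
print(corr_b)
```

Option 1 is shorter; option 2 is handy when the numeric-only frame is needed for other steps as well.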
sir you do great
day1:date:9/1/2024
Thank you sir