This is an exceptional walkthrough, especially how you vividly explain the process of visualizing the data.
Lovely Feedback! Thanks. I am glad you enjoyed it.
Thank you so much for an exceptionally well-explained and clear video, better than what I learnt in my master's degree!
Definitely!!
I followed till 32 min as I am not into ML. I just loved it. Understood univariate, bivariate. Want more videos like this. Love from India. Stay blessed.
Thank you so much! I am glad you finished the video and understood the exploratory data analysis steps. You also stay blessed!
for i in columns:
    plt.figure()
    sns.kdeplot(data=df, x=i, shade=True, hue='Gender')
More useful than hours of class, good job 😍
Better than most of the big channels out there.
Really good explanation and step-by-step project.
Can you do another video like this using other types of clustering, like GMM, with a more detailed analysis and conclusions?
Also, thank you for the time you put into this video, it was super helpful.
I really appreciate this. Sure, I'll do a more detailed analysis on clustering.
Hello, for the kdeplot at 16:35, when I add hue=df['Gender'], it gives this error: The following variable cannot be assigned with wide-form data: `hue`
You can solve this by adding this to the code: x=df['Annual Income (k$)'], and then you put the hue and it works
@@juancamilosanchez4693 Thanks, this helped.
@@juancamilosanchez4693 this worked! Thanks!
@@juancamilosanchez4693 yeah , thanks, this worked
@@juancamilosanchez4693 why does this solve that problem?
Been waiting for this for a long time! Thanks for uploading this great content.
Glad you're enjoying the content!
Python is a great way to do analysis, awesome!!! Thank you for the great clip.
Please, I'm having issues when I run df.corr() at 31:11. It raises a ValueError when I run the code.
Thanks a lot, it's a great effort. Keep going, and share more videos like this 👍🌺🌺🌺
This is gold!!! I'm upset I'm just finding your channel!!!
I am glad that you found the channel. Share it with anyone you think it will help!
Please, does anyone have an idea what happens when I run df.corr() at 31:11? It raises a ValueError when I run the code.
Please also tell us how you set up code autocompletion in Jupyter Notebook.
Amazing video, thanks a lot. I only have one question: at 21:52 you said that from the graph it seems like there are more females than males. How did you know? Is it because of the median?
The value_counts function counts the number of males and females, which gives you the actual numbers.
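As a minimal sketch of that reply, here is value_counts on a made-up `Gender` column (the values are illustrative, not the real dataset):

```python
import pandas as pd

# Made-up sample; the real notebook would use the Gender column of the mall data
df = pd.DataFrame({"Gender": ["Female", "Male", "Female", "Female", "Male"]})

counts = df["Gender"].value_counts()                # absolute number per category
shares = df["Gender"].value_counts(normalize=True)  # same, as a fraction of all rows

print(counts)
print(shares)
```

The counts answer the question directly, whereas a box plot or KDE only shows how each group is distributed, not how many rows it contains.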
For those who get an error with df.corr(): use df.corr(numeric_only=True) instead.
Facing a similar issue. How do I resolve it?
@@rishiraj1192 He literally said it in his comment
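For context, a short sketch of why the fix works, assuming pandas 2.0 or later: since that release `DataFrame.corr` no longer drops non-numeric columns by default, so a text column such as `Gender` makes the call fail. The data below is made up for illustration.

```python
import pandas as pd

# Made-up rows with the same column layout as the tutorial data
df = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Female", "Female"],
    "Age": [19, 21, 20, 23, 31],
    "Annual Income (k$)": [15, 15, 16, 16, 17],
    "Spending Score (1-100)": [39, 81, 6, 77, 40],
})

# numeric_only=True skips the text column instead of trying to convert it
corr = df.corr(numeric_only=True)
print(corr)
```

An alternative with the same effect is to select the numeric columns first, for example `df.select_dtypes("number").corr()`.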
thanks for your great explanation
Please, on the multivariate part, when you have 4 centroids, how do I plot that? Seaborn can only take two axes, x and y.
Plots are two-dimensional, so you can only have two-axis plots. We only view things in 3D, so you could possibly add a third dimension, which equates to 3 axes. However, think about what you want to communicate to your audience. 2 dimensions should be adequate.
@ thank you so much.
Learned a lot! Thank you
I'm glad to hear that. Please share it with anyone you think it helps
Awesome tutorial! I tried to download the dataset but I don’t know where to begin. There’s an option for “raw” and “blame”. I’m new to data analytics so I would appreciate some help. Thank you very much
You can find the data here:
absentdata.com/data-analysis/where-to-find-data/
@@absentdata thank you so much for your quick response! I’m already doing tutorial #1 and I’m hoping to learn as much from your tutorials
What are the copyright permissions for this code? Can it be fully used without restrictions?
It's just code, not a licensed application. You can use it freely.
Worth watching and following along.
I completed the video and did my work alongside the code.
I needed more help on the multivariate analysis of clustering. The last part of the video on it was not well explained.
Any recommendations or videos on that @Absent Data??
Hi!
thank you for this video. I have a question. I want to segment bank customers. But the data is in multiple files like accounts.csv, customer_details.csv, transactions.csv
How to approach this problem when we have data in multiple files to segment the customers?
Thanks
Mohit
You will need to merge them into a single dataset.
@@absentdata Ok. So basically I have to join them using one of the joins, like an inner join, etc.?
But how is it done when there are like 10-20 files? Is there any other way?
@@travelofftradition Append the files that are similar, like all the transactional files, to create a single dataset, and merge it with a single customer details file, which should also be the result of an append.
'list' object has no attribute 'mean'. How do I fix this error?
Amazing video. Kindly make more portfolio project videos.
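The append-then-merge approach described above can be sketched as follows. The frames, column names and the aggregation step are assumptions made for illustration; with real files each frame would come from pd.read_csv, for example over a list of paths.

```python
import pandas as pd

# Hypothetical transaction extracts that share the same columns
tx_jan = pd.DataFrame({"customer_id": [1, 2], "amount": [50.0, 20.0]})
tx_feb = pd.DataFrame({"customer_id": [1, 3], "amount": [30.0, 70.0]})

# Step 1: append (stack) the similar files into one dataset
transactions = pd.concat([tx_jan, tx_feb], ignore_index=True)

# Step 2 (assumed): reduce to one row per customer so the merge stays one-to-one
spend = transactions.groupby("customer_id", as_index=False).agg(
    total_amount=("amount", "sum"),
    n_transactions=("amount", "count"),
)

# Step 3: merge onto the customer details; a left join keeps customers
# that have no transactions (their spend columns become NaN)
customers = pd.DataFrame({"customer_id": [1, 2, 3, 4], "age": [34, 51, 27, 45]})
features = customers.merge(spend, on="customer_id", how="left")
print(features)
```

With 10 to 20 files the same pattern applies: group the files by type, concatenate each group in a loop, then merge the few resulting tables on the shared key.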
Thank you, I will
Hi, this is very helpful. I do have a question though: after df=df.drop('Customer ID'), I forgot to add the hashtag and continued on. From that point on, the Customer ID disappeared. But in your case, the Customer ID values reappear during clustering. How did that happen, and how do I get the Customer ID back?
same question here!
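One way to get the column back, assuming it was dropped from `df` in an earlier cell: once a column is dropped and the result is assigned back to `df`, it is gone from that variable, so the usual options are to re-run the cell that loads the data or to keep an untouched copy. In the video the column presumably stays because the drop line was commented out, as the question itself suggests. The sketch below uses made-up rows and the column name from the question.

```python
import pandas as pd

# Stand-in for the raw data; in the notebook this would be re-read with pd.read_csv
raw = pd.DataFrame({
    "Customer ID": [1, 2, 3],
    "Age": [19, 21, 20],
    "Annual Income (k$)": [15, 15, 16],
})

# Drop the ID on a separate variable so the original stays intact.
# columns= is needed: drop('Customer ID') alone looks for a row label.
features = raw.drop(columns="Customer ID")

# The row order is unchanged, so the ID can be re-attached by index later
restored = features.copy()
restored.insert(0, "Customer ID", raw["Customer ID"])
print(restored)
```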
The end felt a little rushed and underwhelming, but overall very instructive. Good job. =)
Why was the y-axis density, and can we change it to some other parameter?
Yes, you can change the variables on the x and y axes. You can also use PCA techniques to display the data.
Also, for the get_dummies at the end, if you need to force it to produce integers instead of booleans, this worked for me:
dff = pd.get_dummies(df, dtype=int, drop_first=True)
dff.head()
insightful
Glad you found it insightful
Thank you for this awesome tutorial. Learnt a lot.
Thanks for the practice. But I ran into a problem when executing the n_clusters sensitivity analysis at 41:13. Do you know what the problem is?
Me too 😢
What is spending score?
It is the score (out of 100) given to a customer by the mall authorities, based on the money spent and the behavior of the customer.
The following variable cannot be assigned with wide-form data: `hue`. Can someone help me?
sns.kdeplot(data=df, x='Annual Income (k$)', shade=True, hue='Gender')
Hi, great video. I cannot understand why hue is not working on my computer. Could you please help me?
What's your issue?
@@absentdata Hi, thank you so much for the video. I also have a challenge with the hue. I can't seem to get past the ValueError: The following variable cannot be assigned with wide-form data: `hue`, from around 17 min. How do I solve this, please? Thank you
Good job!
Hi, I am new to data. Can anyone answer my question, please? If the correlation showed the strongest correlation with Age (-0.33) and no correlation with Annual Income (0.0099), would it be better to cluster by age?
Low correlation doesn't necessarily mean low similarity. Clustering can still be useful to identify patterns even with low correlation. It depends on the goals of the analysis.
@@absentdata Thanks!
I get an error saying that fit_transform must get 2 arguments.
Try posting your code so we can see what's happening.
Loved it
Hi, is K means reliable at high dimensions?
I would say no. I would do some PCA to reduce some of your dimensions.
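A minimal sketch of that suggestion, assuming scikit-learn is installed: standardize the features, reduce them with PCA, then cluster the reduced data. The random matrix, the choice of 2 components and the 4 clusters are placeholders, not values from the video.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic 10-dimensional data standing in for a wide feature table
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))

# Scale first so no single feature dominates the principal components
X_scaled = StandardScaler().fit_transform(X)

# Project onto the first two principal components
X_2d = PCA(n_components=2).fit_transform(X_scaled)

# Cluster in the reduced space; n_clusters=4 is an arbitrary choice here
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X_2d)
print(X_2d.shape, np.bincount(labels))
```

Because `X_2d` has exactly two columns, the result can also be drawn on an ordinary x/y scatter plot colored by `labels`, which is one way to visualize clusters built from more than two features.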
can i put this project on my resume?
Of course you can!
I would like to learn Data Analytics , can I get your contact to get more information from you?
www.linkedin.com/in/gaelimholland
15:00 practical & useful
Yes loops are your friends. Saves tons of time :)
----> 3 plt.figure()
TypeError: 'module' object is not callable
Please help, it can't execute because of this error.
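One common cause of this TypeError, offered as a guess since the import cell is not shown: `plt` was bound to the top-level package with `import matplotlib as plt`, in which case `plt.figure` is the `matplotlib.figure` module rather than a function. Importing the pyplot submodule instead usually resolves it.

```python
# pyplot is the submodule that provides figure(), plot(), show() and so on.
# `import matplotlib as plt` would bind the top-level package instead.
import matplotlib.pyplot as plt

fig = plt.figure()          # pyplot.figure is a function and returns a Figure
ax = fig.add_subplot(111)   # one set of axes on that figure
ax.plot([1, 2, 3], [2, 4, 8])
plt.close(fig)              # release the figure once done
```

After fixing the import, restart the kernel or re-run the import cell so that `plt` is rebound before the plotting loop runs again.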
thank you
You're welcome
How to download dataset
check the description 😊
@@absentdata I can't see any download option in GitHub
Thnk uuuu
👍🏽👍🏽👍🏽
Hi, Loved your content! If possible please share the source code of this project
I added it in the description
Everything changes in 4 years, every syntax.
zub salute
and you got yourself a subscriber
Welcome to the family! I am happy to earn your subscription.
sns.kdeplot(df['Annual Income (k$)'],shade =True,hue = df['Gender']); here I get ValueError: The following variable cannot be assigned with wide-form data: `hue`. Can someone explain?
sns.kdeplot(x=df['Annual Income (k$)'],shade=True,hue=df['Gender']);
write the code in this way, it will get resolved. I also had the same issue.
Good Luck
@@arindambhunia9862 Thank you for this
df.groupby('Gender')['Age', 'Annual Income (k$)', 'Spending Score (1-100)'] ---> cannot subset columns with a tuple with more than one element. Use a list instead.
Is that your whole code? There is no aggregation function in your groupby. Also, you are selecting more than one column, so pass them as a list. It should be df.groupby('category')[['A','B']].mean()
@@absentdata I have just resolved it ----> df.groupby(['Gender'])[['Age', 'Annual Income (k$)', 'Spending Score (1-100)']].mean() min 30:41
@@slacex Also helps with the income cluster later on : df.groupby(['Income Cluster'])[['Age', 'Annual Income (k$)', 'Spending Score (1-100)']].mean()
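To make the difference concrete, here is a small sketch of the list-based selection, assuming pandas 2.0 or later, where the single-bracket form with several names is rejected with the "Use a list instead" error quoted above. The rows are made up.

```python
import pandas as pd

# Made-up rows with the tutorial's column names
df = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Female"],
    "Age": [19, 21, 20, 23],
    "Annual Income (k$)": [15, 15, 16, 16],
    "Spending Score (1-100)": [39, 81, 6, 77],
})

# Double brackets: the outer pair indexes the groupby, the inner pair is a list.
# A single pair with comma-separated names would be read as a tuple key.
cols = ["Age", "Annual Income (k$)", "Spending Score (1-100)"]
summary = df.groupby("Gender")[cols].mean()
print(summary)
```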
sns.kdeplot(df['Annual Income (k$)'],shade = True,hue= df['Gender']); - ValueError: The following variable cannot be assigned with wide-form data: `hue`
The updated version of sns.kdeplot may require your data to be in long form, so you may need to melt the column like this:
melted_df = df.melt(id_vars='Gender', value_vars=['Annual Income (k$)'])
sns.kdeplot(data=melted_df, x='value', hue='Gender', shade=True)
Thanks for the response, sir. I'm your student @@absentdata