Very useful video for me. Thank you for sharing, Marcus.
Excellent tutorial explaining all the steps. Found this very helpful. Thank you!
Thank you!
@@codewithmarcus2151 Is there a command to find the silhouette score or inertia from this? Thank you!
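For anyone else wondering: with scikit-learn, inertia comes straight off the fitted model and the silhouette score from sklearn.metrics. A minimal sketch, assuming X is the TF-IDF matrix from the video (the names are placeholders):

from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

kmeans = KMeans(n_clusters=5, random_state=42)
labels = kmeans.fit_predict(X)          # X: the TF-IDF document-term matrix

print(kmeans.inertia_)                  # within-cluster sum of squared distances
print(silhouette_score(X, labels))      # mean silhouette across all documents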
Marcus, amazing explanation!! Thank you!
You rock, Marcus! The video is really helping.
The first video that really helped me. Thank you!
Hey Marcus! Thank you for this!!! I learned a lot with this video, this will be very useful for my capstone project
Hi Miyel, very happy to hear that! Feel free to explore other videos :)
Thank you Marcus! This was really helpful!
Thanks! I have got a new idea for my project.
Thanks a lot brother, take love
Bro, how do you calculate an evaluation score for the model, say the silhouette score, for example?
Thanks for sharing the source materials!
You are welcome! Feel free to follow my GitHub account github.com/MarcusChong123 for the source code of other tutorials :)
Hi there, I need to classify customer reviews into categories. Can you suggest a method?
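Not the video author, but if the categories are known in advance, this is supervised classification rather than clustering; a common baseline is TF-IDF features plus a linear classifier. A rough sketch, assuming you have lists reviews and categories (placeholder names):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# reviews: raw review strings, categories: their known labels (placeholders)
clf = make_pipeline(TfidfVectorizer(stop_words='english'),
                    LogisticRegression(max_iter=1000))
clf.fit(reviews, categories)
print(clf.predict(['the delivery was late and the item arrived broken']))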
Thanks man, saved my day
Thank you for the video! Amazing content, I learned a lot!
Didn't you have to clean the "overview" column a bit more before vectorizing it? Like making it all lower case, etc.?
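If the video is using scikit-learn's TfidfVectorizer, lowercasing already happens by default (lowercase=True), and its tokenizer drops most punctuation. A sketch making that explicit, assuming the dataset's 'overview' column:

from sklearn.feature_extraction.text import TfidfVectorizer

# lowercase=True is the default; written out here only for clarity
vectorizer = TfidfVectorizer(lowercase=True, stop_words='english')
X = vectorizer.fit_transform(df['overview'].fillna(''))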
Hi sir, how can I write code that removes duplicate images using clustering in Google Colab, after unzipping the dataset zip file?
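Not the author, but one rough approach: turn each image into a small fixed-size vector, cluster the vectors, and keep one file per cluster, so near-duplicates collapse together. A sketch with assumed paths, not from the video:

import glob
import numpy as np
from PIL import Image
from sklearn.cluster import KMeans

paths = glob.glob('dataset/*.jpg')      # assumed location after unzipping

# downscale to 32x32 grayscale and flatten, so each image is a 1024-dim vector
vecs = np.array([np.asarray(Image.open(p).convert('L').resize((32, 32))).ravel()
                 for p in paths])

labels = KMeans(n_clusters=50, random_state=42).fit_predict(vecs)  # tune n_clusters
keep = {lab: p for lab, p in zip(labels, paths)}  # one representative per cluster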
Nice video and great explanation. What method is used to arrive at the number of clusters? If it is the elbow method, how do we arrive at the number of clusters for text data? Thank you.
How do you decide how many clusters will be a good fit for the data? Can you do an elbow plot or silhouette score for this same dataset and explain?
You should plot an elbow curve to find the optimal number of clusters.
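For anyone asking above: a sketch of what that elbow plot could look like for this text data. The key point is to fit on the vectorized TF-IDF matrix, not on the raw dataframe; the 'overview' column is assumed from the video:

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

X = TfidfVectorizer(stop_words='english').fit_transform(df['overview'].fillna(''))

inertias = []
ks = range(2, 15)
for k in ks:
    km = KMeans(n_clusters=k, random_state=42).fit(X)
    inertias.append(km.inertia_)        # within-cluster sum of squares

plt.plot(ks, inertias, 'bx-')           # look for the 'elbow' in this curve
plt.xlabel('k')
plt.ylabel('inertia')
plt.show()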
Thank you, sir. Sir, how do you visualize this case with a scatter plot? 🙏🙏🙏
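Not Marcus, but a common way to scatter-plot text clusters is to project the TF-IDF matrix down to 2D first, e.g. with TruncatedSVD (it accepts sparse input), then color the points by cluster label. A sketch, assuming X and kmeans from the tutorial:

import matplotlib.pyplot as plt
from sklearn.decomposition import TruncatedSVD

# reduce the high-dimensional TF-IDF vectors to 2 components for plotting
coords = TruncatedSVD(n_components=2, random_state=42).fit_transform(X)
plt.scatter(coords[:, 0], coords[:, 1], c=kmeans.labels_, cmap='tab10', s=10)
plt.show()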
How can I know which id belongs to which cluster?
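In case it helps: a fitted KMeans keeps one label per input row, in the same order as the dataframe, so you can attach the labels back. A sketch assuming df and kmeans from the video:

# labels_ lines up row-for-row with the matrix passed to fit()
df['cluster'] = kmeans.labels_
print(df[['title', 'cluster']].head())   # assumes a 'title' column
print(df[df['cluster'] == 0].index)      # the ids (index) in cluster 0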
Hi Marcus! Could you explain the following line to me? print(' %s' % terms[j])
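Not Marcus, but that line usually sits in the loop that prints the top-weighted words of each cluster: terms is the vectorizer's vocabulary and j is an index into it, so '%s' just substitutes the word. A sketch of the typical loop it comes from (names assumed from the video; recent scikit-learn uses get_feature_names_out, older versions get_feature_names):

terms = vectorizer.get_feature_names_out()              # the vocabulary words
order_centroids = kmeans.cluster_centers_.argsort()[:, ::-1]  # term indices by weight

for i in range(true_k):                 # true_k: the number of clusters
    print('Cluster %d:' % i)
    for j in order_centroids[i, :10]:   # indices of the 10 heaviest terms
        print(' %s' % terms[j])         # print the word at index j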
Hi Marcus, very well explained; thanks for the video. Can you make something on the analysis of categorical data without a response variable?
Thank you!
Hi, is there any way to then represent those clusters in the form of a DBSCAN diagram?
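Not sure there is a standard "DBSCAN diagram", but you can run DBSCAN on the same TF-IDF vectors and then plot its labels exactly like any other clustering (e.g. with the 2D projection sketched above). A rough sketch; eps=0.7 is only a guess you would have to tune:

from sklearn.cluster import DBSCAN

# cosine distance tends to suit TF-IDF vectors; -1 labels mark noise points
db = DBSCAN(eps=0.7, min_samples=5, metric='cosine').fit(X)
print(set(db.labels_))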
Hi, can you please cover the topic of hierarchical clustering for text documents using Python? I need to use it.
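Until there is a dedicated video, a minimal sketch of hierarchical (agglomerative) clustering on the same TF-IDF features, assuming X from this tutorial. Note that AgglomerativeClustering needs a dense array, and that recent scikit-learn calls the parameter metric (older versions call it affinity):

from sklearn.cluster import AgglomerativeClustering

# the sparse TF-IDF matrix must be densified for AgglomerativeClustering
agg = AgglomerativeClustering(n_clusters=5, metric='cosine', linkage='average')
labels = agg.fit_predict(X.toarray())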
Great video, thanks. I have a quick question, though, as I am getting an error that I cannot work out. In the second code block, with the line df = pd.read_csv("Movies_Dataset.csv"), it shows a ParserError, and further down another line says: ParserError: Error tokenizing data. C error: Expected 1 fields in line 13, saw 3
Have you got any advice on what is going wrong here? Thanks.
Hi, are you using Google Colab for this exercise? Did you use the same dataset as I am using (Movies_Dataset.csv)? If you are using Google Colab, make sure the file has uploaded successfully. If it has, try df = pd.read_csv(filename, header=None, error_bad_lines=False)
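One note on that workaround: error_bad_lines was deprecated in pandas 1.3 and removed in 2.0; on newer pandas the equivalent is on_bad_lines='skip'. So, depending on your version:

# newer pandas (error_bad_lines no longer exists in pandas 2.x)
df = pd.read_csv('Movies_Dataset.csv', on_bad_lines='skip')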
@@codewithmarcus2151 Thanks for the reply Marcus, really appreciated. Yes, I was using Colab, but the issue turned out to be that I had managed to corrupt the CSV 😬 A fresh download of the data got it working. Quick question: could you direct me to where I could learn about using K-means in a semi-supervised model? For instance, if I have a whole batch of phrases that I want to sort into pre-defined clusters. Or, using the movies dataset as an example, sorting them into, say, family, adult, or child-friendly clusters, if you see what I mean. Thanks again for your amazing tutorials.
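Not a full answer, but one simple way to pull K-means toward pre-defined clusters is to seed its initial centroids from a few hand-labelled example phrases per category and let it assign the rest; with enough labelled examples, a plain classifier is usually the better tool. A rough sketch under that assumption (the seed lists are placeholders):

import numpy as np
from sklearn.cluster import KMeans

# a few hand-picked example overviews per target cluster (placeholder lists)
seeds = [family_examples, adult_examples, child_examples]
centroids = np.vstack([np.asarray(vectorizer.transform(group).mean(axis=0))
                       for group in seeds])

# n_init=1 keeps the supplied centroids as the single starting point
km = KMeans(n_clusters=3, init=centroids, n_init=1).fit(X)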
I have a question that I hope someone can answer.
How can I cluster on two fields? Let's say the first is "overview" and the second is "title".
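One straightforward option is to concatenate the two text columns into a single string per row before vectorizing, so both fields contribute to the same vector. A sketch assuming the video's dataframe:

# combine the two text fields into one document per movie
text = df['title'].fillna('') + ' ' + df['overview'].fillna('')
X = vectorizer.fit_transform(text)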
Hi, thanks for the good tutorial. I followed your steps using Jupyter, but got stuck at this line:
f.write(data.to_csv(index_label='id')) # set index to id
f.close()
After running this line, I get the error below:
UnicodeEncodeError Traceback (most recent call last)
in
----> 1 f.write(data.to_csv(index_label='id')) # set index to id
2 f.close()
~\anaconda3\lib\encodings\cp1252.py in encode(self, input, final)
17 class IncrementalEncoder(codecs.IncrementalEncoder):
18 def encode(self, input, final=False):
---> 19 return codecs.charmap_encode(input,self.errors,encoding_table)[0]
20
21 class IncrementalDecoder(codecs.IncrementalDecoder):
UnicodeEncodeError: 'charmap' codec can't encode characters in position 13647-13649: character maps to <undefined>
I skipped that line, but then no dataset is created for each cluster... :(
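That error is Windows opening the output file with the default cp1252 codec; passing an explicit UTF-8 encoding when opening should fix it. A sketch (the filename is assumed):

# utf-8 can represent the non-Latin characters that cp1252 chokes on
with open('cluster_0.csv', 'w', encoding='utf-8') as f:
    f.write(data.to_csv(index_label='id'))   # set index to id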
Top G, thanks 😊
Analysis is not complete without doing a manual review of the clusters at the end. From what you showed, it didn't look too promising.
Hi Marcus, great tutorial, thanks! I have one question: when I try to use the elbow method here to determine the optimal k like you did in the video before, I'm getting: ValueError: could not convert string to float: 'Toy Story'. The problem seems to be in this line: kmeanModel.fit(df)
I would be glad if you could tell me what the code for the elbow method would look like in this specific case :)
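Not Marcus, but that ValueError means KMeans is being fitted on the raw dataframe, which still contains text columns like the titles. The elbow loop should fit on the vectorized matrix instead; a sketch assuming X is the TF-IDF matrix from this video:

distortions = []
for k in range(2, 15):
    kmeanModel = KMeans(n_clusters=k, random_state=42)
    kmeanModel.fit(X)                    # fit on X, not on df
    distortions.append(kmeanModel.inertia_)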