Very useful video for me. Thank you for sharing, Marcus.
Excellent tutorial explaining all the steps. Found this very helpful. Thank you!
Thank you!
@@codewithmarcus2151 Is there a command to find the silhouette score or inertia from this? Thank you!
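For anyone else wondering: with scikit-learn, inertia comes straight off the fitted model and the silhouette score from sklearn.metrics. A minimal sketch, assuming X is the TF-IDF matrix from the video (the names are placeholders):

from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

kmeans = KMeans(n_clusters=5, random_state=42)
labels = kmeans.fit_predict(X)          # X: the TF-IDF document-term matrix

print(kmeans.inertia_)                  # within-cluster sum of squared distances
print(silhouette_score(X, labels))      # mean silhouette across all documents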
Marcus, amazing explanation!! Thank you!
You rock, Marcus! The video is really helping.
The first video that really helped me. Thank you!
Hey Marcus! Thank you for this!!! I learned a lot with this video, this will be very useful for my capstone project
Hi Miyel, very happy to hear that! Feel free to explore other videos :)
Thank you Marcus! This was really helpful!
Thanks! I have got a new idea for my project.
Thanks a lot brother, take love
Bro, how do you calculate an evaluation score for the model, say the silhouette score, for example?
Thanks for sharing the source materials!
You are welcome! Feel free to follow my GitHub account github.com/MarcusChong123 for the source code of other tutorials :)
Hi there, I need to classify customer reviews into categories. Can you suggest a method?
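Not the video author, but if the categories are known in advance, this is supervised classification rather than clustering; a common baseline is TF-IDF features plus a linear classifier. A rough sketch, assuming you have lists reviews and categories (placeholder names):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# reviews: raw review strings, categories: their known labels (placeholders)
clf = make_pipeline(TfidfVectorizer(stop_words='english'),
                    LogisticRegression(max_iter=1000))
clf.fit(reviews, categories)
print(clf.predict(['the delivery was late and the item arrived broken']))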
Thanks man, saved my day
Thank you for the video! Amazing content, I learned a lot!
Didn't you have to clean the "overview" column a bit more before vectorizing it? Like making it all lower case, etc.?
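If the video is using scikit-learn's TfidfVectorizer, lowercasing already happens by default (lowercase=True), and its tokenizer drops most punctuation. A sketch making that explicit, assuming the dataset's 'overview' column:

from sklearn.feature_extraction.text import TfidfVectorizer

# lowercase=True is the default; written out here only for clarity
vectorizer = TfidfVectorizer(lowercase=True, stop_words='english')
X = vectorizer.fit_transform(df['overview'].fillna(''))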
Hi sir, how can I write code that removes duplicate images using clustering in Google Colab, after unzipping the dataset zip file?
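Not the author, but one rough approach: turn each image into a small fixed-size vector, cluster the vectors, and keep one file per cluster, so near-duplicates collapse together. A sketch with assumed paths, not from the video:

import glob
import numpy as np
from PIL import Image
from sklearn.cluster import KMeans

paths = glob.glob('dataset/*.jpg')      # assumed location after unzipping

# downscale to 32x32 grayscale and flatten, so each image is a 1024-dim vector
vecs = np.array([np.asarray(Image.open(p).convert('L').resize((32, 32))).ravel()
                 for p in paths])

labels = KMeans(n_clusters=50, random_state=42).fit_predict(vecs)  # tune n_clusters
keep = {lab: p for lab, p in zip(labels, paths)}  # one representative per cluster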
Nice video and great explanation. What method is used to arrive at the number of clusters? If it is the elbow method, how do we arrive at the number of clusters for text data? Thank you.
How do you decide how many clusters will be a good fit for the data? Can you do an elbow plot or silhouette score for this same dataset and explain?
You should plot an elbow curve to find the optimal number of clusters.
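For anyone asking above: a sketch of what that elbow plot could look like for this text data. The key point is to fit on the vectorized TF-IDF matrix, not on the raw dataframe; the 'overview' column is assumed from the video:

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

X = TfidfVectorizer(stop_words='english').fit_transform(df['overview'].fillna(''))

inertias = []
ks = range(2, 15)
for k in ks:
    km = KMeans(n_clusters=k, random_state=42).fit(X)
    inertias.append(km.inertia_)        # within-cluster sum of squares

plt.plot(ks, inertias, 'bx-')           # look for the 'elbow' in this curve
plt.xlabel('k')
plt.ylabel('inertia')
plt.show()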
Thank you, sir. Sir, how do you visualize this case with a scatter plot? 🙏🙏🙏
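Not Marcus, but a common way to scatter-plot text clusters is to project the TF-IDF matrix down to 2D first, e.g. with TruncatedSVD (it accepts sparse input), then color the points by cluster label. A sketch, assuming X and kmeans from the tutorial:

import matplotlib.pyplot as plt
from sklearn.decomposition import TruncatedSVD

# reduce the high-dimensional TF-IDF vectors to 2 components for plotting
coords = TruncatedSVD(n_components=2, random_state=42).fit_transform(X)
plt.scatter(coords[:, 0], coords[:, 1], c=kmeans.labels_, cmap='tab10', s=10)
plt.show()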
How can I know which id belongs to which cluster?
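In case it helps: a fitted KMeans keeps one label per input row, in the same order as the dataframe, so you can attach the labels back. A sketch assuming df and kmeans from the video:

# labels_ lines up row-for-row with the matrix passed to fit()
df['cluster'] = kmeans.labels_
print(df[['title', 'cluster']].head())   # assumes a 'title' column
print(df[df['cluster'] == 0].index)      # the ids (index) in cluster 0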
Hi Marcus! Could you explain the following line to me? print(' %s' % terms[j])
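Not Marcus, but that line usually sits in the loop that prints the top-weighted words of each cluster: terms is the vectorizer's vocabulary and j is an index into it, so '%s' just substitutes the word. A sketch of the typical loop it comes from (names assumed from the video; recent scikit-learn uses get_feature_names_out, older versions get_feature_names):

terms = vectorizer.get_feature_names_out()              # the vocabulary words
order_centroids = kmeans.cluster_centers_.argsort()[:, ::-1]  # term indices by weight

for i in range(true_k):                 # true_k: the number of clusters
    print('Cluster %d:' % i)
    for j in order_centroids[i, :10]:   # indices of the 10 heaviest terms
        print(' %s' % terms[j])         # print the word at index j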
Hi Marcus, very well explained; thanks for the video. Can you make something on the analysis of categorical data without a response variable?
Thank you!
Hi, is there any way to then represent those clusters in the form of a DBSCAN diagram?
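Not sure there is a standard "DBSCAN diagram", but you can run DBSCAN on the same TF-IDF vectors and then plot its labels exactly like any other clustering (e.g. with the 2D projection sketched above). A rough sketch; eps=0.7 is only a guess you would have to tune:

from sklearn.cluster import DBSCAN

# cosine distance tends to suit TF-IDF vectors; -1 labels mark noise points
db = DBSCAN(eps=0.7, min_samples=5, metric='cosine').fit(X)
print(set(db.labels_))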
Hi, can you please cover the topic of hierarchical clustering for text documents using Python? I need to use it.
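Until there is a dedicated video, a minimal sketch of hierarchical (agglomerative) clustering on the same TF-IDF features, assuming X from this tutorial. Note that AgglomerativeClustering needs a dense array, and that recent scikit-learn calls the parameter metric (older versions call it affinity):

from sklearn.cluster import AgglomerativeClustering

# the sparse TF-IDF matrix must be densified for AgglomerativeClustering
agg = AgglomerativeClustering(n_clusters=5, metric='cosine', linkage='average')
labels = agg.fit_predict(X.toarray())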
Great video, thanks. I have a quick question, though, as I am getting an error that I cannot work out. In the second code block, with the line df = pd.read_csv("Movies_Dataset.csv"), it shows a ParserError, and further down another line says: ParserError: Error tokenizing data. C error: Expected 1 fields in line 13, saw 3
Have you got any advice on what is going wrong here? Thanks.
Hi, are you using Google Colab for this exercise? Did you use the same dataset as I am using (Movies_Dataset.csv)? If you are using Google Colab, make sure the file has uploaded successfully. If it has, try df = pd.read_csv(filename, header=None, error_bad_lines=False)
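One note on that workaround: error_bad_lines was deprecated in pandas 1.3 and removed in 2.0; on newer pandas the equivalent is on_bad_lines='skip'. So, depending on your version:

# newer pandas (error_bad_lines no longer exists in pandas 2.x)
df = pd.read_csv('Movies_Dataset.csv', on_bad_lines='skip')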
@@codewithmarcus2151 Thanks for the reply Marcus, really appreciated. Yes, I was using Colab, but the issue turned out to be that I had managed to corrupt the CSV 😬 A fresh download of the data got it working. Quick question: could you direct me to where I could learn about using K-means in a semi-supervised model? For instance, if I have a whole batch of phrases that I want to sort into pre-defined clusters. Or, using the movies dataset as an example, sorting them into, say, family, adult, or child-friendly clusters, if you see what I mean. Thanks again for your amazing tutorials.
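Not a full answer, but one simple way to pull K-means toward pre-defined clusters is to seed its initial centroids from a few hand-labelled example phrases per category and let it assign the rest; with enough labelled examples, a plain classifier is usually the better tool. A rough sketch under that assumption (the seed lists are placeholders):

import numpy as np
from sklearn.cluster import KMeans

# a few hand-picked example overviews per target cluster (placeholder lists)
seeds = [family_examples, adult_examples, child_examples]
centroids = np.vstack([np.asarray(vectorizer.transform(group).mean(axis=0))
                       for group in seeds])

# n_init=1 keeps the supplied centroids as the single starting point
km = KMeans(n_clusters=3, init=centroids, n_init=1).fit(X)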
I have a question that I hope someone can answer.
How can I cluster on two fields? Let's say the first is "overview" and the second is "title".
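One straightforward option is to concatenate the two text columns into a single string per row before vectorizing, so both fields contribute to the same vector. A sketch assuming the video's dataframe:

# combine the two text fields into one document per movie
text = df['title'].fillna('') + ' ' + df['overview'].fillna('')
X = vectorizer.fit_transform(text)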
Hi, thanks for the good tutorial. I followed your steps using Jupyter, but got stuck at this line:
f.write(data.to_csv(index_label='id')) # set index to id
f.close()
After running this line, I get the error below:
UnicodeEncodeError Traceback (most recent call last)
in
----> 1 f.write(data.to_csv(index_label='id')) # set index to id
2 f.close()
~\anaconda3\lib\encodings\cp1252.py in encode(self, input, final)
17 class IncrementalEncoder(codecs.IncrementalEncoder):
18 def encode(self, input, final=False):
---> 19 return codecs.charmap_encode(input,self.errors,encoding_table)[0]
20
21 class IncrementalDecoder(codecs.IncrementalDecoder):
UnicodeEncodeError: 'charmap' codec can't encode characters in position 13647-13649: character maps to <undefined>
I skipped that line, but then no dataset is created for each cluster... :(
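That error is Windows opening the output file with the default cp1252 codec; passing an explicit UTF-8 encoding when opening should fix it. A sketch (the filename is assumed):

# utf-8 can represent the non-Latin characters that cp1252 chokes on
with open('cluster_0.csv', 'w', encoding='utf-8') as f:
    f.write(data.to_csv(index_label='id'))   # set index to id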
Top G, thanks 😊
Analysis is not complete without doing a manual review of the clusters at the end. From what you showed, it didn't look too promising.
Hi Marcus, great tutorial, thanks! I have one question: when I try to use the elbow method here to determine the optimal k like you did in the video before, I'm getting: ValueError: could not convert string to float: 'Toy Story'. The problem seems to be in this line: kmeanModel.fit(df)
I would be glad if you could tell me what the code for the elbow method would look like in this specific case :)
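Not Marcus, but that ValueError means KMeans is being fitted on the raw dataframe, which still contains text columns like the titles. The elbow loop should fit on the vectorized matrix instead; a sketch assuming X is the TF-IDF matrix from this video:

distortions = []
for k in range(2, 15):
    kmeanModel = KMeans(n_clusters=k, random_state=42)
    kmeanModel.fit(X)                    # fit on X, not on df
    distortions.append(kmeanModel.inertia_)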