ML with Python | Text Clustering | K-Means (Movies)

  • Published on 17 Dec 2024

Comments • 39

  • @jannatulfardous5802
    @jannatulfardous5802 2 years ago +2

    Very useful video for me. Thank you for sharing, Marcus.

  • @shahedmahbub9013
    @shahedmahbub9013 3 years ago +5

    Excellent tutorial explaining all the steps. Found this very helpful. Thank you!

    • @codewithmarcus2151
      @codewithmarcus2151  3 years ago +1

      Thank you!

    • @shahedmahbub9013
      @shahedmahbub9013 3 years ago

      @@codewithmarcus2151 Is there a command to find the Silhouette score or inertia from this? Thank you!
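
      For anyone with the same question, a minimal sketch of how those two metrics can be read off a fitted model with scikit-learn, assuming km is the fitted KMeans object and X is the TF-IDF matrix (variable names are illustrative, not taken from the video):

      from sklearn.metrics import silhouette_score

      # Inertia (within-cluster sum of squared distances) is stored on the fitted model
      print("Inertia:", km.inertia_)

      # Silhouette score measures cohesion vs. separation, ranging from -1 to 1
      print("Silhouette:", silhouette_score(X, km.labels_))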

  • @bhavyagoradia4203
    @bhavyagoradia4203 2 years ago +2

    Marcus, amazing explanation!! Thank you!

  • @pengchaocai2848
    @pengchaocai2848 3 years ago +2

    You rock, Marcus! The video is really helpful.

  • @TheSassy023
    @TheSassy023 3 years ago +1

    The first video that really helped me. Thank you!

  • @miyeeeeeel
    @miyeeeeeel 3 years ago +4

    Hey Marcus! Thank you for this!!! I learned a lot from this video, and it will be very useful for my capstone project.

    • @codewithmarcus2151
      @codewithmarcus2151  3 years ago +1

      Hi Miyel, very happy to hear that! Feel free to explore other videos :)

  • @ranjinimukhejee2786
    @ranjinimukhejee2786 3 years ago +2

    Thank you Marcus! This was really helpful!

  • @lifetube1117
    @lifetube1117 2 years ago +1

    Thanks! I've got a new idea for my project.

  • @mujammalahmed1524
    @mujammalahmed1524 3 years ago +1

    Thanks a lot brother, take love

  • @Jxxxxxxxxxxxxxxxxxxx
    @Jxxxxxxxxxxxxxxxxxxx 1 year ago +1

    Bro, how do you calculate the accuracy score of the model, say the silhouette score, for example?

  • @bentraje
    @bentraje 4 years ago +1

    Thanks for sharing the source materials!

    • @codewithmarcus2151
      @codewithmarcus2151  4 years ago

      You are welcome! Feel free to follow my GitHub account github.com/MarcusChong123 for the source code of other tutorials :)

  • @Saiju.
    @Saiju. 1 year ago +1

    Hi there, I need to classify customer reviews into categories. Can you suggest a method?

  • @silentscream2808
    @silentscream2808 3 years ago +1

    Thanks man, saved my day

  • @Qweasdzxc912
    @Qweasdzxc912 2 years ago

    Thank you for the video! Amazing content, I learned a lot!

  • @mirroring_2035
    @mirroring_2035 3 years ago +1

    Didn't you have to clean the "overview" column a bit more before vectorizing it? Like making it all lowercase, etc.?
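
    For reference, scikit-learn's TfidfVectorizer lowercases the text by default, and light cleaning such as stop-word removal can be requested through its parameters. A small sketch under that assumption (parameter choices are illustrative):

    from sklearn.feature_extraction.text import TfidfVectorizer

    # lowercase=True is the default; stop_words="english" drops common English words
    vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
    X = vectorizer.fit_transform(df["overview"].fillna(""))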

  • @bjmaudioservices6134
    @bjmaudioservices6134 1 year ago

    Hi sir, how can I write code that removes duplicate images using clustering in Google Colab after unzipping the dataset zip file?

  • @saimanohar3363
    @saimanohar3363 2 years ago

    Nice video and great explanation. What method is used to arrive at the number of clusters? If it is the elbow method, how do we arrive at the number of clusters for text data? Thank you.

  • @mahiraj8522
    @mahiraj8522 2 years ago

    How do you decide how many clusters are a good fit for the data? Can you do an elbow plot or silhouette score for this same dataset and explain?
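
    One common way to compare candidate values of k is to score each clustering, for example with the silhouette score. A rough sketch, assuming X is the TF-IDF matrix built in the tutorial (names illustrative):

    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_score

    # Try several cluster counts and report the silhouette score for each (higher is better)
    for k in range(2, 11):
        km = KMeans(n_clusters=k, random_state=42, n_init=10)
        labels = km.fit_predict(X)
        print(k, silhouette_score(X, labels))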

  • @chandandacchufan3242
    @chandandacchufan3242 1 year ago

    You should plot an elbow curve to find the optimal number of clusters.

  • @josuahutagalung6961
    @josuahutagalung6961 3 years ago

    Thank you, sir. How do you visualize this case with a scatter plot? 🙏🙏🙏
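
    TF-IDF vectors are high-dimensional, so a scatter plot usually requires projecting them down to 2-D first, for example with TruncatedSVD. A sketch assuming X is the TF-IDF matrix and km the fitted KMeans model (names illustrative):

    import matplotlib.pyplot as plt
    from sklearn.decomposition import TruncatedSVD

    # Project the sparse TF-IDF matrix onto 2 components for plotting
    coords = TruncatedSVD(n_components=2, random_state=42).fit_transform(X)

    # Color each document by its cluster label
    plt.scatter(coords[:, 0], coords[:, 1], c=km.labels_, cmap="tab10", s=10)
    plt.title("K-Means clusters (2-D SVD projection)")
    plt.show()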

  • @azeemsiddiqui3853
    @azeemsiddiqui3853 1 year ago

    How can I know which id belongs to which cluster?
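
    One simple way is to attach the cluster labels back to the dataframe. A sketch assuming df and the fitted km from the tutorial (the id and title column names are assumptions about the dataset):

    # Add each movie's cluster assignment as a new column
    df["cluster"] = km.labels_
    print(df[["id", "title", "cluster"]].head())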

  • @hugoalbert4695
    @hugoalbert4695 2 years ago

    Hi Marcus! Could you explain the following line to me? print(' %s' % terms[j])
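
    That line uses old-style %-formatting: ' %s' % terms[j] substitutes the j-th vocabulary word into the string, so the loop prints one top term per line for each cluster. In context it typically looks like the sketch below, reconstructed from the usual scikit-learn pattern rather than copied from the video (km and vectorizer are the fitted KMeans model and TfidfVectorizer):

    # Rank vocabulary terms by their weight in each cluster centroid
    order_centroids = km.cluster_centers_.argsort()[:, ::-1]
    terms = vectorizer.get_feature_names_out()  # get_feature_names() in older scikit-learn

    for i in range(km.n_clusters):
        print('Cluster %d:' % i)
        for j in order_centroids[i, :10]:
            print(' %s' % terms[j])  # print the j-th vocabulary term, indented by one space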

  • @Bhaveshwari21
    @Bhaveshwari21 3 years ago

    Hi Marcus, very well explained, thanks for the video. Can you make something on analysis of categorical data without a response variable?

  • @sandyAshraf
    @sandyAshraf 3 years ago +1

    Thank you!

  • @ayeshaakhtar1482
    @ayeshaakhtar1482 3 years ago

    Hi, is there any way to then represent those clusters in the form of a DBSCAN diagram?

  • @zahrasiraj106
    @zahrasiraj106 3 years ago

    Hi, can you please cover the topic of hierarchical clustering for text documents using Python? I need to use it.
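
    Hierarchical (agglomerative) clustering can be run on the same TF-IDF vectors. A small sketch with scikit-learn and SciPy, assuming X is the TF-IDF matrix (n_clusters and the dendrogram are illustrative):

    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import dendrogram, linkage
    from sklearn.cluster import AgglomerativeClustering

    dense = X.toarray()  # these APIs want a dense array; fine for a few thousand documents

    # Flat clustering into 5 groups using Ward linkage
    labels = AgglomerativeClustering(n_clusters=5, linkage="ward").fit_predict(dense)

    # Dendrogram of the full merge hierarchy
    dendrogram(linkage(dense, method="ward"))
    plt.show()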

  • @richv7170
    @richv7170 3 years ago

    Great video, thanks. I have a quick question though, as I am getting an error that I cannot work out. In the second code block, the line df = pd.read_csv("Movies_Dataset.csv") raises a ParserError, and further down another line says: ParserError: Error tokenizing data. C error: Expected 1 fields in line 13, saw 3
    Have you got any advice on what is going wrong here? Thanks

    • @codewithmarcus2151
      @codewithmarcus2151  3 years ago +1

      Hi, are you using Google Colab for this exercise? Did you use the same dataset as I am using (Movies_Dataset.csv)? If you are using Google Colab, make sure you have the file uploaded successfully. If that's true, try df = pd.read_csv(filename,header=None,error_bad_lines=False)
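
      Note that error_bad_lines was deprecated and later removed in newer pandas releases; the current equivalent is on_bad_lines. A sketch of the same idea for recent pandas versions:

      import pandas as pd

      # pandas >= 1.3: on_bad_lines="skip" replaces error_bad_lines=False
      df = pd.read_csv("Movies_Dataset.csv", on_bad_lines="skip")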

    • @richv7170
      @richv7170 3 years ago

      @@codewithmarcus2151 Thanks for the reply Marcus, really appreciated. Yes, I was using Colab, but the issue turned out to be that I had managed to corrupt the CSV 😬 I did a fresh download of the data and got it to work. Quick question: could you direct me to where I could learn about using K-means in a semi-supervised model? For instance, if I have a whole batch of phrases that I want to sort into pre-defined clusters. Or, using the movies dataset as an example, sort them into, say, family, adult, or child-friendly clusters, if you see what I mean. Thanks again for your amazing tutorials.

  • @mohamadjumaa2042
    @mohamadjumaa2042 2 years ago

    I have a question that I hope someone can answer.
    How can I cluster on two fields, let's say the first is "overview" and the second is "title"?
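
    A common approach is simply to concatenate the two text fields before vectorizing. A sketch using the column names from the question (vectorizer is assumed to be the TfidfVectorizer from the tutorial):

    # Combine title and overview into one text field, then vectorize as usual
    text = df["title"].fillna("") + " " + df["overview"].fillna("")
    X = vectorizer.fit_transform(text)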

  • @zakiyahzainon5958
    @zakiyahzainon5958 3 years ago

    Hi, thanks for the good tutorial. I followed your steps using Jupyter, but got stuck at these lines:
    f.write(data.to_csv(index_label='id')) # set index to id
    f.close()
    After running them, I get the error below:
    UnicodeEncodeError                 Traceback (most recent call last)
    ----> 1 f.write(data.to_csv(index_label='id')) # set index to id
          2 f.close()
    ~\anaconda3\lib\encodings\cp1252.py in encode(self, input, final)
         17 class IncrementalEncoder(codecs.IncrementalEncoder):
         18     def encode(self, input, final=False):
    ---> 19         return codecs.charmap_encode(input,self.errors,encoding_table)[0]
         20
         21 class IncrementalDecoder(codecs.IncrementalDecoder):
    UnicodeEncodeError: 'charmap' codec can't encode characters in position 13647-13649: character maps to <undefined>
    If I skip that line, no dataset is created for each cluster... :(
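
    That error comes from Windows opening the output file with the cp1252 codec; forcing UTF-8 avoids it. A sketch of two equivalent fixes (the file name is illustrative):

    # Option 1: open the file with an explicit UTF-8 encoding
    with open("clusters.csv", "w", encoding="utf-8") as f:
        f.write(data.to_csv(index_label="id"))

    # Option 2: let pandas write the file itself
    data.to_csv("clusters.csv", index_label="id", encoding="utf-8")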

  • @shaikhkashif9973
    @shaikhkashif9973 1 year ago

    Top G, thanks 😊

  • @kent4239
    @kent4239 1 year ago

    Analysis is not complete without doing a manual review of the clusters at the end. From what you showed, it didn't look too promising.

  • @leiderneinleidergarnicht
    @leiderneinleidergarnicht 3 years ago +1

    Hi Marcus, great tutorial, thanks! I have one question: when I try to use the elbow method here to determine the optimal k like you did in the video before, I'm getting: ValueError: could not convert string to float: 'Toy Story'. The problem seems to be in this line: kmeanModel.fit(df)
    I would be glad if you could tell me what the code for the elbow method would look like in this specific case :)
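
    The ValueError comes from fitting on the raw dataframe, which still contains text columns such as the title; the elbow loop should fit on the numeric TF-IDF matrix instead. A sketch assuming X is the TF-IDF matrix from the tutorial (names illustrative):

    import matplotlib.pyplot as plt
    from sklearn.cluster import KMeans

    inertias = []
    ks = range(1, 11)
    for k in ks:
        kmeanModel = KMeans(n_clusters=k, random_state=42, n_init=10)
        kmeanModel.fit(X)  # fit on the vectorized text, not on df
        inertias.append(kmeanModel.inertia_)

    # The "elbow" in this curve suggests a reasonable number of clusters
    plt.plot(ks, inertias, "bx-")
    plt.xlabel("k")
    plt.ylabel("Inertia")
    plt.show()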