BERTopic for Topic Modeling - Maarten Grootendorst - Talking Language AI Ep#1

Comments • 28

  • @adventurerwannabe
    1 year ago +19

    This guy just casually switched from psychology to this and already has a 100,000 times more in-depth understanding of DS concepts and coding than I do, and I did a whole master's in DS... Conclusion: I'm not very smart, he is very smart.

  • @TowhidIslam
    6 months ago

    Thanks to both of you, Jay and Maarten, for such a generous tutorial. Special gratitude to Maarten for your contribution to the computing community. WOW!

  • @rvian4
    1 year ago

    I just used BERTopic in my final project. Incredible framework, very versatile, and the default algorithms worked very well.

  • @oostopitre
    2 years ago +4

    Such a thoughtful speaker.

  • @WouterSuren
    2 years ago +4

    Amazing package, I have used it for email topic clustering.

  • @marcelosilvadasilva4547
    1 year ago +2

    Awesome, I used it on NPS data. I believe it could be used on medical records in any area in the future.

  • @connor-shorten
    2 years ago +4

    Really enjoyed this!

  • @MadMads-hp8ug
    1 year ago

    Great talk! Thank you for sharing your knowledge and work with us!

  • @deepakwalia9878
    2 years ago +3

    Great session ✋

  • @tariqnahmad
    2 years ago +3

    Fascinating 👍

  • @datacamaraderie3527
    1 year ago

    Dear Maarten, amazing package!

  • @Stopinvadingmyhardware
    1 year ago

    I was just working on doing something like this in Julia. I wasn’t aware that BERT was already there.

  • @ibragimsadikov3194
    2 years ago +3

    Awesome presentation! Could you please share the notebook as well?

  • @tariqnahmad
    2 years ago +3

    Suggestion for next time: classification

  • @BernardoGarciadelRio
    2 years ago +1

    Amazing presentation!

  • @luka7626
    8 months ago

    Great video!

  • @guimaraesalysson
    1 year ago +1

    Amazing presentation. Was the notebook shared?

  • @raziehfadaei4801
    8 months ago

    Does BERTopic need preprocessing such as lemmatization, tokenization, and removing stopwords?

  • @datacamaraderie3527
    1 year ago

    Dear Maarten, how are the topic embeddings calculated (I suppose they come from the document embeddings in Step 1?) for the topic similarity measure in the [visualize_heatmap] function?

  • @fernigasos3320
    1 year ago +3

    Are there techniques to automatically label topics?

  • @ankitrohilla11
    1 year ago +1

    Thanks for this awesome explanation. I am a beginner in the data science field. What's the use of CountVectorizer here?

    • @amnahebrahim3325
      1 year ago +1

      I haven't watched the video fully, but I'm assuming that it's used to convert words into numbers for the model to be able to train on.

    • @ankitrohilla11
      1 year ago +1

      @amnahebrahim3325 I thought TF-IDF was doing that.

    • @fireworker8205
      1 year ago

      Its purpose is to 'tokenize' the learned clusters, that is, to get the bag-of-words representation you can see on the slide at 36:00. That is what you need in order to apply c-TF-IDF and extract words that represent each topic as a whole. So it has nothing to do with preparing the data for training; it is about producing good topic representations of the clusters found by the clustering algorithm of choice.

    • @Mrroy08657
      9 months ago

      @fireworker8205
      Hi bro, I'm going to cluster and analyze a few YouTube videos into various topics. If there is any unstructured data such as emojis or pictures, how should I handle it? Do I need to remove that unstructured data first and then proceed with embedding, or go ahead with all the data?

  • @PolymetricMonogon
    8 months ago

    Where can I learn BERTopic as a mathematical procedure rather than a computational one?

  • @Stopinvadingmyhardware
    1 year ago

    No. I am not CO

  • @alexisdamnit9012
    2 months ago

    k-means is a poor man's analysis. It has little to no statistical reasoning behind its clustering; it works off heuristics 😓
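The reply from @fireworker8205 describes CountVectorizer turning each cluster into a bag of words that c-TF-IDF then weights. Below is a minimal pure-Python sketch of that idea. It assumes the weight of word x in cluster c is the L1-normalized term frequency of x in c multiplied by log(1 + A / f_x), where A is the average number of words per cluster and f_x is the frequency of x across all clusters. This is a simplified illustration, not BERTopic's actual implementation: the regex tokenizer stands in for CountVectorizer, and the names `c_tf_idf` and `top_words` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical stand-in for CountVectorizer: lowercase, keep runs of letters.
    return re.findall(r"[a-z]+", text.lower())


def c_tf_idf(docs, labels):
    """Return {cluster: {word: weight}} using a simplified c-TF-IDF."""
    # 1. Bag-of-words per cluster: all documents in a cluster are
    #    treated as one big "class document".
    counts = {}
    for doc, label in zip(docs, labels):
        counts.setdefault(label, Counter()).update(tokenize(doc))

    # 2. f_x: how often each word occurs across all clusters.
    total = Counter()
    for cluster_counts in counts.values():
        total.update(cluster_counts)

    # 3. A: average number of words per cluster.
    avg_words = sum(sum(c.values()) for c in counts.values()) / len(counts)

    weights = {}
    for label, cluster_counts in counts.items():
        n_words = sum(cluster_counts.values())
        weights[label] = {
            # L1-normalized term frequency times log(1 + A / f_x)
            word: (tf / n_words) * math.log(1 + avg_words / total[word])
            for word, tf in cluster_counts.items()
        }
    return weights


def top_words(weights, label, k=3):
    # Rank by weight (descending), breaking ties alphabetically.
    ranked = sorted(weights[label].items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]


if __name__ == "__main__":
    docs = [
        "the cat sat on the mat",
        "a cat and a kitten",
        "stocks fell on the market",
        "the market rallied and stocks rose",
    ]
    labels = [0, 0, 1, 1]  # cluster assignments, e.g. from a clustering step
    w = c_tf_idf(docs, labels)
    print(top_words(w, 0))
    print(top_words(w, 1))
```

Words spread across both clusters (such as "on") get a lower weight than words concentrated in one cluster (such as "cat"), which is why this weighting surfaces topic-specific terms. In a tiny example like this, stopwords such as "a" and "the" can still rank high; filtering them at the vectorizer step is one option.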