This guy just casually switched from psychology to this and already has a 100,000 times more in-depth understanding of DS concepts and coding than I do, and I did a whole master's in DS... Conclusion: I'm not very smart... he is very smart.
Thanks to both of you, Jay and Maarten, for such a generous tutorial. Special gratitude to Maarten for your contribution to the computing community. WOW!
I just used BERTopic in my final project. Incredible framework, very versatile, and the default algorithms worked very well.
Such a thoughtful speaker.
Amazing package, I've used it for email topic clustering.
Awesome, I used it on NPS responses. I believe it could be used on medical records in any area in the future.
Really enjoyed this!
Great talk! Thank you for sharing your knowledge and work with us!
Great session ✋
Fascinating 👍
Dear Maarten, Amazing package!
I was just working on doing something like this in Julia. I wasn’t aware that BERT was already there.
Awesome presentation! Could you please share the notebook as well?
Suggestion for next time: classification
Amazing presentation!
Great video!
Amazing presentation. Was the notebook shared?
Does BERTopic need preprocessing like lemmatization, tokenization, and removing stopwords?
Dear Maarten, how are the topic embeddings calculated for the Topic Similarity measure in the [visualize_heatmap] function? I suppose they come from the document embeddings in Step 1?
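If the topic embeddings are indeed derived from the Step 1 document embeddings, as the question supposes, one plausible approach is to average the embeddings of the documents assigned to each topic and then compare topics pairwise with cosine similarity. The sketch below assumes exactly that; the function names and the toy 2-D vectors are hypothetical, and what visualize_heatmap actually uses may differ between BERTopic versions.

```python
import math


def topic_embeddings(doc_embeddings, labels):
    """Average the document embeddings assigned to each topic (assumed approach)."""
    sums, counts = {}, {}
    for emb, label in zip(doc_embeddings, labels):
        acc = sums.setdefault(label, [0.0] * len(emb))
        for i, value in enumerate(emb):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def similarity_matrix(topic_vecs):
    """Pairwise cosine similarities between topics: the values a heatmap would color."""
    topics = sorted(topic_vecs)
    return topics, [[cosine(topic_vecs[a], topic_vecs[b]) for b in topics] for a in topics]


# Toy 2-D "document embeddings" (made-up numbers; real ones come from a sentence-embedding model)
doc_embeddings = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.1, 0.9]]
labels = [0, 0, 1, 1]  # topic assigned to each document
topics, sim = similarity_matrix(topic_embeddings(doc_embeddings, labels))
```

The diagonal is always 1 (each topic compared with itself), and the two toy topics point in different directions, so their off-diagonal similarity comes out low.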
Are there techniques to automatically label topics?
Thanks for this awesome explanation. I am a beginner in the data science field. What's the use of CountVectorizer here?
I haven’t watched the video fully but I’m assuming that it’s used to convert words into numbers for the model to be able to train on.
@amnahebrahim3325 I thought TF-IDF was doing that.
Its use is to "tokenize" the learned clusters, that is, to get the bag-of-words representation you can see on the slide at 36:00. That representation is what you need in order to apply c-TF-IDF and extract topic words that represent each topic as a whole. So it has nothing to do with preparing the data for training; it is about building good topic representations of the clusters found by the clustering algorithm of choice.
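The two steps described in this reply (a bag-of-words count per cluster, then c-TF-IDF weighting) can be sketched as follows. This is a hand-rolled, dependency-free approximation: BERTopic itself uses scikit-learn's CountVectorizer for the counting step, and the weighting below assumes the c-TF-IDF formula from the BERTopic documentation, W(t, c) = tf(t, c) · log(1 + A / f(t)), where A is the average number of words per cluster and f(t) is the frequency of term t across all clusters. The helper names and toy documents are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words_per_cluster(docs, labels, stop_words=frozenset()):
    """Treat every cluster as one big document and count its tokens.

    A hand-rolled stand-in for scikit-learn's CountVectorizer (hypothetical helper).
    """
    counts = {}
    for doc, label in zip(docs, labels):
        tokens = [t for t in re.findall(r"[a-z]+", doc.lower()) if t not in stop_words]
        counts.setdefault(label, Counter()).update(tokens)
    return counts


def c_tf_idf(counts):
    """Weight each term per cluster as tf(t, c) * log(1 + A / f(t)) (assumed formula)."""
    # A: average number of words per cluster
    avg_words = sum(sum(c.values()) for c in counts.values()) / len(counts)
    # f(t): frequency of term t summed over all clusters
    term_freq = Counter()
    for c in counts.values():
        term_freq.update(c)
    return {
        label: {t: tf * math.log(1 + avg_words / term_freq[t]) for t, tf in c.items()}
        for label, c in counts.items()
    }


def top_words(weights, n=3):
    """Return the n highest-weighted terms per cluster (ties broken alphabetically)."""
    return {
        label: [t for t, _ in sorted(w.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]
        for label, w in weights.items()
    }


docs = ["the cat sat on the mat", "a cat chased the mouse",
        "the stock market fell", "investors sold the stock"]
labels = [0, 0, 1, 1]  # e.g. cluster ids produced by the clustering step
weights = c_tf_idf(bag_of_words_per_cluster(docs, labels))
print(top_words(weights))  # {0: ['cat', 'the', 'a'], 1: ['stock', 'fell', 'investors']}
```

Note that "the" still ranks second for cluster 0. Dropping such words at this counting step (for example with `stop_words={"the", "a", "on"}`) cleans up the topic words without touching the documents that get embedded, which is the approach BERTopic's documentation suggests via the vectorizer's stop word setting.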
@fireworker8205
Hi bro, I'm going to cluster and analyze a few YouTube videos to group them into various topics. If there is any unstructured data, such as emojis or pictures, how should I handle it? Do I need to remove that unstructured data first and then proceed with embedding, or should I go ahead with all the data?
Where can I learn all of BERTopic as a mathematical procedure rather than a computational one?
No. I am not CO
k-means is a poor man's analysis. It has little to no statistical reasoning behind its clustering; it works off heuristics 😓