📌 Hey everyone! Enjoying these NLP tutorials? Check out my other project, AI Demos, for quick 1-2 min AI tool demos! 🤖🚀
🔗 TH-cam: www.youtube.com/@aidemos.futuresmart
We aim to educate and inform you about AI's incredible possibilities. Don't miss our AI Demos TH-cam channel and website for amazing demos!
🌐 AI Demos Website: www.aidemos.com/
Subscribe to AI Demos and explore the future of AI with us!
This is an amazing video. I love how you walk me through, step by step. I love how this video gets into the meat of the problem and solution rather than talking endlessly about this and that. Straight to the point, and tons of useful and practical information that I can apply right away.
Thank you very much 🙏. Hope you find the other videos useful too.
All the concepts were clearly explained, thanks for the video! 🙌
Glad it was helpful!
That's exactly what I needed. Huge thanks Pradip!
Thanks for explaining this concept. This video is really helpful for my project.
Glad it was helpful!
Nicely compiled. Great work!
Very helpful video on embeddings Pradip. Keep it going👏👏👏
Thank you, I will
Cool video! 🤗
Thanks for the visit 🤗
Just amazing! Salute man!
Fantastic video!
Thanks a lot for the explanation
You're welcome! 🤗
Easy Interpretation!! Kudos
I'm a beginner in the wide field of ML and very impressed by your presentation. If you have a chance, can you make a video about predicting a full name from an abbreviation? For example, searching FBI would return Federal Bureau of Investigation, etc.
Great!!!! Super!!!!
Wonderful lesson!
Glad you liked it!
Thank you! Very good explanation.
You are welcome!
Great content!!
Pradip, can you please explain how you narrow down to a particular model from all the others? For example, how or why did you pick this particular mfaq model for semantic search over queries?
Thank you so much
You're welcome!
OMG, man! Thank you!!
You're welcome!
Can embedding/similarity only be applied between sentences? What about paragraph to paragraph? Essay to essay? Thx
Embeddings can be calculated for paragraphs and also for big documents.
Some models have an input token limit; in that case you need to break the document into smaller paragraphs and then calculate the embeddings.
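A rough sketch of that chunking idea. The word-based splitter and the placeholder embed function below are stand-ins I made up for illustration; a real pipeline would count subword tokens with the model's tokenizer and call model.encode on each chunk:

```python
import numpy as np

MAX_TOKENS = 128  # stand-in for the model's input token limit

def chunk_text(text, max_tokens=MAX_TOKENS):
    """Split a long document into word-based chunks under the limit.
    (Real tokenizers count subword tokens; words are a rough proxy.)"""
    words = text.split()
    return [" ".join(words[i:i + max_tokens])
            for i in range(0, len(words), max_tokens)]

def embed(text, dim=8):
    """Placeholder embedding: in practice use model.encode(text)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def embed_document(text):
    """Embed each chunk, then mean-pool into one document vector."""
    chunks = chunk_text(text)
    vectors = np.stack([embed(c) for c in chunks])
    return vectors.mean(axis=0)

doc = "word " * 300            # 300 words -> 3 chunks of <= 128
print(len(chunk_text(doc)))    # 3
print(embed_document(doc).shape)  # (8,)
```

Mean pooling is only one way to combine chunk embeddings; for search tasks it is often better to keep one embedding per chunk and match at the chunk level.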
@@FutureSmartAI Thank you, sir! I will try it out. Is there any particular model you suggest to start with?
Thanks again.
A question about the clustering case you gave at the end: is there a default similarity-score criterion for grouping the sentences? Which factor(s) sort the sentences together behind the scenes? I mean, some groups have only 2 sentences and some have 4 or 5. Thx
K-means clusters sentences based on their proximity to the centroid of each cluster. The distance measure used in K-means is typically the Euclidean distance, the straight-line distance between two points in n-dimensional space.
The sentences are grouped depending on these distances; here we are calculating the distances between embeddings.
@@FutureSmartAI Thx for your quick reply. Let me ask this way: is there a specific distance value behind this clustering? This might need a read through the documentation by myself. Thanks again!
@@duetplay4551 Yes, in K-means you should be able to get the distance between a cluster centroid and its points.
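To make that concrete, here is a small sketch using scikit-learn's KMeans. The 2-D points are made up to stand in for real sentence embeddings (which typically have hundreds of dimensions); transform() exposes exactly those point-to-centroid Euclidean distances:

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy 2-D "embeddings" standing in for real sentence vectors.
embeddings = np.array([
    [0.0, 0.1], [0.1, 0.0], [0.05, 0.05],   # group near the origin
    [5.0, 5.1], [5.1, 4.9],                 # group near (5, 5)
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(embeddings)
print(kmeans.labels_)        # cluster assignment per sentence

# transform() gives the Euclidean distance from each point
# to every cluster centroid.
distances = kmeans.transform(embeddings)
print(distances.shape)       # (5, 2)
```

There is no fixed distance threshold: every point simply joins whichever centroid is nearest, which is why cluster sizes come out uneven.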
Hi Pradip. Great demo. Can we further classify the cluster number into text? For example is there a model that will generate the word “baby” for cluster 0, “drums or monkey” for cluster 2, “animal” for cluster 3 and “food” for cluster 4?
You can create your own mapping.
You rock !
Amazing, great work! Can you please make a video on a semantic similarity detection model using a BERT transformer? Pleaseeee 🙏🙏🙏
Thank you Pradip! How do I select the best sentence-transformer model? What's the difference between those 100+ sentence-transformer models?
You can decide based on prediction time, accuracy, and whether you need a multilingual model or not.
Amazing! But how about a video on fine-tuning a sentence transformer for non-English text?
Thank you very much ❤
You're welcome 😊
How can I train custom sentence embeddings for my domain-specific task so that I can find the similarity between my custom domain words?
You can train your own; here are the steps:
www.sbert.net/docs/training/overview.html
@@FutureSmartAI Thank you for replying, but this is valid for a supervised problem. I have a huge amount of data which is pure text documents. I want to train it in an unsupervised way, where the model can learn similar words/sentences.
@@samarthsarin One way to train unsupervised is by using auxiliary tasks like next-word prediction or next-sentence prediction.
Great video. My use case is slightly different.
I have a corpus of articles and a corpus of summaries.
For a particular summary, I want to find how many articles are semantically related or similar.
Which model should I use? Should I embed and cluster, or not?
Can you help?
You can use embeddings and a similarity score.
1. Calculate an embedding for each article.
2. Calculate an embedding for the particular summary.
Now iterate through each article embedding and calculate the cosine similarity between the article embedding and the summary embedding.
Sort the results to get the articles most semantically similar to that summary.
Check this, it has utility functions: www.sbert.net/examples/applications/semantic-search/README.html
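The steps above can be sketched with plain numpy, assuming the embeddings have already been computed. The toy 3-D vectors and article names here are made up; real vectors would come from model.encode on each text:

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pretend these came from model.encode(...) on each article.
article_embeddings = {
    "article_1": np.array([1.0, 0.0, 0.0]),
    "article_2": np.array([0.5, 0.5, 0.0]),
    "article_3": np.array([0.0, 1.0, 0.0]),
}
summary_embedding = np.array([0.9, 0.1, 0.0])

# Score every article against the summary, then sort descending.
scores = {name: cosine_sim(vec, summary_embedding)
          for name, vec in article_embeddings.items()}
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
print(ranked[0][0])   # article_1 is the closest match
```

For large corpora the same idea is usually done as one matrix multiplication over normalized vectors (sentence-transformers' semantic-search utilities do this for you).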
Hello sir, thanks for sharing, this is so insightful. I want to build a text summarizer, but I find this embedding method interesting. I want to ask: how do we train our dataset on this model? Do you have any tutorials? Thank you in advance!
You can fine-tune sentence transformers, but I don't have any tutorials on it. You can read more about it here: www.sbert.net/docs/training/overview.html
@@FutureSmartAI Sorry to bother you again, sir. I'm still new to this, so do I just give the sentences (all the news text in each document, without labels) to the InputExample function and then train with SentenceTransformer?
Fantastic video Pradip!! Can you please suggest any reading material for sentence embeddings?
Thanks, Amey. I think you should check the official website. They have details of what pre-trained models are available and how to fine-tune them.
www.sbert.net/docs/training/overview.html
There is also a new approach called SetFit:
huggingface.co/blog/setfit
@@FutureSmartAI Thanks a lot!!
Thank YOU.
I am getting an error from "pip install -U sentence-transformers".
To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip to attempt to solve the dependency conflict
amazing
Thanks
How do we know what number of clusters is needed?
It's beneficial if you can somehow infer it based on domain knowledge, but have a look at the "elbow method" or "silhouette method"
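A quick sketch of the elbow method on synthetic data with three obvious groups (the data and cluster centers are made up for illustration). In practice you would plot the inertia curve and look for the bend; with well-separated groups the drop flattens sharply past the true number of clusters:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
# Synthetic "embeddings": three tight groups in 2-D.
data = np.vstack([rng.normal(loc=c, scale=0.1, size=(20, 2))
                  for c in [(0, 0), (5, 5), (0, 5)]])

# Elbow method: inertia (within-cluster sum of squares) drops
# sharply until k reaches the true number of clusters, then flattens.
inertias = {k: KMeans(n_clusters=k, n_init=10, random_state=0)
               .fit(data).inertia_
            for k in range(1, 7)}
for k, v in inertias.items():
    print(k, round(v, 2))
```

The silhouette score (sklearn.metrics.silhouette_score) gives a complementary view: instead of an elbow, you pick the k with the highest score.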
Excellent, man! Short and crisp. Would you mind creating a semantic search model on a custom dataset using a pre-trained Hugging Face model?
You mean you want to fine-tune sentence transformers?
HI Pradip, thanks for the video.
Can you please help me with this:
The embeddings (numerical values) change every time I use a new kernel.
How can I ensure that the embeddings are exactly same?
I have tried the following but it does not seem to work:
1. use model.eval() to put the model into evaluation mode and deactivate dropout.
2. set "requires_grad" to false for each layer in the model so that the weights do not change.
3. set the same seeds.
Could you please guide me on this, any suggestion is appreciated.
Thanks,
Nitin
Hi, did you get a solution to this problem? I am facing the same problem.
@@tintumarygeorge9309 Yes, the weights remain the same (provided you use exactly the same text each time). The bug in my case was that the order of the texts I fed to the transformer was not the same every time.
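The dropout point from the list above can be illustrated with a tiny numpy stand-in for an encoder (the "model" here is just a fixed matrix I invented for the sketch). A real sentence transformer behaves analogously: model.eval() freezes dropout, and identical, identically ordered inputs then give identical embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 2))  # fixed "weights"

def encode(x, train=False):
    """Toy encoder: dropout is only applied in training mode."""
    h = x @ W
    if train:
        mask = rng.random(h.shape) > 0.5   # fresh random dropout mask
        h = h * mask
    return h

x = np.ones((1, 4))
a = encode(x, train=True)      # dropout active: outputs usually
b = encode(x, train=True)      # differ between these two calls
c = encode(x, train=False)
d = encode(x, train=False)
print(np.array_equal(c, d))    # True: eval mode is deterministic
```

So the checklist (eval mode, frozen weights, same seeds) is right, but it only helps if every run also feeds the model the exact same texts in the exact same order.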
Thankyou
Thanks for the valuable inputs and the clear explanation. Can you do NER-related videos / fine-tuning using Hugging Face? Another request: I am currently also doing semantic-search-related tasks and have already followed all the links in the notebooks except clustering. I would like to do semantic search between text input and image output (which is possible only by vectorizing both the query and the image description). Can you share any links related to Hugging Face or others that would be helpful?
Hi Venkatesan, I have already done videos related to custom NER and Fine tuning Hugging face transformers.
th-cam.com/video/9he4XKqqzvE/w-d-xo.html
th-cam.com/video/YLQvVpCXpbU/w-d-xo.html
For semantic search between text input and images output:
Check CLIP (Contrastive Language-Image Pre-training)
openai.com/blog/clip/
It is a neural network model that efficiently learns visual concepts from natural language supervision.
CLIP is trained on a dataset composed of pairs of images and their textual descriptions, abundantly available across the internet.
SentenceTransformers provides models that embed images and text into the same vector space. This makes it possible to find similar images as well as to implement image search.
www.sbert.net/examples/applications/image-search/README.html
@@FutureSmartAI Thanks for your valuable links and I will check/try.
🙏
Thank you so much, sir 😊❤❤❤
Where can we find the dataset?
It's throwing an error, it can't find the dataset: No such file or directory: '/content/drive/MyDrive/Content Creation/TH-cam Tutorials/datasets/toxic_commnets_500.csv'
It is shared in the previous video of the playlist.