Ayo! This is probably the best vid I found that explains this shit. All other examples I saw didn't take repeating words in a document into consideration. Thanks!
I couldn't understand what my teacher said during class, but you saved me!!!
Thanks!!!!🥺
Smooth explanation❤ thanks
Good explanation, madam. It will be helpful to a number of learners. Thank you so much for making this so understandable.
Glad you found it helpful.
Thank you so much, I needed to understand this exercise.
Best Explanation
thanks a ton, understood the entire video
What is the default formula for choosing a class?
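Assuming the video follows the standard multinomial Naive Bayes setup (which the rest of this thread suggests, but doesn't spell out), the usual rule is the MAP decision: pick the class with the highest posterior,

$$\hat{c} = \operatorname*{arg\,max}_{c \in C} \; P(c) \prod_{i=1}^{n} P(w_i \mid c), \qquad P(w \mid c) = \frac{\mathrm{count}(w, c) + 1}{N_c + |V|},$$

where $N_c$ is the total word count of class $c$ and $|V|$ is the vocabulary size, with the add-1 Laplace smoothing discussed in the comments below.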
Extremely well explained and great presentation. Gg. Thankyouu. Here's a cookie for you. 🍪
Thank you, great job dear
Hi, I have a doubt. I can see that the training data set of class j contains all the words of the test data set, so Laplace smoothing could be done only for class c instead of for both, right?
Please clarify.
Suppose there is a word in the test document that is not included in any training docs. Will we include that word in our count of |V|?
Good question. Test documents sometimes do have words that are not in the training docs. We either smooth the frequencies of those words as well using Laplace smoothing, so they get a small non-zero value, or we leave them out of the calculation entirely, so that the total probability does not come out to be zero.
@machinelearningmymusic6250 So the word is still included in the calculation, right, but defaults to 1/(1 × |V|)?
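A minimal Python sketch of what that smoothing does, assuming add-1 (Laplace) smoothing and guessing at the classic textbook documents behind the video (the totals 8 and 3 and the vocabulary size 6 match the numbers quoted later in this thread):

```python
from collections import Counter

def laplace_prob(word, class_tokens, vocab_size):
    """P(word | class) with add-1 (Laplace) smoothing.

    A word unseen in this class has count 0, so it defaults to
    1 / (len(class_tokens) + vocab_size) instead of 0.
    """
    counts = Counter(class_tokens)
    return (counts[word] + 1) / (len(class_tokens) + vocab_size)

# Guessed training tokens (illustrative, not confirmed from the video).
tokens_c = ["chinese"] * 5 + ["beijing", "shanghai", "macao"]  # 8 tokens
tokens_j = ["tokyo", "japan", "chinese"]                       # 3 tokens
vocab = set(tokens_c) | set(tokens_j)                          # 6 unique words

print(laplace_prob("chinese", tokens_c, len(vocab)))  # (5+1)/(8+6) = 3/7
print(laplace_prob("tokyo",   tokens_c, len(vocab)))  # (0+1)/(8+6), not 0
# A word that appears in no training doc at all is either treated the
# same way (after adding it to the vocabulary, growing |V|) or skipped.
```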
You have clearly explained how text classification works using the BoW representation. Can you please explain how the conditional probabilities of the words/features (specifically the numerator, the Laplace term, and the denominator) would be calculated for the same example using the tf-idf representation?
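This isn't answered in the thread, but one common approach is to feed tf-idf weights into a multinomial Naive Bayes model in place of raw counts: the summed tf-idf mass of each word per class replaces the count in the numerator, the class's total tf-idf mass replaces the total count in the denominator, and the same +alpha smoothing applies. A sketch using scikit-learn, with illustrative documents guessed from the counts quoted in this thread:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Illustrative corpus (a guess at the classic example, not from the video).
docs = ["Chinese Beijing Chinese", "Chinese Chinese Shanghai",
        "Chinese Macao", "Tokyo Japan Chinese"]
labels = ["c", "c", "c", "j"]

vec = TfidfVectorizer()
X = vec.fit_transform(docs)                    # tf-idf weights, not BoW counts
clf = MultinomialNB(alpha=1.0).fit(X, labels)  # alpha=1.0 is Laplace smoothing

# For each word i and class c, MultinomialNB estimates roughly:
#   P(i | c) = (sum of tf-idf weights of i in class-c docs + alpha)
#              / (total tf-idf mass of class c + alpha * |V|)
print(clf.predict(vec.transform(["Chinese Chinese Chinese Tokyo Japan"])))
```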
OP SHORT AND CONCISE THANKSSS
Ma'am, 8+6 and 3+6 are there, and the 6 is common to both. But where does the 6 come from?
Each word is a feature in this example. There are 6 unique words in the vocabulary, i.e., 6 features. Hence, we add 1 for each feature in the denominator, so the denominator has a +6.
@machinelearningmymusic6250 Clear, ma'am! Thank you.
8 is the total word count for class c, and 3 is the total for class j. The 6 is the number of unique words times the k value, I think: since k = 1 and there are 6 unique words, it comes out to 6, I believe.
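Putting the replies in this thread together, the smoothed estimate being described appears to be the usual add-k form:

$$P(w \mid c) = \frac{\mathrm{count}(w, c) + k}{N_c + k\,|V|}, \qquad N_c = 8,\; N_j = 3,\; |V| = 6,\; k = 1,$$

so the denominators are $8 + 6 = 14$ for class c and $3 + 6 = 9$ for class j. For example, if "Chinese" appears 5 times in class c (as in the classic textbook version of this example), then $P(\text{Chinese} \mid c) = (5 + 1)/(8 + 6) = 3/7$.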