Have been consistently watching your videos for a while now. It's amazing how clearly, concisely and succinctly you manage to explain everything.
Just a suggestion: it'd be massively helpful if you could link/mention relevant books pertaining to the topics in the video description. In the age of SEO-laden blog posts, filtering out the noise in Google searches is a pain.
Thanks for your exceptionally simple explanation 👍
In my experience, using word embeddings (fastText is good because of its character n-grams) + gradient boosting works far better than Naive Bayes.
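For anyone curious, a minimal sketch of that idea, assuming a CSV with "text" and "label" columns (file name and column names are placeholders, not from the video), using gensim's FastText and scikit-learn's GradientBoostingClassifier:

# Sketch of the embeddings + boosting idea; "spam.csv" and the column names
# are hypothetical stand-ins for whatever dataset you use.
import numpy as np
import pandas as pd
from gensim.models import FastText
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

spam_df = pd.read_csv("spam.csv")                 # hypothetical file with text + label
tokens = [t.lower().split() for t in spam_df["text"]]

# FastText builds character n-gram subword vectors, so it can embed unseen words too.
ft = FastText(sentences=tokens, vector_size=100, window=5, min_count=1, epochs=10)

def embed(words):
    # Average the word vectors to get one fixed-size vector per message.
    vecs = [ft.wv[w] for w in words]
    return np.mean(vecs, axis=0) if vecs else np.zeros(ft.vector_size)

X = np.vstack([embed(t) for t in tokens])
y = spam_df["label"].values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
clf = GradientBoostingClassifier().fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))

(Note the embeddings here are trained on all messages before the split, which is a small leak; good enough for a sketch, but in practice you'd fit FastText on the training portion only.)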
Really good explanation. Thank you!
This is just naive bayes maybe. Cool
LOL, I think either your dataset or something else went wrong; in most situations Naive Bayes gets around 93% accuracy,
even if you apply the following techniques: lemmatization, stemming, stop-word removal, N-grams.
But anyway, this video gives us an idea of how it works with Python.
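For reference, a rough sketch of those preprocessing steps (stop-word removal, lemmatization, stemming, n-grams) with NLTK and scikit-learn; the sample texts and the Naive Bayes step are made up, not taken from the video:

# Sketch of the preprocessing mentioned above; sample messages are invented.
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

nltk.download("stopwords")
nltk.download("wordnet")

stop = set(stopwords.words("english"))
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

def preprocess(text):
    # lowercase, keep alphabetic tokens, drop stop words, then lemmatize and stem
    tokens = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in stop]
    return " ".join(stemmer.stem(lemmatizer.lemmatize(w)) for w in tokens)

texts = ["Free prize, claim your money now!", "Are we still meeting for lunch tomorrow?"]
labels = ["spam", "ham"]

cleaned = [preprocess(t) for t in texts]
vectorizer = CountVectorizer(ngram_range=(1, 2))   # unigrams + bigrams as features
X = vectorizer.fit_transform(cleaned)
model = MultinomialNB().fit(X, labels)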
Could you make a video about stock predictions with RNNs/LSTMs?
spoiler: it doesn't really work
link please
I think you've made a mistake in: test_spam_df = spam_df.iloc[int(len(spam_df)*0.7):] The testing set should have been 30% not 70% of the data. That's perhaps why the validation result was so good.
test_spam_df = spam_df.iloc[int(len(spam_df)*0.7):]
It basically means that we keep in our test set only those observations whose index is greater than or equal to len(spam_df)*0.7
@@gracikk Right. But that means that 70% of the data will end up in the testing set. He must have meant 30%.
@@jessicatriplev9802 I can't send you a Colab link with a simple example; YouTube deletes it. But you can try it on your own to be sure that everything is fine.
I think it's fine because he is taking the first slice UNTIL the 70% mark and then the testing set FROM the 70% mark.
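A quick sanity check of that slicing, with a dummy DataFrame standing in for the video's dataset:

# iloc[int(len(df)*0.7):] keeps rows FROM the 70% index onward,
# i.e. the last ~30% of the data ends up in the test set.
import pandas as pd

spam_df = pd.DataFrame({"text": [f"msg {i}" for i in range(100)]})

split = int(len(spam_df) * 0.7)          # index 70
train_spam_df = spam_df.iloc[:split]     # rows 0..69  -> 70% for training
test_spam_df = spam_df.iloc[split:]      # rows 70..99 -> 30% for testing

print(len(train_spam_df), len(test_spam_df))   # prints: 70 30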
Bypass the model by repeating words that are biased towards non-spam, multiple times.
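A toy illustration of that trick (the training sentences are made up, not the video's data): Naive Bayes multiplies per-word likelihoods, so padding a spam message with enough "ham-looking" words can flip the prediction.

# Made-up mini dataset to show how word-stuffing can fool Naive Bayes.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_texts = ["free money win", "free win prize",
               "meeting tomorrow lunch", "see you tomorrow"]
train_labels = ["spam", "spam", "ham", "ham"]

vectorizer = CountVectorizer()
model = MultinomialNB().fit(vectorizer.fit_transform(train_texts), train_labels)

original = "free money win"
padded = original + " meeting" * 10      # repeat a benign word many times

print(model.predict(vectorizer.transform([original, padded])))
# expected output: ['spam' 'ham'] -- the padded message slips past the filter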
Wrong language for this. Need php
lmao