Code With Me: Building a Spam Filter!

  • Published on 27 Oct 2024

Comments • 17

  • @tamojitmaiti 3 years ago +5

    Have been consistently watching your videos for a while now. It's amazing how clearly, concisely and succinctly you manage to explain everything.
    Just a suggestion: it'd be massively helpful if you could link or mention relevant books on the video's topics in the description. In the age of SEO-laden blog posts, filtering out the noise in Google searches is a pain.

  • @drsandeepvm5622 2 years ago +1

    Thanks for your exceptionally simple explanation 👍

  • @kdhlkjhdlk 3 years ago +3

    In my experience, using word embeddings (fastText is good for its character n-grams) plus gradient boosting is far better than Naive Bayes.

  • @gracikk 3 years ago

    Really good explanation. Thank you!

  • @photonicsauce7729 1 year ago +1

    This is just Naive Bayes, maybe. Cool.

  • @ccuuttww 3 years ago +1

    LOL, I think something went wrong with your dataset or somewhere else. In most situations Naive Bayes gets around 93% accuracy,
    even if you apply techniques such as lemmatization, stemming, stop-word removal and n-grams.
    But anyway, this video gives us an idea of how it works with Python.
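
Two of the preprocessing steps named in the comment above, stop-word removal and n-grams, can be sketched in plain Python. This is only an illustration: the stop-word list, the helper names and the sample sentence are made up here and are not taken from the video.

```python
# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"a", "an", "the", "to", "is", "of", "and"}

def tokenize(text):
    """Lowercase the text, split on whitespace, strip punctuation at word edges."""
    tokens = (word.strip(".,!?;:\"'()") for word in text.lower().split())
    return [t for t in tokens if t]

def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]

def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = remove_stop_words(tokenize("Claim the FREE prize, click to win!"))
bigrams = ngrams(tokens, 2)
```

The unigrams and bigrams produced this way could both be counted as features by a Naive Bayes classifier. Whether that improves accuracy depends on the dataset.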

  • @YohaneesHutagalung 3 years ago +2

    Could you make a video about stock prediction with an RNN LSTM?

    • @Blaze098890 1 year ago +1

      spoiler: it doesn't really work

  • @Islamic_Tv984 1 year ago

    link please

  • @jessicatriplev9802 3 years ago +4

    I think you've made a mistake in: test_spam_df = spam_df.iloc[int(len(spam_df)*0.7):]. The testing set should have been 30%, not 70%, of the data. That's perhaps why the validation result was so good.

    • @gracikk 3 years ago

      test_spam_df = spam_df.iloc[int(len(spam_df)*0.7):]
      It basically means that the test set keeps only those observations whose position is greater than or equal to int(len(spam_df)*0.7), i.e. roughly the last 30% of the rows.

    • @jessicatriplev9802 3 years ago

      @gracikk Right. But that means that 70% of the data will end up in the testing set. He must have meant 30%.

    • @gracikk 3 years ago

      @jessicatriplev9802 I can't send you a Colab link with a simple example; YouTube deletes it. But you can try it on your own to be sure that everything is fine.

    • @joshuasigelman8141 1 year ago

      I think it's fine because he is taking the first slice UNTIL the 70% mark and then the testing set FROM the 70% mark.
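
The slicing question in this thread can be checked without pandas, because .iloc slices follow the same positional rules as Python list slices. The sketch below uses a plain list as a stand-in for the rows of spam_df. The helper name split_70_30 is made up for illustration.

```python
def split_70_30(rows):
    """Split rows positionally: everything before the 70% mark, then the rest."""
    cut = int(len(rows) * 0.7)
    train_rows = rows[:cut]   # from the start UP TO the 70% mark
    test_rows = rows[cut:]    # FROM the 70% mark to the end
    return train_rows, test_rows

rows = list(range(10))        # stand-in for the rows of spam_df
train_rows, test_rows = split_70_30(rows)
```

The slice that starts at the 70% mark holds only the tail of the data, about 30% of the rows, so the test set in the quoted line is the smaller part.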

  • @GeyzsonKristoffer 1 year ago

    You can bypass the model by repeating, multiple times, words that are biased towards non-spam.
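
The weakness described in the comment above can be reproduced with a toy multinomial Naive Bayes classifier. This is a minimal sketch, not the model from the video: the six training messages are invented, and add-one (Laplace) smoothing is an assumption made here.

```python
import math
from collections import Counter

# Tiny made-up training corpus (illustration only).
SPAM = ["win free money now", "free prize click now", "claim free money"]
HAM = ["meeting at noon tomorrow", "lunch tomorrow with the team", "project meeting notes"]

def train(docs):
    """Count word occurrences over all documents of one class."""
    counts = Counter(word for doc in docs for word in doc.split())
    return counts, sum(counts.values())

spam_counts, spam_total = train(SPAM)
ham_counts, ham_total = train(HAM)
vocab_size = len(set(spam_counts) | set(ham_counts))

def log_likelihood(message, counts, total):
    """Sum of log P(word | class) with add-one smoothing."""
    return sum(
        math.log((counts[word] + 1) / (total + vocab_size))
        for word in message.split()
    )

def spam_score(message):
    """Log-odds of spam versus ham; a positive score means 'spam'."""
    prior = math.log(len(SPAM) / len(HAM))
    return (prior
            + log_likelihood(message, spam_counts, spam_total)
            - log_likelihood(message, ham_counts, ham_total))

original = "win free money"
padded = original + " meeting tomorrow" * 5   # append ham-biased words
```

With this toy data, spam_score(original) is positive and spam_score(padded) is negative. Every repeated ham-biased word adds the same negative term to the log-odds, so enough repetitions outweigh the spam words.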

  • @BurkenProductions 3 years ago

    Wrong language for this. Need PHP.

    • @Lizergus 6 months ago

      lmao