NLP Demystified 3: Basic Preprocessing (case-folding, stop words, stemming, lemmatization)

  • Published on Nov 27, 2024

Comments

  • @futuremojo
    @futuremojo  2 years ago +3

    Timestamps:
    00:00:00 Basic Preprocessing
    00:00:35 Case-folding and its tradeoffs
    00:02:40 Stop word removal (tradeoffs and how it can go wrong)
    00:04:40 Stemming (tradeoffs and things to watch out for)
    00:06:28 Lemmatization and its advantages over stemming
    00:07:52 DEMO: basic processing with spaCy
    00:10:37 Basic preprocessing recap

  • @khalidnaveed1077
    @khalidnaveed1077 3 months ago

    Great, concise intro. I see you getting big in the future. Keep up the good work.

  • @nisargpatel1443
    @nisargpatel1443 5 months ago

    Concise and easily understandable. Thanks a lot for the series.

  • @YashodPerera-b9j
    @YashodPerera-b9j 1 year ago

    This is the best NLP series I have ever watched

  • @YashodPerera-b9j
    @YashodPerera-b9j 1 year ago +1

    This content is simple and easy to understand.

  • @somerset006
    @somerset006 1 year ago +1

    Well done, thanks!

  • @rishidixit7939
    @rishidixit7939 1 month ago

    I have a bunch of reviews (about 20 million) of places like restaurants, cafes, pet groomers, cleaners, and other services.
    Now I have to categorize them into service categories such as food, pet grooming, cleaning, etc. A heavy model like BERT takes a lot of time and resources.
    The data is not labelled by service, so I was thinking about clustering the reviews and using food or no food as the only classes, kind of like aspect-based classification.

  • @rishidixit7939
    @rishidixit7939 1 month ago

    I also have one more question: with so many product reviews (around 20 million), how do I analyze and clean my data? In some places the punctuation is wrong, some reviews have too many spaces, etc. It is not possible to inspect every error in the reviews.
    In that case, how should I preprocess the data?
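
One possible starting point for the whitespace and punctuation problems described in the question above, offered as a rough sketch and not as the video author's method: count how often each kind of problem occurs, then apply a small set of regex rules one review at a time so the full dataset never has to sit in memory. The rule set and the names `CHECKS`, `audit`, `clean_review`, and `clean_stream` are illustrative assumptions, and the rules assume English-style punctuation.

```python
import re
import unicodedata
from collections import Counter
from typing import Iterable, Iterator

# Hypothetical rule set: each pattern flags one kind of formatting problem.
CHECKS = {
    "extra_whitespace": re.compile(r"\s{2,}|^\s|\s$"),
    "space_before_punct": re.compile(r"\s+([,.!?;:])"),
    "repeated_punct": re.compile(r"([,.!?;:])\1+"),
    # "." and ":" are left out so URLs and "e.g." are not split apart.
    "missing_space_after_punct": re.compile(r"([,!?;])(?=[A-Za-z])"),
}


def audit(reviews: Iterable[str]) -> Counter:
    """Count how many reviews trigger each check.

    Running this on a random sample shows which problems are common
    without anyone having to read the reviews one by one.
    """
    counts: Counter = Counter()
    for text in reviews:
        for name, pattern in CHECKS.items():
            if pattern.search(text):
                counts[name] += 1
    return counts


def clean_review(text: str) -> str:
    """Apply the rules above to a single review."""
    # Fold Unicode compatibility forms (e.g. full-width punctuation).
    text = unicodedata.normalize("NFKC", text)
    # Collapse any whitespace run to one space and trim the ends.
    text = re.sub(r"\s+", " ", text).strip()
    text = CHECKS["space_before_punct"].sub(r"\1", text)
    # Note: this also shortens an ellipsis ("...") to a single period.
    text = CHECKS["repeated_punct"].sub(r"\1", text)
    text = CHECKS["missing_space_after_punct"].sub(r"\1 ", text)
    return text


def clean_stream(reviews: Iterable[str]) -> Iterator[str]:
    """Lazily clean reviews one at a time (works with a file iterator)."""
    for text in reviews:
        yield clean_review(text)


if __name__ == "__main__":
    sample = ["Great   food ,but slow service !!", "Fine."]
    print(audit(sample))
    print(list(clean_stream(sample)))
```

Because `clean_stream` is a generator, it can be fed a file handle or a database cursor directly. Re-running `audit` on the cleaned output of a sample gives a quick check that the rules did what was intended before they are applied to all 20 million reviews.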