Differences between NLTK and spaCy:
i) Purpose and Design: NLTK is focused more on research, academic, and educational use, while spaCy is designed with a focus on efficiency and production use.
ii) Performance: NLTK is slower, which can be a drawback for large-scale or real-time processing tasks, whereas spaCy is optimized for high performance and designed to handle large volumes of text data efficiently.
iii) Features and Capabilities: NLTK provides tools for almost every NLP task and is rich in educational resources. spaCy includes pre-trained models for a variety of languages and integrates well with deep learning frameworks like TensorFlow and PyTorch, which is advantageous for modern NLP tasks.
iv) Ease of Use: Due to its flexibility and range of options, NLTK can be more complex to use, especially for beginners, whereas spaCy is user-friendly, has a more straightforward API, and is easier to get started with, especially for common NLP tasks.
v) Community and Ecosystem: NLTK has been around longer, with a larger number of academic users and more extensive resources, whereas spaCy has a rapidly growing community, especially in industry, and benefits from regular updates and new features.
Summary:
Use NLTK if: You need a comprehensive, flexible toolkit for detailed NLP tasks, research, or learning.
Use spaCy if: You need a fast, production-ready solution with pre-trained models and an easy-to-use API for common NLP tasks.
Important note: each library has its strengths, so the choice between them depends on the specific needs of your project.
Good insights. 👍
Thanks for all the amazing information you keep sharing over youtube. Please keep up the excellent work, your data science knowledge is of great help to the community of aspiring data scientists.
I watched this video entirely; it's very useful for me. I paid 55K for a data science course, and I am learning from here. You are much better than anyone there.
Thank you Krish 😍
No one can be as much of a fool as you are 😂😂😂
Hey, do I need prior ML knowledge to understand the concepts of this video?
I am both happy and sad. Happy that I discovered the channel today. Sad that I discovered this channel only today. I wish I discovered this channel 4 years ago. So the overall sentiment of my post is positive :)
Sometimes it's classified as neutral.
NLTK is a comprehensive and educational toolkit suitable for a wide range of NLP tasks, while spaCy is a focused and efficient library designed for production use, particularly for tasks like entity recognition and part-of-speech tagging.
Who needs institutes when Krish sir is ready to give everyone this many free resources?
Thank you for being a guiding light in the vast sea of information, providing clarity and understanding to those who seek knowledge. Your commitment to the betterment of individuals and society as a whole is truly uplifting.
NLTK:
Purpose: Teaching, research, and experimentation.
Ease of Use: More complex, requires manual setup.
Speed: Slower, prioritizes flexibility.
Models: No pre-trained models by default.
Data Handling: Uses Pythonic lists, trees.
Visualization: Basic, limited tools.
Learning Curve: Steeper for beginners.
Community: Strong in academia, research-focused.
spaCy:
Purpose: Industrial use, production-ready applications.
Ease of Use: Simple API, pre-built pipelines.
Speed: Fast, optimized with Cython.
Models: Provides pre-trained models out of the box.
Data Handling: Uses optimized objects like Doc, Token.
Visualization: Interactive, built-in tools like displaCy.
Learning Curve: Easier, beginner-friendly.
Community: Growing, production-focused ecosystem.
At 2:30:20 you finish the BOW video and transition into the TF-IDF video, saying that in the previous video you mentioned n-grams. I didn't find the n-grams video. Am I missing something?
Yes! The n-grams topic was skipped.
@@islamiczone7731 So where can I watch it?
If you want to learn it, just search YouTube for "n-grams NLP by Krish Naik".
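Since the n-grams segment appears to be missing from the video, here is a minimal pure-Python sketch of the idea: an n-gram is just a sliding window of n consecutive tokens, which is what feature extractors such as scikit-learn's CountVectorizer(ngram_range=...) build their vocabulary from.

```python
def ngrams(tokens, n):
    # Slide a window of width n over the token list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the food was not good".split()
print(ngrams(tokens, 1))  # unigrams: [('the',), ('food',), ...]
print(ngrams(tokens, 2))  # bigrams: [('the', 'food'), ('food', 'was'), ...]
print(ngrams(tokens, 3))  # trigrams capture phrases like ('was', 'not', 'good')
```

Bigrams and trigrams let bag-of-words style models keep some word order, e.g. distinguishing "not good" from "good" on its own.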
Thank you for uploading such nice and comprehensive lectures (not just videos) and explaining them so nicely. Your commitment is quite commendable. Please make such one-shot videos in the future too.
However, the process of tokenization usually goes from a larger unit to a smaller one, not the other way around. So, it’s more common to tokenize a paragraph into sentences or a sentence into words, rather than tokenizing sentences into a paragraph.
@37:21 your comment says ##sentence to paragraph tokenization; however, you ended up using sent_tokenize(), which accepts a paragraph and breaks it down into smaller sentences.
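The commenter's direction point (larger unit in, smaller units out) can be sketched in code. naive_sent_tokenize below is a hypothetical stand-in, not NLTK's actual function: the real nltk.sent_tokenize is trained to handle abbreviations like "Dr." and other edge cases, but the flow is the same.

```python
import re

def naive_sent_tokenize(paragraph):
    # Split after ., ! or ? followed by whitespace: a crude sketch of
    # paragraph -> sentences tokenization (the real tool is smarter).
    return [s for s in re.split(r'(?<=[.!?])\s+', paragraph.strip()) if s]

para = "NLP has many steps. Tokenization comes first! Then we clean the text."
sentences = naive_sent_tokenize(para)
print(sentences)  # one paragraph in, three sentences out
```

So the function name is about the output unit (sentences), while the input is the larger unit (a paragraph).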
The difference between NLTK and spacy are:
1. NLTK supports various languages, whereas spaCy has statistical models for 7 languages (English, German, Spanish, French, Portuguese, Italian, and Dutch). It also supports named entities for multiple languages.
2. NLTK is a string processing library: it takes strings as input and returns strings or lists of strings as output. spaCy, in contrast, uses an object-oriented approach: when we parse a text, spaCy returns a Doc object whose words and sentences are objects themselves.
3. spaCy has support for word vectors, whereas NLTK does not.
4. As spaCy uses the latest and best algorithms, its performance is usually better than NLTK's. spaCy performs better at word tokenization and POS tagging, but NLTK outperforms spaCy at sentence tokenization. spaCy's weaker sentence tokenization results from differing approaches: NLTK simply attempts to split the text into sentences, whereas spaCy constructs a syntactic tree for each sentence, a more robust method that yields much more information about the text.
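Point 2 above (strings vs. objects) can be illustrated with two hypothetical mini-classes. These are simplified stand-ins to show the shape of the API, not spaCy's actual implementation:

```python
class Token:
    # spaCy-style token: an object carrying attributes, not a bare string.
    def __init__(self, text):
        self.text = text
        self.is_alpha = text.isalpha()

class Doc:
    # spaCy-style document: parsing returns an object whose pieces are objects.
    def __init__(self, text):
        self.text = text
        self.tokens = [Token(t) for t in text.split()]
    def __iter__(self):
        return iter(self.tokens)

# NLTK-style: string in, list of plain strings out.
nltk_style = "spaCy returns objects".split()
# spaCy-style: string in, a Doc of Token objects out.
doc = Doc("spaCy returns objects")
print(nltk_style)                            # ['spaCy', 'returns', 'objects']
print([(t.text, t.is_alpha) for t in doc])   # same words, but each is an object
```

The practical upshot: with the object style, every token carries attributes (POS, lemma, vector, etc. in real spaCy) instead of being a bare string you must re-process.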
Excellent video. Just started with it and got clear on the basics of NLP.
Aww, you did an incredible job, Krish! I was fully engaged for the entire 4 hours and didn't get bored once.
Thank you, sir. It was very useful to have it in a single video. All concepts were very clearly explained. God bless you, sir. I am 45 and trying to learn AI/ML 😊
Thanks so much Professor Naik.
Let's be grateful and give a LIKE to this great resource!
Thanks for sharing and educating us. Keep it up.
Thank you so much, Krish Sir, for this amazing video.
One of the best courses on NLP. Thank you so much, sir.
Can you tell me if he covered TF-IDF?
@@anshulvairagade1604 Yep, he covered it.
@@anshulvairagade1604 Yes.
Upon reading the documentation and paper on CBOW, I have two questions:
1. You explained that when we choose a window size, for example 3, we take 3 consecutive words from the corpus, take the middle word as the target word, and use the words before and after (1 each in this case) as context for the target word. However, the documentation says that the window size determines the number of words taken before and after the target word. So, for example, if we take window_size = 3, we take 3 words before and 3 words after the target word as context.
2. We can choose the hidden layer to be any size. It is not important that it matches the window size, since the input layer averages or sums the input vectors, and hence its size is always [1 x V], where V is the vocabulary size. The input-hidden matrix is of size [V x N], where N is the hidden layer size; the hidden-output matrix is [N x V]; and finally the output layer is [1 x V].
Can you please clarify my doubts here?
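On question 2, the shape analysis can be checked numerically. The sketch below uses plain NumPy with made-up sizes (V, N, and the window are arbitrary assumptions for illustration) and shows that the hidden-layer size N is a free choice, independent of the window size, because the averaged context input is always 1 x V:

```python
import numpy as np

V = 10        # vocabulary size
N = 4         # hidden-layer size: any value works, unrelated to the window
window = 3    # 3 words before + 3 words after the target

rng = np.random.default_rng(0)
# One-hot context vectors (2 * window of them), averaged into a 1 x V input.
context = np.zeros((2 * window, V))
for row in range(2 * window):
    context[row, rng.integers(V)] = 1.0
x = context.mean(axis=0, keepdims=True)   # (1, V)

W_in = rng.standard_normal((V, N))        # input -> hidden weights
W_out = rng.standard_normal((N, V))       # hidden -> output weights

h = x @ W_in                              # (1, N): hidden representation
scores = h @ W_out                        # (1, V): one score per vocab word
print(x.shape, h.shape, scores.shape)     # (1, 10) (1, 4) (1, 10)
```

The output is one score per vocabulary word (a softmax over it gives the predicted target), and nothing in the shapes forces N to equal the window size.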
I also have confusion at this step; I can't figure out the output dimension.
In CBOW, I was also confused about the size of the hidden layer. As I understand it, the hidden layer can have any number of nodes.
Thank you, sir. Such a wonderful and helpful video. Could you please provide part 2 for this?
Hello Krish, a small problem: at 2:30:21 the n-gram explanation video is skipped. Please add the video corresponding to n-grams. Thank you.
Great video, Krish; much needed.
Thank you so much, You are Awesome!!!!
Great guy; complete description of every term.
Not sure if you will be reading these comments, but I really, really like the way you teach. It's very informative, and the examples (giving the not-so-good approaches first and then the one that fixes everything) make it very easy to understand, give the clear picture, and also help in clearing interview questions. Very good narration, like a movie director narrating a story, which shows that you are passionate about teaching and making others understand what you are explaining. And finally, a very, very good voice. Thanks a lot, KRISH NAIK SIR. Subscribed and waiting for more videos from you.
Such a GREAT video on NLP. I just LOVED your explanation!! Keep up the good work!
Thank you for this amazing content
NLTK is widely used in research; spaCy focuses on production usage.
Helped to revise the concepts. Thank you Krish sir
Can you please create an end-to-end project with real-time data, i.e., using Kafka for streaming, Django for the backend, and at least Kubeflow for tracking? 😊 I'll appreciate it.
We really want this kind of project.
In iNeuron there is a course for end-to-end data science projects. You can check it out.
Thank you Krish; a friend introduced me to your videos. Very wonderful and educative.
Akpe na wo, Mawu ne yrawo 💯 (Thank you; may God bless you.)
Looks like a great video; I'm halfway through right now.
Excellent roadmap. Really looking forward to it.
2:30:27 @Krish Sir, the previous lecture was about bag of words, not n-grams. Where can I find the lecture on n-grams?
Hey Krish, I think you can read my mind 😅. Thank you for the video.
Thank you so much for showing us the path, Krish…
I bought the FSDS course from iNeuron because of your name, but I'm learning from here.
After explaining every concept, please provide practical knowledge; there should be more practical content.
Thanks for all you do Krish 🙏
Please provide a link to the second part of this video, covering what comes after Word2Vec, i.e., the DL part.
Thank you for this one-shot awesome video on NLP.
Such a great lesson. Thanks, man!
Best course; it really helped me!
Perfect explanation 👏
I enjoyed this video, thanks Krish.
Great video, Krish ❤ much needed.
Personal timestamp:
Day 1 - 31:36
Thank you so much for this tutorial.
This video is from his Udemy Paid Course: "Complete Machine Learning,NLP Bootcamp MLOPS & Deployment" - Section 48
I find it old; I can't perform the steps he is doing.
Outstanding series
Goddamn, that is an incredible speech.
Great insight. What is the name of the digital board you're using to capture your illustrated drawings?
That Arabic was "Kayfa Haalak" ("How are you?"), in case someone was interested in the pronunciation.
Awesome! Thanks Krish sir!
Completed! What is the next step?
Great video, Krish. Somehow n-grams is skipped; can you please add it?
First to comment! The best Data Science, AI, and Machine Learning teacher of all time. Straight outta Kenya ❤
Great explanation. Could you please provide links for the subsequent parts of the pyramid (RNN, Transformers, and BERT)?
This video was really needed
Completed this course end to end, and to be frank it is super amazing; I wasted a lot of money going for trainings. My only request is that I could not find things like how to train a model to recognize our own named entities, how we can use NLP to turn unstructured data into structured data, or how to build something similar to Word2Vec from scratch with our own corpus. Some real-time examples would also be of great help.
I know you are already doing a lot for free, but if you can help with the above requests it would be of great help. Please see if you can do this; I appreciate a lot what you are already doing for free. I have not seen anyone explaining in this much detail and in such simple ways...
Is there an extension to this playlist leading to an introduction to LLMs and how to train generative models?
The n-grams topic is missing. Could you kindly check the video again?
Hello Krish,
Are you starting any new batch for data science? Please let me know.
If possible, please share any link regarding that!
Thank you
Can I do this without doing the ML/AI basic concepts first?
Awesome session
Really nice explanation, brother. Please, can you share your notes?
Hello Krish,
how do we handle the curse of dimensionality in a huge corpus?
Amazing videos. Curious to know which app/tool you use for creating notes
Thank you so much sir
Okay, for lemmatization, how would we find whether the POS is a noun, verb, adjective, or anything else, for a big corpus? Because we can't explicitly check all the types, right?
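One common answer to this question: run a POS tagger first and map its tags to the lemmatizer's categories, instead of checking word types by hand. The mapping function below is a standard sketch; the commented usage is a hypothetical example that assumes nltk and its data packages (punkt, wordnet, the tagger) are installed.

```python
def treebank_to_wordnet(tag):
    # Map Penn Treebank tags (what a POS tagger like nltk.pos_tag emits)
    # to WordNet POS letters understood by WordNetLemmatizer.
    if tag.startswith('J'):
        return 'a'   # adjective
    if tag.startswith('V'):
        return 'v'   # verb
    if tag.startswith('R'):
        return 'r'   # adverb
    return 'n'       # noun is the safe default

# Hypothetical usage over a corpus (assumes nltk + its data are available):
#   from nltk import pos_tag, word_tokenize
#   from nltk.stem import WordNetLemmatizer
#   lem = WordNetLemmatizer()
#   lemmas = [lem.lemmatize(w, treebank_to_wordnet(t))
#             for w, t in pos_tag(word_tokenize(text))]

print(treebank_to_wordnet('VBD'))   # 'v': lemmatize "ate" as a verb
print(treebank_to_wordnet('NNS'))   # 'n': lemmatize "apples" as a noun
```

So the tagger, not the human, decides the POS for each word, which scales to any corpus size.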
Can anybody please tell me how I can enable extension support like code completion in JupyterLab? I have searched Stack Overflow but all efforts have been in vain.
How is part-of-speech tagging going to work for ungrammatical sentences? Some words' part of speech may depend on context and semantics in the sentence as well, right?
Thankuuu sir❤
What are the software tools and gadgets you use to present, writing with a pen?
Can you please provide notes for this video?
The n-grams tutorial video is missing.
Petition to upload the deep learning part of NLP ASAP; I have a college exam next month.
There's one small mistake: tokenization does not convert a sentence into paragraphs, but rather into tokens.
I have a doubt: why do they use only cosine similarity in Word2Vec? Why not sine?
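A quick numeric answer to the sine question: cosine of the angle between two word vectors is 1 when they point the same way and falls toward 0 as they diverge, so it directly ranks similarity; sine would be 0 for identical vectors, the opposite of what we want. A minimal NumPy check (the vectors here are made-up stand-ins for word embeddings):

```python
import numpy as np

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 2.0, 3.0])
b = np.array([-3.0, 0.5, 1.0])
print(cosine_similarity(a, a))       # ~1.0: identical direction, maximally similar
print(cosine_similarity(a, 2 * a))   # ~1.0: cosine ignores magnitude
print(cosine_similarity(a, b))       # somewhere in [-1, 1] for unrelated vectors
```

Cosine also ignores vector length, which matters because embedding magnitudes vary with word frequency while direction carries the meaning.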
Thanks for this video. N-grams is missing from it. Could you please upload that separately and share the link?
In NLP, do we use standardization or normalization for text data?
What tools were used in preparing the video?
Hello sir, can you help me with my final-year project? I'm trying to build a website where users upload resumes in PDF format and an admin classifies all the resumes into predefined categories and ranks them against job descriptions using cosine similarity, and we were told we can't use external libraries. Can you help me out?
Sir, can you tell me how I can use the Azure OpenAI API for LLMs in Python?
Where are Transformers and BERT?
Where can I find the next part? Like a practical implementation of Word2Vec with model training from scratch using Gensim or GloVe, and also practical implementations of TF-IDF and BoW. Please share those videos as well.
Please make the same kind of one-shot video on Deep Learning ❤
It's already there.
Sir, will you also cover deep learning models in NLP?
Hey, nice, but this is an old video, sir. I am one of your very old subscribers, so I remember you uploaded this video last year. At the 16:26 mark, please check the time at the bottom right of your screen when you open Gmail: 17/10/2022. Sir Krish Naik, I really respect you because I learn a lot from you; you are one of my best online professors. So please don't mind my comment; if you do, I'm sorry, and please accept my apologies. My main request is that if you upload a new video on NLP, it would be very helpful for us.
Please do include the Transformers and BERT-related parts too.
So this is a combination of your old videos, right?
No, these are newly recorded videos.
@@krishnaik06 Thank you; I was about to skip it, thinking these were the previous videos.
Most awaited... ♥
Sir, the n-grams topic is skipped. Please discuss it again or fix this video.
Can anyone comment on this? Will I be able to crack an NLP interview after watching this video? Please comment if you have watched it till the end.
Bro, this video goes from basic to intermediate. Now go check his live NLP series from day 6 onwards till the end to crack NLP interviews.
Give some practical use cases.
Please share the link to this notebook.
We want more content