Sir, what content you are making! Hats off to you, man. Teaching in a very simple manner. Appreciated.
Very kind of you! Glad that you are finding it useful.
Bro, your video really saved me. I really needed to build a basic sentiment model for my presentation. Your video really helped me because I have no AI/ML experience. Thanks, great video.
Glad that it helped you Abdul!
Good insights about business and data science. Good job!!
Looking forward to more videos, cheers!
Thank you! Certainly committed to bringing more ML projects. :)
Great video!!! Very helpful for data science insights and applications... Keep posting more videos :)
Thanks for the support!
Thank you, Sir, you solved my problem.
I'm inspired by you. I'm pretty sure you will be even more successful in the coming days. Keep posting videos about data science, like ML and DL. From Pakistan.
Thanks for your lovely wishes..🙂
Thank you so much! Please add more videos on sentiment analysis with different algorithms...
For sure buddy. Thanks for your feedback. Do like, share & subscribe to us. This helps us in reaching out to more learners like yourself..😊
Thank you.. looking forward to more videos ❤️
Hi Achol, for your information, we have recently launched a couple of new projects: one on Age, Gender & Emotion Detection (th-cam.com/video/uovo1s1barU/w-d-xo.html) and a second on Credit Scoring (th-cam.com/video/8jzvzRo3Ij0/w-d-xo.html&t)..😊 Keep learning!!
Hats off, sir, please keep it up. You are doing great.
Thank you so much for this video!
Thanks a ton, Kamil, for the feedback!! Do recommend us in your esteemed social/professional circles as well, so they may benefit too :-)
Thanks for all your good work
Thanks for the appreciation Lancelot. Means a lot!! 🙂
Thank you, sir, for this detailed, informative video. All the explanations are very helpful to us. Thanks a lot.
You are most welcome
this video is great, love it
Thanks a ton Samson for the feedback!! Keeps us motivated for sure
Very good class. Thank you very much.
This was helpful. Very well explained.
Thank you very much!
Nice explanation of how to use Naive Bayes for sentiment analysis 👍. Please try to share the use of different machine learning algorithms with better accuracy/performance...
Sure. We will. Thanks for the support!
Hi there, why are we using Gaussian Naive Bayes here? In general, Multinomial Naive Bayes is used for text classification, right?
Hey guys!! Glad to see such amazing feedback on this ML Project🤗 Need your support in reaching out to more learners by subscribing to my channel 🙂 Also, join me on my Skillcate Discord Server: discord.gg/GyMBfD4ER5. Let's talk Machine Learning ❤❤
You are the best bro thanks ❤️
Glad that you liked it! Thanks for the support!
Crisp explanation, loved it
Great video! I wanted to know if we can use other classification algorithms such as KNN/XGBoost/decision tree/random forest instead of Naive Bayes. I mean, if I keep the code unchanged up to the data cleaning part and then change the code for a different classifier algorithm, will it work? I am not very concerned about accuracy here, I just want to know if it will work or not... and if it does, how much do I need to change in the code if I use the other above-mentioned algorithms? Please help.
Yes, of course you may use other classifiers. You will just need to replace the Naive Bayes function with your desired classifier; a quick sketch is below.
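For anyone following along, here is a minimal sketch of that swap, using a random forest in place of Naive Bayes (the toy corpus and variable names are purely illustrative, not from the video):
# Sketch: keep the BoW features, change only the classifier.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

corpus = ["food was great", "service was awful", "loved the ambience", "never coming back"]
labels = [1, 0, 1, 0]

cv = CountVectorizer(max_features=1420)
X = cv.fit_transform(corpus).toarray()
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.25, random_state=0)

# Only this line changes versus the Naive Bayes version:
classifier = RandomForestClassifier(n_estimators=100, random_state=0)
classifier.fit(X_train, y_train)
print(classifier.predict(X_test))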
@@skillcate yes it worked...thank you...keep up the good work ❤️
@@akifzaman9269 Wohoo!!! Do like, share & subscribe to us. This helps us in reaching out to more learners like yourself..😊
Thank you so much:)
Hi Banka, for your information we have recently launched couple of new project, one on: Age, Gender & Emotion Detection (th-cam.com/video/uovo1s1barU/w-d-xo.html) and second on: Credit Scoring (th-cam.com/video/8jzvzRo3Ij0/w-d-xo.html&t)..😊
Very well explained! Thank you so much for this video! Can you please explain how you reached the 1420 value in data transformation?
Thanks Ishwari for your feedback. We love to hear from our learners :)
We reached 1420 by hit-and-trial in this case. However, we could also have written a small parameter-tuning loop; we skipped that here to keep things simple. A rough sketch of such a loop is below. Hope it helps! :-)
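For the curious, here is roughly what such a tuning loop could look like (the tiny corpus below is just a stand-in for the cleaned reviews and labels from the notebook):
# Sketch: pick max_features by comparing cross-validated accuracy
# for a few candidate values, instead of pure hit-and-trial.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

corpus = ["food was great", "service was awful", "loved the ambience",
          "never coming back", "tasty and cheap", "rude staff"]
labels = [1, 0, 1, 0, 1, 0]

for max_features in (500, 1000, 1420, 2000):
    X = CountVectorizer(max_features=max_features).fit_transform(corpus).toarray()
    scores = cross_val_score(GaussianNB(), X, labels, cv=3)
    print(max_features, round(scores.mean(), 3))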
My college gave me a project on sentiment analysis of restaurant reviews. Can you help me with this?
Dear Paidimarri, thank you for your comment & apologies for the delay in response. You can reach out to us at skillcate@gmail.com with the details on your requirements. Happy to help :-)
Can you please explain a healthcare review dataset with review text and rating columns for an ML project?
Great video, but I'm getting an error in model fitting. Sir, can you please help?
Hey Prajakta, what is the error message you are getting?
@@skillcate Sir, I'm getting an error in model fitting, for classifier.fit(features, label). The error is: not supported between instances of 'str' and 'float'.
I divided the dataset into train and test and used the train dataset as the parameter in classifier.fit().
@@prajacta_m You are possibly using a different dataset with mixed str and float values. That comparison error usually comes from the label column containing both strings and numbers; please try converting your labels to a single consistent type (e.g. int) before calling fit, and make sure the features you pass are the numeric bag-of-words matrix rather than raw text.
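Here is a small illustration of that label check (the DataFrame and the 'Liked' column are hypothetical stand-ins; adjust to your own dataset):
# Sketch: the "'<' not supported between instances of 'str' and 'float'" error
# typically points at a label column with mixed types; coerce it to one numeric type.
import pandas as pd

df = pd.DataFrame({"Review": ["great food", "bad service", "ok place"],
                   "Liked": ["1", 0.0, 1]})          # mixed str/float labels

labels = pd.to_numeric(df["Liked"], errors="coerce")  # strings like "1" become 1.0
df = df[labels.notna()]                               # drop rows whose label could not be parsed
labels = labels[labels.notna()].astype(int)
print(labels.tolist())                                # [1, 0, 1]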
How did you calculate the aggregated customer feedback at the end for the fresh reviews?
Hey Omkar, in the Google Drive projects folder, you will see a Sentiment_Predictor code file. We generated sentiment predictions with this code file on our Reviews_FreshDump (also there in the project folder), and then put a pivot on the predictions to compute the positive/negative split.
Towards the end, I also mentioned 'Immediate Business Interventions'. Those I figured out manually by going through the reviews.
Project folder here: drive.google.com/drive/folders/1KgvHQAaYwprkQJTTOf3TH8OJ_e3SbSGI
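For reference, the predictor-plus-pivot flow is roughly like this (file names below are assumptions; check the project folder for the exact ones):
# Sketch: load the saved BoW vectorizer and classifier, score the fresh reviews,
# then aggregate the predictions into a positive/negative share.
import pickle
import pandas as pd

cv = pickle.load(open("bow_vectorizer.pkl", "rb"))                # assumed file name
classifier = pickle.load(open("sentiment_classifier.pkl", "rb"))  # assumed file name

fresh = pd.read_csv("Reviews_FreshDump.tsv", delimiter="\t")      # assumed .tsv with a 'Review' column
X_fresh = cv.transform(fresh["Review"]).toarray()
fresh["predicted_sentiment"] = classifier.predict(X_fresh)

# The "pivot": share of positive vs negative predictions.
print(fresh["predicted_sentiment"].value_counts(normalize=True))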
How do we get the 72% accuracy value in the 2.1 file mathematically? Can you please explain?
Dear Nagaswapna, the 2x2 matrix you see at 18:45 is called a confusion matrix. For computing accuracy, we use this mathematical formula: Accuracy = (True Positive + True Negative) / (True Positive + True Negative + False Positive + False Negative). So, essentially, in our case, Accuracy = (67+64)/(67+64+38+11) = ~72.7%.
You may read more on Confusion Matrix here: en.wikipedia.org/wiki/Confusion_matrix
Hope this helps!! 😊
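For a quick sanity check of the arithmetic (just the numbers quoted above):
# Sketch: accuracy from the four confusion-matrix cells.
correct = 67 + 64             # the diagonal: correctly classified reviews
total = 67 + 64 + 38 + 11     # all four cells, i.e. the whole test set
print(correct / total)        # 0.7277..., i.e. roughly 72.7%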
Can you list the topics I need to learn to ace this project? Please, I have to prepare a model on this in a week.
Thanks for your interest, Rishabh. Sentiment analysis is a hot research area these days. In general, you would want to learn about feature encodings and representations, and classifiers: Naive Bayes, decision trees, RNNs, LSTMs, etc. For each use case, be clear about why you would choose one strategy over another.
What are the front end and back end of this project?
Thank you, sir.
Hi Hashir, for your information, we have recently launched a couple of new projects: one on Age, Gender & Emotion Detection (th-cam.com/video/uovo1s1barU/w-d-xo.html) and a second on Credit Scoring (th-cam.com/video/8jzvzRo3Ij0/w-d-xo.html&t)..😊
Do like, share & subscribe to us. This helps us in reaching out to more learners like yourself!!!
@@skillcate Can you please make a sentiment analysis with 3 sentiments (positive, neutral and negative)?
@@hashirsheikh8386 Sure. We will soon be doing a separate project on it. 🙂
@@skillcate Thank you sir, I'm waiting for it 🙂
The video is straightforward, but I'm getting errors everywhere and I don't know how to fix them.
Sir, I am getting an error that x is not defined. Can you please say why it is showing an error like that? Please, it's my major project and there's 1 day left for submission. Please help me, sir.
Dear Kavya, first of all apologies for this delay in our response here..
Ideally, this shouldn't have happened. Did you double-check if you are using upper-case? We are using the capital letter "X" here.
Hope this helps. Keep learning :)
@@skillcate okay thanks for your reply
Please, I need a proposal for a sentiment analysis system to submit this coming Monday.
Dear Mohammed, thank you for your comment & apologies for the delay in response. You can reach out to us at skillcate@gmail.com with the details on your requirements. Happy to help :-)
The corpus part of the post-cleaning code is not working for me. I'm running your code in Colab and it keeps saying 'Review' is not defined!! I have the Drive path set, but the error keeps occurring.
Hey Pranita, could you please write to me at skillcate@gmail.com along with your error message. This sounds weird though..
How can I make a frontend for this output?
Sir, can you please tell me how you create the link between the BoW and the classifier in your sentiment model? I don't get that part. Where do we get that link, and how and why do we create it?
Hi Maitri. During training, we obtain the bag of words and the classifier (refer to the Model Building section in the video), which we then save so that we can use both of them in the prediction script to make inferences on unseen test cases. A rough sketch of that save-then-reload flow is below.
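A minimal sketch with a toy corpus (the actual file names in the project toolkit may differ):
# Sketch: persist the fitted BoW vectorizer and the classifier at training time,
# then reload both in the prediction script.
import pickle
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import GaussianNB

corpus = ["food was great", "service was awful"]   # stand-in for the cleaned reviews
labels = [1, 0]

cv = CountVectorizer()
X = cv.fit_transform(corpus).toarray()
classifier = GaussianNB().fit(X, labels)

pickle.dump(cv, open("bow_vectorizer.pkl", "wb"))                # the saved "link" to the BoW
pickle.dump(classifier, open("sentiment_classifier.pkl", "wb"))

# Later, inside the prediction script:
cv = pickle.load(open("bow_vectorizer.pkl", "rb"))
classifier = pickle.load(open("sentiment_classifier.pkl", "rb"))
print(classifier.predict(cv.transform(["loved it"]).toarray()))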
How can I link it with a Node.js backend?
Dear Emran, that's an excellent query!!
We have recently done an ML project where we deployed a serialized model .pkl file using Flask. I suggest you go through it to get further clarity on the deployment side.
Hope this helps. Keep learning :-)
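To give a flavour of that deployment, here is a minimal Flask sketch (file names are assumptions; the deployment video has the full version):
# Sketch: serve the pickled vectorizer + classifier behind a tiny Flask endpoint.
import pickle
from flask import Flask, request, jsonify

app = Flask(__name__)
cv = pickle.load(open("bow_vectorizer.pkl", "rb"))                # assumed file name
classifier = pickle.load(open("sentiment_classifier.pkl", "rb"))  # assumed file name

@app.route("/predict", methods=["POST"])
def predict():
    review = request.get_json()["review"]                # e.g. {"review": "food was great"}
    features = cv.transform([review]).toarray()
    label = int(classifier.predict(features)[0])
    return jsonify({"sentiment": "positive" if label == 1 else "negative"})

if __name__ == "__main__":
    app.run(port=5000)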
Great, thank you.
It is showing me an error that says: NameError: name 'X' is not defined, in the "Dividing dataset into training and test set" part. Can you please tell me what I did wrong?
Ahh. You most likely missed reading the data into X (as shown in cell 12) or used a lowercase x in place of the uppercase X. Could you please verify?
Can I use sentiment analysis as my major project in M.Tech?
Sure. Please reach out to us at skillcate@gmail.com.
Can you all please release a video on sentiment as well as emotion analysis,
and also an end-to-end project on univariate and multivariate time series? Thanks.
Hi Lancelot! For your information, we have done a video on Emotion Detection recently (watch here: th-cam.com/video/uovo1s1barU/w-d-xo.html ).
Do like, share & subscribe to us. This helps us in reaching out to more learners like yourself..😊
Can we run this on Jupyter notebook?
Hi Aaisha, I assume by Jupyter notebook you mean a Jupyter notebook on your local machine. Then the answer is yes. If your machine doesn't have a GPU, it may just take longer to train.
Sir, when I am trying to import the dataset from Drive, it shows a 'File not found' error, whereas the dataset file is present in my Drive. Please help me.
Hi Mohit, could you please ensure that your Drive is mounted and the path is set to the directory where your Colab notebook file is located?
@@skillcate Yeah, the drive is mounted but it is still showing the error.
@@mohitmayank5524 Alright, no problem. There is surely some catch here. 1. Could you please check that the file name is correct? 2. If it is, run the command "!ls" in your notebook and see whether the dataset file shows up there (see the short Colab snippet below).
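Something like this, in a Colab cell (the folder path is just an example; point it at wherever your dataset lives):
# Sketch: mount Drive and confirm the dataset file is actually visible.
from google.colab import drive
drive.mount('/content/drive')

import os
folder = '/content/drive/MyDrive/Sentiment_Project'   # adjust to your own folder
print(os.listdir(folder))                             # the dataset file should appear here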
I would rather have used a lemmatizer instead of stemming. It helps keep the meaning of the word from being altered.
That's correct! Lemmatization certainly retains the meaning of tokens. However, it is computationally more expensive and may not always significantly improve accuracy, so there's always a trade-off here. Appreciate your input on this..:-)
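If anyone wants to try that swap, here is a minimal sketch using NLTK's WordNetLemmatizer in place of the PorterStemmer, keeping 'not' out of the stopword list so negations survive (the sample sentence is illustrative):
# Sketch: the same cleaning idea, with lemmatization instead of stemming.
import re
import nltk
from nltk.stem import WordNetLemmatizer
from nltk.corpus import stopwords

nltk.download('wordnet')
nltk.download('stopwords')

lemmatizer = WordNetLemmatizer()
all_stopwords = set(stopwords.words('english')) - {'not'}

review = "The dishes were not amazing and the waiters were rude"
review = re.sub('[^a-zA-Z]', ' ', review).lower().split()
review = [lemmatizer.lemmatize(word) for word in review if word not in all_stopwords]
print(' '.join(review))   # "dish not amazing waiter rude"
Unlike stemming, the lemmatizer maps 'dishes' to 'dish' and 'waiters' to 'waiter' without chopping words into non-words.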
Hi, that's a great lecture.
I have a query: while building my own sentiment model, I'm having an issue with how to deal with a sentence such as "cell resource not failed and cell got connected". This is somewhat positive, but I get a negative sentiment because it contains some negative terms.
Is there a better way to do this?
In such cases, the model needs to be provided with more contextual information. To begin with, you can explore different-length word n-grams for the BoW representation discussed in the video (see the sketch below). Moving toward more complex approaches, you can explore RNNs and LSTMs.
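As a starting point, the BoW step only needs a small change to include bigrams (illustrative sentences below):
# Sketch: unigrams + bigrams, so that a pattern like "not failed" becomes its own feature.
from sklearn.feature_extraction.text import CountVectorizer

corpus = ["cell resource not failed and cell got connected",
          "cell resource failed and connection dropped"]

cv = CountVectorizer(ngram_range=(1, 2))   # unigrams and bigrams
X = cv.fit_transform(corpus)
print(cv.get_feature_names_out())          # includes 'not failed' as a feature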
@@skillcate Thanks, but for sentiment it doesn't require any labelled data, whereas an LSTM requires the data to be labelled, and that is what I don't have!
Please suggest something.
It is not predicting negative comments; after entering "bad" it still shows a "positive" sentiment.
Is it possible to create sentiment analysis in PHP code?
Check out this documentation: php-ml.readthedocs.io/en/latest/
Hi. I tried to put in a comment yesterday but it looks like it didn't post. Let me try again.
I love this video. I like the way it is simple and easy to follow. I loved the pickle file concept; I wasn't aware of that!
Here is the question/issue I am having.
When I run the exact code for this notebook in a Jupyter notebook, I get a different accuracy. I get around 0.433. All the code and data are exactly the same.
What could I be doing wrong?
Thanks!
Hey!! This may sound stupid, but can you try redoing the entire flow again? Ideally, nothing should change, if you followed the exact same steps offline in Jupyter notebook.
Also, double check these:
1. if you have updated all paths to your local working directory.
2. if you see 900 rows via shape function (4th code cell)
@@skillcate Thanks. I redid it and got the same accuracy as yours. Now I need to find what I did wrong earlier. I will try to break it again and post what I messed up. I want to say I sorted the list and that's what gave me less accuracy. But I will try and let you know. Thanks again!
Awesome!! :-)
How did you calculate the value of max_features???
max_features is a "hyperparameter" which represents the dimensionality of the space in which each review is represented. There are many approaches available for selecting the feature dimension. In our video, to keep things simple, we selected max_features through hit-and-trial (17:00). If you want to learn more about selecting the feature dimension in general, please refer to machinelearningmastery.com/dimensionality-reduction-for-machine-learning/.
Hi, I replicated this code for a different analysis, using training data relevant to my own use case. At the end I got a really low model accuracy, as low as 37%. What can I do to get higher accuracy?
Hey Jerry, Good to hear you're implementing it on your side. Here's what you should do next:
- Check your dataset: Make sure your dataset is labeled correctly and has enough examples for each sentiment category. Having a balanced dataset with enough representative samples can improve the model's performance.
- Data preprocessing: Clean your text data by removing unnecessary symbols, punctuation, or special characters. Additionally, removing common words that don't carry much sentiment (known as stop words) can help focus on more meaningful words.
- Feature selection: Instead of using only the Bag of Words representation, try other techniques like TF-IDF or word embeddings (e.g., Word2Vec, GloVe). These methods capture more contextual information and can enhance the model's understanding of the sentiment.
- Model selection: Experiment with different algorithms besides Naive Bayes, such as Support Vector Machines (SVM) or Random Forest. Each algorithm has its own strengths and may perform better on different types of data.
- Hyperparameter tuning: Adjust the settings of your model, known as hyperparameters, to find the best combination for your dataset. Use techniques like cross-validation or grid search to systematically explore different parameter values and select the ones that give the best results (a rough sketch of this follows after this list).
- More training data: If possible, gather more labeled data for training your model. Having a larger dataset can provide more diverse examples and improve the model's ability to generalize.
- Learn from mistakes: Analyze the errors made by your model. Look for patterns or common mistakes it makes. This analysis can help you identify areas for improvement and guide your next steps.
Remember, achieving high accuracy can be challenging, especially for beginners. It's important to keep learning, experimenting, and iterating on your approach. Don't get discouraged by initial low accuracy, as it's a common part of the learning process.
Do write me back for further specific questions you may have. We may have a 1:1 call too if it's required.
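Putting a couple of those points together, here is a hedged sketch of TF-IDF features plus a small grid search (toy data; swap in your own labelled reviews):
# Sketch: TF-IDF + Multinomial Naive Bayes, tuned with a small grid search.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV

texts = ["great food", "terrible service", "loved it", "never again",
         "really tasty", "rude staff", "would recommend", "awful experience"]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

pipe = Pipeline([("tfidf", TfidfVectorizer()), ("nb", MultinomialNB())])
grid = GridSearchCV(pipe,
                    param_grid={"tfidf__ngram_range": [(1, 1), (1, 2)],
                                "nb__alpha": [0.5, 1.0]},
                    cv=2)
grid.fit(texts, labels)
print(grid.best_params_, grid.best_score_)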
@@skillcate Thank you so much for your response. I have taken some classes since I posted this comment, and all you have said aligns with what I learnt. Basically, I realised my training data was highly imbalanced.
I can't run the part where the BoW file gets saved.
Hey Faridz, double-check that you have set your working directory correctly.
Sir, I need your help.
Please reach out to us at skillcate[AT]gmail[DOT]com.
Thank you very much, great explanation. Can you help? My Master's thesis is entitled "Sentiment Analysis using Word Sense Disambiguation".
Hey Zahraa! Please reach out to us at skillcate@gmail.com :)
@@skillcate The email address is not working😭
What about neutral reviews? 😁
Well, seriously, who are these people writing neutral reviews :p. Jokes apart, I recently did sentiment analysis on live tweets; that project has three labels: + - ~ (link: th-cam.com/video/YdRTs0LmiuU/w-d-xo.html).
I have done few more Sentiment Analysis Projects: one, on Movie Reviews with Neural Network (th-cam.com/video/oWo9SNcyxlI/w-d-xo.html) and second, on Amazon Reviews with Sklearn Pipeline (th-cam.com/video/lKAdxN0qrgk/w-d-xo.html) - where you shall get multi-label classification flavour.
This current project is basically binary classification. However, you may define 3 labels (+ - ~) by using a training dataset that has multiple labels (e.g. a rating on a scale of 1-5).
Happy learning :)
@@skillcate Thanks a lot :)
@@skillcate Hey! In the live Twitter sentiment analysis project, would it work for other languages also? Like "Sinhala", which is the mother language in Sri Lanka 😂
@@rusiraliyanage6643 Yes, sure you can. For this, you may use the TextBlob translate method to first translate the 'Sinhala' tweets to English, and then feed them to NLTK VADER (VADER is not multilingual; it handles only English for now). Sharing the script here to translate:
from textblob import TextBlob as TB
tweet = 'ඉන්දියාවේ සිට ආයුබෝවන්'
translated_tweet = TB(tweet).translate(from_lang=u'si', to=u'en')
translated_tweet.sentiment
And with this script, you may call Vader on the translated tweet:
SentimentIntensityAnalyzer().polarity_scores(str(translated_tweet))
I have a small project; can anyone help, please?
Hi Siddu, for sure we can! You can write to us at skillcate@gmail.com with the details on your requirements. Happy to help :-)
Anyone who has executed this: what was the path for the bag of words?
Where is the link to the dataset?
Google Drive Link is provided in the description part, Edwin!!
Hello sir, can you help me please!
In the very last part, I wanted to know how many customers complained about a particular issue, for example "restaurant staff being rude", i.e. how many related reviews there were about this... Is that possible?? @SKILLCATE
Yes, Abdul. You may do so by looking up the keywords "staff" and "rude".
You may also use advanced text analytics techniques, like topic modelling, to learn the themes being discussed in the customer reviews.
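For the simple keyword route, a quick way to get that count with pandas (the file name is an assumption; the 'Review' column matches the dataset used in the project):
# Sketch: count reviews that mention both "staff" and "rude" (case-insensitive).
import pandas as pd

df = pd.read_csv("Restaurant_Reviews.tsv", delimiter="\t")   # assumed file name
mask = (df["Review"].str.contains("staff", case=False)
        & df["Review"].str.contains("rude", case=False))
print(mask.sum(), "reviews mention both 'staff' and 'rude'")
print(df.loc[mask, "Review"].head())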
I am getting an error saying "Input contains NaN" after running:
'from sklearn.naive_bayes import GaussianNB
classifier=GaussianNB()
classifier.fit(X_train,y_train)'
My data does not contain any NaN values though; can you please help?
Dear Viraj, sorry to hear that! Can you confirm whether you are running this project in Google Colab or on your local machine?
Also, are you using the same dataset from the toolkit or a different one?
We will get your issue resolved as early as possible..
for i in range (0,899):
review = re.sub('[^a-zA-Z]',' ', data['Review'][i])
review = review.lower()
review = review.split()
review = [ps.stem(word) for word in review if not word in set (all_stopwords)]
review = ' '.join(review)
corpus.append(review)
When I run the above code in a Jupyter notebook, it shows a KeyError.
Dear Harsh, a KeyError here usually means the DataFrame or column name doesn't match; also note the dataset has 900 rows, so the loop should be range(0, 900). Try using this snippet instead (it assumes your DataFrame is named dataset and has a 'Review' column):
for i in range(0, 900):
review = re.sub('[^a-zA-Z]', ' ', dataset['Review'][i])
review = review.lower()
review = review.split()
review = [ps.stem(word) for word in review if not word in set(all_stopwords)]
review = ' '.join(review)
corpus.append(review)