Related Article: www.geeksforgeeks.org/twitter-sentiment-analysis-using-python/
Siddharthan sir, congratulations on joining GeeksforGeeks.
Great content from Siddhardhan.
you got it bro!
Thank you so much, GFG. I even included neutral tweets in this sentiment analysis. I will try this on other social media datasets.
That's a great explanation, sir. Hoping for much more content like this from GFG and from you too, sir.
Very clear explanation.
Please make a video on career recommendation after secondary school using ML.
Nice explanation, completed the project👍
Hi brother. Can you please provide me the code? I will pay whatever amount you ask. 🙏
Have you run this project??
@@1anu_ra-dha. Yes
Have you run the project?
How can we use our own tweet or comment to test the model? Right now we are just testing on a tweet picked from the test set by its index. Tell me if I can actually write my own sentence and check whether it's negative or positive. 😢
I have the same doubt. Please let me know as well if you have found the answer.
@@ymmich2143 I also have the same doubt.
Convert your tweet into a vector using the same process shown in the video, then follow the same prediction pattern.
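For anyone wondering, here's a minimal sketch of what that could look like. I'm assuming the stemming function from the tutorial and variables named vectorizer and model (those names are my assumption, not necessarily exactly what's in the video):

my_tweet = "I really enjoyed this movie, it was fantastic!"

processed = stemming(my_tweet)                 # same preprocessing as the training data
features = vectorizer.transform([processed])   # transform, not fit_transform
prediction = model.predict(features)

# in the tutorial, 0 = negative and 1 = positive
print("Positive" if prediction[0] == 1 else "Negative")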
Beautifully explained!!! Thanks a lot sir 🙏🏻🙏🏻🙏🏻
PS: How can we use it to make a project that takes input from the user?
Loved this explanation.
Thanks for the video
Nice explanation
Can anyone help? While performing the stemming operation, it's taking a lot of time. I know he mentioned in the video that it takes 50+ minutes, but why is my CPU utilization only 10%? I mean, why isn't VS Code using the CPU at 100% and executing the code faster?
In jupyter notebook I waited for almost 3 hours
I just completed this project. By the way, how did you not get null values in your stemmed_content? I was unable to fit my vectorizer; it turned out stemmed_content had 495 null values. I removed them and it worked fine afterwards.
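For anyone hitting the same issue, a minimal pandas sketch (assuming the twitter_data DataFrame from the tutorial):

print(twitter_data['stemmed_content'].isnull().sum())           # how many null rows
twitter_data = twitter_data.dropna(subset=['stemmed_content'])  # drop them before vectorizing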
Great explanation, sir.
Great Explanation
Great explanation.
thanks a lot
You mentioned overfitting, and we can clearly see the model is overfitted. Can we improve the accuracy and make the model a little less overfit?
But what if I want to test on new data?
Hi, can anyone help me? I got an error at this line of code: twitter data['stemmed_content'] = Twitter_data['text'].apply(stemming)
Did you solve it?
It should be twitter_data, not Twitter_data (no capital T).
@@raunakkakkar1231 Hi, can you help me too? I had an error at the same line, twitter_data['stemmed_content'] = twitter_data['text'].apply(stemming), and the error was NameError: name 'twitter_data' is not defined.
@@suyashsawant8928 Because the twitter_data variable doesn't exist in your session; run the cell that loads the CSV into twitter_data first.
I implemented the above code and it took some time to stem. Then I implemented another version with lemmatization, and it was fast (even though, in general, lemmatization is more computationally expensive than stemming). When I checked this code again I found a potential improvement: each loop iteration calls stopwords.words('english'); instead, you can define it once in a variable and use that variable inside the loop. After making these changes, stemming with the above code took me 2 minutes.
Can you give the dataset link?
@@soukarya_ghosh6612 You can find it in the video itself at 3:30.
Can you please show how this part of the code looks after the changes you mentioned?
@@thegeeks2002 Before, it took around 1 hour I guess; after these changes it took 2 minutes. However, I ended up using lemmatization even though it also took 2 minutes. This tutorial was really helpful for me to get started with this use case and then explore more advanced concepts.
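Not the original commenter, but here is a minimal sketch of what a lemmatization-based version might look like, using NLTK's WordNetLemmatizer (the function and variable names are my own, and it assumes the twitter_data frame from the tutorial):

import re
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
# may require: nltk.download('wordnet') and nltk.download('stopwords')

lemmatizer = WordNetLemmatizer()
english_stopwords = set(stopwords.words('english'))  # defined once, outside the function

def lemmatize_content(content):
    words = re.sub('[^a-zA-Z]', ' ', content).lower().split()
    return ' '.join(lemmatizer.lemmatize(word) for word in words if word not in english_stopwords)

twitter_data['stemmed_content'] = twitter_data['text'].apply(lemmatize_content)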
Do you have the code with you right now?
Siddharthan Sir
Could you share the Colab link?
If we want to predict stock market movement from Twitter comments or messages, how can we do that?
The stemming step's regex removes the comma as well as the apostrophe, but we need to keep the apostrophe. So what should the regex be for that?
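One option (just a sketch, not from the video): add the apostrophe to the allowed character class, so everything outside it is still replaced with a space:

import re

text = "I can't believe it, that's great!"
cleaned = re.sub("[^a-zA-Z']", ' ', text)  # keep letters and apostrophes only
print(cleaned)  # I can't believe it  that's great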
Can you provide the link to the Colab, please?
Link to the notebook, please.
Hi sir, can we keep the customer_Id together with the text when splitting X? That way we can see which customers left negative reviews.
Sir, I have tried the same dataset with the same code, but the accuracy score is much different from yours; it's causing an overfitting condition. How can it be improved?
How much did you get, bro?
Tell me.
@@saurabhojha2832 55%
Same problem here. Did you solve it or not?
Can anyone help? My program shows 'X_test not defined' when running the pickle file in another window. What to do?
Check the line of code where you split the dataset into training and testing sets, where random_state = 2 was mentioned. Also check whether you actually executed that snippet; that could also be the error.
Very useful, but my code is not showing the stemmed_content, and it's taking only 2 to 3 minutes to run instead of 55 minutes. What can the error be, sir?
You didn't add a blank space between the quotation marks in the .join statement.
@@sizzrizz6074 that isn't the cause of the problem.
Can you send the PPT file that you showed at the beginning of the video?
Where can I find the complete code for this?
Which library is used as the tool?
Is there a way to convert the vectorized numbers back to strings?
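Partially. A minimal sketch, assuming the fitted TfidfVectorizer from the tutorial: inverse_transform gives back, for each row, the terms that have non-zero weights, but the original word order and any words dropped in preprocessing are lost:

terms = vectorizer.inverse_transform(X_test[0])  # first row of the test matrix
print(terms)  # e.g. a list of arrays of stemmed terms, unordered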
Just a suggestion: it would have been simpler if you used the methods directly from the libraries rather than storing each one in a separate variable. It makes the code look heavy, and also confusing and difficult to follow. Thanks for the tutorial.
How can we increase the accuracy of the model?
Try different algorithms; maybe some other algorithm works better and gives better accuracy. Also, if your dataset is not as vast as the one in the video, try gathering at least 1,000-10,000 samples and train again to check the accuracy.
@@shaikhahsan100 Do you have any other suggestions?
How do I add the kaggle.json path in a Jupyter notebook? Please reply.
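One common approach (a sketch, assuming the kaggle package, which looks for kaggle.json in ~/.kaggle by default):

import os, shutil

kaggle_dir = os.path.expanduser('~/.kaggle')
os.makedirs(kaggle_dir, exist_ok=True)
shutil.copy('kaggle.json', os.path.join(kaggle_dir, 'kaggle.json'))  # the file downloaded from your Kaggle account page
os.chmod(os.path.join(kaggle_dir, 'kaggle.json'), 0o600)            # keep the API key private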
thanks.
ho siddharthan sir 😁😁😁😁😁😁😁😁
Can anybody tell me where the Colab file link is?
Sir! If we don't have the target column in our Twitter dataset, what should we do?
Yes, we do. Check whether you have selected the Sentiment140 dataset on Kaggle.
At the time of fetching the dataset via the Kaggle API, I am getting a KeyError: content-length. Please help.
Sir, how do I get the Google Colab link for your code?
Sir, please provide a complete end-to-end neural style transfer project with a web application.
What are up-sampling and down-sampling?
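In this context they refer to balancing the classes: down-sampling randomly discards rows from the majority class, and up-sampling duplicates rows from the minority class, so the two end up balanced. A minimal pandas sketch (assuming a twitter_data frame with a 0/1 target column, as in the tutorial):

import pandas as pd

positive = twitter_data[twitter_data['target'] == 1]
negative = twitter_data[twitter_data['target'] == 0]

# down-sample the larger class to the size of the smaller one
n = min(len(positive), len(negative))
balanced = pd.concat([positive.sample(n=n, random_state=2),
                      negative.sample(n=n, random_state=2)])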
Siddharthan is here (as the instructor).
1:08:00
The output of value_counts is not the same as your output. Why is this?
"X has 179715 features, but LogisticRegression is expecting 460873 features as input." I am getting this error while executing the predict function on the test data.
How did you solve this error?
@@archana2467 Use fit_transform for the training data and transform for the testing data, instead of calling fit_transform on the testing data.
@@archana2467 I had the same mistake. random_state should be equal to 2, or you should check that you have written the X_test line correctly in the "converting the textual data to numerical data" section. I mean: X_test = vectorizer.transform(X_test)
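To make the fix above concrete, a minimal sketch (assuming X_train and X_test are the text series from the tutorial's train_test_split):

from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(X_train)  # learn the vocabulary from the training data only
X_test = vectorizer.transform(X_test)        # reuse that vocabulary, so the feature count matches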
Sir, there are other options to make this step faster. It took me only 5 minutes to perform stemming. Here's the code:
import pandas as pd
import re
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

# Initialize stopwords
stop_words = set(stopwords.words('english'))

def stemming(content):
    try:
        port_stem = PorterStemmer()  # Instantiate inside the function
        stemmed_content = re.sub('[^a-zA-Z]', ' ', content).lower()
        return ' '.join(port_stem.stem(word) for word in stemmed_content.split() if word not in stop_words)
    except Exception as e:
        print(f"Error processing content: {content}. Error: {e}")
        return ""  # Return an empty string on error

def process_data(df):
    with ThreadPoolExecutor() as executor:
        return list(tqdm(executor.map(stemming, df['text']), total=len(df)))

# Process the DataFrame in chunks
chunk_size = 50000  # Adjust based on your memory capacity
num_chunks = len(twitter_data) // chunk_size + 1
stemmed_contents = []

for i in tqdm(range(num_chunks)):
    start = i * chunk_size
    end = min((i + 1) * chunk_size, len(twitter_data))
    chunk = twitter_data.iloc[start:end]
    stemmed_chunk = process_data(chunk)
    stemmed_contents.extend(stemmed_chunk)

# Add the stemmed content back to the DataFrame
twitter_data['stemmed_content'] = stemmed_contents
What to do when null values are found in the dataset?
Where is the link to the Colab file?
Sir, how do I get this project's Colab sheet? You said the link is in the description; I searched but didn't find it.
Go and search for Google Colab on Google and then click on New Notebook; that way you'll get it.
The Twitter API is not free. What to do now?
Please share the Colab notebook link.
Is this Siddharth, the guy who has a Telegram channel named Machine Learning and also a YouTube channel? If so, great to see you, buddy ❤.
Can anyone tell me where we can find all the code for this project?
You can find everything in this article: www.geeksforgeeks.org/twitter-sentiment-analysis-using-python/
The way he saved the model was not enough; you have to set up a data pipeline for new data and also save the vectorizer.
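Agreed. A minimal sketch of that idea (the file names are hypothetical, and new text would still need the same stemming step before transform):

import pickle

# persist both artifacts after training
with open('vectorizer.pkl', 'wb') as f:
    pickle.dump(vectorizer, f)
with open('trained_model.pkl', 'wb') as f:
    pickle.dump(model, f)

# later, in another script
with open('vectorizer.pkl', 'rb') as f:
    vectorizer = pickle.load(f)
with open('trained_model.pkl', 'rb') as f:
    model = pickle.load(f)
prediction = model.predict(vectorizer.transform(["some new tweet text"]))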
51:27
The stemming process is taking too long to execute, due to which I'm not able to get to the modelling.
We need to optimize the stemming function: in the list comprehension we are calling stopwords.words('english') on every iteration; instead, create a variable outside the function and use that variable. I guess this will make the code faster than the previous version.
How to deploy this model on the web? Can anybody please help? 🙏
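Not a full answer, but here's a minimal Flask sketch (hypothetical file names, assuming the model and vectorizer were pickled as in the earlier comment about saving the vectorizer):

import pickle
from flask import Flask, request, jsonify

app = Flask(__name__)
vectorizer = pickle.load(open('vectorizer.pkl', 'rb'))
model = pickle.load(open('trained_model.pkl', 'rb'))

@app.route('/predict', methods=['POST'])
def predict():
    text = request.get_json()['text']                       # expects JSON like {"text": "..."}
    label = model.predict(vectorizer.transform([text]))[0]
    return jsonify({'sentiment': 'positive' if label == 1 else 'negative'})

if __name__ == '__main__':
    app.run()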
@cll1670 Thanks for the help.
What about creating an interface? This is just a suggestion; if you have any other ideas, please share them.
Here from Seasons of AI 2024👋🏼
Hello, I am making a web app on this using Flask, but I am facing this error: tweepy.errors.Unauthorized: 401 Unauthorized, 89 - Invalid or expired token. I think it's because I don't have a Basic-tier API account. Can you help me solve this error? Please reply.
Sir, how do I get the Colab link?
Where is the code??
28:14
Can you please give me all the source code of this project?
The code at 22:25 shows an error. Can anyone help?
Can you elaborate more?
Can you provide the source code, please?
What if the true value you got is 0, but the model prediction is 1?
Sir, why are we not performing lemmatization?
It depends on the requirement; lemmatization is somewhat slow compared to stemming.
I didn't know 1.6 million was actually 16 million.
Can you share the code?
My training data accuracy is 99% and test data accuracy is 50%; my model is overfitted. How do I avoid this?
L1 and L2 regularization are techniques used to prevent overfitting by adding a penalty term to the loss function during training.
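A minimal sketch of tuning that in scikit-learn (assuming the tutorial's X_train/X_test split; LogisticRegression applies an L2 penalty by default, and a smaller C means a stronger penalty):

from sklearn.linear_model import LogisticRegression

model = LogisticRegression(max_iter=1000, C=0.1)  # try C = 1.0, 0.1, 0.01 and compare
model.fit(X_train, Y_train)
print(model.score(X_train, Y_train))  # training accuracy
print(model.score(X_test, Y_test))    # test accuracy; watch the gap between the two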
I also face the same problem
Where is the code?
Can you please provide the source code?
A multinomial Naive Bayes model gives higher accuracy.
50:22
Can anyone tell me, is the Twitter API free?
No
It's ML not YemYel
😄
What the point you considered 😅
😂😂😂
Bro, concentrate on the problem statement.
Develop a brain tumor detection project using CNN.
Hello friends, I have completed more than half the tutorial; still more to go. In the tutorial I saw that the stemming takes too much time (almost 50 minutes), so I optimized it a little bit.
code:
pattern = re.compile('[^a-zA-Z]')
english_stopwords = stopwords.words('english')
port_stemmer = PorterStemmer()

def stemming(content):
    stemmed_content = re.sub(pattern, ' ', content)
    stemmed_content = stemmed_content.lower()
    stemmed_content = stemmed_content.split()
    stemmed_content = [port_stemmer.stem(word) for word in stemmed_content if not word in english_stopwords]
    stemmed_content = ' '.join(stemmed_content)
    return stemmed_content

twitter_data['stemmed_content'] = twitter_data['text'].apply(stemming)
Mine completed in just 6 minutes.
I can't thank you enough 🛐 Finally I can continue the video 😭
@@infinity8982 Did you complete it, bro? Is it working?
Can you provide a link to the source code or share it with me?
Hey, can you please provide the whole source code, if you have completed it?
49:52