For those seeing this after TensorFlow 2.5.0: predict_classes(X_test) will not work. Instead use:
y_pred = (model.predict(X_test) > 0.5).astype("int32")
I used this, but after using it the accuracy was only 40 percent.
Thanks for pointing out this crucial information. Can you please give the reference link for ".predict()"?
Yeah, it happened to me too.
So did you find any solution to that? @@MrHamid-ct7hy
Thank you for the solution...😊❤🙌
Dude, you are the best ML tutor ever. You not only focus on theory but also teach practical implementations, and that too within a video of not more than 30 minutes. I truly love your teachings and am always excited for the latest of your videos.
Thank you so much. I was waiting for this session for months..
Finally
@@krishnaik06 Hello, could you enable the subtitles, please? Thank you very much.
That was a great presentation, and it could work on a 23,000-observation dataset with good accuracy. Many thanks, Mr Krish, for the knowledge disseminated.
I followed the step-by-step process you explained, sir, but I still got an error: "Classification metrics can't handle a mix of binary and continuous targets". For this I used an e-mail spam detection dataset.
from sklearn.metrics import confusion_matrix
y_pred = model.predict(X_test)
y_pred_class = [1 if prob > 0.5 else 0 for prob in y_pred]
confusion_matrix(y_test, y_pred_class)
Great work! Please make a video on LSTM with a class-imbalanced dataset.
Sir, you achieved higher accuracy with the help of the PassiveAggressive classifier (in your members-only project playlist); I too got around 90%, which is less than that.
Hey Krish, the video was great. In NLP the most important step is preprocessing. It would be very useful if you could demonstrate different preprocessing techniques for different problems, like Twitter analysis, where hashtags and URLs should be removed. If possible, make a video tutorial on regex.
At last, the one requested during the live session is released. Thanks!
Trying my best to upload as soon as possible.
Sir, please upload a multiclass classification confusion matrix with F1 score, precision, and other measures, with full code and explanation.
Sure
Hey Krish, I got a test accuracy of 99.8% on this dataset using BERT :)
BERT will be covered in the upcoming videos.
What's BERT?
Krish, if possible, also explain RoBERTa, XLNet, and other such variants.
thanks for your content !!
@@krishnaik06 We will be eagerly waiting :)
Thanks Krish, that was a good implementation. I am trying to understand if we can use TfidfVectorizer with an LSTM model, but I can see that sequences and one-hot give a great result, even though I am just learning NLP with ML and not into the deep learning part yet.
Check my NLP playlist
@@krishnaik06 thank you
Sir, you're doing a great job. I am one of your faithful fans. My suggestion, though: sir, try to always include test data in your tutorials, so that we can always see the model's effectiveness after using the model to make predictions on unseen test data. This is very paramount to all your fans, I must say. Thank you very much for your great job, always ♥️👍🙏
Thank you so much sir 🙏, it is really helpful.
Hey Krish, just a quick question. The X.shape in the video shows the output as (18285, 20), but the data has 5 columns, of which we are not considering the "label" column for X, so it should show X.shape as (18285, 4), shouldn't it?
True.
The same here
Why are we using one hot encoding here instead of TFIDF or Bag of words to vectorize this data?
Is sent_length=20 chosen based on the maximum number of words in a sentence, or is it the top 20 words, or random? If a sentence has more than 20 words, then what do we do (adjust sent_length, or is there any method to handle this scenario)?
Use a value of sent_length based on the sentence that contains the maximum number of words.
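A minimal sketch of that idea, assuming onehot_repr is the list of one-hot encoded titles built earlier in the video:
```
from tensorflow.keras.preprocessing.sequence import pad_sequences

# onehot_repr: assumed to be the list of one-hot encoded titles built earlier
sent_length = max(len(seq) for seq in onehot_repr)   # length of the longest title
print("Longest title:", sent_length, "tokens")

embedded_docs = pad_sequences(onehot_repr, padding='pre', maxlen=sent_length)
```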
Krish sir, in my opinion you are the best data science teacher on YouTube. Please keep making practical sessions like this in deep learning.
Hi sir,
Why is the padding done 'pre'? Is there any specific reason? I mean, can it be done as 'post' as well?
Also, the Git URL has an apostrophe ' in it, which is creating a 404; please correct it if possible.
You can use 'post' and verify which works well... Fixed the GitHub issue.
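A small sketch for trying both options side by side, assuming onehot_repr and sent_length come from the earlier steps:
```
from tensorflow.keras.preprocessing.sequence import pad_sequences

# 'pre' (used in the video): zeros are added before the tokens
docs_pre = pad_sequences(onehot_repr, padding='pre', maxlen=sent_length)

# 'post': zeros are added after the tokens; train both ways and compare, as suggested above
docs_post = pad_sequences(onehot_repr, padding='post', maxlen=sent_length)
```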
Dear sir, it's great to learn. One thing: can you please make a video on fake news detection on an unlabeled dataset, and on a method like a transformer-based learning technique? Thanks.
Hello Data Science community, I need a small bit of help with the deployment of a DL model on Heroku. I want to deploy a model using Flask which includes TensorFlow 2.2.0, but TF is 500+ MB, Heroku does not support more than 500 MB, and it gives an application error. Please help me. (Note: I am a beginner in the DL field.)
Deploy it on AWS; Heroku only allows deployments up to around 500 MB.
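One workaround worth trying (my suggestion, not something from the thread): list the CPU-only TensorFlow wheel in requirements.txt; it is several hundred MB smaller than the full package and usually fits Heroku's slug limit.
```
# requirements.txt: a rough sketch; pin versions to whatever your app actually needs
flask
gunicorn
tensorflow-cpu   # CPU-only wheel, much smaller than the full tensorflow package
numpy
pandas
```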
Do we need to code every single step, or is there a hint of the code already provided on the TensorFlow website? Please reply.
This is not from the TensorFlow website... we have to write our own code.
Hi Krish sir,
I have developed my own emotion-detection LSTM TensorFlow model for our regional language.
How do I deploy this model with an Android app to take it into production? Any help, please?
Search for TensorFlow Lite.
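A minimal conversion sketch, assuming model is the trained Keras model; the Android side then loads the .tflite file with the TensorFlow Lite Interpreter:
```
import tensorflow as tf

# Convert the trained Keras model to a TensorFlow Lite flatbuffer
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# This .tflite file is what the Android app bundles and loads
with open("emotion_lstm.tflite", "wb") as f:
    f.write(tflite_model)
```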
Hey Krish, one hot does the same thing as Tensorflow Tokenizer, right?
See my previous videos to understand about one hot representation
If we don't have any labeled data, then how do we do it?
How is test_size chosen? 18:30
Great work. Practiced the code in Colab.
Hey Krish, do we need a good GPU for running deep learning projects?
Start with Google Colab.
Sir, is this deep learning playlist of yours completely enough to learn deep learning,
or do you have plans to update it?
More videos will be updated...
Krish, if there are more words than voc_size, then what happens?
I don't think there will be a vocab size of more than 10k.
If it happens that I have more than 20k unique words in tweets and the vocab size is less than that, then what happens, sir?
Subhash Achutha This comment is late, but if vocab_size is 10,000, it will only take the 10000 most common words. Any other words will be deleted.
@@ifmondayhadaface9490 thank you for answering
Subhash Achutha You’re welcome.
@krish Naik Sir, I am following your pattern on a Roman Urdu dataset, but my accuracy is nearly 48 percent. How can I implement Roman Urdu stopwords? Please, anyone help me with this.
You are splitting the data after using the embedding which might lead to data leakage. Is there a way to avoid this?
With a Bangla dataset it's not working. What should I do? I am facing a problem: when I print the corpus, it seems empty, such as [ " ", " ", " ", " " ].
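There is no fix in the thread, but a likely cause (assuming the video's cleanup step re.sub('[^a-zA-Z]', ' ', text) was copied as-is) is that this regex strips every non-Latin character, which wipes out Bangla text completely. A sketch of a Unicode-friendly alternative:
```
import re

text = "এটি একটি বাংলা খবরের শিরোনাম"

# the video-style cleanup keeps only a-z/A-Z, which wipes out Bangla text entirely
print(re.sub('[^a-zA-Z]', ' ', text))   # prints only spaces

# keep Unicode word characters and whitespace instead, stripping only punctuation
print(re.sub(r'[^\w\s]', ' ', text))
```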
Thanks for the video. Do I need to do min-max scaling?
Hi Krish, can we use PyTorch?
Yes, of course.
It is OK for binary classification. What changes are required for multiclass classification?
y_pred = model.predict_classes(X_test)
After this line I got this error, and when I solved it the accuracy of the model was only 56%.
Error:
('Sequential' object has no attribute 'predict_classes')
Please post if you find a solution
Try
y_pred = (model.predict(X_test) > 0.5).astype("int64")
@@heysoumyadeep I got 50 percent accuracy too using this.
@@MrHamid-ct7hy my accuracy is 66%
Bro, just rerun the last 5 cells continuously; you will get somewhat better accuracy each time.
Hey Krish, the X.shape in the video shows (18285, 20), but the data has 5 columns, and while I am using my train.csv it shows X.shape as (18285, 4). Could you please tell me the reason? Where is the mistake?
(18285, 20) == after padding the one-hot titles to sent_length=20, you get 20 columns.
@@devendrachavan765 No, check at the beginning itself... it shows 20 columns for X.shape.
It's better to use stratify when you're splitting your dataset into train and test, to keep the same balance between classes in train and test.
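A quick sketch of that suggestion; X_final and y_final are assumed to be the padded array and labels from the video:
```
from sklearn.model_selection import train_test_split

# stratify=y_final keeps the fake/real class ratio identical in train and test
X_train, X_test, y_train, y_test = train_test_split(
    X_final, y_final, test_size=0.33, stratify=y_final, random_state=42)
```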
In my code, the review in the for loop only shows the last row after re.sub... Please help me with this.
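A guess at what is happening (not an answer from the author): review is just the loop variable, so after the loop it only holds the last cleaned title; each cleaned title has to be appended to a corpus list inside the loop, roughly like this:
```
import re
from nltk.corpus import stopwords          # nltk.download('stopwords') may be needed once
from nltk.stem.porter import PorterStemmer

ps = PorterStemmer()
corpus = []
for title in messages['title']:            # messages: assumed dataframe, as in the video
    review = re.sub('[^a-zA-Z]', ' ', str(title)).lower().split()
    review = [ps.stem(word) for word in review if word not in stopwords.words('english')]
    corpus.append(' '.join(review))        # append inside the loop, not after it

print(corpus[:5])                          # inspect the corpus, not the loop variable
```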
Sir, I think CountVectorizer should only be fitted at train time, i.e. fit(train), and the transform should then be applied to both sets, whereas you apply fit_transform at the beginning, which may be a wrong approach.
Here our model is also learning from the test data, which should not be our aim.
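For reference, the leak-free pattern the comment describes looks roughly like this (a general scikit-learn sketch with assumed variable names, not code from the video):
```
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# corpus and y are assumed to be the cleaned texts and labels
texts_train, texts_test, y_train, y_test = train_test_split(corpus, y, test_size=0.33, random_state=42)

cv = CountVectorizer(max_features=5000)
X_train = cv.fit_transform(texts_train)   # vocabulary learned from the training texts only
X_test = cv.transform(texts_test)         # the test set reuses that vocabulary
```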
Great video! How do you plot the graph of training loss vs. validation loss over the number of epochs? And how do you plot the graph of training accuracy vs. validation accuracy over the number of epochs?
```
import matplotlib.pyplot as plt   # needed for the plots below

model_history = classifier.fit(x_train, y_train, validation_split=0.33, batch_size=10, epochs=100)

plt.plot(model_history.history["accuracy"])
plt.plot(model_history.history["val_accuracy"])
plt.title("model accuracy")
plt.ylabel("accuracy")
plt.xlabel("epoch")
plt.legend(["train", "test"], loc="upper left")
plt.show()
```
You can use this code. Make sure to save the model history when training the model.
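For the loss curves asked about above, the same history object can be reused; a small follow-up sketch (assuming model_history from the snippet above):
```
import matplotlib.pyplot as plt

plt.plot(model_history.history["loss"])
plt.plot(model_history.history["val_loss"])
plt.title("model loss")
plt.ylabel("loss")
plt.xlabel("epoch")
plt.legend(["train", "validation"], loc="upper left")
plt.show()
```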
Krish, can you explain some applications of NLP using LSTM, like next-word prediction, translation, and image captioning?
Sure Ritik
Hi Krish, I wanted to know why we didn't use any Flatten layer, since the embedding output will be 2D?
Sir, may I know why we use 1D and 2D layers in NLP like we use in CNNs?
Hello sir, thanks for the great explanation. I have one doubt here; could you please clear it up? While testing this model, if we have a sentence longer than sent_length, then how can we handle the padding?
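The thread leaves this unanswered, but pad_sequences itself handles it: sequences longer than maxlen get truncated. A small sketch (new_onehot is a hypothetical encoded test sentence, not data from the video):
```
from tensorflow.keras.preprocessing.sequence import pad_sequences

# a hypothetical encoded test sentence with 22 tokens (anything over sent_length=20)
new_onehot = [[412, 93, 7, 4981, 55, 130, 9, 2222, 78, 641, 17,
               905, 3004, 12, 66, 789, 23, 4500, 8, 91, 345, 1210]]

# maxlen=20 truncates the extra tokens; truncating='pre' drops tokens from the front,
# truncating='post' drops them from the end
padded = pad_sequences(new_onehot, maxlen=20, padding='pre', truncating='post')
print(padded.shape)   # (1, 20)
```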
Can someone please explain why a vocab size of 5000 was chosen? Shouldn't the vocab size be equal to the vocabulary of the corpus?
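No answer is given here, but one way to sanity-check the choice is to count the distinct words in the cleaned corpus (a sketch, assuming corpus is the list of cleaned title strings):
```
# keras one_hot hashes each word into the range [1, voc_size), so if the corpus has far more
# unique words than voc_size, several words simply share an index (hash collisions), not an error
unique_words = set()
for sentence in corpus:
    unique_words.update(sentence.split())

print(len(unique_words))
```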
Should we use padding='pre' or 'post'? Can you help provide the mathematical insight?
Cast string to float is not supported
[[node binary_crossentropy/Cast (defined at :2) ]] [Op:__inference_train_function_9451]
Function call stack:
train_function
Hi sir, I am getting this error while fitting the model; could you help me out?
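The thread has no answer, but a common cause of this exact message (just a guess) is that the labels are still strings, so binary_crossentropy cannot cast them; converting y to numbers before model.fit usually clears it:
```
import numpy as np

# y: assumed label column/Series; if it was read as strings ('0'/'1'), cast explicitly
y_final = np.asarray(y).astype('float32')

# for text labels, map them first (label names here are hypothetical):
# y_final = np.asarray([1 if label == 'FAKE' else 0 for label in y], dtype='float32')
```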
Great work sir, please implement it with soft attention + BiLSTM.
I have one question: how will it identify negative statements? For example: "this earphone is not as good as the other one." In this statement, stopword removal will remove "not", but it is the most important word.
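One common workaround (not something shown in the video) is to keep the negation words out of the stopword list before filtering:
```
from nltk.corpus import stopwords   # nltk.download('stopwords') may be needed once

stop_words = set(stopwords.words('english')) - {'not', 'no', 'nor'}

sentence = "this earphone is not good as the another one"
filtered = [w for w in sentence.split() if w not in stop_words]
print(filtered)   # 'not' survives, so the negation still reaches the model
```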
What will we be giving as input to this model?
Hey, great video.
Next time, can you also add something like applying this trained model to a real example, like creating a negative or positive title and giving it to the model to predict the output? That would just add the cherry on top :D
We can do the deployment of this model, and yes, it will look good.
Do you know how we can make a prediction with this model, since we cannot save the one-hot encoding for later use?
Sir, you are applying pad_sequences with max_len = 20, but the length of some lists in the one-hot representation is more than 20, so it truncates those sentences to 20 words. If we do that, we lose some words, right sir? So can we instead find the maximum length of the sentences in the one-hot representation and give that as max_len, right sir?
Krish sir, in case my data does not have a labels feature (i.e. 0 and 1), do we have to provide that manually, or is there any coding technique for the same?
Thanks Krish
Do you know how to predict whether a single sample is spam or not?
Like, there is this text message and we have to predict whether it is spam or not.
Using this model to make a single prediction and get the probability that this message is spam/fake news.
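A rough sketch of single-sample inference that reuses the same preprocessing as training; predict_single is a hypothetical helper, and voc_size/sent_length must match the values used when the model was trained:
```
import re
from nltk.corpus import stopwords                     # nltk.download('stopwords') may be needed once
from nltk.stem.porter import PorterStemmer
from tensorflow.keras.preprocessing.text import one_hot
from tensorflow.keras.preprocessing.sequence import pad_sequences

def predict_single(text, model, voc_size=5000, sent_length=20):
    ps = PorterStemmer()
    words = re.sub('[^a-zA-Z]', ' ', text).lower().split()
    words = [ps.stem(w) for w in words if w not in stopwords.words('english')]
    encoded = one_hot(' '.join(words), voc_size)      # same hashing trick used at training time
    padded = pad_sequences([encoded], maxlen=sent_length, padding='pre')
    return float(model.predict(padded)[0][0])         # probability of the positive (fake/spam) class

# prob = predict_single("Breaking: shocking claim goes viral", model)
```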
thanks a lot, it is really helpful.
How can we use LSTM for text steganography?
What does LSTM(100) indicate? I mean, we have 3 gates in an LSTM, so how do these 100 neurons come into the picture? Can anyone please explain this?
Sir, while fitting the model we are getting an accuracy of 1.00. Why isn't the model said to be overfitting? Can anyone sort this out?
Hello Krish... how to predict for a single instance in this case...?
If we remove the emojis, can we increase the accuracy in similar datasets?
@Krish Naik can you do a tutorial on different levels of sentiment analysis: classification at word level, sentence level, and document level?
Hi, is there any impact of pre vs. post padding of the sequences?
Hello Krish,
I hope you are doing well.
Can you please tell me how to decide vocab_size?
Amazing thank you
How is X.shape (18285, 20)? Doesn't that 20 mean 20 features? I could only see 4. Somebody please explain.
How can we decide the length of the sentence, i.e. sent_length?
Hey Krish, I've been following all the videos very closely. Did you miss uploading the Adam optimizer and binary cross-entropy video? Because the optimizers you discussed in your videos were Adagrad, Adadelta, RMSprop, etc., but in almost all your code you use Adam. It would be great if you could explain that too and insert it appropriately in the playlists. Thank you for all the hard work.
Sir, please make a video on deployment of an ML model in an Android app...
Sure in the upcoming videos
How can we predict a new sentence with this model, sir?
Great video! Just one comment: the validation set is the same as X_test, and its accuracy is being displayed after each epoch. So maybe there is no need to predict on X_test (again).
My suggestion is that we can use the following code to extract the validation data from the train set itself:
base_history = model.fit(X_train, y_train, validation_split=0.2, epochs=10, batch_size=100, verbose=1)
How can this be modified for multiclass? I am getting an error.
@@prakashkafle454 Hey, did you get the answer? Because I'm having the same problem.
@@yogendrapratapsingh7618 Yeah, I did it myself.
@@prakashkafle454 How? Could you tell us what lines of code you added?
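Since the fix never gets shared in the thread, here is what the usual multiclass changes look like (a sketch with assumed sizes, not the author's code):
```
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense
from tensorflow.keras.utils import to_categorical

n_classes = 4                                        # assumed number of categories
y_cat = to_categorical(y, num_classes=n_classes)     # integer labels -> one-hot rows

model = Sequential()
model.add(Embedding(5000, 40, input_length=20))      # voc_size, vector size, sent_length as in the video
model.add(LSTM(100))
model.add(Dense(n_classes, activation='softmax'))    # softmax over classes instead of one sigmoid unit
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
```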
Are the Embedding layer and Word2Vec the same?
Sir make a video on word embeddings using BERT
Can you please create a video using GloVe and Word2Vec as well for the same problem, the fake news classifier?
Sir, I have a question: when I used the text attribute instead of title, I got a validation accuracy of 83%. Can anybody help me with how I can improve it?
Anyone know how to use the weights of this model to make a Flask app out of it?
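A bare-bones sketch of that idea (names are illustrative: it assumes the trained model was saved with model.save('fake_news_lstm.h5') and that a predict_single-style preprocessing helper like the one sketched earlier is in scope):
```
from flask import Flask, request, jsonify
from tensorflow.keras.models import load_model

app = Flask(__name__)
model = load_model('fake_news_lstm.h5')          # load the saved weights/architecture once at startup

@app.route('/predict', methods=['POST'])
def predict():
    title = request.json['title']
    prob = predict_single(title, model)          # reuse the same one_hot + pad preprocessing as training
    return jsonify({'fake_probability': prob})

if __name__ == '__main__':
    app.run(debug=True)
```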
Why No Flatten layer before Dense?
Why does the X shape have 20 columns? It has only 4, right?
#preprocessing the data
import gensim
title_text = df.title.apply(gensim.utils.simple_preprocess)
Hello, all your videos are very informative. I want some help related to a dataset... How can I contact you? Please let me know. Thank you.
Sir, please tell how to fit the one-hot encoding on the train data and apply it to the test data.
Sir, how do we remove self-built (custom) stopwords using Python?
Could you please provide the notebook link
I'm a beginner but still watched the full video.
What about the Docker tutorials that you started?
Coming up... tomorrow, in the next video.
@@krishnaik06 Can you please also cover Django, like Flask, to create a Docker image and then Kubernetes after that?
Yes, after this I can go ahead with Django too.
You are using a dataset which has 20 columns - X.shape is (18285, 20) - and the link you gave in the description has only 4 columns excluding Label, because you use that as the output. Kindly provide the dataset link you are using.
Sir, can you provide a tutorial on how to train on an annotated video dataset? There isn't a quality article on this topic.
With dropout = 0.3, accuracy reduces to 67% as false positives increase.
Error tokenizing data. C error: Expected 5 fields in line 4772, saw 7
The same dataset gives this error on Colab but works on Kaggle.
Frustrating...
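No fix appears in the thread, but this error usually means a malformed row in the CSV, and pandas can be told to skip such rows (a workaround, not the video's code):
```
import pandas as pd

# pandas >= 1.3: silently skip rows with the wrong number of fields
df = pd.read_csv('train.csv', on_bad_lines='skip')

# older pandas versions use the deprecated flag instead:
# df = pd.read_csv('train.csv', error_bad_lines=False)
```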
NotImplementedError in LSTM.
Tried many solutions, yet it is not resolved...
Hey Krish, when we do padding it assigns zero, right? But zero is an index of a word in the vocabulary, right? Instead of zero, can we assign -1?
Curious to know.
Hi Sai,
We have used the one_hot method to represent our data. If you print the one-hot representation in the console, you will find that the index of every word is greater than zero.
We are applying padding to the one-hot representation, which does not use 0 as an index, so there is no problem in padding with zeros.
I ran the following code to check the minimum index:
min_value = -1
for a in onehot:
    if len(a) > 0:
        if min(a) > min_value:
            min_value = min(a)
print(min_value)
The answer was 4917.
Sir, please make a video on sentiment analysis.
Would you decrease the audio volume at the start of your video? The difference is too much between before and after.
hi krish, can you make a video on time series example using LSTM?
An "expected string or bytes-like object" error occurred during data preprocessing.
I guess you are passing a DataFrame where that function requires a string.
In this model the validation loss is increasing, so the model is overfitting. To stop that, we need to use early stopping so that we don't overfit our model...
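A small sketch of wiring that in; the monitor and patience values are just reasonable defaults, not something from the video:
```
from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)

model.fit(X_train, y_train,
          validation_data=(X_test, y_test),
          epochs=20, batch_size=64,
          callbacks=[early_stop])   # stops once val_loss stops improving and keeps the best weights
```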
How can we predict on a new, unseen news item?
OK guys, I have one request: use lemmatization instead of stemming.
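For anyone who wants to try that swap, the cleaning step changes only slightly (a sketch; the WordNet data needs a one-time NLTK download):
```
from nltk.stem import WordNetLemmatizer
from nltk.corpus import stopwords
# import nltk; nltk.download('wordnet'); nltk.download('stopwords')   # one-time downloads

lemmatizer = WordNetLemmatizer()
sentence = "the studies were reporting fabricated stories"
words = [lemmatizer.lemmatize(w) for w in sentence.split() if w not in stopwords.words('english')]
print(words)   # e.g. ['study', 'reporting', 'fabricated', 'story']
```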
Nice video!! But I didn't understand why you said that we can't achieve an accuracy above 83% without LSTM 😕😕😕