Thanks for explaining this without much jargon. Your teaching style is friendly and accessible. Cheers 😀
How did you create this ARFF file? I tried many times but did not
Such a nice explanation, love your way of teaching.
So I have a CSV file with two columns: the first is text and the second is class. When I apply the filter, I don't see the list of words from my first column; I simply don't see the attributes the way they are shown in your video @4:22! Any ideas how to get that? When I click on Edit, I see that Weka is treating each row in the first column as a whole word, meaning it doesn't split the sentences into words. I tried using a stemmer, a tokenizer, etc., but I am still getting the same issue!
When you apply the StringToWordVector filter, what does it show?
Nothing happens. I still see the Attributes window with only the names of my attributes, and nothing changes in the selected attribute window!
If you want, drop me an email at jgolbeck@umd.edu with your file and I'd be happy to take a look
Probably your string data gets converted into a nominal type (instead of a string type) when it is loaded into WEKA. StringToWordVector doesn't support the nominal data type, so it does not work.
I get this same issue. What is the solution?
Thanks a lot for the videos on Weka. I like the way you explain things; the videos are very clear and easy to understand :)
@jengolbeck,
Well, as a member of the WEKA community, I can say that there are two main issues with your video:
First: this is not a bad way of using the "StringToWordVector" filter, but it is not quite the correct way. Applied the way you explained, the approach lets some class information leak into the tokens, which later gives the machine learning algorithm a clue about the class and produces an optimistic result such as the one you had.
Second: NaiveBayesMultinomialText can work with the string data type, so the default StringToWordVector is not really necessary if you use the NaiveBayesMultinomialText classifier ;)
I would really appreciate it if you could do a video about how to convert CSV or .txt files into ARFF files. Is there any cleaning process needed before we convert to ARFF? A lot of students, myself included, are struggling with this issue. Thank you...
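For the CSV-to-ARFF question above, here is a minimal sketch of one way to do it by hand in Python, outside Weka. It assumes a two-column CSV (text, then class) with a header row and UTF-8 encoding; the function names `csv_to_arff` and `escape_arff`, the relation name, and the file paths are hypothetical, not anything from the video. The only "cleaning" it does is quoting the values, escaping single quotes, and flattening line breaks inside a tweet. Declaring the first attribute as `string` is the point of interest: it should avoid the nominal-type problem described earlier in the thread, since StringToWordVector needs a string attribute.

```python
import csv


def escape_arff(value):
    """Quote a value for ARFF: escape backslashes and single quotes, flatten newlines."""
    value = value.replace("\\", "\\\\").replace("'", "\\'")
    value = value.replace("\r", " ").replace("\n", " ")
    return "'" + value + "'"


def csv_to_arff(csv_path, arff_path, relation="tweets"):
    """Convert a two-column CSV (text, class) into an ARFF file.

    Assumption: the first row is a header and every data row has
    at least two fields. Returns the number of data rows written.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        rows = [(r[0], r[1]) for r in reader if len(r) >= 2]

    # The class attribute is nominal, so collect its distinct values.
    labels = sorted({label for _, label in rows})

    with open(arff_path, "w", encoding="utf-8") as out:
        out.write("@relation " + relation + "\n\n")
        # 'string' (not nominal) so that StringToWordVector can tokenize it.
        out.write("@attribute text string\n")
        out.write("@attribute class {" + ",".join(escape_arff(l) for l in labels) + "}\n\n")
        out.write("@data\n")
        for text, label in rows:
            out.write(escape_arff(text) + "," + escape_arff(label) + "\n")
    return len(rows)
```

Calling `csv_to_arff("tweets.csv", "tweets.arff")` would write the ARFF file next to the CSV. If you would rather stay inside Weka, applying the NominalToString filter to the text attribute before StringToWordVector is another option worth trying.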
Great tutorial, straight to the point, thanks!
Hello. What if the dataset is not labelled? It's just plain reviews with no label, i.e. positive or negative. How do you go about labelling it?
In that scenario, it's not clear what you would be using Weka for. Weka allows you to build a model based on your data. If you don't have labels on the data, there is nothing to train a model from. From your comment mentioning "positive or negative", I feel like you might be interested in doing sentiment analysis? If that's the case, you would want to use an off-the-shelf sentiment analysis tool.
Can you share your trump.arff file with us, please?
Thanks a lot ❤
you are awesome ! Thank you for the very informative video
Thanks for this very helpful video!
Thanks for your explanation. I have a data set of Arabic tweets; when I try to open it in WEKA, question marks appear! Is there a way to set the Arabic language in WEKA?
Regards,
Try saving the ARFF file in Notepad as UTF-8 instead of ANSI.
Then it will read the Arabic text for sure. Or use the CLI with an updated package of languages fetched from Java updates!
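If you prefer not to re-save by hand, the same "ANSI to UTF-8" step can be sketched in Python. This is only an illustration under one loud assumption: that the file is currently in cp1256, the Windows "ANSI" code page for Arabic. The function name `reencode_to_utf8` and the file names are hypothetical. If the file is already UTF-8, this is not the fix, and the problem more likely lies in the encoding Weka/Java uses when reading it.

```python
def reencode_to_utf8(src_path, dst_path, source_encoding="cp1256"):
    """Rewrite a text file as UTF-8.

    Assumption: the input really is in `source_encoding` (cp1256 by
    default). Decoding with the wrong encoding would produce garbage
    or raise UnicodeDecodeError. Returns the number of characters copied.
    """
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding=source_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
    return len(text)
```

For example, `reencode_to_utf8("tweets.arff", "tweets_utf8.arff")` writes a UTF-8 copy and leaves the original file untouched, so you can compare the two in Weka.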
😍
Does the word COVFEFE appear in the list? 😁