Getting Word Frequency from a Text File using Python Dictionaries

Adam Gaweda

19,329 views


Published 22 Nov 2024

Comments • 25

@최주연-x9j 2 years ago
Hi, I am a college student majoring in computer engineering in South Korea. Your video really helped me a lot with my studies. Thank you.😊
@AMGaweda 2 years ago
Glad to help!
@comrade_dankbob6876 7 months ago
You are my most beautiful sunshine Adam Gaweda, you give me the light of my tunnel. You make the grey days bright with your wonderful smile. You are my pookie wookie stuffy bear-boy and I want to cherish you for days-on-end. Adam, I love you dai-dai-dai-dai-ski
@RRB47tv 2 years ago
How do I get the least frequent? Excellent video!! Thank you
@AMGaweda 2 years ago
At the point where I did sorted_values = sorted(sorted_values), you would omit the [::-1] portion. The [::-1] reverses the list so the largest counts appear first, while sorted(sorted_values) on its own puts the least frequent words first. It will be a lot of words with a count of 1, but that should do what you are looking for.
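Here is a minimal sketch of the ascending-versus-descending idea from the reply above. It assumes the counts are sorted as (count, word) tuples; the function names count_words and sorted_by_count are hypothetical, and the video's own code may be organized differently.

```python
def count_words(text):
    """Return a dict mapping each lowercase word to how many times it appears."""
    word_count = {}
    for word in text.lower().split():
        word_count[word] = word_count.get(word, 0) + 1
    return word_count


def sorted_by_count(word_count, most_frequent_first=True):
    """Return (count, word) tuples sorted by count (ties broken alphabetically)."""
    sorted_values = sorted((count, word) for word, count in word_count.items())
    if most_frequent_first:
        sorted_values = sorted_values[::-1]  # reverse so the largest counts come first
    return sorted_values


counts = count_words("the cat and the hat and the bat")
print(sorted_by_count(counts)[:3])                             # [(3, 'the'), (2, 'and'), (1, 'hat')]
print(sorted_by_count(counts, most_frequent_first=False)[:3])  # [(1, 'bat'), (1, 'cat'), (1, 'hat')]
```

Leaving out the [::-1] step is the only difference between the two orderings; the least frequent end of the list is usually dominated by words that appear once.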
@bahaminakhtari4997 2 years ago
Hello, I enjoy watching your videos. This video helped a lot. I do have a question. How would you put the top ten words into a dictionary, where the key would be the word and the count would be the value?
@AMGaweda 2 years ago
Around minute 8 there is a function that creates a sorted list of the most frequent words. If you want to put the top 10 in a dictionary, create a new dictionary and add only the first 10 words from that sorted list, with each word as the key and its count as the value.
@bahaminakhtari4997 2 years ago
@AMGaweda I see. Thank you so much for replying!
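The "new dictionary with only the top words" approach described in this thread could look like the sketch below. It assumes an existing word_count dictionary of word-to-count pairs; the name top_n_dict and its n parameter are hypothetical.

```python
def top_n_dict(word_count, n=10):
    """Return a new dict holding only the n most frequent words and their counts."""
    # Sort (word, count) pairs by count, largest first
    ranked = sorted(word_count.items(), key=lambda pair: pair[1], reverse=True)
    top = {}
    for word, count in ranked[:n]:
        top[word] = count  # the word is the key, its count is the value
    return top


print(top_n_dict({"alice": 5, "the": 9, "rabbit": 2}, n=2))  # {'the': 9, 'alice': 5}
```

The standard library offers a shortcut for the same result: dict(collections.Counter(word_count).most_common(10)).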
@ammaralamin-z4l 1 year ago
Hi, thanks! Can I use this to count words in Arabic text?
@solomonngare8382 2 years ago
Thanks bro
@thomaskersig5291 2 years ago
Thanks for this!
Using my own file (a .csv which I saved as .txt), I get the following output after running list(word_count.keys())[:10]:
[' \x00']
Any suggestions of what to do? Does it make sense to rewrite the code to open the .csv, or will I run into the same problem?
Best
Thomas
@AMGaweda 2 years ago
You'll most likely still run into the issue, since CSV files are just text files; it's mostly programs that treat them differently. I'd recommend doing a little "preprocessing" before you count your words, such as making all letters lowercase and removing excess white space. In your example, it might be good to do something like sentence = sentence.replace("\x00", "") to remove these kinds of characters from the analysis.
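The preprocessing suggested in that reply can be sketched as a small cleanup function applied to each line before it is split into words. The name preprocess is hypothetical, and collapsing whitespace with a regular expression is just one way to do it. As a hedged aside, stray "\x00" characters sometimes come from a file saved as UTF-16 and then read with a different encoding; if that is the cause, passing encoding="utf-16" to open() may avoid them.

```python
import re


def preprocess(sentence):
    """Clean one line of text before counting words."""
    sentence = sentence.replace("\x00", "")   # drop stray null characters
    sentence = sentence.lower()               # so "The" and "the" count as one word
    sentence = re.sub(r"\s+", " ", sentence)  # collapse runs of spaces, tabs, newlines
    return sentence.strip()                   # trim leading and trailing spaces


print(preprocess("  Hello\x00  World\n"))  # hello world
```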
@andytamburino1743 2 years ago
Do you teach a master's class at NCU? I'm about to finish my BA in Comp Sci, and man, you are an awesome teacher.
@AMGaweda 2 years ago
Thanks! I'm finishing up my PhD now, but hopefully in the fall, wherever I end up, I'll be teaching there.
@pradnyakasar614 2 years ago
Sir, how do I find the count of unique words across multiple text files at one time?
@AMGaweda 2 years ago
I would still recommend using the counting method from this video, but process it across a list of files. Once you've finished every file, the dictionary's keys (from the .keys() method) are the list of unique words, and len() tells you how many there are.
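Counting across several files with one shared dictionary, as described in that reply, might look like this sketch. It assumes plain-text files readable as UTF-8 and splits on whitespace only; count_words_in_files is a hypothetical name.

```python
def count_words_in_files(filenames):
    """Count words across several text files using one shared dictionary."""
    word_count = {}
    for filename in filenames:
        with open(filename, "r", encoding="utf-8") as f:
            for line in f:
                for word in line.lower().split():
                    word_count[word] = word_count.get(word, 0) + 1
    return word_count
```

After processing, len(counts.keys()) (or simply len(counts)) gives the number of unique words across all the files, for example with counts = count_words_in_files(["chapter1.txt", "chapter2.txt"]), where the filenames are placeholders.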
@LukaDonesnitch 2 years ago
Can you explain how to swap out the .txt file for a .csv file? I'm trying to add a user input line so that when the user searches for a word in column 3 of the CSV file, it prints how many times that word occurs in the file. So far, when I make the changes for the CSV and increase the increment by 1, I get the error message TypeError: string indices must be integers.
@AMGaweda 1 year ago
It depends on the format of the file, but take a look at my video on using CSVReader: th-cam.com/video/116KWyLc6J8/w-d-xo.html
You'll follow the same ideas: get a list of the words, then use a dictionary to get the count of that word. You may not even need the dictionary, since a running total just needs a for loop to iterate through a list. One trick I like to use is to load the contents of a file into a "contents" list first, e.g. contents = open(filename, 'r').readlines(). This way, I no longer need to worry about the file-handling aspect of my analysis and can instead rely on the list.
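A running-total version for a CSV column, loading the rows into a "contents" list first as suggested above, could look like the sketch below. It assumes "column 3" means the third column (index 2, since Python counts from 0) and that words are separated by whitespace; count_in_column is a hypothetical name. Incidentally, "string indices must be integers" usually means a string was indexed with something other than an integer, such as another string.

```python
import csv


def count_in_column(filename, search_word, column=2):
    """Count whole-word matches of search_word in one CSV column (index 2 = 3rd column)."""
    with open(filename, "r", newline="", encoding="utf-8") as f:
        contents = list(csv.reader(f))  # each row is a list of strings
    search_word = search_word.lower()
    total = 0
    for row in contents:
        if len(row) > column:  # skip blank or short rows
            # row[column] uses an integer index, then the cell text is split into words
            for word in row[column].lower().split():
                if word == search_word:
                    total += 1
    return total
```

With user input it could be called as word = input("Word to search for: ") followed by print(count_in_column("data.csv", word)), where "data.csv" is a placeholder. Punctuation attached to a word (e.g. "cat,") will not match unless it is stripped first.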
@Vagabund92 2 years ago
Thank you. I learned a lot. I also appreciate the comments in the code.
The only thing is that I didn't get rid of all the punctuation in my text (a dummy text I wrote myself: I mass-copied "thousand.thousand.thousand.thousand" next to each other, and it stayed that way).
Would be cool if you could share the code and the Alice in Wonderland text.
@AMGaweda 2 years ago
I don't share my code, mostly to encourage students to code along with me, BUT you can download a copy of Alice in Wonderland from Project Gutenberg: www.gutenberg.org/ebooks/11
@Vagabund92 2 years ago
@AMGaweda Okay, I already replicated your code and thought that copying it would have been handy. :D
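A likely reason "thousand.thousand.thousand" survives, assuming the cleanup only strips punctuation from the ends of each already-split word (an assumption about the video's code), is that the periods sit inside a single token. One alternative is to turn every punctuation character into a space before splitting, as in this sketch; words_without_punctuation is a hypothetical name.

```python
import string

# Translation table mapping every punctuation character to a space
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def words_without_punctuation(text):
    """Lowercase the text, treat punctuation as a separator, and split into words."""
    return text.lower().translate(PUNCT_TO_SPACE).split()


print(words_without_punctuation("thousand.thousand.thousand"))  # ['thousand', 'thousand', 'thousand']
```

The trade-off is that contractions split apart too ("don't" becomes "don" and "t"), since the apostrophe is in string.punctuation.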
@sharma3226 2 years ago
Sir, could you please guide me on how to sort the most frequent words used in a PDF document? I want to learn the most important words for my exam, so it would be very helpful 🙏🏽🙏🏽.
@AMGaweda 2 years ago
There isn't a "clean" way to extract text from a PDF, but you can use some of Python's third-party libraries to do this. For example, PDFPlumber (github.com/jsvine/pdfplumber) will allow you to extract text. Please note that this expects the PDF's text to be actual text: text inside graphics or pictures, or pictures of text, will not get extracted.
@sharma3226 2 years ago
@AMGaweda OK, I was able to convert the PDF text into a text file. Then what?
@AMGaweda 2 years ago
@sharma3226 Then you can use the methods shown in the video.
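Putting this thread together, a sketch might extract the text with pdfplumber and then count words the same way as for a .txt file. It assumes pdfplumber is installed (pip install pdfplumber) and that the PDF contains a real text layer; the function names are hypothetical, and pdfplumber is imported inside the function so the counting part works without it.

```python
def extract_pdf_text(path):
    """Return all extractable text from a PDF using the third-party pdfplumber package."""
    import pdfplumber  # imported here so word_frequency() can be used without pdfplumber

    pages = []
    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            text = page.extract_text()
            if text:  # pages without a text layer (e.g. scanned images) come back empty
                pages.append(text)
    return "\n".join(pages)


def word_frequency(text):
    """Count words with a dictionary and return (word, count) pairs, most frequent first."""
    word_count = {}
    for word in text.lower().split():
        word_count[word] = word_count.get(word, 0) + 1
    return sorted(word_count.items(), key=lambda pair: pair[1], reverse=True)


print(word_frequency("b a b"))  # [('b', 2), ('a', 1)]
```

Typical use would be word_frequency(extract_pdf_text("notes.pdf"))[:20] for the 20 most frequent words, with "notes.pdf" as a placeholder path.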
