Forgot to mention that the output of the read_html method is a list of DataFrames. To get a DataFrame object, simply extract the first element from the output. For example: df = df[0].
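For anyone who wants to see this without a live page, here is a minimal sketch. The HTML string and the names html and tables are made up for illustration and are not from the video; pd.read_html also needs a parser such as lxml installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the HTML of a scraped table (not from the video).
html = """
<table id="demo">
  <tr><th>Country</th><th>Capital</th></tr>
  <tr><td>France</td><td>Paris</td></tr>
  <tr><td>Japan</td><td>Tokyo</td></tr>
</table>
"""

# read_html returns a list of DataFrames, one per table it finds.
tables = pd.read_html(StringIO(html))
print(type(tables).__name__, len(tables))

# Take the first (here: only) element to get an actual DataFrame.
df = tables[0]
print(df)
```

Wrapping the string in StringIO avoids the deprecation warning that newer pandas versions raise when raw HTML text is passed directly.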
Dude, you're awesome 😀. I did my first Wikipedia scraping by following this video. Thanks for that. The only problem I have now is that there are multiple tables on the page, and the output I'm getting is the table above the one I intended to scrape. Trying to figure it out.
I figured it out. When multiple tables have the same attributes, we just need to find the index of the table we want and specify it.
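One way to find that index is to print a short summary of every table that read_html returns. This is only a sketch: the HTML, the numbers in it, and the variable names are invented for the example.

```python
from io import StringIO

import pandas as pd

# Hypothetical page with two tables that share the same class (made-up data).
html = """
<table class="wikitable">
  <tr><th>Note</th></tr>
  <tr><td>sidebar</td></tr>
</table>
<table class="wikitable">
  <tr><th>City</th><th>Population</th></tr>
  <tr><td>Lagos</td><td>100</td></tr>
  <tr><td>Lima</td><td>200</td></tr>
</table>
"""

tables = pd.read_html(StringIO(html))

# Print index, shape and column names so the wanted table is easy to spot.
for i, table in enumerate(tables):
    print(i, table.shape, list(table.columns))

# In this made-up page the table we want is the second one, at index 1.
df = tables[1]
print(df)
```

On a real page the index depends on how many tables come before the one you want, so it is worth checking the printed summary rather than guessing.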
For those with trouble finding table_id:
You can use table class name, instead of the table_id (i.e: )
In that case, I made a change to these 2 lines of code:
table_name = 'wikitable sortable'
soup_table = soup.find('table', {'class':table_name})
Hope this helps
this helped out a lot. thanks for sharing
I tried this, but for wiki pages with several tables with class_name = 'wikitable sortable', the program only returns the first one it finds. How do I get the other ones? Thanks.
Thanks a lot. This helped.
thanks man!
@miloyang5893 You can try the soup.find_all() method instead of soup.find(). It returns a list of all the matching tables.
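A rough sketch of that suggestion, reusing the class name from the comment above. The HTML and the names soup_tables and frames are assumptions made for the example, and html.parser is used only so that BeautifulSoup needs no extra parser.

```python
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical page: two tables with the target class, one without (made-up data).
html = """
<table class="infobox"><tr><th>Skip</th></tr><tr><td>me</td></tr></table>
<table class="wikitable sortable">
  <tr><th>Name</th><th>Year</th></tr>
  <tr><td>Alpha</td><td>2001</td></tr>
</table>
<table class="wikitable sortable">
  <tr><th>Team</th><th>Wins</th></tr>
  <tr><td>Reds</td><td>7</td></tr>
  <tr><td>Blues</td><td>9</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table_name = "wikitable sortable"

# find_all returns every matching table; find would stop at the first one.
soup_tables = soup.find_all("table", {"class": table_name})
print(len(soup_tables))

# Convert each matched table to a DataFrame (read_html returns a list, so take [0]).
frames = [pd.read_html(StringIO(str(t)))[0] for t in soup_tables]
print(frames[1])
```

Passing the full string 'wikitable sortable' matches only tables whose class attribute is exactly that text, so a table with the classes in a different order would not be picked up.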
Why would you want to scrape a table instead of text? What would a table be used for?
I can't find a table ID on the wiki page.
Nice and simple, thanks man!
Thanks, I am so nearly there! One question. I get to 5 mins 48 secs with the same results as Jie. But when I try to print(df), the terminal says: "Traceback (most recent call last):
///File "", line 1, in ///NameError: name 'df' is not defined".
From my understanding I have defined df in line 12, so I can't work out why it's not working. I am a newbie, so answers for dummies are appreciated.
Dumb mistake: I needed to write print(df) at the end of the programme, select all the lines of code, and run that. It looked like you typed it into the terminal, which didn't work for me.
Glad you were able to solve your issue. Apologies for the late reply; I'm currently moving back to the U.S. from Asia and have too much going on.
Thanks for the video. I see you also forgot to mention that df makes use of lxml; thankfully I can read the errors, so I installed it.
Very good! It worked perfectly! Thank you!
Hello, I am using Chrome but I can't see the table ID, only the class. Do I need to do something else to get the table ID?
You should be able to. What steps did you take to view the source code?
Same problem over here. Were you able to find any solution?
same thing
I went to the 'debugger' part in Firefox, and under debugger the class had a slightly different name. I used that class name and everything worked.
Wikipedia tables don't always have table IDs; just use the class_name.
I cannot use pandas. Why is this happening?
Did you install the pandas library?
@jiejenn Oh! Thank you so much. Can you kindly tell me how to install that library? I am a new learner and don't know most of these things.
@farhangony952 Try typing pip install pandas in the conda prompt.
Thanks for the vid, man! Do you happen to live in Alabama btw?
I can't find a table ID on the wiki page.
Can you recommend any good extensions for Python in VS Code?
What Python client are you using?
It looks a lot simpler than PyCharm.
VS Code. The configuration takes a bit to set up, but I like the flexibility much better than PyCharm.
How to turn the output of this into a DataFrame?
This is something I failed to mention in the video. To convert df (which is still a list at that point) to a DataFrame object, extract the first element. For example: df = df[0].