Web Scraping: HTML Tables with Python

John Watson Rooney

Views: 74,419

Published on 19 Jan 2025

Comments • 122

@rverm1000 3 years ago ⁺¹⁷
Your web scraping abilities are much better than the Udemy course I'm taking.
@dennyswandia4462 3 years ago
Hello John. Would you please do a video on scraping and extracting URLs, then following those URLs to extract information from them? I tried following your video on false plants, but I was scraping Amazon. My code keeps returning an empty CSV.
@armanikayden3337 3 years ago
I guess I'm asking in the wrong place, but does anyone know a way to get back into an Instagram account?
I was stupid and lost the account password. I would appreciate any tricks you can give me!
@dennyswandia4462 3 years ago
@@armanikayden3337 Do you remember your email address? You could just click "forgot password" and enter it. A link to reset your password will be sent to you.
@DJMrTen 3 years ago ⁺⁶
I've been trying to figure this out for way too long. This quick vid got me rolling in under 15 minutes! Thank you John.
@JohnWatsonRooney 3 years ago
Thanks!
@e.eiyayi2381 2 years ago ⁺⁴
You were able to transfer your knowledge spot-on, which unfortunately too many geeks fail at.
@bencole8301 3 years ago ⁺²
Hi John, I took a Udemy course in Python and got slightly off track (I didn't really want to say bored! but now I have, ooops!) about halfway through, but your videos are awesome. Feel like you're filling in all the gaps, and I'm enjoying the very straight-talking, no-messing approach you take to tutoring. Thanks for the videos.
@JohnWatsonRooney 3 years ago ⁺¹
Thanks Ben! I’m glad you’re enjoying my videos
@ryan-zd4jm 3 years ago ⁺¹
Thank you!! I love how you explain your mistakes in a way that actually makes sense.
@JohnWatsonRooney 3 years ago ⁺¹
Thanks!
@Guitarfreek27 4 years ago ⁺¹¹
Hello, first of all, great video. Question:
The URL I am trying to scrape keeps giving me "None" when I try to find the table class. Ever run into this problem? Any suggestions on what I can do to fix it? Any help is much appreciated!
My code is something like this:
table = soup.find('table', class_='class_in_question')
@gabefife951 2 years ago ⁺¹
Same, I don’t know what to do
@kike210587 1 year ago
Same for me, man.
@krishnas8173 4 months ago
Same problem
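Editor's note: find() returns None when nothing in the HTML you downloaded matches the tag and class you asked for. The sketch below is one way to check what is actually there, not a guaranteed fix. The HTML string and its class names ("standings", "fixtures") are made up for illustration; on a real page you would build the soup from the response text instead.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for a downloaded page; the class names are hypothetical.
html = """
<table class="standings"><tr><td>Team A</td></tr></table>
<table class="fixtures"><tr><td>Saturday</td></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

table = soup.find("table", class_="class_in_question")
if table is None:
    # Nothing matched. List the classes of the tables that ARE in this HTML,
    # which may differ from what the browser inspector shows.
    available = [t.get("class") for t in soup.find_all("table")]
    print("No match. Table classes in this HTML:", available)
```

If the list comes back empty on a real page, the table is probably not in the downloaded HTML at all, for example because it is loaded later by JavaScript.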
@dhiahmila9549 2 years ago ⁺²
You could also use pandas instead of a loop: pd.read_html
@areijkandi5424 3 years ago ⁺¹
Perfect. This is what I am looking for. Thank you so much. You deserve 1,000,000 likes
@JohnWatsonRooney 3 years ago
Thank you I’m glad you enjoyed it
@stonebanksyt 3 years ago ⁺²
Thanks for this tutorial. I was so confused earlier. I finished my project because of you. Thanks a lot
@paulfearn7571 4 years ago ⁺¹
Good work John - very clear, easy to follow video and voice over - thank you, keep up the great work
@127bits7 3 years ago ⁺¹
dude this was the best video! thank you so much John!
@felipelandim2881 3 years ago ⁺¹
Finally a Web Scraping of Soccer Tables!
@alphagam3r933 3 years ago ⁺¹
Love you sir, literally love you. This video is very useful for my college project. Next level video
@JohnWatsonRooney 3 years ago
Thank you good luck with your project
@xedifice4421 3 years ago ⁺³
Exactly what I needed, Thanks a lot!
Go Gunners!
@cherico94 4 years ago ⁺²
Thanks man. This really helped me out with what I was trying to figure out for days.
@JohnWatsonRooney 4 years ago ⁺¹
Thank you I’m happy it helped!
@mhm6 4 years ago ⁺¹
Same, I spent a whole day unable to extract data from websites with multiple tables because I didn’t know how to access specific classes. Now I know it’s class_ = “ “
Thanks!
@gawd2891 2 years ago ⁺¹
How can I access a td tag which doesn't have any classes? Same for the table tag
@JohnWatsonRooney 2 years ago ⁺¹
Use find_all to get all the tables, then index the one you want; the same goes for the td tags
@gawd2891 2 years ago
@@JohnWatsonRooney Thank you so much John 👏
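Editor's note: the find_all-then-index approach described in this thread could look roughly like the sketch below. The HTML and its values are made up for illustration, and the index positions (table 1, cells 0 and 2) are assumptions that depend entirely on the page being scraped.

```python
from bs4 import BeautifulSoup

# Made-up HTML: two tables with no class or id attributes.
html = """
<table><tr><td>skip me</td></tr></table>
<table>
  <tr><td>Team A</td><td>10</td><td>21</td></tr>
  <tr><td>Team B</td><td>10</td><td>18</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

tables = soup.find_all("table")   # list-like ResultSet of every table
second = tables[1]                # pick the table by position

results = []
for row in second.find_all("tr"):
    cells = row.find_all("td")    # the td tags are indexed the same way
    results.append((cells[0].text, cells[2].text))
print(results)
```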
@jimmykarago2598 3 years ago ⁺¹
Thanks for this, I was able to scrape all the data I needed
@JohnWatsonRooney 3 years ago ⁺¹
That’s great thanks for watching
@karmantan 2 years ago ⁺¹
Hi, what happens when an element is not visible in the page source, but only when you click on inspect?
@JohnWatsonRooney 2 years ago ⁺¹
It’s loaded by something like Ajax via JavaScript - I have a few methods on my channel for this if you’d like to have a look, the newer ones are generally better
@karmantan 2 years ago
@@JohnWatsonRooney Just watched the video! That was really helpful, thank you!
@melih.a 3 years ago ⁺¹
When you type in rows = team.find_all('tr') it doesn't register .find_all? I don't know what I'm doing wrong here
@JohnWatsonRooney 3 years ago
I think the website I used here has changed and won’t work with this method anymore; it’s not finding anything that matches your find_all
@aqibsunesara 2 years ago
My HTML does not have a td class. The tbody has multiple TRs. Each TR has a TH and a TD, and I want to extract the text of each of those TDs. Can you help?
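Editor's note: for a table shaped as described above (a tbody of rows, each holding one th and one td, with no classes), one possible approach is to walk the rows and read both cells by tag name. This is a sketch under that assumption; the HTML and its values are made up.

```python
from bs4 import BeautifulSoup

# Made-up HTML matching the described shape: each tr has one th and one td.
html = """
<table><tbody>
  <tr><th>Name</th><td>Alpha</td></tr>
  <tr><th>Size</th><td>42</td></tr>
</tbody></table>
"""
soup = BeautifulSoup(html, "html.parser")

pairs = {}
for tr in soup.find("tbody").find_all("tr"):
    # No class needed: find() by tag name returns the first th/td in the row.
    pairs[tr.find("th").text.strip()] = tr.find("td").text.strip()
print(pairs)
```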
@MingoDiMedici 2 years ago ⁺¹
What program are you editing in on the screen?
@JohnWatsonRooney 2 years ago
This is VS Code, it’s free and very popular
@shacharbard1613 1 year ago
Great video John. Thanks!
Regarding extracting the teams and their current points, I tried "pl_points = row.find_all('td')[9].text" and it also worked.
Is this because what matters here is the "td" index, and the reason to include the class name is to make the code clearer?
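Editor's note: a small sketch of why both forms can give the same result. When every td in the row carries the class being filtered on, find_all returns the same list with or without the class_ argument, so the index picks the same cell. The HTML and class names below are made up; on a page where some cells lack the class, the two lists would differ and so could the indexed result.

```python
from bs4 import BeautifulSoup

# Made-up row: every td has the "cell" class, the first has an extra class too.
html = """
<table><tr>
  <td class="cell cell--name">Team A</td>
  <td class="cell">10</td>
  <td class="cell">21</td>
</tr></table>
"""
row = BeautifulSoup(html, "html.parser").find("tr")

by_index = row.find_all("td")[2].text                 # position only
by_class = row.find_all("td", class_="cell")[2].text  # filter, then position
print(by_index == by_class)
```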
@jordanleo 4 years ago ⁺²
What if the text we are looking for is not in a class, rather it is just South Australia? What would I do then?
@JohnWatsonRooney 4 years ago ⁺¹
Hi - does the table not have a name or an ID of some description? If not, you could try just using find_all for ‘table’ and see what you get. You can send me the URL if you’d like some more specific help
@jordanleo 4 years ago ⁺¹
John Watson Rooney thanks that worked! Great video!
@paulohsgoes1959 4 years ago ⁺¹
Excellent job, John. Congrats!
@ollie_har 3 years ago ⁺¹
Hi there, I am trying to scan in the second table on a website, and it has the same class as the first. How would I make clear to Python that I want the second table and not the first?
I have tried to add the title into another .find() field but it returns 'None'.
Thanks in advance!
@JohnWatsonRooney 3 years ago ⁺²
Hi! I think you need to either find the table element specifically by its class or ID, or use find_all and index the table you want using [0] or [1]
@ollie_har 3 years ago
@@JohnWatsonRooney ahhh thank you for your help. Excellent video by the way, very clear and really well explained!
@gatorpika 3 years ago ⁺¹
Thank you! This is just what I needed.
@ronitpithani2661 4 years ago ⁺¹
What if the HTML table is formatted like such
2020-08-07
10.09
0.00
@johnteres2339 4 years ago ⁺¹
I have the same problem. Did you find an answer?
@Ivan98ok 3 years ago ⁺¹
if td is None:
    continue
@JamesTangGunner 1 year ago ⁺¹
CMYG
We are top of the league! Say we are top of the league!
Very helpful video. Love it
@chamopediapedia4888 4 years ago ⁺¹
How do you download a table and convert it into a CSV?
@haithinhtran5108 2 years ago ⁺¹
If there is no class, then what do you do?
@JohnWatsonRooney 2 years ago ⁺¹
Use find_all on the tables, which will return a list you can index
@haithinhtran5108 2 years ago
@@JohnWatsonRooney thank you😘
@RocknRollDina 4 years ago ⁺¹
I used the Python shell and nothing ran unless I added [0] at the end of the leaque_table line. Do you know why?
@JohnWatsonRooney 4 years ago ⁺¹
The [0] is an index, which means you were getting a list back - maybe something has changed on the website since, but if it works that’s ok!
@RocknRollDina 4 years ago
@@JohnWatsonRooney thanks man
@drravindraboojhawon3832 3 years ago
How do I scrape data (tables) from a webpage that has different tabs, which are activated and present data only when you click them with your mouse? Thanks
@kevj2001 3 years ago
Hey, my soup is not able to find the required table, what to do :/ ?
@tomcat9761 3 years ago
Great video! I subscribed!
But what if I want to extract a specific range of columns? Say, out of 10 columns, I only want to extract column 1 to column 7?
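Editor's note: one way to keep only a range of columns is to slice the list of td cells in each row. The sketch below assumes a made-up row with ten cells and takes the first seven; the cell values c1 to c10 are placeholders.

```python
from bs4 import BeautifulSoup

# Made-up row with 10 cells, c1 to c10, standing in for a real table row.
cells_html = "".join(f"<td>c{i}</td>" for i in range(1, 11))
soup = BeautifulSoup(f"<table><tr>{cells_html}</tr></table>", "html.parser")

row = soup.find("tr")
cells = row.find_all("td")

# Python slices are zero-based and exclude the end, so [:7] is columns 1 to 7.
first_seven = [td.text for td in cells[:7]]
print(first_seven)
```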
@AmanpreetSinghCHD 4 years ago ⁺¹
Great, I was looking for something similar. I am having an issue exporting to a CSV. I am using csv.writer to export; is there a better way to do it?
@JohnWatsonRooney 4 years ago ⁺³
Hi! Sure, I use Pandas. It’s a library used for data science, but its DataFrames are really easy to export to CSV. I have a video on my channel explaining how I use it!
@AmanpreetSinghCHD 4 years ago ⁺¹
@@JohnWatsonRooney Thanks will look into it :) cheers
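Editor's note: a minimal sketch of the pandas route mentioned in this thread, assuming the scraped rows have already been collected as a list of dicts. The club names, points and the file name league_table.csv are made up for illustration.

```python
import pandas as pd

# Made-up rows standing in for scraped data: one dict per table row.
rows = [
    {"club": "Team A", "points": 30},
    {"club": "Team B", "points": 27},
]
df = pd.DataFrame(rows)

# With no path, to_csv returns the CSV as a string, which is handy for a quick check.
csv_text = df.to_csv(index=False)
print(csv_text)

# To write a file instead (hypothetical file name):
# df.to_csv("league_table.csv", index=False)
```

Passing index=False leaves out the DataFrame's row numbers, which are usually not wanted in the exported file.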
@geekboy77 3 years ago
What should I do if my table shows 50 rows per page and I want to extract all of its data?
@revill0 3 years ago ⁺¹
Very useful, you have helped me solve my issue, thanks!
@LLFRA 3 years ago
I keep getting the error: AttributeError: ResultSet object has no attribute 'find'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?
This happens when looking for tbody?
@nahakuu 4 years ago
I wonder how to get an HTML table that is behind a login. I am looking for a way to create a Python app to gather data from our database, as they do not want to give us access to SQL for easier data processing. I would use Python to gather all the data into Excel.
But I am failing to get to the table, as I need to log in first.
@TheSahil360 4 years ago ⁺¹
I got mine to work! How would you export the printout to a data table?
@JohnWatsonRooney 4 years ago
Great work! I use pandas - I have a video on exporting script to CSV on my channel that might help you
@robertocell3694 3 years ago
How can I extract information from this table
362198
@ayswaryagovindaraju2679 2 years ago
This video is helpful. I now want to save the above table into a dataframe. Do you have any video where data from HTML is made into a dataframe?
@claybyfrancisshop 1 year ago
df = pd.DataFrame({"column_name": values})
@prastutnepal7137 4 years ago
Great video! I tried doing the same thing on the Fantasy Premier League website, but when I viewed the page's source code, all I saw was cryptic lines of code, nothing like what I saw in your video, and I couldn't search for the HTML elements. The page's source included numbers separated by commas and nothing else. Please help me out on this.
@vamsi4864 4 years ago ⁺¹
Which editor are you using?
@JohnWatsonRooney 4 years ago ⁺¹
VS Code - code.visualstudio.com
It’s free and works on Windows, Linux and Mac. The only extension I use is the Python one; it’s easy to find and set up
@VincentComfy 3 years ago
I'm really sorry to ask but I'm at my wits' end. The HTML I'm trying to scrape has two tables but they have the same class name, and using find() only returns information from the first table. I can't use find_all() as it's throwing the following error:
"AttributeError: ResultSet object has no attribute 'find_all'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?"
How do I search through both tables? I don't know if it's just the HTML that I'm trying to scrape or what, but it's the most infuriating process for what should be a really easy project.
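Editor's note: that error usually means find_all() was called on the result of an earlier find_all(), which is a list-like ResultSet rather than a single tag. A possible fix, sketched below with made-up HTML and a hypothetical class name "data", is to loop over the ResultSet and call find_all() on each table in turn.

```python
from bs4 import BeautifulSoup

# Made-up HTML: two tables sharing the same (hypothetical) class name.
html = """
<table class="data"><tr><td>a1</td><td>a2</td></tr></table>
<table class="data"><tr><td>b1</td><td>b2</td></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

tables = soup.find_all("table", class_="data")  # ResultSet, behaves like a list
# tables.find_all("tr") would raise the AttributeError quoted above.

all_rows = []
for table in tables:                      # each item is a single Tag
    for row in table.find_all("tr"):      # so find_all works here
        all_rows.append([td.text for td in row.find_all("td")])
print(all_rows)
```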
@dzeykop 3 years ago ⁺¹
Thank you John, again great lecture
@sounakchatterjee9059 4 years ago ⁺¹
It was so easy to understand!! Thanks!
@shortsgrower 4 years ago
Thanks for the video. I am able to scrape the data from the table, but I am unable to parse the table information. All the item tags seem to be similar. Can I get your help?
@skunkfog1333 3 years ago ⁺¹
Thank you! This helped me a lot!
@JohnWatsonRooney 3 years ago
That’s great!
@sadoxy-sbac 3 years ago
Thanks a lot for this tutorial, but I have a problem in my project: the tds come back as an empty list. I have written the same code as you but I get an empty list.
@bngtnsnyndn8840 3 years ago ⁺¹
This is the website I tried to get the table data from: www.sahibinden.com/opel-omega
When I do the coding in Anaconda it doesn't give any error, but it also doesn't show any output. Do you think it might be because of the website?
@JohnWatsonRooney 3 years ago
Hey! That site has a login, so your code will only see the login window and not the actual data
@bngtnsnyndn8840 3 years ago
@@JohnWatsonRooney i got it.. thank you soooo much
@jatinyadav8960 3 years ago
I find your tutorial very helpful
@yogeshbane9647 2 years ago ⁺¹
What to do if the td tag has no class
@JohnWatsonRooney 2 years ago
Go back up the tree and find an element you can access easily, then index or find the td tag
@NarutoUzumaki-xn1pr 4 years ago
Hi John,
I faced the scenario below:
Ex: td has 'A' and it contains two values, 1 and 2,
but when I tried to print both values using find, it gives only 1 as output. Which means
A 1.
But I need the output as
A 1
A 2
Please let me know your thoughts
@sujithsaikalakonda4863 1 year ago ⁺¹
Great explanation.
@ugurdev 3 years ago ⁺¹
John, a fellow Arsenal fan too! Not good this season either! :(
@JohnWatsonRooney 3 years ago
It’s been difficult to watch!!
@penglipur_lara 4 years ago ⁺¹
Thanks man, you really helped me
@kodediego 4 years ago
Great video. I did this to send the data straight to a pandas table:
import pandas as pd

# l is assumed to be the league table element found earlier (not shown in this comment)
league_tb = []
for team in l.find_all('tbody'):
    rows = team.find_all('tr')
    for row in rows:
        pl_team = row.find('td', class_='standing-table__cell standing-table__cell--name').text.strip()
        pl_points = row.find_all('td', class_='standing-table__cell')[9].text

        lister = {  # create a dict with the scraped data
            'club': pl_team,
            'points': pl_points
        }
        league_tb.append(lister)  # add it to the list
table = pd.DataFrame(league_tb)
table.head(5)
@hadish4529 3 years ago
Hi,
thanks for your teaching.
I have a question: how can we scrape secure cookies from a website without using the selenium module?
For example, on this website, tat.exirbroker.com/mobile/index.html, I can only get 2 cookies, and I need to scrape all the cookies with the requests module.
Thanks for your help
@buttert5091 3 years ago ⁺¹
Thanks really helpful
@Yeeeeeehaw 2 years ago ⁺¹
Thank you!
@fleimbeck9384 4 years ago ⁺²
Thanks man !
@KayiEdits 4 years ago ⁺¹
you legend
@victormartinsdias9427 2 years ago ⁺¹
Very good
@domukelis 3 years ago
It only gives data for the 20th team, Norwich. What's up with that?
@ssr765 3 years ago
don't trust, doesn't work
@emmanuelolorunbogun772 3 years ago
Nice video John, I really love your approach and explanation. But how do I get the data from a particular row/column of a table if a class isn't defined under the td tag?
For example: the first table in this link en.wikipedia.org/wiki/List_of_African_countries_by_area
@bdcash 3 years ago
You can do table scraping really easily with pandas - th-cam.com/video/ODNMNwgtehk/w-d-xo.html (and it works fine with wiki pages. You just index the results with [0] or [1] etc. until you find the table you want)
@leonardoalvarado7632 3 years ago ⁺¹
Thank you, your video was very helpful!
@pini5076 1 year ago ⁺¹
Thank you!