Web Scraping: HTML Tables with Python

  • Published on 15 Dec 2024

Comments • 122

  • @rverm1000
    @rverm1000 3 years ago +17

    Your web scraping abilities are much better than the Udemy course I'm taking.

    • @dennyswandia4462
      @dennyswandia4462 3 years ago

      Hello John. Would you please do a video on scraping and extracting URLs, then following those URLs to extract information from them? I tried following your video on false plants, but I was scraping Amazon. My code kind of returns an empty CSV.

    • @armanikayden3337
      @armanikayden3337 3 years ago

      I guess I'm asking in the wrong place, but does anyone know a way to get back into an Instagram account?
      I stupidly lost the account password. I would appreciate any tricks you can give me!

    • @dennyswandia4462
      @dennyswandia4462 3 years ago

      @armanikayden3337 Do you remember your email address? You could just click "forgot password" and enter your email address. A link to reset your password will be sent to you.

  • @DJMrTen
    @DJMrTen 3 years ago +6

    I've been trying to figure this out for way too long. This quick vid got me rolling in under 15 minutes! Thank you John.

  • @e.eiyayi2381
    @e.eiyayi2381 2 years ago +4

    You were able to transfer your knowledge spot-on, which unfortunately too many geeks fail at.

  • @bencole8301
    @bencole8301 3 years ago +2

    Hi John, I took a Udemy course in Python and got slightly off track (I didn't really want to say bored, but now I have, oops!) about halfway through, but your videos are awesome. It feels like you're filling in all the gaps, and I'm enjoying the very straight-talking, no-messing approach you take to tutoring. Thanks for the videos.

    • @JohnWatsonRooney
      @JohnWatsonRooney 3 years ago +1

      Thanks Ben! I’m glad you’re enjoying my videos

  • @dhiahmila9549
    @dhiahmila9549 2 years ago +2

    You could also use pandas instead of a loop: pd.read_html

  • @ryan-zd4jm
    @ryan-zd4jm 3 years ago +1

    Thank you!! I love how you explain your mistakes in a way that actually makes sense.

  • @Guitarfreek27
    @Guitarfreek27 3 years ago +11

    Hello, first of all, great video. Question:
    The URL I am trying to scrape from keeps giving me "None" when I try to find the table class. Ever run into this problem? Any suggestions on what I can do to fix? Any help is much appreciated!
    My code is something like this:
    table = soup.find('table', class_ = 'class_in_question')

    • @gabefife951
      @gabefife951 2 years ago +1

      Same, I don’t know what to do

    • @kike210587
      @kike210587 1 year ago

      Same for me, man.

    • @krishnas8173
      @krishnas8173 3 months ago

      same problem
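
A minimal sketch of how the `None` result asked about in this thread could be diagnosed. It assumes the class name simply does not match what is in the downloaded HTML; the markup and the `standings-v2` class below are made up for illustration. Listing every table the parser actually sees usually shows whether the class is different, or whether the table is missing from the raw HTML altogether (in which case it is probably added later by JavaScript).

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the downloaded page source.
html = """
<table class="standings-v2">
  <tr><td>Arsenal</td><td>50</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns None when nothing matches; it does not raise an error.
table = soup.find("table", class_="class_in_question")

if table is None:
    # Collect the class attribute of every table that is really in the HTML.
    found_classes = [t.get("class") for t in soup.find_all("table")]
    print("No match. Classes of the tables present:", found_classes)
else:
    print(table.get_text(strip=True))
```

If the list comes back empty, the table is not in the HTML that was downloaded, so no class name will find it.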

  • @stonebanksyt
    @stonebanksyt 3 years ago +2

    Thanks for this tutorial. I was so confused earlier. I finished my project because of you. Thanks a lot!

  • @areijkandi5424
    @areijkandi5424 2 years ago +1

    Perfect. This is what I was looking for. Thank you so much. You deserve 1,000,000 likes.

  • @paulfearn7571
    @paulfearn7571 3 years ago +1

    Good work, John. A very clear, easy-to-follow video and voice-over. Thank you, keep up the great work.

  • @felipelandim2881
    @felipelandim2881 3 years ago +1

    Finally a Web Scraping of Soccer Tables!

  • @127bits7
    @127bits7 3 years ago +1

    dude this was the best video! thank you so much John!

  • @alphagam3r933
    @alphagam3r933 3 years ago +1

    love you sir literally love you, this video is very useful in my college project, next level video

  • @xedifice4421
    @xedifice4421 3 years ago +3

    Exactly what I needed, Thanks a lot!
    Go Gunners!

  • @cherico94
    @cherico94 4 years ago +2

    Thanks man. This really helped me out with what I was trying to figure out for days.

    • @JohnWatsonRooney
      @JohnWatsonRooney 4 years ago +1

      Thank you I’m happy it helped!

    • @mhm6
      @mhm6 4 years ago +1

      Same. I spent a whole day unable to extract data from websites with multiple tables because I didn't know how to access specific classes. Now I know it's class_=" ".
      Thanks!

  • @jimmykarago2598
    @jimmykarago2598 3 years ago +1

    Thanks for this I was able to scrape all the data I needed

  • @chamopediapedia4888
    @chamopediapedia4888 4 years ago +1

    How do you download a table and convert it into a CSV?

  • @aqibsunesara
    @aqibsunesara 2 years ago

    My HTML does not have a td class. The tbody has multiple tr elements. Each tr has a th and a td, and I want to extract the .text of each of those td elements. Can you help?
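
A minimal sketch for the layout described in this question, assuming each row holds exactly one `th` label and one `td` value and that neither carries a class. The sample markup and its values are invented for illustration. Rows are walked with `find_all('tr')`, and the two cells are picked by tag name alone.

```python
from bs4 import BeautifulSoup

# Hypothetical table: every row has one <th> label and one <td> value.
html = """
<table>
  <tbody>
    <tr><th>Name</th><td>Alice</td></tr>
    <tr><th>Age</th><td>30</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

data = {}
for row in soup.find("tbody").find_all("tr"):
    header = row.find("th")
    cell = row.find("td")
    # Skip rows that are missing either cell instead of crashing on None.
    if header is None or cell is None:
        continue
    data[header.text.strip()] = cell.text.strip()

print(data)
```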

  • @kevj2001
    @kevj2001 3 years ago

    Hey, my soup is not able to find the required table. What should I do?

  • @melih.a
    @melih.a 3 years ago +1

    When I type rows = team.find_all('tr'), it doesn't recognize .find_all. I don't know what I'm doing wrong here.

    • @JohnWatsonRooney
      @JohnWatsonRooney 3 years ago

      I think the website I used here has changed and won’t work with this method anymore, it’s not finding anything that matches your find_all

  • @JamesTangGunner
    @JamesTangGunner 1 year ago +1

    CMYG
    We are top of the league! Say we are top of the league!
    Very helpful video. Love it

  • @gawd2891
    @gawd2891 2 years ago +1

    How can I access a td that doesn't have any classes? The same goes for the table tag.

    • @JohnWatsonRooney
      @JohnWatsonRooney 2 years ago +1

      Use find_all to get all the tables, then index the one you want. The same goes for the td tags.

    • @gawd2891
      @gawd2891 2 years ago

      @JohnWatsonRooney Thank you so much John 👏
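
The advice in this thread (get every table with `find_all`, then index) can be sketched roughly as follows. The two class-less tables and their contents are made up, and which index holds the table you want is something to check by printing `len(tables)` on the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical page with two tables and no class or id on anything.
html = """
<table><tr><td>first table</td></tr></table>
<table>
  <tr><td>Adelaide</td><td>South Australia</td></tr>
  <tr><td>Perth</td><td>Western Australia</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns a list-like ResultSet, so tables can be picked by position.
tables = soup.find_all("table")
second = tables[1]

states = []
for row in second.find_all("tr"):
    cells = row.find_all("td")
    # The td tags are indexed the same way: cells[1] is the second column.
    states.append(cells[1].text.strip())

print(states)
```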

  • @karmantan
    @karmantan 2 years ago +1

    Hi, what happens when an element is not visible in the page source, but only when you click on inspect?

    • @JohnWatsonRooney
      @JohnWatsonRooney 2 years ago +1

      It’s loaded by something like Ajax via JavaScript - I have a few methods on my channel for this if you’d like to have a look, the newer ones are generally better

    • @karmantan
      @karmantan 2 years ago

      @JohnWatsonRooney Just watched the video! That was really helpful, thank you!

  • @geekboy77
    @geekboy77 3 years ago

    What should I do if my table shows 50 rows per page and I want to extract all of its data?

  • @jordanleo
    @jordanleo 4 years ago +2

    What if the text we are looking for is not in a class, but is just South Australia? What would I do then?

    • @JohnWatsonRooney
      @JohnWatsonRooney 4 years ago +1

      Hi - does the table not have a name or an ID of some description? If not you could try just using find all for ‘table’ and see what you get. You can send me the url if you’d like some more specific help

    • @jordanleo
      @jordanleo 4 years ago +1

      John Watson Rooney thanks that worked! Great video!

  • @drravindraboojhawon3832
    @drravindraboojhawon3832 3 years ago

    How can I scrape data (tables) from a webpage with different tabs that are activated, and show their data, only when you click them with the mouse? Thanks.

  • @shacharbard1613
    @shacharbard1613 1 year ago

    great video John. thanks!
    regarding extracting the teams and their current points, I tried "pl_points = row.find_all('td')[9].text" and it also worked.
    is this because what matters here is the "td" index? and the reason to include the class name is to have code which is clearer?
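
A small sketch that may help with this question, assuming (as in the class names quoted elsewhere in these comments) that every cell in a row shares the `standing-table__cell` class. The three-cell row below is a shortened, made-up stand-in for the real ten-column row. When every `td` carries the class, filtering by it returns the same list as filtering by tag alone, so only the index decides which cell comes back.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened row: all cells share one class.
html = """
<table><tbody><tr>
  <td class="standing-table__cell">1</td>
  <td class="standing-table__cell standing-table__cell--name">Arsenal</td>
  <td class="standing-table__cell">50</td>
</tr></tbody></table>
"""

soup = BeautifulSoup(html, "html.parser")
row = soup.find("tr")

# Same position in both lists, with and without the class filter.
by_tag = row.find_all("td")[2].text
by_class = row.find_all("td", class_="standing-table__cell")[2].text

print(by_tag, by_class)
```

Under that assumption the class filter adds readability rather than changing the result. It would start to matter if some rows contained extra `td` tags without the class.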

  • @nahakuu
    @nahakuu 4 years ago

    I wonder how to get an HTML table that sits behind a login. I am looking for a way to create a Python app to gather data from our database, as they do not want to give us SQL access for easier data processing. I would use Python to gather all the data into Excel.
    But I am failing to get to the table, as I need to log in first.

  • @MingoDiMedici
    @MingoDiMedici 2 years ago +1

    What program are you editing in on the screen?

    • @JohnWatsonRooney
      @JohnWatsonRooney 2 years ago

      This is VS Code. It's free and very popular.

  • @ronitpithani2661
    @ronitpithani2661 4 years ago +1

    what if the html table is formatted like such
    2020-08-07
    10.09
    0.00

    • @johnteres2339
      @johnteres2339 4 years ago +1

      I have the same problem. Did you find an answer?

    • @ivanarielrocha7026
      @ivanarielrocha7026 3 years ago +1

      if td is None:
          continue

  • @robertocell3694
    @robertocell3694 3 years ago

    how can I extract information from this table
    362198

  • @LLFRA
    @LLFRA 3 years ago

    I keep getting the error: AttributeError: ResultSet object has no attribute 'find'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?
    This happens when looking for tbody.

  • @prastutnepal7137
    @prastutnepal7137 4 years ago

    Great video! I tried doing the same thing on the Fantasy Premier League website, but when I viewed the page's source code, all I saw was cryptic lines of code, nothing like what I saw in your video, and I couldn't search for the HTML elements. The page's source included numbers separated by commas and nothing else. Please help me out with this.

  • @sadettinbacanli4895
    @sadettinbacanli4895 3 years ago

    Thanks a lot for this tutorial, but I have a problem in my project: tds comes back as an empty list. I have written the same code as you, but I get an empty list.

  • @paulohsgoes1959
    @paulohsgoes1959 4 years ago +1

    Excellent job, John. Congrats!

  • @haithinhtran5108
    @haithinhtran5108 2 years ago +1

    If there is no class, then what do you do?

    • @JohnWatsonRooney
      @JohnWatsonRooney 2 years ago +1

      Use find_all on the tables, which will return a list you can index.

    • @haithinhtran5108
      @haithinhtran5108 2 years ago

      @JohnWatsonRooney thank you😘

  • @gatorpika
    @gatorpika 3 years ago +1

    Thank you! This is just what I needed.

  • @tomcat9761
    @tomcat9761 3 years ago

    Great video! I subscribed!
    But what if I want to extract a specific range of columns? For example, out of 10 columns, I only want to extract columns 1 to 7.
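
One way to approach the column-range question, sketched under the assumption that every row has the same ten `td` cells in the same order. The table is generated with placeholder values such as `r1c1`, purely for illustration. Because `find_all('td')` returns a list, an ordinary Python slice picks the columns.

```python
from bs4 import BeautifulSoup

# Build a hypothetical 2-row, 10-column table with placeholder cell text.
rows_html = "".join(
    "<tr>" + "".join(f"<td>r{r}c{c}</td>" for c in range(1, 11)) + "</tr>"
    for r in range(1, 3)
)
html = f"<table>{rows_html}</table>"

soup = BeautifulSoup(html, "html.parser")

extracted = []
for row in soup.find_all("tr"):
    tds = row.find_all("td")
    # tds[:7] keeps columns 1 to 7 (indexes 0 to 6) and drops the rest.
    extracted.append([td.text for td in tds[:7]])

print(extracted)
```

A slice such as `tds[2:5]` would pick columns 3 to 5 in the same way.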

  • @francesboy2
    @francesboy2 3 years ago

    I'm really sorry to ask but I'm at wits end. The html I'm trying to scrape has two tables but they have the same class name, and using find() only returns information from the first table. I can't use find_all() as it's throwing the following error:
    "AttributeError: ResultSet object has no attribute 'find_all'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?"
    How do I search through both tables? I don't know if it's just the html that I'm trying to scrape or what, but it's the most infuriating process to what should be a really easy project.
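
A minimal sketch of one way around the error quoted in this comment, assuming two tables that share a class. The `stats` class and cell values are invented for illustration. `find_all` returns a `ResultSet`, which is a list of tags rather than a tag, so `.find()` or `.find_all()` has to be called on each table inside a loop, not on the list itself.

```python
from bs4 import BeautifulSoup

# Hypothetical page: two tables with the same class name.
html = """
<table class="stats"><tr><td>A</td></tr><tr><td>B</td></tr></table>
<table class="stats"><tr><td>C</td></tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# This is a ResultSet (a list of tags). Calling tables.find_all("tr")
# directly on it is what raises the AttributeError quoted above.
tables = soup.find_all("table", class_="stats")

all_cells = []
for table in tables:
    # Each item in the ResultSet is a single tag, so find_all works here.
    for row in table.find_all("tr"):
        all_cells.append(row.find("td").text)

print(all_cells)
```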

  • @ollie_har
    @ollie_har 3 years ago +1

    Hi there, I am trying to scrape the second table on a website, and it has the same class as the first. How would I tell Python that I want the second table and not the first?
    I have tried to add the title into another .find() call, but it returns None.
    Thanks in advance!

    • @JohnWatsonRooney
      @JohnWatsonRooney 3 years ago +2

      hi! I think you need to either find the table element specifically by its class or ID, or find_all and index the table you want using [0] or [1]

    • @ollie_har
      @ollie_har 3 years ago

      @JohnWatsonRooney ahhh thank you for your help. Excellent video by the way, very clear and really well explained!

  • @shortsgrower
    @shortsgrower 4 years ago

    Thanks for the video. I am able to scrape the data from the table, but I am unable to parse the table information. All the item tags seem to be similar. Can I get your help?

  • @NarutoUzumaki-xn1pr
    @NarutoUzumaki-xn1pr 4 years ago

    Hi John,
    I ran into the scenario below.
    Example: a td has 'A' and it contains two values, 1 and 2,
    but when I tried to print both values using find, it gives only 1 as output, which means
    A 1
    But I need the output to be
    A 1
    A 2
    Please let me know your thoughts.

  • @RocknRollDina
    @RocknRollDina 4 years ago +1

    I used the Python shell, and nothing ran unless I added [0] at the end of the league_table line. Do you know why?

    • @JohnWatsonRooney
      @JohnWatsonRooney 4 years ago +1

      The [0] is an index which means you were returning a list back - maybe something has changed on the website since, but if it works that’s ok!

    • @RocknRollDina
      @RocknRollDina 4 years ago

      @JohnWatsonRooney thanks man

  • @revill0
    @revill0 3 years ago +1

    Very useful, you have helped me solve my issue, thanks!

  • @kodediego
    @kodediego 4 years ago

    Great video. I did this to send the data straight to a pandas table:
    import pandas as pd

    # l is the league table element found earlier (not shown in this comment)
    league_tb = []
    for team in l.find_all('tbody'):
        rows = team.find_all('tr')
        for row in rows:
            pl_team = row.find('td', class_='standing-table__cell standing-table__cell--name').text.strip()
            pl_points = row.find_all('td', class_='standing-table__cell')[9].text

            lister = {  # create a dict with the scraped data
                'club': pl_team,
                'points': pl_points
            }
            league_tb.append(lister)  # add it to the list
    table = pd.DataFrame(league_tb)
    table.head(5)

  • @yogeshbane9647
    @yogeshbane9647 2 years ago +1

    What should I do if the td tag has no class?

    • @JohnWatsonRooney
      @JohnWatsonRooney 2 years ago

      Go back up the tree and find an element you can access easily, then index or find the td tag.

  • @TheSahil360
    @TheSahil360 4 years ago +1

    I got mine to work! How would you export the printout to a data table?

    • @JohnWatsonRooney
      @JohnWatsonRooney 4 years ago

      Great work! I use pandas - I have a video on exporting script to csv on my channel that might help you

  • @skunkfog1333
    @skunkfog1333 3 years ago +1

    Thank you! This helped me a lot!

  • @dzeykop
    @dzeykop 3 years ago +1

    Thank you John, again great lecture

  • @ayswaryagovindaraju2679
    @ayswaryagovindaraju2679 2 years ago

    This video is helpful. I want to now save the above table into a dataframe. Do you have any video where data from HTML is made into a dataframe?

    • @claybyfrancisshop
      @claybyfrancisshop 1 year ago

      df = pd.DataFrame({"column_name": values})

  • @AmanpreetSinghCHD
    @AmanpreetSinghCHD 4 years ago +1

    Great, I was looking for something similar. I am having an issue exporting to a CSV. I am using csv.writer to export; is there a better way to do it?

    • @JohnWatsonRooney
      @JohnWatsonRooney 4 years ago +3

      Hi! Sure, I use pandas. It's a library used for data science, but its DataFrames are really easy to export to CSV. I have a video on my channel explaining how I use it!

    • @AmanpreetSinghCHD
      @AmanpreetSinghCHD 4 years ago +1

      @JohnWatsonRooney Thanks, will look into it :) cheers
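
A rough sketch of the pandas route mentioned in this thread, not necessarily the exact approach from the video. It assumes the scraped rows have already been collected as a list of dicts; the club names and points below are placeholder values, and `league_table.csv` is a hypothetical file name.

```python
import pandas as pd

# Placeholder rows standing in for scraped data (values are made up).
league_tb = [
    {"club": "Arsenal", "points": "50"},
    {"club": "Norwich City", "points": "21"},
]

df = pd.DataFrame(league_tb)

# With no path, to_csv returns the CSV text, which is handy for a quick check.
csv_text = df.to_csv(index=False)
print(csv_text)

# To write a file instead, pass a path (hypothetical name):
# df.to_csv("league_table.csv", index=False)
```

Passing `index=False` leaves out the DataFrame's row numbers, which are rarely wanted in the exported file.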

  • @vamsi4864
    @vamsi4864 4 years ago +1

    Which editor are you using?

    • @JohnWatsonRooney
      @JohnWatsonRooney 4 years ago +1

      VS Code - code.visualstudio.com
      It's free and works on Windows, Linux and Mac. The only extension I use is the Python one; it's easy to find and set up.

  • @jatinyadav8960
    @jatinyadav8960 3 years ago

    I find your tutorial very helpful

  • @sounakchatterjee9059
    @sounakchatterjee9059 3 years ago +1

    it was so easy to understand!! thanks!

  • @ugurdev
    @ugurdev 3 years ago +1

    John, a fellow Arsenal fan too! Not good this season either! :(

  • @bngtnsnyndn8840
    @bngtnsnyndn8840 3 years ago +1

    This is the website I tried to get the table data from: www.sahibinden.com/opel-omega
    When I run the code in Anaconda, it doesn't give any error, but it also shows no output. Do you think it might be because of the website?

    • @JohnWatsonRooney
      @JohnWatsonRooney 3 years ago

      hey! that site has a login, so your code will only see the login window and not the actual data

    • @bngtnsnyndn8840
      @bngtnsnyndn8840 3 years ago

      @JohnWatsonRooney I got it. Thank you soooo much!

  • @sujithsaikalakonda4863
    @sujithsaikalakonda4863 1 year ago +1

    Great explanation.

  • @hadish4529
    @hadish4529 3 years ago

    Hi,
    thanks for your teaching.
    I have a question: how can we scrape secure cookies from a website without using the selenium module?
    For example, on the website tat.exirbroker.com/mobile/index.html I can only get 2 cookies, and I need to scrape all the cookies with the requests module.
    Thanks for your help.

  • @Yeeeeeehaw
    @Yeeeeeehaw 2 years ago +1

    Thank you!

  • @penglipur_lara
    @penglipur_lara 4 years ago +1

    Thanks man, you really helped me.

  • @victormartinsdias9427
    @victormartinsdias9427 2 years ago +1

    Very good

  • @buttert5091
    @buttert5091 3 years ago +1

    Thanks really helpful

  • @fleimbeck9384
    @fleimbeck9384 4 years ago +2

    Thanks man !

  • @domukelis
    @domukelis 3 years ago

    It only gives data for the 20th team, Norwich. What's up with that?

  • @KayiEdits
    @KayiEdits 4 years ago +1

    you legend

  • @ssr765
    @ssr765 3 years ago

    don't trust, doesn't work

  • @emmanuelolorunbogun772
    @emmanuelolorunbogun772 3 years ago

    Nice video John, I really love your approach and explanation. But how do I get the data from a particular row/column of a table if a class isn't defined under the td tag?
    For example: the first table in this link en.wikipedia.org/wiki/List_of_African_countries_by_area

    • @bdcash
      @bdcash 3 years ago

      You can do table scraping really easily with pandas - th-cam.com/video/ODNMNwgtehk/w-d-xo.html (and it works fine with wiki pages. You just index the results with [0] or [1] etc until you find the table you want)

  • @leonardoalvarado7632
    @leonardoalvarado7632 3 years ago +1

    Thank you your video was very helpful!

  • @pini5076
    @pini5076 1 year ago +1

    Thank you!