Web Scraping Football Matches From The EPL With Python [part 1 of 2]

  • Published on 15 Sep 2024

Comments • 233

  • @JoaoSantos-jb7ul
    @JoaoSantos-jb7ul 2 years ago +5

    Nice explanations, Vikas! The combination requests + Beautiful Soup + pandas is fantastic! Thanks! Greetings from São Paulo, Brazil!

  • @jonathanchagolla9217
    @jonathanchagolla9217 2 years ago +20

    Love your teaching style. Thanks for this content!

    • @Dataquestio
      @Dataquestio  2 years ago

      Thanks, Jonathan! -Vik

  • @nikolavladimirov8838
    @nikolavladimirov8838 9 months ago +4

    Outstanding tutorial with concise explanations for each line of code! Great for both beginners and advanced pandas users.

  • @mementomori8856
    @mementomori8856 2 years ago +5

    I've always wanted to work on a project on football since it's my favorite sport, this is a good starting point. Love your pace as well 🙏🏽.

    • @35162me
      @35162me 6 months ago

      Hello! A year later - how is the project coming along? Just an interested party.

  • @imfrshlikeuhh
    @imfrshlikeuhh 2 years ago +7

    Really really enjoy your content. Love the examples. Love the teaching style. Love the explanations.

  • @everflores9484
    @everflores9484 1 year ago

    Something I did that may be useful for other people: I added a comment before every line/block to tell future me what I was doing.
    Great video!

  • @4tifk
    @4tifk 2 months ago

    Thank you Vikas Paruchuri... this video saved me... greetings from Pakistan... teaching style very good!!!

  • @samcrowson167
    @samcrowson167 1 year ago +6

    This is a great tutorial. I tried following along, but instead of team stats I tried extracting player stats for the season. I fell over on the last hurdle of the loop, but I'm going to give it another go this evening. Great content, thank you.

    • @thomasyusuf1366
      @thomasyusuf1366 1 year ago

      did you ever figure it out?

    • @sebbyclarke2304
      @sebbyclarke2304 8 months ago

      YEAH PLEASE LMK

    • @samcrowson167
      @samcrowson167 6 months ago

      @sebbyclarke2304 Hi, I used code like the below to complete the loop at the end of the script. You should be able to follow the video and amend the teams links with players, then apply something similar as the final step.
      import time
      import requests
      import pandas as pd

      all_frames = []
      for squad_url in squad_urls:
          player_name = squad_url.split("/")[-1].replace("-Match-Logs", "").replace("-", " ")
          data = requests.get(squad_url)
          individual_matches = pd.read_html(data.text, match="2005-2006 Match Logs")[0]
          individual_matches.columns = individual_matches.columns.droplevel()
          individual_matches = individual_matches[individual_matches["Comp"] == "Premier League"]
          individual_matches["Player"] = player_name
          all_frames.append(individual_matches)
          time.sleep(1)
      combined_df = pd.concat(all_frames)

  • @migi7787
    @migi7787 2 years ago +2

    Wonderful teaching, wonderful project, so easy to access the knowledge, THANK YOU!!!😊

    • @Dataquestio
      @Dataquestio  2 years ago

      Glad you liked it :) - Vik

  • @waves3188
    @waves3188 2 years ago +3

    Tip: When web scraping, assign the html code to a variable or copy it to a notepad as a text file before the site you're working with kicks you out for exceeding max requests.
    Learned this the hard way lol 🥴

    • @Dataquestio
      @Dataquestio  2 years ago +1

      Great tip! I personally like to cache files where I can (just save them as html files) and then load from disk if I need to. -Vik
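Vik's caching approach can be sketched like this (a minimal sketch under assumptions: `get_html` and `slugify` are hypothetical helper names, not code from the video):

```python
import os
import re
import requests

CACHE_DIR = "html_cache"

def slugify(url):
    # Turn a URL into a safe filename, e.g. "https://fbref.com/en/..." -> "https_fbref_com_en_....html"
    return re.sub(r"[^0-9A-Za-z]+", "_", url).strip("_") + ".html"

def get_html(url):
    # Serve from the on-disk cache when possible; otherwise fetch once and save,
    # so repeated notebook runs don't hammer the site.
    os.makedirs(CACHE_DIR, exist_ok=True)
    path = os.path.join(CACHE_DIR, slugify(url))
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return f.read()
    html = requests.get(url).text
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
    return html
```

Deleting the cache directory forces a fresh download on the next run.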

    • @Yahik00
      @Yahik00 1 year ago

      How long does it block?

  • @benjaminhorn8420
    @benjaminhorn8420 2 years ago +1

    This was very useful! Thank you. I also had issues with the Premier League data, so scraped La Liga instead, which worked fine. Will now attempt to follow the second part!

  • @rodneymawero9063
    @rodneymawero9063 2 years ago

    Adding l = links at 8:22
    saved the day! Thanks for the video!

  • @RubensBarrichello.
    @RubensBarrichello. 6 months ago +2

    I went at it with a different approach. I started with the year I wanted to start with and did 'next season'; that way the dataframe is in chronological order. Otherwise it would read August 2022 to May 2022, and then the previous season is scraped, so August 2021 to May 2021 follows.

  • @DamilolaAyodele-wq1su
    @DamilolaAyodele-wq1su 1 year ago

    Thank you so much! I've been putting off scraping data online forever. Finally did it, thanks to you.

  • @kevinr662
    @kevinr662 11 months ago

    You are a good teacher, clear and precise, and I wish you all the success in the world. Thanks for the info.

  • @principeabel
    @principeabel 2 years ago +1

    All your videos have helped me a lot.
    Thank you very much for your videos, I learn a lot.
    Thank you for this content that you upload 😊

  • @bencole8301
    @bencole8301 3 months ago

    Really enjoyed this walkthrough! Thank you for sharing!

  • @williehogan1822
    @williehogan1822 2 years ago

    Excellent content and super teaching style. Thank you for sharing. Keep it going, it's very much appreciated.

  • @TrixsterProductions
    @TrixsterProductions 12 days ago +1

    Regarding the standings_table = soup.select('table.stats_table')[0] "IndexError: list index out of range" error - fbref limits scraping by blocking users who send more than one request every three seconds, so I think it is important to use the time.sleep function. If you get this error (like me), I believe you just have to wait some time. But I will update if this works.
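The wait-and-retry idea described above can be sketched as follows (assumptions: `polite_get` and `backoff_delays` are made-up helper names, and the one-request-every-three-seconds figure is the commenter's claim, not something fbref documents here):

```python
import time
import requests

def backoff_delays(retries=3, base_delay=5.0):
    # Linear backoff: wait longer after each failed attempt
    return [base_delay * (attempt + 1) for attempt in range(retries)]

def polite_get(url, retries=3, base_delay=5.0):
    # Retry with growing pauses so we stay under the site's rate limit
    resp = None
    for delay in backoff_delays(retries, base_delay):
        resp = requests.get(url)
        if resp.status_code == 200:
            return resp
        time.sleep(delay)  # blocked or throttled: wait, then try again
    resp.raise_for_status()  # give up and surface the HTTP error
    return resp
```

Replacing bare `requests.get(...)` calls in the scraping loop with something like this makes the "wait some time" advice automatic.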

  • @AndrewGuimeres
    @AndrewGuimeres 1 year ago

    Thanks for the tutorial. It was really easy to follow. keep up the good work. Cheers!

  • @rudraparikh4115
    @rudraparikh4115 11 months ago +4

    standings_table = soup.select('table.stats_table')[0]
    getting list index out of range error.
    Please help me

    • @joeguerby
      @joeguerby 7 months ago

      I got the same issue; it seems that the HTML structure has changed @Dataquestio

    • @AndrewPutraHartanto
      @AndrewPutraHartanto 5 months ago

      @joeguerby Do you have a solution with the new HTML?

  • @josephchoi7362
    @josephchoi7362 2 years ago

    You have THE most soothing voice

  • @andynos
    @andynos 2 years ago

    Thanks man!!! You are doing great. Very interesting to watch your videos.

  • @Chariotzable
    @Chariotzable 2 years ago +1

    You are a great teacher. Thank you so much for sharing. When should we expect part 2?

    • @Dataquestio
      @Dataquestio  2 years ago

      Thanks a lot! Part 2 is actually live here - th-cam.com/video/0irmDBWLrco/w-d-xo.html .

  • @tamzen7945
    @tamzen7945 2 years ago

    Thanks for the motivation. I wasn't sure if I could do it, but I might try it eventually.

  • @simmiesanya6003
    @simmiesanya6003 1 year ago

    This is really awesome. I learnt a lot.
    I'm having issues scraping multiple years though. Something about the remote host cutting off the connection.

  • @Yahik00
    @Yahik00 1 year ago +4

    Hey VIK I recently came across this video. I found it very helpful, and I'm trying to extend it to include the other tables as well. However, I've encountered some difficulties in retrieving the other tables using the approaches you mentioned in the code. I've tried searching for specific URLs or identifiers, but I haven't been successful so far. I was wondering if you could kindly provide an example code snippet that demonstrates how to add the passing table or any other table from the website.

  • @kurkdebraine8139
    @kurkdebraine8139 1 year ago

    Perfectly explained. TY a lot! :)

  • @ameybikram5781
    @ameybikram5781 2 years ago

    Wow, thank you so much. You made web scraping look so easy.

  • @04mdsimps
    @04mdsimps 2 years ago

    Well that's my day sorted. Kudos sir

  • @kennyquango76
    @kennyquango76 1 year ago

    This is an excellent tutorial. Thank you very much!

  • @roopeshpyneni5496
    @roopeshpyneni5496 9 months ago

    Nice explanation! Really helped me a lot!

  • @SuperSumittanwar
    @SuperSumittanwar 2 years ago

    Awesome content, and nice new mic that you have now 👌

  • @jaikermontoya8891
    @jaikermontoya8891 2 years ago

    Thank you very much. I have learned sooo much with this video.

    • @Dataquestio
      @Dataquestio  2 years ago

      Glad it was helpful! -Vik

  • @ryosato7558
    @ryosato7558 2 years ago +11

    Hi guys, I was having the "no tables found" error too, and analyzing the code I noticed that the error was in data.text, where the page was blocking the request. So I just increased the time.sleep to 5 and put another time.sleep where we request the shooting dataframe. The code will be very slow, but it works. Hope it helps!!

    • @vivekaugustine9583
      @vivekaugustine9583 2 years ago

      Thank you for that, it has helped me heaps since I found the same problem.
      How long did the code take to respond?

    • @Dataquestio
      @Dataquestio  2 years ago +2

      Thanks for the solution, Ryo!

    • @g18ytstar34
      @g18ytstar34 1 year ago

      Sorry, I have this issue too but I don't understand how to get through it? Can you help?

  • @asdfgh6906
    @asdfgh6906 2 years ago

    I love your explanations

  • @FelixOnyango-o6o
    @FelixOnyango-o6o 8 months ago

    Sir, this is a great video. It is helping me get started in web scraping. You didn't close the parentheses in your last long code block with the try/except part (31:30).

  • @danielcharles1086
    @danielcharles1086 1 year ago

    One can notice your mastery of the subject throughout. Thank you, I will be following the other tutorials.

  • @carlgreener1728
    @carlgreener1728 2 years ago +3

    Hi Vik, Thanks for this. I get an error in the for loop stating that the 'list index out of range' for the 'standings_table = soup.select('table.stats_table')[0]' line. I've reviewed against the code in github and there aren't any differences. Can you help please?

    • @cregv
      @cregv 2 years ago

      I think it is site security or popups

    • @Dataquestio
      @Dataquestio  2 years ago

      This would happen when there is no table in the html you downloaded. You might want to try rendering the html (save it to a file and open it in a browser) to see what the issue is. There could be an issue with rate limiting or another site issue causing problems with the html. -Vik

    • @lordrahl372
      @lordrahl372 2 years ago +2

      I ran into the same issue when attempting more than 2 years of seasons, and it seems to be working if you import the time module and place the following code: "time.sleep(5)" under "soup = BeautifulSoup(data.text)".
      I think what is happening is the website is blocking us from doing too many requests. Time.sleep(5) delays the scraping process, thus limiting too many requests at once.

    • @user-cg6os8qt6u
      @user-cg6os8qt6u 1 year ago

      @lordrahl372 Thank you so much bro, that code helped to solve this issue.

    • @adrianbusuttil3012
      @adrianbusuttil3012 6 months ago

      @lordrahl372 I did this and it worked like a charm - thanks

  • @abdulmalikbello538
    @abdulmalikbello538 2 years ago +1

    Amazing content! Thanks a lot.
    I noticed that the shooting data has been summarized as of today (10/05/22); it is no longer a detailed match-by-match table.

    • @Dataquestio
      @Dataquestio  2 years ago

      Thanks, Abdulmalik! That's too bad about the shooting data on fbref. Hopefully it is a temporary bug, and will be fixed.

    • @joeguerby
      @joeguerby 1 year ago +1

      @Dataquestio In fact this is not a bug; the code actually extracts the sum of the shots of the match, not the number of shots per team. I have revised the code so that I get the stats of the team and the stats of the team's opponents. This is the code for scraping shots by team and opponent:
      teamshooting = pd.read_html(data4.text, match="Shooting")[0]
      oppshooting = pd.read_html(data4.text, match="Shooting")[1]
      teamshooting.head()
      oppshooting.head()

  • @saladin2020
    @saladin2020 1 year ago +1

    May I know why there is quite often an error on the class name of table.stats_table while using the CSS selector?
    standings_table = soup.select('table.stats_table')[0]

  • @Qhorin
    @Qhorin 1 year ago

    Super cool, thank you!!

  • @prawson81
    @prawson81 1 year ago

    Brilliant. I too am getting value errors; just trying the time adjustments now.

  • @shahabasmuhammed7523
    @shahabasmuhammed7523 1 year ago +1

    links gives me an empty list. Can anyone help me with this?

  • @sigfigronath
    @sigfigronath 4 months ago

    we need another EPL video :)

  • @judechi2652
    @judechi2652 1 year ago +1

    Hi Vik and everyone else, I have an issue which I'm hoping anyone can help me fix: on trying to concatenate all_matches with the code match_df = pd.concat(all_matches), the error message is that there's nothing to concatenate.

  • @thewebscrapingclub
    @thewebscrapingclub 1 year ago

    Great tutorial, thanks.

  • @majidmenouar2444
    @majidmenouar2444 2 years ago +1

    Great rhythm. When is the next video coming out, please?

    • @Dataquestio
      @Dataquestio  2 years ago +1

      We'll be releasing the next video on Monday. -Vik

  • @enochtolulope15
    @enochtolulope15 2 years ago +1

    Thank you for this tutorial. However, I ran into errors that I couldn't solve. I tried concatenating the dataframes using "pd.concat(all_matches)" but I keep getting "ValueError: No objects to concatenate". What could be the issue?

    • @Dataquestio
      @Dataquestio  2 years ago

      Hi Enoch - this will happen if the `all_matches` list is empty. Are you sure you're appending the match data to the list? The code is here if you want to check - github.com/dataquestio/project-walkthroughs/blob/master/football_matches/scraping.ipynb

  • @alessoclass3929
    @alessoclass3929 2 years ago

    Waiting part 2 :)

    • @Dataquestio
      @Dataquestio  2 years ago

      Part 2 is live! It's at th-cam.com/video/0irmDBWLrco/w-d-xo.html .

  • @lordrahl372
    @lordrahl372 2 years ago +2

    I came back to this tutorial hoping to continue this web scraping project. I started from scratch in a new notebook so I could understand it better, however I am getting this error:
    matches = pd.read_html(data.text, match="Scores & Fixtures")[0]
    ValueError: No tables found
    At first I thought it was a typo due to my own fault, however I went back to my old notebook file, and I remember I was able to execute the code and create a new file with the merged data.
    My old notebook file had the same errors ("no tables found"). I even went onto the dataquest github repo and cloned the notebook files for this tutorial. I ran the code and got the same value errors. Not sure what to do at this point and I have been trying to figure it out all day.

    • @robertooliveira8736
      @robertooliveira8736 2 years ago

      did you manage to solve the problem?

    • @lordrahl372
      @lordrahl372 2 years ago

      @campos I haven't had too much time lately. I ran the code again on Sunday, but it returned the same error. I've been trying to think of a solution while I am doing other things, but unfortunately I can't think of anything except trying a different scraping method other than requests.

    • @lordrahl372
      @lordrahl372 2 years ago +1

      @robertooliveira8736 Haven't yet, they may have updated the website or something, because it worked a month ago. Strangely, retrieving the table by itself outside of the loop works.
      Unfortunately I am still learning web scraping, but I thought about trying out scrapy (another web scraper).

    • @Dataquestio
      @Dataquestio  2 years ago

      Hey, sorry to hear about the issue. This will happen if the full page content wasn't scraped. This could happen for a few reasons - the site is down, the site has blocked you, or the content has changed. I think the site may be blocking people. I'll look into this soon, and will try to post a solution.
      One way around this in the meantime is to use a headless browser instead of downloading the html with requests. There is a video on how to use a headless browser (playwright) here - th-cam.com/video/SJ7xnhSLwi0/w-d-xo.html .
      -Vik

    • @robertooliveira8736
      @robertooliveira8736 2 years ago

      @campos Hello.
      Thanks for the feedback.
      I verified that when using 'sleep(1)' it gets blocked, generating the error,
      so I put 'sleep(15)' and now it runs normally.

  • @satyajitpaul339
    @satyajitpaul339 2 years ago +1

    informative...thanks

    • @Dataquestio
      @Dataquestio  2 years ago

      Glad you liked it! -Vik

  • @tomaszd1875
    @tomaszd1875 8 months ago +1

    Hi, great tutorial. Just wondering why table = soup.select("table.stats_table") is returning an empty list? When I use index 0 it tells me that the list index is out of range. It worked well until I wanted to scale and had finished all the code in the tutorial.

    • @tomaszd1875
      @tomaszd1875 8 months ago

      @lordrahl372 Thanks for your comment to another user. I am sorted.

    • @AndrewPutraHartanto
      @AndrewPutraHartanto 5 months ago

      @tomaszd1875 Do you have a solution?

  • @MaartenRobaeys
    @MaartenRobaeys 1 year ago

    Hi, extremely valuable. Where can I find part 2, please? Thanks

  • @MrWopper7
    @MrWopper7 2 years ago

    Awesome vid man thanks!! when is part 2 coming out? :D

    • @Dataquestio
      @Dataquestio  2 years ago

      It came out today! You can find it here - th-cam.com/video/0irmDBWLrco/w-d-xo.html .

  • @Hkillelea0924
    @Hkillelea0924 8 months ago

    At the end of the code it doesn't return anything for me for len(all_matches).
    Also, the tables didn't print out at the end when I typed in match_df.

  • @ditirorampate502
    @ditirorampate502 1 year ago

    When trying to scrape seasons from 2016, there is a KeyError: "['FK'] not in index". I don't know what causes it. What might be the problem?

  • @matthewmoore8445
    @matthewmoore8445 1 year ago

    Is anyone able to explain to me why the code that was utilized in the project does not extract future matches? Been banging my head off the wall on how to get these future matches in and I cannot figure out why.

  • @luvlifereal4023
    @luvlifereal4023 6 months ago

    @Dataquestio In your web scraping for the Premier League, after the request and data.text nothing happened. I followed your video. Or is it because I'm using Visual Studio Code and you use Jupyter?

  • @chadrickclarke1730
    @chadrickclarke1730 2 years ago

    Hey, thanks for the video. Would you be able to give some guidance on how to pull the info from the match report ?

  • @temitopeayoade5924
    @temitopeayoade5924 2 years ago

    Hello, when running the for loop I am getting "No tables found". I have checked the code on GitHub, everything is the same. Please help...

  • @miiyyke
    @miiyyke 1 year ago

    Please 🙏🏾 I'm getting an error once I reach:
    import pandas as pd
    matches = pd.read_html(data.text, match="Scores & Fixtures")

  • @suhaas1709
    @suhaas1709 7 months ago

    I get a "'html' is not defined" error at 27:30. Would really appreciate any help with this issue.

  • @2ncielkrommalzeme210
    @2ncielkrommalzeme210 9 months ago

    Can we apply these requests to horse racing, for each horse, to investigate their performance and predict their future tendencies?

  • @hichamelkaissi7786
    @hichamelkaissi7786 2 years ago +1

    Thank you for your tutorial. Unfortunately, I increased the number of years to 10 and got blocked by the website after scraping just the first year.

    • @Dataquestio
      @Dataquestio  2 years ago +1

      Hi Hicham - that's too bad - upping the delay in between requests with `time.sleep(10)` could help. I may also post a tutorial later about how you can do this with a headless browser framework like playwright.

    • @hichamelkaissi7786
      @hichamelkaissi7786 2 years ago +1

      @Dataquestio Hello Dataquest! Thank you for taking the time to reply. I think everyone will appreciate a tutorial on a headless browser framework. I tried to use Scraper API. It works for a few iterations but then breaks. I will try to up the sleep time as you mentioned. Thanks again for your time.

  • @mainacyrus74
    @mainacyrus74 11 months ago

    I really love your video. I have a question: I tried scraping two football sites and comparing the data, but it's becoming tricky as both websites have different naming for the same team. How can I resolve that issue?

  • @bhargavpandit2300
    @bhargavpandit2300 7 months ago

    FBref just doesn't allow me to scrape data anymore?? I always get a 403 status code back. Anyone else facing the same problem? What can I do to fix it?

  • @sheetcreate9016
    @sheetcreate9016 1 year ago

    team_data = matches.merge(shooting[["Date", "Sh", "SoT", "Dist", "FK", "PK", "PKatt"]], on="Date") got an error:
    AttributeError: 'list' object has no attribute 'merge'
    How do I fix this error?

  • @joeguerby
    @joeguerby 1 year ago

    Thanks for this amazing video. I got an error on the all_matches.append(team_data) line: all_matches is not defined. How can you help me fix it? Please.

  • @rezarafieirad
    @rezarafieirad 2 years ago

    Thanks a lot. Very nice explanation. Where is part 2?

    • @Dataquestio
      @Dataquestio  2 years ago +1

      You can find part 2 at th-cam.com/video/0irmDBWLrco/w-d-xo.html .

  • @cuttell2000
    @cuttell2000 1 year ago

    Great video. Is there a part 2?

  • @hamzaelmi5584
    @hamzaelmi5584 6 months ago

    Is it possible to use PyCharm / VS Code for this project? Not that familiar with Jupyter / Google Colab.

  • @hades7167
    @hades7167 6 months ago +1

    Hi, can you do it for the 2024 table?

  • @dcr7417
    @dcr7417 2 years ago +4

    Hi, thanks for this video! As others have mentioned - great teaching style.
    I'm getting an error with the final for loop. It's something to do with:
    matches = pd.read_html(data.text, match="Scores & Fixtures")[0]
    or
    shooting = pd.read_html(data.text, match="Shooting")[0]
    I get this error:
    ValueError: No tables found
    Anyone got any ideas?

    • @principeabel
      @principeabel 2 years ago

      No, I also get that error

    • @Dataquestio
      @Dataquestio  2 years ago +4

      This would happen when you don't get any data back from the server. I've heard about some issues people have when the time.sleep() is too short. If there are too many requests too quickly, the server will stop returning results. Try changing it to time.sleep(10) to pause longer between requests. That might fix it. -Vik

    • @SkyyJames7
      @SkyyJames7 2 years ago

      @principeabel Yeah, I'm getting the same errors too. I tried to add another time.sleep under shooting but it isn't working. The website may have changed something.

    • @viniccciuz
      @viniccciuz 2 years ago

      I was having the exact same error, increasing time.sleep() to 10 worked for me

  • @DozieOhiri
    @DozieOhiri 1 year ago

    Please can anyone explain why at 6:50 he only calls the first index of the standings table?

  • @kmind71
    @kmind71 1 year ago

    I'm still trying to understand at around 16:20 when you do a List Comprehension as links = [l for l in links if l and 'all_comps/shooting/' in l] you have to add the "if l" portion of the condition. I know you mentioned that you add it because some of the list items don't have an 'href' but it's still not clicking for me. Any chance you or someone could please go into detail a tad more? Thanks so much!

    • @Dataquestio
      @Dataquestio  1 year ago +1

      This filters out any cases when l is None. So if there is no href, then None will be assigned to l, and we can filter it out with this list comprehension.
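A tiny worked example of why the `if l` guard matters (the link values here are made up for illustration):

```python
# links as returned by l.get("href"): tags without an href yield None
links = ["/en/squads/18bb7c10/Arsenal-Stats", None, "/en/about", None]

# "if l" drops the None entries before the substring test runs;
# without it, None would reach `'/squads/' in l` and raise
# TypeError: argument of type 'NoneType' is not iterable
squad_links = [l for l in links if l and "/squads/" in l]
```

After this, `squad_links` contains only the squad URL; the `None` entries and the unrelated link are filtered out.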

    • @kmind71
      @kmind71 1 year ago

      @Dataquestio Thank you!

  • @saifulanwar4394
    @saifulanwar4394 1 year ago

    Why use "/" in team_name? I did that and the result is "southampton". Please explain that.

  • @razinchoudhury1368
    @razinchoudhury1368 2 months ago

    When I run the code on Jupyter Lab it was working in the first couple of tries, but now I keep getting an error early in the code. For some reason I get an index out of bounds error for the soup.select('table.stats_table') part of the code. It was working perfectly before and showed all the links and everything, and out of nowhere it stopped and I keep getting this error. Can anyone explain why please? Thanks

    • @razinchoudhury1368
      @razinchoudhury1368 2 months ago

      For those with the same problem: change your time.sleep to more seconds.

  • @lyonoconner448
    @lyonoconner448 1 year ago

    Excellent! Part 2?

  • @stuck3315
    @stuck3315 2 years ago

    Great video and content. All of these have been very helpful for someone new to Python.
    I did run into an issue with this example and not sure where I went wrong. Trying to use
    match_df = pd.concat(all_matches) gives me a TypeError: cannot concatenate object of type.
    Tried using pd.DataFrame instead and got output to my csv, but there are just headers (date, pk, etc) and no data.
    If I use print(all_matches) prior to the pd.concat or pd.DataFrame command, I can see the actual data correctly.

    • @Dataquestio
      @Dataquestio  2 years ago

      Hi there - I'm guessing the data didn't scrape properly in this case (if it did scrape properly, you'd have data in all_matches). I'd try increasing the value in time.sleep, because the website you're getting data from can return empty tables if you scrape too quickly.

  • @Nunexx97
    @Nunexx97 2 years ago

    What changes do I have to make to the script to collect only match data without the shooting stats? The shooting stats section is currently empty on FBref... Thanks a lot for the great video!

    • @Dataquestio
      @Dataquestio  2 years ago

      Hi Nuno - you should just be able to remove the code to scrape the shooting stats, and everything else should work fine!

  • @marzm7050
    @marzm7050 1 year ago

    There is no table named "Scores & Fixtures". What am I supposed to do now?

  • @adelekefikayomi8351
    @adelekefikayomi8351 1 year ago

    I need help, anybody!
    I tried to web scrape other sections like passing, goal shot creation etc., but it's saying list index out of range.
    Any ideas anyone?

  • @davidwisemantel5041
    @davidwisemantel5041 2 years ago

    Love it, thanks so much! OOI, how would you have got the table using other means such as id? (rather than matching the string)

    • @Dataquestio
      @Dataquestio  2 years ago +1

      Thanks, David! You can get the table by position (when pandas parses html, the first table on the page is element 0 in the list, and so on). You can also do it by id by first extracting only the table html with beautifulsoup, then parsing it with pandas.
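A sketch of the id-based approach Vik describes - extract just the table with BeautifulSoup, then hand it to pandas (the table HTML and the id "demo_table" are invented for illustration; fbref's real ids differ):

```python
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup

# html stands in for a page already downloaded with requests
html = """
<table id="demo_table" class="stats_table">
  <thead><tr><th>Squad</th><th>Pts</th></tr></thead>
  <tbody><tr><td>Arsenal</td><td>84</td></tr></tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", id="demo_table")   # select by id, not by position
df = pd.read_html(StringIO(str(table)))[0]    # parse only that table
```

Selecting by id is more robust than `[0]` indexing when the page's table order changes.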

    • @davidwisemantel5041
      @davidwisemantel5041 2 years ago

      @Dataquestio Makes sense. Sorry, one more question. How would you deal with a situation where each key value is its own table? For example, if you were scraping horse racing data, where each horse had its own table of information. Using concat would join the data, but how would you reference the key? TIA!!!!!

  • @emanuelviola2609
    @emanuelviola2609 2 years ago

    Really nice teaching. Sad they changed the shooting stats presentation. I'm thinking of focusing only on the Premier League fixtures and shooting stats so I can go through the whole video.

    • @Dataquestio
      @Dataquestio  2 years ago

      I just checked fbref.com/en/squads/b8fd03ef/2020-2021/matchlogs/all_comps/shooting/Manchester-City-Match-Logs-All-Competitions , and it looks like the shooting stats are working again!

    • @werty7099
      @werty7099 2 years ago

      @Dataquestio Those are for the 2020-2021 season; the 2021-2022 ones are still not there :( Thanks for a great video though.

  • @vuquanghuy55
    @vuquanghuy55 20 days ago

    How do I scrape other seasons?

  • @dhruvmanojpujaristudent590
    @dhruvmanojpujaristudent590 1 year ago

    Hey Vik, I'm getting an IndexError: list index out of range on standing_table = soup.select('table.stats_table')[0] in the for loop. I'm not able to execute it; I have tried various things and used the solutions provided in the comments section as well. Can you help me out here? Please.

    • @alperengul9331
      @alperengul9331 4 months ago

      Did you solve it?

  • @sayantanighosh1493
    @sayantanighosh1493 10 months ago

    Hello sir, I am getting an error in the line "standings_table = soup.select('table.stats_table')[0]".
    The error is stating that the list index is out of range. Please help me out.

    • @xyz-gn6jy
      @xyz-gn6jy 7 months ago

      Did you find any solution?

    • @AndrewPutraHartanto
      @AndrewPutraHartanto 5 months ago

      @xyz-gn6jy Do you have a solution?

  • @robertooliveira8736
    @robertooliveira8736 2 years ago

    Has anyone managed to solve the problem that shows 'ValueError: No tables found'? @Dataquestio

    • @Dataquestio
      @Dataquestio  2 years ago +1

      Hi Roberto - you would see this error if no tables are showing on the original page. It may be because the page isn't working, or you've been blocked. I would check the html you downloaded to ensure that it has tables in it. -Vik

    • @robertooliveira8736
      @robertooliveira8736 2 years ago

      @Dataquestio Hello.
      Thanks for the feedback.
      I verified that when using 'sleep(1)' it gets blocked, generating the error,
      so I put 'sleep(15)' and now it runs normally.

  • @olimics9639
    @olimics9639 4 months ago

    My app says there is something wrong with the URL.

  • @pablosilva10127
    @pablosilva10127 2 years ago

    Vik, is there any chance you guys could make a path with the Julia language?

    • @Dataquestio
      @Dataquestio  2 years ago

      Hi Pablo - it's something we've thought about. Have you seen job postings that require Julia, or do you use it at work?

  • @mrmason13
    @mrmason13 2 years ago

    I want this but for streaming football - like creating a framework that scrapes all the links to stream a single match.

    • @Dataquestio
      @Dataquestio  2 years ago

      Stream as in recreate a match from text match logs, or stream as in watch a video of the match? You would need a different site if you want to get video.

  • @vigneshravichandran3422
    @vigneshravichandran3422 2 years ago +1

    Hi,
    When running this code: matches = pd.read_html(data.text, match="Scores & Fixtures")[0]
    I am facing this error -> ValueError: No tables found
    Please help me with this.
    Thanks!

    • @mathiasbaden9745
      @mathiasbaden9745 2 years ago

      @campos Have the same issue

    • @Dataquestio
      @Dataquestio  2 years ago

      Hi - this will happen if the full page content wasn't scraped. This could happen for a few reasons - the site is down, the site has blocked you, or the content has changed. I think the site may be blocking people. I'll look into this soon.
      One way around this is to use a headless browser instead of downloading the html with requests. There is a video on how to use a headless browser (playwright) here - th-cam.com/video/SJ7xnhSLwi0/w-d-xo.html .
      -Vik

  • @manohartanna7423
    @manohartanna7423 2 years ago

    While typing links to find the squad it's showing an empty list. Could you please tell me why?

    • @Dataquestio
      @Dataquestio  2 years ago

      Hi Manohar - I can't be sure without the full code. But you can look at the example code here to compare - github.com/dataquestio/project-walkthroughs/blob/master/football_matches/scraping.ipynb .

    • @manohartanna7423
      @manohartanna7423 2 years ago

      Thank you

  • @samdowns4786
    @samdowns4786 2 years ago

    Hi, great video and very easy to follow.
    I have followed the code very closely but get the following error when trying to run the for loop.
    It seems to not like this line:
    matches = pd.read_html(data.text, match="Scores & Fixtures")[0]
    and the error reads:
    ImportError: html5lib not found, please install it
    I have tried installing html5lib and then importing it, but with no success.
    I think it is quite a simple thing to fix but I just cannot see it.
    Any help?
    Thanks

    • @principeabel
      @principeabel 2 years ago

      The project code is in the description of the video.
      If it fails for you in the loop part, put this: time.sleep(10)
      It takes a long, long, long time, so let it run.

  • @svenwitte2503
    @svenwitte2503 1 year ago

    Can you help me? I got stuck on errors.

  • @faisalali-yp7tw
    @faisalali-yp7tw 2 years ago

    Can you make a tutorial on how to dockerize Scrapy + PostgreSQL?

    • @Dataquestio
      @Dataquestio  2 years ago

      Hi Faisal - thanks for the suggestion! I'll keep this in mind. - Vik

  • @senerustunerciyas2918
    @senerustunerciyas2918 1 year ago

    How can I get 3 or multiple seasons?

  • @RUDRAPRABUS
    @RUDRAPRABUS 7 months ago +1

    soup = BeautifulSoup(data.text)
    standings_table = soup.select('table.stats_table')[0]
    links = standings_table.find_all('a')
    links = [l.get("href") for l in links]
    links = [l for l in links if '/squads/' in l]
    IndexError: list index out of range
    I am getting this error, what should I do?

    • @manav_chak7410
      @manav_chak7410 23 days ago

      Hey, I am also getting this. Did you ever get a solution?

    • @notbobafett9368
      @notbobafett9368 18 days ago

      I am using PyCharm, but try adding 'lxml' as the parser in the first line, so:
      soup = BeautifulSoup(data.text, 'lxml')

  • @rafaelg8238
    @rafaelg8238 2 years ago

    Thanks for the video. Could you do a video with the POST method and exporting files like .csv and .xlsx? There are few videos on that on YouTube. Please.

    • @Dataquestio
      @Dataquestio  2 years ago

      Thanks for the suggestion! I'll look into doing that. -Vik