Great video and content. All of these have been very helpful for someone new to Python. I did run into an issue with this example and not sure where I went wrong. Trying to use match_df = pd.concat(all_matches) gives me a TypeError: cannot concatenate object of type. Tried using pd.DataFrame instead and got output to my csv, but there are just headers (date, pk, etc.) and no data. If I use print(all_matches) prior to the pd.concat or pd.DataFrame command, I can see the actual data correctly.
Hi there - I'm guessing the data didn't scrape properly in this case (if it had scraped properly, you'd have data in all_matches). I'd try increasing the value in time.sleep, because the website you're getting data from can return empty tables if you scrape too quickly.
What changes do I have to make to the script to collect only match data without the shooting stats? The shooting stats section is currently empty on FBref... Thanks a lot for the great video!
Thanks, David! You can get the table by position (when pandas parses html, the first table on the page is element 0 in the list, and so on). You can also do it by id by first extracting only the table html with beautifulsoup, then parsing it with pandas.
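A sketch of the second approach Vik describes: isolate one table by its HTML id with BeautifulSoup, then hand just that fragment to pandas. The table ids below are invented for the example; inspect the real page's HTML to find the actual ones.

```python
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup

# Toy page with two tables; the ids are made up for illustration
html = """
<table id="stats_shooting"><tr><th>Sh</th></tr><tr><td>12</td></tr></table>
<table id="stats_passing"><tr><th>Cmp</th></tr><tr><td>300</td></tr></table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.select_one("table#stats_shooting")  # isolate one table by its id
df = pd.read_html(StringIO(str(table)))[0]       # parse just that fragment
```

Because only the selected fragment is passed to pd.read_html, you get exactly the table you asked for as element [0] of the returned list.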
@@Dataquestio Makes sense. Sorry, one more question. How would you deal with a situation where each key value is its own table? For example, if you were scraping horse racing data, where each horse had its own table of information. Using concat would join the data, but how would you reference the key? TIA!!!!!
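One common pattern for the horse-racing case: add the key as a column on each table before concatenating, so every row still knows which horse it came from. Toy data below, not real scraped tables:

```python
import pandas as pd

# Hypothetical per-horse tables, as if each horse's page parsed into its own DataFrame
tables = {
    "Sea Biscuit": pd.DataFrame({"Race": ["A", "B"], "Pos": [1, 3]}),
    "Red Rum": pd.DataFrame({"Race": ["A"], "Pos": [2]}),
}

frames = []
for horse, df in tables.items():
    df = df.copy()
    df["Horse"] = horse  # tag every row with its key before combining
    frames.append(df)

combined = pd.concat(frames, ignore_index=True)
```

After the concat you can group or filter on the "Horse" column to get back to any one horse's rows.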
Really nice teaching. Sad they changed the shooting stats presentation; I'm thinking of focusing only on the Premier League fixtures and shooting stats so I can go through the whole video.
I just checked fbref.com/en/squads/b8fd03ef/2020-2021/matchlogs/all_comps/shooting/Manchester-City-Match-Logs-All-Competitions , and it looks like the shooting stats are working again!
Hey Vik, I'm getting an IndexError: list index out of range on standings_table = soup.select('table.stats_table')[0] in the for loop, so I'm not able to execute it. I have tried various things, including the solutions provided in the comments section. Can you help me out here?? Please.
Hello sir, I am getting an error on the line "standings_table = soup.select('table.stats_table')[0]". The error states list index out of range. Please help me out.
Hi Roberto - you would see this error if no tables are showing on the original page. It may be because the page isn't working, or you've been blocked. I would check the html you downloaded to ensure that it has tables in it. -Vik
@@Dataquestio Hello. Thanks for the feedback. I verified that with 'sleep(1)' it gets blocked, generating the error, so I put 'sleep(15)' and now it runs normally.
Stream as in recreate a match from text match logs, or stream as in watch a video of the match? You would need a different site if you want to get video.
Hi, when running this code "matches = pd.read_html(data.text, match="Scores & Fixtures")[0]" I am facing this error: ValueError: No tables found. Please help me with this. Thanks!
Hi - this will happen if the full page content wasn't scraped. This could happen for a few reasons - the site is down, the site has blocked you, or the content has changed. I think the site may be blocking people. I'll look into this soon. One way around this is to use a headless browser instead of downloading the html with requests. There is a video on how to use a headless browser (playwright) here - th-cam.com/video/SJ7xnhSLwi0/w-d-xo.html . -Vik
Hi Manohar - I can't be sure without the full code. But you can look at the example code here to compare - github.com/dataquestio/project-walkthroughs/blob/master/football_matches/scraping.ipynb .
Hi, great video and very easy to follow. I have followed the code very closely but get the following error when trying to run the for loop. It seems to not like this line: matches = pd.read_html(data.text, match="Scores & Fixtures")[0], and the error reads: ImportError: html5lib not found, please install it. I have tried installing html5lib and then importing it, but with no success. I think it is quite a simple thing to fix but I just cannot see it. Any help? Thanks
The project code is in the description of the video. If it fails for you in the loop part, put this: time.sleep(10). It takes a long, long, long time, so let it run.
1 soup = BeautifulSoup(data.text)
----> 2 standings_table = soup.select('table.stats_table')[0]
3 links = standings_table.find_all('a')
4 links = [l.get("href") for l in links]
5 links = [l for l in links if '/squads/' in l]
IndexError: list index out of range
I am getting this error; what should I do??
Nice explanations, Vikas! The combination requests + Beautiful Soup + pandas is fantastic! Thanks! Greetings from São Paulo, Brazil!
Thanks, Joao! -Vik
Love your teaching style. Thanks for this content!
Thanks, Jonathan! -Vik
Outstanding tutorial with concise explanations for each line of code! Great for both beginners and advanced pandas users.
I've always wanted to work on a project on football since it's my favorite sport, this is a good starting point. Love your pace as well 🙏🏽.
hello A year later. How is the project coming along? Just an interested party.
Really really enjoy your content. Love the examples. Love the teaching style. Love the explanations.
Something I did that may be useful for other people: I added a comment before every line/block to tell future me what I was doing.
Great video!
thank u vikas paruchuri...this video saved me...greetings from pakistan...teaching style very good!!!
This is a great tutorial. I tried following along but instead of team stats tried extracting player stats for the season. fell over on the last hurdle of the loop. But going to give it another go this evening. Great content, thank you
did you ever figure it out?
YEAH PLEASE LMK
@@sebbyclarke2304 Hi, I used the below code to complete the loop at the end of the script. You should be able to follow the video and amend teams links with players, then apply something similar to the below as the final step.
import time
import requests
import pandas as pd

# Collect each player's frame in a list, then concat once at the end
# (DataFrame.append was removed in pandas 2.0)
frames = []
for squad_url in squad_urls:
    player_name = squad_url.split("/")[-1].replace("-Match-Logs", "").replace("-", " ")
    data = requests.get(squad_url)
    individual_matches = pd.read_html(data.text, match="2005-2006 Match Logs")[0]
    individual_matches.columns = individual_matches.columns.droplevel()
    individual_matches = individual_matches[individual_matches["Comp"] == "Premier League"]
    individual_matches["Player"] = player_name
    frames.append(individual_matches)
    time.sleep(1)
combined_df = pd.concat(frames, ignore_index=True)
Wonderful teaching, wonderful project, so easy to access the knowledge, THANK YOU!!!😊
Glad you liked it :) - Vik
Tip: When web scraping, assign the HTML code to a variable or copy it to a notepad as a text file before the site you're working with kicks you out for exceeding max requests. Learned this the hard way lol 🥴
Great tip! I personally like to cache files where I can (just save them as html files) and then load from disk if I need to. -Vik
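That caching idea can be sketched as a small helper (an illustration, not code from the video; the fetch callable is whatever does the download, e.g. lambda u: requests.get(u).text):

```python
from pathlib import Path

def get_html(url, fetch, cache_dir="page_cache"):
    """Return a page's HTML, fetching only if it isn't already cached on disk.

    `fetch` is any callable that takes a URL and returns HTML text,
    e.g. lambda u: requests.get(u).text; it's injectable so this is testable offline.
    """
    cache = Path(cache_dir)
    cache.mkdir(exist_ok=True)
    # Turn the URL into a filesystem-safe file name
    name = url.replace("https://", "").replace("/", "_") + ".html"
    path = cache / name
    if path.exists():
        return path.read_text(encoding="utf-8")
    html = fetch(url)
    path.write_text(html, encoding="utf-8")
    return html
```

Re-running the notebook then re-parses the saved files instead of hitting the site again, which is exactly what saves you when you get blocked mid-project.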
How long does it block?
This was very useful! Thank you. I also had issues with the Premier League data so scrapped La Liga instead which worked fine. Will now attempt to follow the second part!
Adding l = links at 8:22
saved the day! Thanks for the video!
I went at it with a different approach. I started with the year I wanted to start with and followed 'next season'; that way the dataframe is in chronological order. Otherwise it would read August 2022 to May 2022 first, and then the previous season is scraped, so August 2021 to May 2021 follows.
Thank you so much! I've been putting off scraping data online forever. Finally did it, thanks to you.
You are a good teacher, clear and precise, and I wish you all the success in the world. Thanks for the info.
All your videos have helped me a lot.
Thank you very much for your videos, I learn a lot.
Thank you for this content that you upload 😊
Really enjoyed this walkthrough! Thank you for sharing!
Excellent content and super teaching style. Thank you for sharing. Keep it going, it's very much appreciated.
Thanks, Willie!
Regarding the standings_table = soup.select('table.stats_table')[0]
IndexError: list index out of range error - fbref limits scraping by blocking users who send more than one request every three seconds, so I think it is important to use the time.sleep function. If you get this error (like me), I believe you just have to wait some time. But I will update if this works.
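If it helps anyone, that waiting idea can be sketched as a small 'polite' request wrapper: retry with a growing pause whenever the response isn't a 200. This is just an illustration, not code from the video:

```python
import time
import requests

def polite_get(url, get=requests.get, tries=3, delay=3):
    """GET a page, waiting and retrying when the site rate-limits us.

    The `get` callable defaults to requests.get and is injectable
    so the retry logic can be tested offline.
    """
    status = None
    for attempt in range(tries):
        resp = get(url)
        status = resp.status_code
        if status == 200:
            return resp.text
        # 429/403 usually mean "slow down"; back off longer on each retry
        time.sleep(delay * (attempt + 1))
    raise RuntimeError(f"Gave up on {url} (last status {status})")
```

Used in the scraping loop in place of a bare requests.get, this also gives you a clear error when the site keeps refusing instead of a confusing empty-table failure later.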
Thanks for the tutorial. It was really easy to follow. keep up the good work. Cheers!
standings_table = soup.select('table.stats_table')[0]
getting list index out of range error.
Please help me
I got the same issue; it seems that the HTML structure has changed @Dataquestio
@@joeguerby Do you have a solution for the new HTML?
You have THE most soothing voice
Thanks man!!! you are doing great. Very interesting to watch your videos
You are a great teacher. Thank you so much for sharing. When should we expect part 2?
Thanks a lot! Part 2 is actually live here - th-cam.com/video/0irmDBWLrco/w-d-xo.html .
Thanks for the motivation. I wasn't sure if I could do it, but I might try it eventually.
I hope you try it!
This is really awesome. I learnt a lot.
I'm having issues scraping multiple years though. Something about the remote host cutting off the connection.
Hey VIK I recently came across this video. I found it very helpful, and I'm trying to extend it to include the other tables as well. However, I've encountered some difficulties in retrieving the other tables using the approaches you mentioned in the code. I've tried searching for specific URLs or identifiers, but I haven't been successful so far. I was wondering if you could kindly provide an example code snippet that demonstrates how to add the passing table or any other table from the website.
Perfectly explained. TY a lot ! :)
Wow, thank you so much, you made web scraping look so easy.
Well thats my day sorted. Kudos sir
This is an excellent tutorial. Thank you very much!
Nice explanation! Really helped me a lott!
Awesome content, and nice new mic that you have now 👌
Thanks! -Vik
Thank you very much. I have learned sooo much with this video.
Glad it was helpful! -Vik
Hi guys, I was having the 'no tables found' error too. Analyzing the code, I noticed the failure was at data.text, where the page was blocking the request. I just increased the time.sleep by 5 seconds and put another time.sleep where we request the shooting dataframe. The code will be very slow, but it works. Hope it helps!!
Thank you for that, it has helped me heaps since I found the same problem.
How long did the code take to respond?
Thanks for the solution, Ryo!
Sorry, I have this issue too, but I don't understand how to get through it. Can you help?
I love your explanations
Sir, this is a great video. It is helping me get started in web scraping. You didn't close the parentheses in your last long code block with the try and except part at 31:30.
One can notice the mastery of the subject in you throughout. Thank you will be following other tutorials
Hi Vik, Thanks for this. I get an error in the for loop stating that the 'list index out of range' for the 'standings_table = soup.select('table.stats_table')[0]' line. I've reviewed against the code in github and there aren't any differences. Can you help please?
I think it is site security or popups
This would happen when there is no table in the html you downloaded. You might want to try rendering the html (save it to a file and open it in a browser) to see what the issue is. There could be an issue with rate limiting or another site issue causing problems with the html. -Vik
I ran into the same issue when attempting more than 2 years of seasons, and it seems to be working if you import the time module and place the following code: "time.sleep(5)" under "soup = BeautifulSoup(data.text)".
I think what is happening is the website is blocking us from doing too many requests. time.sleep(5) delays the scraping process, thus limiting too many requests at once.
@@lordrahl372 thank you so much bro, that code helped to solve this issue.
@@lordrahl372 I did this and worked like a charm - thanks
Amazing content! Thanks a lot.
I noticed that the shooting data has been summarized as of today(10/05/22), it is no longer a detailed match by match table.
Thanks, Abdulmalik! That's too bad about the shooting data on fbref. Hopefully it is a temporary bug, and will be fixed.
@@Dataquestio In fact this is not a bug: the code extracts the total shots for the match rather than the number of shots per team. I have revised the code so that I get the team's stats and the stats of the team's opponents. This is the code for scraping shots by team and by opponent:
teamshooting = pd.read_html(data4.text, match="Shooting")[0]
oppshooting = pd.read_html(data4.text, match="Shooting")[1]
teamshooting.head()
oppshooting.head()
May I know why there is so often an error on the class name table.stats_table when using the CSS selector?
standings_table = soup.select('table.stats_table')[0]
same error. did you find a fix?
Super cool, thank you!!
brilliant, I too am getting value errors, just trying the time adjustments now.
links gives me an empty list; can anyone help me with this?
we need another EPL video :)
Hi Vik and everyone else, I have an issue which I'm hoping anyone can help me fix. On trying to concatenate all_matches with the code match_df = pd.concat(all_matches), the error message is that there's nothing to concatenate.
Great tutorial, thanks.
Great rhythm. When is the next video coming, please?
We'll be releasing the next video on Monday. -Vik
Thank you for this tutorial. However, I ran into errors that I couldn't solve. I tried concatenating the dataframes using "pd.concat(all_matches)" but I keep getting "ValueError: No objects to concatenate". What could be the issue?
Hi Enoch - this will happen if the `all_matches` list is empty. Are you sure you're appending the match data to the list? The code is here if you want to check - github.com/dataquestio/project-walkthroughs/blob/master/football_matches/scraping.ipynb
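One way to make that failure clearer is to guard the concat on an empty list. This is just a defensive sketch; in the real notebook all_matches is filled inside the scraping loop:

```python
import pandas as pd

all_matches = []  # in the real script, filled via all_matches.append(team_data) in the loop

if all_matches:
    match_df = pd.concat(all_matches)
else:
    # Nothing was appended: usually a sign the site rate-limited the requests,
    # so there is no point calling pd.concat on an empty list
    match_df = pd.DataFrame()
```

With the guard in place you get an empty DataFrame you can inspect rather than a ValueError, which makes the "did anything actually scrape?" question easier to answer.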
Waiting for part 2 :)
Part 2 is live! It's at th-cam.com/video/0irmDBWLrco/w-d-xo.html .
I came back to this tutorial hoping to continue this web scraping project. I started from scratch in a new notebook so I could understand it better, however I am getting this error:
matches = pd.read_html(data.text, match="Scores & Fixtures")[0]
ValueError: No tables found
At first I thought it was a typo of my own, however I went back to my old notebook file, and I remember I was able to execute the code and create a new file with the merged data. My old notebook file now had the same 'No tables found' errors. I even went onto the dataquest GitHub repo and cloned the notebook files for this tutorial. I ran the code and got the same value errors. Not sure what to do at this point, and I have been trying to figure it out all day.
did you manage to solve the problem?
@campos I haven't had too much time lately. I ran the code again on Sunday, but it returned the same error. I've been trying to think of a solution while doing other things, but unfortunately I can't think of anything except trying a different scraping method other than requests.
@@robertooliveira8736 Haven't yet; they may have updated the website or something, because it worked a month ago. Strangely, retrieving the table by itself outside of the loop works.
Unfortunately I am still learning web scraping, but I thought about trying out Scrapy (another web scraper).
Hey, sorry to hear about the issue. This will happen if the full page content wasn't scraped. This could happen for a few reasons - the site is down, the site has blocked you, or the content has changed. I think the site may be blocking people. I'll look into this soon, and will try to post a solution.
One way around this in the meantime is to use a headless browser instead of downloading the html with requests. There is a video on how to use a headless browser (playwright) here - th-cam.com/video/SJ7xnhSLwi0/w-d-xo.html .
-Vik
@campos Hello. Thanks for the feedback. I verified that with 'sleep(1)' it gets blocked, generating the error, so I put 'sleep(15)' and now it runs normally.
informative...thanks
Glad you liked it! -Vik
Hi, great tutorial. Just wondering why table = soup.select("table.stats_table") is returning an empty list? When I use index 0, it tells me that the list index is out of range. It worked OK until I wanted to scale up, after I had finished all the code in the tutorial.
@lordrahl372 thanks for your comment to another user. I am sorted
@@tomaszd1875 Do you have a solution?
Hi, extremely valuable. Where can I find part 2, please? Thanks
Awesome vid man thanks!! when is part 2 coming out? :D
It came out today! You can find it here - th-cam.com/video/0irmDBWLrco/w-d-xo.html .
At the end of the code, len(all_matches) doesn't return anything for me.
Also, the tables didn't print out at the end when I typed in match_df.
When trying to scrape seasons from 2016, there is KeyError: "['FK'] not in index". I don't know what causes it; what might be the problem?
Is anyone able to explain to me why the code that was utilized in the project does not extract future matches? Been banging my head off the wall on how to get these future matches in and I cannot figure out why.
@dataquest In your Premier League web scraping, after the request and data.text, nothing happened even though I followed your video. Could it be because I'm using Visual Studio Code and you use Jupyter?
Hey, thanks for the video. Would you be able to give some guidance on how to pull the info from the match report ?
Hello, when running the for loop I am getting 'No tables found'. I have checked the code on GitHub and everything is the same. Please help...
Please 🙏🏾 I'm getting an error once I reach:
import pandas as pd
matches = pd.read_html(data.text, match="Scores & Fixtures")
I get an 'html' is not defined error at 27:30. Would really appreciate any help with this issue.
Can we apply these requests to horse racing, for each horse, to investigate their performance and predict their future tendencies?
Thank you for your tutorial.. Unfortunately, I have increased the number of years to 10 and got blocked by the website after scraping just the first year.
Hi Hicham - that's too bad - upping the delay in between requests with `time.sleep(10)` could help. I may also post a tutorial later about how you can do this with a headless browser framework like playwright.
@@Dataquestio Hello Dataquest! Thank you for taking the time to reply. I think everyone will appreciate a tutorial on a headless browser framework. I tried to use Scraper API. It works for a few iterations but then breaks. I will try to up the sleep time as you mentioned. Thanks again for your time.
I really love your video. I have a question: I tried scraping two football sites and comparing the data, but it's becoming tricky as both websites have different naming for the same team. How can I resolve that issue?
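One possible approach (a sketch, with made-up team spellings): keep a small alias table for nicknames that fuzzy matching can't catch, and fall back to stdlib fuzzy matching for simple spelling variants:

```python
from difflib import get_close_matches

# Hand-maintained aliases for nicknames fuzzy matching cannot resolve
ALIASES = {"Spurs": "Tottenham Hotspur", "Wolves": "Wolverhampton Wanderers"}

def match_team(name, candidates, cutoff=0.6):
    """Map a team name from one site onto the closest name on another site."""
    if name in ALIASES:
        return ALIASES[name]
    hits = get_close_matches(name, candidates, n=1, cutoff=cutoff)
    return hits[0] if hits else None
```

The cutoff is a judgment call: too low and "Wolves" can mis-match, too high and abbreviations like "Manchester Utd" get dropped, so spot-check the mapping before merging the two datasets.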
FBRef just doesn't allow me to scrape data anymore?? I always get a 403 status code back. Anyone else facing the same problem?? What can I do to fix it?
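One thing worth trying for a 403: send a browser-like User-Agent header, since the default python-requests one is often blocked. The header string below is just an example browser UA, not something from the video:

```python
import requests

# A browser-like User-Agent string; the default "python-requests/..." one is often blocked
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    )
}

def get_page(url):
    """Fetch a page while presenting a browser-like User-Agent."""
    return requests.get(url, headers=HEADERS)
```

If the site still returns 403 with a browser UA, it is likely blocking by IP or behavior, and a headless browser (or simply waiting) may be the only option.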
team_data = matches.merge(shooting[["Date", "Sh", "SoT", "Dist", "FK", "PK", "PKatt"]], on="Date") got an error:
AttributeError: 'list' object has no attribute 'merge'
How do I fix this error?
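This usually means matches (or shooting) is still the list returned by pd.read_html, not a DataFrame: select element [0] first. A minimal sketch with toy frames standing in for the parsed tables:

```python
import pandas as pd

# Toy stand-ins for the parsed tables; in the real script these come from
# pd.read_html(...)[0] - note the [0], since read_html returns a LIST of DataFrames
matches = pd.DataFrame({"Date": ["2022-08-06"], "Result": ["W"]})
shooting = pd.DataFrame({"Date": ["2022-08-06"], "Sh": [15], "SoT": [6], "Dist": [16.2]})

team_data = matches.merge(shooting[["Date", "Sh", "SoT", "Dist"]], on="Date")
```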
Thanks for this amazing video. I got an error on the all_matches.append(team_data) line: all_matches is not defined. Can you help me fix it? Please.
Thanks a lot, very nice explanation. Where is part 2?
You can find part 2 at th-cam.com/video/0irmDBWLrco/w-d-xo.html .
Great video. Is there a part 2?
Is it possible to use PyCharm / VS Code for this project? I'm not that familiar with Jupyter / Google Colab.
Hi can u do it for the 2024 table?
Hi, thanks for this video! As others have mentioned - great teaching style.
I'm getting an error with the final for loop. It's something to do with:
matches = pd.read_html(data.text, match="Scores & Fixtures")[0]
or
shooting = pd.read_html(data.text, match="Shooting")[0]
I get this error:
ValueError: No tables found
Anyone got any ideas?
No, I also get that error
This would happen when you don't get any data back from the server. I've heard about some issues people have when the time.sleep() is too short. If there are too many requests too quickly, the server will stop returning results. Try changing it to time.sleep(10) to pause longer between requests. That might fix it. -Vik
@@principeabel yeah I’m getting the same errors too. I tried to add another time.sleep under shooting but it isn’t working. The website may have changed something
I was having the exact same error, increasing time.sleep() to 10 worked for me
Please can anyone explain why at 6:50 he only calls the first index of the standings table?
I'm still trying to understand at around 16:20 when you do a List Comprehension as links = [l for l in links if l and 'all_comps/shooting/' in l] you have to add the "if l" portion of the condition. I know you mentioned that you add it because some of the list items don't have an 'href' but it's still not clicking for me. Any chance you or someone could please go into detail a tad more? Thanks so much!
This filters out any cases when l is None. So if there is no href, then None will be assigned to l, and we can filter it out with this list comprehension.
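A tiny standalone illustration of that filter (the link strings are invented):

```python
# a.get("href") returns None for tags without an href attribute; the
# "if l" clause drops those None entries before the substring test
# (evaluating 'x' in None would raise a TypeError).
links = ["/en/squads/b8fd03ef/all_comps/shooting/", None, "/en/players/"]
links = [l for l in links if l and "all_comps/shooting/" in l]
print(links)  # → ['/en/squads/b8fd03ef/all_comps/shooting/']
```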
@@Dataquestio Thank you!
Why split on "/" for team_name? I did that and the result is "southampton". Please explain.
When I run the code on JupyterLab it worked the first couple of tries, but now I keep getting an error early in the code: an index-out-of-range error on the soup.select('table.stats_table') part. It was working perfectly before and showed all the links and everything, and out of nowhere it stopped. Can anyone explain why? Thanks
For those with the same problem: increase your time.sleep to more seconds.
Excellent! Part 2?
Great video and content. All of these have been very helpful for someone new to Python.
I did run into an issue with this example and I'm not sure where I went wrong. Trying to use
match_df = pd.concat(all_matches) gives me a TypeError: cannot concatenate object of type.
I tried using pd.DataFrame instead and got output to my CSV, but there are just headers (date, pk, etc.) and no data.
If I use print(all_matches) before the pd.concat or pd.DataFrame call, I can see the actual data correctly.
Hi there - I'm guessing the data didn't scrape properly in this case (if it did scrape properly, you'd have data in all_matches). I'd try increasing the value in time.sleep, because the website you're getting data from can return empty tables if you scrape too quickly.
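One defensive pattern along those lines, assuming all_matches is the list built up in the loop: keep only real, non-empty DataFrames before concatenating (the sample frames below are made up):

```python
import pandas as pd

# If some iterations failed, all_matches can contain None or empty
# entries, which is what triggers "cannot concatenate object of type ...".
all_matches = [
    pd.DataFrame({"date": ["2022-08-06"], "gf": [2]}),  # a successful scrape
    None,                                               # a failed request
    pd.DataFrame(),                                     # an empty table
]
frames = [df for df in all_matches if isinstance(df, pd.DataFrame) and not df.empty]
match_df = pd.concat(frames, ignore_index=True)
print(len(match_df))  # → 1
```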
What changes do I have to make to the script to collect only match data without the shooting stats? The shooting stats section is currently empty on FBref... Thanks a lot for the great video!
Hi Nuno - you should just be able to remove the code to scrape the shooting stats, and everything else should work fine!
There is no table named "Scores & Fixtures". What am I supposed to do now?
I need help, anybody!
I tried to scrape other sections like passing, goal and shot creation, etc., but it says list index out of range.
Any ideas, anyone?
Love it, thanks so much! Out of interest, how would you have got the table using other means, such as its id (rather than matching the string)?
Thanks, David! You can get the table by position (when pandas parses html, the first table on the page is element 0 in the list, and so on). You can also do it by id by first extracting only the table html with beautifulsoup, then parsing it with pandas.
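That two-step approach might look like this (the table ids and contents are invented for the example):

```python
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup

html = """
<table id="stats_squads"><tr><th>Squad</th></tr><tr><td>Arsenal</td></tr></table>
<table id="scores"><tr><th>Date</th></tr><tr><td>2022-08-06</td></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", id="scores")         # locate the table by its id
scores = pd.read_html(StringIO(str(table)))[0]  # parse just that one table
print(scores.columns.tolist())  # → ['Date']
```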
@@Dataquestio Makes sense. Sorry, one more question. How would you deal with a situation where each key value is its own table? For example, scraping horse racing data, where each horse has its own table of information. Using concat would join the data, but how would you reference the key? TIA!
Really nice teaching. Sad they changed the shooting stats presentation. I'm thinking of focusing only on the Premier League fixtures and shooting stats so I can go through the whole video.
I just checked fbref.com/en/squads/b8fd03ef/2020-2021/matchlogs/all_comps/shooting/Manchester-City-Match-Logs-All-Competitions , and it looks like the shooting stats are working again!
@@Dataquestio those are for the 2020-2021 season, the 2021-2022 ones are still not there :( thanks for a great video though
How do I scrape other seasons?
Hey Vik, I'm getting an IndexError: list index out of range on standings_table = soup.select('table.stats_table')[0] inside the for loop, so I can't execute it. I've tried various things, including the solutions in the comments section. Can you help me out here? Please.
Did you solve it?
Hello sir, I am getting an error on the line "standings_table = soup.select('table.stats_table')[0]".
The error states "list index out of range". Please help me out.
Did you find any solution?
@@xyz-gn6jy Do you have a solution?
Has anyone managed to solve the problem that shows 'ValueError: No tables found'?
@Dataquest ?
Hi Roberto - you would see this error if no tables are showing on the original page. It may be because the page isn't working, or you've been blocked. I would check the html you downloaded to ensure that it has tables in it. -Vik
@@Dataquestio Hello.
Thanks for the feedback.
I verified that with 'sleep(1)' it gets blocked, generating the error.
So I put 'sleep(15)', and now it runs normally.
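A small guard in that spirit, checking a response before parsing it (the helper name and messages are just illustrative):

```python
def check_response(status_code, text):
    """Raise early if a scraped page looks blocked or empty."""
    if status_code != 200:  # fbref sends 403/429 when rate-limiting
        raise RuntimeError(f"blocked or error page (HTTP {status_code})")
    if "<table" not in text:
        raise RuntimeError("no tables in page, slow down the scrape")
    return text

html = check_response(200, "<table class='stats_table'></table>")
print("looks healthy")
```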
My app says there is something wrong with the URL.
Vik, is there any chance you guys could make a path with julia language?
Hi Pablo - it's something we've thought about. Have you seen job postings that require Julia, or do you use it at work?
I want this but for streaming football: a framework that scrapes all the links needed to stream a single match.
Stream as in recreate a match from text match logs, or stream as in watch a video of the match? You would need a different site if you want to get video.
Hi,
When running this code: matches = pd.read_html(data.text, match="Scores & Fixtures")[0]
I am facing this error -> ValueError: No tables found
Please help me with this.
Thanks!
@campos Have the same issue
Hi - this will happen if the full page content wasn't scraped. This could happen for a few reasons: the site is down, the site has blocked you, or the content has changed. I think the site may be blocking people. I'll look into this soon.
One way around this is to use a headless browser instead of downloading the html with requests. There is a video on how to use a headless browser (playwright) here - th-cam.com/video/SJ7xnhSLwi0/w-d-xo.html .
-Vik
When I build the links to find the squads, it shows an empty list. Could you please tell me why?
Hi Manohar - I can't be sure without the full code. But you can look at the example code here to compare - github.com/dataquestio/project-walkthroughs/blob/master/football_matches/scraping.ipynb .
Thank you
Hi, Great video and very easy to follow.
I have followed the code very closely but get the following error when trying to run the for loop.
it seems to not like this line:
matches = pd.read_html(data.text, match="Scores & Fixtures")[0]
and the error reads:
ImportError: html5lib not found, please install it
I have tried installing html5lib and then importing it, but with no success.
I think it is quite a simple thing to fix, but I just cannot see it.
Any help?
Thanks
The project code is in the description of the video.
If it fails for you in the loop part, use this: time.sleep(10)
It takes a long, long, long time, so let it run.
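On the html5lib ImportError mentioned above: it usually means the package landed in a different environment than the notebook kernel. Two possible fixes, sketched with a made-up table (the install lines are shown as comments):

```python
from io import StringIO

import pandas as pd

html = "<table><tr><th>Date</th></tr><tr><td>2022-08-06</td></tr></table>"

# Fix 1: install into the *kernel's* environment from inside the notebook:
#   import sys
#   !{sys.executable} -m pip install html5lib lxml
# Fix 2: point pandas at a parser you already have installed:
df = pd.read_html(StringIO(html), flavor="lxml")[0]
print(df.iloc[0, 0])  # → 2022-08-06
```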
Can you help me? I got stuck on errors.
Can you make a tutorial on how to dockerize Scrapy + PostgreSQL?
Hi Faisal - thanks for the suggestion! I'll keep this in mind. - Vik
How can I get 3 or more seasons?
1 soup = BeautifulSoup(data.text)
----> 2 standings_table = soup.select('table.stats_table')[0]
3 links = standings_table.find_all('a')
4 links = [l.get("href") for l in links]
5 links = [l for l in links if '/squads/' in l]
IndexError: list index out of range
I am getting this error what should i do??
Hey, I am also getting this. Did you ever find a solution?
I am using PyCharm, but try adding 'lxml' as the parser in the first BeautifulSoup call, so:
soup = BeautifulSoup(data.text, 'lxml')
Thanks for the video. Could you do a video on the POST method and exporting files like .csv and .xlsx? There are only a few videos about that on YouTube. Please.
Thanks for the suggestion! I'll look into doing that. -Vik
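Until then, a minimal sketch of the export half (file name and data invented; Excel export would additionally need openpyxl, so it is left commented):

```python
import pandas as pd

df = pd.DataFrame({"team": ["Arsenal", "Liverpool"], "gf": [2, 1]})
df.to_csv("matches.csv", index=False)  # plain CSV, no extra dependency
# df.to_excel("matches.xlsx", index=False)  # needs: pip install openpyxl

loaded = pd.read_csv("matches.csv")
print(loaded.shape)  # → (2, 2)
```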