I am taking two of the most highly rated courses on Udemy about scraping and they do not have half of your production quality, and your teaching is great. Wishing you success for the future!
Thank you!! Plenty more to come :)
Hello, thank you so much for literally making a video about my comment. I learnt so much about Python and API requests. You are one of the best teachers on YouTube, period. This certainly gave me a head start on my project and I can't wait to complete it!
However, there is one issue I am facing. By my calculation, for 1 million job listings there should be 1,000,000 / 30 = 33,333 pages, given that one page has 30 listings. But whenever I go past page 332 and request pages like 400 or 500, I get the following message:
" {"message":"[query_phase_execution_exception] Result window is too large, from + size must be less than or equal to: [10000] but was [1000020]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.","error":{"status":500}} "
This is a major problem for me. Given the genius you are, I am sure you will come up with an idea to fix this, lol. Thanks in advance!
Unfortunately, it looks like you are at the mercy of the data provider. I did a bit of Googling and only found these two Stack Overflow questions: stackoverflow.com/questions/63086701/web-scraping-a-job-platform-with-1-million-listings
stackoverflow.com/questions/63097845/web-scraping-api-see-the-scroll-api-for-a-more-efficient-way-to-request-large
The answers confirm that the website caps any single query at 10,000 results. But while you can only get 10k results per query, you can get up to 10k results per filter, AND one of the filters is longitude and latitude based. What this means is that, technically, IN THEORY, we could script a clever bit of Python to walk the earth, collecting job listings along the way. Open up your dev tools, double-click on the map, and watch the API calls flow in. If walking the whole earth sounds a little painful and time-consuming, you could also find a list online of all the major capital cities around the world with their longitude and latitude as a good start. A rough sketch of that idea is below.
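Here is a rough, untested sketch of that idea. The endpoint URL and parameter names (latitude, longitude, page) are hypothetical; swap in the real ones you see in your dev tools Network tab, and remember to de-duplicate listings that show up for more than one city:

```python
import requests

# Hypothetical endpoint and parameter names.
# Replace these with the real URL and params from the Network tab.
API_URL = "https://example.com/api/jobs/search"
PAGE_SIZE = 30
MAX_RESULTS = 10_000  # the index.max_result_window cap per query

# A few capital cities (name, latitude, longitude) as a starting point.
CITIES = [
    ("London", 51.5074, -0.1278),
    ("Tokyo", 35.6762, 139.6503),
    ("Sydney", -33.8688, 151.2093),
]

all_listings = []
for name, lat, lng in CITIES:
    for page in range(1, MAX_RESULTS // PAGE_SIZE + 1):
        resp = requests.get(API_URL, params={"latitude": lat, "longitude": lng, "page": page})
        if resp.status_code != 200:
            break  # hit the 10k result window or another error for this city
        listings = resp.json().get("results", [])
        if not listings:
            break  # no more pages for this location
        all_listings.extend(listings)
    print(f"{name}: {len(all_listings)} listings collected so far")
```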
Best of luck!
- Adam
@@MakeDataUseful So unfortunately I am able to scrape only 10,000 listings? :( Is it possible to use the Selenium WebDriver and scroll through all the pages (like you did in the YouTube scraping video) and then collect all the data from the HTML? Just a thought, haha!
@@rafidrahman8654 Sadly not, Selenium will simply be triggering the same API. I would suggest applying a combination of different filters, and you should be able to get hundreds of thousands of listings. The 10k limit only applies to the current query.
@@MakeDataUseful Okay, understood! Can I apply the filters through my Python code?
Amazing value. Wish you made videos like this (make money with Python) all the time! Thank you!
Hey!! Yes, exporting to SQL would be a very nice thing to know.
Great production and knowledge. Made it look real tight. Also like the way you explained it all. Subscribed!
Hey thanks so much! I am still working through my audio levels but really appreciate the positive feedback. Thanks for subscribing!
@@MakeDataUseful I have been struggling to scrape www.snakeriverfarms.com/american-kobe-beef.html - I was unsuccessful in finding a private API to call, and had limited success scraping a single weight and price from one of their pages. The issue is that with a drop-down menu to select each weight of steak, it seems hard to use bs4 to extract any information. Is this a use case for Selenium? For reference, I was able to scrape holygrailsteak.com/collections/japanese-wagyu with much success.
Loved the enthusiasm when you were checking the website for data! This was a great course. Just what I needed. You got a new subscriber :)
Thank you! Appreciate the feedback.
This was really good content! I was able to follow along on my system and got the same results.
Really good video on web scraping.
Please do more videos like this.
If you could do a video on how to feed the data into a SQL database, that would be awesome - thanks.
Can do Christoph! Thanks for the feedback, keep an eye out for an upcoming video about saving to a database :)
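In the meantime, here is a minimal sketch of one way to do it with pandas and SQLite; the table and file names below are just placeholders:

```python
import sqlite3
import pandas as pd

# Assume `jobs` is the list of dicts you scraped from the API.
jobs = [
    {"title": "Data Analyst", "company": "Acme", "location": "Berlin"},
    {"title": "Python Developer", "company": "Globex", "location": "Remote"},
]

df = pd.DataFrame(jobs)

# Write the DataFrame into a local SQLite database file.
conn = sqlite3.connect("jobs.db")
df.to_sql("listings", conn, if_exists="append", index=False)

# Read it back to confirm the rows were saved.
print(pd.read_sql("SELECT COUNT(*) AS total FROM listings", conn))
conn.close()
```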
@@MakeDataUseful thank you for the very quick reply.
Perfect! I am looking forward to it.
I did subscribe to your channel and rang the bell, so I'll not miss it.
Keep going - with such well-presented content, 1,000 subscribers should be no problem before 2020 ends ;)
Great, man!! I will be using this in the near future 😀😀
Awesome video! Subscribed! Quick question for you though. On the scraping project I'm working on, when I go to copy the cURL bash into the converter as you did, mine has a cookies section as well as the headers, params, data, and Python requests code. What do you think that means about the site I'm scraping? Should I delete the cookies section of the conversion? Cheers, Joe
Test it without and see how you go. If it doesn't want to play fair, you may need to look at using a requests Session to collect those cookies and use them in your request. If all else fails, I have a couple of tutorials on using Selenium web browser automation that may help. Best of luck, and let me know if you get stuck!
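Here is a minimal sketch of that Session approach with hypothetical URLs. Hitting a normal page first lets the session collect whatever cookies the server sets, and they are then sent automatically on the follow-up API call:

```python
import requests

# Hypothetical URLs; replace with the page and API endpoint you found in dev tools.
HOME_URL = "https://example.com/"
API_URL = "https://example.com/api/search"

session = requests.Session()

# Visiting the home page first lets the server set its cookies on the session.
session.get(HOME_URL, headers={"User-Agent": "Mozilla/5.0"})

# The same session now sends those cookies automatically with the API request.
response = session.post(API_URL, json={"query": "python"}, headers={"User-Agent": "Mozilla/5.0"})
print(response.status_code)
print(session.cookies.get_dict())  # inspect what was collected
```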
@@MakeDataUseful Thanks! I tried removing the cookies section and got:
NameError: name 'cookies' is not defined
I guess the cURL has it there for a reason! I will just keep using that one. I guess it just means that the site I'm scraping will be able to identify that it is me scraping it every time I do, right?
@@KoldbyTheEye Interesting error. Double-check your requests.get() and make sure there is no cookies=cookies in there.
@@MakeDataUseful I just checked, and indeed there is cookies=cookies in my requests.post(). Should I delete that if I'm trying to remove the cookies line altogether? Thanks again, man!
Okay, after typing this I went and tried it, and voilà, it worked! So I guess that means that cookies is just optional? I tried googling what cookies really means in this situation, but couldn't find a clear answer. What would be the benefit of using the cookies line the cURL gave me vs just deleting it all?
Great tutorial! Thanks a lot.
Thank you for the great content! I'm wondering if there is any way to get this approach to not fail if there is JavaScript, or at least be accepted as a real and current browser. I'm aware that copying out the cURL provides all the headers, user agents, etc., but some websites still seem to be able to tell that it is not a real browser. Perhaps it is because the JavaScript is not rendering properly and that gives it away? Any thoughts would be much appreciated!
Hey, yeah, an alternative route is to use Selenium and automate the browser. I have a couple of videos on my channel showing logging in and scraping with Selenium.
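If it helps as a starting point, here is a minimal Selenium sketch; the URL and CSS selector are placeholders, not taken from the videos:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

# Placeholder URL; swap in the JavaScript-heavy page you want to scrape.
driver = webdriver.Chrome()
driver.get("https://example.com/listings")

# Because Selenium drives a real browser, the page's JavaScript gets to run.
driver.implicitly_wait(10)  # wait up to 10 seconds for elements to appear

# Grab the rendered elements (the CSS class here is hypothetical).
cards = driver.find_elements(By.CSS_SELECTOR, ".listing-card")
for card in cards:
    print(card.text)

driver.quit()
```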
Great video, thank you
You are welcome!
Awesome.
Thanks!
Is there any way to send a request in order to find potential acceptable parameters? (Once you've already found a useful API cURL.)
Thank you bro!
Great lesson!!! It is really good for a slow learner. Can someone tell me why the data df has only one row and one column?
Hey man... what is the tool that you use to work with the data?
Edit: It's Jupyter Notebook... I figured it out watching the first episode of "Making Money with Python".
Hi Velvet, I use a mixture of numpy and pandas to clean, transform and analyse data.
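For example, here is a tiny sketch of that kind of cleanup on a couple of made-up records; the field names are invented for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical scraped records.
records = [
    {"title": "Data Analyst", "salary": "80000", "posted": "2020-08-01"},
    {"title": "Python Developer", "salary": None, "posted": "2020-08-03"},
]

df = pd.DataFrame(records)

# Clean: convert types and fill missing salaries with the column median.
df["salary"] = pd.to_numeric(df["salary"])
df["salary"] = df["salary"].fillna(df["salary"].median())
df["posted"] = pd.to_datetime(df["posted"])

# Analyse: a quick summary.
print(df.describe(include="all"))
print("Average salary:", np.round(df["salary"].mean(), 2))
```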
shouldn't lat be first and then lng??
One question, it may be related... I have a user ID and password for a website, and I want to scrape the data from there. How do I use this technique (the one you showed in the video) to scrape that data?
If you could give some hints or direction, that would be great, and if you get time maybe you could make a video 😀😀. Thanks 👍
Hi Banshidhar, great question. I made an auto-login and web scraping video available here th-cam.com/video/BZMVoYhA7KU/w-d-xo.html
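If you want to try it with requests, the general idea looks roughly like this; the login URL and form field names below are hypothetical, so check the real login request in your dev tools:

```python
import requests

# Hypothetical URLs and form field names; inspect the real login request in dev tools.
LOGIN_URL = "https://example.com/login"
DATA_URL = "https://example.com/account/reports"

credentials = {"username": "your_user_id", "password": "your_password"}

with requests.Session() as session:
    # Log in once; the session keeps the authentication cookies for later requests.
    login_response = session.post(LOGIN_URL, data=credentials)
    login_response.raise_for_status()

    # Subsequent requests in the same session are made as the logged-in user.
    page = session.get(DATA_URL)
    print(page.status_code)
    print(page.text[:500])  # preview the first part of the page
```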
All the best,
Adam
Hi, it is Barrsido from Reddit. I'm having another problem with my code. Most of the game is done, but for some reason the 'bal' variable does not update, so during the betting, results, and scoring phases it messes up on the second run. I put it back into the codeshare. Please message me on Reddit if you see this.
Hey Barry, will do!
@@MakeDataUseful Hi, I've actually figured out how to fix it for the majority of the program. The only spot where it's wrong now is when you get a blackjack: the 'bal' doesn't update.
im high af and that intro made me laugh
You can use a while loop to scrape the pages automatically.
I also get nervous with while loops... so many infinite loops 🤣
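For what it's worth, here is a small sketch of a paging while loop with a safety cap so it can never run forever; the endpoint and parameter names are made up:

```python
import requests

# Hypothetical endpoint and parameters; replace with the real API from dev tools.
API_URL = "https://example.com/api/jobs"
MAX_PAGES = 1000  # safety cap so the loop can never run forever

results = []
page = 1
while page <= MAX_PAGES:
    response = requests.get(API_URL, params={"page": page, "per_page": 30})
    if response.status_code != 200:
        break  # stop on errors (e.g. hitting the 10k result-window limit)
    listings = response.json().get("results", [])
    if not listings:
        break  # an empty page means we've reached the end
    results.extend(listings)
    page += 1

print(f"Collected {len(results)} listings across {page - 1} pages")
```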
nice
Please share the code.py file with us
May I have your email address, sir? I need your guidance on how to scrape a website that requires a username, password, and entering a captcha to log in.
Vote up for sqlite