I had no idea there was a Python library for Playwright! I've been suffering using Node/Bun and swearing a lot! Thanks!
Great example! Thanks a lot for creating this video. And the cool thing is that even today your code still works :-)
This doesn't work with Shopee: there's a login page, an OTP, and a CAPTCHA before you can scrape products, and if I call the hidden API I caught in the requests, it requires a lot of cookies. Please help me :(((
this is freaking awesome!
perfect.. thanks
Can you build a script similar to agent zero?
Where is the part about getting past blockers? This is only about using playwright.
We would like a betting website scraped, thank you! :)
Could you please produce a series on web scraping? I noticed that when I use BS4 I only get part of the DOM content. Please try to explain how to scrape the important information while rotating IPs. Thanks
Why would you use BS4 if you can use Playwright for the same thing? You are already using it to get the page source.
Good point, haha. I was just using Playwright to get past the block, but yeah it would make sense to get the elements with playwright directly.
@unconv I think we tend to gravitate towards what we are used to, and BS4 has been around for a long time. It's really good with malformed HTML, but I really miss type hinting.
Awesome!! 🎉🎉🎉🎉🎉🎉
But this only shows the results of the first page.
Probably not the best solution, but for sites that include a page number parameter you could just loop over it, saving each page's results.
What about LinkedIn?
How about live scraping of a trending Twitter topic, like a cryptocurrency or AI, and printing the posts live as they are posted?
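Here's a rough sketch of that page-number loop, in case it helps. It assumes the site takes a `page` query parameter and that an empty page means you've reached the end; `page_url`, `scrape_all_pages`, and `fetch_page` are made-up names, and `fetch_page` stands in for whatever Playwright/BS4 code you already use to pull the items from one URL.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_url(base_url, page, param="page"):
    """Return base_url with the page-number query parameter set to `page`."""
    parts = urlsplit(base_url)
    query = dict(parse_qsl(parts.query))
    query[param] = str(page)  # add or overwrite the page parameter
    return urlunsplit(parts._replace(query=urlencode(query)))


def scrape_all_pages(base_url, fetch_page, max_pages=50, param="page"):
    """Call fetch_page(url) for pages 1, 2, ... and collect the results.

    fetch_page is a hypothetical callable that returns the list of items
    found on one page. The loop stops at the first empty page, or after
    max_pages as a safety limit.
    """
    results = []
    for page in range(1, max_pages + 1):
        items = fetch_page(page_url(base_url, page, param))
        if not items:
            break
        results.extend(items)
    return results
```

You'd call it like `scrape_all_pages("https://example.com/search?q=shoes", my_fetch)`. Sites that paginate with a "next" button or infinite scroll need a different approach.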
You have to use a proxy so you don't get banned.
Why do people always do these on simple sites that return plain HTML? Be a man and do this against sites like Instagram that return all JavaScript ;)
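For anyone wondering what that could look like with Playwright, here's a minimal sketch that rotates through a list of proxies. The proxy addresses are placeholders, and `proxy_settings` and `fetch_html` are made-up helper names; the only Playwright-specific part is the `proxy={"server": ...}` option passed to `launch()`.

```python
from itertools import cycle

# Placeholder proxy addresses; replace with proxies you actually have access to.
PROXIES = [
    "http://proxy1.example.com:8080",
    "http://proxy2.example.com:8080",
]


def proxy_settings(proxies):
    """Yield proxy dicts forever, cycling through the given servers in order."""
    for server in cycle(proxies):
        yield {"server": server}


def fetch_html(url, proxy):
    """Load url in Chromium through the given proxy and return the page HTML."""
    # Imported here so the rotation helper above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(proxy=proxy)
        page = browser.new_page()
        page.goto(url)
        html = page.content()
        browser.close()
    return html
```

Usage would be something like `rotation = proxy_settings(PROXIES)` and then `fetch_html(url, next(rotation))` for each request. Rotating proxies lowers the odds of a ban, but it doesn't guarantee you won't get blocked.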
I got a headache