In my experience, having developed multiple web scraping applications, half the time isn't spent coding but trying to reverse engineer the web application. Simple ones are just a matter of looking at requests in dev tools and manually making API calls, while the most complicated ones involve backtracing how content is loaded on the page to find the JS code responsible for it. Basically it's 70% reverse engineering and 30% coding, if you do things the smart way.
Yep!
What's the benefit of manually doing API calls instead of just letting Selenium click the buttons, which does the exact same thing?
@@mateusb09 Selenium has overhead
@@mateusb09 because it's faster, less code, lower cost, easier to maintain
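To make the tradeoff concrete, here is a minimal stdlib-only sketch of replaying an API call found in the Network tab instead of driving a browser. The endpoint, cookie, and JSON shape are made up for illustration:

```python
import json
import urllib.request

# Hypothetical endpoint and cookie, standing in for whatever the
# Network tab shows when the page loads its data.
API_URL = "https://example.com/api/products?page=1"
SESSION_COOKIE = "sessionid=abc123"

def build_request(url: str, cookie: str) -> urllib.request.Request:
    """Recreate the browser's request: same URL, cookie, and headers."""
    return urllib.request.Request(url, headers={
        "Cookie": cookie,
        "User-Agent": "Mozilla/5.0",           # some APIs reject the default UA
        "X-Requested-With": "XMLHttpRequest",  # mimic the site's fetch() call
    })

def parse_products(payload: dict) -> list:
    """Pull the fields you care about straight from the JSON payload."""
    return [item["name"] for item in payload.get("items", [])]

# No browser needed: one HTTP round trip instead of a full page render.
req = build_request(API_URL, SESSION_COOKIE)
# data = json.load(urllib.request.urlopen(req))  # the real call, once the endpoint exists
sample = {"items": [{"name": "widget"}, {"name": "gadget"}]}
print(parse_products(sample))
```

In practice you can copy the exact headers from dev tools ("Copy as cURL" on the request) so the server sees the same request the page itself makes.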
@@Anthony-qg5hj I had a Selenium project in which I tried the approach you're talking about. Not only did I need to attach the login cookies (which expire) to the request anyway, I also had to manually construct the request skeleton.
So in the end it was a similar effort to just forcing Selenium to click the buttons.
Yeah. Scraping a dynamic website really makes me want to scream like Linus Torvalds at NVIDIA. And I also hate Cloudflare 😂
You can start a new browser or a new context for every goto() with a different user-agent; that's how I deal with Cloudflare.
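The rotation part of this trick can be sketched independently of the browser library. The user-agent strings below are only examples; with Playwright you would hand the result to `browser.new_context(user_agent=...)` before each navigation:

```python
import itertools

# A small illustrative pool of desktop user-agent strings.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36",
]
ua_pool = itertools.cycle(USER_AGENTS)

def next_user_agent() -> str:
    """Return a different user-agent for each new browser context."""
    return next(ua_pool)

# With Playwright the per-context usage would look like:
#   context = browser.new_context(user_agent=next_user_agent())
#   page = context.new_page(); page.goto(url)
for _ in range(3):
    print(next_user_agent())
```

A fresh context also gets a fresh cookie jar, which is usually the bigger win than the user-agent string itself.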
Interesting timing to see this video, literally the day after I completed my first full-stack application, which revolves around web scraping :D
You're the next Mark Zuckerberg
How do you web scrape a secure website?
share website url
@@IshaqKhan010 Can't share URLs in YT comments, they get autofiltered.
And I'm starting a web scraping project
I used to web scrape all the time, but stupid JS frameworks' obfuscated CSS class names have made it very difficult.
I use [data-something="foo"] attribute selectors; luckily most of the sites I need to scrape make use of this attribute.
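A stdlib-only sketch of that approach, matching on a hypothetical `data-testid` attribute instead of the obfuscated class names (with BeautifulSoup the equivalent would be `soup.select('[data-testid="price"]')`):

```python
from html.parser import HTMLParser

class DataAttrParser(HTMLParser):
    """Collect the first text chunk inside tags carrying a given data-* attribute."""
    def __init__(self, attr: str, value: str):
        super().__init__()
        self.attr, self.value = attr, value
        self.capturing = False
        self.results = []

    def handle_starttag(self, tag, attrs):
        # Ignore the churning class names entirely; key off the stable attribute.
        if dict(attrs).get(self.attr) == self.value:
            self.capturing = True

    def handle_data(self, data):
        if self.capturing and data.strip():
            self.results.append(data.strip())
            self.capturing = False

# Obfuscated class name, but a stable data-testid attribute.
html = '<div class="x9f3k"><span data-testid="price">$19.99</span></div>'
parser = DataAttrParser("data-testid", "price")
parser.feed(html)
print(parser.results)
```

Attributes like `data-testid` exist for the site's own test suite, so they tend to survive redesigns that reshuffle every class name.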
I started learning about web scraping YESTERDAY and stumbled upon your video today. GODDAMN, the way you explain stuff and speak really stuck with me! Thank you for providing such value and motivating me to improve my communication skills as well :D
I remember starting to watch your videos when I was entering my computer science BA, and as a 28-year-old with one semester left to graduate, you're still uploading good content that's unique. Never get tired of your vids, keep it up brother. I'm also concerned about the job market. Can you make a vid about new grad CS students? For example, it seems almost every job wants front end or something, and my school never taught any of it.
You want to get a job from what your school taught you? You are in for a ride, brother. Tech is about your own research and self-learning, every fucking day. I pity people who majored in CS because they heard about a programmer earning six figures.
Unless you went to an Ivy League school and want to be a quant, you've gotta do front end. JS, React, and SQL are key for the majority. School is dumb except for the piece of paper, unless it's Ivy League.
I saw this video recommended to me about two days after I had to scrape a ton of images and convert them to a PDF. The images are loaded dynamically and I will confess with shame that my script would scroll slowly down the entire page until it couldn't get any further. Then it would queue up all the appropriate image files and compile them into a local directory before turning them into a single PDF file.
I am working on building a project that heavily requires scraping, so I've been doing a lot of research. And it's really hard to find anything good that is not sponsored by Bright Data. I get it, their marketing team has done a great job tapping a perfect niche of creators who provide valuable information, but this creates a problem: almost every good resource ends up involving Bright Data, and that's not something I want to pay for when starting a hobby project.
Anyway, this is a great video either way. I learned a lot of things I hadn't considered in my planning, like ETL (that's a new rabbit hole I need to dive into) and adaptive content extraction to account for layout changes. I was just assuming I would set up reporting to notify me when I start getting no content, and then I would fix it.
So thank you for that.
Do you set up Redis or something to serve some requests from a cache of recently requested data rather than scraping again or hitting the db? Is that necessary?
And at what point should a webhook be set up, and for what purpose exactly?
Thank you
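On the caching question: Redis with an expiry (`SET key value EX ttl`) is the common production answer, but the idea fits in a few lines. Here is an in-process sketch, with a hypothetical `scrape` callback standing in for the real scraper:

```python
import time

class TTLCache:
    """Tiny in-process stand-in for Redis SET ... EX <ttl> / GET."""
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() > expires_at:
            del self._store[key]  # stale: evict and report a miss
            return None
        return value

    def set(self, key: str, value) -> None:
        self._store[key] = (time.monotonic() + self.ttl, value)

def fetch_page(url: str, cache: TTLCache, scrape):
    """Serve from cache when fresh; only scrape on a miss."""
    cached = cache.get(url)
    if cached is not None:
        return cached
    result = scrape(url)
    cache.set(url, result)
    return result

cache = TTLCache(ttl_seconds=300)  # re-scrape at most every 5 minutes
calls = []
fake_scrape = lambda url: calls.append(url) or f"<html for {url}>"
fetch_page("https://example.com/a", cache, fake_scrape)
fetch_page("https://example.com/a", cache, fake_scrape)  # cache hit
print(len(calls))  # the scraper ran only once
```

Whether it's necessary depends on how often the same URLs are re-requested; the cache mostly buys you fewer requests to the target (less blocking risk) rather than raw speed.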
AFAIK the button highlighting is a feature based on video subtitles, including those generated automatically, but it's still somewhat random. I didn't catch those because I had already subscribed and liked the video a moment before you said it.
I don't think it's a video subtitles feature. It just happens randomly in my experience. The thumbs-up button shakes and subscribe highlights. Didn't happen for me on this video though :(
Thank you for the amazing video! Much appreciated as a young web developer. By the way, none of the buttons lit up or did any animations... I am a subscriber, so I don't know if that's why.
Peace!!!
It actually didn't.
To be honest, I subscribed because the button lit up. Also, I love your content.
Dude is literally Gilfoyle from Silicon Valley (love your vids)
I wasn't looking for a web scraping video, but his face drew my attention. I was like, wait, this is Gilfoyle, right? 😂❤
Fr
Great video and really nice energy. I think you answered my question by using the scraping browser to render JavaScript headlessly. Thank you.
This guy gets it-I’ve been there. I can’t wait to make this all an easy ass python plugin
The JD bottle in the background 😉
The cigars on the shelf ;)
Amazing content!
Brother, please make a video explaining how to scrape dynamically loading Power BI tables on a website. There is simply no change in the HTML/CSS structure when you interact 😅
such a chill vid
I really like the way you explain things and also the pronunciation issues
The subscribe button didn't light up because I was already subscribed 👍
Hey man, do you have another channel where you teach live?
If you do, please provide the link so I can start learning more.
Can you recommend a course to learn web scraping? A course that teaches the tools and techniques you mentioned, and other concepts.
I am searching for one too, beginner in web scraping.
Is there a reason/advantage to using Bright Data's "scraping browser" product instead of integrating their proxy and IP rotation services into a script I'm running on my own server?
Boom. Thanks
Can anyone help me? I can't seem to bypass the Cloudflare loading page with the headless Bright Data web scraper.
What are the best AI scraping apps? Any suggestions/recommendations? I'm just looking for how our nonprofit organization aligns with other organizations within a county of California, in order to partner with them.
Does web scraping fall under data science or software engineering?
Depends on the purpose of the data you’re scraping and how it’s used, but it can be both.
Hi Forrest. I was wondering how you feel now about AI and the future of software engineering. With ChatGPT out for over a year now, have your views changed much? Maybe a good topic for another vid.
interesting....thanks man
This video is what I need. But whoa, the code screens change so fast... I'm too old at 35 to push the pause button that quickly 😅 Do you have some links with those hacks?
I hate 502 errors, I don't know how to solve them.
hopefully brightdata ain't a snitch 🫠
The funny thing is when they block the ranges used by bright data xD
Damn this guy is cool
When I see a Bright Data sponsorship, I instantly stop watching. Paying Bright Data is not a web scraping skill.
Do you know how to bypass Cloudflare or captchas without Bright Data?
Some people 😂
That's like saying, "Oh well, these stupid people who drive cars, why would they do that when we still have horses?"
@@ZacMagee Why should I pay for it if I can do it for free? 😂
@@zeddscarlxrd4331 You can easily find out how to bypass Cloudflare.
@@vasyavasin7364 Don't you still need to scrape a bunch of proxies to use?
wow... i got a long way to go
You said you prepared the video without needing Bright Data, but for every issue except data storage you propose using Bright Data for the most important and challenging parts...? :/
GOOD VIDEO🎉👍
My fucking hero
are there vids of that ???
I like your mustache
12:30 nuh uh 🗿🗿
If you are using Selenium, Puppeteer, or any other browser automation, you will never be a good web scraper. They are just too damn slow. If you are relying on them to get you past the WAF JavaScript function and generate your cookies before you go scrape, others will beat you to the punch with pure code.
Define slow?
@@consolemodding1015 If you have to log in repeatedly and solve captchas, that delay is almost negated. Pure code bots just generate new valid cookies: once you hit your 403 Forbidden or 401 captcha, new tokens are loaded and you carry on. Not to mention threads instead of instances. Reversing the WAF JS function is the key. A good pure code bot vs a good browser bot is likely to be around 100x more efficient.
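The refresh-on-challenge loop described here is generic enough to sketch. `fetch` and `refresh_token` below are placeholders for the site-specific request code and the reversed WAF/cookie logic:

```python
def fetch_with_refresh(fetch, refresh_token, max_retries: int = 3):
    """Retry a request, regenerating auth tokens when the WAF rejects us.

    `fetch(token)` returns (status, body); `refresh_token()` reruns the
    reversed WAF/cookie routine and returns a fresh token. Both are
    stand-ins for site-specific code.
    """
    token = refresh_token()
    for _ in range(max_retries):
        status, body = fetch(token)
        if status not in (401, 403):
            return body
        token = refresh_token()  # challenge hit: mint new cookies, carry on
    raise RuntimeError("still blocked after refreshing tokens")

# Simulated server: rejects the first token, accepts the second.
tokens = iter(["stale", "fresh"])
fake_fetch = lambda tok: (200, "data") if tok == "fresh" else (403, "")
result = fetch_with_refresh(fake_fetch, lambda: next(tokens))
print(result)
```

The hard part the comment alludes to, actually reimplementing the WAF's JS challenge in `refresh_token`, is site-specific and not shown here.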
How can you scrape dynamic content without these tools? Anything else besides trying to find the API endpoint?
I am a beginner who knows how to scrape simple pages. I want to learn how to scrape dynamic content. Would love to know your thoughts.
@@mianashhad9802 A method that works is to clone the API calls that fetch the data from the backend server. You can find them in the Network tab (filter by Fetch/XHR) in your browser's developer tools.
@mianashhad9802 If the attribute data changes, target the tag. If the tag changes, target the Ajax calls.
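That fallback order can be encoded directly as a chain of increasingly loose extractors. A small illustrative sketch: the regexes are for demonstration only (a real HTML parser is more robust), and the attribute, tag, and JSON key names are hypothetical:

```python
import re

def extract_price(html: str):
    """Try the most specific hook first, then fall back to looser ones."""
    # 1. Stable data attribute, if the site provides one.
    m = re.search(r'data-testid="price"[^>]*>([^<]+)<', html)
    if m:
        return m.group(1).strip()
    # 2. Fall back to a distinctive tag when attributes churn.
    m = re.search(r"<price>([^<]+)</price>", html)
    if m:
        return m.group(1).strip()
    # 3. Last resort: the JSON the page's Ajax call returns or embeds.
    m = re.search(r'"price"\s*:\s*"([^"]+)"', html)
    if m:
        return m.group(1)
    return None

print(extract_price('<span data-testid="price">$5</span>'))
print(extract_price('{"price": "$7"}'))
```

Layering extractors this way means a site redesign degrades your scraper gradually instead of breaking it outright.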
Your mustache looks like a hedgehog 😂
Thank you Jesus
Examples. Are you a Leo? he he
stop web scraping