Where do you usually go to collect data for your data science projects? 👨🏼💻📊🤖📈
Epic Video, Luke! 👌
You know, I actually wanted to tell you: my friend (the one working at VW) does the job of a Data Engineer, Data Analyst, Data Scientist, and ML Engineer 😂 because he's the first data scientist the company hired. He's a team of one! 😂
@@ShivekMaharaj you mean Volkswagen, no way bro
Most federal agencies have open data warehouses. I downloaded all of the PPP loan data from the Treasury department.
@@johnrussell5715 This is great to know, thank you for sharing this!
Congrats on 100K Luke!!
Thanks Ken! Honestly couldn't have done it without your support and shoutouts over this past year; it has helped out more than you know!! Here's to us making it to 1 million subs!
I saw an "entry level" data analyst job that required 6-8 years experience!
🤣 That's just ridiculous (and also unfortunate)! Entry level jobs should not be asking for that many years of experience
@@LukeBarousse I have encountered so many entry-level jobs that require so many years of experience that I want to make a project focusing on that. I was hoping to scrape data from LinkedIn; however, your video shows that it is not a good idea. I wonder how you solved this problem. Thank you for your kind attention.
@@yasinudun4147 Working on this project again RN actually... will have more details on how to do this in the near future
😂😂😂
😂😂😂😂😂😂
So I was watching one of these videos, and the guy was talking about how to get the data without violating terms and conditions. He managed to pull an amazing stunt. See, one of the issues with bots is that your requests keep their servers busy. What he did was minimize the request actions: the equivalent of opening page 1, then page 2, and so on, without running any other scraping commands. The content of the pages he opened was saved locally on his machine as a log, and then he opened those files and pulled the data from the saved pages offline. Pretty amazing, but a dirty job; you might want to look it up 😅
Thanks for sharing this! LinkedIn job search pages are actually really slow to load (compared to other pages on my machine), so I really feel like I'm taxing the LinkedIn servers; because of this, I think you offer a really interesting solution to help minimize that. I'll take a look into this, so thank you!!
@@LukeBarousse can’t wait to hear how it went
@@LukeBarousse the proper wording for the thing is HAR file, HTTP Archive format. You can use Google Chrome to download the file, and at that point, instead of scraping a website and overloading its servers with requests, you're basically parsing a HAR file containing the data that you need.
Also, one good tip when scraping: don't make your actions look like an outlier when you get analyzed by those you're scraping; blend into the crowd. Computers are fast, humans are not. 🤫😉
@@928khaled Thanks for all these tips! I'm going to look more into HAR files and see if I can learn to apply it to my use case. Thanks again!
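For anyone wanting to try the HAR route, here's a minimal sketch of reading one in Python; a HAR file is just JSON, so the standard library is enough (the filename and the "jobs" filter below are placeholders):

```python
import json

# A HAR file is a JSON archive of the requests/responses the browser
# recorded (export via Chrome DevTools > Network > "Save all as HAR")
with open("linkedin_jobs.har") as f:  # placeholder filename
    har = json.load(f)

for entry in har["log"]["entries"]:
    url = entry["request"]["url"]
    body = entry["response"]["content"].get("text", "")
    if "jobs" in url:  # keep only the responses you care about
        print(url, len(body))
```

Since the pages were already fetched during normal browsing, parsing them offline adds no extra load on the site's servers.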
I don't get how this helps. If you opened a page, then you have already taxed their server. Pulling info from the page isn't affecting them at all, so saving the page offline is an unnecessary step.
That was a nice shared experience, Luke. A friend had talks with several website administrators, and it seems that big companies are trying to limit web scraping for legal and technical reasons, as it tends to slow down website performance.
Also very true! That's a great point on the technical reasons that I meant to bring up in this video. 🤦🏼♂️ We sometimes don't think of how using a bot may actually slow down servers and thus affect the experience for the company's other users, very valid!
That's only the second reason. The main reason is - it's their data, and managing it and curating it is part of their value offering. They limit access to prevent dilution of their assets and maximise their own monetisation potential. Why do you think they built the website in the first place?
@@wanderingtravellerAB99 Absolutely. However, since this information is made available on the internet, which is not an intranet, a website can be considered a catalog or booklet to be shared. Also, just as our own information is used by the website during our visit, it would be fair to say that the visitor can save some information too.
lol my hub is one of those folks working for evil corporate America trying to stop scraping..
@@wanderingtravellerAB99 I agree with what you are saying about the value of the data asset, but speed cannot be dismissed as secondary. If you have DevOps friends in big companies, they can tell you website speed is gauged in microseconds. Slowing down the website for even a fraction of a second can cause millions in direct monetary losses.
Nice content! I have nothing to add, just wanted to share my appreciation!
I just did this project last week, but scraping from Indeed, and I can relate to you so much!!!!
😂 You spend all this time building this beautiful bot, and then BAM! No more scraping!
Did you run into the same problem?
Congratulations on 100K Luke!
Yes! i thought we should celebrate
Thanks a lot Al! So appreciative of your shoutouts for my channel, this has helped more than you know in reaching this milestone!
And Bouseux, looking to launch a special video either later this week or next!
@@LukeBarousse 🎆
Awesome video! Thanks for sharing this content. I'm on my path to becoming a Data Analyst (Google DA Certificate in progress), and all this information is really useful.
Heck yeah, glad you are getting use out of my content Marcelo! Good luck with the Google Certificate!!
This is a great follow-up to the earlier video. I'd hoped when you posted the previous video that you would follow up with information on ethically dealing with the TOS limitations on various sites. Looking forward to further follow-ups.
Heck yeah, thanks for the idea Arthur. More to come on this series!
great video and explanation of your process and pains!
I am learning about web scraping now to tackle this project.
I really love how you laid out your thought process and shared with us the legality of web scraping.
Have you done any more job scraping since this video?
Yes! I have a video talking about how "I analyzed XXX,XXX jobs to solve this" Check it out!
Nice project! I am really curious to see any kind of results from the data you have collected until now.
Yeah I'll be looking into the results in the upcoming episodes of this series!!
Luke! Congratulations!!! 💯🔥🥳🍻
Thanks Rem!!
Definitely wanna be careful with the “public” jobs posting page too because they can ban your IP address, and that would suck… that happened to me with the U.S. Senate website
This is good to know; that's why I'm still hesitant! Were you scraping data from the U.S. Senate website?
Reading the comments: if LinkedIn will not allow scraping its data because of server overload or business info, then they should provide this kind of analysis for all of us searching. Maybe they could charge for it too; I would pay for that.
This is really cool Luke.
Thanks Kumar, glad you enjoyed it!!
Looking forward to chapter 2!
I need to get around to making this! 😳
The data is publicly available without logging in to LinkedIn. However, I noticed certain filter fields, like easy apply and experience level, are unavailable.
Yeah, I noticed the same thing... although for my purposes those filters aren't really necessary, so that may work.
Hi bro, I loved your work.
I'm trying to collect jobs data, but I've run into a lot of issues. Can you help me out with your git repo?
You may want to try to scrape Glassdoor instead! I may know a playlist or something that can show you how 😉
th-cam.com/video/GmW4F6MHqqs/w-d-xo.html
😜 Should have gone with Glassdoor instead; just watched your vid. Your approach was a lot easier than mine 🤦🏼♂️
Ken ma man! ty for doing god's work!!
Awesome as usual, Luke
Thanks Youssef!! Appreciate it!
Glad you used a burner account! That’s a pretty hefty ban!
I learned that trick from you Dave 😘
Hello Luke
Thank you for providing such great videos
Love from 🇮🇳
No problem at all Rahul!! I appreciate the support!
Why isn't there a video on how to do this project?
Very nice video, Luke. I think the bot is logging in at a regular time every day. How about creating a random time generator within a 1-hour window so that they can't detect it?
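For anyone curious, a minimal sketch of that randomized-schedule idea in Python (the window size and the job function are placeholders):

```python
import random
import time

def run_with_jitter(job, window_seconds=3600):
    # Start at a random offset within the window so the bot
    # doesn't hit the site at the exact same time every day
    time.sleep(random.uniform(0, window_seconds))
    job()

run_with_jitter(lambda: print("scrape here"))  # stand-in for the real scraper
```

On its own this only hides the schedule; request pacing and headers still matter for detection.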
Can you make a video about how we can legally scrape data from LinkedIn or other applications for business purposes...?
Yes, working on a video rn on this topic
Did you find a way to overcome the 1000/2500 search limit?
Hey, may LinkedIn block my account if I do web scraping this way?
Some very interesting points around click wraps, etc... Keen to see where you take this one.
I'm interested as well! Still brainstorming 😂 Thanks Troy!
Linkedin jail is no joke! They keep throwing me in the slammer for the same reason, and I dunno how TF to even make a web scraper!
That's why you gotta come to the dark side #🐍. 😂🤣
On a serious note, as my number one source of content on LinkedIn, I can't have you in LinkedIn Jail!
Very cool - I actually wanted to download my connections' skills for a data science project, but got too busy with work... but I would really only need a one-time download, I think? idk, thanks for reminding me of this project idea anyway!
Glad I could motivate you to do a similar project!! Hope you dig back into this project!
Do you happen to know any site that has plenty of uncleaned data? Most of the data are already cleaned from websites like Awesome Public Datasets, Kaggle, and Google Data Search. Thank you!
I have some listed in the description
Regarding public scraping: put the Python task on something like a VM or Raspberry Pi with a VPN.
They will ban your IP at some point.
Change IPs. Repeat.
You will have hundreds of IPs, and after a few years you can just change VPNs or reuse old IPs.
I like this approach. I’m interested in trying this
@@LukeBarousse I do this with a few scrapers like this, where the companies say "no". I got blocked by IP, so I set up a Raspberry Pi that runs the scraper, and I just change the IP whenever a block happens.
@@alecubudulecu Oh sweet! I have a few raspberry Pi's lying around actually!
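A rough sketch of that detect-and-rotate idea with the requests library; the proxy addresses are placeholders, and in practice they'd be your VPN exit points:

```python
import requests

PROXIES = ["http://10.0.0.1:8080", "http://10.0.0.2:8080"]  # placeholder pool

def fetch_with_rotation(url):
    for proxy in PROXIES:
        resp = requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=10)
        if resp.status_code not in (403, 429):  # 403/429 usually mean "blocked"
            return resp
        # Blocked: fall through and retry with the next proxy
    raise RuntimeError("All proxies blocked")
```

Whether this is appropriate depends on the site's terms; as discussed above, rotating IPs to dodge a ban is exactly what many sites are trying to prevent.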
Thank you Luke.
I scraped sports data recently and got banned. But it wasn't automated scraping, and the ban was not harsh; still, it would be wise to be more mindful of the scraping we do.
Wouldn't setting a time.sleep() for a few seconds help the scraping? I mean, what characterizes a bot is the speed (and automatically doing stuff, obviously), so they see you're a bot because of the speed.
There is also browser POST info that lets the server understand it is a bot.
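A minimal sketch of the sleep idea, with a random jitter added, since a perfectly fixed delay is itself a machine-like rhythm (the URLs are placeholders):

```python
import random
import time

import requests

urls = ["https://example.com/jobs?page=1", "https://example.com/jobs?page=2"]

for url in urls:
    resp = requests.get(url)
    # ... process resp.text here ...
    # Jitter the delay: a human doesn't read every page in exactly 5 seconds
    time.sleep(random.uniform(3, 10))
```

As the replies note, timing is only one signal; headers and cookies get checked too.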
Sir, I am trying to implement your scraping code from your GitHub repository, and while installing the requirements.txt file it gives me this error: "ERROR: Invalid requirement: '_ipyw_jlab_nb_ext_conf=0.1.0=py38_0' (from line 4 of requirements.txt)
Hint: = is not a valid operator. Did you mean == ?"
I changed = to ==, but it still gives an error. Sir, is it a mistake on my side, or is something wrong with the requirements file?
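(For what it's worth, the `name=version=build` triplets in that error are the format produced by `conda list --export`, not a pip requirements file, so pip can't parse it no matter how the operators are edited; recreating the environment with something like `conda create --name scrape_env --file requirements.txt` should work instead, where `scrape_env` is any name you choose.)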
By any chance, did you share your code on GitHub or somewhere else?
Yeah, I have a few repos shared on GitHub
@@LukeBarousse I didn't see this repo on your GitHub. Where did you land on deciding whether or not to follow up on this video?
@@godojos th-cam.com/video/7G_Kz5MOqps/w-d-xo.html
This video is the follow-up!
Do you have an updated video on this or does it still work?
Thanks for sharing, Luke. I am also building my own scraper to gather housing data; hope I do not run into similar banning issues!! 🤞
If you build it to mimic human behavior as closely as possible, you shouldn't have an issue. Good luck with your scraper!
I work at LinkedIn and am learning how to scrape data with Octoparse to build data sets for product designers. We have a lot of internal tools, and I'm curious to combine methods.
That's interesting about Octoparse; I haven't heard of a lot of people using this solution for scraping, so I'll have to check it out. Thanks for sharing, David. What websites have you had luck with scraping?
@@LukeBarousse I'm 100% LinkedIn focused, so I'm attempting to do something in the enterprise/consumer use cases for anything professional... using Airtable as the API for Figma or whatever software the product uses. Currently stuck on doing batch URL -> image downloads, and on how to automate the deletion of data after it's captured (GDPR related). Thanks for your channel; I'm in an odd spot between design systems and trying to bridge AI or data science into product evolution.
@@DavidCarmonaUX This project sounds awesome, David! Good luck with it, it seems like you have the hardest part figured out!
Where can I get the watch, bro?
Amazon! It's a Garmin
Thanks for sharing the awesome information, but I was hoping you would take one project from start to finish.
I always tell my students that if they didn't create it, they don't own it. I also remind them that software they didn't write almost always comes with Terms and Conditions, that boring legalese few people read; if they click "Agree to Terms and Conditions," they've likely just created a binding contract with the developer. Caveat emptor.
Hey Luke, add more videos! I like your explanations, and you are doing nice work.
Aww thanks so much for this Laxya! Let me see what I can do!
Is it hard to work as a data scientist? I mean, is it a chill job where you get a lot of free time, or is it hectic?
I'm a data analyst, so I may not be the best one to answer that.
I started doing a project with the same objective but found it too hard to retrieve the data. Do you have the table with that data? Anyway, great video; I didn't know it was against policy either.
Thanks Pedro!
Great video "Johnny"! You may also want to look into scrapping info that contains any PII (Personally Identifiable Information) that the postings have and its relation to GDPR if that information is from an EU member country.
I'm going to have to start going by Johnny now 😂 That's also a good point on analyzing it from the GDPR perspective... I solely focused on the US in this case but may need to think larger. Thanks for this, Robert!
@@LukeBarousse That sounds like a new Data name. Think Larger = Data Superhero (like game boss level).
8:43 you're not comfortable doing that shit? I get it man, haha
🤣😂 Still deciding, but I really want that data from LinkedIn 🤷🏼♂️
I'm not a pro, but I do some simple automation and bots at work. You inspired me to study more and do more. Thank you, sir! From the Philippines.
Thanks so much for this!! This is actually inspiring to me as well to hear, so thank you!
Thanks for the tips on the burner account.
No probs! But I don't know if I'd recommend the burner account 🤣
Love scraping, but I don't think that not being logged in will somehow void their TOS. The thing is, data available to see on a website without a login might still be protected by some law.
May I know which course you took to design the bot that you are using for web scraping?
Yeah! Python for Web Scraping by Data Camp: lukeb.co/WebScrapingPython
@@LukeBarousse thanks! :)
You are a legend, bro!
I appreciate that my dude!! 🤙🏼
So web scraping is like accessing an API, just with dirtier data?
Yeah, pretty much
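Roughly, yes; here's a sketch of the difference (hypothetical URLs). An API hands you structured JSON in one call, while scraping means digging the same info out of HTML meant for humans:

```python
import requests
from bs4 import BeautifulSoup

# API: structured data, one parse away
jobs = requests.get("https://example.com/api/jobs").json()

# Scraping: parse the rendered page yourself
html = requests.get("https://example.com/jobs").text
soup = BeautifulSoup(html, "html.parser")
titles = [h2.get_text(strip=True) for h2 in soup.find_all("h2", class_="job-title")]
```

The "dirtier" part is that the HTML structure (tag names, class names) can change without notice, so scrapers break in ways API clients usually don't.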
So far, I’ve only been banned from one site, which was from the UK that I would never visit regularly. I was just using it to practice.
Thanks for sharing this!
Is the code open-sourced? I can't find any GitHub repo.
github.com/lukebarousse/Job_Analysis
Forgot to link it
Thanks, I found it in another video of yours 💜
This is very interesting!
I need the next part of the video. Or maybe someone has used ScraperAPI? I really need that data from LinkedIn.
Check out my "how I use Python" video... I provide the data there
Let me give you a tip: use Undetected Chromedriver as your webdriver in Selenium.
Still, you should use all the ways to avoid detection.
Thanks for this, I'm going to try it!
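For reference, a minimal sketch using the undetected-chromedriver package (`pip install undetected-chromedriver`), which patches ChromeDriver to pass common automation checks; the URL is a placeholder:

```python
import undetected_chromedriver as uc

driver = uc.Chrome()  # drop-in replacement for Selenium's Chrome driver
driver.get("https://example.com/jobs")  # placeholder URL
print(driver.title)
driver.quit()
```

Since it's a drop-in replacement, the usual Selenium calls (find_element, etc.) work unchanged; it just reduces the obvious automation fingerprints.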
How much time do you put into these projects including the googling and the learning?
It really depends. This one I probably have over 100 hours in. It's not necessary to spend this much time on projects, but sometimes I go a little overboard. Ha
4:19 Are you using VSCode?
Yeah! Love me some VSCode for Python
Is there any site other than Google where I can get a free data analysis course with a certificate?
I'm not sure on this one... I'd have to look at it more. Just so you're aware, the Google Data Analytics certificate has financial aid available; all you have to do is apply to see if you can get it.
At least your IP wasn't banned.
That is valuable info.
Aww thanks Priyadhara!
Why does nobody show in their tutorials a real set of data extracted with their oft-mentioned "method"?
I felt this. I usually stay away from scraping nowadays and pray to god there's some API or third-party unofficial API for the data. Can't be bothered to invest in creating a crawler only for it to get banned or IP-blocked. There's another comment on admins hating crawlers, so there's that too.
That was the coolest thing I have ever seen
Ha, thank you!
You left me hanging at the end. Like, what does "this publicly available data" mean? It seems like you cut this video off short.
I would like to see a tutorial on how to do web scraping with Python on Amazon, eBay, or any other place to get data. Can you please make tutorials about these topics and keep going?
Thanks for this video idea Michel! Let me see what I can do on this topic!!
@@LukeBarousse Great, thank you! You really give me passion in every single vid; please keep going, your content is very helpful. Can you please do more technical or project-based vids in the future? Once again, thank you.
This reminds me of stories of how people hire low-cost workers from India to solve the "I'm human" tests for those automated bots, and other shady things.
Interesting!
Funny coincidence… last semester during a project I was also scraping job data for the search terms data governance, data culture, etc., and I encountered exactly the same issue. Now this video pops up :D
I feel your pain then!! ha
You look like the brother of Kalle Hallden
😂😂
But I still don't know how LinkedIn discovered that you used a scraper. Maybe you accessed it too frequently? Scrolled too fast? Or was your access via Python requests?
Selenium can be detected
Yeah, I need to look into whether I can hide this...
Probably all kinds of stuff. If the User-Agent isn't set to a common one, it will flag. Some sites use Akamai servers, which have all kinds of anti-bot measures: requests per time frame, progressive cookie data, user-agent filtering, basic HTTP request header filtering (like if the accepted language isn't just right).
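To illustrate just the header side of that, a minimal requests sketch with a browser-like User-Agent and Accept-Language (the URL and exact header values are illustrative; this only addresses the simplest checks, not Akamai-grade detection):

```python
import requests

headers = {
    # Values copied from a real browser session would go here
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9",
}
resp = requests.get("https://example.com/jobs", headers=headers)
print(resp.status_code)
```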
@@LukeBarousse Sure. I once created one for LinkedIn, but I only used Selenium to log in, then passed the session on to the requests library to continue the scraping with ease. Using requests helps avoid having to deal with captchas and 'Are you a robot' checks. But I wasn't running it every day, so maybe I would have met the same fate eventually.
@@thefamousdjx Thanks for sharing this!
Once I used to slow down the script a bit so that I could avoid reCAPTCHA
Yeah, some waits to add time between requests help
Nice, you forgot to attach the scraped data CSV :D :D
😜
Nice workaround for the legalities at the end, but I have to address something you said: "Python is the superior language" is factually incorrect. C# is far better than Python; you should use it.
😂 I was mainly saying it as a joke... I feel all languages have their pros and cons... I'm just a Python fan boi
@@LukeBarousse lol fair enough. I've been using C# for years; it's easier to learn than you might think, and the multi-threading is top-notch.
How can I contact you if I want you to build a bot for me?
I don’t consult, per se, so sorry about this.
That's amazing. It motivates me to do the same.
Heck yeah, glad this motivated you!!
No one, and I mean no one, makes videos quite like you.
I really appreciate this! That's my goal with my videos 🙌🏼
@@LukeBarousse Keep doing your thing man, look forward to em!
Hope you've used a burner account
🤣
Can u share the source code?
It’s linked in the description
great!
🤙🏼
👍🏻👍🏻
🤙🏼🤙🏼
👍
🤙🏼
Dude, you look like Dr. House
Or does Dr. House look like me... 😜
0:24
Do you have a WhatsApp bot?
I don't
First
Second!
Hey Luke, would you be interested in developing a scraping tool for us? I have minimal experience in this area, so it needs to be easy to use. I am sure you know that Facebook, LinkedIn, and Seek all hold contact details of people, and we are looking to use this bot to help us find candidates for our clients. We will need to be able to search for people with specific job titles and then capture their email and/or phone number from these sites.
Sorry, I don't do consulting; I'm really just trying to focus on TH-cam content.
Sir, I have been trying to make a LinkedIn scraper for 10 days, but I cannot find a proper guide to do so! I am using Beautiful Soup to scrape, but it is unable to scrape LinkedIn pages because of JavaScript! Will you make a video on how to make a LinkedIn scraper? Plzzzzz 🙏🙏🙏 I will be very thankful to you for this act of kindness ❤️🙏😭
The simplest way to scrape LinkedIn jobs is using a third-party LinkedIn scraper API. I find Bright Data and Scrapingdog to be the best.
👍
🤙🏼