Python Web Scraping with Beautiful Soup and Regex

Engineer Man

Views: 199,839

Published on Jan 22, 2025

Comments • 278

@cetrusbr 6 years ago ⁺⁴⁵⁶
I like your tutorials because you go directly to the content, something rare on YouTube these days...
@kalef1234 6 years ago ⁺¹⁸
Hey guys what's up before we get started smash that subscribe button, like this share it i am giving away a fucking gift card follow the links to my merch watch my ads really helps thanks okay...roll that intro *45 second intro*
@sourabhch3044 3 years ago ⁺²
So true, thank you for putting out the points that matter.
@justinhamilton8647 2 years ago ⁺¹
Cheers man, I used this tutorial to sort through 310,000 embed links. You’re so awesome.
@mixalismcgamer3188 5 years ago ⁺¹⁹
Dude, I watched 15+ recommended videos, and after hours I found this, FULLY EXPLAINED.
@kurdmajid4874 3 years ago ⁺²
he makes it so quick and simple
@kalef1234 6 years ago ⁺¹³
I felt so powerful as soon as I pulled an array of strings from a random website. Thank you for your great tutorial
@zigginzag584 4 years ago ⁺²
It helps so much to have someone that matches your personality when learning stuff.
I can't stand when asking someone for instructions on how to do something and they tell me everything
that I can expect and every once in a while throw in the thing I'm supposed to do next.
None of the fluff here. Just context. Every other creator would/has made this subject a 45min+ video
but here I am feeling proficient after just 14 minutes with EM.
Thank you, Sir!
@EngineerMan 4 years ago ⁺¹
You're welcome buddy!
@mhalton 2 years ago ⁺¹
13:52
Happiest man!
@EngineerMan 2 years ago
Oh god I'm not gonna be able to unhear that any time soon.
@bhumikakhiyani4230 4 years ago
I was struggling to iterate through the second span tag in multiple td tags, i.e. (tr[1:]/td[0]/span[1]).
I was trying it the whole day.
This is the best tutorial I have seen.
Thank youuuuu.
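The path in the comment above (tr[1:]/td[0]/span[1]) could be walked roughly like this. This is only a sketch: the table markup is invented for illustration, and `second_spans` is a hypothetical helper name, not code from the video.

```python
from bs4 import BeautifulSoup

# Hypothetical table markup, invented for this example.
html = """
<table>
  <tr><th>Player</th><th>Score</th></tr>
  <tr><td><span>1</span><span>alice</span></td><td>300</td></tr>
  <tr><td><span>2</span><span>bob</span></td><td>250</td></tr>
  <tr><td><span>3</span></td><td>100</td></tr>
</table>
"""

def second_spans(markup):
    """Return the text of the second <span> in the first <td> of each data row."""
    soup = BeautifulSoup(markup, "html.parser")
    results = []
    for row in soup.find_all("tr")[1:]:        # tr[1:] skips the header row
        cells = row.find_all("td")
        if not cells:
            continue
        spans = cells[0].find_all("span")      # td[0]
        if len(spans) > 1:                     # span[1] may not exist in every row
            results.append(spans[1].get_text(strip=True))
    return results

names = second_spans(html)
print(names)
```

The length check matters on real pages, where some rows often lack the second span and a bare `spans[1]` would raise an IndexError.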
@CODTALES-KILLSTREAKS 5 years ago ⁺²
Hey man! I watched this and applied the concepts to a weather site and made a CSV of all the sunsets / sunrises in 2019! Thank you! I love the way you explain things, please keep making videos sir! I have applied your teaching in a couple videos and it’s great! Learning so much!
@dilshand.5127 6 years ago ⁺¹⁴
I was able to do this on another leaderboard site, appreciate your work here.
@PS3PCDJ 9 months ago
This is THE best beautifulsoup tutorial on the internet.
@xrefor 5 years ago ⁺¹⁰
Love this presentation. Straight to the point with short and specific explanation. Keep it coming! :)
@robertpearson2143 6 years ago ⁺²
Been doing something similar for a while but in a much more complicated way. Looking forward to making my life much easier. Thank you!
@enyoc3d 5 years ago ⁺³
in a sea of youtube tutorials yours is the pearl. thanks!
@TomSilver_42 3 years ago
Simply brilliantly explained. I have seen a few of your videos and I like your style, so you have earned another subscriber.
@impossible441 6 years ago ⁺¹
This is remarkable, very informative and down to earth - I really love this concise format of yours, which is rather contrary to what most people on YT are providing
@yanggao4878 4 years ago
Your videos are fast-paced and straight to the point. Thanks!
@ViniciusProvenzano 3 years ago
Real nice content! Straight to the point. I played around with Beautiful Soup a few years ago for a small project, and I just wish this video was around at the time....
@susbedoo 5 years ago ⁺¹
You are the coolest tech guy I have ever seen on YouTube
@ladyViviaen 4 years ago
was trying to scrape modarchive for my project, this is way better than writing the name and id down by hand lmao, thank you!
@Yeezybreezzy 6 years ago ⁺²
If only I had this tutorial a few years back. Good stuff.
@worsethanjoerogan8061 6 years ago ⁺¹
Dude you're helping me out immensely with computer science courses
@SusiEzhil 5 years ago ⁺⁷
wow.. that's the crisp explanation... you're the man!!
@kennethmcquade4341 6 years ago
You're definitely skilled! For anyone watching these videos, don't get discouraged, this takes time. @Engineer Man, can you talk about the experience of learning how at the beginning of your videos?
@outbackgeek 5 years ago ⁺¹
Thanks for the video. I like how you take the basics and break it down with really good and practical examples.
@Lu3ck 5 years ago ⁺²
Your videos are fast but glorious! Love your content man! Thank you! Bless 🙏
@estilen69 6 years ago ⁺⁹
Using CSS selectors is the way to go, gets rid of nested for loops and is more robust.
@matteomannini1205 3 years ago ⁺¹
how?
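As a possible answer to the "how?" above, here is a minimal sketch comparing nested `find_all` loops with a single CSS selector passed to Beautiful Soup's `select()`. The markup, the `leaderboard` id and the `user` class are assumptions made up for the example, not the structure of the site in the video.

```python
from bs4 import BeautifulSoup

# Hypothetical leaderboard markup, invented for this example.
html = """
<table id="leaderboard">
  <tr><th>Place</th><th>User</th></tr>
  <tr><td class="place">1</td><td class="user"><a href="/u/alice">alice</a></td></tr>
  <tr><td class="place">2</td><td class="user"><a href="/u/bob">bob</a></td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# Nested-loop style: one loop per level of the document tree.
nested = []
for row in soup.find_all("tr"):
    for cell in row.find_all("td", class_="user"):
        for link in cell.find_all("a"):
            nested.append(link.get_text())

# CSS-selector style: the same path expressed as one selector string.
selected = [a.get_text() for a in soup.select("table#leaderboard td.user > a")]

print(nested)
print(selected)
```

Both lists come out the same here; the selector version simply states the whole path in one place, which tends to be easier to adjust when the page layout changes.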
@axelcano1623 6 years ago
Really nice content! You explain just enough to be clear but not too much, that's perfect. Please continue to remind us of the type of the elements you create, it's very important for beginners.
@rustyelectron 6 years ago ⁺¹
This video is really a good intro to web scraping.
@ledosilverknight4619 6 years ago
Some of the best tutors are always straight-forward: down and dirty!
@arturmangabeira9990 6 years ago ⁺¹
EM you're awesome. I was studying web scraping and this came up. Subscribed yesterday to your channel! lol
@EngineerMan 6 years ago ⁺¹
Nice!
@johnbecker3116 6 years ago ⁺¹¹
I spent forever teaching myself this last week and now you post this. Kill me now
@stephenrochester6309 6 years ago ⁺¹
These videos are brilliant. Thanks for all your hard work.
@asdfasdfasdf383 4 years ago
You go straight to the point. Obviously, you know a lot more in-depth about this topic. Anyway, I like it.
@DrSarge37 6 years ago ⁺¹⁴
It would be cool to see how to deal with pagination. So you want data from /page=1, /page=2, etc.
@joefagan9335 4 years ago ⁺⁴
In your browser go to the next page and copy the url of, say, page 2, then go to the last page to find the last page url. Use that as a template to build the url of each page you want. Loop over them in turn.
@joefagan9335 4 years ago
John Keymer nope you’re not parsing the page a second time to find the next button. You scrape the current page and then grab the next page by creating the string for the next url and accessing the next page - just one grab per page.
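The URL-template approach described in this thread might look like the sketch below. The URL, the `page` query parameter and the last page number are all assumptions, and `urllib` from the standard library stands in for the `requests` module that other comments mention, only to keep the example dependency-free.

```python
from urllib.request import urlopen

# Assumed values: copy the real pattern and last page number from your browser.
URL_TEMPLATE = "https://example.com/leaderboard?page={page}"
LAST_PAGE = 5

def page_urls(template, last_page, first_page=1):
    """Build one URL per page by filling the page number into the template."""
    return [template.format(page=n) for n in range(first_page, last_page + 1)]

def fetch(url):
    """Download one page and return its HTML as text (one request per page)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")

urls = page_urls(URL_TEMPLATE, LAST_PAGE)
print(urls)

# Scraping loop, left commented out because example.com has no such pages:
# for url in urls:
#     html = fetch(url)
#     ...parse html with Beautiful Soup as usual...
```

Each page is fetched exactly once, which matches the "one grab per page" point made above.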
@andriybortnik8310 6 years ago ⁺²
This is an awesome video, I actually enjoy the in-depth walkthrough of the reasoning behind your code, step by step, versus just saying "I did this" and not really explaining anything. On a separate note, I'm looking to get into Python, and I have previous code development experience, but it's been a little while, and setting up an environment to start doing some coding is a bit daunting. I'm looking to do more on the machine learning, neural networks side of things. I don't struggle with any of the logic or mathematics, but I know there are many pros/cons of various IDEs. Some have better support for various packages, etc. I was wondering if you could either make a video on some of this information, or maybe throw a few pointers my way. I would really appreciate that. Otherwise, keep up the great content!!!
@KingEbolt 6 years ago ⁺³
Let me throw some pointers at you.
0x3A738216
0x6B321970
0x88AC172B
@EluviumMC 6 years ago ⁺³
I've found that I really like using Microsoft's VS Code (not to be confused with Visual Studio). The IDE has a good clean interface, lots of extension support, and a built-in terminal.
@andriybortnik8310 6 years ago ⁺¹
@KingEbolt I can't even get mad at that... Well done
@camaulay 6 years ago
@EluviumMC +1 VS Code, switched from Sublime
@PriZ0nM1ke 6 years ago
Wow these videos are awesome! Direct and concise but understandable!! Well done!
@K2ThaYo 6 years ago ⁺¹
Beautiful video man! Really valuable information here. As a sysadmin with over 10 years experience, I can state it's a really clean method of scraping. I used to use bash scripts for everything, but using libraries in Python is sooo helpful. It would be a pain in the ass in bash with awk, grep, etc. I hope to see more soon
@oromis995 3 years ago
This content is absolute gold.
@Omar-ic3wc 5 years ago ⁺³
Exactly what I needed thank you very much!!
@EluviumMC 6 years ago ⁺⁶
Happy that you've chosen this topic. I've been exploring web scraping and have a script that works pretty well on a site that I frequent. Another awesome tool that can also be used to automate web navigation is the selenium package. But on more of a question-related note, I know the script you just made was pretty simple, and the one I have isn't that complicated, but I've been wondering how one would go about writing an object-oriented script for scraping?
@UchihaAditya 6 years ago
What are the advantages of Selenium over Beautiful Soup? I have a web-scraping assignment now and was advised to use Selenium.
@EluviumMC 6 years ago ⁺²
Selenium can be used as a web scraper, but I use it more for web navigation and then use Beautiful Soup to actually get the data I need from the pages once they've been navigated to. I just find Beautiful Soup to be more intuitive for extracting the data.
@yixunnnn 6 years ago
Selenium is like an automated user: it requires a web driver, and you can choose whether the automated browser runs in the background or not. I recently used Selenium because I was trying to request content behind a Microsoft login page, which is loaded using JavaScript, so I needed to wait until the content had actually finished loading before submitting anything. That's unlike requests, which retrieves the page content instantly.
@qettyz 5 years ago ⁺³
These were really good examples, thank you!
@DrChrisCopeland 5 years ago
how would you modify this for nested div elements in place of table row and cell elements?
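One way the question above might be approached: treat the outer div as the "row" and the inner divs as the "cells", then search by class instead of by tag position. The class names `entry`, `place` and `name` are invented for this sketch.

```python
from bs4 import BeautifulSoup

# Hypothetical div-based layout, invented for this example.
html = """
<div class="board">
  <div class="entry">
    <div class="place">1</div>
    <div class="name">alice</div>
  </div>
  <div class="entry">
    <div class="place">2</div>
    <div class="name">bob</div>
  </div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

entries = []
for entry in soup.find_all("div", class_="entry"):   # plays the role of <tr>
    place = entry.find("div", class_="place").get_text(strip=True)   # like a <td>
    name = entry.find("div", class_="name").get_text(strip=True)
    entries.append((place, name))

print(entries)
```

On a real page, `find()` returns None when a class is missing, so a check before calling `get_text()` is worth adding.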
@syntaxis5584 6 years ago ⁺¹
why did you use 'View page source' instead of 'inspect' to find the page structure?
@EngineerMan 6 years ago ⁺²
I did it because view source represents the content that was delivered to the browser on load whereas inspect represents the content currently on the page. Since the scraper doesn't see anything dynamically generated, view source is best.
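The point about delivered versus current content can be illustrated with a small sketch. The page below is made up: its `comments` container is empty in the delivered HTML and would only be filled by the script once a browser runs it, which a parser never does.

```python
from bs4 import BeautifulSoup

# Hypothetical page as delivered by the server ("view source").
delivered_html = """
<html><body>
  <div id="comments"></div>
  <script>
    document.getElementById("comments").textContent = "First!";
  </script>
</body></html>
"""

soup = BeautifulSoup(delivered_html, "html.parser")
container = soup.find("div", id="comments")

# Beautiful Soup parses the markup but never executes the script,
# so the container is still empty from the scraper's point of view.
scraped_text = container.get_text(strip=True)
print(repr(scraped_text))
```

In the browser's inspector the same div would show "First!", which is why inspecting can suggest data that a plain HTTP fetch will not contain.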
@luis96xd 4 years ago
Wow, I liked this video so much! It was very useful! 😄
You really have helped me a lot, it was well and fully explained, with real life examples
Thank you so much for this tutorial! 👏👏
@FreeDomSy-nk9ue 3 years ago
How do I combine this with Login? For example, I want to log into my YouTube account and scrape data from my favorite videos url.
@princepeach_ 4 years ago
I have an issue though, I’m web scraping my stats from a website and when my stats update the web scrape doesn’t update.
@virtualize2424 4 years ago
How do you scrape something like YouTube comments (without using the YouTube API)? When I get the html data for a video using the requests library, the video's comments are not there in the html data.
@chrisabreu7469 6 years ago
your videos are a life saver man. keep up the great content
@legioner304 6 years ago ⁺²
3 searches in the loop - very dirty )
"The speed of software halves every 18 months"
@grantfaith 4 years ago
ty, saved me an hour of time from all these other videos. holy shit
@recitoprasidha5761 6 years ago
but how do we scrape today's websites that use a JavaScript framework, where "view page source" doesn't show the HTML tags anymore because everything is already wrapped in JS
@kristiyangerasimov6708 3 years ago
Great video. Stuff like that makes me want to program and develop software until I die.
@ddmin3082 6 years ago ⁺¹⁰
Awesome video! Can you do one on the requests module please?
@bed781 3 years ago
Is there a scraping method that can read the content generated by JavaScript?
@affezippel7214 3 years ago
Has anybody extracted the data from the golf website, like getting all the names, numbers and emails of the club contacts, but using Beautiful Soup instead of regex? I'm stuck there and would appreciate some help. I also posted my problem in the Python channel of the EM Discord
@treybailey6752 6 years ago
Great vid with fantastic content. Would love to see this where you first log in in order to get content. Getting the headers set is a challenge.
@EluviumMC 6 years ago
Using Selenium to do the site navigation to get you logged in is how I worked around getting into a site that requires login credentials prior to scraping.
@ilobuhabib8325 2 years ago
love your tutorials.
I tried following your method to scrape a site, but the output is empty. When I checked the 'tr' throughout the source code, it has values, but I do not understand why the output is empty.
@ne12bot94 5 years ago
Just wondering is there a way to filter it and remove all the garbage that they send back? Idk?v😐v
@JeanDAVID 5 years ago
I have difficulty souping data with some tags in HTML files like . I use soup = BeautifulSoup(myfile, 'html.parser') and all the link tags turned out to be transformed to . How come
@DirtySocrates 6 years ago ⁺²
Excellent! Thank you!! Great vid!
@gabrielh5105 4 years ago
Why can't I find specific content on pages like whatsapp? I would like to fetch the name of a person, and I did as you said, by checking the source and getting the div class, but it simply doesn't appear in the soup
@kingseekerbackup3085 4 years ago
I use requests and bs4. Never thought of using regex besides pattern searching
@bennieliu3261 6 years ago
Awesome tutorial man! Can I suggest scraping dynamic pages as the next tutorial? Would be a sweet follow-up
@EngineerMan 6 years ago
Thanks. Part 2 of this is being requested a lot, I need to see what is best to do.
@xppaicyberr 4 years ago ⁺¹
Great content
@jarodmorris611 6 years ago
Anyone know of any tutorials on how to escape parsed data for inclusion in a MySQL table? Been getting errors that I'm sure have to do with unicode to UTF-8 conversion but I have had no luck in finding anything to show how to escape / encode text so it doesn't throw an error when inserting into MySQL.
@chowfatt38 6 years ago ⁺⁵²
Great video again. I've been playing with web scraping for a while and I find that most websites nowadays use JavaScript rendering quite heavily. Will you make a part 2 about how to scrape JavaScript-rendered websites? And what do you think about another web scraping package, Scrapy? Thanks man
@poidog22 6 years ago ⁺²
This would be a great follow on. +1
@cruzab3153 6 years ago ⁺²
Selenium is good and easy....
@trailrider6844 6 years ago
+2
@tayfun6378 4 years ago ⁺¹
puppeteer does a good job these days I think
@Megaloplex 4 years ago
+100
@royslapped4463 2 years ago
this is perfect for what I needed thank you!
@supalistmain4882 6 years ago
@Engineer Man, what is your day job? And how did you get into coding? Do you have a CS degree? and.... well instead of more questions, I'll just ask what's your background (in terms of what led to you adding so much value with these vids)?
@SiegeX1 6 years ago
Can you go over an example that first requires you to login and then requires you to use a query string with a hash token that changes after every login?
@soldiergaming2722 6 years ago
That's great and all but what I wanna know is... How the hell did he ctrl + u and get neat html rather than the smooshed together junk I get
@jeyakarankarnan7384 4 years ago
could you please tell me how to export the data into a csv file after this?
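For the CSV question above, Python's built-in `csv` module is one option. In this sketch the rows, the column names and the `write_csv` helper are placeholders standing in for whatever the scraper collected.

```python
import csv
import io

# Placeholder data standing in for scraped (place, username) pairs.
rows = [("1", "alice"), ("2", "bob")]

def write_csv(rows, handle):
    """Write a header line plus one line per scraped row to an open text handle."""
    writer = csv.writer(handle)
    writer.writerow(["place", "username"])
    writer.writerows(rows)

# Written to an in-memory buffer here so the example has no side effects.
buffer = io.StringIO()
write_csv(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)

# To write a real file instead (newline="" avoids blank lines on Windows):
# with open("leaderboard.csv", "w", newline="", encoding="utf-8") as f:
#     write_csv(rows, f)
```

`csv.writer` also quotes values that contain commas or quotes, which string concatenation would get wrong.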
@santiagorivera1562 5 years ago
What is the advantage of using Beautiful Soup over other web scraper packages with Python?
@stephentomaszewski8501 6 years ago
why don't you have to create an empty array for the variables first and why do they not get printed out each time you iterate over the for loop?
@EngineerMan 6 years ago
Sorry, I want to answer this but I'm not sure what part of the code you're referring to.
@stephentomaszewski8501 6 years ago
At 12:30 place and username aren't initialized and the print statement is inside the for loop. Thanks!
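The code at 12:30 is not reproduced in this thread, so the following is only a guess at the pattern being asked about, with made-up data. In Python an assignment creates the name on first use, so no declaration or empty list is needed, and a print inside the loop body runs once per iteration.

```python
# Made-up rows standing in for parsed table rows.
rows = [("1", "alice"), ("2", "bob")]

printed = []
for row in rows:
    place = row[0]        # created on the first pass, rebound on every later pass
    username = row[1]
    line = f"{place} {username}"
    printed.append(line)  # kept only so the output can be inspected afterwards
    print(line)           # inside the loop, so it prints once per row

# After the loop the names still exist and hold the values from the last pass.
print(place, username)
```

A list is only needed when the values from every iteration have to be kept, as `printed` does here.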
@werecow68 5 years ago
Wondering if you or someone else knows why with Python 3.7 installed on Windows and using Visual Studio I get an error that the requests module is not installed? Obvious why I guess but where or how do I get the requests module? TIA
@werecow68 5 years ago
Replying so others see if they have the same issue. In VS next to the Python 3.7 (64-bit) click the icon to the right that is a package and you can install packages from there. :)
@blevenzon 6 years ago ⁺¹
Wow just found your channel by accident and I’m loving it. Awesome content!! Do you think you can do a vid on Elastic Stack?
@daltonkraklan 2 years ago
This was so freaking helpful
@lakshancosta 5 years ago
how can you get data that's inside a div class without td, tr
@Berghiker 5 years ago
How do I import BeautifulSoup into Python version 3.8? It says Module not found error
@eMasterClassAcademy 4 years ago
pip install bs4
@laxlyfters8695 6 years ago ⁺⁸
Went through a 30 second hillshire farms ad. Great match youtube
@EngineerMan 6 years ago ⁺¹³
Google knows you're into web scraping and sliced turkey lol.
@laxlyfters8695 6 years ago ⁺¹
Engineer Man no lie came back and got an ad for $3 jack box munchie meals. YouTube thinks your fans are stoned while watching your videos
@staynjohnson4221 4 years ago ⁺¹
The website ( umggaming.com/leaderboards ) now has cloudflare causing requests.get() to give a 503 status_code. any solution to this ?
@ronaldowatson89 5 years ago ⁺³
thank you so much for this amazing tutorial, I would like to ask what we do if the site I want to scrape requires being logged in btw this got recap
@joefagan9335 4 years ago
Usually, you can login first. Leave it open in your browser and scrape away.
@alfredleppanen6796 4 years ago
Hey great video! Let's say in your last leaderboard example, I would like to get notified when the leaderboard has changed, that is, when something changed on the site. I have built a script where I can see the HASH change, but I can't output what actually changed on the website. Do you have any tips on how to monitor what actually changed on the website?
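A hash only says that something changed. To see what changed, one option is to keep the previous text snapshot and diff it against the new one with the standard library's `difflib`. The snapshots below are invented, and `digest` and `changed_lines` are hypothetical helper names.

```python
import difflib
import hashlib

# Invented snapshots standing in for two scrapes of the same leaderboard.
old_snapshot = "1 alice\n2 bob\n"
new_snapshot = "1 alice\n2 carol\n"

def digest(text):
    """Hash a snapshot; enough to detect that something changed."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def changed_lines(old, new):
    """Return only the removed ('-') and added ('+') lines between two snapshots."""
    diff = difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile="previous", tofile="current", lineterm="",
    )
    return [
        line for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]

changes = []
if digest(old_snapshot) != digest(new_snapshot):
    changes = changed_lines(old_snapshot, new_snapshot)
print(changes)
```

Diffing the extracted text rather than the raw HTML usually gives cleaner results, since markup noise such as timestamps or tokens would otherwise show up as changes.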
@donaldandmijung 2 years ago
great tutorials! do you have a tutorial on scraping with a function( ) using beautiful soup
@BackTesterLive 3 years ago
Does anyone know how to display extracted data on my own website ?
@braulioramirez3463 6 years ago
I copied and pasted the code into the example py program but I get the following error when I run it:
Traceback (most recent call last):
File "scrape1.py", line 1, in
import requests
ImportError: No module named requests
What am I doing wrong?
@FM-tq2gs 6 years ago
Use the command "pip install requests" in your terminal.
@stefandevos1520 6 years ago
love your tutorials man
@TheFakeVIP 6 years ago
I've been using this to scrape pcpartpicker. Only problem is, at least as far as I know, the only way to get the html is with a phantomjs script, as it's loaded dynamically. Any thoughts on how I can do this all directly in one python script?
@LiamHz 6 years ago ⁺¹
You can use ghost.py [1] to interact with JS on websites
[1] github.com/jeanphix/Ghost.py
@JDRudie-ec4xq 6 years ago
what auto complete package are you using for atom?
@StrangeIndeed 4 years ago
I wanted to scrape 4channel. I wanted to get all the thread divs. But I got nothing. It took me 15 minutes to realize that all the divs are initially empty, and JavaScript injects them when the page loads. And when we use requests, it just downloads the HTML, without running JS. Lesson learned
@JeroenTrappers 6 years ago
Good video. Personally, I like using node with a DOM module and writing CSS queries to extract what I want.
@magicyvan 2 years ago
loved it! Efficient and very clear for a beginner. Would be great to have the login part, and why not sending the extraction into a csv file ;) I subscribe ;)
@LarsHolmVV46 4 years ago
That was beautiful not to say absolutely excellent. Man ,,,,,
@Viruhemanth 5 years ago
carefully he's a hero
@kylemichaelreaves 4 years ago
Super helpful, thank you.
@SL3APYH3AD11 5 years ago
When I print data, I just get an empty list. HELP
@reneepaz8077 6 years ago
Hi love your channel, would like some advice, I want to scrape data from a table that is dynamically generated by a website based on user input, i.e. not static.
The website does not have a downloadable pricelist csv file for their products, so based on the criteria that I enter it generates a table in html format. Also, due to the massive amount of data available, the table has multiple pages. All I want from the table is the UPC number and the price of the items so that I can use that data in my product analyzer software.
@DevastaingDj 6 years ago
Awesome! Kudos! Very helpful. Thanks man!
@ozoikeobinna8116 6 years ago
I was looking for software to scrape emails from websites and ended up here. I don't even know Python or which software you were using. Where do I start now?
@nicememe999 6 years ago
Yes! A great tutorial on web scraping! Now I got some ideas on some websites I could scrape for data...
What kind of real-world applications could this be used for? With websites providing APIs with the data nicely packaged in JSON format, it seems like getting data via APIs is the better (or at least the most common) way to do this. Are there any situations where web scraping would be better?
@impossible441 6 years ago
I guess that any kind of scientific literature databases use web scraping (e.g. Google Scholar)
@EluviumMC 6 years ago
Web scraping should be a last resort. Getting data via an API is much better.
