Thank you Leonardo! 😀 I used my_input.clear() to make sure the input is empty before we type anything in. It's not a necessary step with Instagram, but it's very handy when sending keys to an input that already has text in it 😊 It removes the existing text, so only the string from your following send_keys command is included.
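As a minimal sketch of the clear-then-type pattern described above (the helper name is hypothetical, and the element only needs to support .clear() and .send_keys(), just like a Selenium WebElement returned by driver.find_element):

```python
def type_into(input_element, text):
    # Clear any leftover text first, then type the new string.
    # Mirrors my_input.clear() followed by my_input.send_keys(...)
    # from the video; works on any object exposing those two methods.
    input_element.clear()
    input_element.send_keys(text)
```

With this, pre-filled inputs never end up with the old and new strings concatenated together.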
Thank you for the video. Very clear and straightforward. I'm having trouble with the searchbox.send_keys(Keys.ENTER) command. I tried pressing ENTER twice but it still doesn't work. Any suggestions?
Thank you so much Luisgui 😁 Try the code I've just posted on my GitHub, you can solve it with time.sleep(seconds), it's in the "search keywords" section: github.com/MariyaSha/WebscrapingInstagram/blob/main/WebscrapingInstagram_completeNotebook.ipynb I actually just finished working on a Medium article about this where I explain everything in detail, I'm just waiting for my publication to approve it and then I'll send you a link 😉
Thanks for this info. I'm able to download only 58 images in one go. Is it possible to download all of the roughly 2000 images at once? Can you please advise?
Hi Pranshu! Yes, you can download as many images as you need! 😀 Just implement the scroll event inside a loop, like this (scrolling 10 times to the bottom of the batch):
for j in range(0, 10):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(5)  # wait 5 seconds
The more you loop, the more images you scrape! I have a video premiering soon where I explain it in detail, check out "scroll to the bottom of the page" in the timestamps if you miss it 😉
This video is great, thanks, but I haven't been able to do the first step of opening Chrome and Instagram. Maria, can you please do a video explaining what you do beforehand with Selenium, the importing parts etc.... I downloaded Selenium but still get a traceback.
Hi Andres! :) I find that "pip install" doesn't always work, so I install everything with "conda install" when possible. If you are also using Anaconda, try: "conda install -c conda-forge selenium" If you are using another command line tool, let me know what it is and what kind of traceback you're getting, and we can go from there 😉
@@PythonSimplified Thanks for your help! I am still struggling. I am using Selenium and here I detailed everything and still no solution www.reddit.com/r/learnpython/comments/kfn0wm/a_unique_no_module_named_selenium_problem_its_not/
Mariya, thank you so much for the tutorial! What would you suggest to those who actually need to delve deeper and get the data from each post (likes, reposts, account name and so on)?
Hi Alexandra! You're welcome! 😁 First of all, the most important suggestion is creating a "dummy account" (one that is not associated with your actual account). Since you're about to scrape so much information - it's best to avoid being detected as a bot (which sometimes may result in your account being deleted). Second, check out my community post regarding the new and improved resources for Instagram scraping (a commenting bot is included there, so this will save you lots of time): th-cam.com/users/postUgwVQazZhNNqwdghhdh4AaABCQ Third, you'll need to understand the common blockers you'll be facing when traversing the DOM. I cover 3 of the main and most annoying blockers in my LinkedIn automation tutorial, so have a quick look before you spend hours trying to interact with hidden elements or dealing with "click intercept" exceptions: th-cam.com/video/XdFUpFUDt88/w-d-xo.html Lastly - work very closely with the developer tools! The rule of thumb is - if you can do something with your keyboard and mouse - you can do it with Selenium as well! No matter which exceptions you're facing, you can always bypass them as long as you can perform the same action manually! Good luck and let me know if it helped! 😀
Great video. It'd be cool to see one on Insta scraping using GET requests instead of Selenium, it's much faster. There's a good article on Diggernaut about it. Anyway, thanks, keep em coming!
@@PythonSimplified Love to see how you go with it! I've been stuck on it for the last 12 hours :'( I'm getting different static HTML returned from requests.get() than shown in the Chrome Dev Tools. Great channel btw, looking forward to more content!
@@chris_burrows Thank you for suggesting! I'll check it out :) I'll start advertising properly sometime in the near future. For now I take it easy, trying to focus on improving my filming/editing abilities before I go down that road :D
Great video! Just a query, will this go against the current terms and conditions of Instagram which might lead to your personal account getting banned for obvious reasons?
Hi Geoffey, check out this vlog of mine, I talk about this in detail: th-cam.com/video/f0B6RdVGcM8/w-d-xo.html Always scrape from a dummy account! Not your personal one! :)
Thank you so much Saqib! 😀 I have a new Selenium tutorial premiering in 35 minutes: th-cam.com/video/TXdgMkf9gP0/w-d-xo.html We're expanding the Linkedin messaging bot to seem much more human than it should be, I highly recommend to check it out! 😁
Very nice tutorial, I must say, it's easy to follow, and thanks for all your hard work showing all of us. Just a question: could you use a mixture of Selenium and BeautifulSoup to automate the login and parse the HTML? Just a thought, because it all works but the page load takes forever.
@@PythonSimplified So glad to hear back from you. I have a little query regarding scraping, and it's getting horrendous. I want to scrape some details from the *info* section of the profiles in a group. Is it doable? Your response will be like a lifesaver
@@smalirizvi8026 the rule of thumb is - anything you can target with your mouse and keyboard - you can target with Selenium! 😃 Some platforms have more blockers than others, I suggest watching my LinkedIn Bot tutorial where I show how to handle 3 very annoying blockers such as "click intercept" and hidden elements: th-cam.com/video/XdFUpFUDt88/w-d-xo.html If you run into specific errors let me know, but as long as you're working closely with the Developer Tools and allow enough time for the elements to load with time.sleep() - the sky is the limit! 😁
@@PythonSimplified I watched your vid. It was far above my knowledge and expertise. Actually, I am trying to extract every profile's information in some groups on *Facebook* for my machine learning work. So far, I think that I should watch your video on applying Selenium to Facebook. And I will convert the Python code to Java whenever I face some blockers there, as well as use time.sleep(). Do you think that this should be fine?
You have literally earned my sub! 💎💎 Thank you so much for replying and taking notice of my problem. Looking forward to you as I am pretty much done with my attempts with it
Thanks for the lesson, very appreciated. My beginner's question would be: how do I download the descriptions and publication dates of Instagram posts to a CSV?
If you're using a notebook interface (such as Jupyter) - you can run the login cell only once and then keep the webdriver browser window open. You'll be logged in up until the point you close the browser, so you can keep running other cells as much as you'd like! Your browser will not forget you've already entered the username/password 😉 However, if you use a code editor of some sort and run your code via terminal - you'll need to log in time and again, as the entire code of your program must be executed - which is quite annoying 😔 That's why I love using Jupyter for Selenium stuff! You can run the code cell by cell - rather than the entire code all at once, and it definitely saves lots of time and effort! 😀
If you're running code from the terminal, I'd recommend saving cookies to a JSON file (google "webdriver store cookies to json") and restoring those cookies every time you need them again (an Instagram session is good for quite a long time). Personally, I started using the instalyze.me scraper service after spending weeks and weeks managing my own Python scraping solution; it's just too much hassle due to Instagram's anti-scraping techniques.
@@PythonSimplified Yeah for sure, it can be quite annoying, and convenient to use Jupyter in that regard. I ended up downloading cookies and having the script load the cookies at the start so it can imitate being continuously logged in when I open a new driver window.
These days, a bot like that is equivalent to winning the lottery!!! 🤣 I would probably hold off with sharing the code until at least I get my own PS5 hahaha (I'm dying to play Cyberpunk!!! and it's not the same game on PS4, it's more buggy than Goat Simulator!!) But from what I understand though, the stock runs out in-store before the websites get to update it, so your best bet is to know a guy in Best Buy and get him to put it aside for you when the stock arrives 😎 These bot people made it so much harder for regular folks to get new products, I strongly oppose this type of practice, I think it brings more harm than benefit.... but maybe I'm just old-fashioned 😊
@@PythonSimplified nicely put. I hate the scalping thing, however, since you basically need their technology to compete with them, I decided to make my own - just started running best buy, target, and gamestop bots. Will report back if it actually gets through! I just wanted to help a friend who's looking for one, rather than be a scalper grinch. Merry Christmas to you and fellow coders!
I'd like to say that I loved it, you're amazing, and please keep it up. I'm happy too because English isn't my mother language and I understood you very well 😊.
Thank you Vinicius! I'm so happy to hear that! 😁😁😁 English isn't my first language either, so I'm always trying to use simple words whenever possible (the complicated words are also much harder to pronounce, I sound very Russian when I do this hahahaha) Thanks again and Merry Christmas!! 😊
Thank you! 😁 It's also available on Medium with a few improvements: medium.com/analytics-vidhya/web-scraping-instagram-with-selenium-b6b1f27b885 I'm also currently working on a website, where I'll post an even more updated version of the Medium article, where we'll be able to scrape the full-size images rather than thumbnails, and tackle more issues with the ENTER button 😀 Stay tuned!
I can see this being more useful for scraping general data, such as the number of followers, following, and number of posts. For people doing marketing, that would be useful. Maybe build a script that scrapes that data at a set time interval and dumps it into a spreadsheet, so you can measure growth over time.
So I've been using Selenium for about 2 years now, and one thing I've always had a hard time with is cookie manipulation. Do you think you could do a video in which you show us how to save/store and reload cookies, maybe even do a Selenium launch with another profile? That would be super cool, would love you to cover that!
Hi Aaron, thank you for the super cool suggestion! 😊 What kind of purpose would such a tool have? is it used for tracking the web sessions of a particular user? I just can't seem to imagine a good intention behind tracking/storing cookies, but let me know!... If it's harmless I'd give it a try! 😃
@@PythonSimplified It's mostly harmless, but imagine you need to run a test 5000 times on a website and each time you have to log in to the page. Or you log in once, save the cookies, and just load them every time you go to the page. It also makes Selenium look more human, so less chance of a captcha. It's pretty easy to load your Chrome profile into a Selenium driver, but just using the cookies has always been difficult for me haha.
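The save/reload flow described above can be sketched roughly like this. This is a minimal sketch, not the channel's code: the function names are hypothetical, and `driver` is assumed to be a Selenium WebDriver, whose get_cookies() returns a list of dicts and whose add_cookie() re-adds one:

```python
import json

def save_cookies(driver, path):
    # Selenium's driver.get_cookies() returns a list of dicts;
    # persist them so the next run can skip the login step.
    with open(path, "w") as f:
        json.dump(driver.get_cookies(), f)

def load_cookies(driver, path):
    # In a real session you must driver.get() the site's domain first,
    # then re-add each cookie and refresh - Selenium rejects cookies
    # for a domain the browser hasn't visited yet.
    with open(path) as f:
        for cookie in json.load(f):
            driver.add_cookie(cookie)
```

After load_cookies(), refreshing the page should show you logged in for as long as the stored session cookie remains valid.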
Hi Anna, there's actually a better way than double enter! Check out my community post where I included the improved code, it's better to concatenate the url to search for your term 😉 And thank you so much! 😃
Hello, thank you so much, this helped me a lot. While following this code I had one error: instead of writing ENTER you can use RETURN so it will work properly. This solved my error.
Hi Vijaykumar! 😃 to my understanding, the value 10 in the above example represents the timeout value (in seconds). Meaning - if this element was not detected within 10 seconds, you'll get a TimeoutException. You can definitely adjust it to any value you'd like! Selenium documentation keeps it at 10, but I've seen other examples with 5 and 15 seconds, it all depends on how long it takes the given page to load 😊
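The timeout semantics described above can be illustrated with a small polling loop. This is only a sketch of what WebDriverWait does conceptually (the function name is mine, not Selenium's): keep re-checking a condition until it returns something truthy, or raise once the timeout expires:

```python
import time

def wait_until(condition, timeout=10, poll=0.5):
    # Same idea as WebDriverWait(driver, 10).until(...): poll the
    # condition every `poll` seconds until it returns a truthy value,
    # or raise once `timeout` seconds have passed.
    deadline = time.time() + timeout
    while time.time() < deadline:
        result = condition()
        if result:
            return result
        time.sleep(poll)
    raise TimeoutError(f"condition not met within {timeout} seconds")
```

This is why 10 is just a default: a slow page simply needs a later deadline, while a fast page returns on the first poll regardless of the timeout value.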
Hey Mariyasha! First of all, this was an amazing tutorial. I was trying to do some scraping with the posts that open up in the explorer page in the Instagram desktop website. It seems that the elements for these posts are created in the DOM when they open after you click on them and are destroyed from the DOM when you close them. Hence, it is difficult to locate these elements and I've been getting the "No Such Element: Unable to locate element" exception. Is there a workaround for this situation in Selenium?
Great stuff, as always. Question: it's difficult to choose which to start my first scraping project with (MechanicalSoup, Beautiful Soup, Selenium ...). The goal is to automate text messaging using a service that's behind my personal account on the internet and behind user:password authentication. Suggestions?
Personally, I'd always go for Selenium! Beautiful Soup and MechanicalSoup are only good for very simple static sites... we rarely seem to encounter these in recent years 😉
What happens if the post is not an image? IG lets you upload videos.
Hi Johan 😀
You'll need to tackle this with a conditional statement, where videos would be saved under the ".mp4" extension and images under the ".png" extension mentioned at the end of the video.
Let me know if you were able to figure it out! 😊if not - I can film a quick tutorial showing how to do it 😉
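The conditional described above could look something like this. It's a hypothetical helper, not the tutorial's code; it assumes Instagram video sources contain ".mp4" in the URL, which you should verify against your own scraped data:

```python
def choose_extension(src_url):
    # Hypothetical helper: inspect the media URL and decide how to
    # save the file. Video sources typically contain ".mp4"; anything
    # else is treated as an image here, matching the .mp4 / .png split
    # suggested above.
    if ".mp4" in src_url:
        return ".mp4"
    return ".png"
```

You would then build each filename as keyword[1:] + str(counter) + choose_extension(src) before downloading, instead of hard-coding one extension.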
@@PythonSimplified please do! Also, how many images should I expect in my folder? I get a TypeError on the last for loop. TypeError: cannot use a string pattern on a bytes-like object
I'm guessing it's because of the video format.
@@PythonSimplified Hi!! Congrats on the great content!! I've spent my day studying for a master's data science project and you've been helping me a lot :)
I had the same problem today with profiles with some specific photo types, videos and reels. I couldn't save an image and I got the same error mentioned above.
Could you help us please?
Thanks!! ;)
@@paulameneses2306 thank you so much dear! 😁 Sure, I'll look into it over the weekend and adjust the code to include a conditional statement for videos 😉 we'll be in touch!
@@TheJohanHalim Yes, you are absolutely correct Johan!😊
You get this type of error when trying to save a collection of images (video) as a single .png or .jpeg image, it's due to an incorrect format.
The number of images you should expect differs from one computer to another, depending on the size/scale of the display. The code in this tutorial gets you the number of images that result from a single scroll event.
And as Instagram loads content dynamically - the more you scroll, the more images are loaded onto the page. If you'd like to include several scroll events - check out my community post, where I include additional resources, a detailed article and code examples on how you can expand this bot: th-cam.com/users/postUgwVQazZhNNqwdghhdh4AaABCQ
I'll get back to you after the weekend with a solution to your video question 😉
I never imagined that python learning could have this much glamour.
hahaha
Very good comment
it was either fake Gamer Girl, OnlyFans, or this. But there is a lot of competition in those other areas so she went with this.
by the way the whole thing of her in the right side of the screen is planned out, she has her hair there to hide that she's wearing something hoping to appear nude so people will click and it will go viral or something 😂 Wish her the best of luck though! No hate! 😘
@@johnames6430 What a hater.
Me saw the thumbnail and click it
Me (after 10 mins) : ooh! It's a programming tutorial
hahahaha indeed! 🤣
lol 😂
I love your style of knowledge sharing. You made it simple enough to understand by someone like me who is just beginning to learn python. Thanks!
This content is really great. Thank you for sharing it. Years ago I used to do web scraping back when there was a lot less JS and interactivity but haven't done it in a long time. This video got me back into it. Keep it up!
Compliment from a fellow girl coder, this video was super informative and entertaining and you are obviously bright and talented!
i don't know why i am watching it instead of listening to music but the way she teaches is real fun!!
Thank you for this tutorial!. I am currently learning python on datacamp and haven't learned or seen any real world applications. I am definitely going to try this out and add what I learned from this video to my skillset.!
Thank you so much for explaining and showing every basic step in detail!
Lots of beginners like me get stuck on setup steps that can seem obvious to experienced developers.
For example, thank you for explaining and showing all the Chrome driver download and setup steps.
Even some big websites pass quickly over these basic setup steps.
I was stuck, but thanks to you I made it! Thanks again!!!
*Lol , I Almost Forgot I came here to Learn Python! haha Stunning Looks!*
Thank you! 😆 I may have gone a bit overboard on this video XD
@@PythonSimplified
Super hot, super smart :))
@@PythonSimplified Well, I think you were overdressed...
this and arjancodes are by far my favorite channels!
Very difficult. She's a world-class beauty, yet providing world class teaching on a very important topic.
I was wondering why you don't make a full series on Selenium. Your teaching is perfect :) I literally enjoyed it a lot. Thank you so much.
Thank you so much Faizmohammad! 😁
I've just posted a brand new video on Selenium, this time web scraping Facebook:
th-cam.com/video/SsXcyoevkV0/w-d-xo.html
It's some sort of a series! Just with a bunch of videos on other subjects in between 😄
Cool vid Maria! But on the days there isn't a "Not Now" button, the whole code grinds to a halt. It would be awesome to know how to add a function where, if the "Not Now" button is present, the code clicks on it, and if it isn't, the code skips to the next step!!
Omg please!!🥹
Add a timeout when waiting for "Not Now" to appear, and then put it in a try/except block.
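A sketch of that try/except pattern, with assumptions made explicit: find_button stands in for whatever lookup you use (e.g. a WebDriverWait(...).until(...) call), and in practice you'd catch Selenium's TimeoutException rather than a bare Exception:

```python
def click_if_present(find_button):
    # find_button is any callable that locates the "Not Now" button,
    # for example:
    #   lambda: WebDriverWait(driver, 5).until(
    #       EC.element_to_be_clickable((By.XPATH, "//button[text()='Not Now']")))
    # If the lookup raises (button absent / wait timed out), skip it
    # and let the rest of the script continue.
    try:
        button = find_button()
    except Exception:
        return False
    button.click()
    return True
```

The return value tells you whether the popup was actually dismissed, which is handy for logging.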
@@davidliu7246 Thanks, I found it a while ago using try and except. Now I just need help creating a chatbot. At the moment I'm using NLP, spaCy, TextBlob and a ton of for-loops.
This channel is underrated, Change my mind!
Great content! Very clear and useful.
Btw, you don't need to add the local path of the webdriver as long as you have it in your environment PATH. Selenium looks there by default.
Also, by the end of the video you can get rid of the counter variable if you use enumerate.
Wow, thank you Yaniv!! This is fantastic - we can save it there once and never worry about it again!! 👏👏👏
I'll just sit down in shame and be impressed with your super-efficient coding skills 😂😂
By the way!! I'm really happy to see Israeli folks joining the celebration, and with such rare tips too!! Thank you so much Yaniv, you nailed it! 😀
@@PythonSimplified Haha, I really wasn't expecting an answer in Hebrew! But really, thank you for the video, it helped me understand a lot of things. The PATH was just a small thing. Keep it up!
@@piriwo Thank you so much, will do! :)
I thought I was the only Israeli here 😅
learned some good web scraping practices here like waiting for elements to be clickable, clearing the input boxes, etc. thanks!
You are the most intelligent and beautiful teacher of all.
You explained Selenium very clear. Can you also explain in a video on how to prevent to be detected as a bot? I read many post on stack overflow but Selenium still got detected as a bot, even on the first page load.
Mariya, I love you. Thanks, you gave me what I'd been looking for for 3 days
OMG you're the best! I was hitting my head on the wall. In reality you showed it in a really simple way. Thank you
Tutorial was very helpful, but I did run into that same issue you mentioned about having to hit enter more than once in the search bar. Even with multiple instances of the send_keys / ENTER command, that part wouldn't work.
What I decided to do was call time.sleep(2) a couple of times between hitting ENTER, and it took a total of three instances of hitting ENTER to get by. Even then, the file grab was happening too quickly, and it took the thumbnails of the people in my Instagram stories... so I called for one more time.sleep to give the next page time to load, and it worked!
Your tutorials are great - I just found them and appreciate the concise, helpful videos!
Hi Matt, thank you so much for your amazing feedback! 😁
You totally nailed it with the time.sleep() command! I actually just published a tutorial on Medium that tackles it, as many people were running into the exact same issue with the ENTER command (and would be very irresponsible of me to ignore that hahaha):
medium.com/analytics-vidhya/web-scraping-instagram-with-selenium-b6b1f27b885
Or you can skip the detailed explanations and just check out the source code on GitHub (I will update it in the description of the video shortly):
github.com/MariyaSha/WebscrapingInstagram/blob/main/WebscrapingInstagram_completeNotebook.ipynb
So my solution was quite similar, however, I've done this through 2 ENTER presses and 3 time.sleep(5) waits, so maybe try to extend the wait for a bit more than 2 seconds and then you can get rid of the third ENTER command :)
Thanks again and have a fantastic week!
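The "2 ENTER presses with time.sleep(5) waits" fix from this reply can be sketched as a small helper. The function name is hypothetical; searchbox is assumed to be the Selenium WebElement from the tutorial, and enter_key would be Keys.ENTER:

```python
import time

def submit_search(searchbox, enter_key, presses=2, wait=5):
    # Press ENTER a couple of times with a pause in between, so the
    # results page has time to load before the script moves on.
    # presses=2 and wait=5 match the fix described above; tune both
    # numbers for your connection speed.
    for _ in range(presses):
        searchbox.send_keys(enter_key)
        time.sleep(wait)
```

Extending `wait` is usually more reliable than adding a third press, since the real problem is the page not having loaded yet.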
@@PythonSimplified that's actually really exciting that I came to the same conclusion you did. Even better still that you found a way to cut down on clicks. Thanks again for the tutorials!
Very good tutorial. You could use enumerate in the for loop to avoid the counter assignment.
for counter, image in enumerate(images):
    save_as = os.path.join(path, keyword[1:] + str(counter) + '.jpg')
    wget.download(image, save_as)
Perfect intro to Selenium! Very nice video! Thanks again Mariya!
Thank you Chiranjeeb! I told you you were gonna like Selenium! 😁
@@PythonSimplified Yep! You were correct!
I thought I didn't know English but now I think I do. Incredible articulation!!!
thank you soo much you just saved me! im gonna rock that interview
That's awesome to hear Arthur! Good luck on your interview! 😀
With your videos I've deployed my first Flask app
Hello, I really like your work. I just started doing some web scraping and your tutorial was a great help for me. You are organized, and your explanation is perfectly clear and easy to follow. Just one thing I noticed and already fixed, but I wanna know if there is another way around it:
the problem is the search box doesn't appear if the Chrome window size is big, so we just get the side bar with the search symbol, which needs to be pressed to open the search box. I tried to figure out how to do that but couldn't,
so I just set a screen size (driver.set_window_size(740,500)) to make sure the search box appears automatically.
If you know how to fix it the normal way, that would be nice. Thank you
She saved that "Purrrfect" for this moment.
Landing on this page, I first thought I'd come to the wrong place... Smart girls are amazing!
Thanks a lot, Simurg! 😃
And I'm just delighted. How does everyone know I speak Russian??
I guess our people recognize each other everywhere! 🤣
@@PythonSimplified Yes! It's magic xD
Great video. Thank you so much. Mariysha you are a very good teacher.
I hope that you can learn the correct accent on the specific word ( `at-tri-bute ), with emphasis on the first syllable, because it comes up so often when speaking about Python and objects. You routinely swap or mispronounce the verb attribute ( at-`trib-ute ) and the noun attribute ( `at-tri-bute ). I understand that you are not a native English speaker; please take note of this and improve the accent on this word.
In Python, a class or object `at-tri-bute is the noun, referring to the data (or methods) bound to an object.
Great video. Keep cranking them out.
This gender is always so organized!!! A good session it was, with so much clarity.
Thank you, I'm glad you found it helpful! 😃
i started knowing why i am actually watching your video after the introduction :)
Wow, that's amazing Mariyasha, I like your way of teaching, it's very helpful for me. We're both in the same boat as upcoming data scientists!
Thank you so much Vikram, I'm glad I could help! 😄
Best video I've seen about web scraping and automation. But I really couldn't figure out this jupiter thing. What is jupiter?????
Jupyter Notebook is a very handy interface, where we can process our code cell by cell. It is very similar to Google Colab, and we can access it directly from our Anaconda terminal with a very basic command - "jupyter notebook".
If you wanna find out more about this interface, please check out one of my first tutorials, where I explain all about it in detail:
th-cam.com/video/jp_3NOKHn9c/w-d-xo.html
So generally, when you're using a traditional IDE, it runs all your code at once, while notebooks like Jupyter allow you to separate your program into sections and run each of these sections independently. I find it to be very handy when teaching/learning, and I highly recommend giving it a try 😉
And thank you!! 😁😁😁
(sorry, I should have started from this but I got carried away with the explanation hahahaha)
Studying with you is exciting!
GENIUS!!!!! I really liked your video; you were able to solve the concerns I had that no one else could.
THE BEST!!!
Hello Mariya. Congratulations on the video, I liked it very much. A question: why did you use .clear()?
Thank you Leonardo! 😀
I used my_input.clear() to make sure the input is empty before we type anything in, it's not a necessary step with Instagram but it's very handy when sending your keys to an input which already has text in it 😊
It will remove the existing text and only include the string of your following send_keys command
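As a tiny sketch of that behavior (a hypothetical helper; `element` stands for any Selenium input element):

```python
def safe_type(element, text):
    """Empty the input first, then type, so only `text` ends up in the box."""
    element.clear()          # wipe any pre-filled or leftover text
    element.send_keys(text)  # now only our string is in the input
```

Without the clear() call, send_keys() would append to whatever text was already in the field.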
@@PythonSimplified Thanks Mariya.
Thank you for the video. Very clear and straightforward. I'm having trouble with the searchbox.send_keys(Keys.ENTER) command. I tried ENTERing twice but it still doesn't work. Any suggestions?
Thank you so much Luisgui 😁
Try the code I've just posted on my Github, you can solve it with time.sleep(seconds), it's in the "search keywords" section:
github.com/MariyaSha/WebscrapingInstagram/blob/main/WebscrapingInstagram_completeNotebook.ipynb
I actually just finished working on a Medium article about this where I explain everything in detail, I'm just waiting for my publication to approve it and then I'll send you a link 😉
This is the coolest thing I have learnt today
Awesome! I'm glad I could help! 😁
Thanks for this info. I am able to download only 58 images in one go; is it possible to download all the images, around 2000, with this in one go? Can you please advise?
Hi Pranshu! yes, you are able to download as many images as you need! 😀Just implement the scroll event inside a loop, just like this (scrolling 10 times to the bottom of the batch):
for j in range(10):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(5)  # wait 5 seconds
The more you loop, the more images you scrape!
I have a video premiering soon where I explain it in detail, check out "scroll to the bottom of the page" in the timestamps if you miss it 😉
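Putting the scroll idea together as one hedged sketch (the function name, the 6-per-scroll assumption implied by nothing in particular, and the plain "img" tag selector are my own; tune the timings for your connection):

```python
import time

def scroll_and_count(driver, scrolls=10, pause=5):
    """Scroll to the bottom `scrolls` times, pausing between scrolls,
    then return how many <img> elements have loaded so far."""
    for _ in range(scrolls):
        # same JS scroll call as in the snippet above
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give Instagram time to load the next batch
    return len(driver.find_elements("tag name", "img"))
```

The more scrolls you run, the more thumbnails end up in the DOM for you to download.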
This video is great, thanks, but I haven't been able to do the first step of opening Chrome and Instagram. Maria, can you please do a video explaining what you do beforehand with Selenium, the importing parts etc.? I downloaded Selenium but still get a traceback.
Hi Andres! :)
I find that "pip install" doesn't always work, so I install everything with "conda install" when possible. If you are also using Anaconda, try:
"conda install -c conda-forge selenium"
If you are using another command line software, let me know what it is and what kind of traceback you're getting, and we can go from there 😉
@@PythonSimplified Thanks for your help! I am still struggling. I am using Selenium and here I detailed everything and still no solution www.reddit.com/r/learnpython/comments/kfn0wm/a_unique_no_module_named_selenium_problem_its_not/
Mariya, thank you so much for the tutorial! What would you suggest to those who actually needs to delve deeper and get the data from each post (likes, reposts, account name and so on)?
Hi Alexandra! you're welcome! 😁
First of all, the most important suggestion is creating a "dummy account" (one that is not associated with your actual account).
Since you're about to scrape so much information - it's best to avoid being detected as a bot (which sometimes may result in your account being deleted).
Second, check out my community post regarding the new and improved resources for Instagram scraping (a commenting bot is included there, so this will save you lots of time):
th-cam.com/users/postUgwVQazZhNNqwdghhdh4AaABCQ
Third, you'll need to understand the common blockers you'll be facing when traversing the DOM. I cover 3 of the most annoying blockers in my Linkedin automation tutorial, so have a quick look before you spend hours trying to interact with hidden elements or dealing with "click intercept" exceptions:
th-cam.com/video/XdFUpFUDt88/w-d-xo.html
Lastly - work very closely with the developer tools! The rule of thumb is - if you can do something with your keyboard and mouse - you can do it with Selenium as well! No matter which exceptions you're facing, you can always bypass them as long as you can perform the same action manually!
Good luck and let me know if it helped! 😀
@@PythonSimplified it helped a bit! Thank you :)
Great video. It'd be cool to see one on Insta scraping using GET requests instead of Selenium, it's much faster. There's a good article on Diggernaut about it. Anyway, thanks, keep em coming!
Challenge accepted!! 😎
Get requests would be the next module I'll cover in the scraping lessons!
Thank you Chris! :)
@@PythonSimplified Love to see how you go with it! I've been stuck on it for the last 12 hours :'( I'm getting different static HTML returned from requests.get() than shown in the Chrome Dev Tools. Great channel btw, looking forward to more content!
also, you should make a Discord, it's a great way to consolidate a community and seems like you're building one quickly.
@@chris_burrows Thank you for suggesting! I'll check it out :)
I'll start advertising properly sometime in the near future. For now I take it easy, trying to focus on improving my filming/editing abilities before I go down that road :D
The most beautiful woman on TH-cam who teaches Python.
Can't concentrate 😂
I thought the same :D
u dont need to scrap data u actually scrapped my heart
When I see and listen to this girl, I feel happiness...
Great video! Just a query, will this go against the current terms and conditions of Instagram which might lead to your personal account getting banned for obvious reasons?
Hi Geoffey, check out this vlog of mine, I talk about this in detail: th-cam.com/video/f0B6RdVGcM8/w-d-xo.html
Always scrape from a dummy account! Not your personal one! :)
You can make web scraping look and sound fun
This tutorial is so sweet like you. Thank you so much Mariya ❤️
Thank you so much Saqib! 😀
I have a new Selenium tutorial premiering in 35 minutes:
th-cam.com/video/TXdgMkf9gP0/w-d-xo.html
We're expanding the Linkedin messaging bot to seem much more human than it should be, I highly recommend checking it out! 😁
Thanks for the video ... super nice ... i am struggling scraping comments from instagram using selenium ... any video on that ?
Incredibly great explanations 🔥🔥. loved the video.
She is the reason why i programm in Python
Waao.Very helpful 🙏🙏❤️More videos like these please 😊
Very nice tutorial, I must say; it's easy to follow, and thanks for all your hard work showing all of us. Just a question: could you use a mixture of Selenium and BeautifulSoup to automate the login and parse the HTML? Just a thought, because it all works but the page load takes forever.
All boys' fav teacher!
99.9% of programmers are boys anyways! 😃
@@PythonSimplified so glad to hear back from you. I have a little query regarding scraping, and it's getting horrendous.
I want to scrape some details from the *info* section of the profiles in a group.
Is it doable?
Your response will be like a lifesaver
@@smalirizvi8026 the rule of thumb is - anything you can target with your mouse and keyboard - you can target with Selenium! 😃
Some platforms have more blockers than others, I suggest watching my Linkedin Bot tutorial where I show how to handle 3 very annoying blockers such as "click intercept" and hidden elements:
th-cam.com/video/XdFUpFUDt88/w-d-xo.html
If you run into specific errors let me know, but as long as you're working closely with the Developer Tools and allow enough time for the elements to load with time.sleep() - the sky is the limit! 😁
@@PythonSimplified I watched your vid. It was far above my knowledge and expertise.
Actually I am trying to extract every profile's information in some groups on *Facebook* for my Machine Learning work.
So far, I think that I should watch your video of applying selenium on Facebook. And I will convert python code with java whenever I face some blockers there as well as use time.sleep().
Do you think that this should be fine?
You have literally earned my sub!
💎💎
Thank you so much for replying and taking notice of my problem.
Looking forward to you as I am pretty much done with my attempts with it
Thanks for the lesson, very much appreciated. My beginner's question would be: how do I download the descriptions and publication dates of Instagram posts to a CSV?
Oh... sorry, I just saw the description of the video, where the link to the webinar is))
You should check if the "Not Now" prompts are even presented before assuming you will need to click that... twice.
Excellent class.
I need to do something more.
Do you offer consulting?
Thanks
I particularly like the background music. Great tutorial!
This is awesome.
For webdriver, is it possible to keep Instagram logged in? Or will I have to have the script log in each time I run it?
If you're using a notebook interface (such as Jupyter) - you can run the login cell only once and then keep the webdriver browser window open. You'll be logged in up until the point you close the browser, so you can keep running other cells as much as you'd like! Your browser will not forget you've already entered the username/password 😉
However, if you use a code editor of some sort and run your code via terminal - you'll need to log in time and again, as the entire code of your program must be executed - which is quite annoying 😔
That's why I love using Jupyter for Selenium stuff! You can run the code cell by cell - rather than the entire program all at once, and it definitely saves lots of time and effort! 😀
If you run your code from the terminal, I would recommend saving cookies to a JSON file (google "webdriver store cookies to json") and restoring these cookies every time you need them again (an Instagram session is good for quite a long time). Personally, I started using the instalyze.me scraper service after spending weeks and weeks managing my own scraping solution in Python; it's just too much hassle due to Instagram's anti-scraping techniques
@@PythonSimplified Yeah for sure, it can be quite annoying, and it's convenient to use Jupyter in that regard. I ended up downloading cookies and having the script load them at the start so it can imitate being continuously logged in when I open a new driver window.
@@jamesk6124 hey guys, how do you save cookies? Any code?
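A minimal sketch of the save/restore idea (my own helper names; with a real driver you must navigate to the site's domain before calling add_cookie):

```python
import json

def save_cookies(driver, path):
    """Dump the current session cookies to a JSON file."""
    with open(path, "w") as f:
        json.dump(driver.get_cookies(), f)

def load_cookies(driver, path):
    """Re-add saved cookies so a new driver window looks already logged in."""
    with open(path) as f:
        for cookie in json.load(f):
            driver.add_cookie(cookie)
```

Call save_cookies() once after logging in, then load_cookies() (followed by a page refresh) at the start of later sessions.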
Thanks a lot for all these gifts from you.
You're welcome, enjoy! 😀
Hi, thanks, it really helps, but do you have any video on how to get all the post information?
You are amazing in every way! Thank you for this useful tutorial.
Great video. Is it possible to get all the comments of a post??
You explain each step very clearly, thanks for your effort
I'm late with this suggestion, but if you come out with a "bot to buy ps5" webscrape video right now, it could be huge!
These days, a bot like that is equivalent to winning the lottery!!! 🤣
I would probably hold off with sharing the code until at least I get my own PS5 hahaha (I'm dying to play Cyberpunk!!! and it's not the same game on PS4, it's more buggy than Goat Simulator!!)
But from what I understand though, the stock runs out in-store before the websites get to update it, so your best bet is to know a guy in Best Buy and get him to put it aside for you when the stock arrives 😎
These bot people made it so much harder for regular folks to get new products, I strongly oppose this type of practice, I think it brings more harm than benefit.... but maybe I'm just old-fashioned 😊
@@PythonSimplified nicely put. I hate the scalping thing, however, since you basically need their technology to compete with them, I decided to make my own - just started running best buy, target, and gamestop bots. Will report back if it actually gets through! I just wanted to help a friend who's looking for one, rather than be a scalper grinch. Merry Christmas to you and fellow coders!
Loved your style of teaching and accent.
I'd like to say that I loved it, you're amazing, and please keep at it. I'm happy too because English isn't my mother language and I understood you very well 😊.
Thank you Vinicius! I'm so happy to hear that! 😁😁😁
English isn't my first language either, so I'm always trying to use simple words whenever possible (the complicated words are also much harder to pronounce, I sound very Russian when I do this hahahaha)
Thanks again and Merry Christmas!! 😊
@@PythonSimplified no problem, where are you from?
Could you please make a video explaining how to understand boxplot charts?
Merry Christmas 🙂
You are so glamorous, and on top of that, the way you teach!
Nice webscraping methods
Great work
Thank you! 😁
It's also available on Medium with a few improvements:
medium.com/analytics-vidhya/web-scraping-instagram-with-selenium-b6b1f27b885
I'm also currently working on a website, where I'll post an even more updated version of the Medium article, where we'll be able to scrape the full-size images rather than thumbnails and tackle more issues with the ENTER button 😀
Stay tuned!
😀ok
Great explanation mariya👍🏻
gurl this saved my life
Love the tut, you're a smart lady, no need to use gimmicks for clicks.
I agree, but if that's what she likes and it will give her subs, I don't see why not.
OMG!!! your channel is perfect , thanks for this class !!!!
Thank you so much Matheus!! Glad you liked it! 😁
This channel has super easy tutorials on how to do it: th-cam.com/channels/YvGiDV1JfJTpphxtKd7r_A.htmlfeatured
this is an amazing video! I would like to know if we can get a list of users who liked a post? thanks
I can see this being more useful for scraping general data, such as QTY of followers, following, and QTY of posts; for people doing marketing, that would be useful.
Maybe build a script that scrapes that data on a set frequency of time and dump in on a spreadsheet, so you can measure growth over time.
Solid tutorial! You're great at teaching!! Thanks
Thank you Adam, glad you liked it! 😀
So I've been using Selenium for about 2 years now; one thing I've always had a hard time with is cookie manipulation. Do you think you could do a video in which you show us how to save/store and reload cookies, maybe even do a Selenium launch with another profile? That would be super cool, would love you to cover that!
Hi Aaron, thank you for the super cool suggestion! 😊
What kind of purpose would such a tool have? Is it used for tracking the web sessions of a particular user?
I just can't seem to imagine a good intention behind tracking/storing cookies, but let me know!... If it's harmless I'd give it a try! 😃
@@PythonSimplified It's mostly harmless, but imagine you need to run a test 5000 times on a website and each time you have to log in to the page. Or you log in 1 time, save the cookies, and just load them every time you go to the page. It also makes Selenium look more human, so there's less chance of a captcha. It's pretty easy to load your Chrome profile into a Selenium driver, but just using the cookies has always been difficult for me haha.
@@BitCloud047 challenge accepted! I'll look into that, sounds very useful! 😃
Why didn't I have such a teacher in my university for C++? I wouldn't have skipped any labs :)
I want to be your student 😆 you are the best teacher I have ever seen
You are amazing)) really grateful for double enter tip
Hi Anna, there's actually a better way than double enter! Check out my community post where I included the improved code, it's better to concatenate the url to search for your term 😉
And thank you so much! 😃
@@PythonSimplified woohoo, thank you so much
You are the best, Mariya Sha. LOL
What I mostly learned is a very good workflow to get info and use it. tnx :D
Puurrrrfect pun (probably not intended) at 22:40 😆
Hello, thank you so much, this helped me a lot. While following this code I had one error; instead of writing ENTER you can use RETURN and it will work properly. This solved my error.
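For reference, Keys.ENTER and Keys.RETURN are two distinct key codes in Selenium's Keys class, and some pages only respond to one of them. The values below are copied from selenium.webdriver.common.keys.Keys; the searchbox line is a hypothetical usage example:

```python
# Key codes as defined in selenium.webdriver.common.keys.Keys
ENTER = "\ue007"   # Keys.ENTER
RETURN = "\ue006"  # Keys.RETURN

# hypothetical usage, if ENTER doesn't submit the form:
# searchbox.send_keys(RETURN)
```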
simply amazing, big thanks. love you
Thank you so much my friend! 😁
Wish I had known about this tool earlier, sounds very useful
Great tutorial, I always learn something new, thanks for sharing
Crystal clear and good pedagogy !
WebDriverWait(driver, 10)
What do we mean by 10? And is it constant or subject to our choice? How do we decide it?
Hi Vijaykumar! 😃 to my understanding, the value 10 in the above example represents the timeout value (in seconds). Meaning - if this element was not detected within 10 seconds, you'll get a TimeoutException.
You can definitely adjust it to any value you'd like! Selenium documentation keeps it at 10, but I've seen other examples with 5 and 15 seconds, it all depends on how long it takes the given page to load 😊
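Conceptually, WebDriverWait(driver, 10) keeps re-checking a condition until it succeeds or the 10 seconds run out. A simplified model of that polling loop (my own sketch, not Selenium's actual implementation):

```python
import time

def wait_until(condition, timeout=10, poll=0.5):
    """Keep re-checking `condition` until it returns something truthy,
    or raise TimeoutError once `timeout` seconds have passed."""
    end = time.monotonic() + timeout
    while time.monotonic() < end:
        result = condition()
        if result:
            return result
        time.sleep(poll)  # don't hammer the page, poll politely
    raise TimeoutError(f"condition not met within {timeout} seconds")
```

In real Selenium, the condition would be something like expected_conditions.presence_of_element_located, and the failure raises a TimeoutException.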
Hey Mariyasha! First of all, this was an amazing tutorial. I was trying to do some scraping with the posts that open up in the explorer page in the Instagram desktop website. It seems that the elements for these posts are created in the DOM when they open after you click on them and are destroyed from the DOM when you close them. Hence, it is difficult to locate these elements and I've been getting the "No Such Element: Unable to locate element" exception. Is there a workaround for this situation in Selenium?
Stale element exception
Thanks, Ma'am for this... Helps too much
Too much is my favourite quantity! 😊 Thank you, V!
Very good. Congratulations!
Thank you David! 😁
Cypress rocks, you should try it.
Sending an " Amazing" from Brazil here. Amazing.
Thank you so much Marcos!! Greetings from Canada! 😁
Great stuff, as always. Question: it's difficult to choose which to start my first scraping project with (MechanicalSoup, BeautifulSoup, Selenium...). The goal is to automate text messaging using a service that is behind my personal account and behind user:password authentication. Suggestions?
Personally, I'd always go for Selenium!
BeautifulSoup and MechanicalSoup are only good for very simple XML sites... we rarely seem to encounter these in recent years 😉
This is amazing. Thank you very much.
I love your video so much! How do we scrape Instagram influencers using this method?
Is it possible to scrape info on who commented under certain posts? Love your video btw