Easy Web Scraping in Python using Pandas for Data Science

  • Published on Nov 6, 2024

Comments • 123

  • @KenJee_ds
    @KenJee_ds 4 years ago +42

    I didn't know about this pandas functionality! Great video!

    • @DataProfessor
      @DataProfessor 4 years ago +8

      Wow, it's Ken Jee! Thanks for the comment and kind words! I also subscribe to your channel, great content by the way, especially the 6-part DS project from scratch series.

    • @KenJee_ds
      @KenJee_ds 4 years ago +5

      @DataProfessor Thanks! I am loving your stuff as well. I need to start using Colab more. Keep up the good work, the tutorials are very helpful!

    • @karthiavenger4577
      @karthiavenger4577 4 years ago +1

      You're great, bro. Down to earth.

  • @muhammadjamalahmed8664
    @muhammadjamalahmed8664 4 years ago +7

    Please don't stop making videos. These videos really help a lot.

    • @DataProfessor
      @DataProfessor 4 years ago +1

      Thank you, glad it was helpful!

  • @monicadesai7928
    @monicadesai7928 4 years ago +4

    Great explanation of each step, right from opening the file to the end, because as a newbie we sometimes find it difficult to know which file to use from GitHub. Thank you. Great video!

    • @DataProfessor
      @DataProfessor 4 years ago

      Wow thanks for the encouraging words, glad you’ve found the video helpful 😊

  • @HVjugo
    @HVjugo 3 years ago +1

    I used this before, but I didn't know that you can select the table using the brackets, awesome! Thanks for the video!

    • @DataProfessor
      @DataProfessor 3 years ago

      Glad it's helpful, thanks for watching!

  • @kwanpakshing
    @kwanpakshing 3 years ago +2

    The video is great, but the screen text is way too small to read. I suggest enlarging the font or reducing the white space on the screen to make the video more readable.

    • @DataProfessor
      @DataProfessor 3 years ago

      Thanks for the suggestion, greatly appreciate it. Yes, in recent videos I have increased the font size.

  • @usmanafridi9668
    @usmanafridi9668 3 years ago +6

    Amazing! I am totally new to web scraping. I tried to scrape the website using beautiful soup library for 4 days now, but I can't get past the basics. You have extremely simplified it for me. For instance, I just scraped data from Wikipedia about the list of countries and their population and got the whole table in the first attempt. Thank you so much! I wonder if this can be used for other pages like LinkedIn, Glassdoor data collection? Because there are no tables there. Professor, thank you so much once again!

    • @DataProfessor
      @DataProfessor 3 years ago +2

      Glad to hear that the video was helpful! For non-tabular pages you may have to use beautifulsoup and/or selenium

  • @melshae8630
    @melshae8630 5 months ago

    Wow, your video is the best. It took me forever to run this. This video helped me in 5 min. Thank you!!!

  • @da_ta
    @da_ta 4 years ago +4

    Great, well explained, clear, and excellent sound quality. Thanks for doing this, keep it up!

    • @DataProfessor
      @DataProfessor 4 years ago

      Thanks for the encouragement 😃

  • @givansot4581
    @givansot4581 3 years ago +1

    thanks a lot. I am doing a machine learning project and do web scraping in the same code...thanks this is better

  • @nickolaisimmons4638
    @nickolaisimmons4638 2 years ago +3

    Wow this is a great video! Very well organised!

  • @TcRiverrat18
    @TcRiverrat18 2 years ago

    Excellent work breaking this down. I have only used R, but this seemed incredibly intuitive. Thank you!

  • @prashant381
    @prashant381 2 years ago +1

    A query: in row 12, why are we using .index along with df.drop? Why wouldn't df.drop work without it?
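
On the `.index` question: `DataFrame.drop` takes row labels, not a DataFrame or a boolean mask, so the filtered rows have to be turned into their index labels first. A minimal sketch, assuming the line in the video resembles the one below (the column names and values here are made up for illustration):

```python
import pandas as pd

# Hypothetical stand-in for a scraped table whose header row repeats in the body.
df = pd.DataFrame({
    "Player": ["A. Adams", "Player", "B. Brown"],
    "Age": ["25", "Age", "31"],
})

# Boolean mask -> sub-DataFrame holding only the repeated header rows.
header_rows = df[df["Age"] == "Age"]

# drop() expects row labels, so pass the index labels of those rows.
cleaned = df.drop(header_rows.index)

# Same result without drop(): keep the rows where the condition is False.
cleaned_alt = df[df["Age"] != "Age"]
```

Passing `header_rows` itself to `drop()` would make pandas look for labels named after its columns, which is why the `.index` step is needed.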

  • @rogerwprice
    @rogerwprice 4 years ago +3

    Fabulous - it's soooo easy when you know how!

    • @DataProfessor
      @DataProfessor 4 years ago

      Thanks for watching Roger, absolutely agreed with that 😃

  • @vyacheslavgorkunov3790
    @vyacheslavgorkunov3790 4 years ago +3

    Thx for the video, was really helpful. I wish u more subscribers, man ;)

    • @DataProfessor
      @DataProfessor 4 years ago

      Thanks for the support! 😃

  • @Moonlight-jx2sj
    @Moonlight-jx2sj 4 years ago +2

    Amazing! Your video helped me with my first homework in Data Mining. I'm also thinking of jumping into data science, so thank you so much! Liked and subscribed!

    • @DataProfessor
      @DataProfessor 4 years ago +1

      Glad I could help! And welcome to Data science!

  • @randyluong6275
    @randyluong6275 4 years ago +4

    this tutorial gets my subscription. Thank you Professor. :)

    • @DataProfessor
      @DataProfessor 4 years ago

      Wow, glad to hear that, welcome aboard 😃

  • @engr.inigo.silva2000
    @engr.inigo.silva2000 2 years ago +1

    Bravo Data Professor, nice lecture!

  • @soufianelamsiah4337
    @soufianelamsiah4337 3 years ago +1

    what would be best for comparing prices between competitors?

  • @luciferkhusrao
    @luciferkhusrao 4 years ago +2

    Awesome work by the hero! Keep teaching like this

    • @DataProfessor
      @DataProfessor 4 years ago +1

      Thanks for the encouragement 😃

  • @danniliu2544
    @danniliu2544 3 years ago +3

    Hi Data Professor, thanks for this video. It's very helpful. I'm a newbie starting out in data science and web scraping. Just wondering can you use pandas functionality for scraping data that are not laid out in table? and how would you do that? could you perhaps create a video on scraping non tabular data if you haven't already?

    • @DataProfessor
      @DataProfessor 3 years ago +1

      Great question, to web scrape non-tabular data you can look into using beautiful soup and also selenium libraries for Python

    • @danniliu2544
      @danniliu2544 3 years ago +1

      @DataProfessor thank you for the pointer, much appreciated!

    • @Panucci75
      @Panucci75 3 years ago

      Exactly the question I was gonna ask. Thanks.

  • @amoahs7779
    @amoahs7779 3 years ago +2

    Hi professor, I truly enjoy your videos and have learnt a lot. May God keep you successful in life.
    A question that's been on my mind: what laptop do you use? I really like the keyboard sound when you type, unless you are using an external keyboard.
    Is it possible for you to show us your desk setup?
    Kind regards

    • @DataProfessor
      @DataProfessor 3 years ago +1

      Hi, I'm using a MacBook Pro (2016) and yes the keyboard feel is good on this laptop although being a bit flat which is a good thing as it allows minimal effort in moving from one button to the next.

  • @cllim80
    @cllim80 4 years ago +3

    Thank you for the clear explanation !

    • @DataProfessor
      @DataProfessor 4 years ago

      A pleasure! Thanks for watching 😃

  • @manishabheemanpelly3580
    @manishabheemanpelly3580 3 years ago +2

    Thank you so much for this concept it was really time saving one!

  • @pauloreis8868
    @pauloreis8868 4 years ago +3

    Hi, Professor! Thank you for the content you bring to us, it really helps! \o/
    Lately, I've been asking myself: How important is web scraping for a data scientist? How often do you web scrape?
    I just started learning it, I'll keep going and I wanted to know your thoughts about its relevance.

    • @DataProfessor
      @DataProfessor 4 years ago +2

      Hi Paulo, webscraping comes in handy when you want to create your own dataset from available data on the internet. For example, you want to analyze the salary of data scientists from glassdoor database then you can do that with webscraping. Hope this helps 😃

  • @sangpark7656
    @sangpark7656 3 years ago +2

    Hi Professor, does the original data need to be an HTML file to start with? Does the original data always need to have a table to extract data?

    • @DataProfessor
      @DataProfessor 3 years ago +1

      Yes to both questions, that’s the limitation of this approach. Other than that selenium + beautifulsoup is a good combo to look into.

    • @sangpark7656
      @sangpark7656 3 years ago +1

      @DataProfessor I see. Thank you very much for the guidance!!

  • @fazlaynur4509
    @fazlaynur4509 3 years ago +2

    Thanks bro, for your nice tutorials

  • @spacebird9430
    @spacebird9430 4 years ago +2

    Hey professor, thank you for the content.
    But I was wondering: when we are scraping by just passing the link, how does it know to only read data from the table and not any other information?

    • @DataProfessor
      @DataProfessor 4 years ago +1

      Hi, the function will detect HTML syntax. The syntax for tables in HTML is the <table> tag, and the read_html() function finds these to figure out that they are tables and extracts the data.
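
A minimal sketch of that behavior: `read_html()` returns one DataFrame per HTML `<table>` element and ignores everything else on the page. The HTML below is made up for illustration, and the call assumes a parser such as lxml is installed.

```python
from io import StringIO

import pandas as pd

# Made-up page: one paragraph and one <table>. Only the table comes back.
html = """
<html><body>
  <p>Some text that is not part of any table.</p>
  <table>
    <tr><th>Player</th><th>Age</th></tr>
    <tr><td>A. Adams</td><td>25</td></tr>
    <tr><td>B. Brown</td><td>31</td></tr>
  </table>
</body></html>
"""

# read_html() returns a list with one DataFrame per <table> it finds.
tables = pd.read_html(StringIO(html))
df = tables[0]
```

The same call accepts a URL in place of the `StringIO` object, which is how the link-only usage in the question works.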

  • @legacylifey182
    @legacylifey182 3 years ago +1

    Thank you so much for this concept it was really helpful respect !

  • @shankaricharan510
    @shankaricharan510 7 months ago +1

    Thanks a lot - this helped a lot.

  • @lolsucks3599
    @lolsucks3599 2 years ago

    Is there an api for sports results? or you have to do it via web scraping?

  • @nourarifi2642
    @nourarifi2642 4 years ago +2

    Thank you for your video. My question: if there are many tables across many pages (20,000 pages), what should I do?

    • @DataProfessor
      @DataProfessor 4 years ago

      The pandas read_html function is suitable for a simple webpage with relatively few tables. For more complex pages or a large volume of pages, I would recommend looking into beautifulsoup and selenium.

  • @lucianodomingues2290
    @lucianodomingues2290 4 years ago +2

    Great video Professor!

  • @wisjnujudho3152
    @wisjnujudho3152 2 years ago

    this is exciting. i love pandas

  • @salikmalik7631
    @salikmalik7631 4 years ago +2

    Really awesome.. Data Professor

  • @nowdevoted1649
    @nowdevoted1649 4 years ago +1

    Superb, let me bring you some more guys to your channel

  • @aniwahidaabdulrahim2538
    @aniwahidaabdulrahim2538 4 years ago +1

    Hello Professor, I would like to suggest that you publish a video about RSelenium, which is used with Selenium WebDriver for automated system testing :D Hope it may benefit others. This is just my humble suggestion.

    • @DataProfessor
      @DataProfessor 4 years ago

      Great suggestion! I have played around with Selenium for Python and have found it pretty powerful. What I made so far was a short script that can take screenshots of my youtube channel's page (or any webpage).

  • @sanjj_1
    @sanjj_1 3 years ago

    f strings are more readable compared to the .format() method
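
For comparison, a short sketch of the three ways to build a per-year URL. The URL pattern is hypothetical and only stands in for whatever the video uses; all three produce the same string.

```python
year = 2019

# Hypothetical URL pattern, used only for illustration.
template = "https://example.com/stats/{}_per_game.html"

# str.format(): fills the {} placeholder in a reusable template.
url_format = template.format(year)

# f-string: the variable appears inline where it is used.
url_fstring = f"https://example.com/stats/{year}_per_game.html"

# Concatenation: works, but the int has to be converted by hand.
url_concat = "https://example.com/stats/" + str(year) + "_per_game.html"
```

`.format()` still has a use when the template is defined once and filled in later, for example inside a loop over years.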

  • @mj7146
    @mj7146 4 years ago +2

    Great content !
    Any idea on how I can scrape data for example from linkedin Jobs Postings. I found Octoparse for this, any ideas?

    • @DataProfessor
      @DataProfessor 4 years ago +3

      Thanks Mert for the kind comment. pandas works only for tabular data from webpages. For linkedin posts, we'll probably have to use beautiful soup for that. I might make a future video about that, will put it into the to-do list.

    • @mj7146
      @mj7146 4 years ago +1

      Data Professor thank you 🙏

    • @DataProfessor
      @DataProfessor 4 years ago +1

      @mj7146 A pleasure!

    • @oguguaonyinyechi4980
      @oguguaonyinyechi4980 4 years ago +1

      @DataProfessor Hi Data Professor, we are still expecting this 😁

  • @piyushyadav7162
    @piyushyadav7162 1 year ago +1

    Hi! Ken Jee, I tried your web scraping code on Kaggle but I'm getting
    RLError: error.
    I tried to solve it but cannot resolve it... please give me your suggestions

    • @DataProfessor
      @DataProfessor 1 year ago

      Hi Piyush,
      The pandas library allows scraping webpages that have tabular data such as from Wikipedia. It is really limited to those with a predefined table format. To scrape webpages I'd recommend looking into selenium and beautifulsoup

  • @tannyamishra9291
    @tannyamishra9291 2 years ago

    Can you please explain how to read all the retrieved urls

  • @Troglodyte2021
    @Troglodyte2021 4 years ago +2

    A great tutorial!

  • @AmitKumar-hm4gx
    @AmitKumar-hm4gx 3 years ago

    Do you know if we can use this to scrape sites built with dynamic JS, and how do we do this if we have to login ?

  • @raphaellutz2693
    @raphaellutz2693 2 years ago +2

    Very nice video

  • @RyanLoh
    @RyanLoh 2 years ago

    Can you also use df2019(df2019[‘Age’] == ‘Age’) to find the ages containing the word ‘Age’?

  • @badraboufirasse433
    @badraboufirasse433 4 years ago +2

    Very helpful thank you!

    • @DataProfessor
      @DataProfessor 4 years ago

      Thanks Badr for the kind words!

  • @markslima1557
    @markslima1557 2 years ago +1

    very cool thanks!

  • @kennykern6292
    @kennykern6292 4 years ago +2

    This helped thanks!

  • @blankmedia01
    @blankmedia01 4 years ago

    Hey, I tried using the code to scrape tables on Wikipedia. When scraping a page with loads of other data where I just want to pull one table, is there a method for that? With the current code I'm pulling the whole page, and I just want the playoff stats... I think I'm supposed to create a dictionary and then assign it to a dataframe, but I don't know how when it comes to URLs and websites.
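
One way to pull a single table from a page that has several: `read_html()` returns a list of DataFrames, so a table can be picked by position with brackets, or the `match` argument can restrict the result to tables containing given text. A minimal sketch on made-up HTML (the table contents are invented, and a parser such as lxml is assumed to be installed):

```python
from io import StringIO

import pandas as pd

# Made-up page with two tables; only the second mentions "Playoffs".
html = """
<table>
  <tr><th>Regular season</th><th>PTS</th></tr>
  <tr><td>2019</td><td>25.3</td></tr>
</table>
<table>
  <tr><th>Playoffs</th><th>PTS</th></tr>
  <tr><td>2019</td><td>27.1</td></tr>
</table>
"""

# Without match: every table on the page, in page order.
all_tables = pd.read_html(StringIO(html))

# With match: only tables whose text matches the string or regex.
playoff_tables = pd.read_html(StringIO(html), match="Playoffs")
playoffs = playoff_tables[0]
```

No dictionary is needed; each element of the returned list is already a DataFrame.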

  • @argiepoul7457
    @argiepoul7457 3 years ago +1

    What are the prerequisites to watch this tutorial? I know some python, is this ok?

    • @DataProfessor
      @DataProfessor 3 years ago +1

      Yes, beginner’s level of Python is sufficient to follow along.

  • @narongtumsri-ubol1737
    @narongtumsri-ubol1737 3 years ago +2

    Thanks for the knowledge

    • @DataProfessor
      @DataProfessor 3 years ago

      A pleasure, thanks for watching

  • @kalyanprasad4069
    @kalyanprasad4069 4 years ago

    How do we deal when we encounter the error "HTTP Error 403: Forbidden" while reading url with Pandas? How should we proceed in this case?
    Kindly advise.
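
A 403 here often means the server rejects the default user agent that Python's URL opener sends; that is an assumption about the site, not something this thread confirms. A minimal sketch that downloads the page with a browser-like `User-Agent` header and then hands the HTML text to `read_html()`. `build_request` and `read_tables` are hypothetical helper names, and some sites will still refuse automated requests.

```python
from io import StringIO
from urllib.request import Request, urlopen

import pandas as pd


def build_request(url):
    """Return a Request that sends a browser-like User-Agent header."""
    return Request(url, headers={"User-Agent": "Mozilla/5.0"})


def read_tables(url):
    """Download the page ourselves, then let pandas parse the HTML text."""
    with urlopen(build_request(url)) as response:
        html = response.read().decode("utf-8", errors="replace")
    # read_html() parses the downloaded text instead of fetching the URL itself.
    return pd.read_html(StringIO(html))
```

Calling `read_tables(url)` then returns the same kind of list of DataFrames that `pd.read_html(url)` would.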

  • @vaasudhfp2874
    @vaasudhfp2874 3 years ago +1

    Not working for other sites. I tried it on TripAdvisor and nothing came back.

  • @priyalshah8869
    @priyalshah8869 2 years ago

    How do I keep the URL that the column tm has in my dataframe?

  • @ekoatm1914
    @ekoatm1914 3 years ago

    Thank you very much, brother....

  • @XoreLP
    @XoreLP 3 years ago

    Why did you use string.format instead of string concatenation?

  • @sameermehdi3143
    @sameermehdi3143 2 years ago +1

    Thank you so much, sir

  • @shwetaredkar734
    @shwetaredkar734 4 years ago +2

    Informative.

    • @DataProfessor
      @DataProfessor 4 years ago +1

      Thanks Shweta for the kind comment!

  • @Papiii_benz
    @Papiii_benz 3 years ago +2

    Thanks !

  • @moatasimashraf6818
    @moatasimashraf6818 3 years ago +1

    (ImportError: lxml not found, please install it)
    I got this error. what is the solution?

    • @DataProfessor
      @DataProfessor 3 years ago

      Hi, you can install lxml via pip install lxml

    • @moatasimashraf6818
      @moatasimashraf6818 3 years ago +1

      @DataProfessor Done, thank you

  • @jojushaji3010
    @jojushaji3010 4 years ago +2

    You're awesome, sir

  • @harshitsharma8131
    @harshitsharma8131 2 years ago

    what if there is no table on a web page ??

  • @tareqmahmud3902
    @tareqmahmud3902 3 years ago +1

    You look like jomatech's big brother :O

    • @DataProfessor
      @DataProfessor 3 years ago +1

      Haha, I get that a lot. Joma and I should do a collab video 😆

    • @tareqmahmud3902
      @tareqmahmud3902 3 years ago +1

      @DataProfessor But Sir, I learned a week's lesson from one of your 10-minute videos. I can't be more grateful to you. Thank you.

    • @DataProfessor
      @DataProfessor 3 years ago +1

      @tareqmahmud3902 Thanks, glad to hear that they’re helpful! 😊

  • @saulo_foot
    @saulo_foot 3 years ago

    Every link turns into a df. How can I concatenate all the dfs?

    • @DataProfessor
      @DataProfessor 3 years ago +1

      Hi, dfs can be concatenated using the pd.concat() function. You can play around with axis=0 (stacked on top of each other) or axis=1 (side by side) depending on how you want to combine the dfs.
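
A minimal sketch of both options, using two made-up single-row DataFrames in place of the tables scraped from each link:

```python
import pandas as pd

# Hypothetical per-link results; real ones would come from pd.read_html().
df_2018 = pd.DataFrame({"Player": ["A. Adams"], "PTS": [20.1]})
df_2019 = pd.DataFrame({"Player": ["B. Brown"], "PTS": [22.4]})

# axis=0 stacks rows; ignore_index=True renumbers them 0..n-1.
stacked = pd.concat([df_2018, df_2019], axis=0, ignore_index=True)

# axis=1 places the frames side by side, aligned on the row index.
side_by_side = pd.concat([df_2018, df_2019], axis=1)
```

When scraping in a loop, appending each DataFrame to a list and calling `pd.concat()` once at the end is the usual pattern.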

  • @lyhuutai3339
    @lyhuutai3339 3 years ago

    How do I save a df to Excel, please?

  • @qi8983
    @qi8983 2 years ago

    Awesome

  • @mootaz3944
    @mootaz3944 3 years ago +1

    I tried it on your channel (just for testing lol)

  • @alexwatson6370
    @alexwatson6370 4 years ago +1

    Don't name your variables str or you will shadow the string builtin

    • @DataProfessor
      @DataProfessor 4 years ago +1

      You're right, many thanks for pointing that out, why did I do that. I've changed it to url_link now.
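
A short sketch of the problem being pointed out: once a variable is named `str`, the builtin `str()` is hidden in that scope, so a later conversion call fails. The function names and URL here are hypothetical.

```python
def shadowed():
    str = "https://example.com/"      # local name now hides the builtin str()
    try:
        return str + str(2019)        # str(2019) tries to call the string
    except TypeError as err:
        return f"TypeError: {err}"


def renamed():
    url_link = "https://example.com/"  # descriptive name, builtin untouched
    return url_link + str(2019)
```

`shadowed()` ends up in the `except` branch, while `renamed()` builds the URL as intended.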

  • @ishpandey7886
    @ishpandey7886 4 years ago

    Is this useful for every situation?
    I am trying to fetch data from glassdoor but this method is not working
    Link: "www.glassdoor.co.in/Job/bengaluru-data-analyst-jobs-SRCH_IL.0,9_IC2940587_KO10,22.htm"