Scraping Data from a Real Website | Web Scraping in Python

  • Published on 10 Jul 2023
  • Take my Full Python Course Here: bit.ly/48O581R
    In this Web Scraping tutorial we are going to be scraping data from a real website!
    GitHub Code: bit.ly/442kIVi
    ____________________________________________
    SUBSCRIBE!
    Do you want to become a Data Analyst? That's what this channel is all about! My goal is to help you learn everything you need in order to start your career or even switch your career into Data Analytics. Be sure to subscribe to not miss out on any content!
    ____________________________________________
    RESOURCES:
    Coursera Courses:
    📖Google Data Analyst Certification: coursera.pxf.io/5bBd62
    📖Data Analysis with Python - coursera.pxf.io/BXY3Wy
    📖IBM Data Analysis Specialization - coursera.pxf.io/AoYOdR
    📖Tableau Data Visualization - coursera.pxf.io/MXYqaN
    Udemy Courses:
    📖Python for Data Science - bit.ly/3Z4A5K6
    📖Statistics for Data Science - bit.ly/37jqDbq
    📖SQL for Data Analysts (SSMS) - bit.ly/3fkqEij
    📖Tableau A-Z - bit.ly/385lYvN
    Please note I may earn a small commission for any purchase through these links - Thanks for supporting the channel!
    ____________________________________________
    BECOME A MEMBER -
    Want to support the channel? Consider becoming a member! I do monthly livestreams and you get some awesome emojis to use in chat and comments!
    / @alextheanalyst
    ____________________________________________
    Websites:
    💻Website: AlexTheAnalyst.com
    💾GitHub: github.com/AlexTheAnalyst
    📱Instagram: @Alex_The_Analyst
    ____________________________________________
    All opinions or statements in this video are my own and do not reflect the opinion of the company I work for or have ever worked for

Comments • 274

  • @jorge.roques5533
    @jorge.roques5533 2 months ago +44

    Honestly I love that you include your missteps in your tutorials for several reasons. It makes coding seem more human, and it also shows us that even content creators and great programmers can have missteps that they need to go back and fix, which is usually edited out of other tutorial videos. Not to mention there might be people having the same issues without understanding why, and you explain it, so it's almost a mini tutorial on debugging and your programmer thought process. Overall it was an easy 25 minutes to spend watching this. Thank you.

    • @nocturnalcb
      @nocturnalcb 2 months ago

      Exactly😁

  • @aaronklingensmith159
    @aaronklingensmith159 6 months ago +42

    Alex: when I needed to learn SQL for my first analyst job as a career changer, you were there with videos to help me do so. Now I'm in a role that is using more python and once again, you're there! Really appreciate all the work you are putting into creating content to help people!

    • @--Manoj007
      @--Manoj007 6 months ago

      Can you tell me whether this playlist is useful for analysts?

  • @Charlay_Charlay
    @Charlay_Charlay 5 months ago +24

    12:21 I literally stopped when I couldn't figure out why I was getting extra titles when I pulled the titles. I'm so glad that you showed your rookie mistake. Everyone, please watch Alex's videos in full before stopping the video. Thank you for showing your mistakes.

    • @chrille91
      @chrille91 3 months ago +2

      In fact, YOUR approach is the correct way of solving such issues!
      Trying to figure out the error on your own is the ACTUAL learning taking place!
      Always try for yourself first, before you have a look at the solution. Otherwise you might fall victim to the fake-learning trap.

  • @francescab1413
    @francescab1413 7 months ago +9

    I'm so glad you make mistakes and show us where to check if something goes wrong! It's my main problem when I have to work on my own after a tutorial, I mess up and don't ever know where to start to clean up my mess.

  • @Kicsa
    @Kicsa 9 months ago +5

    I saw all the videos for this playlist and I am getting to this last one, I haven't felt so happy to learn in a while, thank you for your work and help!

  • @EKTurduckin
    @EKTurduckin 7 months ago +24

    Last year I got a job as a BI Analyst and I've been watching your stuff here and there. This video is hands down one of the best videos I've watched of yours.
    I had to take multiple tables, pivot them, and label them with the table name and this video 100% helped me get there. I had run into my own set of issues, but not far removed from your sections of mistakes, so thank you for not letting those hit the cutting room floor.
    Anyway, keep up the great work and thanks so much!

  • @sj1795
    @sj1795 5 months ago +16

    This was one of my FAVORITE projects in your series so far! It was SUPER interesting and HELPFUL/USEFUL. I can see using this info for many future projects.
    P.S. I LOVE that you included the "rookie mistake" because that is definitely something I would do and then NOT be able to figure out for an hour. These included "mistakes" are such valuable lessons for people in your audience like me. :) P.P.S. I really appreciate how you summarize what we do in each video/project at the end. It's these extra details that make your instruction an A+, not just an A. Also, thank you for including the index = False. As always, THANK YOU ALEX!! You ROCK!

  • @saudtechtips8674
    @saudtechtips8674 3 months ago +2

    My mind is blown after watching the whole video. I didn't imagine this could be done with Python. I have to watch it again! What a person you are, Alex!

  • @neronova1176
    @neronova1176 11 months ago +5

    Thanks, Alex!
    This was a really helpful lesson and project. This helped me get a better understanding of web scraping and restructuring the data. Now, I feel confident in applying this to a project I've been working on.

  • @eatersdaily
    @eatersdaily 4 months ago

    Dude, it's awesome! Just keep teaching. Short, free of long stories, with useful and up-to-date data! That's all I ever want.

  • @Nomuz32
    @Nomuz32 10 months ago +19

    Hi Alex, thank you a lot for all the videos. I'm currently doing a change of career to data analyst, and you are giving me more than just a little help with all your courses. Thanks for all

  • @traetrae11
    @traetrae11 9 months ago +2

    Thank you for doing this Alex. I learned a lot and followed along while watching this series so that I could learn how to do this as well. Now all I need to do is practice, practice, practice.

  • @sojourner5294
    @sojourner5294 2 months ago

    Completely quick, efficient and clear, really appreciate your effort and content Alex ! Thank You !

  • @noob4head
    @noob4head 9 months ago +4

    Thank you for this video with an extremely clear explanation. I always wonder why my college professors can't explain something as clearly as some people on YouTube can.

  • @leonardnewbill793
    @leonardnewbill793 11 months ago +1

    Super excited to finish the lesson! Thank you sir. I appreciate it!

  • @raphael.dev13
    @raphael.dev13 11 months ago +10

    Hey Alex!
    Thanks for the great video as always!
    Could you do a video on the repercussions and impact on the Data Analyst career now that OpenAI released their GPT Code interpreter?

  • @izzyvickers6258
    @izzyvickers6258 7 months ago +2

    You made this wayyyy easier than I thought it would be! Worth a sub from me sir!

  • @user-xb7og2ls5s
    @user-xb7og2ls5s 9 months ago +1

    Thank You so so much for this video, Alex! It was super useful and easy to follow!

  • @Autoscraping
    @Autoscraping 5 months ago

    A fabulous video that has been of great help in orienting our new collaborators. Your generosity is highly valued!

  • @dhanienugroho4323
    @dhanienugroho4323 8 months ago +1

    Thanks for the tutorial! I just found the channel and I like the way you explain it!

  • @blackwidow2899
    @blackwidow2899 8 months ago

    Wow, Alex I totally enjoyed this. You make it so easy to understand. Now I need to go through your pandas tutorial and learn data manipulation. Thanks for being there!

  • @prasad_create2687
    @prasad_create2687 6 months ago +3

    Thank you, I learnt basics of python yesterday(had learnt C+ 8 yrs back so it was easy to relate) and I am a mechanical engineer but want to get into Product. This video was useful to learn and will modify it for other websites hopefully. Thanks again!

  • @cityoflaredoopendatadivisi9197
    @cityoflaredoopendatadivisi9197 12 days ago

    very helpful video. love the troubleshooting as you go, and simple explanation of how you're working through this. thank you.

  • @YourYTHUB
    @YourYTHUB 8 months ago +2

    Hey Alex, thank you so much for your effort. It's a really super helpful series 🙏

  • @pritamlaskar7265
    @pritamlaskar7265 9 months ago +1

    Thank you so much! Very clear and well explained!

  • @louisamkeyakala9420
    @louisamkeyakala9420 11 months ago +4

    the way i was waiting for this video😂..thank you Alex

  • @margotonik
    @margotonik 4 months ago

    I loved this!!! Very good practice. I enjoyed working on this project, including the mistakes. It's always good to know that having errors doesn't make me an idiot and is part of the process. Thank you so much for everything, Alex. I am sure we all love you as well!!

  • @boeingpete
    @boeingpete 3 months ago

    Excellent. Great video. Everything explained clearly and in a way I could follow. Thanks so much.

  • @SupCortez
    @SupCortez 6 months ago +3

    Just finished google data analyst certification, you about to help me make my portfolio look phat with scraping my own data before I do my whole hypothesis and data vis

  • @mikeg4691
    @mikeg4691 10 months ago +3

    I found out why the class names were different. It seems to be a common issue. Someone explained it on Stack Overflow,
    "The table class wikitable sortable jquery-tablesorter does not appear when navigating the website until the column is sorted. I was able to grab exactly one table by using the table class wikitable sortable."
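A minimal sketch of the behaviour described in the comment above, assuming the `jquery-tablesorter` class is added by JavaScript in the browser and is therefore absent from the HTML a script downloads. The inline HTML and company name are hypothetical stand-ins for the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the raw HTML a script receives: the browser-only
# class "jquery-tablesorter" is missing because no JavaScript has run.
raw_html = """
<table class="wikitable sortable">
  <tr><th>Rank</th><th>Name</th></tr>
  <tr><td>1</td><td>Example Corp</td></tr>
</table>
"""

soup = BeautifulSoup(raw_html, "html.parser")

# The class string copied from the browser inspector does not match the raw HTML.
as_seen_in_browser = soup.find("table", class_="wikitable sortable jquery-tablesorter")

# The class string that is actually present in the raw HTML does match.
as_served = soup.find("table", class_="wikitable sortable")

print(as_seen_in_browser is None, as_served is not None)
```

Printing the downloaded HTML and searching it for the table's class attribute is one way to check which class names are actually served.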

  • @jeet611_
    @jeet611_ 9 months ago +1

    Thanks a lot Alex, it helped me a lot to explore web scraping, and thanks for making this interesting and on point

  • @anthonygordon5052
    @anthonygordon5052 11 months ago

    Thanks for the videos as usual Alex !

  • @yunusaprianus736
    @yunusaprianus736 9 months ago +1

    I finished the tutorial today and it was an awesome success. I faced some trouble since I used a different site, but yeah, my scraping is going well!
    Thank you so much!

  • @gabinkundwa7215
    @gabinkundwa7215 9 months ago +1

    Thank you Alex, I am new to web scraping and this video was helpful to me! Keep up the good work!

    • @gameaddict3068
      @gameaddict3068 5 months ago

      Check out my channel for nice web scraping tools

  • @oanhkieunguyen156
    @oanhkieunguyen156 11 months ago

    Thanks so much for this video! For the first time, I understand the principle and how to scrape data :)

  • @proud_indian0161
    @proud_indian0161 1 month ago

    Great Tutorial, Got what i was looking for thanks

  • @ZeuSonRed
    @ZeuSonRed 9 months ago +1

    This was from the Greatest Videos I have Ever seen Thank you! Very Much! 🙃🙃🙃🙃🙃🙃😊

  • @ashutoshranjan4644
    @ashutoshranjan4644 18 days ago

    I like your way of teaching. Looking forward to learning from you.
    Thanks for making such content

  • @ibrahimmohamoudbile3424
    @ibrahimmohamoudbile3424 10 months ago

    You’re a ‘God sent’ my g

  • @MudassarAli-bx2pf
    @MudassarAli-bx2pf 8 months ago

    Excellent Work Sir!!! I really Appreciated your work believe me You are a great mentor!

  • @whitey9933
    @whitey9933 4 months ago

    Thanks for the tutorial,
    Was always told not to add to a dataframe row by row (probably slower for much larger data),
    so I appended to a list and created a Dataframe off that - pd.DataFrame(company_list, columns=world_table_titles).set_index(['Rank'])
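The approach in the comment above can be sketched as follows. The names `company_list` and `world_table_titles` come from the comment; the header names and row values are hypothetical stand-ins for the scraped table.

```python
import pandas as pd

# Hypothetical header names and rows standing in for the scraped table.
world_table_titles = ["Rank", "Name", "Industry"]
scraped_rows = [
    ["1", "Example Corp", "Retail"],
    ["2", "Sample Inc", "Technology"],
]

# Collect every row in a plain Python list first...
company_list = []
for row in scraped_rows:
    company_list.append(row)

# ...then build the DataFrame once, instead of growing it row by row.
df = pd.DataFrame(company_list, columns=world_table_titles).set_index("Rank")
print(df)
```

Building the frame once avoids re-allocating it on every `df.loc[len(df)] = ...` assignment, which tends to matter more as the table grows.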

  • @richardtorrenueva5512
    @richardtorrenueva5512 7 months ago

    I love this. Thank you Alex.

  • @ibikunleadekiitan9882
    @ibikunleadekiitan9882 10 months ago

    Thanks Alex for making me a great value to the world

  • @ghimirepujya
    @ghimirepujya 10 days ago

    I really salute your work. Thank you.

  • @tsubame1412
    @tsubame1412 2 months ago

    Thanks, this video is really helpful for me at this moment !

  • @kuiwang3614
    @kuiwang3614 2 months ago +1

    fantastic lesson, very clear

  • @vamshikrishnareddyLingam
    @vamshikrishnareddyLingam 1 month ago

    one word Beautiful video it actually helped to get the client

  • @Vikash-the-analyst
    @Vikash-the-analyst 1 month ago

    Honestly, very informative, and this helped me learn this topic very well. The explanation of every piece of code is very useful. Thanks for making this informative video.

  • @artemboichenko743
    @artemboichenko743 11 months ago +14

    Hi Alex! Super helpful video, thank you! One detail though: Growth index is not always positive. We may see in the wiki table negative and positive values are present in that column. Instead of using ‘-‘ for negative value, that table uses small triangles. Could you show us how to manage that - to convert those triangles into positive or negative values accordingly?

    • @ridanaeem1012
      @ridanaeem1012 6 months ago

      hey, any workaround for this?

    • @pawledz
      @pawledz 5 months ago

      I am sure that there is a better way to handle this, but this will work:
      df = pd.DataFrame(columns = world_table_titles)
      df
      column_data = table.find_all('tr')
      for row in column_data[1:]:  # skip the header row
          row_data = row.find_all('td')
          row_table_data = [data.text.strip() for data in row_data]
          # Prefix the growth value with "-" when the row's marker is titled 'Decrease'.
          if row.find_all('span')[1]['title'] == 'Decrease':
              row_table_data[4] = "-" + row_table_data[4]
          length = len(df)
          df.loc[length] = row_table_data
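A self-contained sketch of the same idea, for trying it out without the live page. It assumes, as the reply above does, that each growth cell carries a `span` whose `title` is "Increase" or "Decrease"; the inline HTML, column layout, and company names are hypothetical, and the real table's markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical two-column table: the direction of growth is carried only by
# the title attribute of a span, not by a minus sign in the text.
html = """
<table>
  <tr><th>Name</th><th>Growth</th></tr>
  <tr><td>Example Corp</td><td><span title="Increase"></span>6.7%</td></tr>
  <tr><td>Sample Inc</td><td><span title="Decrease"></span>2.1%</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

rows = []
for tr in soup.find_all("tr")[1:]:  # skip the header row
    values = [td.text.strip() for td in tr.find_all("td")]
    # Look the marker up by its title instead of by position in the row.
    if tr.find("span", attrs={"title": "Decrease"}) is not None:
        values[1] = "-" + values[1]
    rows.append(values)

print(rows)
```

Searching for the marker by attribute avoids an `IndexError` on rows that contain fewer `span` elements than expected.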

  • @sumanhachappa2822
    @sumanhachappa2822 8 months ago

    fantastic way of explaining things

  • @moviesprobe6220
    @moviesprobe6220 11 months ago +1

    Much needed video ❤

  • @martinbolio257
    @martinbolio257 1 month ago

    Very very useful! Great video.

  • @adiyansfuntime
    @adiyansfuntime 2 months ago

    This is a fun project. Thanks for this.

  • @iSky950
    @iSky950 5 months ago

    Very nice video Alex thanks for sharing! (I love that it's "live" and you make mistakes too, it's more human this way!)

  • @sgntsids
    @sgntsids 12 days ago

    Going through this series for a personal project, such wonderful content! For the class tags, it seems like when there's a space, bs4 ignores the 2nd "part". For instance, in my project I'm seeing the element and I just need to ignore the "list-unstyled" part for the soup.find to work.
    Didn't read through all the comments here so you might have already figured that out and shared, but wanted to comment anyway. Cheers!
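A short sketch of how Beautiful Soup handles a `class` attribute that contains a space, which may explain the observation above. The attribute is treated as a list of separate classes, so searching for any one of them matches. The class name `menu` and the inline HTML are hypothetical; only `list-unstyled` comes from the comment.

```python
from bs4 import BeautifulSoup

# Hypothetical element with two classes separated by a space.
html = '<ul class="menu list-unstyled"><li>Home</li></ul>'
soup = BeautifulSoup(html, "html.parser")

# A single class name matches any element that has that class.
by_one_class = soup.find("ul", class_="menu")

# A CSS selector matches both classes regardless of their order.
by_css = soup.select_one("ul.list-unstyled.menu")

# A multi-class string is compared against the exact attribute value,
# so the same classes in a different order do not match.
by_reordered = soup.find("ul", class_="list-unstyled menu")

print(by_one_class is not None, by_css is not None, by_reordered is None)
```

So the second class is not ignored as such: each class is matched on its own, and a string containing several classes only matches when it equals the attribute value exactly.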

  • @Mvjesty23
    @Mvjesty23 11 months ago +1

    I’m going to do this today! Thank you Alex 😄

  • @anirudh7150
    @anirudh7150 3 months ago

    Thank you so much. It was really helpful

  • @sjb_s2003
    @sjb_s2003 1 month ago

    this was really helpful, thankyou

  • @ebamybass19
    @ebamybass19 10 months ago +2

    Thank you Alex Freberg ❤❤

  • @prempatkar2372
    @prempatkar2372 10 months ago +1

    Hey Alex,
    It was a great video and I found it very helpful and interesting. I would like to ask one question: can we also do this for the second table, and can we get it into the same CSV file as the first table?
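One possible way to approach the question above, sketched under the assumption that both tables have already been scraped into DataFrames with the same columns. The frames, the `Source table` column, and all values here are hypothetical.

```python
import pandas as pd

# Hypothetical frames standing in for the two scraped tables.
first_table = pd.DataFrame({"Rank": ["1"], "Name": ["Example Corp"]})
second_table = pd.DataFrame({"Rank": ["1"], "Name": ["Sample LLC"]})

# Label each row with the table it came from so they stay distinguishable.
first_table["Source table"] = "table 1"
second_table["Source table"] = "table 2"

# Stack the frames and write them out as one CSV document.
combined = pd.concat([first_table, second_table], ignore_index=True)
csv_text = combined.to_csv(index=False)  # pass a file path instead to save to disk
print(csv_text)
```

If the two tables have different columns, `pd.concat` keeps the union of the columns and fills the gaps with missing values, so writing separate files may be cleaner in that case.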

  • @nguyenhuyhoangk18hcm37
    @nguyenhuyhoangk18hcm37 4 months ago

    I really like your project! I appreciate you.

  • @assettemirkhan7087
    @assettemirkhan7087 11 months ago

    Hi Alex, thanks for the video, it is very helpful

  • @Larocaxx
    @Larocaxx 7 months ago

    We love you too Alex ♥ thank you for such great videos

  • @vnrd9
    @vnrd9 8 months ago

    thank you so much, super helpful

  • @UtiaGaxton
    @UtiaGaxton 11 months ago

    Thank you sir. You got me going

  • @Anuj_Hindu_10k
    @Anuj_Hindu_10k 11 months ago

    Wow, amazing video sir....Thanks you

  • @Nalla-perumal
    @Nalla-perumal 4 months ago

    Simply wow!!! Hats off!

  • @Photoshop729
    @Photoshop729 7 months ago +2

    So far on my web scraping journey I don’t know if web scraping is any faster than just manual copy paste unless you have repeated scrape requests of the same site or structure

  • @ZeeshanAli-ds1tm
    @ZeeshanAli-ds1tm 3 months ago

    A question: how can we scrape 'td' and 'th' at the same time within the same tbody and tr tags?
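One way to do what the question above asks is to pass a list of tag names to `find_all`, which returns matching elements in document order. This is a minimal sketch; the inline HTML and its values are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical table where the first cell of each row is a <th>.
html = """
<table><tbody>
  <tr><th>1</th><td>Example Corp</td><td>Retail</td></tr>
  <tr><th>2</th><td>Sample Inc</td><td>Technology</td></tr>
</tbody></table>
"""

soup = BeautifulSoup(html, "html.parser")

# A list of names matches either tag, so th and td cells come back together.
rows = [
    [cell.text.strip() for cell in tr.find_all(["th", "td"])]
    for tr in soup.find_all("tr")
]
print(rows)
```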

  • @user-mh1ch3mq7h
    @user-mh1ch3mq7h 8 months ago +1

    Interesting class!!

  • @yoshitamanavi530
    @yoshitamanavi530 10 months ago

    I just have one comment, You are the best Alex 🤩

  • @anuradhamondal1601
    @anuradhamondal1601 9 months ago

    02:26 lol.. as a beginner to this and already overwhelmed with all the information I recently learned, it is exactly what I would have thought!

  • @benjaminkaitany4464
    @benjaminkaitany4464 2 months ago

    Amazing tutorial

  • @efeberke681
    @efeberke681 7 months ago

    So helpful!

  • @ayushsinghrawat1409
    @ayushsinghrawat1409 20 days ago

    I got hands-on with my first scraping experience with you, sir

  • @wesrocha3293
    @wesrocha3293 8 months ago

    Amazing, thanks!

  • @akshaybharadwaj
    @akshaybharadwaj 4 months ago

    This is super helpful! Thanks so much!

    • @matrixnepal4282
      @matrixnepal4282 4 months ago

      Brother, did 'th' work in your case? When I was doing it, it showed all the numbering in th too. I would really appreciate your help if you reply

  • @shivamprajapati65
    @shivamprajapati65 3 months ago

    very helpful!

  • @atrallzerhas3157
    @atrallzerhas3157 11 months ago

    great video.Thank you

  • @KhushiSingh-vo9nf
    @KhushiSingh-vo9nf 4 months ago

    thanks a lot for guiding us

  • @joeche7461
    @joeche7461 7 months ago +1

    Thanks a lot for the video.

  • @stingray3565
    @stingray3565 10 months ago

    Great video. Thank you...

  • @ucthanhchu3688
    @ucthanhchu3688 27 days ago

    nice video! thanks

  • @naimmomin5811
    @naimmomin5811 5 months ago

    So I just had this one question at 12:27: even if you were to switch soup.find_all('th') to table.find_all('th'), shouldn't it return the same thing as the last one, since all the tables have the same class and they all also use th for the headers?
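A small sketch that may help with the question above. Calling `find_all` on a single table element searches only inside that element, even when other tables share its class. The inline HTML and header names are hypothetical stand-ins for the page.

```python
from bs4 import BeautifulSoup

# Hypothetical page with two tables that share the same class.
html = """
<table class="wikitable sortable">
  <tr><th>Rank</th><th>Name</th></tr>
</table>
<table class="wikitable sortable">
  <tr><th>Rank</th><th>Name</th><th>Employees</th></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# `table` is one specific element, not "every element with this class".
table = soup.find_all("table")[0]

# Searching the whole document returns the headers of both tables.
all_headers = [th.text.strip() for th in soup.find_all("th")]

# Searching from the element returns only the headers nested inside it.
table_headers = [th.text.strip() for th in table.find_all("th")]

print(len(all_headers), table_headers)
```

The shared class only matters when choosing which table to select. Once `table` refers to one element, its `find_all` never looks at its siblings.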

  • @paullemaron5258
    @paullemaron5258 4 months ago +1

    Hey Alex, I am so proud of the amazing job you are doing, thank you for the amazing project, I am studying for a job interview tomorrow and I know I will ace it coz Alex is my teacher.

    • @markchinwike6528
      @markchinwike6528 3 months ago

      Hello. How did it go with the interview? Just to help us transition into the industry.

    • @paullemaron5258
      @paullemaron5258 3 months ago

      @markchinwike6528 Hello sir, I had the interview and it was a success. It majorly focused on SQL, and the skills here are more than enough. I have the second interview in two weeks from now.

  • @donovanmurray7005
    @donovanmurray7005 5 months ago +1

    Thank you!

  • @gabrielledatascience
    @gabrielledatascience 1 month ago

    If anyone is having issues around 13:31 when we set the DataFrame columns, try adding
    , dtype='object'
    after world_table_titles so that the data type of the column headers can be set. Mine had that issue and I thought that I could share :)
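For reference, the suggestion above would look roughly like this. The comment does not say what the original error was, so this only shows where the argument goes; the header names and row values are hypothetical.

```python
import pandas as pd

# Hypothetical header names standing in for the scraped titles.
world_table_titles = ["Rank", "Name"]

# Create the empty frame with an explicit dtype, as the comment suggests.
df = pd.DataFrame(columns=world_table_titles, dtype="object")
dtypes_before = df.dtypes.astype(str).tolist()

# Rows can then be appended the same way as in the video.
df.loc[len(df)] = ["1", "Example Corp"]
print(df)
```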

  • @frybait0626
    @frybait0626 11 months ago +1

    There is something wrong with Table2. Table2 only contains 20 rows of data, and everything up to the for loop for Table2 is correct: it outputs 20 rows of data. But once you call df, it outputs 100+ rows. Something is wrong in there. Upon checking the CSV file once the data for Table2 has been saved, rows are being repeated over and over. I think there must be something wrong with the for loop.
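One possible explanation for the repeated rows, not confirmed by the comment, is that the loop was run more than once against the same `df`, so each run appended the rows again. A sketch of guarding against that by creating a fresh frame immediately before the loop, using hypothetical titles and rows and a hypothetical helper `build_frame`:

```python
import pandas as pd

# Hypothetical titles and rows standing in for the second table.
titles = ["Rank", "Name"]
table2_rows = [["1", "Example Corp"], ["2", "Sample Inc"]]


def build_frame(rows, columns):
    """Build a frame from scratch so reruns never append to old data."""
    df = pd.DataFrame(columns=columns)  # fresh, empty frame on every call
    for row in rows:
        df.loc[len(df)] = row
    return df


# Simulate running the notebook cell twice: the row count stays the same.
first_run = build_frame(table2_rows, titles)
second_run = build_frame(table2_rows, titles)
print(len(first_run), len(second_run))
```

In a notebook, keeping the `pd.DataFrame(columns=...)` line in the same cell as the loop has the same effect.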

  • @prezgrounds6170
    @prezgrounds6170 8 months ago

    Thanks for this video, it helped me a lot. When I tried to pull the table headers, it only worked with tr, not th. This might help others with the same issue

  • @dakshbhatnagar
    @dakshbhatnagar 10 months ago

    Hey Alex, Can you do a selenium scraping tutorial? It would help a lot to scrape dynamic websites.

  • @cbacca2999
    @cbacca2999 24 days ago

    Hi Alex. In the Wikipedia revenue table there is a minus sign in some of the revenue rows. This is actually an extended ascii n-dash or m-dash which will appear as another character. Look for a funky character in those rows in the output. I work in the print industry and this is an inappropriate use of the n- or m-dash for us.
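If the scraped values do contain a dash-like character instead of an ASCII hyphen, one option is to normalise it before converting the text to a number. This is a sketch under that assumption; which character the table actually uses is not confirmed here, so the mapping covers the Unicode minus sign, en dash, and em dash. The helper `to_number` is hypothetical.

```python
# Map dash-like characters to the ASCII hyphen-minus that float() understands.
DASHES = str.maketrans({"\u2212": "-", "\u2013": "-", "\u2014": "-"})


def to_number(text):
    """Convert a scraped cell such as '−6.5%' or '1,234' to a float."""
    cleaned = text.strip().translate(DASHES).replace(",", "").rstrip("%")
    return float(cleaned)


values = [to_number("\u22126.5%"), to_number("\u20132.1%"), to_number("1,234")]
print(values)
```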

  • @himanshubisht5023
    @himanshubisht5023 11 months ago +2

    Hello Alex Sir!
    Thanks for the great video, super helpful as always!
    Could you do a video on how to convert PDF file to excel in python | OR | Data extraction from PDF File.
    It will be really really helpful to me and other student/fresher...

  • @ezhankhan1035
    @ezhankhan1035 4 months ago

    Really helpful, thanks! You explain this muuuuch better than in the IBM Python Course haha.

    • @matrixnepal4282
      @matrixnepal4282 4 months ago

      Brother, did 'th' work in your case? When I was doing it, it showed all the numbering in th too. I would really appreciate your help if you reply

    • @ezhankhan1035
      @ezhankhan1035 4 months ago

      @matrixnepal4282 Did you do table.find_all('th')? I think Alex also made a similar mistake initially by doing soup.find_all('th'). It should be called on the 'table'.

  • @sergiysergiy8875
    @sergiysergiy8875 4 months ago

    That was a good one! Thx

    • @matrixnepal4282
      @matrixnepal4282 4 months ago

      Brother, did 'th' work in your case? When I was doing it, it showed all the numbering in th too. I would really appreciate your help if you reply

  • @jmc1849
    @jmc1849 3 months ago

    Hi Alex (as if!)
    Thanks for all the content

  • @AtharvChaulkar
    @AtharvChaulkar 5 months ago +1

    Perfect 🫶❤

  • @uncaged3076
    @uncaged3076 2 months ago

    Thank you 🙏🏿

  • @Kaura_Victor
    @Kaura_Victor 2 months ago

    Funny Alex posted this on my birthday last year. 🤭🙈😅

  • @hibayassir3118
    @hibayassir3118 1 month ago

    I haven't been following the series; I just want to start implementing this project. Is there anything I need to do beforehand?

  • @melissalopezdecastilla9817
    @melissalopezdecastilla9817 8 months ago

    Hey Alex! It seems like I am encountering an issue where the cell number increases, but I'm not seeing the expected output in Jupyter Notebook. This happens after running the code of the 3rd cell, 'print(soup)': I am not seeing the printed data in the cell output and am only getting an increment in cell numbers. Why is that happening and how can I fix it?