Render Dynamic Pages - Web Scraping Product Links with Python

  • Published on 26 Jul 2020
  • Thanks to Stuart for sending this site in! I enjoyed this scraping challenge.
    This video shows a simple method that can help with dynamically loaded content. I use the requests-html library to render the page in the background quickly and efficiently, and scrape all the product links from the HTML div using an XPath selector. I then loop through each link to get all the product information. (A rough sketch of this workflow appears at the end of this description.)
    Coming in part 2 - pagination and functions to tidy up the code.
    -------------------------------------
    twitter / jhnwr
    code editor code.visualstudio.com/
    WSL2 (linux on windows) docs.microsoft.com/en-us/wind...
    -------------------------------------
    Disclaimer: These are affiliate links and as an Amazon Associate I earn from qualifying purchases
    mouse amzn.to/2SH1ssK
    27" monitor amzn.to/2GAH4r9
    24" monitor (vertical) amzn.to/3jIFamt
    dual monitor arm amzn.to/3lyFS6s
    microphone amzn.to/36TbaAW
    mic arm amzn.to/33NJI5v
    audio interface amzn.to/2FlnfU0
    keyboard amzn.to/2SKrjQA
    lights amzn.to/2GN7INg
    webcam amzn.to/2SJHopS
    camera amzn.to/3iVIJol
    gfx card amzn.to/2SKYraW
    ssd amzn.to/3lAjMAy
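
    A minimal sketch of the workflow described above, assuming requests-html is installed; the URL and the selectors below are placeholders rather than the exact ones used in the video:

    from requests_html import HTMLSession

    s = HTMLSession()
    r = s.get('https://www.beerwulf.com/en-gb/catalogue')   # placeholder category URL
    r.html.render(sleep=1)   # run the page's JavaScript in headless Chromium

    # container div holding the product cards (placeholder XPath)
    products = r.html.xpath('//div[@class="product-container"]', first=True)

    # visit each product link and pull a field or two
    for link in products.absolute_links:
        page = s.get(link)
        page.html.render(sleep=1)
        name = page.html.find('h1', first=True)
        print(link, name.text if name else 'name not found')
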
  • Science & Technology

Comments • 170

  • @JohnWatsonRooney
    @JohnWatsonRooney  4 years ago +15

    Keyboard too loud? I've been using my mech kb again.. Is it too distracting?

    • @11hamma
      @11hamma 4 years ago +5

      I think it's fine, at least I didn't get distracted

    • @11hamma
      @11hamma 4 years ago +3

      @Vishal Gupta that website is using JavaScript to load the content.
      But first try using the library explained in this video by John. It looks like you can get the work done through it.
      (I haven't used it myself so can't vouch for it.) Anyhow, if this library fails, you can definitely use Selenium and get your work done. Selenium opens the page in one of its browsers and loads it there, which loads all of the page contents and in fact gives you the option of clicking on a particular web element.
      A tip: just load the page with the Selenium library, then pass the source code of that page into bs4, also known as the BeautifulSoup library, and scrape the site the normal way from there on. It's worth doing because Selenium's methods for extracting information from a website take a lot of time, whereas bs4 is much faster and has better error handling. (A rough sketch of this hand-off is shown below.)
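
      A minimal sketch of that Selenium-to-BeautifulSoup hand-off, assuming selenium and beautifulsoup4 are installed and Chrome is available; the URL and selector are placeholders:

      from selenium import webdriver
      from bs4 import BeautifulSoup

      driver = webdriver.Chrome()
      driver.get('https://example.com/products')      # placeholder URL
      html = driver.page_source                       # full HTML after the JavaScript has run
      driver.quit()

      soup = BeautifulSoup(html, 'html.parser')
      for a in soup.select('a.product-link'):         # placeholder selector
          print(a.get('href'), a.get_text(strip=True))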

    • @Neil4Speed
      @Neil4Speed 4 years ago

      Not at all, makes it feel like you are working away!

    • @dimaua1830
      @dimaua1830 2 years ago

      I enjoy the sound. It's like the hackers in the movies :)

    • @kavehyarohi2886
      @kavehyarohi2886 2 years ago

      Kind of enjoyed it!

  • @schlotto
    @schlotto 1 year ago +4

    THANK YOU for this video and all the others. I am learning web scraping to gather data for my PhD thesis and you have helped me make such great progress in just a few days. :)

  • @xilllllix
    @xilllllix 2 months ago +1

    i'm going through ALL of your videos and just finished this one! learning so much it's incredible!

  • @ottomanasina1254
    @ottomanasina1254 3 years ago +5

    Amazing explanation skills! Everything was clear. One of the greatest videos on web scraping so far! Good job, good luck!!

  • @tsay214
    @tsay214 3 years ago

    What does first=True do?

    • @JohnWatsonRooney
      @JohnWatsonRooney  3 years ago +1

      With requests-html "find" always returns a list, but using first=True forces it to return only a single item - the first element it finds that matches your find criteria (quick example below)
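
      A tiny illustration of the difference, with a placeholder selector:

      links = r.html.find('a.product')                     # list of every matching Element
      first_link = r.html.find('a.product', first=True)    # a single Element, or None if nothing matches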

    • @tsay214
      @tsay214 3 years ago

      @@JohnWatsonRooney got it, thanks. On to pt2!

  • @kewl201
    @kewl201 3 years ago +4

    Man this is some amazing content. So glad I found your channel! Definitely earned a subscribe.

  • @agsantiago22
    @agsantiago22 2 years ago +1

    Lifesaver! Thank you so much! Wish you the best of luck with your channel!

  • @stuarthoughton3517
    @stuarthoughton3517 4 years ago +1

    Brilliant, John!!! Makes complete sense now. Thank you! 👏🏻

    • @JohnWatsonRooney
      @JohnWatsonRooney  4 years ago +1

      Thanks for sharing the site Stuart, I enjoyed this one!

  • @Neil4Speed
    @Neil4Speed 4 years ago +2

    Great video John as always - Thanks!

  • @farhadkhan3893
    @farhadkhan3893 1 year ago +1

    Awesome! I was searching for this type of scraping, and I found it here.

  • @kavehyarohi2886
    @kavehyarohi2886 2 years ago +1

    You are truly a lifesaver. Great, great video. Thanks mate

  • @mia_bobia_
    @mia_bobia_ 1 month ago

    This was super useful! I have a project right now that needs to scrape many pages that need rendering. This looks much more lightweight than what I'm using at the moment (Selenium).

  • @dobcs3236
    @dobcs3236 1 year ago +1

    You are a great and creative person...keep going champ.

  • @mohamadalhamawi6437
    @mohamadalhamawi6437 2 years ago +1

    Very helpful tutorial, thank you for your efforts

  • @gitgosc7075
    @gitgosc7075 1 year ago +1

    great as always, thanks!

  • @Aaron-qn1gu
    @Aaron-qn1gu 3 years ago +1

    When I use XPath in products (on a different site, but same principles), the terminal keeps returning 'None'. The site is GWT-based - would that stop XPath from working?

  • @samibdh
    @samibdh 3 years ago +1

    Thank you man really useful !!

  • @edcoughlan5742
    @edcoughlan5742 4 years ago +5

    I can get data from static websites using Scrapy with relative ease, but I always come unstuck when I try the same with dynamic websites; I might give requests-html a go instead of my usual scrapy-selenium combo... Thanks for the video! 👊👊👊

    • @JohnWatsonRooney
      @JohnWatsonRooney  4 years ago +1

      Glad you liked it - give it a go. I believe scrapy-splash is an add-on for Scrapy that can reload dynamic pages but I'm yet to try it

  • @paulblart8262
    @paulblart8262 2 years ago

    bravo sir, you gave me my eureka moment 👏

  • @imfinitiamusic.4632
    @imfinitiamusic.4632 2 years ago +1

    You are the best, subscribed

  • @bagia1000
    @bagia1000 3 years ago +1

    Hi, I tried your code on another website, but when I got to the print(products) part, it returned a 'NoneType' object. The code gets no URL. What should I do? I tried using a user-agent, but that also returned nothing

  • @ubaidkhan-rr3ow
    @ubaidkhan-rr3ow 3 years ago +1

    Thank you sir. This makes sense to me

  • @itstisn
    @itstisn 2 years ago

    Hi, thank you so much for your video. I want to ask how to scrape multiple review pages for one product? I get confused

  • @nostalgeomusic
    @nostalgeomusic 3 years ago +1

    Great video and easy to follow for a noob like me! Appreciate it :D

    • @JohnWatsonRooney
      @JohnWatsonRooney  3 years ago

      :D thank you

    • @nostalgeomusic
      @nostalgeomusic 3 years ago +1

      @@JohnWatsonRooney Do you have any videos focusing on if statements and/or keyword lists such as changing results, for example:
      Junior = Entry Level
      Early Professional = Entry Level
      Graduate = Entry Level
      etc...

  • @user-et2lr8qf3o
    @user-et2lr8qf3o 3 years ago +1

    Great job, keep it up, keep it useful

  • @daddy_eddy
    @daddy_eddy 2 years ago +1

    Super!!! I appreciate it!

  • @royteicher
    @royteicher 2 years ago +1

    Hi John and everyone, I'm having trouble with the html.render() method, I'd appreciate any help.
    The first time the method runs, it downloads Chromium. After I ran it, 3 red lines were printed (Downloading Chromium & stuff I can't remember). I felt like it was taking too long (more than 10 minutes), so I stopped the program.
    Now when I try to run the method, the script just gets stuck - I mean, it is running, but it never continues to the lines after the html.render call. No errors are raised, the script simply never finishes running.
    I tried to pip uninstall requests-html and reinstall it but I'm getting the same uninformative result.
    How can I troubleshoot this problem? I'm excited to work with requests-html and to let go of Selenium for standard rendering needs, but I can't.
    Thanks a lot to anyone who cares enough to give it a try.

  • @vincentamus
    @vincentamus 3 years ago +1

    Hey John, great videos, thank you so much for them! I wanted to ask, how can I scrape multiple categories (categories like /computers, /headphones, /monitors, /keyboards)? Do you have any video or idea for that?
    Thanks for your content!

    • @navindubimsara9157
      @navindubimsara9157 6 months ago

      Hi bro, Did you find any technique to scrape multiple categories? Please let me know.

  • @GainzJPN
    @GainzJPN 1 year ago +1

    Thanks, again super easy to follow!

  • @Mr.AIFella
    @Mr.AIFella 5 months ago

    Thank you so much. Your video is going to help me a lot in a project that I'm about to start. One question if you don't mind: when I want to gather text, only part of the text appears and there is a [click for more] hyperlink that prevents the text from being fully copied to the CSV file. Do you have a hint or suggestions? I appreciate your help in advance

  • @user-vw3qz6ii9s
    @user-vw3qz6ii9s 10 months ago

    Amazing video. I'm wondering how we can scrape all the pictures for the product if they are rendered dynamically (like in a slideshow)

  • @alexdiaz4371
    @alexdiaz4371 2 years ago

    Can't install requests-html, any ideas? I'm using Windows and the error comes from lxml - I tried to install lxml and got the same error

  • @neginbabaiha9287
    @neginbabaiha9287 7 months ago

    Very clearly explained. May I ask if there is a GitHub repo containing the code that you used in the video?

  • @alaaabdullah2648
    @alaaabdullah2648 1 year ago

    I am trying to scrape a website where the data is only shown after typing into an input field; otherwise the HTML element is empty. What should I use? I am using pyautogui to fill the field but I don't know how to read the data

  • @janlisowski5396
    @janlisowski5396 3 years ago

    hmm the website I am trying to scrape returns status code 429... and I haven't even started scraping. Do you know what could be causing it?

  • @ssh6467
    @ssh6467 4 years ago +1

    Thank you♥️♥️ you are BEST💪

  • @narjesatia
    @narjesatia 3 years ago

    Hi John, trying to run the code I got this error with render: AttributeError: 'Future' object has no attribute 'html'... any help please? Didn't find anything on Google. Thanks

  • @PhilipRhoadesP
    @PhilipRhoadesP 1 year ago

    Nice! - is there a way of doing this for the _currently displayed page_ ? - on a YT video page I want to scrape all the recommended videos and their titles from that page . .

  • @alessioturcoliveri9840
    @alessioturcoliveri9840 1 year ago +1

    Hi John, is it possible to parse the requests-html response with bs4? I've tried passing response.text when making a bs4 Soup but it returns None.
    Can somebody help me?

    • @JohnWatsonRooney
      @JohnWatsonRooney  1 year ago

      Hi, yes it is - I'm sure I've covered that before. It's quite a useful method. Try printing the html before making the soup and check it is what you were expecting to see
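
      A small sketch of that check and hand-off, assuming r came from HTMLSession().get() as in the video:

      from bs4 import BeautifulSoup

      r.html.render(sleep=1)           # render first so the JavaScript-built markup is present
      print(r.html.html[:500])         # sanity check: is the content you expect actually in here?
      soup = BeautifulSoup(r.html.html, 'html.parser')   # r.html.html is the rendered markup; r.text is the raw, unrendered response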

  • @charisthawhite2793
    @charisthawhite2793 3 years ago +2

    Hello John,
    if I add the command r.html.render(sleep=1) the output is "Cannot use HTMLSession within an existing event loop. Use AsyncHTMLSession instead." I've searched everywhere on Google, no clue - any idea?

    • @JohnWatsonRooney
      @JohnWatsonRooney  3 years ago

      Hiya! Are you running it in a Jupyter notebook or similar? The way they work conflicts with the render function - try running it in VS Code or similar and that should work
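
      If you do have to stay inside a notebook (which already runs an event loop), the async session the error message points at looks roughly like this - a sketch only, with a placeholder URL:

      from requests_html import AsyncHTMLSession

      asession = AsyncHTMLSession()

      async def fetch():
          r = await asession.get('https://example.com')   # placeholder URL
          await r.html.arender(sleep=1)                    # async version of render()
          return r

      r = asession.run(fetch)[0]   # in a notebook you may be able to simply `await` these calls instead
      print(r.status_code)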

    • @charisthawhite2793
      @charisthawhite2793 3 years ago

      @@JohnWatsonRooney it's running in VS Code, but I got a new error:
      python .\coba.py
      Traceback (most recent call last):
      File ".\coba.py", line 19, in
      print(r.html.xpath("//div[@class='span6']/h1", first=True).text)
      AttributeError: 'NoneType' object has no attribute 'text' - can you tell me where I went wrong?

  • @benoitdefays578
    @benoitdefays578 3 years ago +1

    Hi, first thanks a lot for your tutorial. I have a question: I generate my csv file, but my separator is ',' - how can I change the separator?

    • @JohnWatsonRooney
      @JohnWatsonRooney  3 years ago

      Sure, after the csv file name, add in sep=" " and put in what separator you want to use
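
      Assuming pandas' to_csv is being used to write the file (the DataFrame contents and filename here are placeholders), that looks roughly like:

      import pandas as pd

      df = pd.DataFrame(results)                        # `results` is a placeholder for your list of scraped rows
      df.to_csv('products.csv', sep=';', index=False)   # sep swaps the delimiter from the default ','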

  • @momq112233
    @momq112233 4 years ago +1

    nice video 👌 and keep going

  • @bunyaminsahiner9060
    @bunyaminsahiner9060 3 years ago

    While trying to get the product links on the category page of the site I work with, it also picks up an extra 2 links I don't want for each product. How can I remove the links I don't want? Or, since there is one word that only exists in the links I do want, how can I get only the links containing that word?

  • @rohangadgil4527
    @rohangadgil4527 1 year ago

    With requests_html, when I print the soup I get the message "you are not authorized..." in the page HTML. I tried loading the page manually and it worked, so my IP isn't blocked. Can anyone help me with this?

  • @anirbanpatra3017
    @anirbanpatra3017 8 months ago

    Can you explain when we should use what?
    I generally prefer sticking to Selenium for all my needs.

  • @justinames5439
    @justinames5439 2 years ago +1

    John: when I follow your code, at "for item in products.absolute_links:", although I specify e.g. 'div.product-subtext', the iteration only returns item.text (the link text of the item) and not the sub-text of the item. This is true of price, name, and so forth. Can you explain this behaviour?

  • @youcannotsaypopandforgetth7609
    @youcannotsaypopandforgetth7609 3 years ago +1

    Hey John, awesome video (like always). I have a question: in terms of speed would you recommend Splash or requests_html?

    • @JohnWatsonRooney
      @JohnWatsonRooney  3 years ago +2

      I haven't done any proper speed tests but they do essentially the same thing so I think it would be marginal. Requests-html has the benefit of being a Python package so if that works for your needs I'd use that. Splash has the benefit of scripting though - video to come!

    • @youcannotsaypopandforgetth7609
      @youcannotsaypopandforgetth7609 3 years ago

      @@JohnWatsonRooney Thanks, this helps so much.

  • @christinahachem6649
    @christinahachem6649 2 years ago

    Hello, I'm having a Chromium-related error when I want to render an HTML page - can you please tell me how I can fix it?

  • @LogansRunnersVideo
    @LogansRunnersVideo 3 years ago

    Trying to recreate this on a similar e-commerce website, and print(products) from 4:57 gives NoneType. Any suggestions why?

    • @tokoindependen7458
      @tokoindependen7458 3 years ago

      Just print the HTML source code and check whether what you're looking for is actually in there

  • @thetransferaccount4586
    @thetransferaccount4586 1 year ago

    good explanation

  • @christenw.1726
    @christenw.1726 1 year ago +1

    Can a modified version of this work on scraping links listed inside a live chat feed?

    • @JohnWatsonRooney
      @JohnWatsonRooney  1 year ago

      That’s not something I’ve tried but yes I think so

  • @mylordlucifer
    @mylordlucifer 2 years ago +1

    thanks for the lesson

  • @richu-21
    @richu-21 3 years ago +1

    How can we use threading while scraping thousands of website links?

  • @pinkypromisesx3
    @pinkypromisesx3 3 years ago +2

    What's the difference between using requests-html vs. Scrapy or Selenium?

    • @z.heisenberg
      @z.heisenberg 9 months ago

      Selenium is a tool built for a different purpose; excellent ease of web scraping is just a by-product. It's been 2 yrs though

  • @itsmehemant7
    @itsmehemant7 1 year ago

    Hey John, after struggling with Stack Overflow I am finally here... "response.html.render(sleep=3)" is giving an error in a Django view (i.e. "There is no current event loop in thread 'uWSGIWorker1Core8'")... can you help me work out how to solve this?

  • @simpleffective186
    @simpleffective186 1 year ago

    What can I do if the XPath search doesn't find anything?

  • @patweru7471
    @patweru7471 1 year ago

    Good one. Any idea how to do the same for Laravel-based sites?

  • @GabrielMendes-jy4mp
    @GabrielMendes-jy4mp 3 years ago

    John, I've written some code scraping dynamically like you do in this video, but it's taking too much time because for every product it has to open that product's page. Is this common, and is there a faster way of doing it?

    • @abel4776
      @abel4776 1 year ago

      There's threading and async (which is confusing and difficult).
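
      A rough sketch of the threading option, using the standard library; plain requests is shown here for simplicity, and parse_product and the URL list are placeholders for your own per-product scraping:

      from concurrent.futures import ThreadPoolExecutor
      import requests

      def parse_product(url):
          r = requests.get(url, timeout=10)
          return url, r.status_code            # placeholder: swap in your real parsing

      urls = ['https://example.com/p/1', 'https://example.com/p/2']   # placeholder product links

      with ThreadPoolExecutor(max_workers=8) as pool:
          for url, status in pool.map(parse_product, urls):
              print(url, status)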

  • @johnmurray6405
    @johnmurray6405 2 years ago +1

    I've followed your code to a tee. It locks up in both PyCharm and VS Code at the render statement (r.html.render(sleep=1)). I literally have to close both programs to get them to run again. Any ideas? Great video though.

    • @JohnWatsonRooney
      @JohnWatsonRooney  2 years ago

      If it’s the first time running the render method it should download headless chrome - I’m guessing it’s getting stuck there. Maybe try removing requests_html and reinstalling it

  • @kooy2254
    @kooy2254 3 years ago

    Hi John, I am one of your fans. I really wonder how you learned these techniques. I am currently in a state where I don't know how to become a self-taught web scraper. In other words, I don't know how to learn from the myriad of knowledge on the internet. But fortunately, I found you

  • @by_westy
    @by_westy 1 year ago

    I tried that in Jupyter and it gave me this error message: **'Cannot use HTMLSession within an existing event loop. Use AsyncHTMLSession instead.'**

  • @Dome8
    @Dome8 7 months ago

    You missed an explanation: under what circumstances should you use an XPath selector vs. a CSS selector (div.)?

  • @mohammadkhosrotabar5658
    @mohammadkhosrotabar5658 2 years ago

    After using render I got this error: There is no current event loop in thread 'Thread-5 (process_request_thread)'

  • @leoyuanluo
    @leoyuanluo 3 years ago +1

    you. are. awesome!

  • @kerteradih3721
    @kerteradih3721 11 months ago

    Could you do me a solid? I've suffered trying to scrape this site

  • @sandilemfazi8624
    @sandilemfazi8624 2 years ago +1

    Hey John, very helpful video, but I keep having this one issue when I try to render the url, I get this error message: RuntimeError: Cannot use HTMLSession within an existing event loop. Use AsyncHTMLSession instead.

    • @alokyathiraj
      @alokyathiraj 2 years ago

      Were you able to fix it? I'm having the same problem

    • @jaredspilky9699
      @jaredspilky9699 2 years ago

      @@alokyathiraj I'm having a similar issue as well

    • @donnaperyginathome
      @donnaperyginathome 1 year ago

      I can't get this to work either. I think maybe the library needs to be updated.

  • @abhilash93v
    @abhilash93v 3 years ago +1

    Fantastic demonstration. Would love to know how we can use this module to submit forms or logins

    • @JohnWatsonRooney
      @JohnWatsonRooney  3 years ago

      Sure, that's a good idea, I will look into it

    • @abhilash93v
      @abhilash93v 3 years ago

      @@JohnWatsonRooney Looking forward to it. Easing login efforts on flash-enabled sites such as Gmail or any other - any references now would be very helpful for me in my project!

  • @agsantiago22
    @agsantiago22 2 years ago +1

    OMG! I would like to hit the "like" button a million times!

  • @surendratamang8848
    @surendratamang8848 4 years ago +1

    Sir, how would you deal with infinite scrolling if you can't find an easy way?

    • @JohnWatsonRooney
      @JohnWatsonRooney  4 years ago

      That's a bit more tricky - without browser automation (Selenium) we can use "r.html.render(sleep=1, scrolldown=x)" - where x is the number of times to page down. Not ideal but might work
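
      A self-contained sketch of that idea (the URL and the scroll count are placeholders to tune per site):

      from requests_html import HTMLSession

      s = HTMLSession()
      r = s.get('https://example.com/feed')   # placeholder URL for an infinite-scroll page
      r.html.render(scrolldown=10, sleep=1)   # page down 10 times, sleeping 1 s after each scroll
      print(len(r.html.absolute_links))       # see how many links were picked up after scrolling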

  • @abhishekkamuni9971
    @abhishekkamuni9971 4 years ago +1

    Nice video, but in your view which Python web scraper uses the fewest resources, like memory etc.?

    • @JohnWatsonRooney
      @JohnWatsonRooney  4 years ago +1

      If the website is html (no JavaScript) requests and bs4 will be the lightest in my opinion. The method in this video is slower due to the render process but still good for smaller projects - selenium is the slowest and not really designed for scraping but does work when needed

    • @tokoindependen7458
      @tokoindependen7458 3 years ago

      @@JohnWatsonRooney you should absolutely explain this in a single video, from the fastest method to the slowest one, thx sir

  • @olafecub
    @olafecub 3 years ago

    Great video, I didn't know about this option, I normally used bs4

  • @nikhilsaikondapaneni6657
    @nikhilsaikondapaneni6657 4 months ago

    Using render for the first time, I haven't been able to install anything and it's giving me an error

  • @fsamobby
    @fsamobby 3 years ago +2

    Hi, I'm trying to retrieve the data (the list of employers/vacancies initiated by jQuery code) from the Canadian Job Bank. I made a "get" request but wasn't able to get the inner response payload data from www.jobbank.gc.ca/jobsearch/jobsearch?searchstring=&locationstring=&sort=M. I can see this payload in the Firefox developer tools but failed to find the proper Python library and methods to get it. Is there any way other than Selenium to accomplish this task? I am at the very beginning of the path of learning programming and would be grateful for any help or advice on what to read or watch to figure it out. Thanks.

    • @JohnWatsonRooney
      @JohnWatsonRooney  3 years ago +1

      I think you might need to use the same approach as my sports stats video - using Postman to replicate the request made by your browser, then copy that over to your Python code
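
      A sketch of that approach with plain requests - the headers and parameters below are illustrative; copy the real ones from the request captured in DevTools/Postman:

      import requests

      url = 'https://www.jobbank.gc.ca/jobsearch/jobsearch'   # endpoint from the comment above
      params = {'searchstring': '', 'locationstring': '', 'sort': 'M'}
      headers = {'User-Agent': 'Mozilla/5.0'}                 # replace with the headers the browser actually sent

      r = requests.get(url, params=params, headers=headers, timeout=10)
      print(r.status_code)
      print(r.text[:500])                                     # check whether the payload you saw is in here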

    • @fsamobby
      @fsamobby 3 years ago

      @@JohnWatsonRooney ok, ill try this out. thanks anyway)

  • @Nope-12485
    @Nope-12485 2 years ago +1

    Nice video - minus the try/catch with no specific exception. I know this is a tutorial, but that’s a bad habit to share. Regardless, thank you for the content.

    • @JohnWatsonRooney
      @JohnWatsonRooney  2 years ago +1

      Thanks, and yes you are absolutely right, I don’t do that anymore!

  • @AlejandroKarlitos
    @AlejandroKarlitos 4 years ago +1

    Thanks Bro

  • @abhishekkamuni9971
    @abhishekkamuni9971 4 years ago +2

    Can you login a website using requests-html?

    • @JohnWatsonRooney
      @JohnWatsonRooney  4 years ago

      You can yes, you can post to the server - I have an older video on my channel where I cover the basics of this if you are interested

  • @stern7658
    @stern7658 2 months ago

    Dude, update this code. I try to run requests_html but all it states is that it needs Chromium to work - but the thing is I have Chromium on my machine, even the binary file. Why, when I run it, does it attempt to download Chromium (which I already have) and then fail to find it? I tried this a few months back and have now returned to the same thing; I even uninstalled and reinstalled everything but got the same problem.

  • @engineerbaaniya4846
    @engineerbaaniya4846 4 years ago +1

    Awesome

  • @engineerbaaniya4846
    @engineerbaaniya4846 4 years ago +4

    Amazing sir, please keep posting videos like this, we will help you increase your subscriber numbers

  • @linxx1184
    @linxx1184 2 years ago

    Hi John, I've watched this video many times, you're great at explaining. However, I am getting the error "Navigation Timeout Exceeded: 8000 ms exceeded" on r.html.render(sleep=1). I even bumped up the sleep time. Please help.

    • @pranit449
      @pranit449 2 years ago +1

      Try using timeout=(a number larger than the default 8 s) instead of sleep. Worked for me

    • @linxx1184
      @linxx1184 2 years ago

      @@pranit449 thanks for the advice, it worked with timeout=30 and also adding keep_page=True
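
      For reference, that combination looks roughly like this on the render call from the video:

      r.html.render(timeout=30, keep_page=True)   # allow Chromium longer than the default 8 s timeout and keep the page open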

    • @marco-3942
      @marco-3942 1 year ago

      Hi , can you help me ??

  • @user-lj8cq1ki3w
    @user-lj8cq1ki3w 1 year ago

    Can you do a scraping video for the tracton gyan website?

  • @Neil4Speed
    @Neil4Speed 4 years ago

    Hope you don't mind me asking, but I have been banging my head against this one for a few hours... I am trying to pick up only a specific URL from a container (the container has non-product URLs):
    "
    from requests_html import HTMLSession
    import pandas as pd   # not used yet
    import time           # not used yet

    url = 'https://www.fragrancenet.com/fragrances'   # scheme added so the request can resolve
    s = HTMLSession()
    r = s.get(url)
    r.html.render(sleep=1)
    products = r.html.xpath('//*[@id="resultSet"]', first=True)
    print(products.absolute_links)
    "
    I am only looking for the p-tags under Result set called:
    Any help would be super appreciated, thanks again John.

  • @artabra1019
    @artabra1019 4 years ago +1

    What is better, bs4 or html.xpath???

    • @JohnWatsonRooney
      @JohnWatsonRooney  4 years ago

      Learn to use both, but generally if I can I use bs4

  • @dyegoborges9985
    @dyegoborges9985 3 years ago +1

    It doesn't work with AliExpress

  • @papusa9878
    @papusa9878 2 years ago +1

    Ohhhh nice I use an API that uses this method

  • @dickyindra4923
    @dickyindra4923 2 years ago

    hi sir, can you fix this problem :
    AttributeError: 'NoneType' object has no attribute 'text'
    Thanks, btw nice vid

  • @itsmehemant7
    @itsmehemant7 1 year ago +1

    Oops... you are a legend... I am blind... this is also in the docs right at the top 😂 (I think I need some sleep)

  • @sinamobasheri3632
    @sinamobasheri3632 4 years ago +1

    🖤👌🏻

  • @MrGarrincha11
    @MrGarrincha11 3 years ago +1

    Hello, can you do scraping on this page: stats.nba.com/teams/transition/
    I want to compare play type team1 percentile on offense (also the frequency) against team2 percentile on defense. Can you help me, please?

    • @JohnWatsonRooney
      @JohnWatsonRooney  3 years ago

      Hi! Yes I can scrape that site - I have a video coming this week that scrapes a similar site, which you will be able to apply to this site too. JR

    • @MrGarrincha11
      @MrGarrincha11 3 years ago +1

      @@JohnWatsonRooney Great! Thank you for the really quick answer!

  • @jaydecanon1314
    @jaydecanon1314 11 months ago

    you shouldn't be john rooney, you should be john legend

  • @samuelricard3895
    @samuelricard3895 3 years ago

    When I try to type r.html.render() I get this: Unresolved attribute reference 'html' for class 'Response'

  • @ashishtiwari1912
    @ashishtiwari1912 3 years ago

    Cider | 4.0% | 44 cl
    Trying this: info=r.html.find('div.Select an element with a CSS Selector:',first=True).text
    The output shows: AttributeError: 'NoneType' object has no attribute 'text'

    • @splashoui3760
      @splashoui3760 3 years ago

      Probably you chose your class incorrectly, and that's why you have no elements in your output. NoneType means you got no result (an empty match).

  • @sydpao2224
    @sydpao2224 2 years ago +1

    The accent, where are you from?

  • @saeeahmed5213
    @saeeahmed5213 1 year ago

    🥰🥰🥰🥰

  • @barguybrady
    @barguybrady 4 years ago

    So, when I copy the XPath, I get this as a result:

    • @barguybrady
      @barguybrady 4 years ago

      /html/body/div[7]/div[4]/section/div[10]/div[3]/div[2]/div[2]/div[1]/ul[2]

    • @JohnWatsonRooney
      @JohnWatsonRooney  4 years ago

      Are you using chrome or Firefox? That looks like the “full xpath” option, as opposed to just the “xpath”. I am planning to do a video on xpaths to clear it up a bit more

    • @barguybrady
      @barguybrady 4 years ago

      @@JohnWatsonRooney The inspector in Firefox - which leads me to think, then, that there's a difference between Chrome and Firefox?

    • @JohnWatsonRooney
      @JohnWatsonRooney  4 years ago

      There shouldn’t be but I have seen different results from both

  • @hammadrafique7313
    @hammadrafique7313 1 year ago

    Great, but it took a lot of time to render

  • @meme_me
    @meme_me 2 years ago +1

    I used the same code and it didn't work for me; I changed the website to my desired one and I get a bunch of errors... :(

  • @signin7740
    @signin7740 3 years ago +1

    Beerwulf is not a dynamic site....LOL

  • @CodeGlintHub-kn9fx
    @CodeGlintHub-kn9fx 1 month ago

    Couldn't you find any other site instead of a beer website?
    Why are you promoting harmful things?

  • @msyahdan183
    @msyahdan183 1 year ago

    I have a problem with this code: produk = r.html.xpath('/html/body/div[4]/div[2]/div[2]/div[2]/div[1]/div/div[2]',first=True) ... the result is None or []. How do I fix it?

    • @RedSpark_
      @RedSpark_ 1 year ago

      I'm having the same problem, did you find a solution?
      Thanks