Golang web scraper with Colly | Tutorial

  • Published on 7 Nov 2024

Comments • 63

  • @jatindersinghaujla
    @jatindersinghaujla 2 years ago +1

    Thanks, a very useful video in a very short time. It helped me create a web scraper in a few minutes.

    • @DivRhino
      @DivRhino  2 years ago

      Thank you for stopping by! Glad the video was helpful 🙌

  • @DevBishwasBh
    @DevBishwasBh 3 years ago +2

    Thanks a lot, sister. God bless you.

    • @DivRhino
      @DivRhino  3 years ago +2

      Thank you for your kind comment! Hope you have a good day.

  • @iliatalebzade8751
    @iliatalebzade8751 2 years ago +2

    Awesome content
    straight to the point explanation
    and LOVELY thumbnails lol
    thanks!

    • @DivRhino
      @DivRhino  2 years ago +1

      Thank you for your lovely comment! Happy to have you here :)

  • @natanael4883
    @natanael4883 2 years ago +1

    you helped me a lot!
    Greetings from Brazil!
    -Nate

    • @DivRhino
      @DivRhino  2 years ago

      Thanks for your comment, Nate! So glad you found the video helpful! 🙌

  • @codingwithtony
    @codingwithtony 1 year ago +1

    Thank you from Vietnam

    • @DivRhino
      @DivRhino  1 year ago

      Thank you for being here! 🙌🇻🇳

  • @brimmed
    @brimmed 3 years ago +3

    Thank you. I'm trying to learn Go, and I finally had a real-world thing I need to code up, so I started looking here after finishing the basic tutorials. Normally I would just whip something up in Python, as I'm scraping maybe 10 pages and trying to put that data into JSON format to import into another piece of software. Hopefully I can get it working this weekend.

    • @DivRhino
      @DivRhino  3 years ago +2

      Hello Shaun, hope you're able to pick up some tips from the video. :) The tutorial assumes that you're working with one page, but you have mentioned that your use case would have about 10 pages or so. The Colly docs have an example that scrapes nested pages, so hopefully it will be useful. Here's the link: github.com/gocolly/colly/blob/master/_examples/coursera_courses/coursera_courses.go . Good luck with your weekend project!

  • @isharmaharjan9814
    @isharmaharjan9814 4 years ago +7

    This was so informative and understandable. Thank you!

    • @DivRhino
      @DivRhino  3 years ago +2

      Glad it was helpful!

  • @hendrikbonthuys9190
    @hendrikbonthuys9190 4 years ago +1

    Thanks for this useful video. I look forward to seeing more Go tutorials from you in future. Best, HB

    • @DivRhino
      @DivRhino  4 years ago +2

      Thank you for the kind comment! Glad you found the video useful. :)

  • @AlexandrStepanov-y4g
    @AlexandrStepanov-y4g 4 years ago +2

    Very good presentation of the material! 🔥

  • @arenazo
    @arenazo 4 years ago +3

    Thanks. This is useful!

  • @ashkanp1
    @ashkanp1 2 years ago

    Useful tutorial 🌹🌹

    • @DivRhino
      @DivRhino  2 years ago

      Thank you for your support 🙏❤️

  • @imteyazraja6138
    @imteyazraja6138 3 years ago +2

    Wow simply great 👍

  • @utkarshagrawal6060
    @utkarshagrawal6060 3 years ago +2

    Hey, I am very new to Golang. Just one question:
    Why did you put collector.Visit at the end? First we make the request, then we parse the response with collector.OnHTML, so shouldn't Visit be at the top?
    Or is it that we first define the structure of everything we need, including which tags to parse, and then we define the URL to visit?

    • @DivRhino
      @DivRhino  3 years ago +2

      Hello Utkarsh Agrawal, thanks for stopping by. And thanks for the question. The OnHTML method is registering a callback function, but it's not executing it. We are just setting it up. The Visit method starts the collecting job at the URL we have passed to it. It will also go and call the callback function we had registered by using OnHTML. Hope that helps.
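The register-then-run order described in the reply above can be illustrated with a toy stand-in. This is a sketch, not Colly's implementation: `toyCollector` and its simplified `OnHTML` and `Visit` methods are hypothetical and only mimic the shape of the idea, and the "page" is a hard-coded slice of strings rather than a fetched document.

```go
package main

import "fmt"

// toyCollector is a hypothetical stand-in for a scraper; it is NOT Colly.
type toyCollector struct {
	callbacks []func(item string)
}

// OnHTML only stores the callback; nothing is executed yet.
func (c *toyCollector) OnHTML(fn func(item string)) {
	c.callbacks = append(c.callbacks, fn)
}

// Visit does the actual work and calls every registered callback per item.
func (c *toyCollector) Visit(items []string) {
	for _, item := range items {
		for _, fn := range c.callbacks {
			fn(item)
		}
	}
}

// collect registers a callback first and only then triggers it with Visit.
func collect(items []string) []string {
	var seen []string
	c := &toyCollector{}
	c.OnHTML(func(item string) { seen = append(seen, item) })
	// At this point seen is still empty: the callback has not run.
	c.Visit(items)
	return seen
}

func main() {
	fmt.Println(collect([]string{"fact one", "fact two"}))
}
```

If `Visit` were called before `OnHTML` in this sketch, the loop would run with no callbacks registered and nothing would be collected, which is why the registration comes first.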

  • @perceptiondigitale7446
    @perceptiondigitale7446 2 years ago +1

    Hello, thank you very much. Has the way of doing this changed in 2022?

    • @DivRhino
      @DivRhino  2 years ago

      This should still work in 2022. If you run into any problems, please let me know in a comment here 🙏

  • @abiodun6897
    @abiodun6897 3 years ago +2

    Thank you, do you think you can create a course for beginners?

    • @DivRhino
      @DivRhino  3 years ago +2

      Hello SweepStakes, thank you for taking the time to leave a message. :)
      I believe there are so many great beginner Go courses out there, so I probably won't create one.
      If you're looking for a free course right here on YouTube, you may want to check out th-cam.com/video/YS4e4q9oBaU/w-d-xo.html

    • @abiodun6897
      @abiodun6897 3 years ago +2

      @@DivRhino thank you very much ❤️

    • @DivRhino
      @DivRhino  3 years ago +2

      @@abiodun6897 You're welcome

  • @stefanbogdanovic590
    @stefanbogdanovic590 3 years ago +2

    Amazing. Thank you. You are awesome.

    • @DivRhino
      @DivRhino  3 years ago +2

      Thank you for your kind comment :)

  • @srttkx
    @srttkx 3 years ago +2

    Do more videos like this 🔥🔥🔥

    • @DivRhino
      @DivRhino  3 years ago +1

      Thank you for your comment!

  • @ammiekoenigstein875
    @ammiekoenigstein875 3 years ago +2

    Hello, the tutorials are especially great. First of all, thank you for sharing.
    Users in China can't access the YouTube channel due to network proxy problems. I want to share your tutorials with Chinese users. May I do that?
    Looking forward to your reply, thanks.

    • @DivRhino
      @DivRhino  3 years ago +1

      Thank you for your comment. Unfortunately I’m not too sure how to help Chinese viewers access YouTube. Outside of using a VPN, I do not have any ideas.

  • @AmitKumar-qv2or
    @AmitKumar-qv2or 3 years ago +2

    Ma'am, can you explain how to scrape data from an SPA site that is built with vanilla JS and can't run with JS disabled? How do I set headers and get the XHR links from the network tab?
    Thank you, ma'am.

    • @DivRhino
      @DivRhino  3 years ago +1

      Hello. Thank you for your question. You might want to look into something like puppeteer to help you scrape an SPA. Hope that helps.

  • @myrachoantonio8832
    @myrachoantonio8832 3 years ago +2

    thank you

    • @DivRhino
      @DivRhino  3 years ago

      And thank you!

  • @hass89
    @hass89 4 years ago +2

    Thanks Sir

    • @DivRhino
      @DivRhino  4 years ago +1

      You're welcome, sir

    • @oceansblue692
      @oceansblue692 4 years ago

      She is a lady, not Sir.

  • @abdurashidgaybullayev5170
    @abdurashidgaybullayev5170 1 year ago

    Do you have a video tutorial on authorization?

    • @DivRhino
      @DivRhino  1 year ago +1

      Hello there! Thank you for your question. I do not currently have any video tutorials on authorisation.

  • @thinkcrypto8643
    @thinkcrypto8643 4 years ago +1

    Hello, how would I extract specific data from a website and store it in Excel?

    • @DivRhino
      @DivRhino  4 years ago +1

      Hi there. Excel is able to read CSV files, so one way you could do it would be:
      1. convert the JSON to CSV (using a package like json2csv)
      2. then import the converted CSV file into Excel
      Hope this helps.
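The two steps above can be sketched in Go itself. This is a minimal sketch that swaps the suggested json2csv package for the standard library's `encoding/json` and `encoding/csv`; the `Fact` struct with `ID` and `Description` fields follows the code quoted elsewhere in this thread, while the JSON tags, the `factsJSONToCSV` helper name, and the sample input are assumptions made for illustration.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"encoding/json"
	"fmt"
	"strconv"
)

// Fact has the ID and Description fields used in the thread; the JSON tags are assumed.
type Fact struct {
	ID          int    `json:"id"`
	Description string `json:"description"`
}

// factsJSONToCSV (hypothetical helper) decodes a JSON array of facts
// and returns the same data as CSV text with a header row.
func factsJSONToCSV(data []byte) (string, error) {
	var facts []Fact
	if err := json.Unmarshal(data, &facts); err != nil {
		return "", err
	}
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	if err := w.Write([]string{"id", "description"}); err != nil {
		return "", err
	}
	for _, f := range facts {
		// csv.Writer quotes fields that contain commas or quotes.
		if err := w.Write([]string{strconv.Itoa(f.ID), f.Description}); err != nil {
			return "", err
		}
	}
	w.Flush()
	return buf.String(), w.Error()
}

func main() {
	input := []byte(`[{"id":1,"description":"first fact, with a comma"}]`)
	out, err := factsJSONToCSV(input)
	if err != nil {
		fmt.Println("conversion failed:", err)
		return
	}
	fmt.Print(out)
}
```

Writing the returned string to a `.csv` file would give something Excel can open directly.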

  • @amitgupta4in
    @amitgupta4in 4 years ago +2

    good

  • @adithyarajendran2930
    @adithyarajendran2930 3 years ago

    Hello ma'am, I have a question about making a web scraper and writing its output to a CSV file. Is there any way I can message you privately? It would be a great help to me.

    • @DivRhino
      @DivRhino  3 years ago

      Hello Adithya, Go has an “encoding/csv” package you can check out.
      pkg.go.dev/encoding/csv
      I found this example that you can also take a look at here:
      golangdocs.com/reading-and-writing-csv-files-in-golang

    • @adithyarajendran2930
      @adithyarajendran2930 3 years ago

      @@DivRhino Can I message you personally, ma'am? It is a simple question, but I don't know how to do it. It would be a great pleasure if you could help ❤️

    • @DivRhino
      @DivRhino  3 years ago

      @@adithyarajendran2930 I am not currently able to provide assistance on an individual basis. However, I've added a couple of links above that could be helpful.
      One tip would be to complete the tutorial and then replace the JSON parts with CSV.

    • @adithyarajendran2930
      @adithyarajendran2930 3 years ago

      @@DivRhino OK ma'am, will do. Thank you.

  • @oceansblue692
    @oceansblue692 4 years ago +2

    Great video, except I am running into errors that I'm not too sure how to resolve.
    1= .\main.go:25:4: undefined: log
    collector.OnHTML(".factList li", func(element *colly.HTMLElement){
    factID, err := strconv.Atoi(element.Attr("id"))
    if err != nil {
    log.Println("Could not get id")
    2= .\main.go:31:8: undefined: factId
    factDesc := element.Text
    fact := Fact{
    ID: factId,
    Description: factDesc,
    }
    Hope I can hear back from you.

    • @oceansblue692
      @oceansblue692 4 years ago

      Resolved. I needed to:
      include the log package in the imports
      correct a typo
      and install Git

    • @DivRhino
      @DivRhino  4 years ago

      Hello there, could you put your code in a gist or git repo? Might be easier to help you out that way :)

    • @DivRhino
      @DivRhino  4 years ago +1

      For the first error, can you check to see if you imported the "log" package?
      For the second error, you declared it as "factID" (with a capital ID) but you called it using "factId" (small letter 'd')

    • @oceansblue692
      @oceansblue692 4 years ago

      @@DivRhino So glad to hear back from you.
      github.com/OceansBlue2017/GoLang/blob/main/main
      Much Appreciated.

    • @DivRhino
      @DivRhino  4 years ago

      @@oceansblue692 Hello there, you have accidentally misspelled "factretriever" in your AllowedDomains on line 21. Let me know if that fixes it

  • @LEKIPE1
    @LEKIPE1 2 years ago

    How do I avoid overwriting the JSON file's content each time?

    • @DivRhino
      @DivRhino  2 years ago +1

      Hey there, you can avoid overwriting the file by making sure the filename is different every time. For example, you can consider appending a timestamp to the JSON filename.

  • @LEKIPE1
    @LEKIPE1 2 years ago

    How do I scrape if there is no ID?

    • @DivRhino
      @DivRhino  2 years ago +1

      Hi there. You can determine the structure of your struct based on the shape of the data you will be scraping. So you can inspect the HTML to see the structure.
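As a rough illustration of shaping the struct to the data, the sketch below assumes a page whose list items carry text but no `id` attribute, so the struct simply has no ID field. The `buildFacts` helper and the sample strings are hypothetical; in a real scraper the strings would come from the text of each matched element.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Fact has no ID field because the assumed page provides none.
type Fact struct {
	Description string `json:"description"`
}

// buildFacts (hypothetical helper) turns each element's text into a Fact,
// trimming whitespace and skipping entries that end up empty.
func buildFacts(texts []string) []Fact {
	facts := make([]Fact, 0, len(texts))
	for _, t := range texts {
		t = strings.TrimSpace(t)
		if t == "" {
			continue
		}
		facts = append(facts, Fact{Description: t})
	}
	return facts
}

func main() {
	// Stand-in for the text of elements found by inspecting the HTML.
	facts := buildFacts([]string{"  first fact ", "", "second fact"})
	out, err := json.MarshalIndent(facts, "", "  ")
	if err != nil {
		fmt.Println("encoding failed:", err)
		return
	}
	fmt.Println(string(out))
}
```

If an identifier is still wanted downstream, the position of each item in the slice could serve as one, though that is a design choice rather than something the page supplies.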