Try it yourself: www.builder.io/blog/custom-gpt
Non-stop top shelf content, thanks Steve.
Next up, how to build a top shelf
This is sick!!! Been wanting this since GPT came out
I just tried it. It works like a charm. Amazing work!
Nice video! Thanks for making these awesome videos
Great stuff Steve, really enjoying your content. What route would you advise to go from an unstructured individual PDF to structured JSON output with the help of an API?
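For reference, one common route is: extract the PDF's raw text first (e.g. with a library such as pdf-parse), then ask an LLM API to return a fixed JSON shape. A minimal sketch below; the invoice fields, model name, and endpoint usage are illustrative assumptions, not from the video.

```typescript
// Hedged sketch: unstructured PDF text -> structured JSON via an LLM API.
// Assumes the PDF text has already been extracted; the Invoice fields
// below are hypothetical placeholders for whatever schema you need.

interface Invoice {
  vendor: string;
  total: number;
  date: string;
}

// Build a prompt that pins the model to a strict JSON shape.
function buildExtractionPrompt(pdfText: string): string {
  return [
    "Extract the following fields from the document as JSON:",
    '{ "vendor": string, "total": number, "date": "YYYY-MM-DD" }',
    "Return ONLY the JSON object, no prose.",
    "Document:",
    pdfText,
  ].join("\n");
}

// Models often wrap JSON in ```json fences; strip them before parsing.
function parseModelJson(raw: string): Invoice {
  const cleaned = raw.replace(/```(?:json)?/g, "").trim();
  return JSON.parse(cleaned) as Invoice;
}

// The actual API call would look roughly like this (untested sketch):
// const res = await fetch("https://api.openai.com/v1/chat/completions", {
//   method: "POST",
//   headers: {
//     Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
//     "Content-Type": "application/json",
//   },
//   body: JSON.stringify({
//     model: "gpt-4-1106-preview",
//     messages: [{ role: "user", content: buildExtractionPrompt(text) }],
//   }),
// });
```

The parsing helper is worth having regardless of provider, since models frequently fence their JSON even when told not to.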
Awesome stuff!
Amazing - thank you for making this available to everyone! Have you experimented with a different approach like creating separate text files for each URL? I've seen a few developers say it helps the GPT or Assistant return more accurate/relevant results compared to one large file.
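On the one-file-per-URL idea: the crawler's combined output.json can be split into per-page text files with a short script. A sketch below; it assumes entries shaped like `{ url, title, html }`, which matches the crawler's output format, but verify against your own file.

```typescript
// Hedged sketch: split gpt-crawler's combined output.json into one text
// file per crawled URL, so each page can be uploaded separately.
import * as fs from "node:fs";
import * as path from "node:path";

// Turn a URL into a safe filename, e.g.
// "https://example.com/docs/intro" -> "example.com-docs-intro.txt"
function urlToFilename(url: string): string {
  return (
    url
      .replace(/^https?:\/\//, "")          // drop the protocol
      .replace(/[^a-zA-Z0-9.]+/g, "-")       // slashes etc. -> dashes
      .replace(/-+$/, "") + ".txt"
  );
}

// Write each page's content to its own file under outDir.
function splitOutput(jsonPath: string, outDir: string): void {
  const pages: { url: string; title?: string; html: string }[] =
    JSON.parse(fs.readFileSync(jsonPath, "utf8"));
  fs.mkdirSync(outDir, { recursive: true });
  for (const page of pages) {
    fs.writeFileSync(path.join(outDir, urlToFilename(page.url)), page.html);
  }
}
```

Usage would be something like `splitOutput("output.json", "pages/")` after a crawl finishes.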
Fast and clear!
Legend! Thanks so much
Thanks it worked! This is awesome🎉
You are amazing, thank you so much
❤ thanks for the tip
Thats great
Hey Steve, question for you: does the crawler also create a log file so that I can go and see which URLs had a crawling error? Or is there a way to view my console log files in general?
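The crawler is built on Crawlee, which logs failures to the console by default; if a file is wanted, Crawlee crawlers accept a `failedRequestHandler` where you can write your own log. A hedged sketch of the logging helper (the commented wiring shows roughly where it would plug in, but check your Crawlee version's API):

```typescript
// Hedged sketch: append each failed URL to a tab-separated log file.
import * as fs from "node:fs";

function logFailedUrl(logPath: string, url: string, message: string): void {
  const line = `${new Date().toISOString()}\t${url}\t${message}\n`;
  fs.appendFileSync(logPath, line);
}

// Inside a Crawlee crawler config it would plug in roughly like this:
// const crawler = new PlaywrightCrawler({
//   requestHandler,
//   failedRequestHandler({ request }, error) {
//     logFailedUrl("crawl-errors.log", request.url, String(error));
//   },
// });
```

Afterwards `crawl-errors.log` holds one line per URL that exhausted its retries.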
Man you are so generous
would love to include images so the GPT can look at screengrabs in the docs
Yep, this bangs.
How do you know what to put for the selector?
I tried to crawl the Laravel documentation, but I don't know what to put for the selector.
This is the sauce right here.
do you think you can help me with setting up something similar for a unique use case?
Would like to integrate this OpenAI-created assistant into my website. Any specific video or article to explore?
This is great, thanks! So how do I find the correct selector for my page? I am inspecting the page and trying out some names for the selector I think is correct, but it does not seem to be, because I am getting errors. I am inspecting your site to see what you use as the selector. When I try that "div" under "main" as you do, it does not seem to work for me on other pages. Any ideas? Thanks
I'm having the same problem.
@vegangaymerplays Same here.
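For the selector questions above: the selector is just an ordinary CSS selector for the element that holds the page's main content. Open devtools, right-click the content area, choose Inspect, and read off something like `main`, `article`, or a class name. A hedged sketch of a gpt-crawler config (field names follow the project's README; in the real repo you would `import { Config } from "./src/config";` rather than declaring the type inline, and the Laravel values below are illustrative guesses):

```typescript
// Hedged sketch of a gpt-crawler config.ts. The Config type is declared
// inline here only so the snippet is self-contained.
interface Config {
  url: string;             // where the crawl starts
  match: string;           // glob pattern for which links to follow
  selector: string;        // CSS selector whose text gets extracted
  maxPagesToCrawl: number;
  outputFileName: string;
}

export const defaultConfig: Config = {
  url: "https://laravel.com/docs/10.x",
  match: "https://laravel.com/docs/**",
  selector: "main",        // assumption: docs content sits inside <main>
  maxPagesToCrawl: 50,
  outputFileName: "output.json",
};
```

If a selector produces errors, it usually means no element matched on some pages; falling back to a broad selector like `body` will at least return content, just with more navigation noise.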
Does this work on sites that have a PDF knowledge base? Or does it only scrape the body HTML?
Wondering if there is a tool similar to this that could crawl thousands of client emails and answers, to help build a NoSQL database that can be queried by an AI.
How does this work with links locked behind auth?
What if the website needs authorization? Is it possible? Thank you
@Steve8708 - thank you for this great video. I am trying to use this for a website that is not public. How can I customize the crawler to log into a site and crawl non-public information? The crawler shut down with this message: "Reclaiming failed request back to the list or queue. page.title: Execution context was destroyed, most likely because of a navigation." The output-1.json file had just one entry with the following HTML: "Sign in Can’t access your account? Terms of use Privacy & cookies ..."
Does it handle docs behind auth?
It uses a headless browser, so it can be customized to do that.
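Concretely, the gpt-crawler README documents an optional cookie setting, which is one way to get behind a login without scripting the sign-in: log in once in a normal browser, copy the session cookie from devtools (Application > Cookies), and pass it in the config. A hedged sketch (cookie name, value, and URLs are placeholders; the Config type is declared inline only to keep the snippet self-contained, and deeper customization such as a scripted Playwright login is also possible since the tool drives a headless browser):

```typescript
// Hedged sketch: crawl a login-protected site by reusing a browser session
// cookie. Verify the cookie option against your gpt-crawler version.
interface Config {
  url: string;
  match: string;
  selector: string;
  maxPagesToCrawl: number;
  outputFileName: string;
  cookie?: { name: string; value: string };  // sent with each request
}

export const defaultConfig: Config = {
  url: "https://intranet.example.com/docs",        // placeholder URL
  match: "https://intranet.example.com/docs/**",
  selector: "main",
  maxPagesToCrawl: 50,
  outputFileName: "output.json",
  cookie: {
    name: "session_id",                 // placeholder: your site's cookie
    value: "paste-value-from-devtools", // placeholder value
  },
};
```

The "Sign in" page in the error above is exactly what you get when the crawler arrives unauthenticated, so a valid session cookie should change that first entry into real content.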
Is it possible to prevent GPT from using this information that we supply? My company wouldn't want to make the information public, for example.
To my knowledge, yes. While it will be shared with OpenAI, they say they don't use any of the data for training purposes.
What is the difference between doing this, and just creating a custom gpt that can scrape a site directly? GPT builder can search the web!
GPT Builder can’t crawl a specific website.
@mrloofer72 But for what use cases can I use this?
Not a very good scraper if it has to rely on finding the selector for the website first.
Is there any scraper that doesn't require manually selecting the DOM?
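There are selector-free approaches: libraries like Mozilla's @mozilla/readability infer the main content automatically. As a rough illustration of the idea (not a substitute for such a library), one can strip the obvious non-content tags and keep the rest:

```typescript
// Hedged sketch: a crude selector-free content extractor. Real readability
// libraries use much smarter heuristics; this only shows the concept.
function extractText(html: string): string {
  return html
    .replace(/<(script|style|nav|footer)\b[\s\S]*?<\/\1>/gi, "") // drop page chrome
    .replace(/<[^>]+>/g, " ")  // strip remaining tags
    .replace(/\s+/g, " ")      // collapse whitespace
    .trim();
}
```

This trades precision for convenience: no selector to find, but more navigation noise leaks through than with a well-chosen one.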
Adding Next.js 14 + shadcn... haha