Helllllll yes! I just had this idea to upload all my documents to GPT to make an assistant, then realized I could do this with websites and found you! Learning SO MUCH about coding, coming from an art background, and having a blast! You described it perfectly for a noob like me :)
It's because I'm a noob like you :) Glad it helped!
Finally I found someone who can explain it all to me. Thank you for this video and the scraper!
Also, stick with AI and your channel will blow up. You already have some momentum; just add some tags for AI. I also like to put the title first in the description and then as the first tag, which seemed to really help my other channel, Proshaper. You already have great engagement and the thumbnails are good. Stick with this!
Thanks for all the love, and the tips. I've heard tags don't do much for the algorithms, but I suppose it doesn't hurt to try!
Love this, Ralf! Very helpful.
Glad you liked it, Jack!
Thank you for the video! Have you looked at any GDPR-compliant options or private LLMs that could be used in combination with this tutorial?
No. A private LLM would need a context window large enough to handle the input, plus a mechanism for parsing and making use of it (OpenAI automatically chunks up and indexes the input behind the scenes).
Would GDPR be a concern if you're scraping non-PII data, though?
Thank you for the response! I would like to build a private repo that can handle PII and other sensitive data (such as agreements with customers). I have some Python code locally that breaks PDFs down into small text chunks and also removes PII data (or at least should). I'm not super familiar with Python, and it would be interesting to deepen my knowledge of Node, as I mainly code in TypeScript.
I know we have tried out some local models internally, and I believe we are building our own private model as well. It's always interesting to hear your and others' thoughts for inspiration. @ralfelfving
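For anyone wanting to try the chunk-and-redact step in Node instead of Python, here is a minimal sketch. It assumes the PDF text has already been extracted to a plain string, and the helper names (`redact`, `chunkText`), chunk size, overlap, and regex patterns are all hypothetical choices for illustration. It is not the commenter's code and not how OpenAI chunks input internally, and regex masking only catches obvious patterns, so it is not a GDPR compliance measure on its own.

```javascript
// Hypothetical patterns for obvious PII. They miss names, street addresses,
// national IDs, etc., and the phone pattern can over-match (e.g. some dates).
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const PHONE_RE = /\+?\d[\d\s-]{7,}\d/g;

// Replace matched emails and phone-like numbers with placeholder tokens.
function redact(text) {
  return text.replace(EMAIL_RE, "[EMAIL]").replace(PHONE_RE, "[PHONE]");
}

// Split text into chunks of at most `size` characters. Each chunk starts
// `size - overlap` characters after the previous one, so consecutive chunks
// share `overlap` characters of context.
function chunkText(text, size = 1000, overlap = 200) {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const chunks = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last chunk reached the end
  }
  return chunks;
}

// Example: redact first, then chunk, so no chunk ever contains raw PII
// that the patterns can detect.
const sample = "Contact jane.doe@example.com or +46 70 123 45 67 about the agreement.";
console.log(chunkText(redact(sample), 40, 10));
```

Redacting before chunking matters: if the text were chunked first, an email address split across a chunk boundary would no longer match the pattern in either chunk.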
Hey, I like the video, thanks for sharing.
I have a question and a concern about one topic. I followed the same process you did and got it working, but the responses from GPT seem too slow for embedding this in a website chatbot. Is there any way to make them faster?
Hi Ralf, if I attach this to a Telegram bot, can it answer a client with the corresponding URL when one is found? For example, if a client asks for the articles about some topic on my website, the bot would give them the links to those articles.
Very helpful, thank you!!!
Glad it helped, and thanks for letting me know! :)
Hi, the code runs, but I'm only getting 22 total bytes, even when using your example and other sites. What could be the error?
Did you check that the site has a sitemap.xml? If it doesn't, it won't work.
I used the same site you used, @ralfelfving:
node sitemapscraper.js
22 total bytes
Archiver has been finalized and the output file descriptor has closed.
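A note on the 22-byte figure: if the scraper writes a ZIP file (the "Archiver has been finalized" line suggests the `archiver` package), 22 bytes is exactly the size of an empty ZIP archive, which consists only of the end-of-central-directory record. That points to no pages being added, which fits the sitemap suggestion above. One way to check the sitemap before scraping is to look at what it actually contains. The sketch below is a hypothetical helper, not part of the original scraper; it takes the sitemap's XML text as a string and uses a regex instead of a real XML parser, which is an assumption that works for typical sitemaps but not for CDATA sections or every XML entity. It also distinguishes a sitemap index (a file that lists other sitemaps rather than pages), which some sites serve at `sitemap.xml`.

```javascript
// Extract <loc> values and detect whether the document is a sitemap index.
function inspectSitemap(xml) {
  // A <sitemapindex> root lists child sitemaps, not page URLs.
  const isIndex = /<sitemapindex[\s>]/i.test(xml);
  const locs = [...xml.matchAll(/<loc>\s*([^<]+?)\s*<\/loc>/gi)]
    .map((m) => m[1].replace(/&amp;/g, "&")); // decode the most common entity
  return { isIndex, locs };
}

// Turn the inspection result into a human-readable diagnosis.
function explain(xml) {
  const { isIndex, locs } = inspectSitemap(xml);
  if (locs.length === 0) {
    return "No <loc> entries found: check that this is really a sitemap.";
  }
  if (isIndex) {
    return `Sitemap index with ${locs.length} child sitemap(s); fetch those for page URLs.`;
  }
  return `Found ${locs.length} page URL(s).`;
}

// Example with an inline sitemap; in practice you would pass in the text
// downloaded from https://<your-site>/sitemap.xml.
const xml = `<urlset><url><loc>https://example.com/</loc></url></urlset>`;
console.log(explain(xml));
```

If this reports zero URLs, or only child sitemaps, an empty archive is the expected result rather than a bug in the archiving step.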
Can this work on a website with pagination?
If the paginated pages appear in the sitemap, yes. Go to the website's sitemap and check whether they're there.
Great! Thank you for your answer, @ralfelfving
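To do the "check if they're there" step without scrolling through a long sitemap by hand, you can filter its URLs for ones that look paginated. This is a minimal sketch that assumes you already have the sitemap's page URLs as an array of strings; the pagination patterns (`/page/2/`, `?page=3`, `?p=3`) are hypothetical examples, since sites paginate in different ways, so adjust them to match the site you are scraping.

```javascript
// Hypothetical URL shapes for paginated listings; real sites vary.
const PAGINATION_PATTERNS = [
  /\/page\/\d+\/?$/i,   // e.g. https://example.com/blog/page/2/
  /[?&](page|p)=\d+/i,  // e.g. https://example.com/shop?page=3
];

// Return only the URLs that match at least one pagination pattern.
function findPaginatedUrls(urls) {
  return urls.filter((u) => PAGINATION_PATTERNS.some((re) => re.test(u)));
}

const sitemapUrls = [
  "https://example.com/blog/",
  "https://example.com/blog/page/2/",
  "https://example.com/shop?page=3",
  "https://example.com/pages/about",
];
console.log(findPaginatedUrls(sitemapUrls));
```

An empty result suggests the paginated pages are not listed in the sitemap, so a sitemap-based scraper would not reach them.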
This was cool.
Thanks!