I wonder what's Untitled 1 to 34? Do you have access to AGI???
The problem at the end was that it was an array of objects, and it appeared as multiple objects without the enclosing square brackets.
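A minimal post-processing sketch for that case, assuming the output is a stream of JSON objects (possibly comma-separated) with no enclosing brackets; the `raw` string here is just a hypothetical example of such output:

```python
import json

def parse_concatenated_json(raw: str) -> list:
    """Parse a stream of JSON objects that lack enclosing [brackets]."""
    decoder = json.JSONDecoder()
    objects, idx = [], 0
    while idx < len(raw):
        # Skip whitespace and stray commas between objects
        while idx < len(raw) and raw[idx] in " \t\r\n,":
            idx += 1
        if idx >= len(raw):
            break
        obj, end = decoder.raw_decode(raw, idx)  # parse one object, get its end offset
        objects.append(obj)
        idx = end
    return objects

raw = '{"name": "a"}, {"name": "b"}'  # hypothetical model output
print(json.dumps(parse_concatenated_json(raw), indent=2))  # now a proper JSON array
```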
Man, I can't figure this out, it's always some IndentationError or SyntaxError: incomplete input.
Can it scrape all pages from the website?
Is a Llama 3.2 model running locally enough to do the scraping? How to do it?
In my experience it does a pretty good job, especially if you pick a good fine-tuned model for generating structured data, like the one Groq fine-tuned that's available on HF.
Hi bro, could you please upload a version using local models? If you can provide any links, that would also be helpful. Thanks.
I'll try to put together something bro!
@1littlecoder Thanks bro, once uploaded that would be helpful for many students 😁
Nice. For an Ollama LLM, how can we set it up with Crawl4AI? Any tutorial?
Will work on it soon!
@1littlecoder Thanks, looking forward to it.
It supports Ollama out of the box; in the `provider` property you just pass `ollama/MODEL_NAME`.
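A minimal sketch of what that looks like, assuming Crawl4AI's async API with `LLMExtractionStrategy` (it takes a LiteLLM-style provider string); the model name and URL below are placeholders:

```python
import asyncio
from crawl4ai import AsyncWebCrawler
from crawl4ai.extraction_strategy import LLMExtractionStrategy

async def main():
    strategy = LLMExtractionStrategy(
        provider="ollama/llama3",      # LiteLLM-style provider string for a local Ollama model
        api_token="no-token-needed",   # Ollama runs locally, so no real API key is required
        instruction="Extract the article title and author as JSON.",
    )
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            url="https://example.com/some-article",  # placeholder URL
            extraction_strategy=strategy,
        )
        print(result.extracted_content)

asyncio.run(main())
```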
Can this method crawl all the information on the website?
I was thinking the same, but it seems it's mainly for building datasets.
@d.d.z. Yup.
Yes, you can do it! Just pass in the field names and how you want them back, as in the sketch below.
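A minimal sketch of schema-based extraction, assuming Crawl4AI accepts a JSON schema built from a Pydantic model; the `Product` fields, provider, and API key are placeholders:

```python
from pydantic import BaseModel
from crawl4ai.extraction_strategy import LLMExtractionStrategy

# Hypothetical schema: the field names you want back from each page
class Product(BaseModel):
    name: str
    price: str

strategy = LLMExtractionStrategy(
    provider="openai/gpt-4o-mini",       # any LiteLLM-style provider works here
    api_token="YOUR_API_KEY",            # placeholder credential
    schema=Product.model_json_schema(),  # tells the model which fields to return
    extraction_type="schema",
    instruction="Extract every product name and price from the page.",
)
```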
Is it legal to crawl?
In many cases, scraping data that is publicly accessible on the internet is considered legal. However, the purpose of using a crawler is not just to scrape other people's websites. These days, many website owners, online magazines, bloggers, YouTubers, and even individuals with an Instagram account want to crawl their own information. Enterprise companies have many private or public websites, forums, or similar sources, and they need crawlers that can extract all of that data. For instance, they use their own fine-tuned large language models internally, and they let other companies use their crawlers. As a result, they need an open-source engine they can use for data enrichment while running their own servers. I believe everybody should be able to have their own language model fine-tuned on their own personal data to democratize AI. That means everybody should be able to extract the information they have in a proper way. In the not-so-distant future, I envision everyone being able to crawl through their own messages from social media accounts and emails, and have control over them. What they do with that information is up to them. This is one of the most interesting uses of crawlers.
Nice and easy.
❤
🔥