GPT scrapes + answers from any sites (ft. Chromadb, Trafilatura)

Samuel Chan

6,611 views

Published on 20 Oct 2024

Comments • 24

@TheShreyas10 18 days ago
Hey, can you please share the repo? I can't find it on your GitHub.
@arnaudlacour1188 1 year ago ⁺²
When I try this exact thing, I get an error that GPTChromaIndex is not in llama_index. Can you think of a reason why?
@SamuelChan 1 year ago ⁺¹
Yes! When this lesson was published the latest version of LlamaIndex was 0.5.7.
2 months later it’s now 0.6.x.
So you can downgrade to the 0.5.7 version to follow along or just use a new environment and then pip install -r requirements.txt from the GitHub repo.
I’m in the middle of upgrading the codebase to the latest version but admittedly have limited time outside my day job, so we’ll see! :)
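A minimal sketch of checking which LlamaIndex version is installed before following along, using only the standard library. The helper names (`parse_version`, `matches_pin`, `installed_version`) are hypothetical, and the distribution name `llama-index` is an assumption about how the package was installed; the pinned version 0.5.7 comes from the reply above.

```python
from importlib import metadata
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '0.5.7' into (0, 5, 7); non-numeric parts are skipped."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def matches_pin(installed: str, pinned: str = "0.5.7") -> bool:
    """True when the installed version equals the version the lesson used."""
    return parse_version(installed) == parse_version(pinned)


def installed_version(package: str = "llama-index") -> Optional[str]:
    """Return the installed version string, or None if the package is absent.

    The default distribution name is an assumption, not taken from the repo.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


found = installed_version()
if found is None:
    print("llama-index is not installed in this environment")
elif matches_pin(found):
    print(f"llama-index {found} matches the version used in the lesson")
else:
    print(f"llama-index {found} differs from 0.5.7; imports may have changed")
```

Running this inside a fresh virtual environment after installing from requirements.txt is one way to confirm that the pin took effect.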
@arnaudlacour1188 1 year ago
@SamuelChan Very awesome of you to reply so quickly! Much appreciated, thank you!
@MrNootka 1 year ago ⁺¹
Love your tutorials, thanks!
Tip: making your cam smaller and circular would be a great upgrade to your videos :)
@SamuelChan 1 year ago
Good tip! And relatively easy to implement! Thank you! :)
@SivaKumar-of7mu 15 days ago
I also can't find the repo on your GitHub.
@moreshk 1 year ago
Another great video in this series!
@SamuelChan 1 year ago
Thank you! 🙏🏼
@noualiibrahimyassine1336 1 year ago
Great tutorial, thank you.
Question: in my terminal window I'm only getting the question and answer, not the additional information such as LLM token usage, SentenceTransformer, PyTorch device, etc. How can I get that information?
@SamuelChan 1 year ago
Thank you!
You can do logging in many different ways, and I show them in many later videos in this series. For example, in "Building a GPT-powered journal system":
th-cam.com/video/OzDhJOR5IfQ/w-d-xo.htmlsi=SZXzbH1hLeJ0QFzH
I use the following technique to wrap the returned results.
import logging
import sys

# Send INFO-level (and above) log records to stdout
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
LangChain also has its own tracking utilities:
from langchain.callbacks import get_openai_callback
# llm is assumed to be an already-constructed LangChain OpenAI LLM
with get_openai_callback() as cb:
    result = llm("Your query")
    print(cb)
Printing the callback object (cb) gives output like:
Tokens Used: 42
Prompt Tokens: 4
Completion Tokens: 38
Successful Requests: 1
Total Cost (USD): $0.00084
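The figures above are consistent with a flat rate of $0.02 per 1,000 tokens (42 × 0.02 / 1000 = 0.00084). That rate is inferred from the numbers and is not stated in the thread; real pricing depends on the model and may differ for prompt and completion tokens. A small sketch of the arithmetic, with a hypothetical helper name:

```python
# Assumed flat rate, inferred from the sample output above (not from the source).
PRICE_PER_1K_TOKENS_USD = 0.02


def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  price_per_1k: float = PRICE_PER_1K_TOKENS_USD) -> float:
    """Estimate the USD cost of one request under a single flat per-token rate."""
    total_tokens = prompt_tokens + completion_tokens
    return total_tokens * price_per_1k / 1000


prompt, completion = 4, 38
print(f"Tokens Used: {prompt + completion}")
print(f"Total Cost (USD): ${estimate_cost(prompt, completion):.5f}")
```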
@noualiibrahimyassine1336 1 year ago
@SamuelChan Thank you!
@utkarshpandey8967 1 year ago
I am not able to use GPTChromaIndex in Python 3.10. Can you suggest an alternative?
@SamuelChan 1 year ago
What does "not able to use" mean? Did you fork from the GitHub repo? If you install the dependencies, it will work with Python 3.10 (and I try to keep it up to date with every major version update of LangChain and LlamaIndex). I can't see any reason why it won't work.
@llmia-n2x 1 year ago ⁺¹
Could you please make a similar video with an open-source (free) LLM?
@SamuelChan 1 year ago ⁺¹
LangChain & LLM tutorials (ft. gpt3, chatgpt, llamaindex, chroma)
th-cam.com/play/PLXsFtK46HZxUQERRbOmuGoqbMD-KWLkOS.html
I have a lot of videos where I use open source LLMs from huggingface. I also have a video that shows how to use a locally-hosted LLM on your machine! Check out the playlist above! :)
@llmia-n2x 1 year ago ⁺¹
@SamuelChan Thanks a lot. I'll check.
@8eck 1 year ago
Is Trafilatura able to read JavaScript websites? I mean, can it read React-based websites?
@SamuelChan 1 year ago
It depends on whether the React site uses SSG (static site generation), SSR (server-side rendering), or CSR (client-side rendering). It works like any other web crawler / scraper :)
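One rough way to tell these cases apart is to look at how much readable text the server-sent HTML contains: a CSR page often ships little more than an empty root element and a script bundle. The sketch below is a standard-library heuristic, not Trafilatura's own logic; the function names and the 200-character threshold are hypothetical.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects text nodes, ignoring script, style, and noscript content."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    """Return the text a non-JavaScript scraper could see in the raw HTML."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def looks_client_rendered(html: str, min_chars: int = 200) -> bool:
    """Heuristic: very little static text suggests the content is built by JS."""
    return len(visible_text(html)) < min_chars


csr_page = ('<html><head><title>App</title><script src="bundle.js"></script>'
            '</head><body><div id="root"></div></body></html>')
print(looks_client_rendered(csr_page))
```

A page flagged this way is a candidate for a browser-automation tool rather than a plain HTTP fetch; the threshold would need tuning per site.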
@8eck 1 year ago
@SamuelChan Nah, I was talking specifically about sites that are not SSR or statically generated.
@8eck 1 year ago
@SamuelChan Guess it can only read non-JS content.
@SamuelChan 1 year ago ⁺¹
Yeah, not with Trafilatura, I don’t think.
I think for those cases you can use an automation tool like Selenium to wait: wait 1 second for the content to load, then retrieve it. If the div id isn't found, wait another second, and so on, in a while loop with a break statement?
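The wait-and-retry idea can be sketched as a generic polling loop. The version below is library-agnostic: `poll_until` and its parameters are hypothetical names, and `fetch` stands in for whatever lookup the automation tool provides.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until(fetch: Callable[[], Optional[T]],
               timeout: float = 10.0,
               interval: float = 1.0) -> Optional[T]:
    """Call fetch() repeatedly until it returns something other than None.

    Returns the first non-None result, or None once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if result is not None:
            return result  # content found: leave the loop
        if time.monotonic() >= deadline:
            return None  # give up instead of looping forever
        time.sleep(interval)


# Stand-in for a page whose content appears on the third check.
attempts = {"count": 0}


def fake_lookup() -> Optional[str]:
    attempts["count"] += 1
    return "<div id='content'>loaded</div>" if attempts["count"] >= 3 else None


print(poll_until(fake_lookup, timeout=2.0, interval=0.01))
```

With Selenium, `fetch` could wrap an element lookup by id and return None when nothing is found. Selenium also ships its own explicit-wait helper, `WebDriverWait` with `expected_conditions`, which covers the same pattern without a hand-written loop.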
@ramp2011 1 year ago
Thank you for the video. I just checked your github and I do not see the code copied over. Could you please copy over this code there? Thank you
@SamuelChan 1 year ago
Hey, it's here in the GitHub repo!
github.com/onlyphantom/llm-python/blob/main/6_team.py