This Isn't Just A Chatbot (OpenAI Should Be Scared...)

Theo - t3․gg

Views: 122,574

Published on 19 Feb 2024
I heard nvidia was doing some chat bot stuff, but Chat With RTX ended up being much more interesting than I expected. Retrieval-augmented generation (RAG) is a fascinating new technique and I'm curious how we see it adopted over time. Compared to ChatGPT and ollama, this is very different.
"insert statement about Tensorflow for SEO reasons here"
Sources:
www.nvidia.com/en-us/ai-on-rt...
github.com/NVIDIA/trt-llm-rag...
github.com/ollama/ollama
Check out my Twitch, Twitter, Discord more at t3.gg
S/O Ph4se0n3 for the awesome edit 🙏
Science & Technology

Comments • 251

@yowwn8614 3 months ago ⁺⁴⁹¹
That start with Linus made my heart drop.
@H-Root 3 months ago ⁺²²
Same point 😂
When I saw him I thought I clicked the wrong notification for a split second
@berkeozkir 3 months ago ⁺¹
Same! Wasn't expecting that.
@danielfernandes1010 3 months ago ⁺²
Wow somehow I missed it lol
@MelroyvandenBerg 3 months ago
hahhaha.. there we go Linus.
@blacktear5197 3 months ago
Hahahaha, when will you all realize that open source is one of the biggest scams multinationals have come up with to get you to work in exchange for a T-shirt
@MrSofazocker 3 months ago ⁺²⁴¹
ONLY addition, it HAS to support markdown.
Imagine, just setting this to your obsidian vaults folder path and boom, you can chat with your second brain 🤯
@Sanchuniathon384 3 months ago ⁺³⁴
I need this yesterday
@personinousapraham3082 3 months ago ⁺¹¹
The way these models work means it almost definitely already does to some extent, since there is a decent amount of markdown in the training data for the models used for both generation and document embedding (making the search part work). At that point it's mostly just a matter of prompt tuning
@z_0968 3 months ago ⁺⁹
What if I told you that already exists?
- There is an Ollama plugin to chat with your notes. Although this is more single note specific, it's free.
- There is also a plugin called Smart Connections, but you need an OpenAI API key for that. This one is note global: it will create embeddings (vector files) from your notes, and then you can chat with all your notes.
@cabaucom376 3 months ago ⁺⁷
So I’m working on this… anyone who sees this comment, what are some must-haves for a local-first AI knowledge vault?
@carvierdotdev 3 months ago ⁺¹
❤ oh man that's actually very good thinking..
@hugazo 3 months ago ⁺¹⁷⁶
Finally one use for my 4080 that doesn't involve crying trying to play cities skylines 2
@RT-. 3 months ago ⁺²
Wut? It lags on that game?
@Rock48100 3 months ago ⁺¹⁴
@@RT-. Yeah that game is beyond a mess
@hugazo 3 months ago ⁺¹
@@RT-. The game is broken
@GetUrFunnyUp 3 months ago ⁺³
Isn't that a CPU-based game
@oo--7714 3 months ago ⁺¹
@@GetUrFunnyUp yes
@YomiTosh 3 months ago ⁺⁸³
Hey Theo, just wanted to point out a few inconsistencies. RAG doesn’t train a model, it indexes the text files in a vector database and uses word similarity to look up relevant text. So the model, such as llama2 or mistral, is unchanged, but it is able to add context and make the retrieved text more conversational.
There are loads of great AI/RAG projects other than Ollama out in the git seas too. Many not quite as simple or easy to use though.
Thanks for all the great videos. Already subscribed ;)
@cryptogenik 3 months ago ⁺³
Came here to say this, and also - your bias is showing :P
@rendezone 3 months ago ⁺⁴
Shamelessly mentioning here the product/startup I work for: Qdrant is an open source vector database which excels at RAG setups. I created the JS SDK :P which offers fully-typed REST and gRPC clients
@lukeweston1234 3 months ago
@@rendezone What does it offer that PGVector for instance does not?
@martinkrueger937 3 months ago
@YomiTosh by any chance do you know which RAG system/framework is giving the best performance?
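To make the point at the top of this thread concrete (retrieval changes the prompt, not the model), here is a minimal Python sketch. It is not how Chat with RTX is implemented: the `index` dictionary, the word-overlap scoring, and the prompt wording are all illustrative assumptions, and a real setup would use an embedding model and a vector database instead.

```python
def retrieve(question: str, index: dict[str, str], k: int = 1) -> list[str]:
    """Rank indexed passages by how many words they share with the question.

    Word overlap is a toy stand-in for embedding similarity (an assumption).
    """
    q_words = set(question.lower().split())
    scored = sorted(
        index.items(),
        key=lambda item: len(q_words & set(item[1].lower().split())),
        reverse=True,
    )
    return [doc_id for doc_id, _ in scored[:k]]


def build_prompt(question: str, passages: list[str]) -> str:
    """Prepend the retrieved passages to the question; no weights are touched."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical two-file "index" standing in for a folder of documents.
index = {
    "notes.txt": "the wiki is stored as a directory of markdown files",
    "gpu.txt": "the demo requires an rtx gpu",
}
question = "where is the wiki stored"
doc_ids = retrieve(question, index)
prompt = build_prompt(question, [index[d] for d in doc_ids])
```

The string in `prompt` is what would be sent to an unchanged llama2 or mistral model; swapping the documents changes the answers without any training step.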
@MrLenell16 3 months ago ⁺¹⁴⁰
It's not training the model, just doing RAG. Retrieval is basically querying for relevant docs based on semantic similarity, basically doing a SQL query with vectors in the WHERE clause
@poipoi300 3 months ago ⁺¹²
Yep thanks for that, was about to comment something similar. Words are coordinates, yo.
@Imp0ssibleBG 3 months ago ⁺²
Is it done by searching for the nearest/closest embedding?
@LookRainy 3 months ago ⁺¹
@@Imp0ssibleBG pretty much. But there are other approaches for doing RAG
@hunterkauffman9400 3 months ago ⁺⁴
simplest technique is the cosine similarity between the query and each document chunk.
@joreilly 3 months ago ⁺¹
But is this the best example of 'kind of' training your own model? There's a race right now for people to train models on private or proprietary data. RAG seems to be the most practical solution so far even though it's not perfect. Or am I wrong about this?
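The "cosine similarity between the query and each document chunk" approach mentioned in this thread can be sketched in a few lines of Python. The `embed` function below is a deliberately crude assumption (a bag-of-words count vector) so the example runs without any model; a real system would swap in a learned embedding model, but the ranking step looks the same.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Dot product of the two vectors divided by the product of their norms."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine_similarity(q, embed(c)), reverse=True)
    return ranked[:k]


chunks = [
    "svelte is a compiler for building web interfaces",
    "gradio is a python library for machine learning demos",
    "cities skylines is a city building game",
]
best = top_k("which python library builds machine learning demos", chunks, k=1)
```

Here `best` holds the Gradio chunk, because it shares the most terms with the query. A vector database does the same nearest-neighbour ranking, just with dense vectors and an index that avoids comparing against every chunk.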
@rusyaidimusa2309 3 months ago ⁺³²
I would consider giving a shoutout to the llamacpp project that serves as the backend engine to many of the open source programs like Ollama, and the many many talented engineers who brought support to so many different system configurations.
The open source scene has been on fire since Llama dropped and running models locally has never been easier.
@tuckerbeauchamp8192 3 months ago ⁺⁴
Oh man, was not ready for that intro. I love LTT and your channel, that was a great little combination
@medalikhaled 3 months ago ⁺¹³
they are not directly using svelte, they are using an OSS project called Gradio for the UI, which uses svelte
@ofadiman 3 months ago ⁺⁹
Finally 🎉 I couldn't wait any longer for ray tracing support in my chat bot GUI
@DaniDipp 3 months ago ⁺³⁵
This is huge for my wiki. I can just give it a directory of markdown files. 🤯
@hugazo 3 months ago ⁺⁵
Better search in docs, I would add my frameworks/libraries documentation as well
@Sindoku 3 months ago
I’ll definitely be checking this out this weekend when I don’t have to work. This looks bad ass!
@Petyr25 3 months ago ⁺²
Feedback: Superb video, more AI stuff from you would be great. Especially open source stuff with our own data.
@GeorgeG-is6ov 2 months ago ⁺¹
If you don't want to use it because it's so large, get ollama and you can run it from your command prompt. I recommend watching a tutorial on it, and there are models as small as 1.8 GB (for example, phi 2, which is small yet very powerful)
@DNA912 3 months ago ⁺³
While watching this I started to realise how much use I would get out of this at work. The project I'm on has huge documentation, but everything is just a brain dump, and many times we've found something "new" in the docs that we had completely missed before. Imma work on making an AI on that dataset asap. I love experimenting with AIs locally, it's so fun and it feels so much cooler and better than the cloud ones
@SenorRobinHood 3 months ago ⁺⁶
Would this work on the codebase for a library? For example inputting a freshly downloaded wordpress directory and then also digesting the wordpress developer docs to make it your private Q&A tutor for the platform you're trying to learn?
@Al-Storm 2 months ago ⁺¹
Yes
@aloufin 3 months ago
Yes, your explanation of RAG was very nice and easy to understand
@adam_k99 3 months ago ⁺¹
Good stuff! Could you make a video on how well it performs as a coding assistant?
@user-pc8vn6ym7r 3 months ago ⁺⁴
I have to admit, that is the MOST creative L&S I've ever seen on here. And I normally swear at the screen in response.
Maybe.
@MightyDantheman 2 months ago ⁺¹
This is pretty cool, though I'm still waiting for the day that I can use at least GPT-4 level AI locally *(and ideally either for free or a one-time payment for one single version).* Sadly, I doubt this will ever happen outside of opensource projects, which tend to not be as good due to less funding and resources. But I still appreciate any effort put towards that future.
@E-Juice 3 months ago ⁺²
I make 2 points here. 1 questioning the accuracy of this system 2 why windows
1. what's interesting about downloading youtube video transcripts and using those files at 7:40 is that nvidia's setup is MOST LIKELY using their own ASR (Automatic Speech Recognition) model, either Canary or Parakeet, which I've tested and found that they're good but still not as accurate as OpenAI's Whisper ASR model. So without knowing what specific model is used to transcribe the youtube videos, we don't know how exact those transcriptions are, and that affects how well this RAG can answer questions using that data. I would recommend using Whisper-Large-v3 and manually transcribing the youtube videos, or just uploading actual documents and notes and testing with them rather than transcribing youtube videos.
2. you don't recommend using WSL but you didn't elaborate. what is the best alternative? installing linux locally or using a cloud workstation? don't say mac because they don't come with an nvidia gpu
@arnaudlelong2342 3 months ago ⁺¹
You and Prime need to get with this soon
@jzeltman 3 months ago
Would love to see more AI content. Great look into this new release from NVIDIA
@brett_rose 2 months ago
I did something similar with Pinecone. I parsed a huge chunk of wiki data into a Pinecone DB. I would then use one model prompt which would return multiple pieces of data based on the prompt. That model would then decide which pieces of outside information were the most related to the prompt. It would then send the original prompt along with the external data to a new model prompt, which would then provide the response to the user.
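The two-prompt flow described in the comment above (one model call selects the relevant passages, a second call answers with them) can be sketched as plain Python. Nothing here uses the Pinecone client or a real model: `select` and `answer` are hypothetical callables, and `keyword_select` and `echo_answer` are stand-ins so the sketch runs offline.

```python
from typing import Callable


def two_stage_answer(
    prompt: str,
    candidates: list[str],
    select: Callable[[str, list[str]], list[str]],
    answer: Callable[[str], str],
) -> str:
    """Stage 1 picks the relevant passages; stage 2 answers with them attached."""
    relevant = select(prompt, candidates)
    context = "\n".join(relevant)
    return answer(f"{context}\n\nUser prompt: {prompt}")


def keyword_select(prompt: str, candidates: list[str]) -> list[str]:
    """Stand-in for the first model call: keep passages sharing any word."""
    words = set(prompt.lower().split())
    return [c for c in candidates if words & set(c.lower().split())]


def echo_answer(full_prompt: str) -> str:
    """Stand-in for the second model call: report how much context arrived."""
    return f"[model saw {len(full_prompt.splitlines())} lines]"


candidates = ["paris is the capital of france", "bananas are yellow"]
result = two_stage_answer("capital of france", candidates, keyword_select, echo_answer)
```

Passing the two stages in as functions keeps the pipeline shape separate from whichever vector store or model ends up behind it.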
@Al-Storm 2 months ago ⁺¹
WSL2 works surprisingly well. I've been using it on one of my machines for SD, llama, and mixtral.
@Falkov 3 months ago
Good stuff..liked and subbed.
@niteshbaskaran2262 3 months ago
If all of the python docs were fed to an LLM, would you query that LLM or still refer to the original docs?
@aloufin 3 months ago
I keep thinking about this video.... RAG is showing up on my timeline on twitter everywhere... I would have had to spend HOURS trying to understand it... let alone realise I could download NVIDIA's demo and run it on my GPU..... Your videos are amazing for understanding huge swathes of new AI tech easily... not to mention actually showing working tech demos.
@unowenwasholo 3 months ago ⁺¹
This is like ControlNet for LLMs. Dope.
@TrimutiusToo 3 months ago ⁺³
Deeper, go even deeper!!!
@nothingtoseehere5760 3 months ago ⁺²
NOT DEEP ENOUGH! MOAR PLZ!
@SkyyySi 3 months ago ⁺¹
The app isn't made with Svelte, but with Gradio. Gradio is a Python library for creating web UIs for ML applications. Gradio, however, uses Svelte and Tailwind internally.
@exapsy 2 months ago
00:03 and i already saw linus dropping not just something, a graphics card.
Instant like! xD
@arianj2863 3 months ago
Could you make a small RAG project :-)?
Or do you know a channel that is like the theo of open source LLMs?
@chaks2432 2 months ago ⁺¹
This kind of stuff would be a lifesaver if it manages to work as an AI powered chatbot for documentation for proprietary frameworks and stuff. I'm working at a startup and we're building our own framework from scratch, so having RTX Chat work as an AI documentation assistant would be great
@Tymon0000 3 months ago ⁺⁴
This will be huge when the AI is capable of parsing a whole project and multiple docs.
@pencilcheck 3 months ago ⁺¹
ahhh, imagine parsing and generating tests that make sense based on prompts ::OOOO
@RisingPhoenix96 3 months ago ⁺³
2:19 The YouTube algorithm recommends me your videos frequently. Is there any real benefit I get from subscribing if I'm going to watch your videos and see all your community posts anyway?
@Gocunt 2 months ago
what if i don't subscribe to anyone because i just don't want to be subscribed to a bunch of random channels? ai doesn't understand
@sarjannarwan6896 3 months ago ⁺¹
RAGs are cool, they can use vector databases to map to data.
@sadshed4585 3 months ago ⁺¹
what are the VRAM requirements for RAG, or for their version of it
@creatortray 3 months ago
I love it! I find this stuff fascinating
@eointolster 3 months ago
What’s the advantage over privateer that’s been out for months where you can choose your own model and it is tiny in comparison?
@kennypitts4829 26 days ago
Lawyer: AI, find precedent to get my client off the hook for drunk in public.
AI: Beep, bop... Bort - Say it was diabetes related.
@bugged1212 3 months ago
Already been using this, I have also wired up ollama to serve multiple requests and I run a business off it now.
@pixma140 3 months ago ⁺³
Nice thing what Nvidia did there :) Do you want to share the two YouTube playlists in the comments or description maybe? :D
@riftsassassin8954 3 months ago
Definitely on team AI deep dive!
@blenderpanzi 3 months ago ⁺³
Point it to a playlist of Jonathan Blow videos and then tell it JavaScript is the best language and ask when Jai will have LSP support. Can an LLM get an aneurysm?
@banalMinuta 3 months ago
I would definitely appreciate more AI content as somebody just getting into web development.
It seems pretty apparent that AI is going to unleash a new category of tools, one whose mastery will most likely be paramount to one's success
@red9090 3 months ago ⁺¹
Theo just dropped a 3 min subscribe pitch.
@hohohotreipatlajele2044 3 months ago
I've tried it but it's a bit strange and slow, and I couldn't find how to start it again after shutting down
@entropywilldestroyusall1323 3 months ago
Great vid, Adam.
@lancemarchetti8673 3 months ago
Crazy... it's hard to keep up. And now there's Groq.. which is ridiculously fast.
@tasmto 3 months ago
Ok... this is really really cool!
@Fire.Blast. 3 months ago
1:36 "as you can see, it's pretty fast" yes, instant even it would seem
@schtormm 3 months ago
8:17 for some reason the Dutch public news broadcaster also uses svelte sometimes lmao
@christianremboldt1557 3 months ago ⁺¹
Never thought an AI would convince me to subscribe to someone
@user-tk5ir1hg7l 3 months ago
50% faster inference with nvidia gpus on TensorRT is no joke, i hope they expand this and let you fine tune and add models
@ThePawel36 2 months ago
I wonder if you could train your language model to play a game on your behalf, such as Cyberpunk, for example. It seems feasible, as some local language models are equipped with vision capabilities. It would be fascinating to witness the first YouTuber attempting this.
@TheGoodMorty 3 months ago
Ollama just released the Windows preview
@juanmacias5922 3 months ago
Damn, that's impressive.
@chriss3154 3 months ago
A comparison against privateGPT and/or localGPT would've been awesome
@hairy7653 2 months ago ⁺¹
the YouTube option isn't showing up on my rtxchat
@omanimedia months ago
Same issue bro, I've also been searching the whole internet for this but couldn't find even one useful tutorial🥲 If you figured it out then please let me know.
@jackg_ 3 months ago
Here's hoping more and more AI things move to have local options. Sure not everyone can run these locally, I am typing this on an iMac from 2015... BUT it is super promising.
@Readraid_ 3 months ago ⁺¹
'Nvidia just dropped' linus clip is CRAZY
@kyleleblancvlogs3820 3 months ago
Finally when someone says "when did i say that !! Huh"
I can go: in this video, on this date, at this timestamp. Checkmate
@banalMinuta 3 months ago
Why would you not recommend running ollama on WSL right now?
@setasan 3 months ago
That would be great for my hundred-page onenote files.
@PRIMARYATIAS 3 months ago
Can be good for learning, could point it to some programming books I have so I can "chat" with them 😂
@sozno4222 3 months ago
The models you can run on chat with RTX are a bit inadequate right now. But it shows promise
@zyxwvutsrqponmlkh 3 months ago
I wonder how large a single text can be for this to work. Can I throw whole books at it? What about my state's entire law code?
@FarishKashefinejad 3 months ago ⁺¹
Theo Tech Tips
@MultiMojo 3 months ago
Keep in mind that Chat with RTX bundles a 7B parameter model, which will consume GPU memory during use. Inference is going to be painfully slow if you're running a weaker gpu. Responses from this model aren't going to be on par with GPT4/Claude. If you're looking to chat with your own documents, paying for an OpenAI API key w. langchain RAG implementation is the more efficient way to go.
@dubya85 2 months ago
It will not work on anything less than 8 GB 30 or 40 series rtx cards. 12 GB minimum for the larger ai model
@johnbarros1 3 months ago
Hey I hit subscribe! Gimme more Ai!
@scottiedoesno 3 months ago ⁺¹
MOAR AI. The solution to having a job with AI is knowing how to use it
@TheD3adlysin 3 months ago ⁺¹
This is a good video. Just wish you could run this on Linux....with an AMD card.....heh
@d4rkg 3 months ago ⁺³
I never expected to see Freddie Mercury talking about AI
@Hunger53 3 months ago ⁺³
The main downside of this program is it only parses one file at a time, even if you have multiple files with data. Kinda meh if you need to do comparisons or use one file as context to process the second.
@user-oo2wb8tf7i 3 months ago
You need CrewAI
@vitorwindberg4212 3 months ago ⁺²
I agree. It would be awesome if, for example, in the "What is Theo's favorite library?" question, the model could use data from all the different videos at once and assume it's React - instead of relying on a single video that it deemed the most important for that question.
@sauer.voussoir 3 months ago
Nice pointing it out, I didn't even realize it did it that way. Hopefully it gets updated along the way.
@FamilyManMoving 3 months ago
This is a simple demonstration cobbled together from open source. It's not meant to be an actual system. What you want is already available; it just requires a little more work on your end, and models that can handle the actual data.
Context length is a real issue with a lot of open source models. There is only so much RAG can do if a model limits context to 2048 tokens, for instance. I've had models start hallucinating when they get close to the limit. The good news is those hallucinations are so off the topic that it's obvious when they occur.
@dubya85 2 months ago
It can look at everything in a folder at once
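The context-length limit raised in this thread is usually handled by packing retrieved chunks into a fixed token budget. The Python sketch below assumes a 2048-token window (the figure from the comment), a hypothetical 256 tokens reserved for the answer, and a crude one-token-per-word count; a real system would count tokens with the model's own tokenizer.

```python
def count_tokens(text: str) -> int:
    """Rough assumption: one token per whitespace-separated word."""
    return len(text.split())


def pack_context(
    ranked_chunks: list[str],
    question: str,
    context_limit: int = 2048,
    reserve_for_answer: int = 256,
) -> list[str]:
    """Keep the best-ranked chunks until the remaining token budget runs out."""
    budget = context_limit - reserve_for_answer - count_tokens(question)
    packed = []
    for chunk in ranked_chunks:
        cost = count_tokens(chunk)
        if cost > budget:
            break  # stop at the first chunk that no longer fits
        packed.append(chunk)
        budget -= cost
    return packed


# Three fake chunks of 1000, 700 and 500 "tokens", already ranked best first.
chunks = ["word " * 1000, "word " * 700, "word " * 500]
kept = pack_context(chunks, "what does the law say")
```

With these numbers only the first two chunks fit, which is one reason a small-context model may end up answering from a single document even when several are relevant.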
@bhaskaruprety230 3 months ago
Please make a video on SLM
@azeek 3 months ago
Brooo what an opening ❤❤😂
@MobCat_ 3 months ago
I wanna point that folder at the current project I'm working on.
or a massive archive of python code lol...
@jaylenjames364 3 months ago
I can get a little more specific on a topic with chat with rtx, compared to chatgpt.
@nathanfife2890 2 months ago
I'm curious how good it is at code
@cintron3d 2 months ago
Fine fine I'll subscribe.
@jacobgoldenart 3 months ago
I'm confused here. There are a ton of vector databases that you can install and run on a Mac. No external GPU needed. Like Chromadb, or FAISS. Then just use something like llamaindex or langchain to chunk your documents and create embeddings using something like openai's ada2. Then insert them into chroma and start doing RAG on your documents. You certainly don't need an Nvidia GPU.
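The "chunk your documents" step in the comment above is simple enough to sketch without any library. This is a plain-Python illustration, not the llamaindex or langchain API: `chunk_words`, its word-based sizes, and the overlap default are assumptions chosen for readability.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into word windows that overlap, so a sentence cut at a
    boundary still appears whole in one of the neighbouring chunks."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


# Ten placeholder words, split into windows of 4 with 1 word of overlap.
document = " ".join(f"w{i}" for i in range(10))
pieces = chunk_words(document, chunk_size=4, overlap=1)
```

Each element of `pieces` would then be embedded and stored in the vector database, with the chunk text kept alongside its vector so it can be pasted into the prompt at query time.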
@andrewdunbar828 3 months ago
Ask me about a question and I'll tell you about an answer.
@seanmartinflix 2 months ago
Okay, this is something I'm trying to learn. For the record, I'm not a programmer, just an enthusiast trying to learn stuff, kind of a dummy compared to you all probably. But what do you guys think of Pinocchio? I've gotten llama to run through it but it doesn't always run, and there aren't very many good tutorials on Pinocchio. I would love it to be covered on this channel. Just what you think of it? And other insights if possible. Anything I can find on it is very basic and only gets you so far. Weird comment, I know. Like the channel, always some great insights.
@patricknelson 3 months ago
Maybe “trt-llm-rag-windows” implies there will be a Linux or MacOS version someday. 🤔
@jopansmark 3 months ago
It's over for OpenAI
@minimal2224 3 months ago
Woah woah woah it’s only fast because of your laptop hardware lol M1 - M4 chip?
@__greg__ 3 months ago
Nice
@Endelin 3 months ago
PrivateGPT is doing something similar to this.
@amodo80 3 months ago
you can do RAG with ollama when you run ollama-webui
@anime.x_ror 3 months ago
i missed nvidia exe for nvidia chat zip. could someone share it with me? )
@smallbluemachine 3 months ago
So, you're gonna need an Nvidia RTX-capable card (30xx is fine) and it'll need at least 7GB vram, so maybe your laptop might struggle.
Really interesting app. Super simple to install and run. It's quite the killer app for a demo 😅 some start-ups and their investors somewhere are gonna be sweating bullets. 😂
@wlockuz4467 3 months ago
I will never understand how Meta's OS side can be so good, while their business side is always being so predatory.
@hallooww 2 months ago
facebook actually making progress in the AI scene…
@RobbPage 3 months ago
more Ai. definitely. as devs we need to stay on top of this stuff. the war against the machines has begun my friends and we're on the front lines.
@_DashingAdi_ 3 months ago
I'm scared😢
@Dav-jj2jb 3 months ago
I would be excited about AI if people would stop at chat bots, but we won't. It will get from "cool chat bot" to "AGI existential dread" real quick.
@jaysonp9426 3 months ago
Nvidia should be freaking out over Groq
@jhonnyrodrigues 3 months ago
Chat with Ray Tracing -> nice
@jazilzaim 3 months ago
This is going to give another moat to Windows devices over Mac and Linux devices
@TheMirrorslash 3 months ago
This is the future. There's so many AI integrations dropping. The possibilities are getting out of hand.
