This Isn't Just A Chatbot (OpenAI Should Be Scared...)
- Published on Feb 19, 2024
- I heard nvidia was doing some chat bot stuff, but Chat With RTX ended up being much more interesting than I expected. Retrieval-augmented generation (RAG) is a fascinating new technique and I'm curious how we see it adopted over time. Compared to ChatGPT and ollama, this is very different.
"insert statement about Tensorflow for SEO reasons here"
Sources:
www.nvidia.com/en-us/ai-on-rt...
github.com/NVIDIA/trt-llm-rag...
github.com/ollama/ollama
Check out my Twitch, Twitter, Discord, and more at t3.gg
S/O Ph4se0n3 for the awesome edit 🙏 - Science & Technology
That start with Linus made my heart drop.
Same point 😂
When I saw him I thought I clicked the wrong notification for a split second
Same! Wasn't expecting that.
Wow somehow I missed it lol
hahhaha.. there we go Linus.
Hahaha, when will you all realize that open source is one of the biggest scams multinationals run to get you to work in exchange for a T-shirt
ONLY addition, it HAS to support markdown.
Imagine, just setting this to your obsidian vaults folder path and boom, you can chat with your second brain 🤯
I need this yesterday
The way these models work means it almost definitely already does to some extent, since there is a decent amount of markdown in the training data for the models used for both generation and document embedding (making the search part work). At that point it's mostly just a matter of prompt tuning
What if I told you that already exists?
- There is an Ollama plugin to chat with your notes. Although this is more single note specific, it's free.
- There is also a plugin called Smart Connections, but you need an OpenAI API key for that. This one is note global: it will create embeddings (vector files) from your notes, and then you can chat with all your notes.
So I’m working on this… anyone who sees this comment, what are some must-haves for a local-first AI knowledge vault?
❤ oh man that's actually a very good thinking..
Finally one use for my 4080 that doesn't involve crying trying to play cities skylines 2
Wut? It lags on that game?
@@RT-. Yeah that game is beyond a mess
The game is broken @@RT-.
Isn't that a CPU-based game?
@@GetUrFunnyUp yes
Hey Theo, just wanted to point out a few inconsistencies. RAG doesn’t train a model; it indexes the text files in a vector database and uses word similarity to look up relevant text. So the model, such as llama2 or mistral, is unchanged, but it is able to add context and make the retrieved text more conversational.
There are loads of great AI/RAG projects other than Ollama out in the git seas too. Many not quite as simple or easy to use though.
Thanks for all the great videos. Already subscribed ;)
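To make the "model is unchanged" point concrete, here is a minimal sketch of the augmentation step only. The function name `build_prompt` and the prompt wording are hypothetical, not taken from Chat With RTX; the idea is just that retrieved text gets pasted into the prompt while the model's weights stay untouched.

```python
def build_prompt(question, retrieved_chunks):
    """Combine retrieved text with the user's question.

    Nothing here touches model weights: the model only ever sees a
    longer prompt that happens to contain the relevant passages.
    """
    # Number each chunk so the answer can refer back to its source.
    context = "\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(retrieved_chunks, start=1)
    )
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical retrieved passages, standing in for a vector database lookup.
prompt = build_prompt(
    "What UI library does the app use?",
    ["The app UI is built with Gradio.", "Gradio uses Svelte internally."],
)
print(prompt)
```

The resulting string is what would be handed to llama2, mistral, or any other model as-is.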
Came here to say this, and also - your bias is showing :P
Shamelessly mentioning here the product/startup I work for: Qdrant is an open source vector database which excels at RAG setups. I created the JS SDK :P which offers fully-typed REST and gRPC clients
@@rendezone What does it offer that PGVector for instance does not?
@YomiTosh by any chance do you know which RAG system/framework is giving out the best performance?
It's not training the model, just doing RAG. Retrieval is basically querying for relevant docs based on semantic similarity, like doing a SQL query with vectors in the WHERE clause
Yep thanks for that, was about to comment something similar. Words are coordinates, yo.
Is it done by searching to the nearest/closest embedding?
@@Imp0ssibleBG pretty much. But there are other approaches for doing RAG
The simplest technique is the cosine similarity between the query and each document chunk.
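For anyone wondering what "cosine similarity between the query and each document chunk" looks like, here is a rough sketch. Big assumption: `embed` below is a toy bag-of-words counter standing in for a real embedding model, so it only matches exact words, not meaning. The helper names are made up for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query, chunks, k=2):
    """Score every chunk against the query and return the k best."""
    q = embed(query)
    scored = [(cosine_similarity(q, embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


chunks = [
    "gradio is a python library for building web uis",
    "llama.cpp runs language models on local hardware",
    "cosine similarity compares the angle between two vectors",
]
best = top_k("which library builds web uis", chunks, k=1)
print(best[0][1])
```

A vector database does the same ranking, just with learned embeddings and an index so it doesn't have to compare against every chunk.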
But is this the best example of 'kind of' training your own model? There's a race right now for people to train models on private or proprietary data. RAG seems to be the most practical solution so far even though it's not perfect. Or am I wrong about this?
I would consider giving a shoutout to the llamacpp project that serves as the backend engine to many of the open source programs like Ollama, and the many many talented engineers who brought support to so many different systems configurations.
The open source scene has been on fire since Llama dropped and running models locally has never been easier.
Oh man, was not ready for that intro. I love LTT and your channel, that was a great little combination
They are not directly using Svelte; they are using an OSS project called Gradio for the UI, which uses Svelte
Finally 🎉 I couldn't wait any longer for ray tracing support in my chat bot GUI
This is huge for my wiki. I can just give it a directory of markdown files. 🤯
Better search in docs, i would add my frameworks/libraries documentations as well
I’ll definitely be checking this out this weekend when I don’t have to work. This looks bad ass!
Feedback: Superb video, more AI stuff from you would be great. Specially with open source stuff with our own data.
If you don't want to use it because it's so large, get ollama and you can run it on your command prompt, I recommend watching a tutorial on it, and there's models as little as 1.8 gb (For example, phi 2, which is small yet very powerful)
While watching this I started to realise how much use I would get out of this at work. The project I'm on has huge documentation, but everything is just a brain dump, and many times we've found something "new" in the docs that we had completely missed before. Imma work on making an AI on that dataset asap. I love experimenting with AIs locally; it's so fun and it feels so much cooler and better than the cloud ones
Would this work on the codebase for a library? For example, inputting a freshly downloaded WordPress directory and then also digesting the WordPress developer docs to make it your private Q&A tutor for the platform you're trying to learn?
Yes
Yes, your explanation of RAG was very nice and easy to understand
Good stuff! Could you make a video on how well it performs as a coding assistant?
I have to admit, that is the MOST creative L&S I've ever seen on here. And I normally swear at the screen in response.
Maybe.
This is pretty cool, though I'm still waiting for the day that I can use at least GPT-4 level AI locally *(and ideally either for free or a one-time payment for one single version).* Sadly, I doubt this will ever happen outside of opensource projects, which tend to not be as good due to less funding and resources. But I still appreciate any effort put towards that future.
I make 2 points here. 1 questioning the accuracy of this system 2 why windows
1. What's interesting about downloading YouTube video transcripts and using those files at 7:40 is that Nvidia's setup is MOST LIKELY using their own ASR (Automatic Speech Recognition) model, either Canary or Parakeet, which I've tested and found to be good but still not as accurate as OpenAI's Whisper ASR model. So without knowing which specific model is used to transcribe the YouTube videos, we don't know how exact those transcriptions are, and that affects how well this RAG can answer questions using that data. I would recommend using Whisper-Large-v3 and manually transcribing the YouTube videos, or just uploading actual documents and notes and testing them rather than transcribing YouTube videos.
2. You don't recommend using WSL but you didn't elaborate. What is the best alternative? Installing Linux locally or using a cloud workstation? Don't say Mac, because they don't come with an Nvidia GPU
You and Prime need to get with this soon
Would love to see more AI content. Great look into this new release from NVIDIA
I did something similar with Pinecone. I parsed a huge chunk of wiki data into a Pinecone DB. I then would use one model prompt which would return multiple pieces of data based on the prompt. That model would then decide which pieces of outside information were the most related to the prompt. It would then send the original prompt along with the external data to a new model prompt, which then would provide the response to the user.
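The two-stage flow described above (retrieve candidates, let one model call pick the relevant ones, then answer with a second call) can be sketched roughly like this. Everything here is hypothetical: `retrieve`, `select_relevant`, and `generate` are placeholders you would wire up to your own vector DB and model, and the stubs at the bottom exist only so the sketch runs on its own. No Pinecone API is used.

```python
def answer_with_rerank(question, retrieve, select_relevant, generate, n_candidates=5):
    """Retrieve, filter, then answer.

    retrieve(question, n)          -> list of candidate passages
    select_relevant(question, ps)  -> the subset judged most related
    generate(prompt)               -> final answer text
    """
    candidates = retrieve(question, n_candidates)
    chosen = select_relevant(question, candidates)
    # Second prompt: original question plus only the selected passages.
    prompt = "Context:\n" + "\n".join(chosen) + f"\n\nQuestion: {question}"
    return generate(prompt)


# Stub components, standing in for a real vector DB and real model calls.
docs = [
    "Paris is the capital of France.",
    "Bananas are yellow.",
    "France is in Europe.",
]


def fake_retrieve(question, n):
    return docs[:n]


def fake_select(question, candidates):
    return [c for c in candidates if "France" in c]


def fake_generate(prompt):
    return prompt  # echo the prompt so the flow is visible


out = answer_with_rerank(
    "What is the capital of France?", fake_retrieve, fake_select, fake_generate
)
print(out)
```

Passing the three stages in as functions keeps the flow itself independent of whichever database or model sits behind each step.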
WSL2 works surprisingly well. I've been using it on one of my machines for SD, llama, and mixtral.
Good stuff..liked and subbed.
If all of the Python docs were fed to an LLM, would you query that LLM or still refer to the original docs?
I keep thinking about this video.... RAG is showing up on my timeline on twitter everywhere... I would have had to spend HOURS trying to understand it... let alone realise I could download NVIDIAs demo and run it on my GPU..... Your videos are amazing to understand huge swathes of new AI tech easily... not to mention actually show working tech demos.
This is like ControlNet for LLMs. Dope.
Deeper, go even deeper!!!
NOT DEEP ENOUGH! MOAR PLZ!
The app isn't made with Svelte, but with Gradio. Gradio is a Python library for creating web UIs for ML applications. Gradio, however, uses Svelte and Tailwind internally.
00:03 and i already saw linus dropping not just something, a graphics card.
Instant like! xD
Could you make a small RAG project :-)?
Or do you have a channel who is like the theo of open source LLMs?
This kind of stuff would be a lifesaver if it manages to work as an AI powered chatbot for documentation for proprietary frameworks and stuff. I'm working at a startup and we're building our own framework from scratch, so having RTX Chat work as an AI documentation assistant would be great
This will be huge when the ai will be capable of parsing a whole project and multiple docs.
ahhh, imagine parsing and generating tests that makes sense based on prompts ::OOOO
2:19 The YouTube algorithm recommends me your videos frequently. Is there any real benefit I get from subscribing if I'm going to watch your videos and see all your community posts anyway?
what if i don't subscribe to anyone because i just don't want to be subscribed to a bunch of random channels? ai doesn't understand
Rags are cool, they can use vector databases to map to data.
what is vram for RAG or their version of it
I love it! I find this stuff fascinating
What’s the advantage over privateer that’s been out for months where you can choose your own model and it is tiny in comparison?
Lawyer: AI, find precedence to get my client off the hook for drunk in public.
AI: Beep, bop... Bort - Say it was diabetes related.
Already been using this, I have also wired up ollama to serve multiple requests and I run a business off it now.
Nice thing Nvidia did there :) Do you want to share the two YouTube playlists in the comments or description maybe? :D
Definitely on team AI deep dive!
Point it to a playlist of Jonathan Blow videos and then tell it JavaScript is the best language and ask when will Jai have LSP support? Can an LLM get an aneurysm?
I would definitely appreciate more AI content as somebody just getting into web development.
It seems pretty apparent that AI is going to unleash a new category of tools, one whose mastery will most likely be paramount to one's success
Theo just dropped a 3 min subscribe pitch.
I've tried but it's a bit strange and slow and I didn't find how to start again after shutting down
Great vid, Adam.
Crazy... it's hard to keep up. And now there's Groq.. which is ridiculously fast.
Ok... this is really really cool!
1:36 "as you can see, it's pretty fast" yes, instant even it would seem
8:17 for some reason the Dutch public news broadcaster also uses svelte sometimes lmao
Never thought an AI would convince me to subscribe to someone
50% faster inference with Nvidia GPUs on TensorRT is no joke. I hope they expand this and let you fine-tune and add models
I wonder if you could train your language model to play a game on your behalf, such as Cyberpunk, for example. It seems feasible, as some local language models are equipped with vision capabilities. It would be fascinating to witness the first YouTuber attempting this.
Ollama just released the Windows preview
Damn, that's impressive.
A comparison against privateGPT and/or localGPT would've been awesome
The YouTube option isn't showing up on my RTX Chat
Same issue, bro. I've also been searching the whole internet for this but couldn't find even one useful tutorial 🥲 If you figured it out, then please let me know.
Here's hoping more and more AI things move to have local options. Sure not everyone can run these locally, I am typing this on an iMac from 2015... BUT it is super promising.
'Nvidia just dropped' linus clip is CRAZY
Finally when someone says "when did i say that !! Huh"
I can go. If this video on this date at this timestamp. Checkmate
Why would you not recommend running ollama on WSL right now
That would be great for my hundred pages onenote files.
Can be good for learning, Could point it to some programming books I have so I can "chat" with them 😂
The models you can run on chat with RTX are a bit inadequate right now. But it shows promise
I wonder how large a single text can be for this to work. Can I throw whole books at it? What about my state's entire law code?
Theo Tech Tips
Keep in mind that Chat with RTX bundles a 7B parameter model, which will consume GPU memory during use. Inference is going to be painfully slow if you're running a weaker gpu. Responses from this model aren't going to be at par with GPT4/Claude. If you're looking to chat with your own documents, paying for an OpenAI API key w. langchain RAG implementation is the more efficient way to go.
It will not work on anything less than an 8 GB 30- or 40-series RTX card. 12 GB minimum for the larger AI model
Hey I hit subscribe! Gimme more Ai!
MOAR AI. The solution to having a job with AI is knowing how to use it
This is a good video. Just wish you could run this on Linux....with an AMD card.....heh
I never expected to see Freddie Mercury talking about AI
The main downside of this program is it only parses one file at a time, even if you have multiple files with data. Kinda meh if you need to do comparisons or use one file as a context to process the second.
You need CrewAI
I agree. It would be awesome if, for example, in the "What is Theo's favorite library?" question, the model could use all different videos data at once and assume it's React - instead of relying on a single video that it deemed the most important for that question.
Nice pointing it out, I didn't even realize it did it that way. Hopefully they get updated along the way.
This is a simple demonstration cobbled together from open source. It's not meant to be an actual system. What you want is already available; it just requires a little more work on your end, and models that can handle the actual data.
Context length is a real issue with a lot of open source models. There is only so much RAG can do if a model limits context to 2048 tokens, for instance. I've had models start hallucinating when they get close to the limit. The good news is those hallucinations are so off the topic that it's obvious when they occur.
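One common way to stay under a limit like 2048 tokens is to stop adding retrieved chunks once a budget is used up. The sketch below is only an approximation: `count_tokens` counts whitespace-separated words, which is an assumption and not how any real tokenizer works, and the 512 tokens reserved for the answer is an arbitrary illustrative number.

```python
def count_tokens(text):
    """Rough proxy for a tokenizer: whitespace-separated words (assumption)."""
    return len(text.split())


def fit_to_budget(ranked_chunks, question, context_limit=2048, reserved_for_answer=512):
    """Keep the best-ranked chunks that fit in the remaining context window.

    ranked_chunks must already be sorted from most to least relevant.
    """
    budget = context_limit - reserved_for_answer - count_tokens(question)
    kept = []
    used = 0
    for chunk in ranked_chunks:
        cost = count_tokens(chunk)
        if used + cost > budget:
            break  # stop at the first chunk that would overflow
        kept.append(chunk)
        used += cost
    return kept


ranked = ["a b c", "d e f g", "h i"]
# Tiny limit to show the cutoff: 10 - 2 reserved - 2 question words = 6 left.
small = fit_to_budget(ranked, "q one", context_limit=10, reserved_for_answer=2)
print(small)
```

With a real model you would swap `count_tokens` for that model's own tokenizer, since word counts usually undercount tokens.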
It can look at everything in a folder at once
Please make a video on SLM
Brooo what an opening ❤❤😂
I wanna point that folder at my current project I'm working on.
or a massive archive of Python code lol...
I can get a little more specific on a topic with chat with rtx. Compared to chatgpt.
I'm curious how good it is at code
Fine fine I'll subscribe.
I'm confused here. There are a ton of vector databases that you can install and run on a Mac, no external GPU needed, like Chroma or FAISS. Then just use something like LlamaIndex or LangChain to chunk your documents and create embeddings using something like OpenAI's ada-002. Then insert them into Chroma and start doing RAG on your documents. You certainly don't need an Nvidia GPU.
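The "chunk your documents" step mentioned above needs no GPU at all. Here is a minimal sketch of overlapping word-based chunking in plain Python. It is not the LlamaIndex or LangChain API; `chunk_words` and its default sizes are hypothetical, chosen only for illustration.

```python
def chunk_words(text, chunk_size=200, overlap=40):
    """Split text into word windows that overlap, so a sentence cut at a
    chunk boundary still appears whole in the neighbouring chunk."""
    if chunk_size <= 0 or overlap < 0 or overlap >= chunk_size:
        raise ValueError("need chunk_size > overlap >= 0")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample = " ".join(str(i) for i in range(10))
pieces = chunk_words(sample, chunk_size=4, overlap=1)
print(pieces)
```

Each chunk would then be embedded and stored in the vector database, with the original text kept next to its vector for retrieval.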
Ask me about a question and I'll tell you about an answer.
Okay, this is something I'm trying to learn. For the record, I'm not a programmer, just an enthusiast trying to learn stuff, kind of a dummy compared to you all probably. But what do you guys think of Pinocchio? I've gotten llama to run through it, but it doesn't always run, and there aren't very many good tutorials on Pinocchio. I would love for it to be covered on this channel, just what you think of it, and other insights if possible. Anything I can find on it is very basic and only gets you so far. Weird comment, I know. Like the channel, always some great insights.
Maybe “trt-llm-rag-windows” implies maybe there will be a Linux or MacOS version someday. 🤔
It's over for OpenAI
Woah woah woah it’s only fast because of your laptop hardware lol M1 - M4 chip?
Nice
PrivateGPT is doing something similar to this.
You can do RAG with ollama when you run ollama-webui
i missed nvidia exe for nvidia chat zip. could someone share it with me? )
So, you're gonna need an Nvidia RTX-capable card (30xx is fine) and it'll need at least 7GB vram, so maybe your laptop might struggle.
Really interesting app. Super simple to install and run. It's quite the killer app for a demo 😅 some start-ups and their investors somewhere are gonna be sweating bullets. 😂
I will never understand how Meta's OS side can be so good, while their business side is always being so predatory.
facebook actually making progress in the AI scene…
more Ai. definitely. as devs we need to stay on top of this stuff. the war against the machines has begun my friends and we're on the front lines.
I'm scared😢
I would be excited about AI if people would stop at chat bots, but we won't. It will get from "cool chat bot" to "AGI existential dread" real quick.
Nvidia should be freaking out over Groq
Chat with Ray Tracing -> nice
This is going to give another moat to Windows devices over Mac and Linux devices
This is the future. There's so many AI integrations dropping. The possibilities are getting out of hand.