AI Fusion
Joined Jul 4, 2023
Explore the cutting edge of artificial intelligence with us. On this channel, we dive deep into large language models, examining their logic, reasoning, and mathematical prowess. From hands-on coding tutorials to practical applications of Retrieval Augmented Generation (RAG), we provide in-depth analyses and showcase the latest advancements in AI technology. Whether you're an AI enthusiast, a developer, or just curious about how these models work, you'll find engaging content and valuable insights here. Subscribe and join us on our journey to unlock the potential of AI!
New RAG Technique for Improved Accuracy and Cost Savings: Interactive RAG #rag #llm
In this video, we're diving into an exciting new approach to Retrieval-Augmented Generation, or RAG: Interactive RAG. If you've been using traditional RAG, you already know how it can improve AI responses by pulling relevant information from external sources. However, traditional RAG has limitations when it comes to cost-efficiency and precision. That's where Interactive RAG comes in.
Interactive RAG empowers you with more control by adding an interactive layer to the retrieval process. Instead of automatically sending every retrieved chunk to the AI model, Interactive RAG lets you decide which chunks to include. This simple but powerful change allows you to reduce token usage, lower costs, and fine-tune the AI’s responses to be exactly what you need: focused, accurate, and highly relevant.
In this video, we’ll start with a quick overview of how traditional RAG works, and then I’ll walk you through the unique benefits of Interactive RAG. We’ll even walk through an example where we interact with retrieved data about medications to get a precise response. This innovation is perfect for anyone looking to maximize the accuracy of AI outputs while keeping costs under control.
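The selective-chunk workflow described above can be sketched in a few lines of Python. Everything here is illustrative (the toy corpus, the naive keyword retriever, the prompt format, and the function names are assumptions, not the video's actual code); it only shows the shape of the idea: retrieve, let the user filter, then send a smaller prompt.

```python
# Minimal sketch of the Interactive RAG idea: retrieve chunks, let the
# user pick which ones to send, and build a smaller prompt from the picks.
# The retriever and prompt format here are illustrative, not from the video.

def retrieve(corpus, query, top_k=3):
    """Naive keyword retriever: rank chunks by word overlap with the query."""
    q = set(query.lower().split())
    scored = sorted(corpus, key=lambda c: -len(q & set(c.lower().split())))
    return scored[:top_k]

def select_chunks(chunks, keep_indices):
    """The interactive step: keep only the chunks the user approved."""
    return [c for i, c in enumerate(chunks) if i in keep_indices]

def build_prompt(question, chunks):
    context = "\n".join(chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

corpus = [
    "Ibuprofen is an anti-inflammatory; typical adult dose is 200-400 mg.",
    "Paracetamol reduces fever; typical adult dose is 500 mg.",
    "Aspirin thins the blood and should not be mixed with ibuprofen.",
]
question = "What is the typical adult dose of ibuprofen?"

hits = retrieve(corpus, question)
kept = select_chunks(hits, keep_indices={0})   # user keeps only the best hit
prompt = build_prompt(question, kept)
print(len(kept), "of", len(hits), "chunks sent")
```

Sending one approved chunk instead of every retrieved chunk is where the token (and cost) savings come from.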
If you’re passionate about getting the most out of AI and want to stay updated on cutting-edge advancements like Interactive RAG, make sure to hit that subscribe button and turn on notifications!
#InteractiveRAG #RetrievalAugmentedGeneration #RAG #AIInnovation #CostEfficiency #PrecisionAI #ArtificialIntelligence #GenerativeAI #MachineLearning #DataControl #AIDevelopment #TokenOptimization #AItools #AIResearch #AICommunity #FutureOfAI #SmartAI #ContentCreation
Views: 149
Videos
Llama 3 Uncensored - Run it Locally | Dolphin Llama 3 Setup Guide
2.5K views · a day ago
In this video, we'll show you how to set up and run the Dolphin Llama 3 model locally on your system. Learn how to download, install, and configure this powerful, uncensored language model to suit your needs. What You’ll Learn: How to download Dolphin Llama 3 from Ollama and run it on your local machine. Step-by-step setup instructions, including how to handle advanced settings. How to optimize...
Use LLAMA 3.1 405b in your projects for Free (Free API credits)
4.1K views · 14 days ago
A step-by-step guide showing how to use LLAMA 3.1 (405B parameters) through NVIDIA NIMS cloud platform. The video explains what NVIDIA NIMS is, how it lets you run large AI models without local hardware, and walks through the complete setup process. We cover the platform's interface, parameter settings like temperature and top P, and demonstrate how to implement the API in your code. The tutori...
Run LLAMA 3.1 405b on 8GB Vram
16K views · 21 days ago
Script : www.patreon.com/posts/114566125 Revolutionize your AI workflow with AIR-LLM - the game-changing tool that's breaking hardware barriers in LLM deployment! In this must-watch tutorial, we explore how AIR-LLM achieves the impossible: running a massive 405B parameter language model on just 8GB of VRAM - that's a 30x reduction in hardware requirements compared to traditional methods! 🔑 Key ...
RUN LLMs on CPU x4 the speed (No GPU Needed)
2.2K views · a month ago
Unlock the power of large language models on your CPU! This video showcases llamafile, a revolutionary tool that lets you run complex AI, including image processing, without a GPU. Watch as we demonstrate blazing-fast performance on a standard i5 processor, making advanced AI accessible to everyone. Learn the simple setup process and see real-time CPU utilization. Whether you're a dev or AI enth...
GEMMA 2 2b VS QWEN 2 1.5b - RAG Test #gemma2 #qwen2 #llm #rag
406 views · a month ago
Dive into the world of small large language models as we compare two powerhouses: Gemma 2 (2 billion parameters) and Qwen 2 (1.5 billion parameters). Watch as we put these models through their paces in a Retrieval Augmented Generation (RAG) test using a custom-built application. We'll evaluate their performance across eight questions, focusing on accuracy, completeness, and hallucination avoida...
100M TOKEN Context Window size LLM!
551 views · a month ago
Dive into the groundbreaking world of ultra-long context AI models with MAGIC's revolutionary 100 million token window. Discover how this leap in AI memory is set to transform software development and beyond. We explore MAGIC's partnership with Google Cloud, their innovative HashHop evaluation method, and the game-changing efficiency of their LTM models. Learn how these advancements could redef...
LLAMA 3.2 11B Vision Fully Tested (Medical X-ray, Car Damage Assessment, Data Extraction) #llama3.2
9K views · a month ago
In this video, we dive deep into the capabilities of the LLAMA 3.2 model, specifically the 11 billion parameter version, and put it to the test across various image recognition and analysis tasks. From visual representations and medical X-rays to counting objects and recognizing car damage, we explore the model's accuracy and performance. What You'll See in This Video: Introduction: Brief overv...
The Secret Behind OpenAI o1 + Trying it on LLAMA 3.1 #O1 #LLAMA3 #OPENAIO1 #COT #openai #llm
1.4K views · a month ago
Join us as we explore OpenAI's groundbreaking o1 model and its innovative "Chain of Thought" reasoning technique! In this video, we dive deep into how o1 is revolutionizing the AI landscape with its advanced problem-solving capabilities in coding, mathematics, and scientific reasoning. Discover the secrets behind o1’s exceptional performance, which allows it to think through complex problems st...
QWEN 2.5 72b Fully tested (Coding, Logic and Reasoning, Math) #qwen2.5
2.2K views · a month ago
In this video, we put the Qwen2.5 72B model to the test across a variety of tasks to see how it performs. Qwen2.5 is the latest and most powerful model in the Qwen series, boasting 72.7 billion parameters and a range of advanced features, including improved coding capabilities, long-context support, and multilingual support. We challenge Qwen2.5 with several coding tasks, including building a G...
GEMMA 2 2b GPU Requirements (q4, q8 and fp16) +Test on RTX 4060 #gemma2 #gemma #aigpu #llm #localllm
836 views · a month ago
This tool lets you choose an LLM and see which GPUs could run it: aifusion.company/gpu-llm/ The GPU I'm using: RTX 4060: amzn.to/4d7s0KM But I would recommend getting the RTX 3060, as it's cheaper and has more VRAM, which will allow you to run larger models. RTX 3060: amzn.to/4gtLFHG In this video, we explore Gemma 2, a 2 billion parameter AI language model, and break down its performa...
Free Real time Voice Cloning - Sound like anyone (Advanced Voice changer) #voicecloning #voicechange
1.1K views · 2 months ago
Download Link: huggingface.co/wok000/vcclient000/tree/main Voice models: voice-models.com Recommended GPU : amzn.to/3z6A2FP Ever wanted to sound just like your favorite celebrity or fictional character? In this tutorial, we'll guide you through the process of real-time voice cloning, allowing you to transform your voice to sound like anyone you choose instantly. From downloading the necessary s...
Create and Add Custom GPT to Instagram & Messenger (Customer support/Lead generation)
142 views · 2 months ago
Link to the tool: www.chatbase.co/?via=ai-fusion In this video, I’ll guide you through creating a custom GPT for your business, perfect for customer support and lead generation. We'll begin by exploring the platform to set up the GPT, using my agency’s website as an example. I’ll show you how to train the chatbot by fetching data directly from your site and integrate it with your Instagram and ...
GPT 5 Is Almost Here! Official News on OpenAI New AI Model #gpt5 #gptnext #orion #openai #strawberry
255 views · 2 months ago
OpenAI is preparing to unveil GPT-5, the most powerful AI model to date, under the secretive Project Strawberry. Expected to revolutionize AI capabilities, GPT-5 brings massive improvements in complex problem-solving, including mathematics and programming. This breakthrough follows CEO Sam Altman’s cryptic hints on social media and OpenAI’s demonstration to national security officials. Integrat...
Create a Custom GPTs for Your Website (Lead generation/Customer support) #customgpt #aichatbot
117 views · 2 months ago
GPT 4O Mini VS GROK 2 Mini (Coding, Logic & Reasoning, Math) #gpt4omini #gpt4o #grok2 #grok2mini
245 views · 2 months ago
Best GPU Under 300$ for Running LLMs Locally #llm #ai #localllm #gpuforaidevelopment
1.1K views · 2 months ago
LLAMA 3.1 70b GPU Requirements (FP32, FP16, INT8 and INT4)
33K views · 2 months ago
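The precision labels in this video's title map directly to bytes per parameter, so a floor on weight-only VRAM is simple arithmetic. The sketch below is an assumption-laden back-of-envelope (it ignores KV cache, activations, and framework overhead), so real requirements run higher than these numbers.

```python
# Back-of-envelope VRAM needed just to hold a model's weights at each
# precision: parameters x bytes-per-parameter. Real usage is higher
# (KV cache, activations, framework overhead), so treat these as floors.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}

def weight_vram_gb(n_params: float, precision: str) -> float:
    return n_params * BYTES_PER_PARAM[precision] / 1024**3

llama_70b = 70e9
for p in ("fp32", "fp16", "int8", "int4"):
    print(f"{p}: {weight_vram_gb(llama_70b, p):.0f} GB")
```

For a 70B model this gives roughly 261 GB at FP32, 130 GB at FP16, 65 GB at INT8, and 33 GB at INT4, which is why 4-bit quantization is the usual route for consumer hardware.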
LLAMA 3.1 405b VS GROK-2 (Coding, Logic & Reasoning, Math) #llama3 #grok2 #local #opensource #grok-2
548 views · 3 months ago
LLM System and Hardware Requirements - Running Large Language Models Locally #systemrequirements
9K views · 3 months ago
LLAMA 3.1 70b VS QWEN 2 72b (Coding, Logic & Reasoning, Math) #llama3 #qwen2 #local #opensource
1K views · 3 months ago
LLAMA 3.1 8b VS LLAMA 3 8b (Coding, Logic & Reasoning, Math) #llama3 #llm #localllm
219 views · 3 months ago
LLAMA 3.1 8b VS GEMMA 2 9b (Coding, Logic & Reasoning, Math) #llama3 #gemma2 #llm #localllm
5K views · 3 months ago
Comments
It's not NSFW?
Download iRAG??!
Can this model work for Arabic, or does it work for English only?
no only red or green though
Great informative video. Could you kindly suggest if I should choose an AMD 9800X3D or an AMD 9900X if I want to run LLMs locally?
Great video
Nobody knows how much precision they need
I run Qwen2.5 72B at BF16, which is what you can download off HF, with an Intel CPU, an Intel Arc A770 16GB GPU, 64GB of DDR4 RAM, and a 5TB 7400MB/s NVMe SSD, and it runs well! OK, the initial load takes a few minutes, but inference is fine. I only swap about 8GB to the SSD, but if you have much less than 64GB of RAM then you'd better have a fast SSD, because it will be hammered a lot. For my setup Intel IPEX-LLM is crucial; it's what makes it all possible on the Intel CPU and GPU. Yes, you need to know how to code, basic Python at least. I also first got a Qwen2.5 Coder 7B running, so if you have fewer resources than my Windows PC, you should stick to the 7B model. On the bright side, register with HF and you can call Qwen2.5 72B via their API where they host the model. You get 1000.....
These products are all garbage with several flaws: 1) They're impossible for the average person running an average laptop to get working. Unless you have a $2K CUDA-enabled GPU and significant knowledge of how to install and run custom add-ons, you won't get very far. 2) The quality is not just imperfect, it's downright laughable. 3) You will have difficulty cloning your own custom voices, because nobody really knows how to do this. In addition, many of these GitHub-based products are documented in other languages, making it even harder. Sorry, but with all the advancement of AI, nobody has been able to make a usable, authentic voice changer that runs in real time with adjustable delay times.
❤ Thank you so much, subscribed and liked
Didn't expect this topic from such ladies😂
Hello, thanks for that great video. How about this system: an i5 13600F, 32 GB of DDR5 RAM, and an RTX 3090, for the currently published LLMs?
Free for trial?
Nice video!, does it work?, I've been running the code from the CMD for almost 1-2 days, this line has been repeated since then, is it normal? (RTX2060 14GBVRAM & 16GB RAM): "new version of transfomer, no need to use BetterTransformer, try setting attn impl to sdpa... attn imp: <class 'transformers.models.llama.modeling_llama.LlamaSdpaAttention'> running layers(cuda:0): 100%|██████████████████████████████████████████████████████████| 83/83 [07:22<00:00, 5.34s/it] total time for load_safe_tensor: 344.40438961982727 total time for compression_time: 86.546875 total time for create_layer_from_safe_tensor: 1.7751367092132568 total infer process time(including all above plus gpu compute): 105.0000 total infer wall time(including all above plus gpu compute): 444.1316 The class `optimum.bettertransformers.transformation.BetterTransformer` is deprecated and will be removed in a future release."
Is there a way to fine-tune this model via cloud
Thank you, but it is very slow :-) Why?
The sad thing is that they have cascading failures. All of them: Windows 10, 11, and Ubuntu.
What is cascading failure pls? Thanks
@@publicsectordirect982 they all start out strong, then start repeating sentences, then words, then start producing gibberish with random characters. it's the same on both my asus k712e laptop and my gigabyte z77x desktop.
You are loading a 4-bit bitsandbytes-quantized LLM, so it's already compromised on precision; then, with AirLLM's block-wise quantization on top, the accuracy of the model will take a major hit.
In infinity, even a calculator can run gpt-5
Note it won't run on an AMD RX 6700 XT, due to the absence of a bitsandbytes implementation for AMD HIP on ROCm. The bitsandbytes module is key to the quantization of models. If you use any consumer AMD GPU below the 7700, get an NVIDIA GPU or move up to a newer card.
When you say 'full Adam training', is it full parameter fine-tuning or training an LLM from scratch?
According to Meta Blog, "We performed training runs on two custom-built 24K GPU clusters." How come only 13 H100 GPUs are required for a 70B model for full training on your webpage? Do you mean "at least"?
It is the minimal count of GPUs which allows you to keep all 70B parameters of the model, plus all Adam optimizer states (necessary during training), in VRAM at the same time, i.e. the minimal assumption of any computing efficiency. The untold information is that this way you would train it for 61 years. :/ Provided that you train it on the same corpus and with the same approach which Meta used. Good luck.
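The GPU count being debated in this thread is just bytes-per-parameter arithmetic. The sketch below uses one plausible accounting (fp32 weights, two fp32 Adam moments, and fp16 gradients, about 14 bytes per parameter, with activations ignored); this recipe is an assumption, and other mixed-precision setups give somewhat different counts.

```python
import math

# Rough training-memory arithmetic for the discussion above: parameters
# plus Adam optimizer state must fit in VRAM. Byte counts per parameter
# vary by recipe; this one assumes fp32 weights, two fp32 Adam moments,
# and fp16 gradients. Activations and parallelism overhead are ignored.

def gpus_needed(n_params, bytes_per_param, gpu_gb=80):
    total_gb = n_params * bytes_per_param / 1e9
    return total_gb, math.ceil(total_gb / gpu_gb)

bytes_per_param = 4 + 4 + 4 + 2   # weights + Adam m + Adam v + fp16 grads
total, gpus = gpus_needed(70e9, bytes_per_param)
print(f"{total:.0f} GB -> {gpus} x 80 GB H100s")
```

Under this accounting a 70B model needs about 980 GB, i.e. 13 H100s, which matches the figure quoted in the comment; Meta's 24K-GPU clusters exist to make training finish in months rather than decades, not because the state cannot fit on fewer cards.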
Why version 3.1 while 3.2 is available for months?
Because 3.2 still hasn't published its 405B model
testing with the 3.1 model that has been out is the appropriate and expected path. would you want the release to be delayed so they can start over again with 3.2?
Sure, why put any links in the description, makes no sense. People should search, make so much more fun 🙄
Please share the script mentioned in the video
If this were true, it would have been in Ollama and Open WebUI long ago, and it would mean IPEX would be able to smash through it; but alas, I do not think it is as good as it's said to be.
Considering this is Python we're talking about here, how long does it take to run.. 1000 years?
Almost all of the Python you see in ML/AI projects is a thin wrapper around C, C++, or CUDA.
@@orlandovftw so... 800?
You did not run it on 8gb vram
Better to wait for a BitNet version (edit: or TriLM)
That's probably software that processes one layer after another, and yes, in theory you can run a very large model that way. But nobody said you can expect a result before the next ice age....
Yes, very misleading video. I did the same thing a few years ago with Llama 1 when it was leaked. I ran it on a 3090, layer by layer, and it took something like 5 minutes per token. It did work, but it was not useful at all.
@@yakmage8085
And how long does the inference take? 40 h/token? Just because you can run it does not mean it's useful.
I can only wish the author of the video a similarly "revolutionary" rate of view-count and subscriber growth.
Apparently, you're right. I see a discussion saying A100 GPU with 16 core CPU inference time for one sentence takes 20+ minutes.... for one sentence.
@@funkytaco1358 So what AirLLM is claiming to do (and I haven't tested it yet) is selective layer loading/activation (e.g. it loads and unloads the layers as needed in the execution sequence of the model), thus the required memory is roughly equal to the parameter size of one layer. Besides this selective activation/loading, they also claim to be doing block-wise quantization. How much overhead all this incurs is something I'll have to poke at. That said, while it might sound like voodoo, these are actually memory optimization strategies people have been talking about for a while.
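The selective layer loading this comment describes can be mimicked with a toy model: only one layer's weights are resident at a time, and each layer is regenerated on demand as a stand-in for a per-layer load from disk. The model, dimensions, and weight values are all made up; only the load-compute-free loop is the point.

```python
import math
import random

# Toy sketch of layer streaming: keep one layer's weights in memory at a
# time. load_layer() regenerates weights deterministically, standing in
# for reading one layer's tensors from disk, as AirLLM is described doing.

N_LAYERS, DIM = 8, 4

def load_layer(i):
    """Simulate loading one layer's weight matrix from disk."""
    rng = random.Random(i)                    # deterministic stand-in
    return [[rng.uniform(-0.5, 0.5) for _ in range(DIM)] for _ in range(DIM)]

def apply_layer(x, w):
    """One toy 'transformer layer': matrix multiply plus tanh."""
    return [math.tanh(sum(xj * w[j][k] for j, xj in enumerate(x)))
            for k in range(DIM)]

def forward(x):
    for i in range(N_LAYERS):
        w = load_layer(i)                     # load one layer
        x = apply_layer(x, w)                 # compute
        del w                                 # free before the next layer
    return x

y = forward([1.0] * DIM)
print("output:", [round(v, 3) for v in y])
```

Peak memory is one layer instead of the whole stack, which is the claimed 30x-style reduction; the cost is exactly what the replies above complain about: every token pays the full load time for every layer again.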
...is this real? I've never heard of this o_O
Llama 3.1 is vastly superior for coding
i don't believe it. i'll wait till someone else beta-tests it. this sounds like a scam to me. well looking at the github, it apparently works, but at speeds like 40 seconds per token. not very useful to me, but cool nonetheless
Crazyyyyyyy😮
Well, If it works, I wonder if this should be incorporated into LM Studio and similar projects...
Please try it out xD, then I'll switch to Nvidia for that one.
@@BeethovenHD You switch to Nemo?
script ???????
www.patreon.com/posts/114566125
WTF!!!!! 🫨
Can you tell me the minimum CPU requirement for the Gemma 2 2B model?
Hey, I'm thinking of getting a new laptop with an RTX 4060 and 8GB of VRAM. But I'm also considering using Google Colab or Jupyter notebooks to learn about and play around with Large Language Models (LLMs) and their applications. The thing is, I need a new laptop anyway because my current one is ancient and barely hanging on. So, I'm wondering if it makes more sense to just buy my own machine or if I should go the Colab/Jupyter route. What do you think?
If you are running a 2B model, you don't need a GPU to begin with😂
Thank you for your video. Please tell me, what models can you advise for an i9 9900, 32 GB RAM, and an RTX 3090?
Llama 3.1 8b will run great
You never showed us how to clone any voice. You just showed us how to use presets. Why don't you show us how to do what you said you were going to do?
Will a 3060 Ti with the i7 Raptor Lake K with integrated GPU and 64 gigs of RAM on an ASUS ProArt work?
Is it possible to use an AMD EPYC 9965 (192 cores, 576 GB/s memory bandwidth) for inference and training? Maybe it is not as fast as GPUs, but I could use much cheaper RAM modules and only one processor, and it would be cheap enough.
No. Consider that NVIDIA GPUs have between 1,024 and over 16,384 cores.
@@guytech7310 But for LLMs, as far as I know, the bottleneck is memory bandwidth, not the number of cores. And my question is how many cores are enough to hit the memory-bandwidth bottleneck.
@@guytech7310 And do not forget that AVX-512 instructions allow computing numbers in parallel in just one core.
@@loktevra If that were true, then LLMs would not be heavily dependent on GPUs for processing. It's that the larger LLM models require more VRAM to load; otherwise, with low VRAM, the LLM has to swap out parts with the DRAM on the motherboard. PCIe supports DMA (Direct Memory Access), and thus the GPU already has full access to the memory on the motherboard.
@@guytech7310 Yes, but on GPUs the memory bandwidth is bigger than on a CPU. The AMD EPYC 9965, the latest CPU from AMD, has just 576 GB/s. So for commercial usage, GPUs will without doubt be the better choice, with the higher speed of VRAM. But for a home lab, maybe an EPYC CPU is just enough?
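The bandwidth argument in this thread has a handy rule of thumb: at batch size 1, each generated token streams the whole model through memory once, so tokens/sec is capped near bandwidth divided by model size. The figures below are illustrative ceilings only (compute, caching, and overlap are ignored, and the comparison GPU is my pick, not from the thread).

```python
# Rule of thumb behind the bandwidth argument: at batch size 1, every
# generated token must read all model weights from memory once, so
# tokens/sec <= memory bandwidth / model size. Ceilings only; real
# throughput is lower once compute and cache effects enter.

def max_tokens_per_sec(bandwidth_gb_s, model_gb):
    return bandwidth_gb_s / model_gb

model_gb = 70e9 * 0.5 / 1e9          # 70B params at 4-bit = ~35 GB
for name, bw in [("EPYC 9965 (576 GB/s)", 576),
                 ("RTX 3090 (936 GB/s)", 936)]:
    print(f"{name}: <= {max_tokens_per_sec(bw, model_gb):.1f} tok/s")
```

So a 576 GB/s CPU tops out around 16 tok/s on a 4-bit 70B model, in the same ballpark as a single consumer GPU, which is why the bandwidth-not-cores framing is reasonable for home-lab inference.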
Neat site, I like this kind of information. I have been using EXL2 models at 8bpw: Llama 3.1 70B and Llama 3.1 Nemotron. I can load them on two RTX A6000s. I would love to see this information for different quant types; the hardest part for me is figuring out how much VRAM I need for the models I try, and it's more of a brute-force "just try it and see".
🙏❤️
Llama vision is not very good for transcribing text - it makes a lot of things up. I show that in the latest video on my channel. Claude is currently miles ahead of anything else. GPT-4o not far behind. Llama a bit of a joke.
What if you want to run an LLM specifically for better speech recognition? It should be very small, a subset. Could that be done on integrated graphics to keep the GPU free?
Have you tried LLAMA 3.2 vs GEMMA 2 ?
my left ear really liked this