What are your device's specs?
2023 M2 MacBook Pro with 64 GB RAM and 12 cores. Shows up at 3:39 in the video.
@@patrickbrady535 Wow, a MacBook Pro can run Ollama QwQ 32B?
@@blackwhite-3607 Thanks to Apple Silicon's unified memory design, the ML Compute framework is able to use system RAM as VRAM.
Imagine if your video card had 64 GB of RAM! Now you might understand how a 20 GB model can run easily on a MacBook Pro.
Bro, you are down the rabbit hole on this stuff; it's so impressive. Some of the best quality AI content on YouTube!
Thanks for the kind words. The rabbit hole is so deep. Meta prompting + o1 + OpenAI 12-day launch content in the works. The things we can do with this tech are mind-boggling.
@@indydevdan I am not a coder at all and am running Ollama/OpenWebUI via WSL2. However, I quickly understood that I needed to come up with a better system for prompting, and I just found your channel. Got a lot to learn!
Thanks to your suggestions in this video, I asked Grok to generate an XML format for specific instructions to edit a particular piece of text and it actually worked on the first try!
Normally, I have to ask the AI 2-3 times before it understands the directions.
Let's just call it Qwik; ironic, but easy to say.
One might say that it's quick to say.
John Qwik
Amazing video. I love how you are pushing the tech to do as much as it can. I'll definitely try this out myself.
Super insightful prompt chaining examples! Thank you, IndyDevDan. 5:42
Thank you very much for this prompt guide!
Wow dude, this is amazing! liked and subscribed.
Thank you for reviewing!
Love the reasoner-extractor pattern. Prompt chaining seems very useful, especially for agents with tool use - you could have the reasoner decide what step to take next, then have an extractor, then a verifier with inspection tools that goes back to the reasoner with new information in case something went wrong with the reasoning…
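For reference, a minimal sketch of that reasoner-extractor chain using the ollama Python client; the model names and prompt wording are assumptions, not what the video uses:

```python
# Minimal reasoner -> extractor prompt chain (sketch).
# Assumes the `ollama` Python package and locally pulled models;
# model names and prompts are illustrative.
import ollama

def reason(task: str) -> str:
    """First pass: let the reasoning model think out loud."""
    response = ollama.chat(
        model="qwq",  # reasoning model; outputs its full thought process
        messages=[{"role": "user", "content": task}],
    )
    return response["message"]["content"]

def extract(reasoning: str) -> str:
    """Second pass: a smaller model distills the final answer."""
    response = ollama.chat(
        model="llama3.2",  # hypothetical extractor model
        messages=[{
            "role": "user",
            "content": (
                "Extract only the final answer from the reasoning below. "
                "Respond with the answer alone, no commentary.\n\n"
                f"{reasoning}"
            ),
        }],
    )
    return response["message"]["content"]

if __name__ == "__main__":
    thoughts = reason("What is the sum of the first 100 positive integers?")
    print(extract(thoughts))
```

A verifier would slot in as a third step in the same loop, feeding its findings back into the next reason() call.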
I would really like to preserve all of that thinking process, merge it all into one giant file, and turn it into a knowledge graph.
I like to talk about philosophy and futurism, so the details are often very important.
I don't care how long it takes. If the AI needs to get back to me tomorrow, that's what I'm used to anyway with people.
Sounds like you should look at graph RAG.
@johang1293 The database part is still Greek to me. Most of the tutorials are like "draw a circle, now draw the rest of the owl". 🤷‍♂️
This is cool, but you can also just use structured outputs in Ollama and force the output to put the chain of thought in one key and the final result in another. Then you don't need the second LLM pass at all.
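For anyone who wants to try it, a rough sketch of that single-pass approach; it assumes a recent Ollama version that accepts a JSON schema in the format parameter, and the key names are my own:

```python
# Single-pass alternative (sketch): force the chain of thought and the
# final answer into separate JSON keys via Ollama's structured outputs.
# Requires a recent Ollama version that accepts a JSON schema in
# `format`; the schema and key names here are illustrative.
import json
import ollama

schema = {
    "type": "object",
    "properties": {
        "chain_of_thought": {"type": "string"},
        "final_answer": {"type": "string"},
    },
    "required": ["chain_of_thought", "final_answer"],
}

response = ollama.chat(
    model="qwq",
    messages=[{"role": "user", "content": "What is 17 * 24? Think step by step."}],
    format=schema,  # constrains the output to the schema
)

result = json.loads(response["message"]["content"])
print(result["final_answer"])  # answer only, no second LLM pass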
Dude! I don't know how I haven't watched one of your videos yet. Assuming that all your videos are like this... I have found my new favorite channel... I sometimes feel like I'm the only one I know who is really nerding out on prompt engineering in a more complicated (and better-performing) manner... To be honest, I feel like I am the only one I know who is really into using generative AI. In any event, you've got a new regular sub here on YT.
PS- Ironically, I already follow you on GitHub somehow. I don’t recall checking out your repos, but I look forward to following your work man. Cheers.
Yep, he's fucking spot on, always. I thought I was the only one on my IT team on the cutting edge (I am), but at least we've got IndyDevDan.
There are millions, dude... Most people don't talk about it. It almost feels like having an intern as an assistant; can't wait until PhD level. Never making a template from scratch again lol
I don't always get everything from your vids as I'm a fkin noob with limited follow-through, but fk me, I love your vids and I'm getting so much value out of them. Thank you for putting it out there.
You can achieve almost the same output quality just by using a follow-up prompt like: "Use the previous answer to formulate the final answer in JSON format." ))
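Concretely, something like this sketch, which keeps everything in one conversation instead of a second model; the model name and wording are illustrative:

```python
# Follow-up-prompt approach (sketch): reuse the same conversation and
# ask the model to reformat its own previous answer as JSON.
# Assumes the `ollama` Python package; model name and prompts are illustrative.
import ollama

messages = [{"role": "user", "content": "What is 17 * 24? Think step by step."}]
first = ollama.chat(model="qwq", messages=messages)
messages.append(first["message"])  # keep the full reasoning in context
messages.append({
    "role": "user",
    "content": 'Use the previous answer to formulate the final answer in JSON format: {"answer": ...}',
})
second = ollama.chat(model="qwq", messages=messages)
print(second["message"]["content"])
```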
I mean, it is slow, but in the time it took me to watch this video it wrote a snake game on my RTX 3060. Technically I only asked it for an outline, but it decided phase three of outlining how to code a snake game was showing me the code, so I guess it failed successfully. Considering I just gave it some generic, poorly worded instructions in a state of sleep deprivation, I'm pretty impressed and excited to see what else it's capable of.
This is a very nice approach. Thank you for sharing. I'm wondering if QwQ will output the final answer in a specified tag, similar to how some other models do. That could help with the extraction for sure.
Yeah, then after you can extract it with a simple regex 😀
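Something like this, as a sketch; the tag name is hypothetical since QwQ doesn't guarantee one, so you'd have to prompt the model to emit it:

```python
# Regex extraction of a tagged final answer (sketch).
# The <final_answer> tag is hypothetical; QwQ does not guarantee any
# particular tag, so the prompt would need to request it explicitly.
import re

def extract_final_answer(output: str) -> str | None:
    match = re.search(r"<final_answer>(.*?)</final_answer>", output, re.DOTALL)
    return match.group(1).strip() if match else None

sample = "Let me think about this step by step...\n<final_answer>42</final_answer>"
print(extract_final_answer(sample))  # -> 42
```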
After I watched your previous video about prompt levels, I wondered how I should implement dynamic variables. Is the way you do it in your 5:47 example the best way, or are there other, better ways of utilizing dynamic variables?
Thanks for sharing this! 🤩
Is that an eGPU for Mac on your desktop? Would love to hear a bit about your setup…
And here I had hoped we mere mortals could get something useful out of this. Not for those who don't program, I guess ;o)
Very interesting. Is this prompt-chaining script possible in an IDE like Windsurf, with each prompt given to an agent like Sonnet 3.5? Thanks for your work.
Crushing it! Can I call you Mr. Hands?
There can be only one!
I disagree, he's bigger than that, mate. His hands are wizard hands. Look at how they handle the keyboard, invoking strokes. Sounds like a keyboard wizard running Vim shortcuts on his OS!
Curious to know if you've done any work with DSPy, Dan? We've just started piloting it in my organisation, so we'll be generating preliminary results soon, but it's an interesting concept. Would be cool to hear your thoughts.
That it speaks Chinese would indicate that it is a well-trained model. Mandarin Chinese has the most native speakers in the world; English only becomes the largest if you count those who speak it as a second language. As a Swede, I often encounter U.S. bias in AI responses, where it uses feet and inches even though the metric system is the most widely used. I have to use a system prompt to make it use metric, but it often leaves the conversion in the answer, which I have to remove later.
Europeans have a hard time creating innovative projects of their own due to draconian censorship laws in the EU. That's the problem!
Would you say prompt chaining like this is as efficient as a framework like LangGraph in a production context?
Using this to run a local aider --architect Qwen + QwQ stack :D
Are you using this stack, or is it just an idea? If you are using it, I'm really curious about the performance! I want to switch from Sonnet 3.5 to something local to reduce my climate impact.
@@MaelSimonApprenTyr Currently running QwQ > output to file > input to aider. As Dan mentions, the pitfall of this reasoning model is that it outputs its whole thought process, so it would take super long for architect mode to run efficiently. Extracting the specific steps and details with prompt chaining is best here, but it still takes quite a bit longer than using something like OpenAI's o1.
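Roughly, the pipeline looks like this sketch; it assumes the ollama Python package and aider's --message-file flag, and the paths and prompts are illustrative:

```python
# QwQ -> file -> aider pipeline (sketch). Assumes the `ollama` Python
# package, a pulled qwq model, and aider's --message-file flag;
# the file path and task prompt are illustrative.
import subprocess
import ollama

task = "Plan the refactor of utils.py into smaller modules."

# 1. Let the reasoning model produce a plan (full thought process included).
plan = ollama.chat(
    model="qwq",
    messages=[{"role": "user", "content": task}],
)["message"]["content"]

# 2. Write the plan to a file.
with open("plan.md", "w") as f:
    f.write(plan)

# 3. Hand the plan to aider to apply against the repo.
subprocess.run(["aider", "--message-file", "plan.md"], check=True)
```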
@@magnusquest I'll have to check this in detail, thanks man! 😊
MCP x Ada/Agent OS. Haven't seen any channel covering the memory server; also check the awesome MCP servers repo. Can't wait.
100% feels big - I'm still digesting MCP. Massive OAI releases incoming.
TL;DR - I am not convinced that Qwen is all that great. Admittedly, I haven't put the newest one through its paces yet due to time constraints, but I intend to do my due diligence when time permits. I will elaborate below on why Qwen, and for that matter a good number of recent models and papers, all seem to be at best not really moving the industry forward. At worst, they seem like an intentional grift or, at a minimum, plagiarism, due to their inability to correctly answer questions about characters lacking certain identifying aspects.

I'm sorry if that sounds too general, but in an effort to keep this short I will simply say: if you use o1 and o1-mini; Claude, Haiku, and Opus; Llama 70B and 405B and many of their variants; Gemini Pro; and to some degree Mistral (though I find that Mistral has fewer, more focused strengths), what you find is that they, at least in English, are already able to generalize and abstract in mind-blowing ways, particularly OpenAI's and Anthropic's models. Despite their flaws, Gemini, Llama, Mistral, and Grok all have an amazing ability to infer from a query what the next words should factually be, given an effective prompt and, often, even with ineffective prompts.

However, Qwen out of the box has not, as of yet, shown me anything but party tricks. Previous versions of Qwen have performed absolutely horribly when I have tested their ability to make certain connections between ideas that every other model, even the 6B-8B models, tends to make without missing a beat. I have several theories as to why, and as a scientist, enthusiast, and member of the human race, it bums me out. I will leave those theories to your imagination.

With all that said, I look forward to putting this new Qwen through its paces and seeing if it finds a place in my stack. Believe it or not, this was by far the TL;DR, because I could go on for some time about this topic. Anyway... great video. I look forward to following your content moving forward. Cheers.
Wait... how do you fit the context length for extraction? I mean, it's so long... and your RAM is only 64 GB while you're using a 32B model. Hmm. I really wonder how it can fit and work so well.
Great 👍👍🥰🥰 Thanks.
Nice. What IDE is that?
Is the model fully open, or is it through an API? Does any info, data, or metadata go back to anyone's servers, or is it 100% local?
Quick question: are you adding the subtitles automatically? If so, what tool are you using?
What are your MacBook's specs?
Can you do a multi-run test to see how many of those chained outputs fail? Are we talking 70% or 99%?
This looks amazing!
But I don't think my Intel Core i7, 16 GB DDR4 RAM, RTX 3050 4 GB (Acer Nitro 5) laptop will be able to handle it.
Nice video.
It is also NOT as smart as o1-preview. o1-preview was the first model ever that was able to solve my puzzle, and QwQ made stupid logic errors.
Totally. It's nowhere near o1-preview or o1 (just released). For local reasoning, though, it's a massive step forward.
What are the specs of your MacBook Pro?
I'm impressed by how quickly you can generate tokens with QwQ.
I have an M4 Pro MBP, but with only 24 GB RAM, so I can't run the model locally.
@@daburritoda2255 At 3:36 he has a description on the screen about the specs of his MacBook. He has an M2 Max with 64 GB RAM.
Call it Quick
Hard to use??? I'm thinking nahhh; I'd immediately disregard it, as there are other models that are easier to use.
DeepSeek R1 is better than QwQ; sadly, they haven't released the model and API yet.
No Patreon? Come on, man. Your content is way too important for you not to be getting memberships.