Can't wait for the competition and price drops coming in the upcoming weeks. What a great time to be alive, I LOVE it.
I like the AI Coding Meta part. Recently I tried to build out an app with quite a lot of files in the frontend and I ran out of tokens. So I had an instance of Claude that had extensive knowledge of my app and its functionality create a series of prompts, each focused on a different area of the app, while keeping the app's context and architecture intact throughout. It came out to about 60 prompts, but it saved me so much time and it was surprisingly accurate.
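A minimal sketch of that kind of prompt-series generation, under invented assumptions: the area names, shared context string, and prompt wording below are all hypothetical stand-ins, and in practice you'd have Claude itself write the per-area prompts rather than a template.

```python
# Sketch: split a large app into focused prompts so each request fits in the
# context window, while every prompt carries the same shared architecture
# context to keep coherence across the series.
SHARED_CONTEXT = (
    "App: a React frontend with a FastAPI backend. "
    "Architecture notes and conventions go here."
)

AREAS = ["auth pages", "dashboard widgets", "settings forms", "API client layer"]

def build_prompt_series(areas, shared_context):
    """Create one focused prompt per app area, each prefixed with the
    shared context so architecture stays intact across prompts."""
    prompts = []
    for i, area in enumerate(areas, start=1):
        prompts.append(
            f"[{i}/{len(areas)}] Context:\n{shared_context}\n\n"
            f"Task: work only on the {area}. "
            "Do not change files outside this area."
        )
    return prompts

prompts = build_prompt_series(AREAS, SHARED_CONTEXT)
print(len(prompts))  # one prompt per area
```

Each prompt is then fed to a fresh model instance in sequence, which is what keeps any single request well under the token limit.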
Mimicked thinking depth and time with Llama 3.1 using Groq; hella fast, hella smart! Love that you put "WE" were right — we all work as a hive mind, finding what works and what doesn't from each other, even following leads from the closed companies.
Love this space, love this time we are in.
Thanks for another great video to watch while working.
Have you used claude 3.5 sonnet? Do you find llama 3.1 better? Is your use case coding?
@@deltagamma1442 I've used Llama 3.1 mainly, as it's free for my research; my preference is definitely Claude 3.5 Sonnet.
Use cases vary, as I have ADHD and love coding new projects.
I've done most of the automation possible online with LLM agents or NN/RL/meta agents.
Hello Dan. I want to say that the coherence, elegance, and clarity with which you present, articulate, and code are profound and unique. We all want to see you succeed beyond your wildest dreams. Amazing content, pioneer🎉
Not going to lie, as a sophomore Computer Science student, this video kind of opened my eyes to the possibilities of LLMs
Job = gone, give it 2 years
@@davidjohnson4063 I think that the "internet of things" will evolve into the "AI of things" until AGI appears. In the meantime, most computer science jobs are not replaceable (except management). Regardless, Chain prompting is revolutionizing LLM use - although I still believe there is a ceiling for LLM applications
@@riley539 lol but junior jobs are replaceable aka yours (in the future)
Yeah, you're cooked; switch to data science and build the AIs
Great video, another example of why you're my new favorite AI dev channel! Thanks!
Hey Dan. I did not see the XML formatted prompt examples in the libraries you listed. Can you possibly guide us to where to find them? Thanks!
Are you a tier-5 OpenAI user? How are you getting API access to these models?
Assuming this is the case, what are the per-1M-token API costs for o1-preview and o1-mini?
OpenRouter provides access, and o1 is very expensive
The new models are actually available through the OpenRouter API.
OpenRouter offers the models at a 5% upcharge:
$3 / $12 for mini
$15 / $60 for preview
Guessing o1 will be $75 / $300 (allegedly will be released EoM)
@@andydataguy Crazy prices! I thought SOTA LLMs were supposed to move towards instant inference, unlimited context windows, and ever-decreasing costs, according to a top-level guy at Anthropic at the AI Engineer World's Fair just a month ago: th-cam.com/video/EuC1GWhQdKE/w-d-xo.html
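Taking the per-1M-token rates quoted above at face value (these are the commenter's figures plus a guess for o1, not official pricing), a rough per-call cost estimate looks like:

```python
# Rough per-call cost from $/1M-token rates quoted in the thread above
# (unofficial figures; the (input, output) rates are per million tokens).
RATES = {
    "o1-mini": (3.00, 12.00),
    "o1-preview": (15.00, 60.00),
}

def call_cost(model, input_tokens, output_tokens):
    """Dollar cost of one API call at the quoted rates."""
    in_rate, out_rate = RATES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# e.g. a 20k-token prompt with a 5k-token (reasoning-heavy) response:
print(round(call_cost("o1-preview", 20_000, 5_000), 2))  # 0.6
```

Note that with reasoning models the hidden reasoning tokens are billed as output, so real output counts run higher than the visible response.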
5:45 Suggestion - Have o1-preview create ##Chapters
### Section 1 (00:00-08:44)
#### 00:01
#### 01:35
#### 03:45
#### 05:18
### Section 2 (08:45-12:59)
#### 08:45
Then list the keywords for the sections and allow you to select which keywords to keep/prioritize (GUI with +/-), showing the number of times each keyword appears in a section and the TOTAL number of ####. So if there are 5 ####, it suggests 3-4 ####, or 3 #### headings, and you have it reconfigure just Section 2; perhaps not having AIDER on all 4 of the ####, maybe 3 times maximum.
My thought process here was your small six-word prompt expanding into a full prompt and then an image. This is tweaking the output via basic, efficient HITL review to nudge/guide an iteration by o1-preview, taking its better-than-Sonnet output and perfecting it. Ok, back to the video!
No BS! Raw quality, content dense material. 👍
Just for kicks, here is a test chapter I created with a custom GPT I'm working on.
00:00 Introduction: Why Prompt Chaining is Key
01:05 Understanding the o1 Series Model Update
01:57 YouTube Chapter Generation: o1 vs. Claude 3.5
03:06 Using Simon W's CLI LLM Tool for Chapter Generation
04:29 Comparing Results: o1-preview vs. Claude 3.5
05:58 The Advantage of o1's Instruction Following
07:55 AI Coding Review: o1's Superior Performance
10:24 Simon W's files-to-prompt Library for Code Review
12:01 Running o1-preview for AI Coding Solutions
14:54 Key Learnings: Instruction Following in the o1 Models
16:38 Sentiment Analysis: Testing on Hacker News
19:16 Iterating with Large Token Prompts
21:37 Final Results: Detailed Sentiment Analysis with o1
27:52 What's Next: The Future with Reasoning Models
another amazing video Dan! keep up the great work👍
Amazing video. Lots of great nuggets of info
Great video! Thanks for all the value given!
You build your prompts quite wisely - that's what most people don't do, especially while benchmarking. They miss the whole potential of LLMs, yet still draw their conclusions 🤦♂️
The ability to clean up JSON still remains valuable, as the tokens wasted on useless data here must have cost a lot :P
That's a beautiful thumbnail! How did you prompt that?
Please test o1-mini for content generation as well as coding
14:02 hahaha i liked and subbed
Can you put the resources you refer to in all your videos somewhere? Or just in the description of the video?
The base model should know when it needs to infer or not, and thus tell us if it must infer to reach a better result, and ask us if we are willing to use the extra token cost for it. We want convenience, agency, and the system must be capable and able to do actual work. Verb. Action, doing, producing. The less we must tinker with prompts and models ourselves, the better for the general end user.
User must be synonymous with agent, and thus, users can be ai agents, doing real work, and vice versa.
You are describing precognition
@@internetperson2 A mini model could recognize whether a prompt seems complex enough that an inference model would handle it better. A larger search model should also realize when there is no obvious result matching the specific problem, and recommend an inference model.
@@aresaurelian This is wishful thinking imo; you can't trust a mini model's gut when assessing the level of compute required to arrive at a satisfactory result for a given problem. I'm not saying such a tool is infeasible, but I am of the mind it would suck.
@@internetperson2 It could be optional. When the customer/user/agent is displeased, the model would learn to behave in a manner that suits them.
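The routing idea debated above can be sketched as a toy heuristic. Purely illustrative: the model names, keyword list, and length threshold are all made up, and a real router would likely be a trained classifier rather than string matching.

```python
# Toy router: a cheap heuristic guesses whether a prompt needs an
# expensive reasoning (inference-time compute) model. Keywords and
# thresholds are invented for illustration only.
REASONING_HINTS = ("prove", "step by step", "debug", "optimize", "plan")

def route(prompt: str) -> str:
    """Return which model tier to send the prompt to."""
    looks_hard = len(prompt.split()) > 200 or any(
        hint in prompt.lower() for hint in REASONING_HINTS
    )
    return "reasoning-model" if looks_hard else "mini-model"

print(route("What's the capital of France?"))              # mini-model
print(route("Prove this invariant holds, step by step."))  # reasoning-model
```

The objection above is precisely that a heuristic (or small model) this cheap will often misjudge how much compute a problem really needs.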
It's not so much prompt chaining; the Q*-type RL stuff is key: tuning the model with the right optimized reasoning routes. Prompting is legit, and chaining it certainly works, but in no way is this only prompt chaining. They're even claiming it's one single model (which shocked me too).
Great video, can you share the xml prompts you used in this video?
Which plugin calculates token amount on the bottom right?
Can you share the prompt files used in the video?
Are you tier 5 for API access or is there a workaround?
great stuff! learned a lot from this video.
I don't think code review is possible for a larger code base, where you need to add 20 files and a 2k diff to analyze; that requires some vector DB and running ChatGPT against it somehow
So we are finally going to have software engineers --write-- down their requirements and use cases... because you can feed them to AI agents to implement, test, and review
Finally.
Thanks for sharing!
That's a nice demo, but who's going to wait minutes to get sentiment analysis for a couple of comments? Way too slow.
What do you do when you think? Do you instantly figure things out, or do you ponder and think it through?
Best way? Always do a pre-test run, a dummy run, like a gamer in WoW hitting a training dummy for DPS evaluation. And tell it when the pre-test is over and the real test starts.
You are my favourite 🔥🔥
Same here, let's get a community going indydevdan!
@@JoshDingus For sure! We stan IndyDevDan in my Discord community too
It's better that they just call GPT-4.5 "o1"
Great content as usual 👏
Technically Tree-of-Thought, not Chain-of-Thought
LFG 🔥!!
👑👑👑
This constant noise of you typing on the keyboard is distracting and annoying.
Deal with it
"if you're subscribed to the channel you know what we're about." Yeah, but I'm not, so I don't; so, like, maybe make an introduction about what you're about? You have 19k subs (rounding up) over 2 years; clearly the content isn't selling itself.
It's a pretty good bleeding-edge meta AI channel focused on extracting the most value out of the best tools, depending on your use case