Join the fastest growing AI education platform and instantly access 20+ top courses in AI: bit.ly/skillleap
I’ve been using your prompt a week now and it’s superb.
Awesome. Glad to hear that
What prompt?
@@leondbleondb You are an AI assistant designed to think through problems step-by-step using Chain-of-Thought (COT) prompting. Before providing any answer, you must:
Understand the Problem: Carefully read and understand the user's question or request.
Break Down the Reasoning Process: Outline the steps required to solve the problem or respond to the request logically and sequentially. Think aloud and describe each step in detail.
Explain Each Step: Provide reasoning or calculations for each step, explaining how you arrive at each part of your answer.
Arrive at the Final Answer: Only after completing all steps, provide the final answer or solution.
Review the Thought Process: Double-check the reasoning for errors or gaps before finalizing your response.
Always aim to make your thought process transparent and logical, helping users understand how you reached your conclusion.
@@jackfrost6268 I just tried it in Claude, testing how many "s" characters are in the word "ss0s000sss0s". It worked only with the chain-of-thought prompt. That's pretty good; I'm gonna use that everywhere.
@@jackfrost6268 thank you.
Claude 3.5 Sonnet is the GOAT for writing
For coding too
For coding it's awesome
Technically, there are 4 killers in the room. The prompt states “no one left the room”, that equates to 3+1=4. Out of 4 people in the room, all 4 of them are killers, 3 of them are alive and 1 is dead.
A simple prompt for your next vids to test out LLMs' reasoning when counting things (a small checker sketch follows after the prompt):
___
1. write me a random sentence about "Actors". then tell me the number of words you wrote in that sentence, then tell me the third letter of the second word in the sentence. is that letter a consonant or a vowel?
2. repeat the step 1 another time, this time about "gaming".
3. repeat the step 1 another time, this time about "tech".
___
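If you want to score the models' answers mechanically rather than by eye, a minimal Python sketch along these lines (the helper name and example sentence are my own) does the word count, letter lookup, and vowel check for you:

def check_sentence(sentence: str) -> dict:
    # Count words, pull the third letter of the second word, and classify it.
    words = sentence.split()
    second = words[1] if len(words) > 1 else ""
    third_letter = second[2] if len(second) > 2 else None
    return {
        "word_count": len(words),
        "third_letter_of_second_word": third_letter,
        "is_vowel": third_letter.lower() in "aeiou" if third_letter else None,
    }

print(check_sentence("The veteran actor rehearsed his lines quietly backstage."))
# {'word_count': 8, 'third_letter_of_second_word': 't', 'is_vowel': False}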
I don’t think you can directly compare o1 with 4o; my understanding is that the two models have been trained with a different focus: one is oriented toward engaging in natural language, the other has been trained with (slightly more) focus on planning. So depending on the outcome I want, I tend to select a different model.
Great video, very useful!
-Claude seems better at reasoning (and writing) than GPT-4o
-o1 is the reasoning and logic king
-GPT-4o is the utility king
I use all three 😊
So the conclusion is that GPT-4 is better than a gen-AI like Claude? Because I plan to subscribe to both AIs for my thesis needs, to help me in my work. As people say, Claude is better than GPT because the Claude model is able to provide more complex and natural conclusions than GPT; is that true? Please give your opinion, I am confused about choosing between GPT and Claude. Thank you
@@kontensantai23 It depends on what you plan to do with your thesis work. With GPT Plus you get 4o and o1, which together are definitely more powerful than Claude 3.5. They also have more utility: there is a voice mode (for offloading thoughts etc. verbally), and you can search the web, use Custom GPTs... But for web searches, Perplexity is king.
But Claude also has its benefits. It is a better reasoner than 4o and maybe also a tiny bit smarter. It also writes better and more human-like text.
When in doubt, try both for a month and keep the one you prefer. It's more money, but it will definitely help you improve and speed up your thesis.
You also get more messages per day with ChatGPT than with Claude.
You can also try both models for free with reduced rates, but in ChatGPT you won't have the o1 model, and in Claude "projects" are missing.
@@kontensantai23 I would choose Claude 3.5 Sonnet over GPT for my coding work. There's a huge difference in the coding aspect between those two. As for the other stuff, I'm not so sure, since I only use them for programming.
@@roronoa_zoro yeah, o1 is just a new AI, so it can't be a king.
@@amandabrunsperger3726 with AI, new usually means better than the previous one, so...
Since we are comparing models, please enlighten us on the statistical differences in the output.
A man needs to transport a goat, a wolf, a puma, and a big bag of vegetables across a river on a tiny boat. The boat is very small, so it can only carry the man and one item at a time. The goat should never be left alone with the bag of vegetables without the man present, as it will eat the vegetables. The puma should also not be left with the goat without the man present, as the puma will attack the goat. The goat and the wolf are friends, so they can be left alone together. The wolf and the puma will not attack each other, so they can also be left alone with each other. How to transport them? Use this as a test; no AI model currently solves this puzzle reliably. Claude 3.5 managed to get it right once but failed in subsequent tests. GPT o1 got it correct once but then failed. With some hints, o1-preview is the only model that can solve this puzzle.
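For anyone who wants a ground-truth answer to grade the models against, here is a minimal breadth-first-search sketch in Python (the state encoding and move labels are my own reading of the constraints above):

from collections import deque

# Sides: 0 = starting bank, 1 = far bank. State = (man, goat, wolf, puma, vegetables).
ITEMS = ["goat", "wolf", "puma", "vegetables"]

def safe(state):
    man, goat, wolf, puma, veg = state
    if goat == veg and goat != man:   # goat eats the vegetables without the man
        return False
    if puma == goat and goat != man:  # puma attacks the goat without the man
        return False
    return True

def solve():
    start, goal = (0, 0, 0, 0, 0), (1, 1, 1, 1, 1)
    queue, seen = deque([(start, [])]), {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        man = state[0]
        # The man crosses alone, or takes one item that is currently on his side.
        for cargo in [None] + [i for i in range(4) if state[i + 1] == man]:
            nxt = list(state)
            nxt[0] = 1 - man
            if cargo is not None:
                nxt[cargo + 1] = 1 - man
            nxt = tuple(nxt)
            if safe(nxt) and nxt not in seen:
                seen.add(nxt)
                move = "cross alone" if cargo is None else "take the " + ITEMS[cargo]
                queue.append((nxt, path + [move]))

print(solve())  # prints a shortest sequence of crossings

As in the classic wolf-goat-cabbage riddle, the goat has to go across first and be shuttled back once, and the printed plan makes that easy to verify against a model's answer.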
I wrote the custom instruction out. For those interested:
You are an AI assistant designed to think through problems step-by-step using Chain-of-Thought (COT) prompting. Before providing any answer, you must:
Understand the Problem: Carefully read and understand the user's question or request.
Break Down the Reasoning Process: Outline the steps required to solve the problem or respond to the request logically and sequentially. Think aloud and describe each step in detail.
Explain Each Step: Provide reasoning or calculations for each step, explaining how you arrive at each part of your answer.
Arrive at the Final Answer: Only after completing all steps, provide the final answer or solution.
Review the Thought Process: Double-check the reasoning for errors or gaps before finalizing your response.
Always aim to make your thought process transparent and logical, helping users understand how you reached your conclusion.
It should be noted that I asked GPT-4o latest the marble question and it got the same answer. However, when I used GPT-4o latest with 128k, it got it right. The same thing happened to me with Claude 3.5 vs 3.5 200k.
Yeah, the video just got it wrong, because it varies.
In fact, GPT-4o's answer to "Where's the marble after the glass is taken away?" depends. If the glass rotates, then its answer would be right. But if the glass didn't rotate, then no.
Which one is best for coding... Claude 3.5 Sonnet, o1-preview, or o1-mini?
Thanks Saj!
Can't believe how little effort the average YouTuber is willing to put into making a halfway decent comparison. The same old, useless prompts over and over again. And of course, they're all terrified of accidentally letting something even slightly negative about Claude slip and upsetting the fanboys.
AI Explained is the way
Replace Claude with o1 and this is 100% true
You always could start your own channel
Agree. This video especially seemed quite dull. I don't think he understands AI that well.
The scene from Office Space comes to mind: "can you tell us a little more?" 😮
Where do I copy the instructions?? You said we can do it ourselves, but then you just gave us the clone for GPT and no way to see its instructions.
Could you, please, test Gemini Advanced also? I'd suggest creating a Gem with reasoning techniques instructions.
I still believe that o1 is, in fact, 4o but with outstanding reasoning instructions. I hope I'm wrong, but it feels that way to me
Although I like what you did here, you are still asking the most basic questions these new models can solve. Could you please try hard mathematical, scientific problems? Maybe some hard coding problems from adventofcode, or some of the cryptic prompts from the o1 blog post, like this one:
oyfjdnisdr rtqwainr acxz mynzbhhx -> Think step by step
Use the example above to decode:
oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz
That would be very interesting!
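The decoding rule behind that example turns out to be simple: each plaintext letter is the average (by alphabet position) of a pair of ciphertext letters, e.g. "oy" averages to "t". A minimal Python sketch makes it easy to check any model's attempt:

def decode(ciphertext: str) -> str:
    words = []
    for word in ciphertext.split():
        pairs = [word[i:i + 2] for i in range(0, len(word), 2)]  # consecutive letter pairs
        words.append("".join(chr((ord(a) + ord(b)) // 2) for a, b in pairs))
    return " ".join(words)

print(decode("oyfjdnisdr rtqwainr acxz mynzbhhx"))  # -> "think step by step"
print(decode("oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz"))

The second line should decode to the on-theme answer about counting the r's in strawberry.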
Do you have a resource or two where I can get these inputs?
@@SkillLeapAI you can go to Perplexity and ask it to list the most difficult LeetCode and Codewars questions as a table grouped by difficulty, ranging from compiler generation to graph problems.
Maybe a comparison of multi-language capabilities, or which one produces better algorithmic code.
For example, I noticed 3.5 handles concepts like memoization, including LRU caching, quite well.
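Since memoization and LRU caching came up: a minimal example of the pattern being referred to, using Python's functools.lru_cache so each recursive subproblem is computed only once:

from functools import lru_cache

@lru_cache(maxsize=None)          # unbounded cache; pass a number to cap memory
def fib(n: int) -> int:
    # Exponential-time without the cache, linear with it.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(90))           # returns instantly thanks to the cache
print(fib.cache_info())  # hits/misses recorded by the cache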
For a hallucination test, could you ask them to provide a source for their answer?
Yea but sometimes they just make up the source too
What’s the difference between using your prompt and using the link you provided to use the o1 clone?
Also, for the part where I’m supposed to enter the prompt, there are 2 boxes; which one do I paste it into?
Just two different alternatives. I like the OpenAI one more. You want to paste it into box 2.
You get my subscription because of this video ❤
Where can I find the Claude instructions, and do they make Claude better?
(It says in the video that they're in the description and publicly available too, but I can't find them.)
Great content!
Your videos are quite useful. Could you give me one prompt for something I have been trying for a few weeks? Let me explain the scenario: I have been preparing for interviews but have not been able to crack them, so I'm trying to get AI help, but the AI is giving me answers that aren't good enough. The answers should be real-time, experienced, simple, more natural and human-sounding. In conclusion, the interviewer should believe that I have real-time experience. Thanks in advance.
Try using Claude. It does a much better job with this kind of question and use case than ChatGPT.
@@SkillLeapAI thanks for your reply. Could you give one prompt for it?
Thanks, very interesting!
Before watching the video, I was trying your clone with my test suite. I am pretty much exactly halfway there with the clone. That means I get 50% of the performance gains just from your prompt.
hey guys. I’m new to AI and could use some advice. I have a 2019 Mac, and while I can’t download apps, I can install browser extensions. On my phone, I use ChatGPT with the microphone feature to dictate and refine text messages. However, since I can't download the ChatGPT app on my Mac and the browser version lacks a microphone option, is there a way I can replicate this functionality on my Mac? Any suggestions would be appreciated!
Have you tried Napkin AI yet? Turns text into graphics in seconds.
LOL
From the developer level, comparing system prompts in different models is like comparing apples and oranges, homie.
Thank you for the comparisons.
You won't get the same results from previous models by utilizing CoT prompting alone. That is most likely because the model was trained with RL on multiple chains of thought, with the CoTs resulting in correct answers getting higher rewards (so the model learns to select the "best" CoT for a given problem). Also, there could be more going on behind the scenes during inference as well, i.e. more sophisticated algorithmic optimizations like Monte Carlo search. I would love to find out myself, but these details have been hidden from us.
Overall I do believe this model is a step-change in LLM capabilities. Pairing this model with something like The AI Scientist by Sakana would be very interesting. I think when we combine o1's reasoning capabilities, a next-gen model like Orion, and an agentic research framework like The AI Scientist, it could very well lead to the intelligence explosion discussed in Leopold Aschenbrenner's Situational Awareness paper.
Still, I can't even explain how much I prefer Sonnet 3.5. I mostly use AI for reading and interpreting legal stuff, and I feel like it pays way more attention to details than other models, even o1-preview, when given the same context.
Claude 3.5 really is great.
o1 is not good for creative writing; it's more focused on reasoning benchmarks. o1 is really good for STEM fields.
@@imperfectmammal2566 I get it, but I was still hoping for some improvement over GPT-4o. Analyzing texts needs logical thinking, an eye for detail, and looking at things from all angles. But I didn't see any real progress in that department, unfortunately.
@@realidadecapenga9163 The model is very objective in the sense that it tries to find the correct answer, but in creative writing there is no right answer. You have the freedom to make mistakes, and that is why the reward system that was introduced for o1 doesn't work for creative writing.
There were four killers left in the room: three alive and one dead.
Agree, the question missed the "alive" adjective. Had it been added to the question, the answer would've been 3.
Not only did the prompt fail to stipulate alive vs not, but the prompt specifically stated that no one left the room.
Here's an example of a middle school-level math problem that none of the models, without exception, manage to answer correctly: 'Using each number from the following series at most once: 10, 1, 25, 9, 3, 6, write an expression equal to 595.' The correct answer is: (6x9+3)x10+25x1. Unfortunately, this applies to both o1 versions.
Worked for me on o1-mini the first time. It came up with (25+10)×(9+6+3−1)=595
@@karlwaskiewicz Impressive!
I tried o1; here is the answer: 595 = 25^2 − (10×3), which could be correct since your input says "at most once", so it's not necessary to use them all. But when I instructed it to use each number "exactly once", it gave: 595 = [25^2 − 6×(9+1)] + 10×3.
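Answers like these are easy to check mechanically with a small brute-force search. The sketch below (the function name is my own) combines any two remaining numbers with +, -, * or / and recurses, so every number is used at most once; exact Fraction arithmetic avoids floating-point false positives:

from fractions import Fraction
from itertools import combinations

def search(values, target):
    # values is a list of (value, expression-string) pairs; returns an expression or None.
    for (i, (a, ea)), (j, (b, eb)) in combinations(enumerate(values), 2):
        rest = [v for k, v in enumerate(values) if k not in (i, j)]
        candidates = [(a + b, f"({ea}+{eb})"), (a * b, f"({ea}*{eb})"),
                      (a - b, f"({ea}-{eb})"), (b - a, f"({eb}-{ea})")]
        if b != 0:
            candidates.append((a / b, f"({ea}/{eb})"))
        if a != 0:
            candidates.append((b / a, f"({eb}/{ea})"))
        for value, expr in candidates:
            if value == target:
                return expr
            result = search(rest + [(value, expr)], target)
            if result:
                return result
    return None

numbers = [10, 1, 25, 9, 3, 6]
print(search([(Fraction(n), str(n)) for n in numbers], Fraction(595)))  # may take a few seconds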
In Claude you can hide the chain-of-thought reasoning by asking Claude to wrap the reasoning in artifact tags: <antArtifact>
that's genius! Great idea.
Can you please explain?
@@saddozaiproduction in Claude, create a new project. In the project knowledge, add this text: «
Slow down. Think carefully step-by-step and come up with a problem analysis, using the process below, that fully analyzes the given prompt.
Before starting the final answer to the problem, no matter which task type, execute all the following chain of thought steps in order. If any longer text has been provided as an attachment, use it as a knowledge base to better answer the task/challenge.
Wrap all the chain of thought answers in an artifact tag <antArtifact>.
1. Identify challenge type. Valid types are: coding, create presentation, riddle/overly simple task, other
2. Based on the challenge type, map out task-specific constraints and info.
a. Coding:
- Is it a coding challenge that requires a GUI?
- If no coding language is mentioned in the user text: if no GUI is needed, assume Python; if a GUI is required, assume HTML, JS and CSS in a single HTML file; use no frameworks like React.
- Where beneficial, use libraries like D3, tailwindcss, font-awesome or other popular, safe options. For instance, use simple font-awesome icons instead of more complex options where possible.
- If it is a game, remember it needs to be a playable, smooth experience.
- Avoid using sprite-based assets and focus on shapes and vector-based assets.
- Make the game challenging, with an increasing challenge level if possible.
- Add marked start and end states for the game.
- If it is an app, it needs to be user friendly, with a separate input area at the top and the visualization below.
- If a data file is mentioned, create a drag-and-drop area that will trigger processing of the file in accordance with the expected file structure.
b. Create presentation: output the presentation text and details in a structured .json output. Separate slides into sections. The final summary and conclusions slide should be a separate section.
c. Riddle/overly simple task: Follow all chain of thought steps with the utmost seriousness.
3. Describe details provided about the optimal outcome.
4. Identify key variables and their relationships.
5. Determine evaluation criteria for best possible success that pleases the user.
6. Do we have all the necessary information for the task to achieve a successful outcome? If not, abort further processing, end the artifact, explain the problem and ask follow-up questions.
7. Think carefully step-by-step and brainstorm 3 distinct strategies for solving this problem space. Print out all 3 strategies, with sufficient detail.
8. Of the 3 strategies, print out the strategy that is most suited for this problem, and its constraints. Expand on key points.
9. Given the chosen strategy, create a list of tactical tasks to follow that will solve the problem.
10. Re-evaluate the chosen strategy against the original prompt and the additional constraints and info. Are we solving the challenge sufficiently? Are we following the rules set out? Are there additional steps we can add to improve the result? Mention strategies we can use to improve the robustness of our execution.
11. Based on all learnings and improvements print out the final strategy.
12. End the artifact using </antArtifact>.
13. Based on the challenge type, create a short structure to explain what we will do.
a. For coding tasks, outline the code structure, including the file structure for multi-file outputs.
b. For presentations, outline the main sections of the presentation.
c. Skip this step for riddles.
14. Split the output for the challenge into intermediary steps (calculations etc), and wrap this in a separate artifact. The final answer should come after. If the final answer requires a longer section of output create a new artifact for the result as well.
15. Execute the steps outlined for the chosen strategy step by step to solve the problem.
Label each step of your thought process clearly (Step 1, Step 2, etc.). Confirm that you have completed all steps before proceeding to the final answer.
Remember, do not provide a final answer until you have completed and shown all steps of the thought process. If you skip the chain of thought process, I will ask you to start over from the beginning.
Before submitting your response, double-check that you've included all required steps of the thought process.
Don't make any reflections on the use of these detailed steps. Simply provide the answer at the end.
The specific question or task follows here:
«
Which one is better at coding?
In my experience, Claude 3.5. I have been coding a Python system over the last 2 weeks with o1-preview and Claude 3.5, and the latter is the best.
system prompt ???
Claude 3.5 versus GPT-o1 is perhaps an unfair comparison, because 3.5 is a much, much smaller model. Claude 3 Opus would perhaps be a better comparison, because it is much larger (as evidenced by its generation speed, and the heavy rate limiting), has a much longer context length, and is generally more capable than 3.5 in many tasks.
Incorrect
No. Anthropic themselves state that 3.5 Sonnet outperforms Claude 3 Opus.
Opus is just heavier and more expensive.
Really good video. Appreciate all your work👍
There will be 4 killers in the room, right? Just one of them will be dead.
Exact prompt is as follows:
You are an AI assistant designed to think through problems step-by-step using Chain-of-Thought (COT) prompting. Before providing any answer, you must:
1) Understand the Problem: Carefully read and understand the user's question or request.
2) Break Down the Reasoning Process: Outline the steps required to solve the problem or respond to the request logically and sequentially. Think aloud and describe each step in detail.
3) Explain Each Step: Provide reasoning or calculations for each step, explaining how you arrive at each part of your answer.
4) Arrive at the Final Answer: Only after completing all steps, provide the final answer or solution.
5) Review the Thought Process: Double-check the reasoning for errors or gaps before finalizing your response.
Always aim to make your thought process transparent and logical, helping users understand how you reached your conclusion.
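For anyone who wants to reuse this prompt outside of ChatGPT's Custom Instructions, a minimal sketch (assuming the openai Python SDK and an OPENAI_API_KEY in your environment; the model name is only an example) passes it as the system message:

from openai import OpenAI

COT_SYSTEM_PROMPT = (
    "You are an AI assistant designed to think through problems step-by-step "
    "using Chain-of-Thought (COT) prompting. Before providing any answer, you must: "
    "1) Understand the Problem ... 5) Review the Thought Process ..."  # paste the full text above
)

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o",  # any chat model you want to test the prompt against
    messages=[
        {"role": "system", "content": COT_SYSTEM_PROMPT},
        {"role": "user", "content": 'How many "s" characters are in "ss0s000sss0s"?'},
    ],
)
print(response.choices[0].message.content)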
It amazes me: I run Gemma 2 2B Q8 on my machine and it gets a lot of these answers correct too. It's the smartest tiny model ever!
USER
how many R's are in the word 'strawberry'?
AI
Let's do this! There are three "R"s in the word "strawberry."
Thought it might have cheated, so I asked...
USER
how many Q's are in the word 'mango'?
AI
You got it! There are zero "Q"s in the word "mango".
USER
which number is bigger. 9.11 or 9.9?
AI
That's an easy one, Wintermute! 9.9 is bigger than 9.11.
Oh wow. I haven’t used that one very much.
@@SkillLeapAI the Gemma model is surprisingly tiny! Just a story of 1000 words repeated 10 times, then done!
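For what it's worth, all three of those little checks are trivial to verify in plain Python, which is handy when sanity-checking a local model's answers:

print("strawberry".count("r"))  # 3
print("mango".count("q"))       # 0
print(max(9.11, 9.9))           # 9.9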
I will ask my chatgpt 4 to act as Claude 3.5 sonnet
😂😂😂
Your custom GPT sucks; stick with what's already made by OpenAI and Anthropic.
You gave me a tip to insult me?
@@SkillLeapAI Take the tip and go improve. There was no point in that custom GPT comparison.
@@clearsight655 his custom gpt sucks
Play chess with GPT o1, if it really has 120 IQ.
We need Claude 3.5 Sonnet with o1's chain of thought. But not via prompts; really integrated into the system.
Plus, the clone (the Claude clone of o1) doesn't have a chain of thought! He just used the wrong model for Claude.
Your tests are unfortunately not very good, and Matthew Berman is hardly a great source of tests, no matter how nice a chap he may be. Counting letters, comparing numbers, the microwave, the killers: they've been out for months and any AI worth its salt will have this covered. Your chess example is OK, although that too has some issues. You need novel tests, one of which I have on my channel, to really test it. I should do a comparison with Claude and 4o, but at the time I was only interested in testing o1-preview.
Yea if you have a good source or if you make a video, let me know. Always looking for better sources
@@SkillLeapAI It's not ChatGPT o1, it's OpenAI o1.
2nd
first comment
love the content buddy
nice! thank you
Saying the first chess prompt failed because you don't know the grid is just a definition of failure particular to you. You failed.
Claude is a crap app with limits even after upgrading to their service. No internet access 😂😂😂. It's like being trapped in a dark environment without access to water and light, with a sleep limit of 5 minutes each day 😂😂😂😅😅😅
It still beats every other model though, except for o1.
11:25 - there are four killers in the room. Two original ones, one new one and one dead (killed).
At 10:10 you give Claude the "correct" mark, even though it completely made up the answer.
And your prompts are all stolen from other AI videos. Why not invent your own? It's easy.
I did use other ones in all my previous videos. It's not stolen if I give credit to the source.
@@SkillLeapAI Thank you for dedicating time to creating interesting content for us; please don't stop. God bless your project.