Claude-ception: Teaching Claude3 to prompt engineer itself
- Published Jun 1, 2024
- @alexalbert__ from @anthropicai had a great thread on Claude3-Opus prompt engineering. Write a prompt, run it with Claude3-Opus on a set of test cases, grade the responses, let Claude3-Opus use the grades to improve the prompt, and repeat iteratively until you are satisfied.
@WHinthorn and @rlancemartin show how to use LangSmith to simplify this process: (1) Create a dataset of test cases in LangSmith, (2) grade generations with feedback using the LangSmith Annotation Queue, and (3) run as an iterative improvement loop.
We apply this approach to paper summarization, asking Claude3 to summarize papers in the excellent communication style of @omarsar0. With feedback, Claude3 tunes its own summarization prompt and produces increasingly engaging paper summaries. This demonstrates a general strategy for automated prompt engineering.
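The loop described above can be sketched in a few lines. This is a minimal, hypothetical skeleton, not the code from the video: `generate`, `grade`, and `rewrite` are stand-in callables (in the workflow shown, grading happens in the LangSmith Annotation Queue and the rewrite is done by Claude3-Opus itself).

```python
def improve_prompt(prompt, test_cases, generate, grade, rewrite,
                   max_rounds=3, target=0.9):
    """Iteratively refine `prompt` until the average grade reaches `target`.

    Hypothetical sketch of the workflow: run, grade, rewrite, repeat.
    """
    for _ in range(max_rounds):
        # 1. Run the current prompt over the test cases.
        outputs = [generate(prompt, case) for case in test_cases]
        # 2. Grade each response (human feedback in the video; a callable here).
        grades = [grade(case, out) for case, out in zip(test_cases, outputs)]
        avg = sum(g["score"] for g in grades) / len(grades)
        if avg >= target:
            break
        # 3. Let the model rewrite its own prompt from the graded feedback.
        prompt = rewrite(prompt, grades)
    return prompt
```

In practice `generate` would call the model, `grade` would pull human feedback from the annotation queue, and `rewrite` would ask the model to revise its prompt given that feedback.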
Reference:
x.com/alexalbert__/status/176...
Code:
github.com/langchain-ai/langs...
⏰ Timestamps
-----------
00:00 Claude as a prompt engineer
00:50 Workflow overview
02:05 Creating the dataset
03:40 Initial predictions
04:30 Creating the annotation queue
05:05 Annotating examples - providing unstructured feedback
08:00 Regenerate the prompt
09:00 Review in the prompt hub
11:00 Updated predictions
11:45 Annotating again
13:30 Compare on new (validation) examples
14:30 Review
15:25 Conclusion
Always enjoy your videos. Learned something new, explained very clearly in a step-by-step format.
Is this only for text document datasets? What if my dataset is a series of customer information (transactions) and I want to process the dataset and send a tweet, email, or text to relevant users? Can it do this?
(This is Lance from the video.) What is the end goal you want? Do you want to tune the prompt to improve quality based on user feedback? If so, the same approach would work, but the dataset and the feedback would be different.
Bewildering.
Can you guys showcase a more complex scenario to really showcase the potential of the LangChain system?
(This is Lance from the video.) What use case would satisfy this? One idea is many-shot prompting, where you auto-generate prompts w/ many instructions.
@@r.lancemartin7992 Thanks for the reply. I don't know if I'm asking too much or something but I'm REALLY interested in knowing whether it makes sense to spawn 30-50 agents and have them work in coordination. With Haiku's reasoning capabilities, the context length, and the price, one should be able to spawn a lot of agents for a low price and have them work on tasks autonomously, where each agent is a specialist with its own set of tools to call. Theoretically each task is just the application of tools in a specific order. I would be really interested in knowing if a big cluster in LangGraph, for example, would collapse or sustain itself. Something like "automating financial forecasting, using the insights for refined ideas and pitching new innovations based on the analysis while getting and sending constant data streams to SAP" or something like that, idk. Or maybe really advanced research clusters, or showcasing how different clusters work together. Knowing that it can or can't work at all would help a great deal in knowing where to allocate my time.
Hey thanks for the answer. Mostly I'm interested in big graphs with 30-50 agents that maybe use Haiku for cost reasons and do some bigger coordinated tasks. I'm really interested in knowing if such structures can be sustained or if they collapse. Maybe it would be cool to see many graphs collaborating, with each one being specialized in some task structure with its own instructions and toolsets. Each task in my opinion is the application of tools at a given time, so theoretically one should be able to have a financial analysis agent, for example, that is connected to some dataset and sends the results of the analysis to some structural management cluster of agents that think about the measures one should take from the financial analysis, or other highly applicable use cases. What I'm most interested in is a demonstration of whether big clusters can be orchestrated or if they collapse.
I wonder if at some point we don't create a very simple four-column dataset of examples {context, good response, better response, great response} as a reusable artifact to drive better responses?
An interesting use case, but the code to pull in the formatted_feedback is failing due to some annotation records having no output. Being able to delete records from the annotation queue would help.
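Until records can be deleted, one workaround is to filter out records with missing outputs before formatting the feedback. A minimal sketch, assuming the records are simplified dicts with an "output" key (a stand-in for the actual LangSmith run objects):

```python
def filter_complete(records):
    """Keep only annotation records that actually have an output.

    `records` is assumed to be a list of dicts with an "output" key;
    records with a missing, None, or empty output are dropped.
    """
    return [r for r in records if r.get("output")]
```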
I was getting rate limiting errors with the langchain_anthropic client. Exponential backoff would definitely help.
(This is Lance from the video.) Will look into this!
The dependency on langsmith is a little frustrating but makes sense
The original tweet from Anthropic didn't use LangSmith, so the workflow is totally do-able w/o it
Does the same work for any LLM?
Technically yes, but you won't get this quality since Opus is insanely good
it will depend
How do you create the diagrams? Any AI tool?
Excalidraw