Claude-ception: Teaching Claude3 to prompt engineer itself
- Published Jun 1, 2024
- @alexalbert__ from @anthropicai had a great thread on Claude3-Opus prompt engineering. Write a prompt, run it with Claude3-Opus on a set of test cases, grade the responses, let Claude3-Opus use the grades to improve the prompt, and repeat iteratively until you are satisfied.
@WHinthorn and @rlancemartin show how to use LangSmith to simplify this process: (1) Create a dataset of test cases in LangSmith, (2) grade generations with feedback using the LangSmith Annotation Queue, and (3) run as an iterative improvement loop.
We apply this approach to paper summarization, asking Claude3 to summarize papers in the excellent communication style of @omarsar0. With feedback, Claude3 tunes its own summarization prompt and produces increasingly engaging paper summaries. This demonstrates a general strategy for automated prompt engineering.
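The loop described above can be sketched in a few lines. This is a minimal, hypothetical skeleton, not the code from the video: `generate`, `grade`, and `rewrite` are stand-in callables (in the workflow shown, grading happens in the LangSmith Annotation Queue and the rewrite is done by Claude3-Opus itself).

```python
def improve_prompt(prompt, test_cases, generate, grade, rewrite,
                   max_rounds=3, target=0.9):
    """Iteratively refine `prompt` until the average grade reaches `target`.

    Hypothetical sketch of the workflow: run, grade, rewrite, repeat.
    """
    for _ in range(max_rounds):
        # 1. Run the current prompt over the test cases.
        outputs = [generate(prompt, case) for case in test_cases]
        # 2. Grade each response (human feedback in the video; a callable here).
        grades = [grade(case, out) for case, out in zip(test_cases, outputs)]
        avg = sum(g["score"] for g in grades) / len(grades)
        if avg >= target:
            break
        # 3. Let the model rewrite its own prompt from the graded feedback.
        prompt = rewrite(prompt, grades)
    return prompt
```

In practice `generate` would call the model, `grade` would pull human feedback from the annotation queue, and `rewrite` would ask the model to revise its prompt given that feedback.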
Reference:
x.com/alexalbert__/status/176...
Code:
github.com/langchain-ai/langs...
⏰ Timestamps
-----------
00:00 Claude as a prompt engineer
00:50 Workflow overview
02:05 Creating the dataset
03:40 Initial predictions
04:30 Creating the annotation queue
05:05 Annotating examples - providing unstructured feedback
08:00 Regenerate the prompt
09:00 Review in the prompt hub
11:00 Updated predictions
11:45 Annotating again
13:30 Compare on new (validation) examples
14:30 Review
15:25 Conclusion
Always enjoy your videos. Learned something new, explained very clearly in a step-by-step format.
Is this only for text document datasets? What if my dataset is a series of customer information (transactions) and I want to process the dataset and send a tweet, email, or text to relevant users? Can it do this?
(This is Lance from the video.) What is the end goal you want? Do you want to tune the prompt to improve quality based on user feedback? If so, the same approach would work, but the dataset and the feedback would be different.
Bewildering.
Can you guys showcase a more complex scenario to really showcase the potential of the LangChain system?
(This is Lance from the video.) What use case would satisfy this? One idea is many-shot prompting, where you auto-generate prompts w/ many instructions.
@@r.lancemartin7992 Thanks for the reply. I don't know if I'm asking too much or something but I'm REALLY interested in knowing whether it makes sense to spawn 30-50 agents and have them work in coordination. With Haiku's reasoning capabilities, the context length, and the price, one should be able to spawn a lot of agents for a low price and have them work on tasks autonomously, where each agent is a specialist with its own set of tools to call. Theoretically each task is just the application of tools in a specific order. I would be really interested in knowing if a big cluster in LangGraph, for example, would collapse or sustain itself. Something like "automating financial forecasting, using the insights for refined ideas and pitching new innovations based on the analysis while getting and sending constant data streams to SAP" or something like that, idk. Or maybe really advanced research clusters, or showcasing how different clusters work together. Knowing that it can or can't work at all would help a great deal in knowing where to allocate my time.
Hey thanks for the answer. Mostly I'm interested in big graphs with 30-50 agents that maybe use Haiku for cost reasons and do some bigger coordinated tasks. I'm really interested in knowing if such structures can be sustained or if they collapse. Maybe it would be cool to see many graphs collaborating, with each one being specialized in some task structure with its own instructions and toolsets. Each task in my opinion is the application of tools at a given time, so theoretically one should be able to have a financial analysis agent, for example, that is connected to some dataset and sends the results of the analysis to some structural management cluster of agents that think about the measures one should take from the financial analysis, or other highly applicable use cases. What I'm most interested in is a demonstration of whether big clusters can be orchestrated or if they collapse.
I wonder if at some point we don't create a very simple four-column dataset of examples {context, good response, better response, great response} as a reusable artifact to drive better responses?
An interesting use case, but the code to pull in the formatted_feedback is failing due to some annotation records having no output. Being able to delete records from the annotation queue would help.
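Until records can be deleted, one workaround is to filter out records with missing outputs before formatting the feedback. A minimal sketch, assuming the records are simplified dicts with an "output" key (a stand-in for the actual LangSmith run objects):

```python
def filter_complete(records):
    """Keep only annotation records that actually have an output.

    `records` is assumed to be a list of dicts with an "output" key;
    records with a missing, None, or empty output are dropped.
    """
    return [r for r in records if r.get("output")]
```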
I was getting rate limiting errors with the langchain_anthropic client. Exponential backoff would definitely help.
(This is Lance from the video.) Will look into this!
The dependency on langsmith is a little frustrating but makes sense
The original tweet from Anthropic didn't use LangSmith, so the workflow is totally do-able w/o it
Does the same work for any LLM?
Technically yes, but you won't get this quality since Opus is insanely good
it will depend
How do you create the diagrams? Any AI tool?
Excalidraw