Thanks for the breakdown. Hmm, with DSPy, I am still missing a way to focus on the "for which cases didn't it work, and how could that be mitigated/tackled" route. Of course, you don't want to focus only on those; you'd certainly want to keep the performance as good as possible for the ones that already worked. However, quite often blindly optimizing against a score is a kind of senseless exercise if the failures come from mislabeled data (which is the first thing I would look at if a large LLM can't solve such a task). This is just from experience: if you have flawed data, that might hurt the actual downstream application, because the optimization process might draw too much on it. Great content; keep going!
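A rough sketch of the "look at the failures first" step described above, in plain Python rather than any DSPy API. Everything here is a hypothetical stand-in: `toy_predict` plays the role of whatever program you are evaluating, `exact_match` is just one possible metric, and the two-item `devset` is made up, with one label deliberately wrong to show how a mislabeled example surfaces as a "failure".

```python
from typing import Callable, Dict, List, Tuple

Example = Dict[str, str]


def exact_match(gold: str, pred: str) -> bool:
    # Hypothetical metric: case-insensitive exact match.
    return gold.strip().lower() == pred.strip().lower()


def split_by_outcome(
    examples: List[Example],
    predict: Callable[[str], str],
    metric: Callable[[str, str], bool] = exact_match,
) -> Tuple[List[Example], List[Example]]:
    """Run predict on every example and return (passed, failed) records."""
    passed: List[Example] = []
    failed: List[Example] = []
    for ex in examples:
        pred = predict(ex["question"])
        record = {**ex, "prediction": pred}
        (passed if metric(ex["answer"], pred) else failed).append(record)
    return passed, failed


def toy_predict(question: str) -> str:
    # Stand-in for the real (optimized) program; assumption for illustration only.
    answers = {"2+2?": "4", "Capital of France?": "Paris"}
    return answers.get(question, "unknown")


devset = [
    {"question": "2+2?", "answer": "4"},
    {"question": "Capital of France?", "answer": "Lyon"},  # deliberately mislabeled
]

passed, failed = split_by_outcome(devset, toy_predict)

# Review failures by hand before optimizing: is the model wrong, or the label?
for record in failed:
    print(record)
```

Keeping the `passed` list around also gives you a regression set, so a later fix for the failures can be checked against the cases that already worked.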
Thanks for the breakdown. Have you had success with it in production? It seems like in your examples, the performance didn’t go up significantly over baseline until there was fine-tuning of the actual weights. A trial-and-error prompt engineering approach might yield similar results if there is a test set to check against.
Can it create the JSON / struct (I haven't watched the full video yet) as part of the optimized prompt?
Yesser
Is dspy harder to use for complicated prompts?
Does it make sense to pair it with Pydantic AI?
Yes definitely
Danke!
Thank you for supporting the channel! 🙏