Synthetic data is awesome, but garbage in, garbage out. I’ve been trying to come up with a way to automate the fact-checking process for synthetically generated data. Super cool, thanks for sharing!
Great point. Let me know when you have some automation to create synthetic data. Always keen to learn more ways of dealing with datasets. Thanks.
Thanks for putting this paper on my radar. Very timely.
I’ve found DSPy to be a useful framework for programming synthetic data generation and evaluation pipelines, particularly in conjunction with its Pydantic integration. I haven’t pushed it very far or tried to scale it up, though. Curious what other frameworks or approaches people here have found useful and manageable for generating high-quality synthetic data.