Best Practices on Synthetic Data for LLMs

Comments

  • @spencerfunk6697 7 months ago +1

    Synthetic data is awesome, but garbage in, garbage out. I’ve been trying to come up with a way to automate the fact-checking process for synthetically generated data. Super cool, thanks for sharing.

    • @fahdmirza 7 months ago +1

      Great point. Let me know when you have some automation to create synthetic data. Always keen to learn more ways of dealing with datasets. Thanks.

    • @kevon217 7 months ago

      Thanks for putting this paper on my radar. Very timely.
      I’ve found DSPy to be a useful framework for programming synthetic data generation and evaluation pipelines, particularly in conjunction with its Pydantic integration. I haven’t taken it very far, though, or tried to scale it up. Curious what other frameworks or approaches people here have found useful and manageable for generating high-quality synthetic data.