The Hierarchy of Needs for Training Dataset Development: Chang She and Noah Shpak

  • Published on 15 Oct 2024
  • Training and fine-tuning models depends critically on how you construct your dataset. Building a dataset is part art, part science. We'll share practical lessons in dataset construction learned at Character AI and show how to build a data platform that supports rapid, iterative refinement of training data. For LLMs, data scale is much larger and workloads are more diverse, especially for multimodal datasets. To deal with these challenges, we'll show you how LanceDB is used in production to solve many pain points around storing, managing, and querying large-scale AI data.
    Recorded live in San Francisco at the AI Engineer World's Fair. See the full schedule of talks at www.ai.enginee... & join us at the AI Engineer World's Fair in 2025! Get your tickets today at ai.engineer/2025
    About Chang
    Chang She is the CEO and cofounder of LanceDB, the developer-friendly, open-source database for multimodal AI. A serial entrepreneur, Chang has been building DS/ML tooling for nearly two decades and is one of the original contributors to the pandas library. Prior to founding LanceDB, Chang was VP of Engineering at TubiTV, where he focused on personalized recommendations and ML experimentation.
    About Noah
    Noah is a Research Engineer with a passion for building data systems and ML platforms from the ground up.
    He leads the Data Platform team at Character AI, focusing on accelerating foundation model research, alignment, and product development through internet-scale data mining, prompting tools, and retrieval systems. Making data go vroom while GPUs go brrrr is what makes him (and the team) tick!

Comments • 1

  • @maxjesch (3 hours ago)

    great talk but annoying background noise