Naman Jain - "LiveCodeBench: Holistic and contamination free evaluation of LLMs for code"

  • Published on 7 Sep 2024
  • Friday 12 July 2024, noon (EDT)
    Toronto Data Workshop
    Naman Jain, UC Berkeley
    “LiveCodeBench: Holistic and contamination free evaluation of large language models for code”
    In this talk, we introduce LiveCodeBench, a comprehensive, contamination-free benchmark for code LLMs that continuously collects new problems from LeetCode, AtCoder, and CodeForces. LiveCodeBench evaluates a broad range of capabilities, including self-repair, code execution, and test output prediction. It currently hosts 400 coding problems published between May 2023 and May 2024. We evaluated 18 base LLMs and 34 instruction-tuned LLMs, and present findings on contamination, performance comparisons, and potential overfitting. (A minimal sketch of the contamination-avoidance idea appears after the speaker bio below.)
    Naman Jain is a CS Ph.D. student at UC Berkeley, focusing on machine learning for developer-productivity tools such as program analysis, synthesis, and repair. He also explores how synthesis and verification techniques can improve the generalizability and explainability of algorithms. He holds an undergraduate degree from IIT Bombay, where he researched NLP robustness and computer vision. Before his Ph.D., he was a predoctoral research fellow at Microsoft Research India, working on program repair, improving large language models, and learning decision trees.
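
    A minimal sketch of the contamination-avoidance idea described in the abstract: evaluate a model only on problems released after its training cutoff, so its training data cannot have included them. The names below (Problem, MODEL_CUTOFFS, the sample problems, and the cutoff dates) are illustrative assumptions, not the benchmark's actual API or data.

    from dataclasses import dataclass
    from datetime import date

    @dataclass
    class Problem:
        title: str
        source: str          # e.g. "LeetCode", "AtCoder", "CodeForces"
        release_date: date   # when the problem was first published online

    # Hypothetical training-data cutoffs for two models.
    MODEL_CUTOFFS = {
        "model-a": date(2023, 9, 1),
        "model-b": date(2024, 3, 1),
    }

    def uncontaminated(problems: list[Problem], model: str) -> list[Problem]:
        """Keep only problems published after the model's training cutoff,
        so the model cannot have seen them during training."""
        cutoff = MODEL_CUTOFFS[model]
        return [p for p in problems if p.release_date > cutoff]

    problems = [
        Problem("two-sum-variant", "LeetCode", date(2023, 6, 15)),
        Problem("abc-350-d", "AtCoder", date(2024, 4, 20)),
    ]
    for model in MODEL_CUTOFFS:
        eligible = uncontaminated(problems, model)
        print(model, "->", [p.title for p in eligible])

    Because new problems arrive continuously, the eligible set shifts with each model's cutoff, which is what keeps the benchmark "live" rather than a fixed snapshot.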
