Naman Jain - "LiveCodeBench: Holistic and contamination free evaluation of LLMs for code"

  • Published on 7 Sep 2024
  • Friday 12 July 2024, noon (EDT)
    Toronto Data Workshop
    Naman Jain, UC Berkeley
    “LiveCodeBench: Holistic and contamination free evaluation of large language models for code”
    In this talk, we introduce LiveCodeBench, a comprehensive, contamination-free benchmark for code LLMs that continuously collects new problems from LeetCode, AtCoder, and CodeForces. LiveCodeBench evaluates a broad range of capabilities, including self-repair, code execution, and test output prediction. It currently hosts 400 coding problems published between May 2023 and May 2024. We evaluated 18 base LLMs and 34 instruction-tuned LLMs, and present findings on contamination, performance comparisons, and potential overfitting. (A minimal sketch of the contamination-avoidance idea appears after the speaker bio below.)
    Naman Jain is a CS Ph.D. student at UC Berkeley, focusing on machine learning for developer-productivity tools such as program analysis, synthesis, and repair. He also explores how synthesis and verification techniques can improve the generalizability and explainability of algorithms. He holds an undergraduate degree from IIT Bombay, where he researched NLP robustness and computer vision. Before his Ph.D., he was a predoctoral research fellow at Microsoft Research India, working on program repair, improving large language models, and learning decision trees.
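
    A minimal sketch of the contamination-avoidance idea described in the abstract: evaluate a model only on problems released after its training cutoff, so its training data cannot have included them. The names below (Problem, MODEL_CUTOFFS, the sample problems, and the cutoff dates) are illustrative assumptions, not the benchmark's actual API or data.

    from dataclasses import dataclass
    from datetime import date

    @dataclass
    class Problem:
        title: str
        source: str          # e.g. "LeetCode", "AtCoder", "CodeForces"
        release_date: date   # when the problem was first published online

    # Hypothetical training-data cutoffs for two models.
    MODEL_CUTOFFS = {
        "model-a": date(2023, 9, 1),
        "model-b": date(2024, 3, 1),
    }

    def uncontaminated(problems: list[Problem], model: str) -> list[Problem]:
        """Keep only problems published after the model's training cutoff,
        so the model cannot have seen them during training."""
        cutoff = MODEL_CUTOFFS[model]
        return [p for p in problems if p.release_date > cutoff]

    problems = [
        Problem("two-sum-variant", "LeetCode", date(2023, 6, 15)),
        Problem("abc-350-d", "AtCoder", date(2024, 4, 20)),
    ]
    for model in MODEL_CUTOFFS:
        eligible = uncontaminated(problems, model)
        print(model, "->", [p.title for p in eligible])

    Because new problems arrive continuously, the eligible set shifts with each model's cutoff, which is what keeps the benchmark "live" rather than a fixed snapshot.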
