Here are a few key takeaways from Yan's talk on evaluating large language models:
1. Evaluation is important for identifying model improvements, selecting the best model for a use case, and determining if a model is ready for production.
2. Desired properties of evaluation include scalability, relevance, discriminative power, interpretability, reproducibility, and robustness to gaming.
3. Evaluating LLMs is challenging due to the diverse range of possible tasks and the open-ended nature of responses.
4. Common evaluation approaches include:
- Converting open-ended tasks into closed-ended ones (e.g. multiple choice)
- Reference-based heuristics that compare model outputs to human-written references (a minimal sketch follows this list)
- Human evaluation
- LLM-based evaluation using another model as judge
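A minimal sketch of a reference-based heuristic, assuming a simple token-overlap F1 score (a toy stand-in in the spirit of metrics like ROUGE, not the exact metric from the talk):

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a model output and a human reference.

    A toy stand-in for reference-based metrics such as ROUGE/BLEU;
    real evaluations use the official metric implementations.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Count tokens that appear in both strings (respecting multiplicity).
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

# Example usage
print(token_f1("Paris is the capital of France", "The capital of France is Paris"))
```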
5. LLM-based evaluation (e.g. Alpaca Eval) scales well and can correlate strongly with human judgments, but it requires careful design to avoid judge biases such as position and length preferences.
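A rough sketch of the pairwise LLM-as-judge idea behind tools like Alpaca Eval. Here `ask_judge` is a placeholder for whatever LLM API is used; averaging over both answer orderings to reduce position bias is a common practice I'm assuming, not a detail quoted from the talk:

```python
from typing import Callable

JUDGE_TEMPLATE = """You are comparing two answers to the same instruction.
Instruction: {instruction}
Answer A: {answer_a}
Answer B: {answer_b}
Reply with a single letter, A or B, for the better answer."""

def pairwise_win(instruction: str, model_answer: str, baseline_answer: str,
                 ask_judge: Callable[[str], str]) -> float:
    """Return 1.0 if the judge prefers the model, 0.0 if it prefers the
    baseline, averaged over both answer orderings to reduce position bias."""
    score = 0.0
    for answer_a, answer_b, model_is_a in [
        (model_answer, baseline_answer, True),
        (baseline_answer, model_answer, False),
    ]:
        prompt = JUDGE_TEMPLATE.format(
            instruction=instruction, answer_a=answer_a, answer_b=answer_b
        )
        verdict = ask_judge(prompt).strip().upper()
        model_won = (verdict.startswith("A") and model_is_a) or (
            verdict.startswith("B") and not model_is_a
        )
        score += 0.5 if model_won else 0.0
    return score
```

Averaging these scores over an instruction set gives a win rate against the baseline model.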
6. Challenges include inconsistency across evaluation implementations, contamination of test data in training sets, rapid saturation of benchmarks, and incentives to keep reporting outdated metrics.
7. Future directions may involve more human-in-the-loop approaches (like rubric-based evaluation) and using LLMs to generate more targeted evaluation instructions.
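A sketch of what rubric-based scoring could look like: each criterion is scored separately (by a placeholder `score_criterion` callable, which could be a human rater or an LLM judge) and then aggregated. The rubric items below are illustrative assumptions, not taken from the talk:

```python
from typing import Callable, Dict

# Illustrative rubric; a real one would be written per task or per instruction.
RUBRIC = {
    "follows_instruction": "Does the response address every part of the prompt?",
    "factual_accuracy": "Are the factual claims in the response correct?",
    "clarity": "Is the response clear and well organized?",
}

def rubric_score(response: str,
                 score_criterion: Callable[[str, str], int],
                 rubric: Dict[str, str] = RUBRIC) -> float:
    """Average per-criterion scores (e.g. on a 1-5 scale) into one number."""
    scores = [score_criterion(response, question) for question in rubric.values()]
    return sum(scores) / len(scores)
```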
8. Overall, evaluation of LLMs remains an active area of research with many open challenges to address.
Thank you prof!