Meetup: Evaluating LLMs: Needle in a Haystack

  • Published Sep 21, 2024
  • LLM evaluation is a discipline where confusion reigns and foundation model builders are effectively grading their own homework.
    Building on the viral threads on X/Twitter, Greg Kamradt, Robert Nishihara, and Jason Lopatecki discuss highlights from Arize AI's ongoing research into how major foundation models, from OpenAI’s GPT-4 to Mistral and Anthropic’s Claude, stack up against one another on important tasks and emerging LLM use cases. They cover Needle in a Haystack results and other evals spanning hallucination detection on private data, question answering, code functionality, and more (an illustrative sketch of the Needle in a Haystack setup follows this description).
    ​Curious which foundation models your company should be using for a specific use case - and which to avoid? You won’t want to miss this meetup!
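
For context on the test named above, here is a minimal, hedged sketch of how a Needle in a Haystack evaluation is typically structured: a short "needle" fact is buried at varying depths inside long filler context, and the model is asked to retrieve it across a grid of context lengths and depths. The `model_complete` function, the specific needle text, and the prompt wording below are placeholders for illustration, not the speakers' actual harness.

```python
# Minimal Needle-in-a-Haystack sketch (illustrative only; not Arize's or Greg Kamradt's harness).
# A "needle" fact is inserted at a chosen depth into filler text, and the model
# is asked to recall it. `model_complete` is a placeholder for the LLM you evaluate.

def model_complete(prompt: str) -> str:
    raise NotImplementedError("Replace with a call to your LLM provider.")

NEEDLE = "The best thing to do in San Francisco is to eat a sandwich in Dolores Park."
QUESTION = "What is the best thing to do in San Francisco?"
FILLER_SENTENCE = "The quick brown fox jumps over the lazy dog. "

def build_haystack(context_len_chars: int, depth_fraction: float) -> str:
    """Build roughly `context_len_chars` characters of filler and insert the
    needle at `depth_fraction` (0.0 = start of context, 1.0 = end)."""
    repeats = context_len_chars // len(FILLER_SENTENCE) + 1
    filler = (FILLER_SENTENCE * repeats)[:context_len_chars]
    insert_at = int(len(filler) * depth_fraction)
    return filler[:insert_at] + " " + NEEDLE + " " + filler[insert_at:]

def run_test(context_len_chars: int, depth_fraction: float) -> bool:
    """Return True if the model's answer recalls the needle's key phrase."""
    haystack = build_haystack(context_len_chars, depth_fraction)
    prompt = f"{haystack}\n\nAnswer using only the document above: {QUESTION}"
    answer = model_complete(prompt)
    return "dolores park" in answer.lower()

if __name__ == "__main__":
    # Sweep context length and needle depth to produce the familiar pass/fail heatmap grid.
    for length in (2_000, 8_000, 32_000):
        for depth in (0.0, 0.25, 0.5, 0.75, 1.0):
            print(f"len={length:>6} depth={depth:.2f} pass={run_test(length, depth)}")
```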

Comments • 2

  • @antonidabrowski4657 • 5 months ago

    Good content, thanks for your research

  • @sennetor • 6 months ago

    First Impressions! So human. :)