Deep Dive into LLM Evaluation with Weights & Biases

DeepLearningAI

17,587 views

Published 14 Aug 2023
In the fast-moving world of Large Language Models (LLMs), we can now build smart systems on top of our own data. As with any other piece of automation software, it's essential to take the time to evaluate these LLM systems. In this webinar, we'll dive into how to evaluate them effectively, with a particular focus on Retrieval Augmented Generation (RAG) systems.
We'll start by discussing the 'eyeballing' technique (inspecting model outputs by hand) and why Weights & Biases Prompts stands out as the first great tool in this area. Then we'll move on to supervised evaluation, explaining why it's worth considering and pointing out some of its limitations.
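Supervised evaluation compares a system's answers against a labeled set of reference answers. As a minimal sketch (not the presenters' code), the functions below score predictions with exact match and token-level F1, assuming the answer normalization used by the SQuAD v1.1 evaluation script: lowercase, strip punctuation and the articles a/an/the, then collapse whitespace. The function names are our own.

```python
import re
import string
from collections import Counter

_PUNCT = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall (partial credit)."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    # Multiset intersection counts each shared token at most min(count) times.
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def evaluate(predictions: list[str], references: list[str]) -> dict[str, float]:
    """Average exact match and F1 over a labeled evaluation set."""
    n = len(references)
    em = sum(exact_match(p, r) for p, r in zip(predictions, references)) / n
    f1 = sum(token_f1(p, r) for p, r in zip(predictions, references)) / n
    return {"exact_match": em, "f1": f1}
```

For example, `evaluate(["The Eiffel Tower", "Paris, France"], ["Eiffel Tower", "Paris"])` gives an exact match of 0.5 and an F1 of about 0.83: the second answer is correct but wordier, so exact match rejects it while F1 still awards partial credit. That strictness is one of the limitations of purely string-based supervised metrics.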
To wrap things up, we'll look at how LLMs can be used to evaluate themselves, from generating their own evaluation datasets, to applying standard metrics such as SQuAD's exact match and F1 or BLEU, to evaluating retrieval systems.
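Evaluating the retrieval step of a RAG system usually means checking whether the chunk that holds the answer appears among the retriever's results. A minimal sketch of two common retrieval metrics, hit rate@k and mean reciprocal rank (MRR), assuming each evaluation question is paired with the ID of a single relevant chunk (for instance, the chunk an LLM was shown when it generated that question). The document IDs and function names are hypothetical.

```python
from typing import Sequence


def hit_rate_at_k(retrieved: Sequence[Sequence[str]],
                  relevant: Sequence[str], k: int) -> float:
    """Fraction of queries whose relevant chunk appears in the top-k results."""
    hits = sum(1 for docs, gold in zip(retrieved, relevant) if gold in docs[:k])
    return hits / len(relevant)


def mean_reciprocal_rank(retrieved: Sequence[Sequence[str]],
                         relevant: Sequence[str]) -> float:
    """Average of 1/rank of the relevant chunk (0 when it was not retrieved)."""
    total = 0.0
    for docs, gold in zip(retrieved, relevant):
        for rank, doc_id in enumerate(docs, start=1):
            if doc_id == gold:
                total += 1.0 / rank
                break  # only the first occurrence counts
    return total / len(relevant)
```

With `retrieved = [["d3", "d1", "d7"], ["d2", "d5", "d9"], ["d4", "d8", "d6"]]` and `relevant = ["d1", "d2", "d0"]`, hit rate@1 is 1/3, hit rate@3 is 2/3, and MRR is 0.5. Hit rate only asks whether the chunk was found; MRR also rewards ranking it near the top, which matters when only the first few chunks fit into the prompt.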
On top of all that, we'll also touch on how W&B Sweeps, an excellent tool for hyperparameter optimization, can be used to find the configuration that best balances high accuracy against low cost. The session will end with a Q&A with the presenters.
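Balancing accuracy against cost requires folding both into one number to optimize. W&B Sweeps automates the search (grid, random, or Bayesian) and logs every run, with the sweep's `metric` set to that number and `goal: maximize`. The sketch below does not use W&B at all; it only mimics a grid strategy locally so the trade-off is visible. The search-space names, the linear objective, the cost weight, and `toy_run_eval` with its made-up numbers are all hypothetical.

```python
import itertools
from typing import Callable

# Hypothetical search space, written in the {"values": [...]} shape that a
# W&B Sweeps "parameters" section uses for discrete choices.
SEARCH_SPACE = {
    "model": {"values": ["small-model", "large-model"]},
    "top_k": {"values": [2, 4, 8]},
}


def objective(accuracy: float, cost_usd: float, cost_weight: float) -> float:
    """Single score to maximize: reward accuracy, penalize spend linearly."""
    return accuracy - cost_weight * cost_usd


def grid_search(run_eval: Callable[[dict], tuple[float, float]],
                space: dict, cost_weight: float = 1.0) -> tuple[dict, float]:
    """Evaluate every combination and return the best config and its score."""
    names = list(space)
    best_config, best_score = None, float("-inf")
    for combo in itertools.product(*(space[n]["values"] for n in names)):
        config = dict(zip(names, combo))
        accuracy, cost = run_eval(config)
        score = objective(accuracy, cost, cost_weight)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_run_eval(config: dict) -> tuple[float, float]:
    """Toy stand-in for a real evaluation run; the numbers are illustrative only."""
    base = 0.80 if config["model"] == "large-model" else 0.70
    accuracy = base + 0.01 * config["top_k"]  # more context helps a little
    price_per_chunk = 0.02 if config["model"] == "large-model" else 0.005
    cost = price_per_chunk * config["top_k"]  # more context costs more
    return accuracy, cost
```

With these toy numbers, `grid_search(toy_run_eval, SEARCH_SPACE, cost_weight=1.0)` picks `large-model` with `top_k=2`, while `cost_weight=5.0` tips the choice to `small-model` with `top_k=2`. The cost weight encodes how much accuracy you are willing to trade per dollar, so it is a product decision rather than something the search can discover.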
This workshop builds on the foundations of DeepLearning.AI's course Evaluating & Debugging Generative AI, created in collaboration with the Weights & Biases team. Everything covered in the workshop is presented as continuing education from the course.
Event Agenda
40-minute Workshop
10-minute Q&A: Answering questions from the audience.
About the Speakers:
Morgan McGuire - Growth Director at Weights & Biases
Morgan leads the Growth ML team and is an ML Engineer at Weights & Biases. He has a background in NLP and previously worked on the Safety team at Facebook, where he helped classify and flag potentially high-severity content for removal.
Ayush Thakur - Machine Learning Engineer at Weights & Biases
Ayush is an MLE at Weights & Biases and a Google Developer Expert in Machine Learning (TensorFlow). He is interested in all things computer vision and representation learning. For the past 7 months he's been working with LLMs and has covered RLHF and the how and what of building LLM-based systems.
Carey Phelps - Founding Product Manager at Weights & Biases
Carey is the founding Product Manager at Weights & Biases. She studied computer science at Stanford and went on to found Carta Healthcare before joining Weights & Biases.

Comments • 6

@philipegger4599 (10 months ago)
It seems that on slide 32, "Human Eval" and "User Testing" have inexplicably switched places.
@ayushthakur736 (10 months ago)
Thanks for catching. Corrected.
@420_gunna (2 months ago)
I get that content marketing is always going to be thinly veiled product marketing, but this was a little too on the nose, and not enough meat otherwise
@Gus-AI-World (10 months ago)
this is a disaster presentation. At 10 mins Ayush starts a business presentation. Then nothing can be seen because of the small sized font. I do not know who is your target audience for this? then suddenly Ayush jumps to somehow "it works" wandb screen for LLMs.