"Make Agent 10x cheaper, faster & better?" - LLM System Evaluation 101

AI Jason

Views: 16,516

Published on 8 Jul 2024
LLM System Eval 101 - Build better agents
Get free HubSpot report of how to land a Job using AI: clickhubspot.com/fo2
🔗 Links
- Follow me on twitter: / jasonzhou1993
- Join my AI email list: www.ai-jason.com/
- My discord: / discord
- Langsmith: smith.langchain.com/
- Phoenix: phoenix.arize.com/
- Arize LLM Evaluation guide: arize.com/blog-course/llm-eva...
- Web scraping agent video: • “Wait, this Agent can ...
- Signup for universal web scraper: forms.gle/zN9w9UyhMKx59yAE6
⏱️ Timestamps
0:00 Intro
0:27 Why Eval is important
3:30 LLM as evaluator
5:54 How to build eval system
15:10 Case study - Eval & improve research agent
👋🏻 About Me
My name is Jason Zhou, a product designer who shares interesting AI experiments & products. Email me if you need help building AI apps! ask@ai-jason.com
#gpt4o #aiagents #rag #llamaparse #llamaindex #gpt5 #autogen #gpt4 #autogpt #ai #artificialintelligence #tutorial #stepbystep #openai #llm #chatgpt #largelanguagemodels #largelanguagemodel #bestaiagent #chatgpt #agentgpt #agent #babyagi #evaluation
Science & Technology

Comments • 33

@Jim-ey3ry several months ago ⁺²³
This is gold. Most people just show you how to build a toy demo, but not many actually get into the details of how to get into production. Thank you, Jason!
@xXWillyxWonkaXx several months ago
Couldn't agree more. This is gold.
@apereiracv several months ago ⁺⁷
I recently created a whole testing system for our LLM chatbots, and we did exactly this:
LLM as evaluator and code.
We created it as a series of unit tests with LLM-generated cases.
Since our results were mostly conversational, we made tests pass/fail according to a scoring system.
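A minimal sketch of the pattern the comment above describes: unit tests that pass or fail according to a score. This is not the commenter's actual system. The names `keyword_judge`, `passes`, `PASS_THRESHOLD`, and the test cases are all hypothetical, and the keyword-overlap scorer is only a stand-in for a real LLM evaluator call.

```python
import unittest

PASS_THRESHOLD = 0.7  # hypothetical cut-off; tune it for your own project


def keyword_judge(reply: str, expected_points: list[str]) -> float:
    """Stand-in for an LLM evaluator (assumption, not a real judge).

    Returns the fraction of expected points mentioned in the reply.
    """
    if not expected_points:
        return 0.0
    text = reply.lower()
    hits = sum(1 for point in expected_points if point.lower() in text)
    return hits / len(expected_points)


def passes(reply: str, expected_points: list[str],
           threshold: float = PASS_THRESHOLD) -> bool:
    """Turn a continuous score into a pass/fail verdict."""
    return keyword_judge(reply, expected_points) >= threshold


# Hypothetical cases; in the setup described above these would be LLM-generated.
CASES = [
    ("Refunds are processed within 5 days.", ["refund", "5 days"], True),
    ("Please contact support.", ["refund", "5 days"], False),
]


class ChatbotEvalTests(unittest.TestCase):
    def test_scored_cases(self):
        for reply, points, should_pass in CASES:
            with self.subTest(reply=reply):
                self.assertEqual(passes(reply, points), should_pass)
```

Saved as a module, this can be run with `python -m unittest`. Swapping `keyword_judge` for a function that prompts a model and parses a numeric score would keep the rest of the test unchanged.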
@tkp2843 several months ago ⁺⁵
This is great. Loved the use of firecrawl (as a scrape tool) to get the website's data. I feel like it always helps improve the model output quality. Cheers!
@kenchang3456 several months ago ⁺⁵
Way excellent video that goes well beyond a demo. Thank you very much for this guidance.
@jasonfinance several months ago ⁺³
Amazing work as always, Jason!
@darrenhinde2971 several months ago
I've been looking for more detail on LLM eval and have been scratching around for a while. Thanks for this.
@manishindiyaar7341 several months ago ⁺¹
Finally, you're back 🎉
@titusblair several months ago
Awesome! Keep up the great work!
@JorritvanGinkel several months ago
This is so good, thanks man!
@contractorwolf several months ago
Goddamn, Jason, your videos just blow my mind each time. Thanks for such a thorough explanation and example.
@techfren several months ago ⁺¹
lesgooo!! ❤‍🔥❤‍🔥❤‍🔥
@MatrixCodeBreaker88 several months ago
Great video
@kayshidow several months ago ⁺¹
I've used promptfoo for some of my tests with a local LLM to test the AI workflow. It allows you to write assertions like you would with software.
@jordanz9580 several months ago
fireeee content!
@agenticmark several months ago ⁺¹
Fine-tune Llama 3 (8-bit) and you will get exactly the behavior you want. It's what I do.
@someshfengade9623 several months ago ⁺¹
I found Langfuse's metric monitoring a little bit better.
@Joe-bp5mo several months ago
Sick! What are the best-practice metrics for evaluating agents?
@jimmy-ef2ow several months ago ⁺¹
Jason, can we get another video about ComfyUI?
@Ms.Robot. several months ago
I love how my AI girl insults the competition with flame balls, then tells me she loves me. ❤🎉😊
@fullgazz several months ago ⁺¹
Who has never spent 4 hours to save 10 minutes? That's our hobby: spending time to save time.
@AGI-Bingo 29 days ago ⁺¹
If 25 people or more use it successfully, then you literally gave humanity more time to live and be free.
@CorkyBallasdancewithme 18 days ago
Great stuff. I'm new to this and find it very interesting. Can this be built by a novice?
@user-lm4nk1zk9y several months ago ⁺¹
Audio could have been better, imo.
@alannunez3805 several months ago
I agree, Jason: it sounded like you were a little too close to the microphone, but great video otherwise!
@user-nt7lj1nc8s several months ago
Why not use Gemini as the LLM? It is free.
@HyperUpscale several months ago ⁺¹
Let me share my experience with any Google AI model ... it doesn't understand humans and it hallucinates way too much.
Practically ... in my case, 75% of the time what I get back is a totally useless result. You can't use it for anything... To be considered for evaluation ... you must be joking.
@irql2 several months ago
I don't see the value of "Agents". All of this stuff is easily done with basic function calling. I think I'm going to need to see some more creative use cases before I jump on board. I just don't get it yet.
@ayoubfr8660 several months ago
Maybe we can discuss this. I am trying to jump in, but not until I find a decent idea to apply.
@symbol9new several months ago
When your assistant has a lot of functions, it starts hallucinating. Have you ever encountered this?
@SydneyF-eg5lt several months ago
Good content, but his English is hard to listen to. The monotonous pitch and sped-up delivery didn't seem to help either.
