Amazing fireside chat. Very insightful conversation and it's really cool how two competitors openly talked about the topics.
I love how deep, positive and open the conversation is, even though these two companies are the two biggest competitors in the space. I leave this video with better hope for the future, and also richer in how to use their work to my advantage. Thanks for making this talk happen, Lenny!
This was absolutely my favorite of all the talks on stage. The two product leaders of competing companies were one-upping each other on what the future would look like, and yet they framed it so simply that it finally made me realize that most of this is going to hit us much sooner. This was a couple days after Computer use was released. And they were talking about how AI model behavior would next be part of the acceptance criteria for things we build.
Absolutely loved the insights shared in this conversation! It's incredible to hear how AI product management is evolving, especially the way these teams are dealing with unpredictability and human-like interactions with models. Truly inspiring and makes me excited about the future of AI!
I am 5 min in and the number of "likes" is very distracting
Don't know why but I can't forgive Sarah for shilling chain runners when crypto was all the hype.
hype merchant
The fact that this content is public, amazing. Thanks, Lenny!
Good talk, but the number of LIKEs is too damn high. 542 in total, 13.55 times per minute, every 4.43 seconds on average.
OUCH, no way???
The end of that great conversation should be conversation between ChatGPT and Claude 😅
Kevin explaining multi-agent systems at 32:30
is this OpenAI's way of saying they are out of ideas?
I love that Mike tries to indecent-proposal Kevin in front of the audience. "Come interview at Anthropic..."
Thank you!
Don't want to be a bummer here but I felt both Mike & Kevin gave weak answers on tool use & o1 (reasoning).
The system 1 analogy / pizza-ordering demo is just something pretty much folks have heard from ML researchers elsewhere.
I have played with reasoning models, and deeper reasoning comes not just from automatically spending more reasoning time (say, 5 minutes of thought). It is often necessitated when you have low confidence that you can answer well without additional information (like evidence gathering). In other words, deeper reasoning also requires the ability to identify when reasoning is needed, e.g. on tasks you have low confidence of completing with lookup-style (system 1) reasoning.
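The escalation logic described above can be sketched as a toy router: only pay for a slow "system 2" pass when the fast "system 1" lookup reports low confidence. The helper functions and the threshold here are hypothetical, purely for illustration.

```python
# Toy sketch of confidence-gated reasoning. system1_lookup and system2_reason
# are stand-ins, not real APIs; the 0.8 threshold is an arbitrary assumption.

def system1_lookup(question: str) -> tuple[str, float]:
    # Fast, lookup-style answer with a self-reported confidence in [0, 1].
    known = {"2 + 2": ("4", 0.99)}
    return known.get(question, ("unknown", 0.1))

def system2_reason(question: str) -> str:
    # Slow, deliberate path (evidence gathering, multi-step reasoning).
    return f"reasoned answer to {question!r}"

def answer(question: str, threshold: float = 0.8) -> str:
    fast_answer, confidence = system1_lookup(question)
    if confidence >= threshold:
        return fast_answer           # cheap path is good enough
    return system2_reason(question)  # low confidence -> escalate
```

The point of the sketch is the routing decision itself: the system must estimate its own confidence before it can know that more reasoning time is warranted.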
you cooked with this line up!!
What is eval?
I’m curious as well
Evaluation, I think — done when a feature is being brainstormed.
@kavilrawat it's a way to systematically evaluate the strength of a model.
In the context of Large Language Models (LLMs), an eval (short for “evaluation”) is a systematic way to assess the model’s performance on specific tasks or objectives. Evaluations help developers understand how well the model is functioning and guide improvements.
It's a way to test the efficacy of a model for your niche.
You can customize evals for different niches, so you can have evals that test your customer service model, for example.
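A minimal sketch of what such an eval harness looks like in practice, assuming a hypothetical `ask_model` function that stands in for a real call to whatever model you are testing:

```python
# Minimal eval sketch: pair prompts with checks on the model's answers,
# then report the pass rate. ask_model is a hypothetical stand-in so the
# harness runs end-to-end; swap in a real model call for actual use.

def ask_model(prompt: str) -> str:
    canned = {
        "What is the capital of France?": "Paris",
        "How many days are in a week?": "7",
    }
    return canned.get(prompt, "I don't know")

# Each case is (prompt, check on the answer). For a customer-service model
# the prompts and checks would come from your own niche instead.
eval_cases = [
    ("What is the capital of France?", lambda a: "paris" in a.lower()),
    ("How many days are in a week?", lambda a: "7" in a),
]

def run_eval(cases):
    passed = sum(1 for prompt, check in cases if check(ask_model(prompt)))
    return passed / len(cases)  # fraction of cases the model got right
```

Real eval suites are the same shape, just bigger: many cases per capability, graded automatically or by another model, tracked across model versions.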
Was this discussion released as a podcast episode? If so could someone link it?
Only here on YT
she never answers my DMs
Watching this makes me hope AI goes the way of crypto 2023 and just has a giant bust.
Was it Mike who invented Likes on Instagram? :)
These people are so annoying
Second
Sarah is so exotic