such a valuable video, thank you for this great content
Thank you for sharing! So much to learn.
If there's one thing I'm curious to learn more about from you re: post-training, it would be your thoughts on tool use. I basically don't know much beyond the Gorilla/Toolformer papers. I think the topic sort of dovetails with reasoning/verification when you're talking about tools like calculators and code executors, but there are also search/retrieval-type tools. Some months ago, I remember you not being as into RAG as I would have expected; I wonder if that's changed at all.
I too have always loved CAI and am excited to play (maybe with little SmolLM models) in these "not-traditionally-verifiable domains" using RL -- character-related things!
We're working on tools in 2025!