The Phi series never fails to surprise me; combined with ONNX Runtime it's really portable and powerful.
I'm using Phi-3.5 instruct at the moment for enterprise clients and it's performing very well (rough sketch of the setup below). Looking forward to working the vision model into the mix too.
Fantastic work MSR team, keep up the amazing work!
Small, Smart and Scalable for the win! 🚀
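For anyone wanting to reproduce that setup: a minimal sketch of running a Phi-3.5-mini-instruct ONNX build with the onnxruntime-genai Python package. The model folder path is a placeholder, and the exact API has shifted between package versions, so treat this as illustrative rather than definitive.

```python
# Minimal sketch: streaming generation from a local Phi-3.5-mini-instruct
# ONNX build via onnxruntime-genai. The folder path is a placeholder;
# download the ONNX weights (e.g. microsoft/Phi-3.5-mini-instruct-onnx) first.
import onnxruntime_genai as og

model = og.Model("./phi-3.5-mini-instruct-onnx")  # placeholder path
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()

# Phi-3 chat template: <|user|> ... <|end|> followed by <|assistant|>
prompt = "<|user|>\nSummarize ONNX Runtime in one sentence.<|end|>\n<|assistant|>\n"

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)
params.input_ids = tokenizer.encode(prompt)

generator = og.Generator(model, params)
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()
    # Decode and print each token as it is produced
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
```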
A realistic voice decoder alongside that image encoder is all we need for the rest. Hope the Meta guys aren't going to be late to the small vision models party.
The Phi models are truly impressive; excited to see the future work around embodiment. My only hope for the future is that frozen weights from different training stages are made available to download.
Open source, let's go!
Microsoft hasn't contributed weights in the most widely used format (GGUF) though, meaning that unless the community does the conversion work it won't be usable in common tooling such as llama.cpp, Ollama, etc.
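Once a community conversion exists, using it is easy. A minimal sketch with the llama-cpp-python bindings; the quantized file name below is hypothetical, so substitute whatever the community conversion actually ships.

```python
# Minimal sketch: running a community-converted Phi-3 GGUF with
# llama-cpp-python. The model file name is hypothetical.
from llama_cpp import Llama

llm = Llama(
    model_path="./Phi-3-mini-4k-instruct-q4_k_m.gguf",  # hypothetical file
    n_ctx=4096,  # matches the Phi-3-mini 4k context window
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Why do people want models in GGUF?"}],
    max_tokens=128,
)
print(response["choices"][0]["message"]["content"])
```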
What do you mean, @sammcj2000?
Great and concise explanation, thanks!
Fantastic presentation! I'm particularly interested in how the Phi-3 Vision model's performance compares to other vision-language models in terms of scalability across different hardware platforms. It seems like a game-changer for integrating vision capabilities with language understanding. Also, how do you see the model evolving to address emerging challenges in diverse data contexts? Looking forward to seeing its future applications and updates!
This was a detailed and interesting video. Congrats on the achievement.
Phi-3 is absolutely incredible, super capable and yet resilient to misuse and always kind and understanding. Magical at this size already and then it's even good at math.
However, I think Microsoft should choose the parameter sizes of the different versions more smartly with regard to current device hardware.
Phi-3 Vision has the same structure as PaliGemma, and both are open sourced. Great!
brilliant
specswriter AI fixes this. Highly capable small vision model.
So how well does this do for extraction from PDFs in comparison to OCR?
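One way to check that yourself: render a PDF page to an image and ask Phi-3 Vision to transcribe it, following the pattern from the Hugging Face model card. A minimal sketch; the PDF path is a placeholder, and pdf2image needs poppler installed.

```python
# Minimal sketch: transcribing a PDF page with Phi-3 Vision.
# "report.pdf" is a placeholder; pdf2image requires poppler on the system.
from pdf2image import convert_from_path
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3-vision-128k-instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

page = convert_from_path("report.pdf", dpi=200)[0]  # first page as a PIL image

messages = [{"role": "user", "content": "<|image_1|>\nTranscribe all text on this page."}]
prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(prompt, [page], return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=500)
# Drop the prompt tokens, keep only the generated answer
answer = processor.batch_decode(
    output[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```

A side-by-side run against a classic OCR pipeline (e.g. Tesseract) on the same pages would answer the question concretely.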
awesome
How can I join the Microsoft Research team? That's one of my life goals, and I will reach it.
Needs a GGUF!
microsoft catchin up
LFG
Okay great, but I have to turn on subtitles now.
The fucking contrast of the transparent text looks straight-up garbage. Microsoft needs to fire all the modern-art majors on its design team in the next layoff round.