Go beyond Text with Google APIs
- Published on 29 Sep 2024
- Join Raphaël Semeteys, DevRel at Worldline, in the fourth episode of "GenAI's Lamp," a tutorial series on generative artificial intelligence. This episode is dedicated to Google APIs for working with text and images in a multimodal way.
🔍 What's Inside:
- Demonstrations of image generation with the Imagen model and image captioning with the imagetext model.
- Use of visual question answering to interact with images and receive relevant answers.
- Introduction of Gemini Pro and Gemini Pro Vision models from Google DeepMind for text, image, and potentially audio and video reasoning.
- Showcasing Gemini Pro for combining text and code input, and Gemini Pro Vision for instructions derived from images.
- Exploration of multimodal embedding for advanced applications like image classification and search.
- Conclusion highlighting the evolution of AI models towards multimodal functionalities and future prospects.
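
To give a rough idea of how multimodal embeddings can support image search, here is a minimal sketch in plain Python. It assumes that images and text queries have already been mapped into the same vector space by an embedding model; the three-dimensional vectors, the file names, and the helper functions below are made up for illustration and are not taken from the video or from any Google API.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat a zero vector as matching nothing.
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, catalog):
    """Sort catalog entries from most to least similar to the query."""
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in catalog.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical image embeddings (real ones have far more dimensions).
image_embeddings = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "dog.jpg": [0.1, 0.9, 0.0],
    "car.jpg": [0.0, 0.1, 0.9],
}

# Hypothetical embedding of a text query, in the same vector space.
text_query_embedding = [0.8, 0.2, 0.1]

ranking = rank_by_similarity(text_query_embedding, image_embeddings)
best_match = ranking[0][0]
print(best_match)
```

Image classification can be sketched the same way: embed each candidate label as text, then pick the label whose vector is closest to the image's vector.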
🔗 Associated content
github.com/wor...
📚 Resources of the video
jupyter.org
cloud.google.c...
🔗 Follow Raphaël
dev.to/raphiki/
github.com/rap...
/ raphaelsemeteys
/ raphaelsemeteys
www.semeteys.org
🔗 Follow us
blog.worldline...
/ worldlinetech