LlamaIndex Workshop: Multimodal + Advanced RAG Workshop with Gemini

LlamaIndex

9,187 views

Published on 15 Nov 2024

Comments • 14

@lawrencetsang5387 10 months ago ⁺¹⁰
On the question about Paul Graham's wife at th-cam.com/video/fdpaHJlN0PQ/w-d-xo.html, I missed the chance to explain that the Google AQA model actually did its job by *not* saying that Jessica Livingston was his wife, because the Paul Graham essay does not say so. Although that is the correct answer, it cannot be derived from the provided source text. So the Google AQA model demonstrates its ability to ground its response in the provided source!
@chrsl3 10 months ago
It would be nice if a super-clear answer were always given, such as: "The provided document does not contain info about this."
@chaoticblankness 10 months ago ⁺¹
### Summary:
In this special edition of the LlamaIndex webinar series, the focus was on multimodal and advanced retrieval-augmented generation (RAG) use cases built with Google's API offerings, specifically Google Gemini, together with LlamaIndex. The session covered semantic retrieval and how to build advanced RAG with LlamaIndex components. It was followed by a workshop on creating multimodal use cases with Google Gemini and LlamaIndex.
#### Part 1: Advanced RAG with LlamaIndex and Google Gemini
**Presenters:** Lawrence, Michael, and Sher from Google Labs
The presentation covered RAG use cases for both novice and advanced users, including:
- A simple RAG pattern introduction for context setting.
- Google's developer RAG offerings.
- Advanced techniques for customizing use cases and improving quality.
- A demonstration of the RAG process.
**Simple RAG Pattern:**
- Ingestion: documents are embedded and stored in a vector store.
- Retrieval: the user query is matched against the vector store to find relevant chunks.
- Response synthesis: an LLM uses the retrieved chunks to arrive at an answer.
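The three stages above can be sketched in plain Python. This is a minimal toy and not the LlamaIndex or Google API. A word-count "embedding" stands in for a real embedding model, an in-memory list stands in for the vector store, and `synthesize` only builds the prompt that an LLM would receive. All function names are hypothetical.

```python
# Toy illustration of the three-stage RAG pattern: ingest, retrieve, synthesize.
# Assumptions: a bag-of-words "embedding" replaces a real embedding model, a list
# replaces a vector store, and synthesize() is a stub where an LLM call would go.
# None of these names come from LlamaIndex or Google APIs.
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# 1. Ingestion: embed each chunk and store (vector, text) pairs.
def ingest(chunks: list[str]) -> list[tuple[Counter, str]]:
    return [(embed(c), c) for c in chunks]


# 2. Retrieval: embed the query and return the top-k most similar chunks.
def retrieve(store: list[tuple[Counter, str]], query: str, k: int = 2) -> list[str]:
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[0]), reverse=True)
    return [text for _, text in ranked[:k]]


# 3. Synthesis: a real pipeline would send query + context to an LLM;
#    this stub only builds the prompt it would send.
def synthesize(query: str, context: list[str]) -> str:
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\nQuestion: {query}"


store = ingest([
    "Gemini is a family of multimodal models from Google.",
    "A vector store holds embeddings for similarity search.",
    "Paris is the capital of France.",
])
top = retrieve(store, "What does a vector store hold?", k=1)
print(top)  # ['A vector store holds embeddings for similarity search.']
print(synthesize("What does a vector store hold?", top))
```

A managed vector store and embedding model would replace `ingest`, `embed`, and `retrieve`, and the prompt from `synthesize` would go to an LLM. The three-stage shape would stay the same.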
**Google's Offerings:**
- Google vector store: a managed vector database with built-in embeddings, designed for simplicity, flexibility, and production readiness. It is optimized for small corpora, on the order of 1 million chunks.
- AQA (Attributed Question Answering) model: provides grounded answers, attributions, an answerability probability, answer styles, and safety settings.
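An answerability probability can be used to gate what the user sees. When the score is low, the application falls back to an explicit "not in the source" message. In this sketch, `AttributedAnswer`, `present`, and the 0.5 threshold are hypothetical. The dataclass only mirrors the fields listed above and is not the response type of Google's client library.

```python
# Sketch: gate an attributed answer on its answerability probability.
# Assumptions: AttributedAnswer is a hypothetical container modeled on the fields
# listed for AQA (answer text, attributions, answerability probability); it is not
# the Google client library's response type. The 0.5 threshold is arbitrary.
from dataclasses import dataclass, field

FALLBACK = "The provided documents do not contain information about this."


@dataclass
class AttributedAnswer:
    text: str
    answerable_probability: float
    attributions: list[str] = field(default_factory=list)  # ids of supporting passages


def present(answer: AttributedAnswer, threshold: float = 0.5) -> str:
    # Below the threshold, or with no supporting passages, prefer an explicit
    # "not in the source" message over an answer the documents may not support.
    if answer.answerable_probability < threshold or not answer.attributions:
        return FALLBACK
    sources = ", ".join(answer.attributions)
    return f"{answer.text} [sources: {sources}]"


print(present(AttributedAnswer("Example grounded answer.", 0.92, ["essay#p3"])))
print(present(AttributedAnswer("Jessica Livingston.", 0.12, [])))
```

The threshold trades coverage for faithfulness. A higher value yields more fallback messages and fewer answers that the source text does not support.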
**Advanced Techniques:**
- Breaking down complex queries into focused sub-questions for better retrieval.
- Re-ranking to refine the retrieval results by comparing the text of the question with the retrieved documents.
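Both techniques can be illustrated in a few lines of Python. This is a sketch under stated assumptions. `decompose` splits on "and" as a crude stand-in for the LLM that would normally generate sub-questions. `rerank` scores candidates by shared words, which is a simple lexical substitute for a model-based re-ranker. The passages are toy text made up for illustration.

```python
# Sketch of sub-question decomposition and re-ranking.
# Assumptions: decompose() is a hypothetical rule-based stand-in for an LLM that
# generates sub-questions; rerank() scores candidates by word overlap, a lexical
# substitute for a cross-encoder or LLM re-ranker. Passages are made up.
import re


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def decompose(question: str) -> list[str]:
    """Split 'X and Y?' style questions into focused sub-questions."""
    parts = re.split(r"\s+and\s+", question.strip().rstrip("?"))
    subs = []
    for p in parts:
        p = p.strip()
        if p:
            subs.append(p[0].upper() + p[1:] + "?")
    return subs


def rerank(question: str, candidates: list[str], top_n: int = 1) -> list[str]:
    """Reorder first-stage retrieval results by overlap with the question."""
    q = words(question)
    ranked = sorted(candidates, key=lambda c: len(q & words(c)), reverse=True)
    return ranked[:top_n]


CANDIDATES = [
    "Before college the author wrote short stories and programmed.",
    "Y Combinator funded startups in batches.",
    "At Y Combinator he wrote essays and worked on Hacker News.",
]

question = "What did the author work on before college and what did he do at Y Combinator?"
subs = decompose(question)
for sub in subs:
    # Each focused sub-question is matched on its own, then answers would be combined.
    print(sub, "->", rerank(sub, CANDIDATES))
```

Each sub-question retrieves and re-ranks on its own. In a full pipeline, the per-sub-question answers would then be combined into a single response.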
**Demonstration:**
- A live demo showed how Google's AQA model and LlamaIndex can be used to answer complex questions and to handle cases where an answer is not available in the provided documents.
#### Part 2: Multimodal RAG with Google Gemini and Llama Index
**Presenters:** Jerry and Howan from LlamaIndex
This section focused on using multimodal data (text and images) to enhance RAG use cases. The presenters discussed integrating the Gemini Pro Vision model with LlamaIndex. The model accepts text and image inputs and generates text outputs.
**Multimodal RAG:**
- Indexing both text and images.
- Retrieving relevant information using queries that include text and/or images.
- Re-ranking and synthesizing responses that incorporate multimodal data.
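The following minimal sketch shows that one index can hold both text chunks and images and be searched with a single query vector. It assumes a multimodal embedding model that maps text and images into a shared space. The hand-written 3-dimensional vectors, `Node`, and the file paths are hypothetical. Re-ranking is omitted, and `build_inputs` only assembles what a multimodal LLM would receive for synthesis.

```python
# Sketch of a mixed text + image index searched with one query vector.
# Assumptions: the 3-d vectors are made-up stand-ins for a multimodal embedding
# model with a shared text/image space; Node and the file paths are hypothetical.
import math
from dataclasses import dataclass


@dataclass
class Node:
    modality: str         # "text" or "image"
    content: str          # a text chunk, or an image path
    vector: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


index = [
    Node("text", "The cafe serves ramen and dumplings.", [0.9, 0.1, 0.0]),
    Node("image", "photos/ramen_bowl.jpg", [0.8, 0.2, 0.1]),
    Node("image", "photos/museum_front.jpg", [0.0, 0.2, 0.9]),
]


def retrieve(query_vec: list[float], k: int = 2) -> list[Node]:
    # The query vector may come from text or from an image; both live in one space.
    return sorted(index, key=lambda n: cosine(query_vec, n.vector), reverse=True)[:k]


def build_inputs(question: str, nodes: list[Node]) -> dict:
    # A multimodal LLM would receive the text context plus the retrieved image files.
    texts = [n.content for n in nodes if n.modality == "text"]
    images = [n.content for n in nodes if n.modality == "image"]
    return {"prompt": question + "\nContext:\n" + "\n".join(texts), "images": images}


hits = retrieve([1.0, 0.1, 0.0])  # pretend embedding of "Where can I eat ramen?"
inputs = build_inputs("Where can I eat ramen?", hits)
print(inputs["images"])  # ['photos/ramen_bowl.jpg']
```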
**Image Indexing:**
- Extracting structured text from images using a multimodal model.
- Generating image embeddings and storing them in a vector store.
**Multimodal Retrieval and Generation:**
- Retrieving and synthesizing responses based on text and image inputs.
- Using structured data extraction to create structured metadata from images.
- Leveraging this structured output to build a knowledge base for RAG.
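Structured extraction can be sketched in two steps. First, the model's JSON output is parsed and validated into a typed record. Second, that record becomes a text chunk plus metadata for the knowledge base. `RAW_OUTPUT` is made-up text standing in for a multimodal model's reply about a screenshot. The `Restaurant` fields are hypothetical and are not the schema used in the demo.

```python
# Sketch: validate a model's JSON output into a typed record, then build a
# knowledge-base entry (text for embedding + structured metadata).
# Assumptions: RAW_OUTPUT is invented example output; Restaurant's fields are
# hypothetical and chosen only for illustration.
import json
from dataclasses import asdict, dataclass

RAW_OUTPUT = (
    '{"name": "Example Noodle House", "food_type": "Ramen", "rating": 4.5, '
    '"address": "123 Example St", "nearby_tourist_places": ["Example Museum"]}'
)


@dataclass
class Restaurant:
    name: str
    food_type: str
    rating: float
    address: str
    nearby_tourist_places: list[str]


def parse_restaurant(raw: str) -> Restaurant:
    data = json.loads(raw)
    record = Restaurant(**data)  # raises TypeError on missing or unexpected keys
    if not 0.0 <= record.rating <= 5.0:
        raise ValueError(f"rating out of range: {record.rating}")
    return record


def to_document(r: Restaurant) -> dict:
    # Text is what gets embedded; metadata is kept alongside it for filtering.
    places = ", ".join(r.nearby_tourist_places)
    text = f"{r.name} serves {r.food_type} at {r.address}. Nearby: {places}."
    return {"text": text, "metadata": asdict(r)}


doc = to_document(parse_restaurant(RAW_OUTPUT))
print(doc["text"])  # Example Noodle House serves Ramen at 123 Example St. Nearby: Example Museum.
```

Validating before indexing keeps malformed model output out of the knowledge base. The structured metadata can later drive exact filters that embeddings alone handle poorly.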
**Demonstration:**
- A case study showed how Google Maps screenshots of restaurants were used to extract structured metadata, which was then indexed and used to answer queries about restaurant recommendations, including nearby tourist places.
**Final Q&A:**
- The possibility of fine-tuning Gemini for improved capabilities.
- Uncertainty about Gemini's ability to process video and audio.
The webinar ended with encouragement for the audience to provide feedback and explore the shared notebooks.
10 months ago ⁺³
All these techniques work quite well for general content and knowledge. In niche domains, however, problems pop up. In particular, the pre-trained encoders lack accuracy and VQA is not very helpful. Fine-tuning the encoders is mandatory... but here again the curse of labeling is present. Although fine-tuning (FT) datasets are smaller than pre-training datasets, they are still a big challenge for many companies. Again and again, the source of progress lies in labeled data and in labeling resources, which now consist of subject matter experts.
@EricB1 10 months ago ⁺²
Is the code shared somewhere?
@ramih5488 9 months ago
When we create a simple Google index in the first simple use case, in which Google region is this index created?
@chrsl3 10 months ago
Is the Google code already available for developers in the Google Cloud?
@mrchongnoi 10 months ago ⁺¹
Where are the slides located?
@RuturajHange-k2z 10 months ago
Is there a way to retrieve images from a folder of images using a text query? Using Gemini, not OpenAI.
@unclecode 10 months ago
Very helpful, thanks, and can you share the code for the first demo?
@mutumjagat 9 months ago
Can we have the link to the code notebook?
@Jmstr-p6h 10 months ago
Great content thx. Can you share the slides?
@qingsongyao4974 9 months ago
How do you create a Google index in GCP?
@karanv293 8 months ago
Did you figure this out? Wish they didn't leave that part out, lol.