How vector search and semantic ranking improve your GPT prompts

  • Published 25 Jun 2024
  • Improve the information retrieval process so you have the optimal set of grounding data needed to generate useful AI responses. See how Azure Cognitive Search combines different search strategies out of the box and at scale - so you don’t have to.
    Keyword search - matches the exact words in the query against your grounding data
    Vector search - focuses on conceptual similarity, using part of the dialogue to retrieve grounding information
    Hybrid approach - combines keyword and vector searches in a single query
    Semantic ranking - a re-ranking step that re-scores the top results with a larger deep learning ranking model to boost precision (see the sketch after this list)
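    For orientation, here is a minimal sketch (not the aka.ms/MechanicsVectors sample) of what a hybrid query with semantic re-ranking can look like using the azure-search-documents Python SDK. The endpoint, index name, field names, and the embed() helper are assumptions for illustration:

      # Minimal sketch: hybrid (keyword + vector) retrieval with semantic re-ranking.
      # Index name, field names, and embed() are illustrative assumptions.
      from azure.core.credentials import AzureKeyCredential
      from azure.search.documents import SearchClient
      from azure.search.documents.models import VectorizedQuery

      client = SearchClient(
          endpoint="https://<your-service>.search.windows.net",  # assumed service endpoint
          index_name="grounding-docs",                           # hypothetical index
          credential=AzureKeyCredential("<api-key>"),
      )

      question = "How do I rotate storage account keys?"
      question_vector = embed(question)  # hypothetical helper calling your embedding model

      results = client.search(
          search_text=question,                    # keyword leg of the hybrid query
          vector_queries=[VectorizedQuery(
              vector=question_vector,
              k_nearest_neighbors=50,              # vector leg: top 50 by similarity
              fields="content_vector",             # hypothetical vector field
          )],
          query_type="semantic",                   # re-rank the fused results
          semantic_configuration_name="default",   # assumed semantic configuration
          top=5,                                   # final grounding chunks for the prompt
      )
      for doc in results:
          print(doc["content"])                    # hypothetical content field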
    Pablo Castro, Azure AI Distinguished Engineer, shows how to improve the quality of generative AI responses using Azure Cognitive Search.
    ► QUICK LINKS:
    00:00 - How to generate high-quality AI responses
    01:06 - Improve quality of generative AI outputs
    02:56 - Why use vectors?
    04:57 - Vector Database
    06:56 - Apply to real data and text
    08:00 - Vectors using images
    09:40 - Keyword search
    11:22 - Hybrid retrieval
    12:18 - Re-ranking
    14:18 - Wrap up
    ► Link References
    Sample code available at aka.ms/MechanicsVectors
    Complete Copilot sample app at aka.ms/EntGPTSearch
    Evaluation details for relevance quality at aka.ms/ragrelevance
    ► Unfamiliar with Microsoft Mechanics?
    Microsoft Mechanics is Microsoft's official video series for IT. Watch and share valuable content and demos of current and upcoming tech from the people who build it at Microsoft.
    • Subscribe to our YouTube channel: / microsoftmechanicsseries
    • Talk with other IT Pros, join us on the Microsoft Tech Community: techcommunity.microsoft.com/t...
    • Watch or listen from anywhere, subscribe to our podcast: microsoftmechanics.libsyn.com...
    ► Keep getting this insider knowledge, join us on social:
    • Follow us on Twitter: / msftmechanics
    • Share knowledge on LinkedIn: / microsoft-mechanics
    • Enjoy us on Instagram: / msftmechanics
    • Loosen up with us on TikTok: / msftmechanics
    #copilot #generativeai #vector #gpt
  • Science & Technology

Comments • 13

  • @CollaborationSimplified • 7 months ago • +4

    These are great sessions!! They really do help to better understand what's happening under the hood - well done Pablo, Jeremy and production team!

  • @acodersjourney • 8 months ago

    Your videos make software dev more accessible.

  • @timroberts_usa • 7 months ago

    Thank you for this clarification -- much appreciated.

  • @rjarpa • a month ago

    Great video document; now it's easy to understand the solution stack.

  • @uploaderfourteen • 8 months ago • +3

    Jeremy - Great to see your work on this advancing so well!
    One of the issues still outstanding with vector- or keyword-based retrieval is that, by only retrieving chunks, you aren't providing the model with the deeper 'train of thought' or 'line of reasoning' that characterises your data source as a whole (the semantic horizon is limited to the chunk size). As a consequence, it seems you can't get the model to reason over the entire data source. For example, let's imagine that your data source was Moby Dick (and let's pretend this was outside the training data)... neither vector nor keyword search would allow you to ask "what is the moral of the story", as this requires developing a meta-narrative concept over all possible chunks. The only way current language models can do this is to somehow fit the whole novel in context - but even then there are issues with how attention is dispersed over the text. In time it would be great to see whether Microsoft Mechanics can innovate around this problem somehow, as being able to reason over the full non-chunked data source would unlock much more intelligent and useful insights.

    • @fallinginthed33p • 8 months ago • +1

      Maybe there could be multiple passes to combine different vector results into one large query that attempts to answer the user's question. That context window limit is a real problem. Human brains remember both tiny details selectively and the overall gist of a document.
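      (A rough illustration of this multi-pass idea, in Python: a map step summarizes each retrieved chunk against the question, then a reduce step answers over the combined summaries. complete() is a hypothetical stand-in for any LLM call, not a real API:)

        # Illustrative multi-pass ("map-reduce") sketch over retrieved chunks.
        # complete() is a hypothetical LLM-completion callable, not a real API.
        def answer_over_chunks(question: str, chunks: list[str], complete) -> str:
            # Map: compress each retrieved chunk relative to the question.
            notes = [
                complete(f"Summarize what this passage says about '{question}':\n{chunk}")
                for chunk in chunks
            ]
            # Reduce: answer once over all the notes in a single context window.
            return complete(f"Using these notes, answer '{question}':\n" + "\n".join(notes))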

    • @uploaderfourteen • 8 months ago • +1

      @@fallinginthed33p Agreed! I'd be interested to see how well combining vector results works. Alternatively, we know that LLMs can determine the 'gist' of a document if it's in their training data. Based on that observation, I'd like to see (a) some deep research into exactly how the model extracts that 'gist' from its training set (I'm not sure this is fully understood yet), (b) a decomposition of that process into its fundamental steps, and then (c) an attempt to replicate that process through a kind of pseudo-training. My hunch is that there is, somewhere, a relatively easy solution to this... the human brain seems to nail it very easily even with very little training data, so there must be a trick we're missing in respect of LLMs. I can skim-read a small book in a very short time (barely taking in the details) and then make a fairly accurate overall appraisal of its content, purpose, key message, etc... LLMs should in theory be able to outclass this through some fairly straightforward mechanism as yet not understood.

    • @fallinginthed33p • 8 months ago • +1

      @@uploaderfourteen I think in a nutshell, humans are doing both training and inference every time we read. Our context window includes the current document and past documents, and each pass updates the past documents store with new data and weights. LLMs can't do that yet: each inference run is a blank slate that depends heavily on trained weights, but to update those weights through training requires a huge amount of computing power.

    • @pupthelovemonkey • 7 months ago • +1

      @@fallinginthed33p Do the re-ranking steps and human feedback not feed back into the model to update its weights? For example, a conversation on Bing Chat where you successfully drill down into a complex answer over a bit of back and forth - say, a coding problem where Bing Chat's solution had a small error or punctuation mistake.

    • @fallinginthed33p • 7 months ago

      @@pupthelovemonkey It might. It's known that OpenAI uses the chat interactions on its web interface to train its models. I don't know about Microsoft though. You can already do something similar using LoRA techniques on open source models.
      Training doesn't happen immediately, unlike with human brains. You need to get updated weights or a training dataset and then spend hours or days running a training job.

  • @TheUsamawahabkhan • 8 months ago

    Love it. Want to see Llama on Azure with Cognitive Search. Also, can we plug in an external vector database with Cognitive Search?

  • @hughesadam87 • 6 months ago

    Are these AI tools available in Azure Government cloud, or just commercial?