Absolutely love it! Thank you for breaking it all down step by step for us-I truly appreciate the effort you put into this. 😊
Great job explaining and illustrating how RAG helps improve LLM responses and how Pinecone enables RAG at scale.
Your explanation of embedding spaces was on point! Thanks for sharing.
Thanks so much for the feedback, and glad it was useful!
I really enjoyed your explanation of what the vector DB is and its role in the LLM world.
Thanks so much!
Very great video explaining a complex concept
Very nice explanation. Thanks Zack for sharing
Glad you found it useful! 🙏
Your explanation was great, thanks. Please keep going.
One question -- it seems to me like each embedding / vector would contain a different number of dimensions. Trying to establish a master-type vector template with every single conceivable dimension represented would involve mainly blank space and be a computational nightmare (hence PCA and other dimensional reduction techniques). So if something complex like "the US Constitution" has thousands of dimensions and something like "grass" has hundreds of dimensions, how can they be compared, seeing as they reside in spaces with different numbers of dimensions? Like, you can't find the distance between an object that resides in 7 dimensional space and an object that resides in 11 dimensional space, right?
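A minimal sketch related to the question above, under stated assumptions: embedding models generally emit vectors of one fixed length regardless of how long or complex the input text is, which is what makes "grass" and a long legal text comparable. The `toy_embed` function below is a hypothetical hashing stand-in, not a real embedding model, and `DIM = 8` is an arbitrary illustrative size. The similarity function rejects vectors of different lengths, matching the intuition about 7-dimensional versus 11-dimensional objects.

```python
import hashlib
import math

DIM = 8  # hypothetical fixed output size; real models use hundreds or thousands


def toy_embed(text: str, dim: int = DIM) -> list[float]:
    """Toy stand-in for an embedding model: hash each word into one of `dim` buckets."""
    vec = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    return vec


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity, defined only for vectors with the same number of dimensions."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


short_vec = toy_embed("grass")
long_vec = toy_embed(
    "We the People of the United States, in Order to form a more perfect Union"
)

# Both inputs land in the same DIM-dimensional space, so they can be compared.
print(len(short_vec), len(long_vec))
print(cosine_similarity(short_vec, long_vec))
```

The design point is that the dimensionality belongs to the model, not to the thing being embedded; a richer input changes the values in the vector, not its length.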
24:55 Doing this for human written summaries of show X is actually a great side project idea...
Totally! If you create something along those lines be sure to let us know
One question
What I understand is that the plain text is sent to the LLM and the LLM returns the summarized text.
But if we are sending confidential info to the LLM, then it would be a breach. In that case, can we create our own LLM?
what an excellent explanation
So glad it was helpful!
Those subtitles are so incredibly annoying and entirely unnecessary. YouTube already has an _optional_ captions function
Cool stuff :)
Glad you liked it, thanks!
Is this the official channel?
It is, yes!
I like your subtitle style! I can focus on your voice more
Glad you found it useful!
Good content, but the PPT Slides are hazy.. hard to understand
Thanks for the feedback - I'll look to improve that next time around!
Unwatchable with the baked-in closed captions. You are also subverting assistive technologies when not using proper closed captions.
Thanks!
But please don't use the on-screen text, it's horrible! Can't watch the video because of it, soooo annoying!!!!
Understood, thanks for the feedback!
Your subtitles, for ADHD people like me, make it very, very hard to focus on the actual content. To be able to turn them off would be great. Also, they hide content and are unreadable, they're so quick and distracting…. 😢
What? Since when did English become an ambiguous language? From my understanding, it's the opposite. Ambiguous languages are those like Semitic languages. Just the fact you can use a different English word to clarify an ambiguous English word is unambiguous.
In terms of machine learning, English is ambiguous. We communicate mostly contextually. If you give one set of prompts to two or three different LLMs, the output will be wildly different. Especially in image generation and the like.
@djpete2009 True, but I wouldn't call English an ambiguous language. Sure, ambiguity exists, but compared to other natural languages, it's rather unambiguous.
@JeffreyMyersII English is spoken differently in England, NI, Wales and Scotland. The regional differences alone are so varied it's not even the same language. But it is. Intrinsically.
The baked-in subtitles are SO distracting. Really spoils an otherwise good presentation
You can turn them off by closing your eyes
Do I have to use a vector database? How about using GPT to generate an Elasticsearch query, keeping the proprietary data in ES, and then padding the generative model query with the factual context retrieved from it? @zackproser
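A rough sketch of the flow described in the question above, offered as an illustration rather than a recommendation: build a keyword query body in the Elasticsearch query DSL, then pad the generative model's prompt with whatever passages the search returned. Everything named here is an assumption: the `content` field, the helper names `build_es_query` and `build_prompt`, and the hard-coded `hits` list standing in for real search results. No Elasticsearch or LLM call is made.

```python
def build_es_query(keywords: str, size: int = 3) -> dict:
    """Build a full-text `match` query body; `content` is a hypothetical field name."""
    return {"size": size, "query": {"match": {"content": keywords}}}


def build_prompt(question: str, passages: list[str]) -> str:
    """Pad the question with retrieved passages so the model answers from them."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


question = "What is our refund window?"

# In the flow from the comment, an LLM would produce these keywords.
query_body = build_es_query("refund window policy")

# Hypothetical passages standing in for the hits Elasticsearch would return.
hits = [
    "Refunds are accepted within 30 days of purchase.",
    "Refund requests must include the order number.",
]

prompt = build_prompt(question, hits)
print(query_body)
print(prompt)
```

The trade-off in this design is that a `match` query retrieves by shared keywords, so passages that phrase the same idea in different words may be missed, which is the gap vector search is usually meant to cover.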