Weaviate Co-Founder here. Thank you for featuring Weaviate and creating this awesome video. It covers a lot in just 6 minutes and I think it's a really cool intro to the topic and really shows the power of Weaviate. Also really like that you hinted at more traditional db features such as filtering/sorting/etc. It's those that make Weaviate super powerful for real-life applications.
Appreciate the bluntness about not being a fan of the builder pattern in the JS/TS client. Would love to hear some more feedback from you and tap into the collective JS/TS experience of your followers to help us design a better, more TS-native API on the Weaviate client. The clients are versioned separately from the server, and we're totally cool with releasing a new major version of the client if it can help improve the DX for anyone that uses JS/TS in their daily work.
A problem I commonly face with the builder pattern is that it's unclear to me which parameters are mandatory and which are optional, so I may end up forgetting to set a mandatory parameter and run into null pointer exceptions at runtime.
There's a small army of us all racing toward the same goal now (AI all the things), but for most of us it's a new world; having the 'traditional' features as a frame of reference is what we need to bridge the gap. Thank you.
@@آزيحسن Agree. If it were a step builder, this would not be a problem: you're only allowed to build after all the mandatory parameter functions have been called.
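A minimal sketch of the step-builder idea in TypeScript. The query shape (`className`, `limit`, optional `fields`) and every name below are hypothetical, not the Weaviate client's API. Each step returns an interface that exposes only the next legal call, so `build()` simply doesn't exist on the type until every mandatory step has run, and forgetting one becomes a compile error instead of a runtime null.

```typescript
// Hypothetical query shape for illustration only.
interface Query {
  className: string;
  limit: number;
  fields: string[];
}

// Each step interface exposes only the next legal call.
interface NeedsClassName {
  withClassName(name: string): NeedsLimit;
}
interface NeedsLimit {
  withLimit(limit: number): CanBuild;
}
interface CanBuild {
  withFields(...fields: string[]): CanBuild; // optional, may be skipped
  build(): Query;
}

class QueryStepBuilder implements NeedsClassName, NeedsLimit, CanBuild {
  private className = "";
  private limit = 0;
  private fields: string[] = [];

  private constructor() {}

  // Entry point: callers only ever see the first step.
  static create(): NeedsClassName {
    return new QueryStepBuilder();
  }

  withClassName(name: string): NeedsLimit {
    this.className = name;
    return this;
  }

  withLimit(limit: number): CanBuild {
    this.limit = limit;
    return this;
  }

  withFields(...fields: string[]): CanBuild {
    this.fields = fields;
    return this;
  }

  build(): Query {
    return { className: this.className, limit: this.limit, fields: this.fields };
  }
}

const query = QueryStepBuilder.create()
  .withClassName("Meme")
  .withLimit(1)
  .withFields("image", "text")
  .build();

// QueryStepBuilder.create().withLimit(1); // compile error: NeedsClassName has no withLimit
```

The trade-off is extra interfaces to maintain. A plain options object such as `{ className: string; limit: number; fields?: string[] }` gives the same compile-time check on required parameters with less ceremony, which may be closer to what people mean by a more TS-native API.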
This is by far one of the best videos you've ever made. Keep us updated!!
Your channel could become the new tech news reporter in video format. Thanks for the good work
Would love a video comparing the different vector databases. Why did you choose this one over Pinecone?
Possibly due to Weaviate having a local docker container setup available (didn't see anything similar for Pinecone)
Pinecone is closed-source and paid
1. Open-source and self-hosting
2. AFAIK Pinecone requires you to vectorize your own data. For most that means paying for something like the OpenAI embeddings API. But with Weaviate we just ran our own ResNet locally.
@@beyondfireship Just what I expected: people running their own AI locally for their own purposes. I see this becoming more common than paying for a bigger and better AI subscription that you can't modify or use however you want.
Doesn't matter what the topic, JS somehow always gets involved lol
True lol😂😂😂
The real marvel of Computer Technology is Javascript at this point.
This is a webdev project, JS will obviously be involved.
That's why it's the GOAT the GOAT 🐐
Anything that can be done in javascript will eventually be done in javascript.
Your content constantly impresses. As a programmer and an educator myself I'm amazed how you do so much and do it well.
This can also be done using the CLIP model: index the vectors in OpenSearch with the k-NN type and use cosine similarity to find similar images. With CLIP you can search images using either text or an image as input.
Can you tell me what's the difference? Also pros and cons of both?
You seem to know about these things and I'm new in AI
I think the main difference is that with naïve cosine similarity you probably have to compare the query against every image in the DB, hence O(n). With HNSW it becomes roughly O(log(n)), since the embedding data is "somewhat sorted".
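To make the O(n) point concrete, here is a minimal sketch of the naïve approach: score the query against every stored embedding with cosine similarity and keep the best one. The three-dimensional vectors and file names are made up; real image embeddings have hundreds or thousands of dimensions. An HNSW index avoids this full scan by walking a graph of neighbors, at the cost of returning approximate results.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|). Assumes non-zero vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface Stored {
  id: string;
  vector: number[];
}

// Brute force: one similarity computation per stored vector, i.e. O(n).
function nearestBruteForce(query: number[], items: Stored[]): Stored | undefined {
  let best: Stored | undefined;
  let bestScore = -Infinity;
  for (const item of items) {
    const score = cosineSimilarity(query, item.vector);
    if (score > bestScore) {
      bestScore = score;
      best = item;
    }
  }
  return best;
}

// Toy "database" of embeddings (invented values).
const db: Stored[] = [
  { id: "cat.png", vector: [0.9, 0.1, 0.0] },
  { id: "dog.png", vector: [0.7, 0.6, 0.1] },
  { id: "car.png", vector: [0.0, 0.2, 0.9] },
];

const hit = nearestBruteForce([0.8, 0.2, 0.05], db); // cat.png
```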
@@deep.space.12 OpenSearch also uses HNSW
@@lain-401 thanks for the info
Why would he use something worse, though (KNN), instead of a faster, more efficient and scalable search algorithm like HNSW?
Made it look so simple, I'm definitely trying this out this evening, thanks Jeff!
2:26 How do you “run that command”? What command is he talking about??
Somebody please help
@@notalanjoseph - It's `docker-compose up` 😉
I'm getting an error that the Meme class already exists when I run the code. Buddy, will you please help?
Yeah, vector databases have been around for quite a while, but the availability of open source vectorizer models made vector search more of a viable use case, so we're seeing more and more proper databases with easy-to-use clients and all. Unfortunately, the biggest issue with vector search is not the database side of things. They're mostly all the same, relying on the same compression and distance computation algorithms. The big issue is vectorization. For some use cases like text or image similarity you can use huggingface models as-is. But for more business specific things, you'd have to train your own model and that is tough AF.
we're definitely not tired of the AI videos. Great as always
Looks like it was made by Fireship
I was also going to write that xD
I find them boring, but Jeff can do whatever he wants -- and he will always have my undivided attention because he is the electronic prophet we have all been waiting for
Keep 'em comin'!
I mean, I'm tired of hearing about AI in general, but Jeff can make basically anything fun and interesting so I don't mind regardless of what he makes his videos about.
It's actually a good topic; storing and processing data is still the most important thing
I love the vector database, how the data is stored as coordinates with direction
There's absolutely nothing for you to apologise for. This is another great video. Please, keep 'em coming as always.
Just imagine Jeff's life without JavaScript
He will be eff
@@xXxDerfoufixXx That's some next level programmer dad joke.
@@vhaangol4785 there are at least 3 jokes in there that make this a top tier joke.
More of those please.
I appreciate the slightly slower pace of this video.
Just fell over backwards watching everything you said go over my head
Really appreciated the mention of it not being sponsored*
After seeing the video on vector databases on the main channel, I just knew this was what was next.
Not tired of AI content at all. This is amazing. Thank you!
Thanks man! My company just changed my job role from a 'RoR developer' to an 'AI Engineer' and you have everything I need
Always remember Jeff, you are awesome and really helping Devs by introducing concepts.
Stay classy cheers.🎉
This is super cool! I like how easy it is to do those tasks nowadays
Got a kinda good idea for them, hopefully I'll get some time to play with them at work sometime soon. Good video as always!
Stop blowing my mind dude!! Too much power in our hands
Soon our own image boorus are going to be fantastic
Man i friggin love fireship
Love this kind of video. It gave me a lot of ideas
Please keep making these videos. They are non-stop, but so is the pace of AI right now, so they're the only thing helping me feel like I'm staying current.
JS churn -> AI churn -> panic attacks -> depression -> nirvana
Vectors are so 2022 Q4. I've come up with a trillion-dollar concept: Text File Data. You will store all of your data in a single file where each col will be separated by something like a tab code or comma and a row will be separated by a new line. Beta names are Tabbed Text Files or Comma Text Files.
Reply if you’re ready to invest.
You son of a bitch, I'm in!!!
I think that's what people call CSV files; Python even has a module to manage this type of file. They can also be opened in MS Excel.
@@Yashss4617 r/whoosh
I like that the gif at the end was from Yellowstone. No reason.
I thought "in the next 5 minutes" was a joke about the complexity of a search engine. Yet here I am 5 minutes later knowing all the steps. Wild times.
That was a cool video. I am new to this game, and while watching this I got a feeling of "wow, that is a really cool thing you've shown." I have no idea what he is doing but would luv to see more.
Great video! Could you make a video about how we can crawl data from a knowledge base, transfer it to a vector database and search with it?
Damn, I had started to really rely on Fireship for my AI news!
Can you do an episode about Medusa, the open source e-commerce js framework?
Yay, finally some docker on fireship 🎉
I'm um... kind of a beginner but never had trouble understanding your vids... this one was way beyond my lvl lol
I was building a meme search engine for fun and this killed it; I might apply this implementation
You should've used Rektor for your database.
I already invested in it. It's the next big thing.
Rektor? I hardly even knew her.
Thank God! Next videos will be on JS
Realizing you could use this to reverse Stable Diffusion. You feed it a prompt, it generates an image, and then your reverse image search returns the images it --stole-- "learned from" the most to create its output.
That's a pretty clever idea actually. I wonder if copyright holders will think of doing that.
Amazing work.
I find that personally very interesting. Next step I'm looking for would be to automatically tag images.
About AI, I'm happy that Fireship recognizes that he's talking a hecka lot about AI. But he's right: it's the new trend and it's not going away anytime soon. I'd just hope we can talk about it in a more neutral way than an AIpocalypse or an AIrmageddon.
This is amazing. Thanks.
I feel as though Yandex uses this model, but for some reason they rarely have the original content when you're looking for a source.
Top tier content found here.
Please do a version of this for text similarity search
MORE AI videos, mate! Your videos are always the most informative AND entertaining on the subject
Loved this video, the test image got me ROFL
Could you do a similar tutorial for a complete beginner in programming (a bit slower, with a few more steps to explain each concept) to create this image search database and retrieve only copyright-free (CC0) images? Nice tutorial though 👌
Man how did you learn all that? I mean how did you start your journey. Please make a video about your journey it would be very inspirational for many.
Read the docs
@@heroe1486 That's old school; now you just feed the docs to an AI and generate a summary.
@@FaisalAfroz Yes, this is the way.
when you get really popular, people approach you and tell you neat things in hopes you will make a video for them.
awesome tutorial !!!!!!
My man, you just did Web 3.0 dirty. Blockchain is epic and I hope to see some videos about it too! 💪
Whoa, I never knew Weaviate was this useful. I contributed to it for my GSoC but later switched to another org
Yes, I did pause to view and read each meme
It's time for an AI-driven JavaScript framework that runs on the edge😂
And be sure not to mix stuff up: a JS-driven AI framework would essentially be the apocalypse
lol
I love how you used the buzzwords "ai", "driven", "js", "framework" and "edge" to misspell "disaster"
@@vmbgify put down the vodka
Or the Singularity. . . ?
I miss the good ol' days when the most stupid overused buzzword was "the cloud", like we all pretended nobody before 2007 had seen the word WAN on literally any network topology chart from the previous 30 years and instead insisted they were super original and clever and had come up with something totally new.
😊nice video, I got an idea for my next project.
Can you train this on all my footage that I shot that I need indexed?
Could this be used to build a "hotdog or not hotdog" matcher?
00:29 does Leo have David Duke as a profile pic? lol
Good video but I really wish you would talk slower and add a longer gap (maybe just 0.5 to 1 second) between edit cuts. Please bear in mind that for many viewers English is a second language.
It is so frustrating having to constantly rewind the video that I don't finish most of yours despite the great content. It's great that you pack in so much info, but it needs to be at a pace that can be followed more easily
Put the time stamps of what you don’t understand and I’ll transcribe it for you
Not even a little bit tired of the AI videos - if anything I'd love to see more tutorials
Seems Valid.
I'm not tired of the AI videos. Maybe a new channel: Fireship AI
if the ai videos help us make our jobs easier, then keep'em coming
Oof... I can follow this, but I'll have no idea what I'm actually doing.
Still, this was a very good way to get your point across.
Does anyone else notice that Jeff's voice sounds different when the scene switches in the video?
With the super AI we have these days it would be quite an easy step to implement just removing watermarks!
Gotta love how he reads comments
Oooooh can I also use this to find similar images on my own computer despite resolution differences?
With your videos, you have established three major skills for AI devs: First AI APIs, then Prompt engineering and now Vector databases.
Please make a tutorial for vector DBs and do the same with MongoDB at the same time
Then use CLIP to label the images you have against a list of text descriptions and you've just generated your own map of your images without manual labour
Nice one!
It feels like a normal neural network with extra memory usage
As if databases weren't confusing enough
What if you took a text input and ran it through something like DALL-E so the search could accept text (which gets turned into an image under the hood)?
Nice video
I was wondering when JS started supporting top-level await 😢
For as much as you talk about GPT, I'm surprised you didn't just get GPT to generate the entire thing
how do you know he didn't?
I have to update the class name again and again in index.js; otherwise it gives me an error saying I have already used this class name. Can anyone help me get rid of this problem?
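One way around this, sketched under assumptions: make schema creation idempotent by checking whether the class already exists before creating it, instead of renaming it on every run. To keep the sketch self-contained and testable, `SchemaApi`, `ensureClass`, and `FakeSchema` are hypothetical names, not the Weaviate client's real API; in a real script you would back `SchemaApi` with whatever schema-read and class-create calls your client version provides (check its docs). Deleting and recreating the class also works, but that wipes the stored objects.

```typescript
// Hypothetical minimal interface; wire it to your actual client.
interface SchemaApi {
  listClassNames(): Promise<string[]>;
  createClass(name: string): Promise<void>;
}

// Create the class only if it is missing, so re-running index.js is safe.
// Returns true if it created the class, false if it already existed.
async function ensureClass(schema: SchemaApi, name: string): Promise<boolean> {
  const existing = await schema.listClassNames();
  if (existing.includes(name)) {
    return false;
  }
  await schema.createClass(name);
  return true;
}

// In-memory stand-in that rejects duplicates, like the error in the question.
class FakeSchema implements SchemaApi {
  readonly classes: string[] = [];

  async listClassNames(): Promise<string[]> {
    return [...this.classes];
  }

  async createClass(name: string): Promise<void> {
    if (this.classes.includes(name)) {
      throw new Error(`class name ${name} already used`);
    }
    this.classes.push(name);
  }
}
```

Running `ensureClass(schema, "Meme")` twice creates the class once and quietly skips the second time, so the class name in index.js never has to change.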
Did you just b64 images in different formats without converting them to a single file format and then send them to the vector DB? Congrats, you just uploaded Spanish, English and French sentences to your DB and queried whether the English sentence would be the nearest match to the English sentence...
thank you
Very helpful and straightforward 。◕‿◕。
Hmmm. I'm gonna try a duplicate detector
hey fireship what font do you use for your videos? it looks pretty clean
Damn, where is that animation from at t=30s :O I need to know because I am a little data visualization nerd myself, you know
This guy knows what he is talking about. It's like GPT but for images instead of tokenized letters, words, or phrases, except it's not generative and doesn't have transformers xD but... you get the point.
I'm trying to pick a vector database at the moment. Does anyone have any reason why you would pick, say, Pinecone over Weaviate, or vice-versa?
Weaviate is open-source and can be self-hosted
I'm always just getting the same input image back as the result. What could be wrong?
How can I reach you to discuss an app I've been building (pre-dev stage) with some of this in mind? I'm using GANs with scikit-learn and images in the cloud to train the part of my app that will use AI for some specific, valuable purposes, but this exact content is what I needed for the heavy legwork of compiling and filtering all the exact images I need around a defined search. I'd love to get your help, even if it's general, on putting my pieces together. Your expertise on all the tools, systems, and use cases is such a great resource, and I've had the pleasure of finding it. Thanks
Hey guys, he mentioned that these images will be converted to a numerical array (like a vector), to find similarities between different images. Do any of you know how exactly that works? What can you compare in a vector to another vector? Thanks.
In vector databases, images are converted into numerical arrays called vector embeddings. These embeddings encode features of each image as a 1-dimensional array of numbers. These embeddings can be compared to one another to determine visual similarity between them. The similarity between two vectors can be measured by calculating the distance between them. There are several distance metrics that can be used, such as Euclidean distance or cosine similarity. The smaller the distance between two vectors, the more similar they are considered to be.
- This answer was provided by the new Bing Chat, my new AI-powered copilot for the web
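The two metrics named above can be sketched in a few lines. Note that they point in opposite directions: Euclidean distance is smaller for more similar vectors, while cosine similarity is larger, which is why vector databases often report cosine *distance* as 1 minus the similarity so that smaller still means closer. The example vectors are invented.

```typescript
// Straight-line distance: sqrt(sum((a_i - b_i)^2)).
function euclidean(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// Angle-based similarity in [-1, 1]; ignores vector length. Assumes non-zero vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Cosine distance: 0 when the vectors point the same way, so smaller means more similar.
const cosineDistance = (a: number[], b: number[]): number => 1 - cosineSimilarity(a, b);

const a = [1, 2];
const b = [2, 4]; // same direction as a, twice as long
const c = [2, -1]; // perpendicular to a
```

Here `a` and `b` have a cosine distance of 0 (same direction) but a Euclidean distance of about 2.24, because Euclidean distance also counts the difference in length. Which metric fits depends on how the vectorizer was trained.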
@@lance3301 Aw man, forgot I could just ask ChatGPT too. Thanks mate.
@@stevenparker2099 each number in the vector represents a different "attribute" as determined by the Img2Vec neural network. this is the "vectorizing" process. as long as the same vectorizing process is used on two different images, those vectors can be compared to one another. the exact attributes of an image used to determine "similarity" are dependent on the vectorizing process. once you have some vectors, you can measure the similarity between them the same way you would measure the distance between 2 points in space.
you could make your own vectorizing process where you just count the number of red and blue pixels in an image, this would turn the image into a 2D vector. for example [28, 400] would be an image with a lot of blue in it. if this were compared to another image that is mostly red such as [100, 35] then the distance between these would be 372 while a fellow blue image like [5, 200] would have a distance of 201 indicating it is more similar.
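The red/blue example can be checked with a quick sketch. The distances quoted above are Euclidean (straight-line) distances rounded to whole numbers. `toRedBlueVector` and the `Pixel` type are invented for this illustration, and calling a pixel "red" or "blue" by its dominant channel is an arbitrary assumption; a real vectorizer such as Img2Vec learns far richer features.

```typescript
type Pixel = [number, number, number]; // [r, g, b], each 0-255

// Toy vectorizer: [number of mostly-red pixels, number of mostly-blue pixels].
function toRedBlueVector(pixels: Pixel[]): [number, number] {
  let red = 0;
  let blue = 0;
  for (const [r, g, b] of pixels) {
    if (r > g && r > b) red++;
    else if (b > r && b > g) blue++;
  }
  return [red, blue];
}

// Euclidean distance between two vectors of the same length.
function distance(p: number[], q: number[]): number {
  return Math.hypot(...p.map((v, i) => v - q[i]));
}

const bluish: [number, number] = [28, 400];
const reddish: [number, number] = [100, 35];
const alsoBluish: [number, number] = [5, 200];

const toRed = Math.round(distance(bluish, reddish)); // 372
const toBlue = Math.round(distance(bluish, alsoBluish)); // 201
```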
@@stevenparker2099 No worries mate. I keep forgetting too lol.
Not exactly the same, but you can find more info on the prodramp YT channel, where he goes in depth about how to vectorise imagery (3-part workshops)
don't stop with the ai news. young aspiring developers need to know as much as possible to use this info to our advantage
I am tired of the mainstream AI videos, your videos are always unique and intriguing.
and actionable. We get a glimpse of how it might work in production.
that's how facial recognition works. thank god for chatgpt to help me read debug errors.
I wonder if this will render dbs like Elasticsearch obsolete
Elasticsearch already implements vector search, and you can even combine it with normal lexical search, which is really helpful if you have a semantic question with e.g. a named entity.
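One common way to merge a lexical result list with a vector result list is reciprocal rank fusion (RRF): each document scores the sum of 1 / (k + rank) over every list it appears in. This is a generic sketch of RRF, not Elasticsearch's exact implementation; `k = 60` is the constant commonly used for RRF, and the document IDs are made up.

```typescript
// Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank), ranks starting at 1.
function reciprocalRankFusion(lists: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return [...scores.entries()]
    .sort((x, y) => y[1] - x[1])
    .map(([id]) => id);
}

// Lexical search is good at exact names; vector search is good at meaning.
const lexical = ["doc-acme-manual", "doc-acme-pricing", "doc-other"];
const semantic = ["doc-acme-pricing", "doc-related-topic", "doc-acme-manual"];

const fused = reciprocalRankFusion([lexical, semantic]);
```

Documents that rank well in both lists (here the two "acme" docs) rise to the top, while a document found by only one method still appears lower down. RRF only looks at ranks, so it sidesteps the problem that lexical and vector scores live on different scales.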
Did I overlook something, or does the heavy lifting and real magic happen in the Img2Vec NN and not in the vector database?
the NN converts the image into a vector, the vector database stores and indexes that vector. both are doing some "heavy lifting" imo
@@rumfordc What are the challenges of storing a vector? What makes a vector database better than other databases in those areas?
Edit: I believe I get it, but I'll still appreciate your views if you want to share them.
One issue is that vectors are usually denser than other data. It's difficult to compress them with algorithms other DBMSs typically provide. Combine that with a billion vectors, and it gets challenging.
But what gets really difficult is searching through them (e.g., finding the nearest neighbors) and doing arithmetic on them. That's where vector databases shine.
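To put a number on the storage side: a 512-dimensional float32 embedding (an assumed size) takes 2 KB, so a billion of them is about 2 TB before any index. General-purpose compressors do poorly on dense floats, so one trick vector databases commonly use instead is lossy quantization. Below is a minimal sketch of simple scalar quantization (float32 to int8, a 4x reduction) with one min/max range per vector; the range scheme is an illustrative assumption, and production systems use more elaborate variants such as product quantization.

```typescript
interface Quantized {
  codes: Int8Array; // one byte per dimension instead of four
  min: number;
  scale: number;
}

// Map each value linearly from [min, max] onto the 256 int8 values [-128, 127].
function quantize(vector: Float32Array): Quantized {
  let min = Infinity;
  let max = -Infinity;
  for (let i = 0; i < vector.length; i++) {
    if (vector[i] < min) min = vector[i];
    if (vector[i] > max) max = vector[i];
  }
  const scale = (max - min) / 255 || 1; // avoid dividing by zero for constant vectors
  const codes = new Int8Array(vector.length);
  for (let i = 0; i < vector.length; i++) {
    codes[i] = Math.round((vector[i] - min) / scale) - 128;
  }
  return { codes, min, scale };
}

// Approximate reconstruction; the rounding error per value is at most scale / 2.
function dequantize(q: Quantized): Float32Array {
  const out = new Float32Array(q.codes.length);
  for (let i = 0; i < q.codes.length; i++) {
    out[i] = (q.codes[i] + 128) * q.scale + q.min;
  }
  return out;
}

const original = Float32Array.from([0.12, -0.53, 0.98, 0.07, -0.31]);
const q = quantize(original);
const restored = dequantize(q);
```

The price is a small, bounded error in every coordinate, which slightly perturbs distances; that is usually acceptable for approximate nearest-neighbor search.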
@@Mr_Yeah Yea the primary challenge is finding the nearest neighbor, for fast searching. Without an index, finding the closest neighboring vector would require computing the distance to every other vector in the database. This takes way too long so vector indexes are used to "cluster" vectors together ahead of time to reduce the number of distances that need to be calculated.
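The "cluster ahead of time" idea can be sketched as an IVF-style (inverted file) index: assign every vector to its nearest centroid when it is inserted, then at query time compare against the centroids first and scan only the closest cluster. This is a simplified illustration with hand-picked centroids, not what Weaviate does (its main index type is HNSW, a graph rather than clusters), and because only one cluster is searched, the result can occasionally miss the true nearest neighbor.

```typescript
type Vec = number[];

// Squared Euclidean distance; skipping the sqrt preserves ordering.
function sqDist(a: Vec, b: Vec): number {
  let s = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    s += d * d;
  }
  return s;
}

// Index of the closest point; computes one distance per point. Assumes points is non-empty.
function nearestIndex(query: Vec, points: Vec[]): number {
  let best = 0;
  let bestDist = sqDist(query, points[0]);
  for (let i = 1; i < points.length; i++) {
    const d = sqDist(query, points[i]);
    if (d < bestDist) {
      best = i;
      bestDist = d;
    }
  }
  return best;
}

class ClusterIndex {
  private readonly centroids: Vec[];
  private readonly buckets: Vec[][];

  constructor(centroids: Vec[]) {
    this.centroids = centroids;
    this.buckets = centroids.map((): Vec[] => []);
  }

  // Done ahead of time: each vector goes into its nearest centroid's bucket.
  add(v: Vec): void {
    this.buckets[nearestIndex(v, this.centroids)].push(v);
  }

  // Query time: pick the closest centroid, then scan only that bucket.
  search(query: Vec): { match: Vec | undefined; comparisons: number } {
    const bucket = this.buckets[nearestIndex(query, this.centroids)];
    const comparisons = this.centroids.length + bucket.length;
    const match = bucket.length > 0 ? bucket[nearestIndex(query, bucket)] : undefined;
    return { match, comparisons };
  }
}

const index = new ClusterIndex([[0, 0], [10, 10]]);
for (const v of [[1, 1], [0, 2], [2, 0], [9, 9], [11, 10], [10, 12]]) index.add(v);

const result = index.search([9.5, 9.0]); // match [9, 9] after 5 distance computations
```

With six vectors the saving is tiny (5 computations instead of 6), but with millions of vectors split across thousands of clusters, the query only touches a small fraction of the data.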
The javascript frameworks were not what drew me to your content. It was the memes and the "you can't miss this" in tech. Just saying.
This here is interesting because databases, something traditionally boring, now get some attention due to this vector data and querying it.
Querying high dimensional vector spaces isn't trivial, if there's lots of data and you still need it to be fast. There is some seriously advanced theory and tech in there.
loved it!
You don't like builder patterns? 😢