Vertex AI Matching Engine - Vector Similarity Search

  • Published on 27 Oct 2024

Comments • 73

  • @jobiquirobi123
    @jobiquirobi123 2 years ago +4

    Nice tutorial. Matching Engine is really promising, but it does require some setup. I will try to reproduce this tutorial and see what happens.

    • @ml-engineer
      @ml-engineer 2 years ago +2

      I see many customers moving to Matching Engine, and they're all happy with it. Only the time to update the index could be quicker, but I guess this depends on the requirements. Getting new embeddings into the index in real time is not possible, though there are workarounds.

    • @rubenszimbres
      @rubenszimbres 2 years ago

      @@ml-engineer Sascha, do you think online inference is possible by running Cloud Run/Cloud Functions on a Vertex AI endpoint, getting the embeddings, and then submitting them to Matching Engine ANN? I was wondering if this may be a solution...

    • @ml-engineer
      @ml-engineer 2 years ago +1

      Hi Rubens,
      yes, embedding models are usually hosted with Vertex AI Endpoints or with Cloud Run (if you don't need a GPU).
      After you run the inference you need to tell the Matching Engine index to take the new embeddings/vectors, stored on Cloud Storage, into account. And that's currently the bottleneck, as this indexing process takes quite a long time.
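That batch-update step can be sketched with the Python SDK. This is an illustrative sketch only, assuming the `google-cloud-aiplatform` package and its `MatchingEngineIndex.update_embeddings` method; the index ID and GCS URI are placeholders:

```python
# Illustrative sketch: point an existing index at new embeddings on Cloud Storage,
# triggering the (slow) batch re-indexing job described above.
def update_index_from_gcs(index_id: str, delta_uri: str) -> None:
    # Imported lazily so the sketch stays self-contained.
    from google.cloud import aiplatform

    index = aiplatform.MatchingEngineIndex(index_name=index_id)
    # Kicks off the batch indexing job; this is the bottleneck mentioned above.
    index.update_embeddings(contents_delta_uri=delta_uri)
```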

  • @ZoeBraddock-t6x
    @ZoeBraddock-t6x 1 year ago +1

    Amazing, thank you!
    I'm really keen to see that video about how to use Cloud Run to make the Vertex AI Endpoint more accessible. Did you end up making that video?

    • @ml-engineer
      @ml-engineer 1 year ago

      Hi,
      Google released public endpoints, so it is no longer required to use a VPC network. Therefore you don't necessarily need a Cloud Run service in front.
      Here is the documentation for the public Matching Engine endpoint:
      cloud.google.com/vertex-ai/docs/matching-engine/deploy-index-public
      In case you are still interested in the Cloud Run approach, here is a sample implementation for an image similarity matching solution:
      github.com/SaschaHeyer/image-similarity-search/tree/main/query-service
      The critical part is in the cloudbuild.yaml, which contains the reference to the VPC network: github.com/SaschaHeyer/image-similarity-search/blob/main/query-service/cloudbuild.yaml
      Let me know if that helps.

  • @alexchan5643
    @alexchan5643 1 year ago +3

    Thanks for the walkthrough. The documentation from GCP is quite messy.
    It doesn't seem to have great support for metadata filtering compared to other stores, only very basic operations. Any thoughts from your experience?

    • @ArmandoCuevas-sx5cf
      @ArmandoCuevas-sx5cf 1 year ago

      I would like to know the answer to this one too. I don't see support for metadata as Pinecone does.

    • @alexchan5643
      @alexchan5643 1 year ago

      @@ArmandoCuevas-sx5cf Based on my further investigation over the past week, the metadata filtering is restricted to string matching only, with key/value pairs (so no comparators on numeric values). The idea is to pair the Matching Engine IDs with another key-value store like Bigtable, where you could do further complex filtering. Comparing this setup to Pinecone or Qdrant, and considering the costs, I don't think I would use Matching Engine.

    • @ml-engineer
      @ml-engineer 1 year ago +1

      Hi Alex,
      Hi Armando,
      Matching Engine, as Alex already said, supports string matching on metadata.
      cloud.google.com/vertex-ai/docs/matching-engine/filtering
      Pinecone is indeed more flexible on this point.

    • @ArmandoCuevas-sx5cf
      @ArmandoCuevas-sx5cf 1 year ago +1

      @@alexchan5643 thanks a lot, that's helpful, and you're right: having metadata filtering available is a big advantage for Pinecone.

  • @MOHAMMADAUSAF
    @MOHAMMADAUSAF 8 months ago +1

    Hey, awesome starter. Just a question: given I have an index created from a bucket, if I were to add new files to the same bucket, will the index reflect the new data files, either by itself or by triggering? Or, simply put: how can I add new data from a bucket to an existing index without rebuilding the entire index again, something equivalent to Pinecone's or Weaviate's upsert functionality? The docs aren't helping me here.

    • @ml-engineer
      @ml-engineer 8 months ago

      Hi Mohammad,
      I can recommend using Vertex AI Vector Search / Matching Engine's streaming capabilities. This way you can simply send new data to the vector database via the SDK.
      Check out my sample repo to get you started:
      github.com/SaschaHeyer/Real-Time-Deep-Learning-Vector-Similarity-Search
      It's the same process as Pinecone's upsert.
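A sketch of that streaming upsert with the Python SDK. This is illustrative only: the index ID and vectors are placeholders, and it assumes the index was created with streaming updates enabled and that `google-cloud-aiplatform` exposes `IndexDatapoint` and `upsert_datapoints` as shown:

```python
# Illustrative sketch: stream new vectors into an existing index without a rebuild.
def upsert_vectors(index_id: str, vectors: dict) -> None:
    # Imported lazily so the sketch stays self-contained.
    from google.cloud import aiplatform
    from google.cloud.aiplatform_v1.types import IndexDatapoint

    index = aiplatform.MatchingEngineIndex(index_name=index_id)
    datapoints = [
        IndexDatapoint(datapoint_id=dp_id, feature_vector=vector)
        for dp_id, vector in vectors.items()
    ]
    # Requires an index created with streaming updates enabled.
    index.upsert_datapoints(datapoints=datapoints)
```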

  • @federicoph3407
    @federicoph3407 1 year ago +2

    Thank you for the tutorial!
    Is it possible to choose the machine type? I tried with 100 vectors (94 KB), and in the endpoint's basic info I see machine-type: n1-standard-16. In the documentation it seems that there is a default machine based on shard size. The documentation says: "When you create an index you must specify the shard size of the index", but there is no parameter that refers to shard size during index creation. It also says "you can determine what machine type to use when you deploy your index" but, same as before, there is no parameter that refers to machine-type. I am a bit confused :/

    • @federicoph3407
      @federicoph3407 1 year ago

      documentation: matching-engine -> create-manage-index?hl=en#create-index

    • @ml-engineer
      @ml-engineer 1 year ago

      Hello Federicoph,
      That is indeed a good question that is not covered in the video nor the article =).
      The machine type can be defined when deploying the index, as mentioned in the documentation for deploy_index.
      But if you actually check the gcloud command, there is nothing documented:
      cloud.google.com/sdk/gcloud/reference/alpha/ai/index-endpoints/deploy-index
      So I always fall back to the actual implementation, and there you can see that the deploy_index method does indeed accept a machine type.
      See here:
      github.com/googleapis/python-aiplatform/blob/90bb8ef3d675af62b7cc1f0d2fdf99b476e8dde5/google/cloud/aiplatform/matching_engine/matching_engine_index_endpoint.py#L542
      In your use case you can set it to the smallest machine. Also, because you only have 100 vectors, I recommend using the brute-force algorithm.
      Let me know if that helps.
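A sketch of that deployment call with the Python SDK. All IDs, the deployed index ID, and the machine type below are placeholders; verify the allowed machine types and replica settings against the DedicatedResources documentation before relying on them:

```python
# Illustrative sketch: choose the machine type when deploying an index to an endpoint.
def deploy_with_machine_type(endpoint_id: str, index_id: str) -> None:
    # Imported lazily so the sketch stays self-contained.
    from google.cloud import aiplatform

    endpoint = aiplatform.MatchingEngineIndexEndpoint(index_endpoint_name=endpoint_id)
    index = aiplatform.MatchingEngineIndex(index_name=index_id)
    endpoint.deploy_index(
        index=index,
        deployed_index_id="my_deployed_index",  # placeholder name
        machine_type="e2-standard-2",           # placeholder; pick the smallest fitting type
        min_replica_count=1,
        max_replica_count=1,
    )
```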

    • @ml-engineer
      @ml-engineer 1 year ago

      Quick appendix:
      This is reflected in the API documentation as well:
      cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.indexEndpoints/deployIndex
      see the request body:
      cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.indexEndpoints#DeployedIndex
      especially this part:
      cloud.google.com/vertex-ai/docs/reference/rest/v1/DedicatedResources

  • @tyronehou3553
    @tyronehou3553 1 year ago +1

    Great tutorial! Can you update algorithm parameters like leafNodeEmbeddingCount and leafNodesToSearchPercent on the fly? I tried using the gcloud update index command, but nothing changes when I describe the index afterward, even when the operation is complete.

    • @ml-engineer
      @ml-engineer 1 year ago

      Hi,
      no, they can only be set during index creation; it is not possible to update them. That's because an update would require a full index re-build, which in the end is the same as creating a new index.

  • @ramsure9246
    @ramsure9246 1 year ago +1

    Thanks for the tutorial. Is there any LangChain-compatible retriever for this Matching Engine index?

    • @ml-engineer
      @ml-engineer 1 year ago +1

      Yes, there is LangChain support for Matching Engine. The Google team implemented it a few weeks ago:
      github.com/hwchase17/langchain/pull/3104
      I'm currently writing an article on it that will be published in the next few days.

      from langchain.vectorstores.matching_engine import MatchingEngine

      vector_store = MatchingEngine.from_components(
          index_id=INDEX_NAME,
          region=MATCHING_ENGINE_REGION,
          embedding=embeddings_llm,
          project_id=PROJECT_ID,
          endpoint_id=ENDPOINT_NAME,
          gcs_bucket_name=DOCS_BUCKET,
      )

      relevant_documentation = vector_store.similarity_search(question, k=8)

  • @nooralsmadi5017
    @nooralsmadi5017 1 year ago +1

    Hi,
    How can I make it work from outside the network? I mean, send a request and get a response from outside the network?

    • @ml-engineer
      @ml-engineer 1 year ago +2

      To send your request to the Matching Engine you need to be "inside" the network. This can be complicated if you want to integrate it into a service that is running outside of that network.
      There is one simple approach that I really like: you can implement a Cloud Run service that is part of the VPC network and takes your requests. This Cloud Run service can also be reached from outside the network.
      I have implemented exactly that in one of my other articles:
      medium.com/google-cloud/recommendation-systems-with-deep-learning-69e5c1772571

  • @LucasGomide
    @LucasGomide 1 year ago +1

    Great content. Can you tell me about some alternatives? I am studying some options, such as using pgvector with some model to generate embeddings vs. Matching Engine.
    I would like to understand the pros/cons of those approaches.

    • @ml-engineer
      @ml-engineer 1 year ago +1

      Hi Lucas,
      Pinecone is also a highly recommended product. Or you can go open source with Faiss or Annoy, but this requires you to take care of the infrastructure yourself.
      If you want similarity search, I recommend going with either Matching Engine or Pinecone.

  • @AyushMandloi
    @AyushMandloi 11 months ago +1

    What is the need for endpoints?
    When will you be uploading more videos?

    • @ml-engineer
      @ml-engineer 11 months ago

      Hi Ayush,
      what do you mean by your endpoint question?
      I'm recording 4 new videos about Generative AI on Google Cloud at the moment; they will be released in the next few weeks.

  • @kadapa-rl6jg
    @kadapa-rl6jg 2 years ago +1

    Hi,
    Can you please help me understand how to orchestrate Vertex AI through Cloud Composer?

    • @ml-engineer
      @ml-engineer 2 years ago +1

      Hi
      I have written a comparison article between Cloud Composer and Vertex AI Pipelines to orchestrate ML pipelines.
      medium.com/google-cloud/vertex-ai-pipelines-vs-cloud-composer-for-orchestration-4bba129759de
      In general, if you want to use Vertex AI's capabilities as part of Cloud Composer, you can simply use the Vertex AI SDK as part of your composer tasks.
      Though I would highly recommend switching to Vertex AI Pipelines:
      th-cam.com/video/gtVHw5YCRhE/w-d-xo.html

    • @kadapa-rl6jg
      @kadapa-rl6jg 2 years ago

      @@ml-engineer my requirement is to orchestrate a Vertex AI pipeline through Cloud Composer via Terraform code.

  • @akarshjainable
    @akarshjainable 2 years ago +1

    Where did you mention the schema of the data file (the one with the input embedding vectors)?

    • @ml-engineer
      @ml-engineer 2 years ago

      What do you mean by schema? The Matching Engine does not need a schema, as we just provide the embeddings.
      Can you rephrase 🙂 in case I misunderstood your question?

    • @akarshjainable
      @akarshjainable 2 years ago +1

      @@ml-engineer aah, got it. So the embedding input vector file has to be in the format {"id": "string", "embedding": [vector]}

    • @ml-engineer
      @ml-engineer 2 years ago

      are you referring to this section of the video?
      th-cam.com/video/KMTApM5ajAw/w-d-xo.html

    • @akarshjainable
      @akarshjainable 2 years ago

      @@ml-engineer yes precisely.

    • @ml-engineer
      @ml-engineer 2 years ago

      Yes, exactly. Alternative file formats are CSV or Avro.
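To make the JSON variant concrete, here is a minimal sketch of building such an input file, one JSON object per line. The IDs and the two-dimensional vectors are made-up placeholders; real embeddings would have your model's dimensionality:

```python
import json

# Each record pairs an ID with its embedding vector, matching the
# {"id": ..., "embedding": [...]} format discussed above.
records = [
    {"id": "doc-1", "embedding": [0.12, 0.34]},
    {"id": "doc-2", "embedding": [0.56, 0.78]},
]

# One JSON object per line; write this string to e.g. embeddings.json
# and upload it to the Cloud Storage location the index reads from.
jsonl = "\n".join(json.dumps(record) for record in records)
```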

  • @majidalikhani2765
    @majidalikhani2765 1 year ago +1

    Hey, what is the parameter that decides the number of neighbours returned? I tried changing num_neighbours to no avail; it only returns 10 neighbours.

    • @ml-engineer
      @ml-engineer 1 year ago +1

      Hi Majid,
      You can define the number of neighbors you want to retrieve when calling the matching endpoint:

      response = my_index_endpoint.match(
          deployed_index_id=DEPLOYED_INDEX_ID,
          queries=...,
          num_neighbors=NUM_NEIGHBOURS,
      )

    • @majidalikhani2765
      @majidalikhani2765 1 year ago

      @@ml-engineer But in this tutorial you don't query this way. Instead, match_service.proto is used, which has a field num_neighbours = 3, but it always returns 10 neighbours.

    • @ml-engineer
      @ml-engineer 1 year ago +1

      @@majidalikhani2765 yes, Google changed the way to get matching results since I released the video. No need for complex .proto file handling anymore; just use the SDK, in the same way as creating the index. Much easier.

    • @ml-engineer
      @ml-engineer 1 year ago +1

      I will add the new way to the notebook in the next few days and publish an additional video.

    • @majidalikhani2765
      @majidalikhani2765 1 year ago +1

      @@ml-engineer Google's documentation is very poor, smh. I got it working via the SDK. Thanks!

  • @anjanak8303
    @anjanak8303 2 years ago +1

    Thank you for the tutorial. With the Avro format there is an allow and deny option that you can set for the inserted embeddings. There is little documentation on how to use this in a query. Could you help with this?

    • @ml-engineer
      @ml-engineer 2 years ago +1

      Hello Anjana,
      are you referring to the filtering functionality?
      cloud.google.com/vertex-ai/docs/matching-engine/filtering

    • @anjanak8303
      @anjanak8303 2 years ago

      @@ml-engineer Yes, the same. Could you tell me how to incorporate that into a query? I have an idea of how to have it inserted in the index, but it would be good if you could give some clarity there as well. Thanks for replying :)

    • @ml-engineer
      @ml-engineer 2 years ago +1

      Got it. Yeah, it's not well documented. But if you check the proto file you can get an understanding of how to use it when querying the Matching Engine.
      It's as simple as applying it to your query request:

      namespace = match_service_pb2.Namespace()
      namespace.name = 'color'
      namespace.allow_tokens.append('red')

      request = match_service_pb2.MatchRequest()
      request.deployed_index_id = DEPLOYED_INDEX_ID
      request.restricts.append(namespace)

    • @anjanak8303
      @anjanak8303 2 years ago +1

      @@ml-engineer This worked!! I had not looked at the proto file in much detail, thank you so much😃

    • @ml-engineer
      @ml-engineer 2 years ago +1

      @@anjanak8303 perfect. I'll add this to the article; I hope we can help more people who have the same question.

  • @akarshjainable
    @akarshjainable 2 years ago +1

    Can I do a batch prediction on the index? If yes, do I need a VPC network for that?

    • @ml-engineer
      @ml-engineer 2 years ago

      You need a VPC network; this is a requirement to run queries against the index.
      Batch prediction over the complete index is not possible. This is due to the nature of the index: you only get the k nearest neighbors.

    • @akarshjainable
      @akarshjainable 2 years ago +1

      Probably getting a bit greedy here, but do you have plans to upload a tutorial on two-tower?

    • @ml-engineer
      @ml-engineer 2 years ago

      No worries, I love all the comments here on YouTube.
      Yes, I'll release an article next week.
      It's a deep dive on how to use the two-tower algorithm + Matching Engine + Vertex AI Pipelines to build a deep learning recommendation engine.

    • @akarshjainable
      @akarshjainable 2 years ago +1

      @@ml-engineer Thanks a ton

    • @ml-engineer
      @ml-engineer 2 years ago

      The article is published:
      medium.com/google-cloud/recommendation-systems-with-deep-learning-69e5c1772571

  • @elijahdecalmer613
    @elijahdecalmer613 2 years ago +1

    you are a legend

    • @ml-engineer
      @ml-engineer 2 years ago

      ¯\_(ツ)_/¯

    • @elijahdecalmer613
      @elijahdecalmer613 2 years ago

      Excuse me, you briefly mentioned that there are workarounds to simulate real-time indexing. Could you explain the options for this, or point me to some docs? I'm a beginner trying to work it out for a project :)

    • @ml-engineer
      @ml-engineer 2 years ago +1

      The feasibility of the solution depends on the number of new vectors you get between the indexing updates.
      You store the vectors that need to be indexed in the next index-update round in Memorystore for fast, millisecond access.
      Then you build, for example, a Cloud Run application that takes the vectors from Memorystore and calculates the distances yourself (it's just simple math). The same Cloud Run application also calls the Matching Engine, and in the end you combine the results if the distance is in your desired range.
      In the long term I hope for quicker index updates using GPUs.
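The brute-force half of that workaround really is simple math. A self-contained sketch, assuming cosine similarity and engine results as `(id, similarity)` pairs; the function names and vectors are illustrative, not from the video:

```python
import math

def cosine_similarity(a, b):
    # Plain cosine similarity; fine for the handful of vectors awaiting indexing.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def merge_results(query, pending_vectors, engine_results, k=5):
    # Brute-force the not-yet-indexed vectors (e.g. held in Memorystore),
    # then merge them with the neighbors the Matching Engine already returned.
    brute = [(vid, cosine_similarity(query, vec)) for vid, vec in pending_vectors.items()]
    return sorted(brute + engine_results, key=lambda pair: pair[1], reverse=True)[:k]

# e.g. merge_results([1.0, 0.0], {"new": [1.0, 0.0]}, [("old", 0.5)], k=2)
# returns [("new", 1.0), ("old", 0.5)]
```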

    • @ml-engineer
      @ml-engineer 1 year ago

      @@elijahdecalmer613 Google added streaming support, which makes it easier to get new vectors into the index.

    • @ahmedmansouri2054
      @ahmedmansouri2054 9 months ago +1

      @@ml-engineer If I want to update the index with new data in real time, can I just add new files to the GCS folder where the vector data is stored, or do I have to add them programmatically?

  • @niladrishekhardutt
    @niladrishekhardutt 2 years ago +1

    Great tutorial! How does the deny list work?
    Let's say I have a class fruit which will ONLY have deny-list tokens (no allow), such as "apple", "mango", etc. How do I filter out "mango" in the query (search all fruits except mango)?
    I have tried the following method, but it does not work as expected:
    json
    {"id": "1", "embedding":[0.002792,0.000492], "restricts": [{"namespace": "fruit", "deny": ["mango"]}]}
    query
    deny_namespace = match_service_pb2.Namespace()
    deny_namespace.name = "fruit"
    deny_namespace.deny_tokens.append("mango")
    request.restricts.append(deny_namespace)

    • @ml-engineer
      @ml-engineer 2 years ago

      Hello Niladri,
      thanks a lot.
      (Anjana in the comments had a similar question about allow tokens.)
      Your JSON and query are definitely correct; I don't see any issues here.
      Did you make sure to update the index after adding the restricts filter to the JSON?

    • @niladrishekhardutt
      @niladrishekhardutt 2 years ago

      @@ml-engineer Hey,
      Thanks for the quick reply. Yes, I have completely overwritten the index twice now (just to be sure), but it still doesn't seem to work. Is there any requirement for the token to be on the allow list as well?

    • @ml-engineer
      @ml-engineer 2 years ago +1

      Deny alone, without allow, is possible.
      See the documentation:
      cloud.google.com/vertex-ai/docs/matching-engine/filtering#denylist
      {} // empty set matches everything
      {red} // only a 'red' token
      {blue} // only a 'blue' token
      {orange} // only an 'orange' token
      {red, blue} // multiple tokens
      {red, !blue} // deny the 'blue' token
      {red, blue, !blue} // a weird edge-case
      {!blue} // deny-only (similar to empty-set)
      See the following description from the docs:
      "When a query denylists a token, matches are excluded for any datapoint that has the denylisted token. If a query namespace has only denylisted tokens, all points not explicitly denylisted match, in exactly the same way that an empty namespace matches all points."
      So the issue has to be somewhere else.
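Purely as an illustration of the quoted rule (this is a model of the documented semantics, not the service's actual code, and `namespace_matches` is a hypothetical helper name):

```python
def namespace_matches(allow, deny, point_tokens):
    # Model of the documented per-namespace filtering rule.
    if deny & point_tokens:
        return False  # the point carries a deny-listed token
    if not allow:
        return True   # deny-only / empty namespace matches everything not denied
    return bool(allow & point_tokens)

# {!blue}: a 'red' point matches, a 'blue' point does not
print(namespace_matches(set(), {"blue"}, {"red"}))   # True
print(namespace_matches(set(), {"blue"}, {"blue"}))  # False
```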

    • @niladrishekhardutt
      @niladrishekhardutt 2 years ago

      @@ml-engineer Unfortunately, this does not seem to be working :(
      I have looked at my JSON multiple times now and tried different variations, but it still fails. Do you have any ideas?

    • @federicoph3407
      @federicoph3407 1 year ago +1

      Hi @Niladri, did you solve your problem?
      If yes, can you explain how, please?
      If not, I have the same problem with the allow-list tokens; I opened an issue on GitHub and on googlecloudcommunity.
      Thank you in advance!
      @ML Engineer