18 Months of Pgvector Learnings in 47 Minutes (Tutorial)

  • Published on Dec 22, 2024

Comments • 22

  • @NatColley-t4z (1 month ago, +5)

    Excellent, excellent, excellent. It does even more than I had merely hoped for. Forgive me, postgres, I should never have doubted you.

  • @theointechs (1 month ago, +2)

    Massively underrated video! So much valuable info, thank you so much!

    • @TimescaleDB (13 days ago)

      Thanks! Glad you found it helpful.

  • @gauthamvijayan (8 days ago)

    With this single video, I was able to understand what I need to become an AI engineer by leveraging PostgreSQL extensions and vector databases, and then how to consume them in my React/React Native applications.
    Thanks a ton for making these videos.
    The instructor needs a raise for putting everything together so well.

    • @TimescaleDB (5 days ago)

      That's awesome - thanks for sharing! Glad we could help.

  • @dbanswan (2 months ago, +3)

    Amazing video, learnt a lot. I'll make time to read the Timescale blog regularly.

    • @TimescaleDB (2 months ago, +1)

      Thanks! Much appreciated.

  • @BruntPixels1234 (2 months ago, +9)

    You should do more tutorials like these

    • @TimescaleDB (2 months ago)

      What additional topics would you like to see? Let us know and we can make it happen.

  • @ram8849 (1 month ago)

    Hi, your presentation gives me a clear idea of vector DBs (I am new to them). May I ask a question about the example at 18:03?
    If I understand what's going on correctly, you are encoding every row's columns (or combinations of them) into the vector data type, and doing the same to the verbose text query using the text-embedding-3-small model, so you can compare them with cosine similarity and output the top result. That way we can take the data in those columns and send it WITH the original verbose text query to the LLM as usual.
    1. Is this the idea of what RAG does?
    2. If so, since what is stored in the rows/columns is raw data (string/int/whatever) - for example, a date could be a password expiry date, a member-since date, a birthday, etc. - should we embed the original data directly, or make it verbose first BEFORE encoding it to a vector, in order to get a better result? Or does it depend on the situation? For example:
    sex, age, verbose description, embedded(verbose description), embedded([sex, age])
    m, 18, "this is a man in age 18", [0.01, .......], [.........val in vector]
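
    A minimal sketch of the flow this comment describes, assuming pgai's ai.openai_embed helper for the embedding call; the members table, its columns, and the query text are hypothetical, not from the video:

    ```sql
    -- Hypothetical table: structured columns plus an embedding of the
    -- verbose description (text-embedding-3-small outputs 1536 dimensions).
    CREATE TABLE members (
        id          bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        sex         text,
        age         int,
        description text,          -- e.g. 'this is a man in age 18'
        embedding   vector(1536)
    );

    -- Retrieval step: embed the user's question with the same model, rank
    -- rows by cosine distance (<=> is pgvector's cosine-distance operator),
    -- and keep the closest rows to send to the LLM with the original question.
    SELECT sex, age, description
    FROM members
    ORDER BY embedding <=> ai.openai_embed(
        'text-embedding-3-small',
        'which members are young adults?'
    )
    LIMIT 3;
    ```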

  • @awakenwithoutcoffee (2 months ago, +1)

    Lovely as usual. There is indeed a lot to learn, but we're getting closer :) Bless you.
    PS: Regarding storing structured and unstructured data in the same table: are you using that technique to store complete structured tables inside a JSONB column?
    We thought about this approach but dropped it in favor of separating structured from unstructured data, to prevent mismatching and to allow for better isolation/scaling.
    We're still experimenting, but our current setup creates one table per structured document and infers the schema dynamically on upload, along with the embedding. The agent then decides at runtime which tables to query. Unstructured documents can be bundled together more easily, but couldn't placing all document types together give false-positive search results?
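
    For anyone weighing the same trade-off, a sketch of the single-table variant this question refers to: structured fields in a JSONB column alongside the unstructured text and its embedding. All names here are illustrative, not from the video:

    ```sql
    -- One table holding unstructured text, its embedding, and structured
    -- fields as JSONB side by side (illustrative schema).
    CREATE TABLE documents (
        id        bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        doc_type  text,
        body      text,     -- unstructured content
        meta      jsonb,    -- structured fields, e.g. {"status": "paid"}
        embedding vector(1536)
    );

    -- Filtering on doc_type and the JSONB fields narrows the candidate set
    -- before ranking by similarity, which helps against the false positives
    -- mentioned above ($1 is the query embedding, bound as a parameter).
    SELECT id, body
    FROM documents
    WHERE doc_type = 'invoice'
      AND meta->>'status' = 'paid'
    ORDER BY embedding <=> $1::vector
    LIMIT 5;
    ```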

  • @renobodyrenobody (1 month ago, +1)

    Mmm... old-school engineer here, 30+ years with DB systems. And now I understand: I don't want a black-box RAG system, I want to implement AI stuff with PG! One small thing you could do better: add some examples of retrieving data without and with the vectors, especially when there is a WHERE clause. Other than that, your video is a big source of inspiration. Thanks a lot.
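
    A sketch of the kind of side-by-side example being requested, on a hypothetical articles table (none of these names come from the video):

    ```sql
    -- Classic relational retrieval: exact predicates, no vectors.
    SELECT id, title
    FROM articles
    WHERE category = 'databases'
    ORDER BY published_at DESC
    LIMIT 10;

    -- Same filter combined with semantic ranking: the WHERE clause prunes
    -- rows, then <=> (cosine distance) orders the survivors by similarity
    -- to the query embedding ($1, bound as a parameter). Note that pgvector
    -- applies filters after the index scan, so a selective filter over an
    -- HNSW/IVFFlat index can return fewer rows than the LIMIT asks for.
    SELECT id, title
    FROM articles
    WHERE category = 'databases'
    ORDER BY embedding <=> $1::vector
    LIMIT 10;
    ```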

    • @TimescaleDB (13 days ago, +1)

      Thanks for the feedback!

  • @renobodyrenobody (1 month ago, +1)

    Well, after trying the whole thing, I think the caveat here is that pgai depends on OpenAI: it is not local, you have to pay for the tokens, your data leaves your machine, and it is a black box. So I found another way: coding some functions locally in Postgres to use Ollama with local models. No privacy or data leak, no token cost. This is what I understood, but I am a rookie.

    • @TimescaleDB (13 days ago, +1)

      pgai supports Ollama, so you can use local models for greater privacy and lower cost. Check out the GitHub repo for more. The example used in the video is with OpenAI, but pgai supports Ollama, Cohere, and Anthropic models too.
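
      A sketch of what the local path can look like via pgai's Ollama support; the model name and host here are assumptions, so check the pgai repo for the exact function signature:

      ```sql
      -- Embed locally through a self-hosted Ollama server instead of OpenAI:
      -- no per-token cost, and the text never leaves the machine.
      SELECT ai.ollama_embed(
          'nomic-embed-text',                 -- any embedding model pulled into Ollama
          'the quick brown fox',
          host => 'http://localhost:11434'    -- default Ollama address
      );
      ```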

  • @afaha2214 (1 month ago, +1)

    What is the Postgres SQL client being used? It looks like Supabase.

    • @jroy3427 (1 month ago, +1)

      PopSQL; it was acquired by Timescale a few months ago.

    • @TimescaleDB (13 days ago)

      It's PopSQL

  • @renobodyrenobody (1 month ago)

    Also, where does StreamingDiskANN come from? It seems only IVFFlat and HNSW are here; trying diskann gives SQL Error [42704]: ERROR: access method "diskann" does not exist! Ha, got it: pgvectorscale!

    • @TimescaleDB (13 days ago)

      Correct: install pgvectorscale and you can access the StreamingDiskANN index.
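
      For anyone hitting the same 42704 error, a minimal sketch (the documents table and embedding column are placeholders):

      ```sql
      -- pgvectorscale ships the diskann access method; CASCADE also installs
      -- its pgvector dependency if it is not already present.
      CREATE EXTENSION IF NOT EXISTS vectorscale CASCADE;

      -- StreamingDiskANN index on an embedding column, using cosine distance.
      CREATE INDEX ON documents USING diskann (embedding vector_cosine_ops);
      ```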

  • @SageRap (2 months ago, +1)

    Appreciate the video. Just FYI, you're pronouncing the word "build" like "bulled" throughout the video, but most native speakers pronounce it like "billed"

  • @orenmizr (1 month ago)

    give me more videos like this please : )