- 41
- 66 185
Denis Magda
เข้าร่วมเมื่อ 14 ธ.ค. 2011
The place where application developers master databases
Most and Least Obvious PostgreSQL Extensions for Developers
Postgres is known for its broad ecosystem of extensions that expand the database's capabilities beyond the core RDBMS use cases. In this video, we look into the top five Postgres extensions for application developers.
We'll start with the most apparent extensions and go down to the least obvious ones.
0:00 Intro and application template
3:20 pgvector
6:48 plv8 and pgcompute
13:11 foreign data wrappers
14:12 pgmq
17:32 pg_anonymize
Follow this step-by-step guide to explore the extensions on your own:
github.com/dmagda/DevMastersDb/blob/main/postgres/top_five_postgres_extensions_for_developers.md
Interested in diving deeper into databases? Stay connected with me here:
* Twitter: denismagda
* Medium: medium.com/@magda7817
We'll start with the most apparent extensions and go down to the least obvious ones.
0:00 Intro and application template
3:20 pgvector
6:48 plv8 and pgcompute
13:11 foreign data wrappers
14:12 pgmq
17:32 pg_anonymize
Follow this step-by-step guide to explore the extensions on your own:
github.com/dmagda/DevMastersDb/blob/main/postgres/top_five_postgres_extensions_for_developers.md
Interested in diving deeper into databases? Stay connected with me here:
* Twitter: denismagda
* Medium: medium.com/@magda7817
มุมมอง: 368
วีดีโอ
Will Postgres Use My Index or Not?
มุมมอง 3628 หลายเดือนก่อน
Will Postgres use my index? This simple question doesn't have a straightforward answer. The answer is, "It depends." In this video, we take a question from the Postgres subreddit and explore a few scenarios. Plus, at the end, I’ll provide some excellent resources for a deeper dive into the indexing capabilities of Postgres. 0:00 Question from the Postgres subreddit 1:14 Experimenting with 100 r...
Building and Monetizing ChatGPT Plugins With Spring AI and PostgreSQL pgvector
มุมมอง 5938 หลายเดือนก่อน
In this hands-on session, you will learn how to create custom ChatGPT plugins using Spring AI with PostgreSQL pgvector and launch the plugins on the OpenAI marketplace. We'll start with the basics, developing a RAG-based gen AI application that uses an LLM, PostgreSQL pgvector, and Spring AI to search for information requested by users. Then, we'll introduce capabilities that will let the AI pe...
Three Killer PostgreSQL Extensions for Gen AI Apps
มุมมอง 7138 หลายเดือนก่อน
Learn how to create generative AI applications that perform the vector similarity search within the database from start to finish using three extensions of the PostgreSQL ecosystem. We'll use PgCompute to create and execute application logic on the database. This logic will use pg_vectorize to generate an embedding for user prompts, and then pgvector will use the embedding to perform the vector...
PostgreSQL Internals in Action: Shared Memory and Buffers
มุมมอง 1.6K9 หลายเดือนก่อน
PostgreSQL uses shared memory to expedite access to the database by caching data in memory and to coordinate the execution of requests from multiple standalone Postgres backends/sessions. In this episode, we're looking into shared buffers, one of the areas of shared memory that holds table data for read and write operations. 0:00 Shared memory overview 5:43 Shared buffers in action Want to repr...
Distributed PostgreSQL Essentials for Developers: Hands-on Course
มุมมอง 7709 หลายเดือนก่อน
Learn the essential capabilities of distributed PostgreSQL deployments by building a scalable and fault-tolerant movie recommendation service where users tap into the generative AI capabilities of PostgreSQL and OpenAI to search for their favorite movies, series, and live events. The course comprises five chapters. You'll start by deploying the app on a single-server PostgreSQL instance and the...
Why Has Figma Reinvented the Wheel With PostgreSQL?
มุมมอง 3.2K9 หลายเดือนก่อน
The Figma team released an article explaining how they created a custom sharding solution for their PostgreSQL deployment. They evaluated several options from the PostgreSQL ecosystem that make the database horizontally scalable and distributed, but still went forward with their own implementation. Let's analyze why Figma decided to reinvent the wheel and what the future looks like. Will the co...
Using Distributed PostgreSQL as a Kubernetes Backend
มุมมอง 8969 หลายเดือนก่อน
Kubernetes uses etcd to store all cluster information, configuration of resources, runtime data, and other deployment-specific data. If you ever need to replace etcd with another database, you can use Kine, which serves as an external etcd endpoint, translating Kubernetes etcd requests into SQL queries for an underlying relational database. In this video, you will learn how to deploy Kubernetes...
Creating AI Data Analyst With DBeaver
มุมมอง 5109 หลายเดือนก่อน
In this video, I'll walk you through the steps of creating an AI data agent and analyst using DBeaver Team Edition. The analyst will understand user questions in various languages, generate SQL queries based on those questions, and execute these SQL queries on databases to retrieve relevant business data. 0:00 Requirements for the AI data analyst 1:12 Exploring the AI Chat of DBeaver 5:04 Overv...
Generating Embeddings with Spring AI: Step-by-Step Guide with Pitfalls
มุมมอง 1.3K10 หลายเดือนก่อน
Imagine you have some data and are eager to build your first generative AI application with Spring AI. How exactly do you generate embeddings? Which data should you use for the similarity search? Which embedding model should you choose? This video walks you through the process of creating a custom embedding generator using Spring AI. You'll see all the steps without shortcuts and learn how to a...
PostgreSQL Internals in Action: MVCC
มุมมอง 1.9K10 หลายเดือนก่อน
PostgreSQL uses MVCC (Multi-Version Concurrency Control) to execute transactions and queries in parallel over a consistent view/snapshot of data. Each user record may exist in multiple versions simultaneously, with each version visible to a particular set of transactions. You'll learn how MVCC functions internally by running various hands-on experiments. 0:00 Introduction to PostgreSQL MVCC 0:5...
Streaming the Super Bowl: The Art Of Scaling Across Multiple Cloud Regions
มุมมอง 28910 หลายเดือนก่อน
The 2024 Super Bowl streamed live on Paramount , a fast-growing streaming platform that attracts millions at peak times to watch their most-anticipated events, movies, and series. To manage the large volumes of traffic and data, the Paramount Tech team had to learn the “art of scaling” across multiple cloud regions. Since the migration to a multi-region architecture, the streaming platform has ...
Custom GPTs: Enhancing AI with Your APIs and Data
มุมมอง 63211 หลายเดือนก่อน
OpenAI has introduced support for custom GPTs, allowing you to create and launch your augmented version of the Large Language Model (LLM) using third-party data and APIs. In this video, you'll learn how to create a custom GPT from scratch and integrate it with your own RESTful API endpoint and vector database. 0:00 Sample application overview 0:55 Testing the application APIs locally 2:35 Check...
AI Meets SQL: Developing AI Database Agent in 10 Minutes
มุมมอง 8K11 หลายเดือนก่อน
Learn new tech by building a simple AI-powered SQL agent for your favorite SQL database. This agent utilizes LangChain to create a flow that takes user questions in plain English, then uses an LLM (Large Language Model) to generate a SQL request. It executes the request on your database and then uses the LLM again to respond as a human would, or to convert the response into a JSON object for yo...
PostgreSQL pgvector for Python developers: Practical Guide
มุมมอง 7K11 หลายเดือนก่อน
PostgreSQL pgvector for Python developers: Practical Guide
Spring AI With PostgreSQL pgvector: Building Generative AI Apps in Java
มุมมอง 9K11 หลายเดือนก่อน
Spring AI With PostgreSQL pgvector: Building Generative AI Apps in Java
Getting Started With PostgreSQL in Docker: From Setup to Sample App
มุมมอง 997ปีที่แล้ว
Getting Started With PostgreSQL in Docker: From Setup to Sample App
Getting Started With PostgresML: Text Translation, Sentiment Analysis and More Within Postgres
มุมมอง 1Kปีที่แล้ว
Getting Started With PostgresML: Text Translation, Sentiment Analysis and More Within Postgres
Generate Database Schema From Raw Data With DBeaver...And Discover Something About Pablo Escobar
มุมมอง 411ปีที่แล้ว
Generate Database Schema From Raw Data With DBeaver...And Discover Something About Pablo Escobar
Database Connection Pooling: Why It Matters? Essential OS-Level Insights
มุมมอง 1.6Kปีที่แล้ว
Database Connection Pooling: Why It Matters? Essential OS-Level Insights
Microsoft Azure AI Extension for PostgreSQL
มุมมอง 720ปีที่แล้ว
Microsoft Azure AI Extension for PostgreSQL
PostgreSQL as a Vector Database: Part 3, Making Postgres Distributed and Scalable
มุมมอง 1.1Kปีที่แล้ว
PostgreSQL as a Vector Database: Part 3, Making Postgres Distributed and Scalable
PostgreSQL as a Vector Database: Part 2, Using HNSW Index
มุมมอง 2.5Kปีที่แล้ว
PostgreSQL as a Vector Database: Part 2, Using HNSW Index
Amazon Time Sync Service and Distributed Databases
มุมมอง 585ปีที่แล้ว
Amazon Time Sync Service and Distributed Databases
PostgreSQL as a Vector Database: Part 1, Getting Started With pgvector
มุมมอง 11Kปีที่แล้ว
PostgreSQL as a Vector Database: Part 1, Getting Started With pgvector
Low-Latency Design Patterns for Multi-Region Relational Databases
มุมมอง 569ปีที่แล้ว
Low-Latency Design Patterns for Multi-Region Relational Databases
Sample Data Generation With Built-In Database Capabilities
มุมมอง 133ปีที่แล้ว
Sample Data Generation With Built-In Database Capabilities
Best Way to Count the Total Number of Rows in a Table: COUNT(*) vs. COUNT(1)
มุมมอง 285ปีที่แล้ว
Best Way to Count the Total Number of Rows in a Table: COUNT(*) vs. COUNT(1)
Thank you
Great video
Glad you liked it!
So why dont we use gpu as cpu ? 😂
Really informative video, keep up the good work✴
Thanks, glad you found it helpful!
ok I fall in love with the course ! thank you man !
Glad you liked it, buddy!
I really like how you’re combining both theoretical concepts and practical concepts. Looking forward to more videos like this.
Thanks a lot for your feedback! It matters a lot to me and encourages to keep working on the content 👍
Excelet! thanks alot!
You're welcome!
Awesome!
Keep it up man! nice video
I’ll do my best to produce more videos in the future. As of now, I have to focus on a book about Postgres ;)
Thank you man.... It's a great help
Glad it helped, my friend!
It's very helpful, thank you brother
You’re welcome, my friend!
cool videos! how can i contact you for a consultation?
Can I ask you a couple of questions on Telegram?
Super cool! Thanks for the video! I wonder - would it be possible to have english prompts in a QGIS environment which connects to your postgres database via PostGIS? i.e you make the prompt (in written english) and the results of the prompt appear within your QGIS screen... 🤔
@@KayleighBarrow-z9b thanks for the feedback! I’m not aware of the QGIS capabilities but you can certainly run an instance of Postgres with both PostGIS and pgvector extensions. QGIS will be able to connect and then you can explore if the tool allows to write and send the prompts to Postgres
Thanks for the information.
excelent content, thanks for sharing❤🔥
Thanks, such feedback matters a lot to me!
Thanks a lot, very helpful video with crisp and to the point information. One query i had, would be helpful if you can advise. Isn't a shared memory where multiple database backend processes handling respective client connections and using this shared memory to storedl client database data a security risk ? Considering that in case of some mishandling of shared memory, one compromised backend process can read or modify data of another database handled by another backend process ?
If you're talking about a database object that is created by the CREATE DATABASE command, then the access is controlled by the roles created for applications. For instance, if database_app1 and all its data can be accessed only by app1, then there needs to be a role that has required level of access to the database. All other roles and other apps will be prohibited from connecting and accessing database_app1. See these docs for details (if it's new to you): www.postgresql.org/docs/current/user-manag.html Shared memory acts transparently for the applications and the application layer has no access to it, it's just internal Postgres server's component. If the role used by app1 for database_app1 access is hacked (for instance, bad actors now know credentials of the role), then the bad actors can access database_app1 only and steal the data. But those actors won't have access to other database...unless the hacked role had superuser privileges. Overall, don't use the default "postgres" role at the application layer because the role has superuser access to the database server. Create application specific roles with fine-grained permissions for every app and you'll have a piece of mind in production.
Very well explained. 👏
@@savithweraniyagoda1297 thanks, glad you liked it!
Thanks bro, beautifully explained!
Thanks buddy 👍 Glad you liked it
dude, your eyes you must be high as a kite
Thanks a lot
you're welcome, I'm glad you like the video!
Awesome content. Thanks man
Thanks! Glad you liked it my friend
Thanks for the great work demystifying pgvector, I was very worry as I had to work on a new project with pgvector but now dependencies seems quite clear as the business scenarios you exposed. Thanks!
Glad you found it helpful! Have fun building gen AI apps with pgvector!
Thank you in advance! I'm really starting my journey in distributed databases but I don't have any knowledge in AI fields , will the course suit my case?
The app from that course uses vector similarity search, but still the primary focus is scalability, high availability, and geo-distribution of databases. Check it out: th-cam.com/video/rqJBFQ-4Hgk/w-d-xo.htmlsi=7o-ARIbp09LHl_RR If you're interested in exploring Postgres as a vector database, then take a look at a collection of my videos about pgvector: th-cam.com/video/RFnZB76KVWk/w-d-xo.html
Great, Thank you
You’re more than welcome!
quite interesting topic! it would be interesting to see dynamic query rewriter for SQL queries + PgVector search, specifically for RAG applications. The downside of using traditional filtering/search seems to be automatic query construction that is aware of the existing database options to filter on. To scale to different types of databases (different clients) we might need to customize our code to match domain specific knowledge which ideally we wouldn't need to since this is labor expensive. Automatic schema creation for optimal querying seems to be a topic that needs more research .
Im getting this error while running the cell where we are establishing a connnection with psycopg. OperationalError: connection to server at "localhost" (::1), port 5432 failed: FATAL: password authentication failed for user "postgres" - i havent changed the password, db or host in the code. Any idea why this is happening or how to fix this? Sorry Im a beginner and doesnt have much ideas to troubleshoot.
Could you confirm the database container started successfully in Docker? Check the preceding "Start PostgreSQL With pgvector" section that starts the database using a docker compose command and then tries to connect to it.
@@DevMastersDb i searched a bit online had tried docker ps in the terminal, which shows that the container is running, its also showing as running in docker desktop, then i thought maybe its because i have pgadmin installed on my device that this happens and tried changing the port to 5433 in dockerfile, but still the error persisted.
@@AI.ML.DataScientistWork does this command work for you? docker exec -it postgres-pgvector psql -U postgres Just want to make sure you can connect from within the Postgres container using the psql tool.
@@DevMastersDb I tried a lot of things but the issue was never resolved. Finally I uninstalled the local installation postgresql and then the docker instance connected without any hiccups both on pgadmin as well as in the code. Any Idea what could have been the reason?
@@AI.ML.DataScientistWork I have no idea, buddy, what was happening. Do you run on Windows? Might it have been something Widows specific? stackoverflow.com/questions/71984505/postgres-local-installed-instance-interfered-with-docker-instance
Thank you Denis from Brazil 😉.
You're more than welcome! ;)
It seems like the Vector extension is no longer available in azure.extensions. I tried this with my work instance and then setup my own personal instance and it's not available in either. Would be interested if anyone knows anything about it as I haven't found this topic in any forums
I'm loving these videos, thank you so much!
Hey, thanks a lot for sharing feedback! It means a lot to me.
hi denis..need more videos on azure ai use cases in postgres flexible server. please assist
Hey, sounds good. I’ll keep this in mind. Any specific use cases?
@@DevMastersDb sentiment analysis
Great explanation. Thank you
Glad it was helpful!
A very good video. Clearly explained. Have you thought about becoming a teacher? Thank you very much!!
Thanks, I'm glad you liked it! Teaching is one of the main reasons I work as a Developer Relations professional. In this role, I can spend a lot of time sharing my knowledge with others and being paid for that ;)
What is your recommendation for very large vector column/table? The value of ‘m’ and ef_construction
It depends on the target recall (search accuracy/relevancy) and performance for the similarity searches. The greater the `m`, the better recall, but the index build time increases, and query performance might also be impacted. Increasing the `ef_construction` can lead to better recall, but will also increase index build time. I would start with the defaults (m=16 and ef_construction=64) and increase/adjust them once you find the right balance between recall, query performance and index build time. Note, the ef_construction needs to be at least double of m.
Great job done here. Thanks a lot for sharing this.
Many thanks!
Thanks for this video! Got some really promising results in just a few hours! Quick question -- I don't need the context to have the full table ["products", "users", etc]. For my use-case I only need it to have context for the user. (i.e. products for user 1005, purchases for user 1005, etc) If I provide the full tables in include_tables, it very quickly reaches the token limit. Is there a way to dynamically reduce the amount of context when initializing the Langchain Database agent?
Excellent! glad you managed to get it working on your end 👍 Try to instruct the LLM to retrieve that context dynamically. For instance, you can say to execute the query "select name, price from products where id = {id}" setting the id as a parameter. Then, the LLM can perform this request over the database and pull a user-specific context. Also, LangChain support various tools/actions that let LLM pull info from other sources or perform various actions: python.langchain.com/v0.1/docs/modules/tools/
@@DevMastersDb That makes sense, the issue is (unless I'm misunderstanding), passing entire tables to the LLM, regardless of whether the LLM knows that it should filter down to a subset of data, seems to take a lot of tokens. i.e. We're providing a lot of context, and then asking the LLM to disregard most of it (as opposed to providing the narrow context in the first place). As a result, if I add more than two tables, I get the error: {'error': {'message': "This model's maximum context length is 4097 tokens, however you requested 10494 tokens. Please reduce your prompt; or completion length.", 'type': 'invalid_request_error', 'param': None, 'code': None}} I'm extremely new to this stuff (just a day or two), so I could totally be missing something! One thing I'm going to try next is to create a View with the data after I've applied some joins and filters to it and then pass the view in include_tables instead, so I'm providing just the minimum context the model would need.. not sure if that'll work, or is even the right way of thinking about it 🤔
@@anuragangara2619 how many tables do you folks have? First things, yes, ask the agent to look into those which are necessary by defining them in the "include_tables". The LLM will pull only the metadata of those tables and not the actual data. So, it should fit in the context window. And then in your system message for the LLM you also say that use this query to pull the actual data for a selected user. If that data set doesn't fit into the context, then try to filter by the user_id/product_id and some time frame. Anyway, take it slowly and learn LangChain capabilities by playing with a few tutorials. You'll definitely get it figured!
@@DevMastersDb Hmm, odd -- only 5 or 6 tables; personal project, so just me. I'll look into that then! Thanks so much for the great video, really enjoyed this!
@@anuragangara2619 make sure that LLM doesn’t pull data, enable the “verbose” mode to see what happens. Also, start with 1-2 tables and then add more, see when it breaks. It might have to discover the root cause. Glad you liked the video, thanks for feedback!
Can I achieve the same setup using a local Postgres and Azure OpenAI? Any guidance would be highly appreciated. I'm attempting this on Windows, so my Postgres is running on Windows. I'm interested in using Azure OpenAI embedding to achieve the same result. I watched your video where you demonstrated using an Azure Postgres instance to achieve this.
Yes, it should be doable. You need to use an Azure OpenAI SDK (or another framework that supports Azure embedding models) to generate embeddings. Here is a quick tutorial by Microsoft for Python: learn.microsoft.com/en-us/azure/ai-services/openai/tutorials/embeddings Btw, what's your programming language?
@@DevMastersDb thank you for your reply. Using Python. Let me go through link. Thanks again man. Really appreciate it.
@@debarghaya excellent!
@@DevMastersDb I have multiple tables and these tables have different columns. I would like to generate vector for multiple tables. What is your recommendation for vector storage. Same table with new vector column or new table specifically for the vector?
@@debarghaya Postgres stores vectors with a large number of dimensions in the internal TOAST (The Oversized-Attribute Storage Technique) tables. Thus, from the storage perspective the vector columns can be either in your existing tables with other data or in separate tables. I would store the vector columns in the original tables if I need to filter data by the vector columns and other columns frequently. For instance, if most of the times your queries look like "select * from table1 where vector_column <=> ... and other_column = 5 or another_column = 4" then it makes sense to store vectors in the original tables with other data. Otherwise you would need to join the original tables with the tables dedicated for the vectors.
I have set the following the application properties for the root CA DATABASE_CA_CERT_FILE= C:\Users\deb\Documents\pgvector-azure\DigiCertGlobalRootCA.crt Getting this error ❯ node .\backend\embeddings_generator.js node:internal/process/promises:289 triggerUncaughtException(err, true /* fromPromise */); ^ Error: self-signed certificate in certificate chain at TLSSocket.onConnectSecure (node:_tls_wrap:1674:34) at TLSSocket.emit (node:events:519:28) at TLSSocket._finishInit (node:_tls_wrap:1085:8) at ssl.onhandshakedone (node:_tls_wrap:871:12) { code: 'SELF_SIGNED_CERT_IN_CHAIN' Any clue is highly appreciated. Using Azure Public so relevant cert is used.
Try to add the "rejectUnauthorized:false" flag to the SSL settings of the database connection parameteres. ssl: { rejectUnauthorized: false, ca: fs.readFileSync('path_to_your_root_certificate').toString()}, But get a proper certificate before you go in prod :)
Denis is a legend!!!!! 😂. Slow and understandable!
Thanks! Let’s learn it slow and gradually:)
Hi Denis, is there a way we could use PostgresML to specify it text if addressing certain topic, we could go to another table. This way we enhance the power of PostgresML and Spring AI
Hey, sure, even though Spring AI doesn't support PostgresML natively (and the PostgresML team doesn't have a client library for Java yet), you still can use Spring JDBC Client (or another driver/framework) to work with PostgresML using the SQL APIs. Btw, this is a PostgresML video that I shot in the past: th-cam.com/video/Gtz883daf1Y/w-d-xo.html
@@DevMastersDb Thanks I figured out that langchain is a better option for me. As I can convert natural language to SQL. However I need to figure how to convert python to GPT. Then how to use GPT
@@ifeolu8501 check out LangChain4j if Java is better option for you
Thanks a lot!
You're welcome!
Thanks Denis for putting this great tutorial together. One of the best topics I encountered in PGDay Chicago 2024. It was a pleasure being a cohost during your session.
Hey, thanks for sharing your feedback and picking my session among the others! Staying in touch.
Promo'SM
Love your videos!
So glad! Thanks for support!
Thanks Denis for sharing this. My only suggestion will be to use the --net host option instead of -p 5432:5432. I ran some benchmarking and the first option is much more performant as it doesn't use the docker network isolation for communicating with the container.
Yes, agree. I always use a dedicated network when need containerized applications talk to a Postgres instance in Docker.
Another simple and awesome video. Great work pal! I can't wait for the next episode 🍿
Thanks a ton, my friend! I’ll do my best to get the good stuff coming 👍
:D
Hey, man awsome content! do you mind telling where to learn about postgres multitenancy? a client recently asked me to have his data separate from the main database i use for everyone and im confused on how to do this.
Hey, thanks for sharing your feedback! Does the client just want to ensure his data is not visible to other clients, or has the client explicitly asked to segregate his data into separate database tables? Overall, consider these options: 1. Shared tables for clients. All customer data is stored in the same tables, but there is a special column such as "tenant_id" that lets you filter out one client's data from another's. You need to set up row-level security to ensure that when an application requests data belonging to tenant_id=5, the data of other tenants won't be visible. 2.Schema per client. Each client has his own set of tables stored in a client-specific schema (CREATE SCHEMA client1, CREATE SCHEMA client2, etc.). The application needs to specify the schema while querying or updating data of a specific client - "select balance from client5.account" or "select balance from client100.account". Distributed Postgres is useful for multi-tenant use cases if you expect to have many tenants and will need to scale. It's also useful if you need to pin each tenant's/client's data to a specific location (zone, region, database server).
what model is this using? Can we do this with Azure openai?
I used OpenAI GPT 4. Absolutely, you can use other models including Azure OpenAI