Thank goodness you blurred the names on those graphs not to spoil anything! I cannot possibly figure out which is which otherwise. Their names are too similar, and neither has associated colors.
I'm assuming mysql was using InnoDB engine? MyISAM is practically deprecated but it'd still be interesting to see this comparison using it and also vs Aria engine from mariadb
@@AntonPutra I think you should be careful with default configs. InnoDB by default has very conservative configs. Good for testing but not sure it's good for production.
Thank u for this comparison, even though most of us probably already knew postgres was going to win. I'm not sure if u already did this but I'd love to see how Postgres compares to MongoDB, the two GOATs of the DB world. 😀
Hi, the test should use the full capabilities of the db, so postgres procedures and functions should be used to reflect the maturity level of each database. To decrease the test time I would recommend a RAM disk. Would be nice to see Oracle XE vs postgres, with both using db objects.
This is not a 100% clean benchmark, as the Ubuntu packages for these DBs have default settings optimized not for performance but for low resource consumption. Playing with the configuration may significantly change the result.
MySQL uses the repeatable read isolation level by default and postgres uses read committed. We might need to re-evaluate this once again! The Go sql client, and in fact the postgres client, would be using autocommit mode by default, which is a transaction at the db level. So even for the same read or write query, MySQL would theoretically need to do more work compared to postgres.
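If someone wants to level that particular field, a minimal sketch (these are the real server settings; the values just align both sides on read committed, and you could equally go the other way):

```sql
-- MySQL: drop the default REPEATABLE READ to READ COMMITTED for the benchmark run
SET GLOBAL transaction_isolation = 'READ-COMMITTED';

-- Postgres: READ COMMITTED is already the default; to match MySQL's default instead:
ALTER SYSTEM SET default_transaction_isolation = 'repeatable read';
SELECT pg_reload_conf();
```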
I would be very interested in a non-biased comparison between FirebirdSQL and Postgres, made in the same way as the one between postgres and MySQL. I use FirebirdSQL on a daily basis and I haven't found any problems with it. I would like to know if changing the database in the future makes sense from some point of view. Thank you.
I am missing some important details. Did you use the innodb storage engine, and what was the setting for innodb_flush_log_at_trx_commit? And was a similar configuration used for postgres regarding transaction safety?
yes, i used innodb. as far as i understand, foreign keys are only supported with that storage. in this particular test, i mostly used defaults, and i have an updated video with full CRUD and configuration settings on git. please take a look. Round 2 - github.com/antonputra/tutorials/tree/main/lessons/216
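For reference, the durability knobs being asked about map roughly like this (shown with the upstream defaults, so this is only a sketch of what one would compare):

```sql
-- MySQL / InnoDB: 1 = fsync the redo log on every commit (default, safest),
-- 2 = write on commit but fsync ~once per second, 0 = laziest
SET GLOBAL innodb_flush_log_at_trx_commit = 1;

-- Rough Postgres counterpart: 'on' (default) fsyncs WAL on every commit,
-- 'off' acknowledges commits before the WAL hits disk
ALTER SYSTEM SET synchronous_commit = 'on';
SELECT pg_reload_conf();
```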
something funny is going on with MySQL - the downward curve in iops while the inserts/second is increasing - implying inserts got easier as the inserts/second increased. The downward curve coincides roughly with capping out the connection count and the cpu spike at 21:50. Towards the end, the MySQL cpu also decreased, even though that was the highest QPS achieved.
Hey Anton, I have been watching your benchmarking series from the start and love them! I might have an idea going back to the language performance... it would be interesting to see the power of a fully-fledged JVM vs something like Go or Node? The last comparison was using Java compiled to a binary without the JVM. The memory usage will be massive, but I wonder if the JVM at peak performance (after warm-up) can outperform other GC languages in terms of number of requests before failure and response time. Something like Amazon Corretto or even OpenJDK would be interesting
It is a really good test! Both are B+Tree databases, yet there is a really big difference in I/O. IMO it is more related to the storage engine strategy that postgres has implemented.
I was afraid to click on this video because I thought it might disagree with my cognitive bias in favor of Postgres. I'm pleased with the affirmation. Thank you!
@@AntonPutra I was thinking something like scaling up gradually from 10-1000+ connections, just to quantify the overhead of idle connections. It’s a useful metric imo because in serverless environments # of clients connected can scale up quickly
i like the visualization you do, it's a great job. but i think the load scenario is too simple to judge the DBMS performance. pay attention to tpc-c testing and hammerdb.
Interesting, thanks. I'm interested in knowing how Postgres was configured with regards to random_page_cost and shared_buffers. And did you change defaults for values like checkpoint_completion_target and the file-synchronization-related parameters?
Nice benchmark. What exactly happened with PG insert latency spike at the end 6:25? Note this is for a rather small record size, I wonder how they will scale with a large record size. I also wonder how they compare with more complicated join queries, compare their query plans, when they can no longer complete everything in memory.
My current hell at work now is trying to get postgres to be fast doing large queries at scale (5k req/s) Postgres is great for small things but when you get to reasonably large sets of data running a lot of queries per second, and the queries contain joins, it's such a pain to try to trick postgres into using a sane query plan, with a mess of ctes to make it do things in the right order. You can make it fast but it's a lot of work. The biggest problem is one slow query (run a lot of times) can 100% the CPU on the server and take the whole thing down. pgbouncer is also essential because postgres does one process per connection. Would have used nosql but data is very relational with a lot of many to many relationships.
Have you considered or proposed sidestepping the problem partially with a materialized view as a cache? you probably have, but i thought i'd put it out there.
@@laundmo thanks for the suggestion! It could work but the materialised view can get stale, refreshing it can be expensive and a lot of the view would never be read, although honestly I haven't used them that much so it might be a skill issue. Usually just having good indexes solves the problem, but sometimes postgres comes up with crazy query plans that don't use the indexes you expect, and then does big nested loops etc, so you have to trick it into executing in the way you expect. I had one query where it needed to look up a row from table A by its primary key, then join with table B, but despite there being indexes it decided to read all the data from table B using an unexpected index then join it on all the data from table A, and then filter it down to almost nothing. In that case I fixed it by rewriting the query to use a CTE based on table A before joining with table B, and then it executed it in the order I was expecting and used the correct indexes. When it does work, it's really impressive what it can do in less than a millisecond though.
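For anyone curious, a minimal sketch of that CTE trick (table and column names invented): MATERIALIZED forces Postgres to evaluate the CTE first instead of inlining it into the join, which pins the execution order.

```sql
WITH a_row AS MATERIALIZED (
    SELECT id, group_id
    FROM table_a
    WHERE id = 42          -- the primary-key lookup that should happen first
)
SELECT b.*
FROM a_row
JOIN table_b b ON b.group_id = a_row.group_id;
```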
@AntonPutra could you upload the ddl for both mysql and postgres into the github repo? I did a similar benchmark on my own about 6 months ago, but focused on larger batches of operations and our results are wildly different.
The problem with testing databases is that you really really have to understand their configurations. What if one is slower, because it's configured to flush data to disk more often, so it loses less data when the server goes down? You're basically testing default configurations, not actually databases. Another problem is how little data you've inserted. It probably fits into RAM. So again, not a realistic test. You have to insert gigabytes to make it meaningful.
Nice work. It would be nice to know the versions used, and to see a version that uses the pgx driver. Also I wonder how they would compare when used with Vitess (yet another Go pearl) for MySQL and Citus for PG (and maybe the TimescaleDB and ParadeDB extensions as well). It is good that the source code is in Go; I think we're allowed to take the source and use it, maybe even modify it a bit (for pgx, for example)?
Hi, next time (sorry, I am not that fluent with go, so I cannot create a PR) can you do 20% writes + 80% reads on the same run? So we can see how good the process is when the load is mixed? rand(0,1) more than 0.8 = write a row and increase max id, else do a read of rand(1, max_id). In this way it will sometimes do more writes and sometimes more reads and it will be more distributed. Also: could you test using auto increment/identity vs a library like snowflake with an external id? How much does the auto-increment mechanism slow down writes? And all that in mysql vs pg vs maria ;) Please :)
Can you also show the percentage per "request per second", or simply change the x axis from time to req/s and average the values over time? It would make the results easier to interpret.
I would love to learn how to setup this kind of tracking for my test apps. I’m assuming this is Prometheus and Grafana? Would love a tutorial video on these!
🔴 To support my channel, I'd like to offer Mentorship/On-the-Job Support/Consulting (me@antonputra.com)
surrealdb vs other databases
SQLite/libSQL vs Postgres, the in-process nature should make a difference for many use-cases.
Spring boot with virtual threads
timescale db vs clickhouse db
Javalin vs go
That's the most advanced open source database for you. I love postgresql.
Leave the "open source" away
@@twei__ why?
i'm working on an improved version of this benchmark based on the feedback i received
marry it
This is shocking. I never knew that Postgres was such a clear winner for most of the time.
I knew it, that's why I hate using tech stacks that are "bad"
@Kingromstar Same. I'm a relative newbie to the tech world, but I've known for a while that postgresql is perhaps the best generic DB out there. At some point, you'd want to move to self-hosted ScyllaDB, especially if you need sub-millisecond ops at scale. But if you're building something at that scale, you'd know the details (if and when). For most starter projects, postgresql is the best choice.
What I've also learnt is that it's equally important to leave your cloud provider in a timely manner, otherwise you'll be paying through the nose to do so later.
Frankly speaking it is not shocking, knowing what kind of test coverage and query optimizations postgres has
@@shubhamkukrety6355ScyllaDB should NOT be your go-to at scale. Nothing wrong with it per se, but once you get to the scale where something like Postgres hits its limits-which is a bigger scale than 99.9% of projects ever reach-you should not be blindly reaching for any one off-the-shelf solution.
At high scale you should be choosing the niche solution that most closely fits your requirements, and those requirements will NOT be what you expected at the start. Be flexible and bend to where the numbers lead you.
@@mileselam641 True. This is the caveat of "if" I didn't delve much into. Pardon my ignorance.
MariaDB v PostgreSQL would be a great comparison I think, not many opting to run MySQL over MariaDB these days, and I imagine it should be a bit of a closer race.
It would be more correct to compare MariaDB vs open source Citus Data (PostgreSQL + extension) or its commercial version, Azure Cosmos DB for PostgreSQL.
Is MariaDB that different from MySql though ?
I'm also curious to know.
I'm also curious to know.
@@jonnyso1 Not that much. Pretty much identical except licensing.
Testing SQLite against any of those would be very interesting. I know that SQLite is a different "animal", and it does not scale horizontally, but I found in most of my use cases that it provides more than enough performance (actually much more), and it is incredibly easy to operate and maintain (less moving pieces).
In addition, there are already solutions for availability and replication which are simple. So, I am using SQLite in production most of the time.
There are many tools that let you horizontally scale it now.
@@boot-strapper Yes, it is becoming a very attractive proposition.
However, given that for "typical" applications (e.g. web with a db backend) the performance for writes with SQLite is limited by the disk (no network hop between the application an the DB), I do not normally require horizontal scaling for performance reasons.
Below 100 tx/sec (writes), SQLite blows away the "traditional" DBs, and if I hardly need more than that.
Availability is another thing, but it is also relatively simple to achieve with SQLite these days.
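For what it's worth, these are the knobs people usually reach for to push SQLite's write throughput further (a sketch; the defaults are more conservative):

```sql
PRAGMA journal_mode = WAL;      -- readers no longer block the single writer
PRAGMA synchronous = NORMAL;    -- fewer fsyncs; in WAL mode this risks losing the last
                                -- commits on power loss, but not corruption
PRAGMA busy_timeout = 5000;     -- wait up to 5s for the write lock instead of erroring
```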
I'm also curious.
Also it would be great to test different "flavours" of SQLite. For example SQLite with LMDB backend.
@@jesusruiz4073 oh yeah I agree. I wrote a mock data service that generated mock data. It was able to generate 3 million records and insert them into SQLite in
I think one that would interest me a lot is MS-SQL vs PostgreSQL. I have personally worked with both, but I vastly prefer PostgreSQL.
It is tricky, because in reality you should utilize all the db features (extensions in pg, in-memory OLTP tables in Azure SQL, and whatnot). And a realistic benchmark is challenging to make :)
@@MikhailAksenovV and the MSSQL EULA states you cannot do so without M$'s written approval
mssql (or sybase) must be the only DBMS with a worse dialect than mysql. Which is hard, since mysql's is already really bad…
@@Robert-G Oracle's PL/SQL is worse than mssql's T-SQL.
@@IanWilkinson-SgtWilko how? The imperative part is almost like Oberon, Modula or Ada. wordy? yes, but comparatively easy to read.
the SQL part looks like normal, standard SQL.
not some completely arbitrary other thing. eg: Oracle uses “ for identifiers, not strings, like actual SQL, only ‘ is for strings and || is for concatenation.
mssql is like a hipster trying to make it all different just because…
if you limit yourself to SQL then targeting Oracle, postgres, Firebird and even sqlite to an extent will be fairly easy.
mssql is just different for no reason other than the hipster dialect they bought back then (sybase)
The test seems a bit too one-sided. The main architectural difference between MySQL and Postgres is that MySQL write operations (insert, update, delete) are amortized so data is always "balanced", while Postgres postpones that work until VACUUM and is prone to degradation when vacuum starts under load. It would be nice to show how these databases compare under a mixed workload, when records are inserted, updated and deleted at the same time, and the data being changed should be in the index to test tuple bloat.
Also can you publish schemas for databases?
That can be seen in the IO/s graph on the right side, second from the bottom. MySQL is doing a hard fsync to disk or something along those lines. PGsql is doing 1/3 of the disk writes, and you can't cheat on disk writes. They either happen or they don't. So PGsql is running with a less durable data model in this test.
Don't forget HOT updates, they are 10 times faster than regular updates in pg. MySQL's time is over.
@@opensourcedev22 Not true. Postgres writes to disk on each commit. It uses a copy on write model that lets it write asynchronously to disk before committing, but the commit step at the end of the transaction is synchronous and atomic.
The default isolation level in postgres is much stronger than the default in mysql, since Postgres read committed gives monotonic atomic views + causal consistency, while mysql's repeatable read does not actually fulfill the ISO SQL read committed standard in all situations, and can still deadlock.
@@BosonCollider No need to copy paste chatGPT into here brother, just look at the 3:40 mark. He's running 290 Q/s, and pgsql is doing 309 IO/s... And MySQL is at 1.4k
I admit that personally, I am unsure what pgsql is doing at a low level in this specific test, but disk IO can't be cheated unless it's not being used. So, are you honestly willing to say pgsql does 300 Q/s with ONLY 309 IO/s?
I think a database configuration is needed here with all settings posted to be sure
@@opensourcedev22 There is no way Postgres is not fsyncing after every write. They may be doing vectored IO, batches or whatever.
Am I the only one who didn't see MySQL config? What is the table engine for MySQL?
Not sure if this is still the case nowadays, but MySQL used to come with really tiny default parameters for INNODB engine. And the first thing you do with MySQL is to set those params like innodb_buffer_pool_size, etc. (there are a bunch of them) I. e. nobody uses MySQL out of the box. You have to set those params to reflect your load.
The easiest way was running the tuning-primer or MySQLTuner script after running the workload. You had to do this several times.
I was looking for this. Well spotted
Exactly. Also if he used the right engine, mariadb instead of mysql, and enabled query_cache, it would crush postgres.
@@michalszalapski The query cache not only wouldn't help here, but it got removed in MySQL 8 because it was terrible. It's been disabled by default since MySQL 5.6 anyway, for that same reason. It would not be a thing in MySQL 9.
@@neothermic1 It would help here and yeah, that's why I haven't used mysql since 5.6. Never thought ppl still use mysql... Mariadb query_cache works very well and is mandatory for high-load clusters (unless u have no clue how to maintain it).
@@michalszalapski From mariadb's docs: "It does not scale well in environments with high throughput on multi-core machines, so it is disabled by default." - It would then raise the question as to whether it can help here; these are about the same reasons it was disabled and removed from mysql itself.
But I think we're in agreement that without seeing the config of the mysql server, it's being tested at a huge disadvantage as the defaults are awful.
Pls also test MySQL vs MariaDB.
The results should be pretty similar, but it would be interesting to see.
and maybe percona
Agree!!
up! this is something that bothered me when i was still using WordPress
will do!
love your benchmarks, keep up the good work!
thank you!
Already commented about some technical stuff, now about the design: for testing SQL databases I would like to see a slightly more complex set of operations. UPDATE and DELETE are missing, having more JOINs would be more realistic, and WHERE clauses that don't use an index or use different types, plus sorting and aggregation, would also help to highlight the different strengths and weaknesses.
Maybe something like a simplified online shop with shopping carts (adding/removing items, changing amounts), orders, and reports like the top 100 customers or articles in a given time frame. And also very important: transactions. In other words some kind of CRUD app that does everything in SQL. A lot of work and preparation though.
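One possible minimal shape for that shop-style test, in Postgres flavour (purely illustrative; none of these names come from the video):

```sql
CREATE TABLE customers   (id bigint PRIMARY KEY, name text NOT NULL);
CREATE TABLE orders      (id bigint PRIMARY KEY,
                          customer_id bigint NOT NULL REFERENCES customers(id),
                          created_at timestamptz NOT NULL);
CREATE TABLE order_items (order_id   bigint NOT NULL REFERENCES orders(id),
                          article_id bigint NOT NULL,
                          amount     int NOT NULL,
                          price      numeric(10,2) NOT NULL,
                          PRIMARY KEY (order_id, article_id));

-- "top 100 customers in a given time frame" style report
SELECT c.name, sum(oi.amount * oi.price) AS total
FROM customers c
JOIN orders o       ON o.customer_id = c.id
JOIN order_items oi ON oi.order_id = o.id
WHERE o.created_at >= now() - interval '30 days'
GROUP BY c.id, c.name
ORDER BY total DESC
LIMIT 100;
```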
like the idea, would be nice if it can be implemented
there are plenty of standard workloads for these tests like TPC-C
thank you for the feedback. i'm working on improving this test based on all the feedback i got. the suggestion about an online shop is very interesting.
true, the test must reflect real world cases, so the best test design would be based on an application and reflect its transactions
I knew that PostgreSQL was booming, now I understand why. Thanks for this work.
These videos are always so clean. Thanks for sharing this, it's really valuable information for scaling projects.
thank you!
2 years ago I tested mysql vs postgresql on bare metal using a 0.9 TB file with real events. Nodejs was used, both databases used default configs. There were ~90% selects, the rest was IUD (insert/update/delete). With a low level of concurrency mysql was the winner; with > 30 workers postgres was on top. I played with the database structure and other things, and found that there is no clear winner or loser. Summary: both mysql and postgres have their own advantages and disadvantages, and database selection depends on many factors.
i'm working on an improved version of this benchmark based on the feedback i received
Results may vary if we set proper db parameters in config files
I liked this comparison between databases! I like the idea of seeing more databases on these videos.
thanks, will do! i need to improve this one based on the feedback i received
Did you also consider fine-tuning the database configs, since the default configurations are not optimized at all? Think about innodb_buffer_pool_size for Mysql and shared_buffers in PostgreSQL.
"i'm now 😊 it seems that using default settings for database testing was not the best idea.
If you need to fine-tune the database right after installing it, it's probably not the right and best database to choose.
@@moverecursus1337 This is a fundamental difference between a database product and a database project.
In the case of a product, the developer might assume that the database will run on a dedicated server to make the best use of expensive per-core licenses.
In the case of a project, such an assumption is not justified; the database might run on a dedicated server, or it might be colocated with the web application. In the latter case there must be a way to limit database memory usage, so the application server won't starve.
Thank you for the colors. This is very clear to see the graph.
thanks!
Thanks!
thank you for your support!
In my experience with MySQL, charsets have a huge impact on database size and performance if not correctly specified. E.g. utf8mb4 vs latin1 will show a big difference in database size, and also in performance: if the client is using a different charset than the server, it slows down every operation involving text because it needs a "translation".
So just check if database / table / row charset and collation is the same as the go client.
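A quick way to check that, in case it helps someone (the database name is a placeholder):

```sql
-- what the server and the current connection are using
SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';

-- per-table defaults
SELECT table_name, table_collation
FROM information_schema.tables
WHERE table_schema = 'mydb';

-- pin it explicitly so client and server agree (affects new tables;
-- existing tables need ALTER TABLE ... CONVERT TO CHARACTER SET ...)
ALTER DATABASE mydb CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;
```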
That's because UTF8 uses up to 3 (utf8mb3) or 4 (utf8mb4) bytes/character vs 1 byte/character for Latin1.
thanks for the feedback
Why postgres at 4:05 had negative value in disk usage?
Must be freeing up space
@@jricardoprog No, that's not possible. It's clearly labeled as "Database size", so it's not possible to say that space was freed up...
I suppose it is calculated from VPS free space, like current_disk_usage - predefined_disk_usage at start of tests, so maybe rotating some large logs caused this or postgres autovacuum of some previous data? although postgres autovacuum does not free disk space by default, right?
I guess he takes a baseline and calculates size by subtracting the base from the current value. Around 7:35 we can see the graph go low and then go up. It means the baseline may be wrong. The minimum value was around -110 MB but the end value was 4.9 GB, which makes it ~5 GB for postgres vs 18.7 GB for mysql.
Flaw of the benchmark environment set up. The graph jumps to negative few hundred megabytes for postgres, so clearly it wasn't doing `df` on the $PGDATA directory, but used a constant value subtracted from the current free space. Thus it could have been anything on the server affecting the results, or postgres log rotation, or dirty postgres data folder having some leftover data in pg_wal/ that got removed naturally. As others said, autovacuum doesn't reclaim free space.
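If it helps for round 2, measuring the databases directly avoids the baseline problem entirely (database names here are placeholders):

```sql
-- Postgres
SELECT pg_size_pretty(pg_database_size('mydb'));

-- MySQL
SELECT table_schema,
       ROUND(SUM(data_length + index_length) / 1024 / 1024) AS size_mb
FROM information_schema.tables
GROUP BY table_schema;
```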
Really love your benchmark series.
thank you!
Please check postgres vs sql server.
lol
Oracle will be cool as well
I want this as well. Forced to use sqlserver at work and wanna hate my life even more by seeing the comparison lol
I'd love to see this benchmark too. I've had people tell me MsSQL is faster than PostgreSQL but I have my doubts.
@@hypnoticlizard9693 Why do you think that? As far as I know, SQL Server is more advanced. I watched the CMU database lectures on YouTube, and the professor was saying that. However, I'd use PostgreSQL if I had a choice as well, because there's no license cost.
I'm not a DBA at all. But it has been stated many times (not by me, of course) that MySQL is optimized more for reads and OLTP operations, while Postgres is for OLAP ones. In addition, MySQL is better for horizontal scaling and Postgres for vertical scaling. It looks like the real difference between these DBs will become clear when we have a huge amount of data loaded into the DB (so we already apply some scaling, i.e. replication, sharding, etc.), as well as mixed requests (i.e. SELECT/INSERT/UPDATE/DELETE at the same time). For a scenario like the one in the test (a store for analytics records), choosing either of these two databases looks the same and is equally wrong - something like MongoDB or Cassandra looks preferable since it would scale much better for the operations used.
A lot of those statements are parroted from benchmarks over a decade old. I haven't seen anything recent where MySQL can match Postgres. Horizontal scaling was the most recent thing that Postgres changed the story on, but read performance has been better in Postgres for a number of years now in my experience.
@@DanWolf-codeNerd i think any of us would love to use open source and fast Postgres over MySQL if Postgres beats MySQL in literally ANY aspect. We just need some numbers from exhaustive tests that prove the claims. Do you have such numbers, or links to such tests? (Surely a 10-minute test of one simple insert and then one simple select, like this one, is not enough.)
Mongo is a very different DB since you do not use SQL and as soon as you need to join different tables together MongoDB is quite a lot harder to use efficiently.
I have been using it a lot and for the right use cases its blindingly fast but my opinion is that if you have anything but separate lists of objects and need to use only a single type of DB, go with an SQL.
It not necessarily the fastest in many cases BUT it lacks any serious weak spots that can derail a project.
If you need to do more adhoc queries over multiple different data sets, mongo can be very hard since you most likely need to load things into memory and do the work in your own code and except for small databases where everything fits in memory, that is very hard to get any performance in.
On the rest I totally agree, MySql and Postgres are both good databases with slightly different strengths, which even for quite large projects are going to be almost irrelevant.
For innodb you really need to make sure of the settings.
@@AlexanderBorshak I don't, but I also don't need benchmarks. I deal with an enormous throughput of data in my job, with compute and database combined costing over $100k/month on AWS for just my team, but if we had to scale upwards, we would be able to continue to scale for a very long time. MySQL might be able to as well, but since I know Postgres has so much headroom, a theoretical difference doesn't mean much. If you were to work for a larger company than mine, you could probably go even further and write your own custom modules if you eventually ran out of runway for either database. Both are extremely mature. I have had experience with MySQL performance. I'm certain that was probably entirely or nearly entirely about poor design in that case, not about the database.
thanks for the feedback. i'm working on improving this test
thank you, makes me feel better with my postgres preference
😊
You didn't say anything about the tuning of the database servers. Tuning it well is essential for performance! You need to tune memory and disk buffering, among other things. Also, you said nothing about the indexes that were used for the tests. Without those details, the comparison is pretty meaningless, because default tunings usually aren't production tunings but development tunings. They are very conservative and limit performance greatly.
In general, you should ask for advice from the general public before performing your benchmarks, because sometimes, you make very naive assumptions that are simply incorrect and completely distort the results.
@marcsfeh OP's point was that he should ask for advice before releasing a video with potentially mis-analyzed data in it.
@@flamingspinach Well no...even untuned, the test is relevant ! It means that you if you can't or won't put the effort in "tuning" go for Postgres it gives you best likelihood for best performance with out-of-the-box config
@@dextrowim "It means that you if you can't or won't put the effort in "tuning" go for Postgres it gives you best likelihood for best performance with out-of-the-box config"
No it does not - it only tells you that on that particular hardware and OS, with his specific DB setup and that particular workload Postgres is overall better.
But there are hundreds of unaccounted variables at play here. If you use the default settings then ANY claims you make about performance are already wrong.
my mistake was not paying much attention to the configuration settings. i'm collecting feedback right now and will update this benchmark.
How about postgres jsonb vs mongodb?)
Those are 2 completely different databases with different usecases. Try doing complicated joins in mongodb or any relational data stuff, you simply shouldn't do that.
@@qizott6442 Not really. The question is not 'how good mongodb can do relational stuff', it's 'can postgres handle JSON as good as mongodb'. PG is very flexible and it's interesting to see how it compares to mongo.
@@qizott6442 relational stuff is completely doable with mongo aggregates, shouldn't !== cannot, and a lot of projects that have picked mongo initially, because it's somehow easier, end up with complex "joins" in mongo. It's well known that mongo's performance is terrible at this. But it's interesting to see how postgres can beat mongo in mongo's game.
I've seen a benchmark from PG developers where they demonstrated that their db beats mongo on JSON operations. That was a response to a benchmark published by mongo where they ran PG installed from Ubuntu with its default settings.
@@antonkuranov I guess that's why benchmark from 3rd party on this would be interesting. Truth being told I wouldn't be surprised if PG really did beat mongodb at it's own game ;P
chill, I was already sold on Postgres lol
i'm working on an improved version of this benchmark based on the feedback i received
Love PostgreSQL.
For best performance, it's recommended to configure PostgreSQL with "ALTER SYSTEM SET ..." for various parameters - e.g. shared_buffers, effective_cache_size, work_mem, wal_buffers, random_page_cost, max_worker_processes and a few more.
If you don't care about ACID and want maximum write performance you could also disable WAL or create tables as UNLOGGED.
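For example (values are only illustrative; the right numbers depend on RAM and workload, and "events" is assumed to be the test's table name):

```sql
ALTER SYSTEM SET shared_buffers = '4GB';          -- needs a restart to take effect
ALTER SYSTEM SET effective_cache_size = '12GB';
ALTER SYSTEM SET work_mem = '64MB';
ALTER SYSTEM SET random_page_cost = 1.1;          -- sensible for SSD storage
SELECT pg_reload_conf();

-- trading crash safety for write speed on a specific table
CREATE UNLOGGED TABLE events_staging (LIKE events INCLUDING ALL);
```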
thanks, i'm working on an improved version of this benchmark based on the feedback i received
How is MySQL configured?
How big are the buffer pool and redo log sizes you used for your test?
These two are the most important configs for a high-performance innodb instance.
The default values are very low for production usage.
When running a mysql instance, the buffer pool should account for as much of the RAM as possible - from 60% to 80% of the RAM (the bigger the RAM, the bigger the percentage should be).
This! I see no mention of the configuration that's been done to either database server. There are a few more settings too, such as: what was innodb_write_io_threads set to? Did you set innodb_io_capacity and innodb_io_capacity_max to roughly match your storage medium? What *is* your storage medium? (HDD? 2.5" SSD? M.2 NVMe? Enterprise U.2 drive? These differ drastically!) The defaults are terrible, but the defaults are run-everywhere defaults. You ALWAYS want to tailor the mysql config to where you're deploying the server.
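As a sketch of the kind of thing being asked for (the numbers are illustrative starting points for a dedicated 16 GB box, not recommendations):

```sql
SET PERSIST innodb_buffer_pool_size  = 12 * 1024 * 1024 * 1024;  -- ~75% of RAM
SET PERSIST innodb_redo_log_capacity = 4  * 1024 * 1024 * 1024;  -- MySQL 8.0.30+
SET PERSIST innodb_io_capacity       = 2000;                     -- roughly match the disk
SET PERSIST innodb_io_capacity_max   = 4000;
-- innodb_write_io_threads / innodb_read_io_threads usually go in my.cnf,
-- since on most versions they are not dynamic and need a restart
```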
mysql sucks, it has always sucked. stop using it or deal with it
my mistake was not paying much attention to the configuration settings. i'm collecting feedback right now and will update this benchmark.
For the read iops graph for postgres being stuck at 0:
- check the query - maybe it is measuring write ops instead, especially if the second dashboard was created by copying + tweaking the first dashboard - super easy to forget to replace "write" with "read"
- maybe postgres is at 0 because it managed to fit the entire dataset into the page cache? Depending on how memory is measured, it may not include the OS page cache, which contains the contents of recently-used files. If every read for postgres was a page cache hit, the reads would never go to the disk, and you'd get 0 IOPS. 1KB records * 64M would start to fit in memory on a large VM, especially if the DB compresses the data before writing it to disk (which, if pg compresses and MySQL doesn't, would also explain the large difference in DB size and write IOPS!)
The page cache also isn't cleared by restarting the DB - the cache is in the OS, after all, and is just caching file contents for every process. To clear this, you'd have to run a command to drop the page cache, or reboot the VM in between tests.
Or - before the read test, try writing at least 8-10x the VM's RAM in records (calculate it by adding up the size in bytes of each field on every table, then multiplying by the number of records - not by looking at disk stats!) to ensure that the cache can't be big enough to fit the entire data set, regardless of compression.
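A rough way to do that size calculation from inside Postgres rather than from disk stats (a sketch, assuming the test's events table):

```sql
SELECT count(*)                                           AS row_count,
       avg(pg_column_size(e.*))::int                      AS avg_row_bytes,
       pg_size_pretty(count(*) * avg(pg_column_size(e.*))) AS approx_logical_size
FROM events e;
```

And to actually clear the OS page cache between runs: `sync && echo 3 | sudo tee /proc/sys/vm/drop_caches`, or just reboot the VM.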
thank you for your feedback. i'm working on an improved version of this benchmark that will include all the feedback i've received so far
Nice benchmark! A couple of things:
1 - I believe that Postgres uses a process per connection. Are you making sure to measure the combined CPU and memory usage of the 'main' process along with all of the connection processes? The reason that this is relevant is that MySql is thread based, so the main process should show the overall CPU/Memory whereas with Postgres you might only be looking at the DB process which is kind of cheating since the connection processes also use CPU/Memory.
2 - It would be cool to see a Java benchmark using the latest Java as well as the latest JDBC drivers for each. Be sure to use a connection pool like Hikari if you do.
You are awesome!!! Please make a comparison between PostgreSQL and Microsoft SQL Server!! I wasn't expecting that.
I'd love to see that, however it's unlikely; Microsoft's EULA states benchmark results can't be disclosed without their approval
@@williamclark4143 True, but "Miachelsoft Sea Kwell DB" is an unknown database just leaving beta **wink wink**
thank you! will do!
@@AntonPutra 👏
I love seeing benchmarks between databases. We need more of them. Folks should keep in mind that schema design and use cases make a bigger difference than engine in most cases, just like data structures and algorithm choice makes a bigger difference than the programming language. There are also so many "shortcuts" that each engine implements for different scenarios.
It's still good to see apples-to-apples comparisons. It's a shame the proprietary DB vendors don't allow performance comparisons the way the open source DBs do.
thanks! i'll do more of them soon
Wow, I now love PostgreSQL. Definitely switching over from MySQL and MariaDB to PostgreSQL in my next project. Thanks.
glad it was useful 😊
Databases are very sensitive to removing records. So, in addition to insert and fetch, it would be good to test delete and update operations.
yes i'll do full crud test next time
Why did PostgreSQL disk usage go negative for a while?
i didn’t use dedicated disks for data storage and used the main os disk. when i installed mysql and postgres, the differences were 6 and 4 gigs, so i just subtracted that value from the graph before the test. i'm working on an improved version of this benchmark based on the feedback i received.
using a binary copy with pgx will allow you to insert jaw dropping numbers of rows per second.
and it only re-evaluates indexes once when you say complete.
its slightly slower cousin allows you to stream csv directly into or out of the psql cli client.
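for reference, the server-side command behind both approaches (column list here is made up; the real schema may differ):
-- what pgx's CopyFrom speaks on the wire: binary-format COPY
COPY events (customer_id, action) FROM STDIN WITH (FORMAT binary);
-- the "slower cousin": stream CSV through the psql client instead
\copy events (customer_id, action) FROM 'events.csv' WITH (FORMAT csv)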
thanks for the feedback!
Thanks for doing this!
I would have guessed that Postgres would take more space due to MVCC. Weird.
It could have, except MVCC wasn't really used if it was just 80M individual inserts on autocommit. Until tuples are updated or deleted, or fillfactor is configured for a table, MVCC has near zero overhead costs.
(The longer answer is that every tuple has around 16+8 B of invisible metadata in the table heap, another, I think, 26 B for the page header (one page by default = 8 kB), and some more in the visibility map file.)
working on an improved version of this benchmark, including all the feedback i've received so far.
Do you have a video on how you set everything up from scratch, and how you collect metrics?
i'll make a tutorial soon
Small follow-up question: What about PostgreSQL vs MySQL vs MariaDB? Yes, MariaDB is used a lot now (including myself).. So it would be nice to include it in the test.
yes, i'll do MariaDB, but first, i need to improve this benchmark
MongoDB vs PostgreSQL. It would be really cool to see two different tests: first, Mongo tries to work with relational data by running joins, and second, the data is modeled for PostgreSQL to be used as a document database with bson.
thanks noted!
>If you have any suggestions on how I can improve the test design
There are a couple other tests I'd be interested in seeing:
1. Index efficiency
2. "WHERE IN" clause efficiency
3. Multi-join efficiency
4. Handling of paged data
5. Text operations
To handle these I might add a third table called "messages". Each message would have a customer_id (which customer sent the message) and a text field containing the message contents. Message contents should be a VARCHAR with a max length greater than your page size (e.g. for Postgres, the max length should exceed 8192).
Next I would add indexes to the "name" column of customers and the "contents" column of messages. I would not add an index to the events table for this test, as we don't want to interfere with the write throughput tests, and you are writing to this table for those tests. (A rough DDL sketch of this setup follows at the end of this comment.)
To test (1) - index efficiency - simply write new customers, then do COUNT queries based on the name. This will test both the index implementation (write throughput with indexing) as well as any special efficiency (e.g. if they record the cardinality for use in COUNT queries)
To test (2) - "WHERE IN" clauses - I'd write a clause that's trivially improved using this index, to see if the database can take advantage of it. Something like "SELECT COUNT(*) FROM events JOIN customer ON ... WHERE customer_id IN (SELECT id FROM customers WHERE name = ...)"
To test (3) - "multi-joins" - I would intentionally put the lowest-cardinality table first in a query, as this can often interfere with index utilization. Something like "SELECT contents FROM messages JOIN customers ON ... JOIN events ON ... WHERE action = CREATE_USER"
To test (4) - paged data - I would write and read data from the "messages" table, taking care that the reads should be indexable. This means the read query should be something like "SELECT * FROM messages WHERE contents LIKE prefix%" instead of "SELECT * FROM messages WHERE contents LIKE %postfix"
To test (5) - text operations - I would try searching for customers by the "LOWERCASE" of their name
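Putting the setup above into a rough DDL sketch (PostgreSQL syntax; names and sizes are placeholders I made up, not from the actual benchmark):
CREATE TABLE messages (
    id          BIGSERIAL PRIMARY KEY,
    customer_id BIGINT NOT NULL REFERENCES customers (id),
    contents    VARCHAR(16384) NOT NULL  -- longer than the 8192-byte page, so values get TOASTed
);
CREATE INDEX customers_name_idx    ON customers (name);
CREATE INDEX messages_contents_idx ON messages (contents);
-- note: very long contents can exceed the btree entry limit, so a prefix or hash index may be needed in practice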
thank you for your feedback! i'll see what i can implement in the next test from what you suggested!
Thank goodness you blurred the names on those graphs not to spoil anything! I cannot possibly figure out which is which otherwise. Their names are too similar, and neither has associated colors.
😊
What tuning did you apply for this workload? Is this on completely untuned default settings on everything for both databases?
I'm assuming MySQL was using the InnoDB engine? MyISAM is practically deprecated, but it'd still be interesting to see this comparison using it, and also vs the Aria engine from MariaDB
i used default settings
I think you should be careful with default configs.
InnoDB by default has very conservative configs. Good for testing, but not sure it's good for production.
@@AntonPutra
@@quangtung2912 Postgresql defaults are also designed “to run in a teapot”.
Thank u for this comparison, even though most of us probably already knew postgres was going to win. I'm not sure if u already did this but I'd love to see how Postgres compares to MongoDB, the two GOATs of the DB world. 😀
thanks, will do mongo soon
Hi, the test should use the full capabilities of the DB, therefore Postgres procedures and functions should be used to reflect the maturity levels of the databases being tested.
To decrease the time of the tests, I would recommend a RAM disk.
Would be nice to see Oracle XE vs Postgres, with both using DB objects.
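For example, a purely hypothetical server-side wrapper the client could call instead of plain INSERTs (reusing the benchmark's events table as an illustration, not the author's code):
-- moves the statement into the database, as suggested above
CREATE OR REPLACE FUNCTION insert_event(p_customer_id BIGINT, p_action TEXT)
RETURNS void
LANGUAGE sql
AS $$
    INSERT INTO events (customer_id, action) VALUES (p_customer_id, p_action);
$$;
-- called from the client as: SELECT insert_event(42, 'CREATE_USER');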
thanks for the feedback
This is not a 100% clean benchmark, as in the Ubuntu packages these DBs have default settings optimized not for performance but for low resource consumption. Playing with the configuration may significantly change the result.
my mistake was not paying much attention to the configuration settings. i'm collecting feedback right now and will update this benchmark
MySQL uses the REPEATABLE READ isolation level by default and Postgres uses READ COMMITTED. We might need to re-evaluate this once again! The Go sql client, and in fact the Postgres client, would be using autocommit mode by default, which is a transaction at the DB level. So even for the same read or write query, MySQL would theoretically need to do more work compared to Postgres.
So Postgres allows more concurrency out of the box compared to MySQL.
@AntonPutra you might need to re-evaluate this using the same isolation level for both
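If anyone wants to level that out, the relevant settings look like this (illustrative; apply whichever direction you prefer):
-- MySQL: switch from the REPEATABLE READ default to READ COMMITTED
SET GLOBAL transaction_isolation = 'READ-COMMITTED';
-- PostgreSQL: READ COMMITTED is already the default, but it can be pinned explicitly
ALTER SYSTEM SET default_transaction_isolation = 'read committed';
SELECT pg_reload_conf();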
thanks for the feedback!
Awesome helpful comparison
thank you!
I would be very interested in an unbiased comparison between FirebirdSQL and Postgres, made the same way as the one between Postgres and MySQL. I use FirebirdSQL on a daily basis and I haven't found any problems with it. I would like to know if changing the database in the future makes sense from some point of view. Thank you.
ok just added to my list
Great work! Thanks for all the contributions!
thanks!
I guess I'm addicted to this channel now...
😊
I am missing some important details. Did you use the InnoDB storage engine, and what was the setting for innodb_flush_log_at_trx_commit? And was a similar configuration used for Postgres regarding transaction safety?
yes, i used innodb. as far as i understand, foreign keys are only supported with that storage engine. in this particular test, i mostly used defaults, and i have an updated video with full CRUD and configuration settings on git. please take a look. Round 2 - github.com/antonputra/tutorials/tree/main/lessons/216
What type of tables did you create in MySQL? MyISAM or InnoDB?
something funny is going on with MySQL - the downward curve in IOPS while the inserts/second is increasing - implying inserts got easier as the inserts/second increased. The downward curve coincides roughly with capping out the connection count and the CPU spike at 21:50. Towards the end, the MySQL CPU also decreased, even though that was the highest QPS achieved.
i'm working on an improved version of this benchmark based on the feedback i received
Add tests for reading by secondary index. Postgres and MySQL do it in a different way.
And updates. It will trigger Postgres vacuum.
thank you for the feedback. i'll add a full CRUD test in the improved version of this benchmark
Did you do any price comparison? PostgreSQL is typically more expensive at my hosting provider
well, efficiency translates to cost. the more efficient your database or service, the fewer resources you need to run it
Seriously loved this benchmark
thank you!
Can you record a video describing how you configure monitoring: setting up agents, configuring graphs?
yes will do in the future
What versions of the software were you testing?
The results are very interesting.
prometheus + grafana
It would be better if you could add events to the graph, like "added abc load", etc., on the time axis. This could help in understanding the capacity of the system.
ok noted
Hey Anton, I have been watching your benchmarking series from the start and love them!
I might have an idea back on the language performance... it would be interesting to see the power of a fully-fledged JVM vs something like Go or Node? The last comparison used Java compiled to a binary without the JVM.
The memory usage will be massive, but I wonder if the JVM at peak performance (after warm-up) can outperform other GC languages in terms of number of requests before failure and response time.
Something like Amazon Corretto or even OpenJDK would be interesting
ok, thanks for the feedback. i'll come back to java soon
It is a really good test! Both are B+Tree databases, and yet there is a really big difference in I/O. IMO it is more related to the storage engine strategy that Postgres has implemented
How about a comparison between MySQL vs. MariaDB?
noted!
Not sure if this is a thing with the Go implementation: in Java I would use batched insert operations to speed things up as much as possible.
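One way to get a similar effect in plain SQL is a multi-row VALUES list (table and columns below are placeholders, not the benchmark's real schema):
-- one round trip and one statement for several rows
INSERT INTO events (customer_id, action)
VALUES (1, 'CREATE_USER'),
       (2, 'CREATE_USER'),
       (3, 'DELETE_USER');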
you can for insert, it was just a bad use case
@@AntonPutra I am not sure I understand what you mean.
I was afraid to click on this video because I thought it might disagree with my cognitive bias in favor of Postgres. I'm pleased with the affirmation. Thank you!
Nice test. I think it could also be tested on join and aggregation commands. 😊
thanks, working on an improved version
i really want to see MariaDB vs Postgres especially with the Aria storage engine in MariaDB
noted!
Very revealing benchmark, love your videos!
Would love to see a benchmark on nodejs ORMs
noted
I’d like to see how they compare when it comes to # of connections
not sure if i get the question. they have the same 100-connection default, which can be adjusted based on the hardware you use
@@AntonPutra I was thinking something like scaling up gradually from 10-1000+ connections, just to quantify the overhead of idle connections. It’s a useful metric imo because in serverless environments # of clients connected can scale up quickly
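If that test happens, connection counts are easy to watch on both sides during the run (illustrative queries):
-- PostgreSQL: one backend process per connection
SELECT state, count(*) FROM pg_stat_activity GROUP BY state;
-- MySQL: connections are threads inside one process
SHOW GLOBAL STATUS LIKE 'Threads_connected';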
Great, thanks for the info. =)
my pleasure! i'm working on an improved version of this benchmark based on the feedback i received
i like the visualization you do, it's a great job. but i think the load scenario is too simple to judge the DBMS performance. pay attention to tpc-c testing and hammerdb.
i'm collecting feedback right now and will update this benchmark to include CRUD operations and some configuration changes.
Thanks for your effort!
my pleasure, working on an improved version
In the test design, besides inserts, there should be some updates, because the way both databases handle updates can make some difference.
Interesting, thanks. I'm interested in knowing how Postgres was configured with regards to random_page_cost and shared_buffers. And did you change defaults for values like checkpoint_completion_target and the file-synchronization-related parameters?
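For context, those knobs look like this (the values are made-up examples, not recommendations):
ALTER SYSTEM SET shared_buffers = '4GB';               -- takes effect only after a restart
ALTER SYSTEM SET random_page_cost = 1.1;               -- lower planner cost for SSD-style random reads
ALTER SYSTEM SET checkpoint_completion_target = 0.9;   -- spread checkpoint writes over the interval
SELECT pg_reload_conf();                               -- enough for the last two, not for shared_buffers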
my mistake was not paying much attention to the configuration settings. i'm collecting feedback right now and will update this benchmark.
Configuration, storage engine choice, and a mixed update/read test should be considered for a fairer test.
noted! currently working on an improved version of this benchmark, including all the feedback i've received so far.
I would love to see the comparison between Timescale and Influx.
noted!
Nice benchmark. What exactly happened with the PG insert latency spike at the end (6:25)?
Note this is for a rather small record size, I wonder how they will scale with a large record size.
I also wonder how they compare with more complicated join queries, compare their query plans, when they can no longer complete everything in memory.
My current hell at work now is trying to get postgres to be fast doing large queries at scale (5k req/s)
Postgres is great for small things but when you get to reasonably large sets of data running a lot of queries per second, and the queries contain joins, it's such a pain to try to trick postgres into using a sane query plan, with a mess of ctes to make it do things in the right order.
You can make it fast but it's a lot of work.
The biggest problem is one slow query (run a lot of times) can 100% the CPU on the server and take the whole thing down.
pgbouncer is also essential because postgres does one process per connection.
Would have used nosql but data is very relational with a lot of many to many relationships.
Have you considered or proposed sidestepping the problem partially with a materialized view as a cache? you probably have, but i thought i'd put it out there.
@@laundmo thanks for the suggestion! It could work but the materialised view can get stale, refreshing it can be expensive and a lot of the view would never be read, although honestly I haven't used them that much so it might be a skill issue.
Usually just having good indexes solves the problem, but sometimes postgres comes up with crazy query plans that don't use the indexes you expect, and then does big nested loops etc, so you have to trick it into executing in the way you expect.
I had one query where it needed to look up a row from table A by its primary key, then join with table B, but despite there being indexes it decided to read all the data from table B using an unexpected index then join it on all the data from table A, and then filter it down to almost nothing. In that case I fixed it by rewriting the query to use a CTE based on table A before joining with table B, and then it executed it in the order I was expecting and used the correct indexes.
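The rewrite looked roughly like this (table and column names are invented here, the real ones were different):
-- MATERIALIZED makes the CTE an optimization fence, forcing the primary-key lookup to run first
WITH a AS MATERIALIZED (
    SELECT id
    FROM table_a
    WHERE id = 42
)
SELECT b.*
FROM a
JOIN table_b b ON b.a_id = a.id;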
When it does work, it's really impressive what it can do in less than a millisecond though.
thanks for your feedback!
It would be nice to compare SAP HANA (Database from SAP) vs Postgres.
thanks! noted!
I never thought there was such a huge difference between PostgreSQL and MySQL
i’m working on an improved version of this benchmark based on the feedback i received
3:50 How could it be negative?
@AntonPutra could you upload the ddl for both mysql and postgres into the github repo? I did a similar benchmark on my own about 6 months ago, but focused on larger batches of operations and our results are wildly different.
yes will do for the next improved test
a comparison between DB2, Oracle and MSSQL would be fun. with InsertOrUpdate, joins and stuff like that. stored procedures and so on. would be nice
noted!
what i really want to know is where did u get those grafana graphs 👀🤔
sure, i used the golang client available on my public github and visualized the metrics in grafana
Deno 2.0 has released now. Please compare bun and deno 2.0
yes coming in a week
It would be interesting to see how postgres performs with PGX and using the bulk insert techniques that postgres has instead of the normal sql api.
Thanks for the video!
you're welcome!
The problem with testing databases is that you really really have to understand their configurations. What if one is slower, because it's configured to flush data to disk more often, so it loses less data when the server goes down?
You're basically testing the default configurations, not the databases themselves.
Another problem is how little data you've inserted. It probably fits into RAM. So again, not a realistic test. You have to insert gigabytes to make it meaningful.
thanks, i'm working on an improved version of this benchmark based on the feedback i received
🔴Actively looking for feedback on settings, test design, etc.
Nice work.
It would be nice to know the versions used, and to see a version that uses the pgx driver. Also I wonder how they would compare used with Vitess (yet another Go pearl) for MySQL and Citus for PG (and maybe the TimescaleDB and ParadeDB extensions as well).
It is good that the source code is in Go; I think it is allowed to get the source and use it, maybe even modify it a bit (for pgx, for example)?
Hi, next time (sorry, I am not that fluent with Go, so I cannot create a PR) can you do 20% writes + 80% reads in the same run, so we can see how good each engine is when the load is mixed? rand(0,1) greater than 0.8 = write a row and increase max id, else do a read of rand(1, max_id). This way it will sometimes do more writes and sometimes more reads, and it will be more evenly distributed. Also: could you test using auto increment/identity vs a library like snowflake with an external id? How much does the auto increment mechanism slow down writes? And all that in mysql vs pg vs maria ;) Please :)
thank you for the feedback, i'll see what i can do
I want to know about PHP versions' performance. Thanks for your videos!
thanks, will do php soon
It would be good to know if this is MariaDB, regular Oracle MySQL, or Percona MySQL. It all makes a difference. Great job on these videos!
thank you!
Nice benchmarks and nice video. For me the performance aspect isn’t the absolute most important one. Ergonomics matter more. Keep it coming though 🙏🏻
thank you!
pls scylladb vs mongo vs postgre
noted!
Can you also show the percentage per "request per second", or simply change the x-axis from time to req/s and average the values over time? It would make the results easier to interpret.
I would love to learn how to setup this kind of tracking for my test apps. I’m assuming this is Prometheus and Grafana? Would love a tutorial video on these!
yes, prom + grafana in k8s. i'll make a tutorial soon
@@AntonPutra I would love to learn how to set this up on k8s as well, as I am a beginner just starting to learn k8s
Would love to see surreal db vs PostgreSQL
ok added