✅ Get the FREE Software Architecture Checklist, a guide for building robust, scalable software systems: arjan.codes/checklist.
Hey @ArjanCodes, can you create a video series on Python instrumentation for observability, i.e. metrics, logs and traces (at the application level, the container & pod level, and between microservices)?
I love watching your videos
Boring is good.
Came to say this 🫡
This. Just about everyone can use Postgres and MySQL.
If it ain't broke, don't fix it. But there are uses for these specialized db's.
@@TheEvertw but the selling is wrong, you don't use them because they are cool.
Yep this, boring is generally stable and reliable. So keep your sanity
I'd love to see a deeper dive on DuckDB!
Got it 😊
Me too!
agreed
same
Same here.
THX, Arjan. I know several of these databases. My first code written as part of a job was way back in 1985, so it's been a while.
I remember Oracle 5 struggling on a PC server to join 4 tables containing almost no rows.
I remember myself predicting that relational DBs would be inherently slow and die off quickly before ever making a market impact.
Technology advances, the world changes, and we all learn.
I enjoyed your quick tour de chambre of databases. Good way to expand everybody's view on what a database really is = a technology to efficiently store and access data.
Keep up your good work!
I’m convinced… mission critical ChatGPT data storage, here I come!
I too ✨
We have to make sure all those shiny Nvidia cards are put into good use!
I know python because of you Corey. Thanks
You're going to need better than that to convince me not to use Postgres.
Your content is awesome! Please Arjan, make a large-scale, real-life program with Python. This could include database processes, file operations, performance improvement, computing and web. I will write this comment on every video of yours I watch :). Greetings
I think a good set of videos would start with a topic such as "geospatial databases", then talk about the geospatial features in each database (e.g. Redis, Tile38, PostgreSQL, and even DynamoDB with an extension), and then compare the databases against each other, to help us decide at what point we use a generalist database (Redis / PostgreSQL) with a geospatial feature versus getting a specialized database like Tile38.
I mention geo-spatial since that is my biggest need, but a network database is right behind that.
and even in MS SQL Server!
As a note. "J" is pronounced as "jay". So I would think Neo4j is pronounced, neo-four-jay. The letter "G" is pronounced as "gee"
Differs in other languages ;)
I learned a few months ago that they are exactly the opposite way around in French. 🤷‍♂️
C is redundant in American and English, soft C is an "ess" aka S, hard C is "kay" aka K
Influx DB looks VERY INTERESTING! We use RRD for this function and it has the most awful, clunky API you can possibly imagine. I think learning Flux Query Language would be easy-peasy-lemon-squeezy compared to navigating the tortuous documentation of RRD. :)
Excellent.. You are always to the point, which I like most...
I'm very interested in more DuckDB content.
Great video, thank you! I am using PostgreSQL (with the GIS extension) and Redis for caching. I'd love to see a comparison of DuckDB vs SQL-based databases.
I can't wait to go deeper into DuckDB
Yes to DuckDB, but for me it's about how it differs from what can be done with Polars.
Spills to disk very well when you have bigger than memory data, not a strength of Polars. You can use all sorts of different languages with it, not just Python. Lots of people know SQL. It is a db with db features like constraints and indexes.
SQL and joins across data sources
Nice video Arjan. I think session management with openAI is already implemented through the newish OpenAI Assistants API. Just use the same assistant with the same thread ID, and enjoy your key value store!
It’s on!
I would love to see a video about non-typical SQLite use cases. It's so flexible and lightweight and I feel like people are sleeping on it just because it's not for a client/server role. I started using it as a local K:V store because I didn't wanna bother with something like redis, and I'm quite impressed.
Using SQLite for caching is a really great use case!
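For anyone curious what the local K:V / caching idea could look like, here is a minimal sketch using only the standard-library `sqlite3` module. The class name `SQLiteKV`, the table layout and the `ttl` parameter are made up for illustration; this is not taken from the video or from the commenters' own code.

```python
import sqlite3
import time


class SQLiteKV:
    """Tiny key-value store with optional expiry, backed by SQLite (hypothetical helper)."""

    def __init__(self, path=":memory:"):
        # ":memory:" keeps everything in RAM; pass a file path to persist to disk.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS kv ("
            "key TEXT PRIMARY KEY, value TEXT NOT NULL, expires_at REAL)"
        )

    def set(self, key, value, ttl=None):
        # ttl is in seconds; None means the entry never expires.
        expires_at = time.time() + ttl if ttl is not None else None
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO kv (key, value, expires_at) VALUES (?, ?, ?)",
                (key, value, expires_at),
            )

    def get(self, key, default=None):
        row = self.conn.execute(
            "SELECT value, expires_at FROM kv WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            return default
        value, expires_at = row
        if expires_at is not None and expires_at <= time.time():
            # Entry has expired: drop it lazily and report a miss.
            with self.conn:
                self.conn.execute("DELETE FROM kv WHERE key = ?", (key,))
            return default
        return value
```

Usage would be something like `cache = SQLiteKV("cache.db")`, then `cache.set("token", "abc", ttl=60)` and `cache.get("token")`. Values are plain strings here; anything richer would need serializing first (JSON, for example).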
I would be interested in a deeper dive on DuckDB.
@ArjanCodes, man, thank you so much for the content you upload for us. It's very helpful and well described, and when you explain things, you make them look very easy. Please keep up the amazing work
Thanks for this Video. I always like content that makes you reflect about architecture decisions. Another Database that seems interesting to me is ArangoDB
More about DuckDB - maybe a DuckDB vs Polars video? It feels their features heavily overlap, but I'm not sure.
I don't like the implicit nature of DuckDB. It constantly grabs objects that exist in the local scope. Polars, on the other hand, is much more stable because it is very explicit. I have had to fix data scientists' code many times because they didn't realise the secondary effects of many DuckDB operations. Also, DuckDB absolutely messes up linters and static type checking tools.
@ArjanCodes - Would you mind exploring Mojo more, for those who are looking to harness the power and speed it can provide for Python users? There are many topics related like ownership, life cycles, traits, and pointers which are foreign concepts to many of us.
I have a project that could benefit from DuckDB, I think. The data isn't important enough for long-term storage, but it's good to see at a glance as a technician or team of technicians. Perfect
Thanks Arjan, InfluxDB was exactly what I was looking for for my testing analytics… great episode 👏👏👏
Loving duckdb for the simplicity of SQL based analytics on heterogeneous data sources
Hope you will soon prepare a tutorial on uv package manager
There seems to be some issue with signing up for your newsletter and guides. I tried a few times and it is not working. Can anyone else confirm?
DuckDB is one of my new favorites. It takes the best of data frames and SQL and mashes them together. It's awesome.
Fantastic video, both educational and wise
Glad you liked it!
18:29 no, only MongoDB Atlas supports vector similarity search
Neo4j is just fun to use.
Looking at the speed increase of SSD esp. over the last four years... consider using a DB at all.
what extension do you use for Python in vsc?
PostGIS vs Tile38
Can I see a benchmark for that?
Is there something like gdbm or newer available? Focus is on newer.
DuckDB is sooooo good
Thank you for this very useful video!
Glad you enjoyed it!
I don't understand why people say duckdb is cool ... feels just like sqlite but with the flexibility to work directly over dataframes or files ... but why would i use that instead of just loading the files with some specialized dataframe package like pandas, polars or vaex?
It would be cool to see a video on it!
It can be quicker than Polars and definitely is quicker than pandas.
It is really useful when you work with teams that are SQL-heavy or mixed, and where there is a lot of legacy SQL code to integrate.
It's also lighter to set up (I sometimes just use the CLI or the exe).
You can also take a creative approach to your pipeline and apply the transformations that are clearer in SQL using DuckDB, and then continue using your dataframe package. I'm not saying it's a good idea, but I did it for a few transformations and it worked really well.
I feel like for some bigger-than-RAM datasets it can be better than Polars, and it is also more mature for the moment, if that makes sense.
I also find that the "ergonomics" of DuckDB are really where it shines: DuckDB is the easiest way to use SQL from Python, IMO. I'm not saying that other tools are difficult, but DuckDB is dead simple.
Spills to disk very well when you have bigger than memory data, not a strength of Polars. You can use all sorts of different languages with it, not just Python. Lots of people know SQL. It is a db with db features like constraints and indexes rather than another dataframe lib.
Cool! Thanks for the answers! Will take a closer look at it.
I am using it in prod right now as the key piece in a data lakehouse architecture for analytics. It’s soooo nice to have a one-stop shop for writing SQL queries that pull from parquet, csv, a few live databases, with zero friction. And it’s super fast for analytical queries on medium-sized data.
You could do this with an ORM on top of a bunch of Python connectors and leverage polars or whatnot too, but it just feels simple clean and fast to have it in duckdb
CockroachDB is also an interesting thing to check out :)
I have never used it; guess it's time to give it a try. Can we get your views on using Typesense in Python projects built on FastAPI, or on Postgres full-text search?
DuckDB is really very useful.
OrientDB is also very interesting
What about ArangoDB? a hybrid DB, RDBMS+Graph,Document, better than Neo4j IMHO. There is also one more interesting UnrealDB
I'll watch your duckdb video when done
Please create series on duckdb.
I guess Postgres can do most of these tasks using extensions 😅
What about PocketDB?
Postgres x TimeScaleDB vs Influx ?
What about DNS as a database!
sqlite is all you need
me hearing Arjan pronouncing Milvus as Milfus:
What about Clickhouse?
Thanks
Thank you so much!
Neo4j has vector support.
Duckdb, pocketdb
Why Redis?! I have it already replaced with KeyDB.
BTW, Postgresql is getting a vector engine too...
curious why not Valkey?
These days, Postgres is very very good. You need a good reason not to use it. It is free, mature, scales, has good IDE support, good python support, extensions for everything, and great Docker packages. And if you want third-party support, it is easy to find at every level.
RE: Rediculous DBs - did you know Python has a built-in DB? No, not SQLite! It's called dbm. It's not even relational - it can just store dicts for you! 😂
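A quick sketch of the `dbm` idea, with one caveat: `dbm` behaves like a dict on disk, but keys and values are stored as bytes, so a Python dict has to be serialized first (the standard-library `shelve` module is the other option and pickles values for you). The helper names `save_record` and `load_record` and the choice of JSON are assumptions made for this example.

```python
import dbm
import json


def save_record(path, key, record):
    """Store a JSON-serializable dict under `key` (hypothetical helper)."""
    # Flag "c" opens the database for read/write and creates it if needed.
    with dbm.open(path, "c") as db:
        db[key] = json.dumps(record)  # str keys and values are encoded to bytes


def load_record(path, key):
    """Return the dict stored under `key`, or None if the key is absent."""
    with dbm.open(path, "c") as db:
        if key not in db:
            return None
        raw = db[key]  # comes back as bytes
    return json.loads(raw)
```

Something like `save_record("settings", "user:1", {"theme": "dark"})` followed by `load_record("settings", "user:1")` round-trips the dict. Depending on which backend Python picks on your platform, the file on disk may get an extra extension appended to the path you pass.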
The person who invented DuckDB is a quack. 🦆
I'll stick with "boring" postgres
RocksDB?
Keep it simple... CSV? 🤣
Flux is being deprecated for influxdb 3 fyi
True, but I wanted to stick to open source here, and that is still on version 2.
"new query language like flux"
- naaaaah... NO!
Don't use any of these. Just use Postgres
Did he really call it “Readis?”
Yes. I was referring to only half of the database interface. The other half is called Writis.
WTF? Who cares what's boring for you guys if they do their work well...
love your boozy demos
What about xml databases?
Was that an official endorsement of hitting interns with mechanical keyboards??! Watch out, you'll get cancelled with talk like that!
All joking apart, this was very timely and useful information for me. Thanks!
Database education and Making Interns cry... LMAO!
Boring is not a valid argument.