✅ Get the FREE Software Architecture Checklist, a guide for building robust, scalable software systems: arjan.codes/checklist.
Hey @ArjanCodes, can you create a video series on Python instrumentation for observability, i.e. metrics, logs and traces, at the application level, the container and pod level, and between microservices?
I love watching your videos.
I'd love to see a deeper dive on DuckDB!
Got it 😊
Me too!
agreed
same
Same here.
THX, Arjan. I know several of these databases. My first code written as part of a job was way back in 1985, it's been a while.
I remember Oracle 5 struggling on a PC server to join 4 tables containing almost no rows.
I remember predicting that relational DBs would be inherently slow and die off quickly before ever making a market impact.
Technology advances, the world changes, and we all learn.
I enjoyed your quick tour de chambre of databases. Good way to expand everybody's view on what a database really is = a technology to efficiently store and access data.
Keep up your good work!
I’m convinced… mission critical ChatGPT data storage, here I come!
I too ✨
We have to make sure all those shiny Nvidia cards are put into good use!
I know python because of you Corey. Thanks
You're going to need better than that to convince me not to use Postgres.
Boring is good.
Came to say this 🫡
This. Just about everyone can use Postgres and MySQL.
If it ain't broke, don't fix it. But there are uses for these specialized db's.
@@TheEvertw but the selling is wrong, you don't use them because they are cool.
Yep this, boring is generally stable and reliable. So keep your sanity
As a note. "J" is pronounced as "jay". So I would think Neo4j is pronounced, neo-four-jay. The letter "G" is pronounced as "gee"
Differs in other languages ;)
I learned a few months ago that they are exactly the opposite way around in French. 🤷♂️
C is redundant in American and English, soft C is an "ess" aka S, hard C is "kay" aka K
I think a good set of videos would start with a topic, say "geospatial databases", then talk about the geospatial features in each database (e.g. Redis, Tile38, PostgreSQL, and even DynamoDB with an extension), and then compare the databases against each other. That would help us decide at what point to use a generalist database (Redis / PostgreSQL) with a geospatial feature versus getting a specialized database like Tile38.
I mention geo-spatial since that is my biggest need, but a network database is right behind that.
and even in MS SQL Server!
Your content is awesome! Please Arjan, make a large-scale, real-life program with Python. This could include database processes, file operations, performance improvement, computing and web. I will write this comment on every video of yours I watch :). Greetings
Excellent.. You are always to the point, which I like most...
Yes to DuckDB, but for me it's about how it differs from what can be done with Polars.
Spills to disk very well when you have bigger than memory data, not a strength of Polars. You can use all sorts of different languages with it, not just Python. Lots of people know SQL. It is a db with db features like constraints and indexes.
SQL and joins across data sources
Great video, thank you! I am using PostgreSQL (with the GIS extension) and Redis for caching. I'd love to see a comparison of DuckDB vs SQL-based databases.
18:29 no, only MongoDB Atlas supports vector similarity search
There seems to be some issue with signing up for your newsletter and guides. I tried a few times and it is not working. Can anyone else confirm?
Influx DB looks VERY INTERESTING! We use RRD for this function and it has the most awful, clunky API you can possibly imagine. I think learning Flux Query Language would be easy-peasy-lemon-squeezy compared to navigating the tortuous documentation of RRD. :)
I would love to see a video about non-typical SQLite use cases. It's so flexible and lightweight and I feel like people are sleeping on it just because it's not for a client/server role. I started using it as a local K:V store because I didn't wanna bother with something like redis, and I'm quite impressed.
Using SQLite for caching is a really great use case!
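For anyone curious what the local K:V / cache idea from this thread can look like, here is a minimal sketch using only the standard-library `sqlite3` module. The class name `SQLiteKV`, the table name `kv`, and the optional `ttl` parameter are all made up for illustration; they are not from the video or from any library.

```python
import sqlite3
import time


class SQLiteKV:
    """Tiny key-value store on top of SQLite (hypothetical helper, not a library API)."""

    def __init__(self, path=":memory:"):
        # ":memory:" keeps everything in RAM; pass a file path to persist to disk.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS kv ("
            "key TEXT PRIMARY KEY, value TEXT NOT NULL, expires_at REAL)"
        )

    def set(self, key, value, ttl=None):
        # ttl is in seconds; None means the entry never expires.
        expires_at = time.time() + ttl if ttl is not None else None
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO kv (key, value, expires_at) VALUES (?, ?, ?)",
                (key, value, expires_at),
            )

    def get(self, key, default=None):
        row = self.conn.execute(
            "SELECT value, expires_at FROM kv WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            return default
        value, expires_at = row
        if expires_at is not None and expires_at < time.time():
            # Entry is stale: drop it and behave as if it was never there.
            with self.conn:
                self.conn.execute("DELETE FROM kv WHERE key = ?", (key,))
            return default
        return value


if __name__ == "__main__":
    cache = SQLiteKV()
    cache.set("user:1", "Arjan")
    print(cache.get("user:1"))
    print(cache.get("missing", default="n/a"))
```

Values are stored as text here, so anything structured would need to be serialized first (for example with `json.dumps`). That is a design choice of this sketch, not a SQLite limitation.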
what extension do you use for Python in vsc?
Nice video Arjan. I think session management with openAI is already implemented through the newish OpenAI Assistants API. Just use the same assistant with the same thread ID, and enjoy your key value store!
It’s on!
I would be interested in a deeper dive on DuckDB.
I'm very interested in more DuckDB content.
More about DuckDB - maybe a DuckDB vs Polars video? It feels their features heavily overlap, but I'm not sure.
I don't like the implicit nature of DuckDB. It constantly grabs objects that exist in a local scope. Polars, on the other hand, is much more stable because it is very explicit. I have had to fix data scientists' code many times because they didn't realise the secondary effects of many DuckDB operations. Also, DuckDB absolutely messes up the linter and static type checking tools.
PostGIS vs Tile38
can i check benchmark about that?
Is there something like gdbm or newer available? Focus is on newer.
I can't wait for a deeper dive into DuckDB
@ArjanCodes, man, thank you so much for the content you upload for us. It's very helpful and well described, and when you explain things, you make them look very easy. Please keep up the amazing work
Thanks for this Video. I always like content that makes you reflect about architecture decisions. Another Database that seems interesting to me is ArangoDB
@ArjanCodes - Would you mind exploring Mojo more, for those who are looking to harness the power and speed it can provide for Python users? There are many topics related like ownership, life cycles, traits, and pointers which are foreign concepts to many of us.
Hope you will soon prepare a tutorial on uv package manager
Thanks Arjan Influxdb was exactly what I was looking for my testing analytics… great episode 👏👏👏
I don't understand why people say duckdb is cool ... feels just like sqlite but with the flexibility to work directly over dataframes or files ... but why would i use that instead of just loading the files with some specialized dataframe package like pandas, polars or vaex?
It would be cool to see a video on it!
It can be quicker than Polars and is definitely quicker than pandas.
It is really useful when you work with teams that are SQL-heavy or mixed, and where there is a lot of legacy SQL code to integrate.
It's also lighter to set up (I sometimes just use the CLI or the exe).
You can also take a creative approach to your pipeline: apply the transformations that are clearer in SQL using DuckDB, then continue with your dataframe package. I'm not saying it's a good idea, but I did it for a few transformations and it worked really well.
I feel like for some bigger-than-RAM datasets it can be better than Polars, and it is also more mature for the moment, if that makes sense.
I also find that the "ergonomics" of DuckDB are really where it shines: DuckDB is the easiest way to use SQL from Python, IMO. I'm not saying other tools are difficult, but DuckDB is dead simple.
Spills to disk very well when you have bigger than memory data, not a strength of Polars. You can use all sorts of different languages with it, not just Python. Lots of people know SQL. It is a db with db features like constraints and indexes rather than another dataframe lib.
Cool! Thanks for the answers! Will take a closer look at it.
I am using it in prod right now as the key piece in a data lakehouse architecture for analytics. It’s soooo nice to have a one-stop shop for writing SQL queries that pull from parquet, csv, and a few live databases, with zero friction. And it’s super fast for analytical queries on medium-sized data.
You could do this with an ORM on top of a bunch of Python connectors and leverage polars or whatnot too, but it just feels simple clean and fast to have it in duckdb
I have never used any of these, so I guess it's time to give them a try. Can we get your views on using Typesense in Python projects with FastAPI, or on Postgres full-text search?
Thank you for this very useful video!
Glad you enjoyed it!
I have a project that could benefit from DuckDB, I think. The data isn't important enough for long-term storage, but it's good to see at a glance as a technician or team of technicians. Perfect
Fantastic video, both educational and wise
Glad you liked it!
What about Clickhouse?
Postgres x TimeScaleDB vs Influx ?
Why Redis?! I have it already replaced with KeyDB.
BTW, Postgresql is getting a vector engine too...
curious why not Valkey?
Loving duckdb for the simplicity of SQL based analytics on heterogeneous data sources
What about PocketDB?
Looking at the speed increase of SSD esp. over the last four years... consider using a DB at all.
duckdb is one of my new favorites. it takes the best of data frames and sql and mashes it together. Its awesome.
Please create series on duckdb.
What about ArangoDB? a hybrid DB, RDBMS+Graph,Document, better than Neo4j IMHO. There is also one more interesting UnrealDB
Neo4j is just fun to use.
What about DNS as a database!
I guess Postgres can do most of these tasks using extensions 😅
DuckDB video +1 please
OrientDB is also very interesting
Thanks
Thank you so much!
CockroachDB is also interesting thing to check :)
Neo4j has vector support.
I'll watch your duckdb video when done
Flux is being deprecated for influxdb 3 fyi
True, but I wanted to stick to open source here, and that is still on version 2.
DuckDB is sooooo good
DuckDB is really very useful.
RocksDB?
me hearing Arjan pronouncing Milvus as Milfus:
What about xml databases?
RE: Rediculous DBs - did you know Python has a built-in DB? No, not SQLite! It's called dbm. It's not even relational - it can just store dicts for you! 😂
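To make the `dbm` remark concrete, here is a small sketch using the standard-library `dbm` module. One caveat: `dbm` behaves like a persistent dict whose keys and values are strings or bytes, so a Python dict has to be serialized before it goes in. JSON is used here as one option (the standard-library `shelve` module is another). The helper names `save_settings` and `load_settings` are hypothetical, chosen just for this example.

```python
import dbm
import json
import os
import tempfile


def save_settings(db_path, name, settings):
    """Store a dict under `name`, serialized as JSON (hypothetical helper)."""
    # "c" opens the database for reading and writing, creating it if needed.
    with dbm.open(db_path, "c") as db:
        db[name] = json.dumps(settings)


def load_settings(db_path, name):
    """Read the JSON stored under `name` back into a dict (hypothetical helper)."""
    # "r" opens an existing database read-only.
    with dbm.open(db_path, "r") as db:
        # dbm returns values as bytes; json.loads accepts bytes directly.
        return json.loads(db[name])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "settings")
        save_settings(path, "editor", {"theme": "dark", "tabs": 4})
        print(load_settings(path, "editor"))
```

Which on-disk format you get depends on the backends available on your platform, so the files are not guaranteed to be portable between machines.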
The person who invented DuckDB is a quack. 🦆
Did he really call it “Readis?”
Yes. I was referring to only half of the database interface. The other half is called Writis.
Duckdb, pocketdb
These days, Postgres is very very good. You need a good reason not to use it. It is free, mature, scales, has good IDE support, good python support, extensions for everything, and great Docker packages. And if you want third-party support, it is easy to find at every level.
Keep it simple... CSV? 🤣
sqlite is all you need
I'll stick with "boring" postgres
love your boozy demos
"new query language like flux"
- naaaaah... NO!
Database education and Making Interns cry... LMAO!
Don't use any of these. Just use Postgres
WTF? Who cares what's boring for you guys if they do their work well...
Was that an official endorsement of hitting interns with mechanical keyboards??! Watch out, you'll get cancelled with talk like that!
All joking apart, this was very timely and useful information for me. Thanks!
Boring is not a valid argument.