This is a really helpful video. I am early in my career and have been getting carried away with learning big data techniques, when my company rarely needs to build models with > 100 GB of data. I had worried that scaling up is bad practice, but this talk raised some interesting issues with that way of thinking. Thanks!
I love the name MotherDuck. I feel it’s a respectful tribute to the female source of life and code.
So when you… actually work with production data, big data doesn’t just die simply because you want it to.
Loading any data locally is faster, but that isn’t a DuckDB thing, it’s a server thing…
It’s just like loading SQL or pandas data locally: it’s not SQL itself that’s slow, it’s connecting to it, because you’re going through a driver’s API over the network rather than a URL. If DuckDB even had the option to be connected to remotely, it would apparently become the next “thing that’s dead.”
The correct analogy would be DuckDB querying data residing in a Data Lake.
Also, a DuckDB instance in a k8s pod on a remote cluster could be queried quite easily, and you could call that remotely accessible. Just a different way of doing things.
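A hedged sketch of that second idea, since DuckDB has no built-in wire protocol: wrap it in a small HTTP service inside the pod. Everything here is hypothetical (the /query endpoint, the /data/warehouse.duckdb path, port 8080), and it is deliberately minimal, with no auth and arbitrary SQL execution, so it's a demonstration rather than something to deploy:

```python
# Minimal sketch: expose a DuckDB database over HTTP so a pod running it on a
# remote cluster can be queried from elsewhere. Endpoint, path, and port are
# made-up placeholders; not production-ready (no auth, executes arbitrary SQL).
import duckdb
from flask import Flask, jsonify, request

app = Flask(__name__)
con = duckdb.connect("/data/warehouse.duckdb")  # hypothetical file inside the pod

@app.post("/query")
def run_query():
    sql = request.get_json()["sql"]
    cur = con.execute(sql)
    return jsonify({
        "columns": [d[0] for d in cur.description],  # column names from the result
        "rows": cur.fetchall(),                      # rows as JSON arrays
    })

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```

A client anywhere in the cluster could then POST {"sql": "SELECT 42"} to the pod's service address and get rows back as JSON.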
Great technology
Copying data onto laptops is a privacy nightmare!! Because of GDPR, you don't want to do this...
DuckDB can query data residing in a Data Lake. None of the data has to be on your laptop.
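For example, here is a hedged sketch in Python of what that pattern looks like. The bucket, path, region, and column names are placeholders; httpfs and read_parquet are real DuckDB features:

```python
# Sketch: query Parquet files sitting in object storage directly with DuckDB.
# The s3:// path, region, and column names are placeholders for illustration.
import duckdb

con = duckdb.connect()  # in-memory database; nothing is persisted locally
con.execute("INSTALL httpfs; LOAD httpfs;")  # extension for HTTP/S3 reads
con.execute("SET s3_region = 'eu-west-1';")  # credentials are set the same way

# DuckDB does range reads over HTTP, so only the bytes needed to answer the
# query are fetched; the raw files are never copied to the laptop's disk.
result = con.execute("""
    SELECT user_country, count(*) AS events
    FROM read_parquet('s3://my-data-lake/events/*.parquet')
    GROUP BY user_country
    ORDER BY events DESC
""").fetchdf()
print(result)
```

Only the small aggregated result set comes back, which also speaks to the GDPR comment above: the sensitive raw records can stay in the lake.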
this felt like a product ad with extra steps...
I mean, he founded a company that emerged from an idea he has been believing in and growing for years. This felt like hearing his motives and the thinking behind the product.
Well said. The idea that the tools are bigger than the data they are processing is not new; e.g., see "Scalability! But at what COST?" by McSherry et al. (2015).
To a cynical mind with no motivation other than to criticize, sure, fair assessment.
For someone working in this domain who has problems to solve, it is insightful, helpful, and dare I say entertaining.
Weird, huh?