Keep it up! What might be a useful video is an explanation of how to set up a data lakehouse vs a traditional SQL Server, using mostly the Serverless SQL pool. I know you've done some ETL videos in the past, but I'm still missing an end-to-end video, i.e. going from source through the bronze, silver and gold layers to Power BI. Also a big one: how do you deal with deletes in this architecture?
Nice idea, might be a longer video though and definitely something I'd do
@DatahaiBI That would be very much appreciated 👍
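On the deletes question, a minimal sketch of one common approach, assuming Delta Lake tables in a Synapse Spark notebook and a source extract that carries a deleted flag (all paths, table names and columns here are hypothetical):

```python
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

# Hypothetical bronze extract that includes an is_deleted flag from the source system
bronze_df = spark.read.format("delta").load(
    "abfss://lake@storageaccount.dfs.core.windows.net/bronze/customers")

silver = DeltaTable.forPath(
    spark, "abfss://lake@storageaccount.dfs.core.windows.net/silver/customers")

# Upsert changed rows and physically remove rows the source marked as deleted
(silver.alias("t")
    .merge(bronze_df.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedDelete(condition="s.is_deleted = true")
    .whenMatchedUpdateAll(condition="s.is_deleted = false")
    .whenNotMatchedInsertAll(condition="s.is_deleted = false")
    .execute())
```

If the gold layer needs history, an alternative is to keep the is_deleted flag as a column in silver (a soft delete) and filter it out in the gold views instead of physically deleting.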
Thank you Andy. Great video as always.
What I don't understand, however, is why you would create the lake database in the Serverless pool (i.e. not in the Spark notebook).
Love your videos btw!
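For what it's worth, the Spark-notebook route looks roughly like this, and the database then shows up automatically in the Serverless SQL pool via the shared metadata (database, table and path names are just placeholders):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Create a Lake Database and a Delta table from a Synapse Spark notebook;
# the shared metadata makes both queryable from the Serverless SQL pool as well.
spark.sql("CREATE DATABASE IF NOT EXISTS sales_lakedb")

spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_lakedb.orders
    USING DELTA
    LOCATION 'abfss://lake@storageaccount.dfs.core.windows.net/silver/orders'
""")
```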
Great video. Thanks.
Hi,
in a dataflow, adding a new column to an existing Delta Parquet sink with upsert gives an error. Can someone please help with this?
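I can't speak to the dataflow error itself, but if you have a Spark notebook available, one workaround might be to add the column to the Delta sink up front, or to let merges evolve the schema automatically. A rough sketch, with hypothetical paths and column names:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Option 1: add the new column to the existing Delta sink before the upsert runs
spark.sql("""
    ALTER TABLE delta.`abfss://lake@storageaccount.dfs.core.windows.net/silver/customers`
    ADD COLUMNS (loyalty_tier STRING)
""")

# Option 2: allow Delta MERGE operations to evolve the target schema automatically
spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true")
```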
One other annoying behaviour I've observed between Apache Spark pools and Serverless SQL pools relates to the shared metadata table schema, specifically the string data types. If I create a table based on Delta files within a Lake DB and define a varchar column with a length of 100 characters, when I explore that same table within the Serverless SQL pool the varchar length is lost and is rounded up to 8000. This is very inconvenient, as we'd have to create views within Serverless that redefine the proper length, which defeats the purpose of having a Lake Database in the first place, at least to a certain extent. Hope MS will rectify this in the near future. Mind you, this is not the case when the Lake DB table is based on Parquet files, in which case the varchar length is persisted in the shared metadata between both pools.
Good point, thanks. If I had my way I'd just have a single database type for both spark and serverless, keep everything consistent
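In case it helps anyone reproduce the behaviour described above, a minimal sketch from a Synapse Spark notebook (database, table and path names are hypothetical):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sql("CREATE DATABASE IF NOT EXISTS lakedb")

# Delta-backed Lake DB table: varchar(100) is defined here, but the Serverless
# SQL pool currently surfaces the column as varchar(8000)
spark.sql("""
    CREATE TABLE IF NOT EXISTS lakedb.customer_delta (customer_name VARCHAR(100))
    USING DELTA
    LOCATION 'abfss://lake@storageaccount.dfs.core.windows.net/silver/customer_delta'
""")

# Parquet-backed Lake DB table: the varchar(100) length is kept in the shared
# metadata and shows up correctly in Serverless
spark.sql("""
    CREATE TABLE IF NOT EXISTS lakedb.customer_parquet (customer_name VARCHAR(100))
    USING PARQUET
    LOCATION 'abfss://lake@storageaccount.dfs.core.windows.net/silver/customer_parquet'
""")
```

Until the Delta behaviour changes, the workaround mentioned above (Serverless views that cast the columns back to the intended lengths) seems to be the practical option.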