Thanks so much @leslie-welch for bringing up the single-record bridge table, this looks like an absolute lifesaver for models with multiple m:m relationships! Any chance you could say a few more words about this approach: is the dim table here somewhat similar to a straightforward unpivot (i.e. "ListID1, PathID1, PathnameABC", "ListID1, PathID2, PathnameDEF", "ListID2, PathID1, PathnameABC", "ListID2, PathID3, PathnameXYZ", etc etc), meaning that there's a certain amount of row-level replication for PathID/Pathname; are those never featured in a unique-value DIM of their own?
What about multiple fact scenarios like sales orders and invoices. Parts of a sales order can be dispatched and invoices separately. Even sales order lines can be split with second dispatch for back-ordered quantity. There is a need to link sales order lines to invoice lines.
What is the relationship between the two? Is it always one or more invoices per sales order, or are there also sometimes multiple sales orders associated with one invoice?
@@coolblue5929 In this scenario I would look at incorporating the sales order fact into the invoice fact. It will of course create some replication at the row level of sales data, but if you can end up with a fact that is mainly dates and integers, and push as much of the text into dimensions, your DAX will be simpler and overall the report should be more performant.
Leslie mentioned generating a bridge table column from a CRC32 (presumably using the CHECKSUM() function). Is there a specific reason to use that function instead of HASHBYTES()?
@JBartlett_DMU Not that I am aware of, but I will ask the engineering team! I'll circle back and let you know what they say. The actual SQL function we are using in Databricks is CRC32() and it is wrapped around some array functions to get a distinct list and sort that distinct list. Things brings us down to the smallest number of records needed to meet the ListID need for our many to many relationships.
@@JBartlett_DMU gotcha. My understanding is that Databricks is primarily based on Spark SQL with some additional functions specific to Delta tables. I haven't worked with T-SQL to my knowledge so I don't know exactly how it differs.
Thanks so much @leslie-welch for bringing up the single-record bridge table, this looks like an absolute lifesaver for models with multiple m:m relationships! Any chance you could say a few more words about this approach: is the dim table here somewhat similar to a straightforward unpivot (i.e. "ListID1, PathID1, PathnameABC", "ListID1, PathID2, PathnameDEF", "ListID2, PathID1, PathnameABC", "ListID2, PathID3, PathnameXYZ", etc etc), meaning that there's a certain amount of row-level replication for PathID/Pathname; are those never featured in a unique-value DIM of their own?
What about multiple fact scenarios like sales orders and invoices. Parts of a sales order can be dispatched and invoices separately. Even sales order lines can be split with second dispatch for back-ordered quantity. There is a need to link sales order lines to invoice lines.
What is the relationship between the two? Is it always one or more invoices per sales order, or are there also sometimes multiple sales orders associated with one invoice?
What about shared dimensions?
@@leslie-welch in my case it’s always one sales order per invoice and sometimes multiple invoices for a sales order.
@@mrbartuss1 the shared dimensions filter via the sales orders.
@@coolblue5929 In this scenario I would look at incorporating the sales order fact into the invoice fact. It will of course create some replication at the row level of sales data, but if you can end up with a fact that is mainly dates and integers, and push as much of the text into dimensions, your DAX will be simpler and overall the report should be more performant.
Leslie mentioned generating a bridge table column from a CRC32 (presumably using the CHECKSUM() function). Is there a specific reason to use that function instead of HASHBYTES()?
@JBartlett_DMU Not that I am aware of, but I will ask the engineering team! I'll circle back and let you know what they say. The actual SQL function we are using in Databricks is CRC32() and it is wrapped around some array functions to get a distinct list and sort that distinct list. Things brings us down to the smallest number of records needed to meet the ListID need for our many to many relationships.
@@leslie-welch Ahhh, so it's a Databricks SQL thing! I think I was assuming you were talking about T-SQL.
@@JBartlett_DMU gotcha. My understanding is that Databricks is primarily based on Spark SQL with some additional functions specific to Delta tables. I haven't worked with T-SQL to my knowledge so I don't know exactly how it differs.