Hi Kahan, a question I have after watching many of your videos. What about a client's situation makes you think one tool would fit better than another? For example Snowflake vs BigQuery.
Hi Mike! Could you please clarify the following: after a developer makes changes to a model and raises a PR so that the changes are reviewed/auto-tested in the QA/CI DB/schema, and later merged to the main branch, is the QA/CI a replica of the Prod DB (warehouse and marts) where it reads data from Staging and validates the changes prior to getting merged to main? Thanks in advance!
In our setup we have multiple environments (DEV, QA, PROD), all separate, including the raw sources and the ETL. This at least doubles our costs. The setup that you showed eliminates the extra costs for processing and storage by using one environment, right? How do you deal with upgrades and changes in the raw data source layer? For example, a source system whose database schema changes significantly after an upgrade? Do you just add another schema in the raw database?
Would you need separate dev schemas for the staging and marts? Let's say I want to develop a new mart. Would I put all of those models in the same dev schema before going to production?
I typically will do that. I like to keep all tables/views in a single Dev schema (e.g. all Staging, Warehouse, Marts) to avoid excessive objects and keep it simple. The way I see it, nobody else is really looking at that schema, so perfect separation & organization isn't as important. What's more important is that you can confirm models deploy, check the data, etc. Then once you move to "production", separate things out by specific schemas. Hope that helps!
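To make that concrete, here is a minimal sketch of the idea in Python: every layer lands in one dev schema during development, and only the production target splits models out by layer. The target names (`dev`, `prod`), the layer names, and the `dev_sandbox` schema are all hypothetical placeholders rather than anything from the video, so swap in whatever your project actually uses.

```python
# Hypothetical layer-to-schema mapping used only for production builds.
LAYER_SCHEMAS = {
    "staging": "staging",
    "warehouse": "warehouse",
    "marts": "marts",
}


def resolve_schema(target: str, layer: str, dev_schema: str = "dev_sandbox") -> str:
    """Return the schema a model should be built in for a given target.

    Assumption: any target other than "prod" is treated as development,
    so all layers share the single dev schema.
    """
    if layer not in LAYER_SCHEMAS:
        raise ValueError(f"unknown layer: {layer}")
    if target == "prod":
        # Production: separate things out by layer-specific schemas.
        return LAYER_SCHEMAS[layer]
    # Development: everything goes into one schema to keep it simple.
    return dev_schema


if __name__ == "__main__":
    for layer in LAYER_SCHEMAS:
        print(layer, resolve_schema("dev", layer), resolve_schema("prod", layer))
```

With this approach a new mart and the staging models it depends on all build into the same dev schema, and the per-layer separation only shows up once the same models are deployed against the production target.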
Thanks for this Kahan. Please make a video implementing the workflow like you've done with the CI/CD. Thanks again.
Very clear and concise, thank you
Glad it was helpful!
A lot of good ideas from your videos have inspired me to improve my development flow.
I love it. Already doing this, but it's a good reminder.
It's a very clear explanation