Hi Clinton, this is really helpful.
One suggestion: it would be nice to add a video number so viewers can follow the videos in order.
Hello Clinton. Your lectures are very helpful for my classes; they are very easy to follow, especially in this corona situation. Thank you
Thank you Clinton!
I've added the source code, source data, and documentation for creating the Crime Data Mart on GitHub here: github.com/cedanie2/ISM-4402
Your videos are awesome, Clinton. I have seen LinkedIn Learning videos as well, but you have done better. And yes, please number them; then the whole tutorial will be great!! Once again, thanks for sharing your knowledge.
Very good explanation
Thanks for sharing your knowledge 👍
Why are we performing conversion while loading into staging?
Hey Daniel, I am looking to watch all your videos on SQL ETL, but I am trying to figure out which one comes first 😉. Thank you for making these videos.
If you look at the videos on my channel and identify the video names that start with "Data Mart Design: ...", you will see a sequence in the names.
@@clintondaniel5285 I am going through all of them. Can I ask you something, please? Do the snowflake and star schemas come before we extract the data from the source to the destination? At what stage do we need to design either a snowflake or a star schema?
@@alphar85 Yes, you should be designing your data mart with a schema before you begin the ETL process.
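For illustration only, here is a minimal T-SQL sketch of what a star schema designed before the ETL step might look like. The table and column names (Dim_Date, Dim_Location, Fact_Crime, etc.) are assumptions for this example, not taken from the videos:

-- Assumed minimal star schema for a crime data mart (names are illustrative).
CREATE TABLE dbo.Dim_Date (
    DateKey  INT PRIMARY KEY,   -- e.g. 20240131
    FullDate DATE NOT NULL
);

CREATE TABLE dbo.Dim_Location (
    LocationKey  INT IDENTITY(1,1) PRIMARY KEY,
    District     VARCHAR(50),
    Neighborhood VARCHAR(100)
);

-- The fact table references each dimension through surrogate keys.
CREATE TABLE dbo.Fact_Crime (
    FactCrimeKey  INT IDENTITY(1,1) PRIMARY KEY,
    DateKey       INT NOT NULL REFERENCES dbo.Dim_Date (DateKey),
    LocationKey   INT NOT NULL REFERENCES dbo.Dim_Location (LocationKey),
    IncidentCount INT NOT NULL
);

Designing these tables first gives the ETL package fixed destinations to load into.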
@@clintondaniel5285 Thank you, Daniel, for helping me. Can I also ask: how would you consolidate data sources (CSV, Excel, and other files) within your organisation into one source of truth? Would you ask your organisation to switch to Access, for instance, to enter data in order to avoid redundancy? If so, what about the old files in Excel sheets?
Can you please share the dataset you used?
Thanks
How can I proceed if there is more than one data source file?
I've added the source code, source data, and documentation for creating the Crime Data Mart on GitHub here: github.com/cedanie2/ISM-4402. I'll see if I have time to work through an example with multiple source files. Basically, you either append to the existing staging table or create additional staging tables. Your ETL package will then have additional connection managers that point to the additional data source files. These connection managers are used in separate Data Flows within SSIS to move the data into the additional staging tables or to UPDATE an existing staging table.
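As a rough illustration of the "additional staging tables" idea above, here is a short T-SQL sketch; the table and column names (Staging_Crime, Staging_Crime_File2, etc.) are assumptions for this example only:

-- Assumed example: one staging table per source file, loaded by its own
-- SSIS connection manager / Data Flow.
CREATE TABLE dbo.Staging_Crime_File2 (
    CrimeCode        VARCHAR(20),
    CrimeDescription VARCHAR(255),
    IncidentDate     DATE
);

-- After the Data Flows run, the second file's rows can be appended
-- into the main staging table before the data mart load.
INSERT INTO dbo.Staging_Crime (CrimeCode, CrimeDescription, IncidentDate)
SELECT CrimeCode, CrimeDescription, IncidentDate
FROM dbo.Staging_Crime_File2;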
@@clintondaniel5285 Thank you so much. This is very helpful. As for the additional staging tables (for multiple sources of data at multiple times), do we delete them after we have uploaded the data to the data mart? Another question: if we append to the staging table instead of creating more staging tables, how do we upload the appended data to the data mart?
@@lazykitten4356 Usually, in a production environment, the staging tables are not useful after your data mart has been loaded with the most recent or initial data. I usually recommend two additional columns in each dimension: one that records a timestamp and one that records the status of the update. For instance, status = I if the record was INSERTed, or status = U if the record was UPDATEd.

In terms of the appended data, take a look at the various methods of "UPSERT" (INSERT vs. UPDATE, for example using the MERGE command). Since the staging table would have been emptied (TRUNCATE) after the previous LOAD, you now have only new data in the staging table. Your SSIS package should then run an "UPSERT" (for example, the MERGE command) against the existing dimension: if there is a new dimension record, execute an INSERT; if the record already exists with no change, ignore it; if the record exists but has changes, UPDATE it. Whenever a record is UPDATEd or INSERTed, also update the "status" column and the "timestamp" column in your dimensions.

In terms of FACT tables, different developers have different opinions. You can take the same "UPSERT" approach, or you can simply add a new record. If you just add an additional record to your FACT table and end up with an updated FACT spread across two or more rows, you will need additional SQL logic on the reporting side that either MERGEs the records or shows the timeline where the FACTs have changed. It all depends on your use case and requirements. If historical change is important in the FACTs and you don't want multiple records, then create a separate FACT table that keeps all changes. So, for instance, the normal FACT table may have only 1,000 rows while the "historical" FACT table may have more than 1,000 rows.
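To make the MERGE-based "UPSERT" described above concrete, here is a minimal T-SQL sketch. The table and column names (Staging_Crime, Dim_Crime, CrimeCode, Status, LastUpdated) are assumptions for illustration, not from the actual project:

-- Assumed tables: Staging_Crime (freshly loaded source rows) and
-- Dim_Crime (target dimension with the extra Status and LastUpdated columns).
MERGE dbo.Dim_Crime AS tgt
USING dbo.Staging_Crime AS src
    ON tgt.CrimeCode = src.CrimeCode              -- business key
WHEN MATCHED AND tgt.CrimeDescription <> src.CrimeDescription THEN
    UPDATE SET CrimeDescription = src.CrimeDescription,
               Status      = 'U',                 -- record was updated
               LastUpdated = SYSDATETIME()
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CrimeCode, CrimeDescription, Status, LastUpdated)
    VALUES (src.CrimeCode, src.CrimeDescription, 'I', SYSDATETIME());
-- Records that match with no changes are simply ignored, as described above.

-- After the dimension load, empty the staging table for the next run.
TRUNCATE TABLE dbo.Staging_Crime;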
@@clintondaniel5285 Thank you so much. You have at least given me ideas of what to look for :) I really appreciate it.
@@clintondaniel5285 Another question: where did you learn all these techniques about staging tables and data warehousing? Is there a book you would recommend? Thank you very much.