Thank you so much, I was looking for just this kind of walkthrough/tutorial on this subject; you spoke so clearly and directly and stayed on topic that it was easy to follow. Thanks again, great job. I like the usage of staging tables, I use that a lot as well!!
Thank you for all the information shared, I’ve just finished the course project. Thanks!!
You are the man, thank you so much! You are saving Panda Jardins
Great job on this video! I am trained in SSMS, SSIS, and SSAS separately, but putting it all together to get a DW from a source was a missing link for me. Keep up the great work :-)
Best video ever. You just helped me complete my project. Thanks for the video
Best tutorial on data marts on YouTube
Very clear and straightforward, thank you for sharing your knowledge!
Is this a good way of doing warehousing? I always hear that staging should be raw (no calculations, no changes, etc.), but you added surrogate keys?
Great video! Thank you. Please continue uploading.
Thank you, sir. Could you please upload more videos? And one suggestion: please create playlists.
I really enjoyed watching this video :)
Hi, thanks for the tutorials. Can we include business logic or calculations in the ETL while loading data from staging to fact?
Yes. For example, if you had a specific business formula, such as an interest rate calculation, you can calculate this value as a form of pre-processing during the ETL. If the pre-processing calculations are really complicated, you can load the data into a staging table and then execute the calculations using SQL or a scripting language (such as C# or Python). In other words, if there is a lot of computing involved, you do not want to perform it all in one single ETL process. The calculated data can then be saved temporarily to another "secondary" staging table. Then, you can move the data from the "secondary" staging table to the final FACT table. I do this type of work a lot in staging tables. This adds some separation between the pre-processing computation and the normal ETL process.
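A minimal sketch of that two-step flow, with made-up table and column names (Staging_Loans, Staging_LoanCalc, FactLoan, DimDate) and a placeholder formula, not the actual project tables:

-- Step 1: pre-process the heavy calculation into a "secondary" staging table.
INSERT INTO Staging_LoanCalc (LoanNumber, LoanDate, Principal, MonthlyInterest)
SELECT
    s.LoanNumber,
    s.LoanDate,
    s.Principal,
    s.Principal * (s.AnnualRate / 12.0) AS MonthlyInterest  -- placeholder business formula
FROM Staging_Loans s;

-- Step 2: move the pre-computed rows from the secondary staging table into the FACT table.
INSERT INTO FactLoan (DateID, LoanNumber, Principal, MonthlyInterest)
SELECT
    d.DateID,
    c.LoanNumber,
    c.Principal,
    c.MonthlyInterest
FROM Staging_LoanCalc c
JOIN DimDate d ON d.FullDate = c.LoanDate;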
@@clintondaniel5285 Thank you for the quick response, it helped clarify my doubt.
Thanks Clint, very well explained. Do you have follow-up videos on multidimensional analysis?
Such a wonderful project approach on a data mart. Can you please share the dataset? I would love to follow along hands-on... it would be much appreciated. Thanks again in advance.
Thanks for this nice video, would you do another one on how to do this project from the beginning?
Hello, great video. I know it's been a while, but can you perhaps speak on how you would take your data from a staging db and insert it into your fact table after all your dimensions are built, but this time you do not have any IDs for the dimensions in your staging table?
I will be posting an updated version of the data mart design process using SQLite later this year. There are a lot of examples where the data mart design is useful for localized analysis/data science projects rather than enterprise scalable or service oriented deployment. Instead of using a Microsoft tool, I will be showing how all of the data engineering (ETL) is done using Python and SQLite. I'll see if I can include a part of the FACT table example where the ID is not available in a staging table.
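In the meantime, the usual pattern when the staging table has no dimension IDs is to join staging to each dimension on its natural/business key and pick up the surrogate keys during the load. A rough sketch, with hypothetical table and column names that may not match the video exactly:

-- Staging only has natural keys (StateName, CityName, ReportYear); the surrogate keys
-- are looked up from the dimension tables at load time.
INSERT INTO FactCrime (StateID, CityID, YearID, Robbery, AggravatedAssault)
SELECT
    ds.StateID,
    dc.CityID,
    dy.YearID,
    s.Robbery,
    s.AggravatedAssault
FROM Staging_Crime s
JOIN DimState ds ON ds.StateName = s.StateName
JOIN DimCity dc ON dc.CityName = s.CityName AND dc.StateID = ds.StateID
JOIN DimYear dy ON dy.ReportYear = s.ReportYear;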
Great video. What's video 1?
Tx Clinton.
You are super!
Spot on like Clint Eastwood's shooting!
this is spot on. thank you so much!
thank you so much
Quick question: we made dimensions in previous videos and made tables to lay out our data correctly, and all those tables were empty. My question is, how did you import the CSV file for the staging table in the beginning?
Take a look here: th-cam.com/video/XVms0SA-6Xs/w-d-xo.html
@@clintondaniel5285 Thank you Clinton. Currently doing my coursework from your videos, and it took me almost 200 hours of videos lol. I still need to do the OLAP cube and run some queries, such as:
i. Show the top 5 Facilities and the total weight in TotalTonnes for each material in each of the four quarters, e.g. April-June, July-September, October-December and January-March.
ii. Rank the top 10 Materials and the TotalTonnes of waste processed by each of these for the 2016-17 year and for each of the two half-years, April-September and October-March. For the second half-year period, show the number of places each Authority has moved (up/down the ranking) since the first half-year period.
iii. For each Authority show the WasteStreamType and TotalTonnes reported for the year, ordering the output in descending order of TotalTonnes.
iv. Develop a chart that shows the monthly TotalTonnes processed for each OutputProcessType. Your answer should allow selection of the OutputProcessType and the Authority in which the waste originated.
Till now I have loaded my tables using your videos, but there's still a way to go.
Thank you so much for your videos.
Hi, thanks for your video, it's very good and I appreciate your time here. I would like to see how to build this without an ETL tool, but you showed all those queries and I understood very well, thanks.
I've added the source code, source data, and documentation for creating the Crime Data Mart on GitHub here: github.com/cedanie2/ISM-4402
thanks a lot
Thanks for the detailed video
Hey, the relationship between state and city and all of these fact and dimension tables: can these be considered part of a star schema design?
Yes, absolutely.
Also, depending on whether you build a hierarchy between state and city, you could also consider a snowflake design.
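As a rough illustration (hypothetical DDL, not the exact tables from the video): a star-style city dimension carries the state directly, while a snowflake-style city dimension references a separate state dimension.

-- Star-style: one denormalized location dimension, state stored on the city row.
CREATE TABLE DimCity_Star (
    CityID INT PRIMARY KEY,
    CityName VARCHAR(100),
    StateName VARCHAR(100)
);

-- Snowflake-style: the city dimension rolls up to a separate state dimension.
CREATE TABLE DimState (
    StateID INT PRIMARY KEY,
    StateName VARCHAR(100)
);

CREATE TABLE DimCity_Snowflake (
    CityID INT PRIMARY KEY,
    CityName VARCHAR(100),
    StateID INT FOREIGN KEY REFERENCES DimState (StateID)  -- hierarchy normalized out
);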
Hey Daniel!! It's a nice video, can you please share the database and queries to practice?
Thank you, this helped a lot, very grateful. Any videos on updating the fact table through querying in SSIS? For example, if you were to add a new dataset to the data warehouse and had to insert new fields into the fact table and then update that in Visual Studio. Is this difficult?
For newly created data this looks good. Can you let us know how it will work for updated records, like Type 1 and Type 2 attributes?
Good question. This is a common operation sometimes called an "UPSERT". Essentially, you check if data associated with an ID has changed. If it has, you execute an UPDATE. If the ID does not exist, then you INSERT. But, that's easy for me to just say vs. demonstrate. I plan on doing a video that addresses this scenario sometime in the future.
@@clintondaniel5285 Is this more like a MERGE command in SQL?
@@brianaelia Not really, there is a high cost to the MERGE command depending on the number of records. This is more of a methodology to track and update records.
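A minimal sketch of the check-then-UPDATE/INSERT pattern described above (SQL Server syntax; the table and key names FactCrime, Staging_Crime, CrimeKey are made up):

-- Update fact rows whose business key already exists with the latest staged values.
UPDATE f
SET    f.Robbery = s.Robbery,
       f.AggravatedAssault = s.AggravatedAssault
FROM   FactCrime f
JOIN   Staging_Crime s ON s.CrimeKey = f.CrimeKey;

-- Insert staged rows whose business key does not exist in the fact table yet.
INSERT INTO FactCrime (CrimeKey, Robbery, AggravatedAssault)
SELECT s.CrimeKey, s.Robbery, s.AggravatedAssault
FROM   Staging_Crime s
WHERE  NOT EXISTS (SELECT 1 FROM FactCrime f WHERE f.CrimeKey = s.CrimeKey);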
This is Great
That was really helpful, thank you!
You normally don't update staging tables or merge any IDs.
thank you so much !
Question -- I noticed that in your fact table you have each crime category (MurderAndNonEgligentManslaughter, ForcibleRape, Robbery, AggravatedAssault, etc.) each in their own column. Would you agree that this data set would be better if it were stored unpivoted? So that there was a new column called "CrimeCategory" and that column was populated with the values of MurderAndNonEgligentManslaughter, ForcibleRape, Robbery, AggravatedAssault, etc., along with another column called "NumberOfCrimes"? This way a user could ask a question like "how many total crimes were committed?" and the system could aggregate a single column (NumberOfCrimes). I understand that this data set is just being used for illustration, but maybe you can touch on when to convert columns to a single dimension and when to leave them as individual columns. Thanks.
Typically, in my experience, you would want to store the measurable FACTs as separate columns because they can have a different context. Yes, they are all crimes. But, someone may want to establish their own context on each column and would require the fields to be separated. If the context of a data analyst is to Pivot the columns so that they all fall under "CrimeCategory" and "NumberOfCrimes", then this could be easily done in a reporting tool such as Excel, Tableau, or PowerBI. But, there really is no right or wrong answer to your question. I typically architect my data marts to account for maximum context and allow for a user to define their own contextual rules.
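If an analyst did want the "CrimeCategory"/"NumberOfCrimes" shape, it can also be produced at query time rather than stored that way, for example with UNPIVOT in SQL Server (a sketch; the column names here are illustrative):

-- Reshape the separate measure columns into CrimeCategory / NumberOfCrimes rows
-- and aggregate, so "how many total crimes?" becomes a single SUM per category.
SELECT CrimeCategory, SUM(NumberOfCrimes) AS TotalCrimes
FROM FactCrime
UNPIVOT (
    NumberOfCrimes FOR CrimeCategory IN (Robbery, AggravatedAssault, ForcibleRape)
) AS u
GROUP BY CrimeCategory;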
@@clintondaniel5285 Great info!
Why would you keep all the values in a single fact table? Isn't a fact table supposed to keep only aggregated values in order to limit the storage and the cost of the DWH?
Hi do you have the dataset?
Thank you!
Very good.
Thank you, this was very helpful.
Can you give the data source link?
Neat!
Can you share your previous videos?
Nice
amazing snowflake schema video. no BS