Did you like this video? What else do you want to learn about AWS? Drop a comment below.
Why is no 'distkey' mentioned for other distribution styles like all and even?
There are 4 options: key (distkey), all, even, and auto. A key applies only to the first option. The other 3 distribution styles do not need a key for data distribution.
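To make the difference concrete, here is a minimal Python sketch that builds the CREATE TABLE statement for each of the four styles and rejects a distkey for anything other than KEY. The helper name, table, and columns are made up for illustration.

```python
def create_table_ddl(table, columns, diststyle, distkey=None):
    """Build a Redshift CREATE TABLE statement for one distribution style.

    Hypothetical helper: table and column names are illustrative only.
    """
    style = diststyle.upper()
    if style not in {"KEY", "ALL", "EVEN", "AUTO"}:
        raise ValueError(f"unknown distribution style: {diststyle}")
    if style == "KEY" and distkey is None:
        raise ValueError("DISTSTYLE KEY requires a DISTKEY column")
    if style != "KEY" and distkey is not None:
        raise ValueError(f"DISTSTYLE {style} does not take a DISTKEY column")

    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    if style == "KEY":
        clause = f"DISTSTYLE KEY DISTKEY ({distkey})"
    else:
        clause = f"DISTSTYLE {style}"
    return f"CREATE TABLE {table} ({cols}) {clause};"


columns = [("order_id", "BIGINT"), ("amount", "DECIMAL(10,2)")]
print(create_table_ddl("sales", columns, "key", distkey="order_id"))
for style in ("all", "even", "auto"):
    print(create_table_ddl("sales", columns, style))
```

Only the first statement names a column; the other three carry just the style keyword.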
Best 30min I have spent in recent days! Add next video with more details.
Thanks for leaving a comment. Is there any specific topic you would like me to cover next?
I like the way you present the lesson: simple, easy, and clear to understand. Thanks, GS
Glad you liked it 👍
If you like this video, please drop a comment to share your reaction. ❤
Hands on tutorial on Redshift will be the best one.
Noted. I do plan to work on that one in the coming weeks.
Hey @adityaf17
I am working on a hands-on tutorial, but Redshift is not free and will incur cost.
Do you think people will be ready to shell out some money for hands-on tutorials?
Or would you prefer a video where I do the actual work and you just watch?
I like the distribution style of the content in this video and the way you chose to present it
Glad you liked it. 👍
mind blown with some knowledge as beginner thanks a lot
Glad you liked it 👍
Superb... Good pitch
Enrolled in the course. Looking forward to great and informative content as always.
Hope you liked it
The most excellent video I have ever watched on AWS Redshift. This is the best one that explains Redshift in detail.
Glad you liked it ❤️
Very clear and concise introduction to aws redshift.
Glad you liked it
Thanks for the insightful tutorial. My only question: when comparing distribution style key vs even, will choosing a key column distribute the rows so that they are retrieved much faster than with even distribution, given that even also spreads rows evenly?
Yes, you are right. If the table has a distkey and your queries join or aggregate on that column, it will usually return rows faster than even distribution style, because rows with the same key value sit on the same slice and less data has to move between nodes.
You should be a little careful while picking the distkey column. Ideally it should have many distinct values (high cardinality), so rows spread evenly across slices, and it should be used in your queries.
Good luck.
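A small Python sketch of why the choice of distkey column matters. It assigns rows to slices by hashing the key value; the CRC32 hash and the 4-slice count are assumptions for illustration, not Redshift's real internal hash function.

```python
import zlib


def rows_per_slice(key_values, num_slices):
    """Count how many rows land on each slice when hashing on the key.

    Assumption: CRC32 stands in for Redshift's internal hash function.
    """
    counts = [0] * num_slices
    for value in key_values:
        slice_id = zlib.crc32(str(value).encode()) % num_slices
        counts[slice_id] += 1
    return counts


# Low-cardinality key: every row has the same value, so one slice gets everything.
skewed = rows_per_slice(["IN"] * 1200, 4)

# High-cardinality key: distinct values spread the rows across the slices.
spread = rows_per_slice(range(1200), 4)

print("same value for every row:", skewed)
print("unique value per row:    ", spread)
```

With a single repeated value one slice does all the work, which is the skew a distkey with many distinct values avoids.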
@@ETLSQL Thanks for clarification
Thanks for such a nice video, please create a complete course on this.
Hey @aniketbahalkar223
Can you suggest a few topics that I should cover in the course?
Hats off to you, sir. Keep making content like this. Clear explanation.
Glad you liked it. I remember this video took more time than any other video I have made to date.
Do you have any recommendations for the next set of videos?
Somewhat confused about your point that Redshift is a columnar database (so data is split by columns, rather than rows). Yet at 25:30, where you are talking about distribution styles, the distribution is row based?
Hi, thanks for asking the question.
Redshift is columnar, so when it stores data in storage blocks, it keeps the values of each column together. This is the actual physical storage.
Distribution style determines which slice gets which portion of the table data. If you choose distkey, each slice gets some portion of the table data depending on the key value. Say there are 4 slices and 12 rows. Each slice may store 3 rows, but the physical storage on that slice will still keep column values together. Distribution determines how rows are spread among the slices, and columnar determines how the data is actually stored in blocks.
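The 4 slices and 12 rows example can be sketched in Python as two separate steps. This is a toy model, not Redshift's actual storage code. It uses round-robin assignment for simplicity, and the column names are made up.

```python
def distribute_even(rows, num_slices):
    """Step 1 (distribution): decide which slice each ROW goes to (round-robin)."""
    slices = [[] for _ in range(num_slices)]
    for i, row in enumerate(rows):
        slices[i % num_slices].append(row)
    return slices


def to_columnar(slice_rows, column_names):
    """Step 2 (storage): within one slice, keep each COLUMN's values together."""
    return {name: [row[name] for row in slice_rows] for name in column_names}


# 12 hypothetical rows with two columns.
rows = [{"id": i, "city": f"city_{i % 3}"} for i in range(12)]

slices = distribute_even(rows, 4)
columnar = [to_columnar(s, ["id", "city"]) for s in slices]

for slice_id, blocks in enumerate(columnar):
    print(slice_id, blocks)
```

Each slice holds 3 whole rows, but inside the slice the `id` values sit in one list and the `city` values in another, which is the row-wise distribution plus column-wise storage described above.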
Can same node slice share two different column values in case of same datatype?
Yes, it can. But remember that data is distributed across slices using the distkey only.
Are there any videos on Power BI + Amazon Redshift?
Not sure about any video with Power BI. Generally teams prefer to use QuickSight with Redshift, though.
In AWS Redshift cluster, what is zero ETL and how does it work, sir?
Hi Hafiz,
Good question. Zero ETL is a concept that is gaining popularity, especially in the AWS ecosystem. In simple terms, it means that you connect to the source directly and read the data at run time rather than bringing the data in from the source through an ETL process. In Redshift, you can connect to different data sources like RDS and DynamoDB and read the data directly via a SQL query. You don't have to create an ETL pipeline to bring in data from these sources. Additionally, you can read data directly from the data lake via a Spectrum query. Also, if another team has its data in Redshift, you can read it directly from your Redshift via a datashare.
In short, zero ETL means the capability to read data directly from the source without the need to build ETL pipelines.
I strongly believe this concept will gain even more popularity in the future.
Hi, can you please post further videos on Redshift?
Sure. Any specific topic on Redshift?
@@ETLSQL Firstly, awesome content as usual!
Can you try to cover CDC options in Redshift, orchestrating data movement to Redshift, how Redshift can be integrated with other AWS systems, data recovery options, Redshift Spectrum, and any PROD use cases?
These are some good points. Noted.
Such a wholesome session. Pity it took me a while to get to it. Top-notch explanation. Would you mind answering a question? You have brought a lot of clarity on data sources and related topics, and I would like to bring up one nuance: suppose the source system places the data on an SFTP server (a batch running daily or weekly), a file-based rule system moves the data to an S3 bucket, and from there it goes to Redshift. Is that scenario possible?
Yes, it is very much possible. As soon as the file lands in S3, an event is generated. This event can be used to trigger a Lambda function that runs a COPY statement into a Redshift table.
You can also create an external table in Redshift pointing to the S3 path, and as soon as the file arrives you can query the data without loading it.
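A rough Python sketch of the Lambda approach, under several assumptions: the file is a CSV with a header row, the function runs the COPY through the Redshift Data API (`boto3.client("redshift-data")`), and the environment variable names (`TARGET_TABLE`, `COPY_ROLE_ARN`, `CLUSTER_ID`, `DATABASE`, `DB_USER`) are placeholders you would define yourself.

```python
import os
from urllib.parse import unquote_plus


def copy_statements(event, table, iam_role):
    """Build one COPY statement per object listed in an S3 event notification."""
    statements = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        statements.append(
            f"COPY {table} FROM 's3://{bucket}/{key}' "
            f"IAM_ROLE '{iam_role}' FORMAT AS CSV IGNOREHEADER 1;"
        )
    return statements


def handler(event, context):
    """Lambda entry point: run a COPY for every file that just landed."""
    import boto3  # imported here so the helper above works without boto3

    client = boto3.client("redshift-data")
    table = os.environ["TARGET_TABLE"]      # placeholder variable names
    iam_role = os.environ["COPY_ROLE_ARN"]
    for sql in copy_statements(event, table, iam_role):
        client.execute_statement(
            ClusterIdentifier=os.environ["CLUSTER_ID"],
            Database=os.environ["DATABASE"],
            DbUser=os.environ["DB_USER"],
            Sql=sql,
        )
```

The statement building is kept in its own function so it can be checked locally with a sample event before wiring up the S3 trigger.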
Thanks for awesome video !
Woo hoo. Thanks for the comment. 👍
good video, clear explanation of this topic
Glad you liked it 👍
Best one for Redshift!
Glad you liked it. 👍
Please provide a video on Azure Data Factory like this, with at least one example.
Hey Mahesh,
I am not planning to cover azure as of now. Will focus on general concepts and aws.
Hope you find a suitable tutorial soon.
Thanks for your video
Glad you liked it 👍
Very clear tutorial. Thank you
Glad you liked it. 👍
Great 👍
Thanks
Thank you!
You're welcome!
nicely explained
Glad you liked it. Can you suggest any relevant topic that I can cover next?