Very useful and easy-to-understand PySpark scenario series. Great work, I really appreciate your efforts!
Thank you Sai Kumar
Thanks for this wonderful series. Really appreciate your efforts.
Thank you Pratik 👍
neatly explained...✌
Thank you
Great video. Thank you for sharing. 😊
That's great.....
Thank you Chandra 👍
Hi sir, nice explanation, but will monotonically_increasing_id work for assigning a surrogate key if there are duplicates in the combined key?
Hi, how can I generate an IntegerType surrogate key for a column col1 that contains duplicate values? sha2 gives good results, but its output is alphanumeric.
Where can I download the sample file for practice? Please share the link.
When I use monotonically_increasing_id, it generates random-looking numbers instead of 0, 1, 2, 3... Can you please help me with this?
Please verify the data. Is any sorting applied?
@TRRaveendra the data is not sortable
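monotonically_increasing_id() gives consecutive values only within a partition. If you have multiple partitions, the IDs are still unique and increasing, but they will not follow one continuous sequence: the partition ID is stored in the upper bits, so the values jump between partitions and look random. Within each partition you will still see sequential values.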
@TRRaveendra thank you, it's a great help
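To make the partition-wise behaviour above concrete, here is a minimal pure-Python sketch of the documented bit layout (partition index in the upper 31 bits, row position in the lower 33 bits). The helper name simulated_monotonic_id is hypothetical and only mimics that layout; it does not call Spark.

```python
# Sketch of the 64-bit layout used by monotonically_increasing_id():
# upper 31 bits = partition index, lower 33 bits = row position in that partition.
def simulated_monotonic_id(partition_index: int, row_in_partition: int) -> int:
    """Hypothetical helper that mimics the documented bit layout."""
    return (partition_index << 33) + row_in_partition

# Three rows in partition 0, then three rows in partition 1.
ids = [simulated_monotonic_id(p, r) for p in range(2) for r in range(3)]
print(ids)  # [0, 1, 2, 8589934592, 8589934593, 8589934594]
```

The jump to 8589934592 (2 to the power 33) is where the second partition starts. If you need 0, 1, 2, 3... across the whole DataFrame, row_number() over an ordered window is the usual alternative.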
How can we compute a running (cumulative) sum of one column? For example:
salary op
1 1
2 3
3 6
4 10
5 15
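Use a window function with sum(), ordered by the column that defines the row order and framed from the first row up to the current row.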
@TeckLake monotonically_increasing_id is not recommended for this. It might generate the same IDs in the next run of the job.
How can we create unique keys so that the next run starts from the last max ID?