These videos are great. I'm lucky to have found them here. They make my day meaningful. Thank you.
Ruchika, really awesome tutorial... very good.
Hi Ruchika, I am Ana Lilia Armas Martínez, and I want to share the following with you all. The Modulus partition caught my attention because it comes from the modular arithmetic introduced in 1801 by Carl Friedrich Gauss, so I imagined that it must have good properties (perhaps not widely known) for generating partitions. In my research I found that the Modulus partition is used on tables that have a key column of integer type (say, a column idColumn), since the Modulus operation is performed on the numbers stored there.
How does this operation work? The Modulus operation is applied to each number in idColumn and the number of processing nodes; it simply returns the remainder of dividing the first number by the second. For example, with idColumn = 1815 and n = number of processing nodes = 50, dividing 1815 by 50 leaves a remainder of 15, so the row with idColumn = 1815 is sent to node 15. With idColumn = 300, dividing 300 by 50 leaves a remainder of 0, so that row is sent to node 0. With idColumn = 65, dividing 65 by 50 leaves a remainder of 15, so that row is sent to node 15, the same set as idColumn = 1815, and so on. Since n = 50, we get the sets 0, 1, 2, 3, ..., 48, 49, because the remainder of an integer division by 50 can only be 0, 1, 2, 3, ..., 49. All our data ends up distributed across these sets.
I found the following in the book InfoSphere DataStage for Enterprise XML Data Integration: "Like Hash, the partition size of Modulus partitioning is equally distributed as long as the data values in the key column are equally distributed. Because Modulus partitioning is simpler and faster than Hash, it provides a performance improvement in situations where you have a single integer key column." There is more:
"For Hash partitioning, in the situation where the number of unique key values is low, you can get partition skew, where one partition receives a much larger percentage of rows than other partitions. Skew negatively affects performance. One way of correcting this partition skew is to add an additional key column…"
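The skew described in that quote can be illustrated with a small sketch. This is not DataStage's actual hash function, which the quote does not specify; `zlib.crc32` is only a deterministic stand-in, and the key values, row counts, and partition count are made up for illustration.

```python
import zlib
from collections import Counter

def hash_partition(key: str, num_partitions: int) -> int:
    """Stand-in hash partitioner (assumption: CRC32, not DataStage's real hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

NUM_PARTITIONS = 8  # hypothetical number of partitions

# Hypothetical data: only three unique key values, unevenly frequent.
keys = ["A"] * 700 + ["B"] * 200 + ["C"] * 100

# Count how many rows land in each partition.
counts = Counter(hash_partition(k, NUM_PARTITIONS) for k in keys)

for partition in range(NUM_PARTITIONS):
    print(f"partition {partition}: {counts.get(partition, 0)} rows")
```

With only three unique keys, at most three of the eight partitions can receive any rows, and whichever partition gets key "A" holds at least 700 of the 1000 rows. All rows sharing a key value always go to the same partition, which is why a low number of unique values leads to skew.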
In my experience, most tables contain a key column of integer type, so we can use the Modulus partition. If a table does not have one, we create a column of this type and avoid the possible problems of using the Hash partition.
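The modulus routing worked through above (idColumn values 1815, 300, and 65 with 50 nodes) can be sketched in a few lines. This is only an illustration of the arithmetic, not DataStage code; the function name `modulus_partition` and the row layout are hypothetical.

```python
from collections import defaultdict

def modulus_partition(key: int, num_nodes: int) -> int:
    """Return the node number for an integer key: the remainder of key / num_nodes."""
    return key % num_nodes

NUM_NODES = 50  # n = number of processing nodes, as in the example above

# Hypothetical rows, using the idColumn values from the comment.
rows = [{"idColumn": 1815}, {"idColumn": 300}, {"idColumn": 65}]

# Group the key values by the node each row would be sent to.
partitions = defaultdict(list)
for row in rows:
    node = modulus_partition(row["idColumn"], NUM_NODES)
    partitions[node].append(row["idColumn"])

for node in sorted(partitions):
    print(node, partitions[node])
# 0 [300]
# 15 [1815, 65]
```

Rows 1815 and 65 land on the same node because both leave a remainder of 15 when divided by 50.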
It is a small contribution to the great work you have done in making these videos, I love them. Thank you very much dear.
Why do we need other partitioning techniques when Auto partitioning is available to choose the best technique for any specific stage? So if you choose Hash or Auto for an Aggregator stage (just for example), should the performance be the same?
This is for beginners.
Very clear video... I like all your videos... thanks for sharing your knowledge.
Thank you so much Aarti!! I'm really glad you liked all my videos. Keep following my channel-TUTORIAL for more new videos!!!
Tutorial, hello, I'm eagerly waiting for your uploads. When will you upload?
Hey Jagadeesh, thanks for your interest. Though I work full-time, I set aside some time to make these videos. Even then, I uploaded 29 videos in just 45 days. Please understand that!! :)
Ma'am, can you please explain which partitioning technique to use in which scenario? I guess that is not very clear. Thanks!!
Excellent video. I admire your professionalism in explaining every detail of DataStage.
I'm new to DataStage and this video helped a lot, thanks!
Very good video, but the Set option for preserve partition is not clear. Can you please tell us what this Set option is for?
Honestly, I don't have a clear explanation for that. That option doesn't really make sense to me. I'm sorry & thanks for watching!! :)
Hey, I just love your voice... truly.
Dear Ma'am, thanks for your effort; it is highly appreciated. I think the way the video is explained can improve a lot. The way you explain things is very confusing.
In explaining RCP, you often used the word "record"; as far as I know, you should say "column" instead. For example, you said you have 10 records and want only 4 records to pass to the next stage, when you should say that you have 10 columns and want data from only 4 columns in the target. Please clarify. Thanks for the video.
You are a datastage guru !! Amazing work....
What is the difference between Same and Entire partitioning?
I thought I made it pretty clear in the video. Please do watch it one more time. I can't come up with better words right now to make it any clearer than it is in the video.
Please give us an example for RCP; it went a bit over my head. Also, please explain when to use key-based and keyless partitioning, with examples.
Please explain how to sort on multiple columns
RCP is not clear. How does it work: based on columns or records?
Thanks for the videos. I have a question: is RCP at the record level or the column level?
Hash partitioning is well explained
Hanumanthu is fan of Rushika ❤️❤️❤️
fantastic video Ruchika ... Great job :) ..Keep it up!!
Great video!!!!!!!!!!!!
Hi, small doubt.
I have a job: Seq ----> Tx -----> DS
How many processors will it take? I think we can find out using APT_DUMP_SCORE.
Yup... APT_DUMP_SCORE gives all the information regarding how the data is partitioned, node info, operator info, etc.
I'll come to that part once we finish all the basic stages. I'm thinking about making 'Advanced DS Tutorial Videos'. Hope you guys will appreciate them the same way!! :)
super
Thanks Amul. Good to see that you liked this video except the RCP part. :)
Hi, do you have any video that shows how to work with CDC in DataStage?
Too lengthy a session; I got bored from the middle onward. Divide it into 2 parts.
wow
Or-k-straight.
Please, please, please keep uploading new videos!!!
LOL (ambient noises) dogs are barking at 15:49
RCP is not clear
Ohhh!! I'll try repeating it in some other video & try to make it more clear this time. Thanks for bringing it to my notice. :)
Tutorial, you're welcome.
when is the next release....
The content was good, but the explanation needs more practice; many times it is annoying.
Convey the concept clearly. You're always confusing.
wowwwwwwwwwwwwwwwwwwwwwwwww
Rhema ggarac